Why We Built Steno as a Native macOS App


When we started building Steno, the first major decision was the technology stack. We could have used Electron and shipped to macOS, Windows, and Linux from a single codebase. We could have built a web app that runs in the browser. Instead, we chose Swift and native macOS frameworks, which meant shipping on one platform only. This post explains why that tradeoff was not just acceptable but essential for the product we wanted to build.

Memory: 30 MB vs 300+ MB

Steno idles at roughly 25 to 30 MB of memory. For an Electron app of comparable complexity, the baseline is 150 to 300 MB or more, because every Electron app bundles an entire Chromium browser engine. For a menu bar utility that runs all day, this difference is not trivial.

Your Mac has a finite amount of RAM. Every megabyte consumed by a background utility is a megabyte unavailable to the apps you are actually working in. When you have multiple Electron-based tools running simultaneously (and most people do), the memory overhead compounds quickly. Steno's low memory footprint means it essentially disappears into the background. You can run it alongside memory-hungry tools like your browser, Slack, and your code editor without contributing to the pressure that triggers macOS memory compression and swap.

CPU Efficiency and Battery Impact

A dictation app has a distinctive performance profile. It is idle most of the time, with short bursts of intense activity when you are actually recording and processing audio. During the idle periods, the app should use as close to zero CPU as possible. A well-behaved native app gets there naturally: with no pending work, its threads stay blocked and the scheduler has nothing to run. There is no JavaScript engine running idle timers, no garbage collector waking up periodically, no Chromium render loop consuming cycles to draw a UI nobody is looking at.

During the active phase (recording audio), CPU efficiency matters for a different reason: battery life. Audio capture is inherently real-time. Every buffer must be captured without gaps or dropped frames. A native audio pipeline built on AVAudioEngine does this with minimal CPU overhead because it sits directly on top of Core Audio, the same low-level framework that powers GarageBand and Logic Pro. An Electron app accessing audio through the Web Audio API adds multiple abstraction layers, each with its own CPU cost.

On a MacBook, these differences translate directly to battery impact. Steno barely registers in Activity Monitor's Energy tab. Users routinely tell us they forget the app is running, which is exactly the goal. A background tool that drains your battery is a background tool you eventually disable.

Instant Launch Time

Steno launches in under a second, including the time to set up the audio engine and register the global hotkey. This matters because Steno is configured to launch at login. Every time you restart your Mac, every app in your login items adds to the time before your machine is usable. Native apps launch fast because there is no heavyweight runtime to bootstrap. There is no JavaScript bundle to parse, no Chromium to initialize, no Node.js process to spin up. The binary loads, the lightweight Swift runtime initializes, and the app is ready.

This is not just about the time saved at login. Fast startup means the app can be quit and restarted without friction. If you need to restart Steno for any reason, it is back in your menu bar almost as soon as you click.

Native Accessibility API Access

This is the most important technical reason for going native, and it is the one that would be hardest to replicate in any cross-platform framework.

Steno inserts text by using the macOS Accessibility API to type characters directly into the focused application. This is not clipboard pasting. It is programmatic keystroke insertion that works identically to physical keyboard input from the target app's perspective. The text appears character by character, respects the app's input handling, and does not touch the user's clipboard.

The Accessibility API is a native macOS C and Objective-C API. Calling it from Swift is direct, with no bridging layer in between. Calling it from Electron or a web wrapper requires bridging through Node.js native modules, which adds latency, complexity, and fragility. Some Electron apps achieve this, but with significantly more code and less reliability. For a feature that runs on every single dictation, direct access is not a nice-to-have. It is a requirement.
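To make the idea of programmatic keystroke insertion concrete, here is a minimal sketch of one common way to do it from Swift: synthetic key events posted through Quartz Event Services, which macOS typically gates behind the same Accessibility permission. This is an illustration under stated assumptions, not Steno's actual code. The function names `utf16Chunks` and `typeText` are hypothetical, and a production version would likely pace the events and handle errors more carefully.

```swift
import ApplicationServices

/// Splits text into one UTF-16 chunk per user-visible character,
/// so an emoji or accented letter travels in a single key event.
/// (Hypothetical helper, named for this example only.)
func utf16Chunks(_ text: String) -> [[UInt16]] {
    text.map { Array(String($0).utf16) }
}

/// Posts `text` to the focused app as synthetic key-down/key-up pairs.
/// Returns false if the process is not trusted for Accessibility
/// or an event could not be created. The clipboard is never touched.
@discardableResult
func typeText(_ text: String) -> Bool {
    guard AXIsProcessTrusted() else { return false }
    let source = CGEventSource(stateID: .combinedSessionState)
    for chunk in utf16Chunks(text) {
        for isKeyDown in [true, false] {
            // The virtual key code is a placeholder; the Unicode payload
            // set below determines what the target app receives.
            guard let event = CGEvent(keyboardEventSource: source,
                                      virtualKey: 0,
                                      keyDown: isKeyDown) else { return false }
            event.keyboardSetUnicodeString(stringLength: chunk.count,
                                           unicodeString: chunk)
            event.post(tap: .cghidEventTap)
        }
    }
    return true
}
```

Calling `typeText("Hello, world")` while another app has keyboard focus would deliver the text as if it had been typed. Splitting by character rather than sending the whole string in one event keeps each event small and lets the target app process input in order.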

Native Audio Pipeline

Audio recording in Steno uses AVAudioEngine, which provides direct access to the hardware audio input with minimal latency. The audio buffer format, sample rate, and channel configuration are controlled precisely. There is no abstraction mismatch between what the recording API provides and what the speech recognition backend expects.
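As a rough sketch of what capturing microphone audio with AVAudioEngine looks like, the example below installs a tap on the input node and forwards each buffer to a callback. The names `startCapture` and `bufferDuration` are hypothetical, the buffer size of 1024 frames is an arbitrary choice for illustration, and the sketch assumes microphone permission has already been granted. It is not a description of Steno's actual pipeline.

```swift
import AVFoundation

/// Seconds of audio held by one buffer of `frames` frames at `sampleRate` Hz.
/// Useful for reasoning about how much latency a buffer size adds.
func bufferDuration(frames: AVAudioFrameCount, sampleRate: Double) -> Double {
    Double(frames) / sampleRate
}

/// Starts capturing from the default input device and hands each
/// captured buffer to `onBuffer`. Throws if the engine cannot start.
func startCapture(on engine: AVAudioEngine,
                  onBuffer: @escaping (AVAudioPCMBuffer) -> Void) throws {
    let input = engine.inputNode
    // Use the hardware's own format so no conversion happens in the tap.
    let format = input.outputFormat(forBus: 0)
    // bufferSize is a request; the system may deliver a different size.
    input.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
        onBuffer(buffer)
    }
    engine.prepare()
    try engine.start()
}
```

With a 48 kHz input, a 1024-frame buffer holds about 21 ms of audio, which gives a sense of the latency floor that buffer size implies. If the recognizer expects a different sample rate or channel count, a conversion step (for example with `AVAudioConverter`) would sit between the tap and the recognizer.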

In contrast, web-based audio capture through the MediaStream API or Web Audio API goes through additional layers of abstraction. These APIs are designed for browser sandboxes where security constraints take priority over latency. They work, but they add overhead that a native app does not need to pay.

For a dictation app, audio quality is not a subjective preference. It directly affects transcription accuracy. Cleaner audio capture means better recognition results. Every unnecessary layer between the microphone and the speech recognition engine is an opportunity to degrade signal quality or introduce latency.

Deep macOS Integration

Steno integrates with macOS in ways that cross-platform frameworks struggle to replicate well:

Menu bar residence. The app lives in the system menu bar as a native NSStatusItem. The popover interface follows system conventions for appearance, positioning, and behavior. It feels like a first-party macOS utility.
System permissions. macOS requires explicit user consent for microphone access and accessibility. Steno requests these through the native permission dialogs and handles permission state changes correctly. Cross-platform frameworks often have incomplete or delayed support for new permission requirements that Apple introduces.
Global hotkey registration. The CGEvent tap that powers the hold-to-speak activation is a low-level system API. It integrates cleanly from Swift, with full support for modifier keys, key repeat behavior, and event filtering.
Login item support. Steno registers as a login item through the native ServiceManagement framework, which means it appears correctly in System Settings and respects the user's startup preferences.
System appearance. The UI automatically adapts to the user's appearance settings, including light and dark mode, accent colors, and accessibility preferences like reduced motion.
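Two of the integration points above, permissions and login items, are small enough to sketch. The example below shows how a Swift app might check microphone and Accessibility permission and register itself as a login item. The function names are hypothetical, the login item call assumes macOS 13 or later, and none of this is taken from Steno's source.

```swift
import AVFoundation
import ApplicationServices
import ServiceManagement

/// Reports whether microphone access is granted, asking the user
/// through the system dialog if they have not decided yet.
func requestMicrophoneAccess(_ completion: @escaping @Sendable (Bool) -> Void) {
    switch AVCaptureDevice.authorizationStatus(for: .audio) {
    case .authorized:
        completion(true)
    case .notDetermined:
        AVCaptureDevice.requestAccess(for: .audio) { granted in
            completion(granted)
        }
    default:
        // Denied or restricted: the user must change this in System Settings.
        completion(false)
    }
}

/// Returns whether the process is trusted for Accessibility.
/// When `promptIfNeeded` is true, macOS shows its own dialog
/// pointing the user to System Settings.
func hasAccessibilityAccess(promptIfNeeded: Bool) -> Bool {
    let key = kAXTrustedCheckOptionPrompt.takeUnretainedValue() as String
    return AXIsProcessTrustedWithOptions([key: promptIfNeeded] as CFDictionary)
}

/// Registers the main app as a login item if it is not already enabled.
@available(macOS 13.0, *)
func enableLaunchAtLogin() throws {
    if SMAppService.mainApp.status != .enabled {
        try SMAppService.mainApp.register()
    }
}
```

Accessibility trust can be revoked at any time in System Settings, so an app that depends on it would typically re-check with `hasAccessibilityAccess(promptIfNeeded: false)` before each use rather than caching the answer.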

The Tradeoff: macOS-Only

The obvious cost of going native is platform exclusivity. Steno runs on macOS and only macOS. There is no Windows version, no Linux version, and no web version. This means a significant portion of potential users is excluded.

We accepted this tradeoff deliberately. A dictation app that runs all day in your menu bar, captures audio, and inserts text into other applications is deeply coupled to the operating system. Building it natively on one platform lets us deliver an experience that feels like it belongs on the system. Building it cross-platform would mean compromises in performance, reliability, and integration that directly affect the core value proposition.

We would rather build something excellent for one platform than something mediocre for three. If demand warrants it, we will consider native builds for other platforms in the future. But each would be built natively for that platform, not ported through a compatibility layer.

The Bottom Line

Every design decision in Steno serves one goal: make voice-to-text feel instantaneous and invisible. Native development is not glamorous. It does not produce impressive cross-platform demos. What it produces is an app that uses roughly 30 MB of memory, launches in under a second, captures audio through the fastest available pipeline, and inserts text without touching the clipboard. For a tool that runs quietly in the background all day, every day, those are the results that matter.