A Dictation App That Works Everywhere on Mac: How Steno Uses Accessibility APIs

You have probably had this experience: you find a dictation tool that seems promising, install it, start using it, and then discover it does not work in the application you actually need it in. Maybe it only functions inside a browser. Maybe it works in TextEdit but not in your IDE. Maybe it fails silently in Electron apps like Slack, Notion, or VS Code. This compatibility gap is the single biggest reason people abandon dictation tools, and it is the problem Steno was specifically built to solve.

Steno works in every application on your Mac. Not most applications. Not just Apple's applications. Every single one. It achieves this through macOS Accessibility APIs, the same system-level framework that powers screen readers and other assistive technologies. This article explains how that works, why it matters, and what it means for your daily workflow.

The Compatibility Problem with Other Dictation Tools

To understand why Steno's approach is different, it helps to understand how other dictation tools insert text and where they fail.

Browser-Based Tools

Services like Otter.ai, Speechnotes, and Google Docs voice typing run inside a web browser. They capture audio through the browser's microphone access (the getUserMedia API) and display transcribed text in a web page. To use the text elsewhere, you have to copy it from the browser and paste it into your target application. This copy-paste workflow adds friction to every single dictation, and it means you cannot dictate directly into the application where your text belongs.

Clipboard-Based Desktop Apps

Some desktop dictation tools transcribe your speech and then paste the result using the system clipboard (Cmd+V). This approach works across more applications than browser tools, but it has a critical flaw: it overwrites your clipboard contents. If you had something important copied, it is gone. It also fails in applications that intercept paste events or have custom paste behavior, and it can trigger unintended formatting when pasting into rich text editors.
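To make the clipboard flaw concrete, here is a minimal sketch of paste-based insertion. Everything in it is hypothetical: `FakeClipboard` is an in-memory stand-in for the system pasteboard, and the `paste` closure stands in for sending Cmd+V to the frontmost application. It is not taken from any particular tool.

```swift
/// In-memory stand-in for the system clipboard (hypothetical; a real tool
/// would talk to the system pasteboard instead).
final class FakeClipboard {
    var contents: String?
    init(contents: String? = nil) { self.contents = contents }
}

/// Paste-based insertion: place the transcript on the clipboard, then paste.
/// `paste` is a placeholder for sending Cmd+V to the frontmost application.
func insertByPasting(_ transcript: String,
                     clipboard: FakeClipboard,
                     paste: (String?) -> Void) {
    clipboard.contents = transcript   // replaces whatever the user had copied
    paste(clipboard.contents)
}

// The user copied a link earlier, then dictates a sentence.
let clipboard = FakeClipboard(contents: "https://example.com/important-link")
var received: String? = nil
insertByPasting("Hello from dictation", clipboard: clipboard) { received = $0 }
// The target app received the transcript, but the copied link is no longer
// on the clipboard.
```

A tool can soften this by saving the old contents and restoring them after the paste, but that adds timing problems of its own and still depends on the target application handling paste in the usual way.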

Apple Dictation

Apple's built-in dictation uses the macOS text input system (NSTextInputClient protocol). This works well in applications built with Apple's frameworks (AppKit and SwiftUI) but is inconsistent in Electron apps, Java applications, and other cross-platform software. Many developers have experienced Apple Dictation failing in VS Code, IntelliJ, or other non-native applications.

Steno's Approach: Accessibility APIs

Steno takes a fundamentally different approach. Instead of using the clipboard or the text input system, Steno uses macOS Accessibility APIs to simulate keyboard events at the system level. When Steno inserts transcribed text, it generates the same events that your physical keyboard produces when you press keys. From the perspective of the receiving application, there is no difference between text typed by your fingers and text inserted by Steno.

How Accessibility APIs Work

macOS includes a comprehensive Accessibility framework designed to support assistive technologies like screen readers (VoiceOver), switch controls, and alternative input devices. This framework provides APIs that allow applications to read the state of the user interface, interact with UI elements, and simulate user input events.

Steno uses one specific capability from this family: posting synthetic keyboard events to the system event stream, which macOS allows only for apps that have been granted Accessibility permission. When you release the Steno hotkey after speaking, the transcribed text is converted into a sequence of keyboard events and posted to the system. The frontmost application receives these events exactly as it would receive events from a physical keyboard.

This approach has several important properties:

Universal compatibility. Any application that accepts keyboard input works with Steno. This includes native Mac apps, Electron apps, Java apps, games, terminal emulators, virtual machines, and remote desktop clients.
No clipboard interference. Your clipboard contents are never touched. Whatever you had copied before dictating remains available for pasting after.
Correct input handling. Because the text arrives as keyboard events, applications process it through their normal input pipelines. Auto-complete, spell check, and input methods all work as expected.
No formatting surprises. Keyboard events carry only plain text, so you never get unexpected rich text formatting, font changes, or style inheritance from clipboard paste operations.

The Permission Model

Because Accessibility APIs are powerful (they can control any application on the system), macOS requires explicit user permission before an app can use them. When you first install Steno, it asks you to grant Accessibility permission in System Settings, under Privacy & Security > Accessibility. This is a one-time setup step, and it is the same permission that tools like Alfred, Karabiner, and other keyboard utilities require.

The permission prompt might give some users pause, but it is actually a sign of a well-designed system. macOS is protecting you by ensuring that only applications you explicitly trust can simulate keyboard input. Steno needs this permission to function, and the macOS privacy system (TCC, short for Transparency, Consent, and Control) ensures it cannot be granted silently or without your knowledge.
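For developers curious what this check looks like from the app's side, here is a minimal sketch using the public `AXIsProcessTrustedWithOptions` call. The `InsertionReadiness` enum and the two helper functions are hypothetical names invented for this illustration, and nothing here is claimed to be Steno's actual code.

```swift
#if canImport(ApplicationServices)
import ApplicationServices
import Foundation
#endif

/// Hypothetical state an app might track before it tries to insert text.
enum InsertionReadiness {
    case ready
    case needsAccessibilityPermission
}

/// Maps the result of the trust check to a state the UI can act on.
func readiness(isTrusted: Bool) -> InsertionReadiness {
    return isTrusted ? .ready : .needsAccessibilityPermission
}

#if canImport(ApplicationServices)
/// Asks macOS whether this process has Accessibility permission.
/// When `promptIfNeeded` is true and permission is missing, macOS shows its
/// own dialog pointing the user to System Settings.
func currentReadiness(promptIfNeeded: Bool) -> InsertionReadiness {
    let promptKey = kAXTrustedCheckOptionPrompt.takeUnretainedValue() as String
    let options = [promptKey: promptIfNeeded] as CFDictionary
    return readiness(isTrusted: AXIsProcessTrustedWithOptions(options))
}
#endif
```

The call only reports the current state and, optionally, triggers the system prompt. The app cannot flip the switch itself, which is the point made above: the user has to grant the permission by hand.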

Where This Matters in Practice

The universal compatibility of Accessibility-based text insertion matters most in the applications where other dictation tools fail. Here are specific examples from real user workflows.

Code Editors and IDEs

VS Code, IntelliJ, Sublime Text, and other development environments are where many knowledge workers spend hours daily. These applications often have custom text input handling that breaks Apple Dictation and clipboard-based tools. Steno's keyboard event approach works perfectly because the editor simply sees keystrokes. Developers use Steno for writing comments, documentation strings, commit messages, code review feedback, and pull request descriptions.

Terminal Emulators

iTerm2, Terminal.app, and other shell environments are notoriously incompatible with dictation tools. The terminal expects raw keyboard input and does not support the rich text input protocols that Apple Dictation relies on. Steno works seamlessly in terminals because it generates the same raw keyboard events that your physical keyboard does. This is useful for typing long commands, writing git commit messages, or composing email in terminal-based clients like mutt or neomutt.

Electron Applications

Slack, Notion, Discord, Figma, Obsidian, and dozens of other popular Mac applications are built on Electron, a framework that wraps web technologies in a desktop shell. Electron's text input handling is inconsistent with macOS conventions, which is why Apple Dictation sometimes fails in these apps. Steno sidesteps the problem by posting keyboard events at the system level, so an Electron app receives them as ordinary keystrokes and handles them the same way it handles typing.

Design and Creative Tools

Figma, Sketch, Adobe Creative Cloud apps, and other design tools often have text input fields embedded in complex canvas-based interfaces. Dictation tools that rely on standard text input protocols cannot find or interact with these embedded text fields. Steno does not need to find text fields. It simply generates keyboard events that reach whatever input context is currently focused.

Virtual Machines and Remote Desktop

If you run Windows or Linux in a virtual machine (Parallels, VMware Fusion, UTM) or connect to remote machines (Microsoft Remote Desktop, Jump Desktop), keyboard events from Steno pass through to the guest operating system. You can effectively dictate into applications running inside a VM or on a remote server, something no clipboard-based or text-input-based tool can do reliably.

The Technical Details

For readers interested in the implementation: Steno uses CGEvent APIs from the Core Graphics framework to create and post keyboard events. Each character in the transcribed text is converted to a key-down and key-up event pair using the appropriate virtual key code and modifier flags. Unicode characters that do not map to standard US keyboard keys are handled through the macOS Unicode input method.

The events are posted to the system event stream using CGEvent.post(tap:). Posted at the HID event tap, they enter the event pipeline at the same point as hardware keyboard events, so they pass through all the same processing stages: key remapping, input methods, application-level key bindings, and text input handling.
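The two paragraphs above can be sketched in a few lines of Swift. This is a rough illustration under stated assumptions, not Steno's implementation. It uses a simpler technique than per-character key codes: each event carries its text as a Unicode payload via `keyboardSetUnicodeString`. The function names `utf16Chunks` and `typeText` are hypothetical, and the 20-unit chunk size reflects a commonly reported per-event limit rather than a documented constant.

```swift
#if canImport(CoreGraphics)
import CoreGraphics
#endif

/// Splits text into chunks of at most `maxUnits` UTF-16 code units without
/// splitting a Character, so surrogate pairs and emoji sequences stay whole.
/// A single Character longer than `maxUnits` becomes its own oversized chunk.
/// The default of 20 is an assumption, not a documented limit.
func utf16Chunks(of text: String, maxUnits: Int = 20) -> [[UInt16]] {
    var chunks: [[UInt16]] = []
    var current: [UInt16] = []
    for character in text {
        let units = Array(String(character).utf16)
        if !current.isEmpty && current.count + units.count > maxUnits {
            chunks.append(current)
            current = []
        }
        current.append(contentsOf: units)
    }
    if !current.isEmpty { chunks.append(current) }
    return chunks
}

#if canImport(CoreGraphics)
/// Posts one key-down/key-up pair per chunk, with the chunk attached as the
/// event's Unicode string. Returns false if an event could not be created.
/// Requires Accessibility permission to have any effect.
func typeText(_ text: String, tap: CGEventTapLocation = .cghidEventTap) -> Bool {
    let source = CGEventSource(stateID: .hidSystemState)
    for units in utf16Chunks(of: text) {
        for isKeyDown in [true, false] {
            // Virtual key 0 is a placeholder; the Unicode payload set below
            // is what text-handling code normally reads.
            guard let event = CGEvent(keyboardEventSource: source,
                                      virtualKey: 0,
                                      keyDown: isKeyDown) else {
                return false
            }
            event.keyboardSetUnicodeString(stringLength: units.count,
                                           unicodeString: units)
            event.post(tap: tap)
        }
    }
    return true
}
#endif
```

On a Mac with Accessibility permission granted, calling `typeText("Hello, world")` should make the text appear in whichever application has keyboard focus. A production tool would likely add pacing between events and handle applications that inspect key codes instead of the Unicode payload.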

The result is indistinguishable from physical typing at the application level. No special support from the receiving application is needed. If it accepts keyboard input, it works with Steno.

Why "Works Everywhere" Changes the Habit

The practical impact of universal compatibility is not just convenience. It is what determines whether dictation becomes a habit or remains an occasional experiment. If you have to think about whether dictation will work in your current application, you will default to typing. If you know it works everywhere without exception, you build the muscle memory of reaching for the hotkey whenever you have text to enter.

This is the difference between a dictation app and a dictation habit. Steno works everywhere on your Mac, and that universal reliability is what turns voice-to-text from a feature into a workflow.

Download Steno from stenofast.com and try it in the applications where other dictation tools have failed you. Hold the hotkey, speak, release, and watch the text appear exactly where you need it. Every app. Every time.