A text speech app (more commonly called a speech-to-text or dictation app) is any tool that turns spoken words into characters that appear in your document, email, chat window, or any other text field. Despite how straightforward the concept sounds, there is meaningful variation in how these apps work, where they work, and how well they work. Finding the right one for your workflow requires understanding what the different approaches trade off against each other.

Two Architectures: In-App vs. System-Level

The most important architectural distinction in text speech apps is whether they work inside a single application or across your entire operating system.

In-App Text Speech

Some applications include voice input as a built-in feature. Google Docs Voice Typing is the best-known example — it works within the Docs editor and inserts dictated text at your cursor. Microsoft Word has a Dictate feature in the ribbon that does the same. Notion, Bear, and some other note-taking apps have microphone buttons that allow voice input within the app.

In-app voice input has the advantage of tight integration: the app may offer voice commands specific to its own functionality (formatting, navigation, document structure), and the voice input is usually activated from a discoverable location within the interface. The main disadvantage is that it only works in that one application. When you switch to your email client, your project management tool, or your messaging app, the voice input is gone and you are back to typing.

System-Level Text Speech

System-level text speech apps operate outside any individual application, at the operating system layer. They use a global hotkey or trigger that works regardless of which app is currently in focus. When you activate them, they listen for speech and insert the resulting text wherever your cursor currently is — in any application, any text field, on any screen.
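The flow described above (hotkey, listen, insert at the cursor) can be sketched as a small pipeline. This is only an illustration: the class name and the three callables `record`, `transcribe`, and `insert_text` are hypothetical stand-ins, not the API of any real dictation tool or operating system.

```python
from typing import Callable, List


class SystemDictation:
    """Hypothetical glue object; none of these names come from a real API."""

    def __init__(
        self,
        record: Callable[[], bytes],          # captures audio while the hotkey is active
        transcribe: Callable[[bytes], str],   # turns audio into text
        insert_text: Callable[[str], None],   # types text into whatever field has focus
    ) -> None:
        self.record = record
        self.transcribe = transcribe
        self.insert_text = insert_text

    def on_hotkey(self) -> str:
        """Run one dictation: capture audio, transcribe it, insert the result."""
        audio = self.record()
        text = self.transcribe(audio)
        self.insert_text(text)
        return text


# Stand-ins so the sketch runs without a microphone or a speech model.
typed: List[str] = []
dictation = SystemDictation(
    record=lambda: b"fake-audio",
    transcribe=lambda audio: "hello world",
    insert_text=typed.append,
)
dictation.on_hotkey()
```

The point of the design is that `insert_text` targets whichever field is focused rather than one specific application, which is what separates a system-level tool from an in-app one.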

This architecture is dramatically more useful for anyone who works across multiple applications throughout the day. You do not need to remember which apps have built-in voice input and which do not. You do not need to manage different voice input experiences in different apps. You learn one interaction model and it works everywhere.

macOS's built-in Dictation feature is a system-level text speech tool — it uses a keyboard trigger to activate voice input in any focused text field. Steno is another example, operating via a menu bar icon and a global hotkey that works across all Mac applications.

Evaluating a Text Speech App

Coverage: Where Does It Work?

The first question to ask is where the tool works. Does it work system-wide, or only in specific applications? If system-wide, does it work in all applications or just native macOS apps? Does it work in browser-based apps (Notion in Chrome, Gmail in Chrome) or only in standalone applications? These distinctions matter more than most product marketing acknowledges.

Accuracy on Your Vocabulary

General accuracy numbers are less useful than accuracy on your specific vocabulary. If you work in healthcare, your dictation needs to handle medical terminology accurately. If you are a software developer, the tool should handle technical terms, product names, and framework names correctly. Most tools allow you to test before committing — use a representative sample of your typical vocabulary to evaluate any candidate tool.
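One way to put a number on such a test is word error rate (WER): the word-level edit distance between what you said and what the tool produced, divided by the number of words you said. The sketch below is a minimal version, assuming you have typed out a reference sentence by hand. The sample sentence and its misrecognition are invented for illustration, and the comparison ignores case but not punctuation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: a technical term split into two wrong words.
reference = "deploy the kubernetes cluster with terraform"
hypothesis = "deploy the cooper netties cluster with terraform"
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

Running the same handful of sentences through each candidate tool gives a rough but comparable score for your own vocabulary.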

Latency

How quickly does text appear after you finish speaking? The best tools respond in under a second. Tools with multi-second latency feel disconnected in a way that breaks the flow of dictation. Latency depends on whether processing is on-device or server-side, and on the model size. Smaller, on-device models skip the network round trip but may be less accurate. Larger, server-side models may be more accurate but add network round-trip time.
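If a tool exposes a transcription call you can script, latency can be measured rather than judged by feel. The helper below is a sketch that times any callable. `fake_transcribe` is a placeholder that sleeps for 50 ms to imitate processing, not a real speech engine.

```python
import time
from typing import Callable


def measure_latency_ms(
    transcribe: Callable[[bytes], str], audio: bytes
) -> tuple[str, float]:
    """Return the transcript and the wall-clock time the call took, in ms."""
    start = time.perf_counter()
    text = transcribe(audio)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return text, elapsed_ms


def fake_transcribe(audio: bytes) -> str:
    """Placeholder transcriber: sleeps to imitate model or network time."""
    time.sleep(0.05)
    return "hello world"


text, elapsed_ms = measure_latency_ms(fake_transcribe, b"fake-audio")
print(f"{text!r} took {elapsed_ms:.0f} ms")
```

For a server-side tool the measured time includes the network round trip, so it is worth repeating the measurement on the connection you normally work from.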

Activation Model

Push-to-talk (hold a key while speaking) tends to produce the best results because you only send audio when you consciously choose to speak. Toggle mode (press once to start, press again to stop) is more prone to accidental recordings or forgetting to stop. Auto-detect (the tool listens continuously and tries to detect when you are speaking to it) is the most hands-free but can introduce false activations and accidental transcriptions.
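The difference between the first two models comes down to how key events map to the recording state. The sketch below uses hypothetical names (`Mode`, `Activation`) and leaves out auto-detect, which would need a voice activity detector rather than key events.

```python
from enum import Enum


class Mode(Enum):
    PUSH_TO_TALK = "push_to_talk"
    TOGGLE = "toggle"


class Activation:
    """Tracks whether audio is being recorded, given key presses and releases."""

    def __init__(self, mode: Mode) -> None:
        self.mode = mode
        self.recording = False

    def key_down(self) -> None:
        if self.mode is Mode.PUSH_TO_TALK:
            self.recording = True                 # record only while the key is held
        else:
            self.recording = not self.recording   # each press flips the state

    def key_up(self) -> None:
        if self.mode is Mode.PUSH_TO_TALK:
            self.recording = False                # releasing the key always stops
        # In toggle mode, releasing the key changes nothing.


push = Activation(Mode.PUSH_TO_TALK)
push.key_down()
push.key_up()

toggle = Activation(Mode.TOGGLE)
toggle.key_down()
toggle.key_up()

print("push-to-talk still recording:", push.recording)
print("toggle still recording:", toggle.recording)
```

After one press and release, the push-to-talk instance has stopped while the toggle instance is still recording until a second press, which is the "forgot to stop" failure mode described above.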

Custom Vocabulary

Can you add custom terms — proper nouns, brand names, technical vocabulary — to improve accuracy on words the base model handles poorly? This is particularly important for professionals in specialized fields where domain vocabulary is high-frequency but unlikely to appear in general speech training data.
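Tools implement custom vocabulary in different ways, and many bias the recognizer itself. Where a tool offers nothing, a rough substitute is to correct known misrecognitions after the fact. The sketch below assumes that post-processing approach. The function name and the correction table are invented for illustration.

```python
import re
from typing import Dict


def apply_custom_vocabulary(text: str, corrections: Dict[str, str]) -> str:
    """Replace known misrecognitions with the intended terms in one pass."""
    if not corrections:
        return text
    lookup = {wrong.lower(): right for wrong, right in corrections.items()}
    # Try longer phrases first so they win over shorter phrases they contain.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)


# Invented examples of how technical terms might be misheard.
corrections = {
    "cooper netties": "Kubernetes",
    "post gress": "PostgreSQL",
}
fixed = apply_custom_vocabulary("Deploy Cooper Netties and post gress", corrections)
print(fixed)
```

A single regular expression pass means a replacement is never re-scanned, so one correction cannot accidentally trigger another.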

Text Speech Apps on Mac

macOS Dictation

Built-in, free, and available in any text field on the Mac. On Apple silicon Macs, many languages are processed on-device, which keeps audio private. Accuracy is adequate for general vocabulary. Toggle activation. The Dictation pane (System Settings > Keyboard > Dictation) does not offer a custom vocabulary list, so the main limitations are accuracy on technical terms and inconsistent behavior in some non-native applications.

Steno

A dedicated Mac text speech app optimized for knowledge worker workflows. Push-to-talk via a global hotkey, high accuracy via a cloud-based backend, smart reformatting (proper capitalization, punctuation, and formatting of numbers, dates, and addresses), and a searchable history of everything you dictated. Works in every application including Electron apps, browser-based tools, and all native Mac apps. Free tier available at stenofast.com.

Dragon Professional

The enterprise-grade dictation solution, now Windows-only: Nuance discontinued Dragon for Mac in 2018, so Mac users would need to run Windows (for example, in a virtual machine) to use it. Extremely high accuracy, especially after training on your voice. Expensive (several hundred dollars) and resource-intensive. Appropriate for professionals with demanding accuracy requirements (legal, medical, academic) who dictate for extended periods daily. Overkill for most knowledge workers who primarily want to replace routine typing.

Text Speech Apps on iPhone

iOS Built-In Dictation

Tap the microphone button on the iOS keyboard to activate voice input in any text field. Works across all iPhone apps. On-device processing since iOS 15, so it works offline and is private. Accuracy is good for standard English and many other languages. Auto-stops after a few seconds of silence, making it better for short inputs than long-form dictation.

Steno Keyboard for iPhone

The Steno iPhone keyboard app provides the same hold-to-speak push-to-talk model as the Mac version, with higher-accuracy transcription than the built-in iOS dictation. Install as a third-party keyboard and switch to it when you want to dictate longer content. Particularly useful for email composition and longer notes where the built-in dictation's auto-stop behavior is frustrating.

Choosing the Right Text Speech App for You

The decision largely comes down to your primary use case and how often you need to dictate.

Most people benefit from having two tools: a system-level dictation app for daily typing replacement, and an in-app tool for longer sessions in their primary writing environment. The combination covers nearly all use cases without significant overlap.

The right text speech app is the one you actually use. Accuracy matters less than friction — a slightly less accurate tool that you reach for automatically beats a slightly more accurate one that requires navigating to a specific application first.