
There are more ways to turn speech into text today than at any point in history — and the quality gap between the best and worst approaches has never been larger. If you are evaluating your options, whether for personal productivity, professional documentation, or accessibility purposes, understanding what each approach actually offers helps you make a decision you will not need to revisit in six months.

This guide covers every major method for converting speech to text, from the tools built into your operating system to purpose-built dictation apps, with an honest assessment of where each one shines and where it falls short.

Method 1: Built-In OS Dictation

Both macOS and iOS include native speech-to-text capabilities that require no additional software or accounts. On a Mac, you can activate system dictation with a keyboard shortcut (double-press Fn by default) and begin speaking. On iPhone, tapping the microphone icon on the keyboard activates on-device voice input.

The main advantages of built-in OS dictation are convenience and privacy — on recent devices, much of the processing happens on-device, which means your audio does not leave your machine. Apple's on-device speech recognition has improved substantially, handling everyday vocabulary with reasonable accuracy.

The limitations: accuracy degrades significantly on specialized vocabulary, customization is limited, the default double-press shortcut is less ergonomic than a single dedicated hotkey, and there is no cross-device workflow. If you dictate heavily and professionally, built-in OS tools are a starting point but rarely the destination.

Method 2: In-App Dictation Features

Many applications include their own voice input features. Microsoft Word has a Dictate button. Google Docs has Voice Typing. Notion, some email clients, and various other productivity apps have integrated speech input.

These in-app features are convenient for users of those specific applications and often work well within their native context. The problem is fragmentation. Each app has its own way of activating dictation, its own quality level, its own vocabulary handling, and its own interface. Switching between apps means switching between different dictation experiences, so no muscle memory carries across your workflow.

In-app dictation is best for users who do nearly all of their writing in a single application and want minimal setup. For anyone whose work spans multiple apps — which is most knowledge workers — it creates unnecessary friction.

Method 3: Browser-Based Voice Tools

Web-based speech-to-text tools let you open a URL, click a button, speak, and copy the resulting transcript. They require no installation and work on any operating system with a compatible browser.

These tools are useful for one-off transcription tasks, quick experiments, and environments where installing software is not possible. Their limitations for regular use are substantial: they only work inside the browser, they require a copy-paste step to move text to where you actually need it, browser audio latency makes real-time feedback sluggish, and there is no integration with your existing workflows.

Method 4: File Transcription Services

For turning recorded audio into text — meeting recordings, interviews, podcast episodes, voice memos — file-based transcription services are the right tool. You upload an audio or video file, the service processes it, and returns a transcript, often with speaker diarization and timestamps.

This approach is not for live dictation. There is inherent latency in the upload-process-download cycle. But for after-the-fact transcription of recorded content, file services offer high accuracy, rich output formatting, and timestamps that make it easy to navigate long recordings.
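To make "speaker diarization and timestamps" concrete, here is a minimal Python sketch of what a returned transcript might look like once loaded into a program. The Segment structure and the function names are hypothetical, chosen for illustration; real services each define their own output format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """One stretch of speech in a transcript (hypothetical structure)."""
    start: float   # seconds from the beginning of the recording
    end: float     # seconds from the beginning of the recording
    speaker: str   # label assigned by speaker diarization
    text: str


def timestamp(seconds: float) -> str:
    """Render a number of seconds as HH:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render(segments: List[Segment]) -> str:
    """Format segments as one timestamped, speaker-labeled line each."""
    return "\n".join(
        f"[{timestamp(s.start)}] {s.speaker}: {s.text}" for s in segments
    )


def segment_at(segments: List[Segment], seconds: float) -> Optional[Segment]:
    """Return the segment being spoken at a given time, if any."""
    for s in segments:
        if s.start <= seconds < s.end:
            return s
    return None


# Made-up sample data, not output from any real service.
sample = [
    Segment(0.0, 4.2, "Speaker 1", "Welcome back to the show."),
    Segment(4.2, 9.8, "Speaker 2", "Thanks for having me."),
]
print(render(sample))
```

The segment_at lookup is the part that makes long recordings navigable: given a point in the audio, you can jump straight to who was speaking and what they said.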

Method 5: Purpose-Built Dictation Apps

Dedicated dictation applications designed for professional use represent the most capable approach for high-volume voice typing. These tools operate at the system level, work in every application through a universal hotkey, optimize the audio pipeline for minimum latency, and offer customization features like personal vocabulary lists, voice profiles, and smart formatting.
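As a rough illustration of what a personal vocabulary list does, the Python sketch below swaps commonly misheard phrases for the spelling you actually want. This is a simplified assumption for explanation only: the function name and the sample entries are hypothetical, it is not how any particular app (Steno included) implements the feature, and real tools may instead bias the recognizer itself rather than correct its output afterward.

```python
import re
from typing import Dict


def apply_vocabulary(text: str, vocabulary: Dict[str, str]) -> str:
    """Replace misheard phrases with the user's preferred terms.

    `vocabulary` maps what the recognizer tends to produce to what the
    user wants. Matching is case-insensitive and limited to whole words.
    """
    # Handle longer phrases first so a short entry cannot break up a longer one.
    for heard in sorted(vocabulary, key=len, reverse=True):
        preferred = vocabulary[heard]
        pattern = re.compile(r"\b" + re.escape(heard) + r"\b", re.IGNORECASE)
        # A function replacement inserts the preferred term literally,
        # even if it contains backslashes.
        text = pattern.sub(lambda _match, p=preferred: p, text)
    return text


# Hypothetical entries a developer might add.
my_terms = {
    "cube control": "kubectl",
    "post gress": "Postgres",
}
print(apply_vocabulary("Run cube control against the Post Gress cluster", my_terms))
```

Even this toy version shows why the feature matters for specialized work: general-purpose recognition has no way to know that your field's jargon is more likely than the everyday words it sounds like.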

The quality difference between a purpose-built dictation app and a browser or in-app tool is most apparent at high speaking speeds, with technical vocabulary, and in noisy environments. Dedicated tools are built around the specific challenge of reliably converting speech to text at professional quality and speed.

Steno is built around this premise. It runs as a lightweight Mac menu bar app and an iPhone keyboard extension, activated by a single hotkey, and types text directly into whatever application you are using. There is no workflow change required — you just start using your voice in the same places you would otherwise type. The result is a consistent, fast, accurate dictation experience that becomes invisible in daily use.

Method 6: Voice Assistants for Text Entry

Siri, Google Assistant, and similar voice assistants can technically transcribe speech, but they are optimized for command-response interactions rather than extended dictation. They tend to perform poorly for long-form text entry, do not handle professional vocabulary well, and often attempt to interpret dictation as commands rather than text to be recorded verbatim.

Voice assistants are not the right tool for turning speech into text for professional writing, documentation, or any use case requiring more than a sentence or two of dictation.

Choosing the Right Method for Your Situation

The decision is simpler than the options make it appear. Ask yourself two questions: How often do I need to dictate? And how many different applications do I dictate into?

If you dictate occasionally and primarily in one app, built-in or in-app dictation is probably fine. If you dictate regularly across multiple applications, a purpose-built app like Steno will give you significantly better results with less friction. If you need to transcribe audio recordings rather than live speech, a file transcription service is the right tool.
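The same rule of thumb can be written as a short Python function. This is only a sketch of the guidance above; the function and parameter names are made up for illustration, and the fallback for in-between cases (for example, frequent dictation in a single app) is an assumption drawn from the in-app section rather than a firm rule.

```python
def recommend(live: bool, frequent: bool, multiple_apps: bool) -> str:
    """Suggest a speech-to-text method from three yes/no answers.

    live          -- True for live dictation, False for existing recordings
    frequent      -- True if you dictate regularly rather than occasionally
    multiple_apps -- True if you dictate into more than one application
    """
    if not live:
        # Recorded audio calls for a different kind of tool entirely.
        return "file transcription service"
    if frequent and multiple_apps:
        return "purpose-built dictation app"
    # Assumption: every other combination starts with the simplest option.
    return "built-in or in-app dictation"


print(recommend(live=True, frequent=True, multiple_apps=True))
```

Note that the recordings question comes first: if you are not dictating live, how often you dictate and where no longer matter.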

The most common mistake is trying to use a one-off browser tool as a regular workflow tool, or using an in-app feature when you need something that works across your entire workflow. Match the tool to the actual use case and you will avoid most of the frustration that gives voice-to-text a bad reputation among people who have tried it half-heartedly.

The best way to turn speech into text is the one that integrates so smoothly into your existing workflow that you forget there is a tool between you and the words on screen.

Download Steno free at stenofast.com and experience system-wide voice input on your Mac in under a minute. For a deeper look at what makes real-time dictation feel instant, see our article on real-time speech to text.