From Speech to Text: A Complete Guide to Voice Dictation on Mac and iPhone

The journey from speech to text has never been smoother. In 2026, the technology has reached a level of maturity where voice dictation is a genuine productivity upgrade rather than a novelty — but using it effectively still requires understanding which tool fits which workflow, and how to build habits that stick. This guide covers everything you need to go from curious to confident with voice dictation.

Why Speech to Text Has a Compound Return

The most obvious benefit of converting speech to text is speed. The average person types 40 to 60 words per minute under normal conditions, while the average speaking rate is 130 to 150 words per minute. At typical rates, dictation can therefore produce raw text two to three times faster than typing, all else being equal.
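The arithmetic behind that multiplier can be checked with a short Python sketch. The function names and the 500-word example are illustrative assumptions; only the two words-per-minute ranges come from the figures above.

```python
def speedup_range(typing_wpm, speaking_wpm):
    """Return (lowest, highest) speed-up of speaking over typing.

    Both arguments are (low, high) tuples in words per minute.
    """
    typing_low, typing_high = typing_wpm
    speaking_low, speaking_high = speaking_wpm
    # Worst case: a slow speaker compared with a fast typist.
    lowest = speaking_low / typing_high
    # Best case: a fast speaker compared with a slow typist.
    highest = speaking_high / typing_low
    return lowest, highest


def minutes_to_produce(words, wpm):
    """Minutes needed to produce `words` at a steady rate of `wpm`."""
    return words / wpm


low, high = speedup_range((40, 60), (130, 150))
print(f"speed-up: {low:.2f}x to {high:.2f}x")
print(f"500 words typed at 50 wpm: {minutes_to_produce(500, 50):.1f} min")
print(f"500 words spoken at 140 wpm: {minutes_to_produce(500, 140):.1f} min")
```

Comparing the extremes gives a somewhat wider spread than the midpoints do, which is why "two to three times" is best read as a typical figure rather than a guarantee.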

But the real advantage is not the raw speed multiplier — it is the cognitive effect. Typing forces you to hold multiple things in mind simultaneously: the idea you want to express, the sentence structure you are building, and the physical act of finding and pressing keys. This cognitive load fragments thinking. Dictation externalizes the physical layer entirely, letting your mind focus on ideas and language rather than motor commands.

People who adopt voice dictation consistently report not just faster writing but qualitatively different writing. Their first drafts have more energy, a more natural voice, and more complete thinking. Speaking uses the same cognitive channel as spontaneous conversation, so it bypasses much of the self-editing that makes typed first drafts stilted and tentative.

The Steps: How Speech Becomes Text

Understanding what happens between your voice and the text on screen helps set realistic expectations and troubleshoot issues when they arise.

Step 1: Audio Capture

Your microphone converts sound waves into a digital audio signal. The quality of this capture — microphone quality, distance from your mouth, background noise level — significantly affects the accuracy of every subsequent step. A close-microphone setup (AirPods, a headset, or a quality USB mic) produces dramatically cleaner audio than speaking across a room to a laptop's built-in microphone.

Step 2: Audio Processing

Before speech recognition, the audio signal is preprocessed: noise reduction and level normalization clean up the signal, and the cleaned audio is converted into the acoustic features the recognition model expects. This processing can happen on-device or in the cloud depending on the tool.
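To make the preprocessing step concrete, here is a deliberately simplified Python sketch of two operations on raw samples: a noise gate and peak normalization. It is an illustration under assumptions, not how any particular dictation tool works. Real systems typically use far more sophisticated noise suppression, and the function names, threshold, and sample values are hypothetical.

```python
def noise_gate(samples, threshold=0.02):
    """Zero out samples whose magnitude is below `threshold`.

    A crude stand-in for noise reduction: very quiet samples are
    treated as background hiss and silenced.
    """
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def peak_normalize(samples, target_peak=0.9):
    """Scale samples so the loudest one has magnitude `target_peak`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


# Hypothetical quiet recording with samples in the range -1.0 to 1.0.
quiet = [0.01, -0.2, 0.45, -0.005, 0.3]
cleaned = peak_normalize(noise_gate(quiet))
print(cleaned)
```

Gating before normalizing matters in this sketch: scaling first would also amplify the low-level noise, pushing some of it above the gate threshold.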

Step 3: Speech Recognition

The core AI model analyzes the audio and produces a probability distribution over possible words for each sound segment. Context from surrounding words refines these probabilities, resolving ambiguities like "to," "too," and "two" based on grammatical context.
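The way context resolves homophones can be illustrated with a toy rescoring sketch in Python. Everything here is an assumption made for illustration: the acoustic scores, the tiny word-pair table, and the function names are invented, and production recognizers use neural models rather than a lookup table.

```python
# Hypothetical acoustic scores: the three words sound the same,
# so the audio alone cannot separate them.
ACOUSTIC = {"to": 1 / 3, "too": 1 / 3, "two": 1 / 3}

# Hypothetical word-pair scores standing in for a language model.
BIGRAM = {
    ("want", "to"): 0.60,
    ("to", "go"): 0.50,
    ("me", "too"): 0.40,
    ("bought", "two"): 0.30,
    ("two", "apples"): 0.50,
}
FLOOR = 0.01  # score for any word pair missing from the toy table


def pair_score(left, right):
    """Look up how plausible `right` is directly after `left`."""
    return BIGRAM.get((left, right), FLOOR)


def resolve(prev_word, candidates, next_word):
    """Combine acoustic and context scores and return the best word.

    Returns (best_word, normalized_scores), where the normalized
    scores sum to 1 across the candidates.
    """
    scores = {
        word: acoustic * pair_score(prev_word, word) * pair_score(word, next_word)
        for word, acoustic in candidates.items()
    }
    total = sum(scores.values())
    normalized = {word: score / total for word, score in scores.items()}
    best = max(normalized, key=normalized.get)
    return best, normalized


print(resolve("want", ACOUSTIC, "go")[0])
print(resolve("bought", ACOUSTIC, "apples")[0])
```

With identical acoustic scores, the surrounding words decide the outcome: "want ... go" favors "to", while "bought ... apples" favors "two".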

Step 4: Text Insertion or Post-Processing

The recognized text is either inserted directly into the active application or passed through a post-processing step that cleans it up — removing filler words, fixing formatting, and polishing the output before insertion. Steno's Smart Rewrite feature handles this post-processing step, producing cleaner text than raw transcription output.
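As a rough idea of what a post-processing pass can involve, the Python sketch below strips a few filler words and tidies spacing and punctuation. It is a generic illustration, not Steno's Smart Rewrite, whose internals are not described here. The filler list and the function name are assumptions.

```python
import re

# Hypothetical filler list; a real tool would need a richer one.
FILLERS = ("um", "uh", "erm", "you know")

# Match a filler as a whole word, plus the commas that often wrap it.
_FILLER_RE = re.compile(
    r"(?:,\s*)?\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?",
    flags=re.IGNORECASE,
)


def clean_transcript(raw):
    """Remove filler words and normalize spacing and end punctuation."""
    text = _FILLER_RE.sub("", raw)
    text = re.sub(r"\s+", " ", text).strip()    # collapse whitespace
    text = re.sub(r"\s+([,.!?])", r"\1", text)  # no space before punctuation
    if text:
        text = text[0].upper() + text[1:]       # capitalize the first letter
        if text[-1] not in ".!?":
            text += "."                         # ensure closing punctuation
    return text


print(clean_transcript("um so I think, uh, we should ship it on friday"))
```

A pattern-based cleaner like this has obvious limits: it would also delete a meaningful "you know" in "do you know the answer", which is one reason practical tools tend to rely on language models for this step.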

Building Your First Speech-to-Text Workflow

Choose Your Primary Use Case

Before downloading anything, identify the one or two contexts where you most want to speak instead of type. Common starting points include email replies, daily notes or journaling, meeting notes, and first drafts of documents. Starting with one focused use case builds the habit faster than trying to dictate everything at once.

Set Up the Tool

On Mac, Steno installs as a menu bar app and is ready to use in under a minute. Download it from stenofast.com, set your preferred hotkey, and you are ready to dictate in any application. On iPhone, install the keyboard extension from the App Store, enable it in Settings, and hold the microphone button to dictate.

Practice with Low-Stakes Text First

Start with notes to yourself, quick reminders, or informal messages. The first few dictation sessions feel slightly awkward because you are simultaneously speaking and monitoring the transcription for errors. This monitoring attention goes away quickly — within a few sessions, most people stop consciously tracking the transcription and just speak, glancing occasionally to verify accuracy.

Develop Your Dictation Style

Effective dictation sounds different from natural conversation. Short pauses between ideas help the system segment sentences correctly. Clear pronunciation of word endings reduces errors. Speaking at a moderate pace rather than rushing produces more accurate output. These adjustments happen naturally with practice — you do not need to consciously work on them.

From Speech to Text: Platform-Specific Notes

Mac

On Mac, the hold-to-speak model that Steno uses is the most practical for mixed typing-and-dictating workflows. Hold the hotkey, speak, release. The key is down only while you are speaking, which makes it obvious from physical sensation when you are and are not dictating. This eliminates the accidental dictation problem that affects toggle-based systems.
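The difference between the two trigger models can be sketched as a pair of small state machines in Python. The class and event names are hypothetical and do not describe Steno's actual implementation; the point is only to show why a stray key press behaves differently in each model.

```python
class HoldToSpeak:
    """Recording is active exactly while the key is held down."""

    def __init__(self):
        self.recording = False

    def key_down(self):
        self.recording = True

    def key_up(self):
        self.recording = False


class Toggle:
    """Each key press flips recording on or off."""

    def __init__(self):
        self.recording = False

    def key_down(self):
        self.recording = not self.recording

    def key_up(self):
        pass  # releasing the key changes nothing


def replay(trigger, events):
    """Apply a sequence of event names and return the final state."""
    for event in events:
        getattr(trigger, event)()
    return trigger.recording


accidental_tap = ["key_down", "key_up"]
print("hold-to-speak still recording:", replay(HoldToSpeak(), accidental_tap))
print("toggle still recording:", replay(Toggle(), accidental_tap))
```

After one accidental tap, the hold-to-speak machine is back to idle, while the toggle machine keeps recording until a second press.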

iPhone

On iPhone, keyboard space is limited and precise targeting of small buttons is difficult. Steno's custom keyboard makes the dictation trigger much easier to hit and hold reliably compared to the standard iOS dictation microphone. This matters especially when trying to dictate while holding the phone in one hand.

Common Mistakes and How to Avoid Them

Trying to dictate everything immediately. Building the habit gradually — starting with one specific context and expanding — leads to faster, more durable adoption than an all-or-nothing approach.

Using a poor microphone. After the choice of tool, microphone quality and placement have the biggest impact on accuracy. AirPods in your ears while at your desk produce dramatically better results than a built-in laptop microphone across the room.

Expecting zero corrections. Even at 95%+ accuracy, you will occasionally need to correct a word. This is normal and still represents a large net time saving compared to typing everything. Calibrate expectations to "fewer corrections than I expected" rather than "zero corrections."

Dictating in run-on streams. Complete sentences and natural pauses help the recognition system segment and format output correctly. Speak in complete thoughts, pause briefly between ideas, and avoid long unbroken monologues in your early sessions.
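The point about expecting zero corrections can be put in numbers with a back-of-the-envelope Python sketch. The five seconds per correction, the 500-word draft, and the function names are assumptions chosen for illustration. The 95% accuracy figure and the typing and speaking rates echo the ones used earlier in this guide.

```python
def dictation_minutes(words, speaking_wpm, accuracy, seconds_per_fix):
    """Estimate minutes to dictate `words` and fix the misrecognized ones."""
    errors = words * (1 - accuracy)          # expected wrong words
    speaking = words / speaking_wpm          # minutes spent talking
    fixing = errors * seconds_per_fix / 60   # minutes spent correcting
    return speaking + fixing


def typing_minutes(words, typing_wpm):
    """Estimate minutes to type `words` at a steady rate."""
    return words / typing_wpm


# Assumed scenario: 500-word draft, 5 seconds per correction (hypothetical).
spoken = dictation_minutes(500, speaking_wpm=140, accuracy=0.95, seconds_per_fix=5)
typed = typing_minutes(500, typing_wpm=50)
print(f"dictated with fixes: {spoken:.1f} min, typed: {typed:.1f} min")
```

Under these assumptions dictation stays well ahead even with about 25 corrections, though the margin shrinks quickly if each fix takes much longer or accuracy drops.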

Getting Started Today

The path from speech to text starts with installing a tool and making one dictation attempt in real conditions. Download Steno at stenofast.com, open your email client or notes app, hold the hotkey, and say your first sentence. Everything else follows from that first attempt.

Voice dictation is one of the few productivity tools where you can get through the learning curve and see the benefits on the same day. You do not need weeks of practice to get immediate value, just the willingness to speak the first sentence.