The phrase "type what you hear" covers a surprisingly wide range of use cases. Sometimes it means you want your Mac to transcribe someone else's speech — a speaker at a presentation, a person talking to you, or audio playing through your computer — turning it into on-screen captions or a text log. Other times it means you want to dictate your own thoughts and have them appear as typed text. And sometimes it means you are deaf or hard of hearing and need real-time captioning to follow speech-heavy situations.

Each of these scenarios needs a different tool, and conflating them leads to frustration. This guide separates them clearly.

Scenario 1: Typing Your Own Speech (Live Dictation)

If your goal is to replace keyboard typing with speaking — composing emails, documents, messages, and notes by speaking — live dictation is what you need. In this scenario, the "audio you hear" is your own voice, and the goal is to see your words appear on screen as fast as possible in whatever app you are working in.

This is the most common productivity use case. Modern live dictation apps handle it with impressive accuracy and low latency. The key requirements are universal app support (text appears wherever your cursor is, not in a separate panel), fast transcription that keeps up with normal speaking pace, and smart formatting that handles capitalization and punctuation automatically.

Steno is built for exactly this use case on Mac and iPhone. Hold the hotkey, speak, release — text appears in any app. It is the fastest way to get your thoughts out of your head and into a document.

Scenario 2: Captioning Someone Else's Speech

If you want to see text captions of someone else speaking — a teacher, a colleague, a presenter, or anyone you are listening to — macOS has a built-in feature for this. Live Captions (available from macOS Ventura onward) can transcribe audio from your Mac's microphone or from system audio in real time, displaying the text in a floating overlay window.

To enable Live Captions on Mac, go to System Settings > Accessibility > Live Captions and toggle it on. A caption window then appears and follows whatever audio source is active — your microphone, a video call, a YouTube video, a podcast. Captions appear in the floating window and update in real time as the audio plays.

Live Captions on Apple Silicon Macs process audio entirely on-device, which means the captions work offline and no audio is sent to external servers. This is particularly valuable for private conversations and confidential meetings where you want transcription without external data exposure.

The caption window can be resized, repositioned, and styled. You can increase the text size for easier reading, change the background opacity, and pin the window to stay on top of other applications so it remains visible during video calls, presentations, or any other activity.

Scenario 3: Accessibility Captioning for Deaf and Hard of Hearing Users

For users who are deaf or hard of hearing, real-time captioning is not a productivity tool — it is access. The distinction matters because accessibility captioning demands higher reliability and accuracy, and often more specialized vocabulary, than productivity transcription.

macOS Live Captions serves this use case reasonably well for everyday situations. For professional contexts — legal proceedings, medical appointments, educational settings — Communication Access Realtime Translation (CART) services with human stenographers remain the gold standard for accuracy, though technology-based systems are narrowing the gap.

iPhone also includes Live Captions for phone calls, FaceTime, and microphone input. Go to Settings > Accessibility > Live Captions to enable captioning for phone and FaceTime calls. The captions appear during the call and are not shared with the other party — they are processed locally on your device.

Scenario 4: Transcribing System Audio to Text

A less common but useful scenario is wanting to transcribe audio playing through your computer speakers or headphones — a podcast episode, a recorded lecture, or a video — into a text document you can search and reference later.

macOS Live Captions can do this in real time, but it displays captions rather than saving a transcript. To create a saved transcript from system audio, you need a tool that captures system audio and runs it through a transcription pipeline. This requires routing your audio through a virtual audio device and into a transcription tool, which is more technically involved than most users need.
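For technically inclined readers, one common version of that pipeline pairs a loopback audio driver with ffmpeg and a local transcription model. The sketch below assumes the open-source BlackHole driver and the openai-whisper command-line tool are installed; the device name, duration, and model size are illustrative choices, not a tested recipe:

```shell
# First, set the virtual device as your output in
# System Settings > Sound > Output ("BlackHole 2ch").

# Capture five minutes of system audio from the virtual device
# using ffmpeg's avfoundation input (macOS only).
ffmpeg -f avfoundation -i ":BlackHole 2ch" -t 300 capture.wav

# Transcribe the capture locally with the Whisper CLI
# (pip install openai-whisper); writes capture.txt next to the audio.
whisper capture.wav --model base --output_format txt
```

The result is a searchable text file rather than ephemeral captions, which is the piece Live Captions does not provide — but as the commands suggest, this is a setup most people will not want to maintain for occasional use.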

For most practical purposes, if you want a text version of a podcast or video, it is easier to look for an existing transcript (most major podcasts now publish them) or to use a file-based transcription service after downloading the audio.

Choosing the Right Approach

Match the tool to the direction of the audio:

- Your voice onto the screen: a live dictation app.
- Someone else's voice onto your screen: macOS Live Captions.
- Accessibility captioning: Live Captions for everyday situations, CART services for high-stakes settings.
- Recorded or system audio into a document: an existing transcript or a file-based transcription service.

For the dictation use case, download Steno at stenofast.com to start typing with your voice in any Mac application. For accessibility captioning, the macOS Live Captions feature in System Settings requires no additional software and works immediately on Apple Silicon hardware.

Audio is everywhere. The tools to turn it into searchable, editable text now cover every direction — yours to the screen, theirs to your screen, and any recording to your clipboard.