How to Get Text From Audio: Every Method That Actually Works


Getting text from audio is useful in a wide range of situations. Whether you have a podcast episode to reference, a meeting recording to search, an interview to quote, or simply want to dictate faster than you can type, converting audio into text makes the content searchable, quotable, and editable instead of leaving it locked inside an audio file.

The challenge is that "get text from audio" covers a huge range of scenarios, and the best approach differs depending on what kind of audio you have and what you need to do with the text. This guide covers every practical method available on Mac and iPhone.

Scenario 1: You Have an Audio File and Need a Transcript

This is the most common scenario: a recording from Zoom, a voice memo, an interview, a podcast episode, or a downloaded audio file. You need the spoken content as searchable, editable text.

Option A: Online Transcription Services

The fastest path for most people is an online service. You upload the audio file, specify the language, and receive a transcript within a few minutes. The best services handle punctuation automatically, can detect multiple speakers, and export results in multiple formats (TXT, DOCX, SRT). Most offer a free tier sufficient for occasional use.
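
To make the export formats concrete, here is a minimal Python sketch of how timestamped transcript segments map onto SRT, the subtitle format mentioned above. The Segment type and the to_srt helper are hypothetical names chosen for illustration; real services produce this file for you, and their internal data structures will differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    """One timed piece of a transcript (hypothetical structure)."""
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds in the HH:MM:SS,mmm form used by SRT files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        cues.append(f"{index}\n{timing}\n{seg.text.strip()}")
    return "\n\n".join(cues) + "\n"
```

A TXT export is the same data with the numbering and timing lines dropped, which is why TXT suits reading and quoting while SRT suits captions.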

Key things to check before uploading: the file size limit, whether your format (MP3, M4A, OGG) is supported, and the service's data retention policy if the content is sensitive.
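
If you upload files regularly, the first two checks can be automated. The sketch below is one possible approach in Python; the 25 MB limit and the list of supported formats are assumptions for illustration, so replace them with the values your chosen service documents.

```python
from pathlib import Path

# Assumed values for illustration only; check your service's documentation.
SUPPORTED_EXTENSIONS = {".mp3", ".m4a", ".ogg"}
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # hypothetical 25 MB upload cap


def upload_problems(filename, size_bytes,
                    max_bytes=MAX_UPLOAD_BYTES,
                    supported=SUPPORTED_EXTENSIONS):
    """Return a list of problems; an empty list means the file looks ready."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in supported:
        problems.append(f"format {suffix or '(none)'} is not in the supported list")
    if size_bytes > max_bytes:
        problems.append(f"file is {size_bytes} bytes, over the {max_bytes}-byte limit")
    return problems


def check_file(path):
    """Convenience wrapper that reads the name and size from disk."""
    path = Path(path)
    return upload_problems(path.name, path.stat().st_size)
```

The data retention policy cannot be checked in code; that one still requires reading the service's terms.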

Option B: Local Transcription Software

For users who process audio frequently or work with sensitive content, local transcription software runs entirely on your Mac without sending audio to any server. Recent Mac hardware — particularly Apple Silicon — handles neural transcription models efficiently, often processing audio faster than real time.

The trade-off is that on-device models are smaller than cloud-hosted alternatives. Accuracy is generally excellent for clear speech but may fall behind cloud services on challenging audio.

Option C: Drag and Drop to a Dedicated App

Several Mac apps accept audio file drops directly, transcribe on upload, and keep the text organized in a searchable library. This workflow is particularly good if you regularly need to reference transcripts — the app becomes a searchable archive of everything you have ever transcribed.
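
The idea of a searchable archive can be sketched in a few lines of Python. This is an illustrative toy, not how any particular app works: TranscriptLibrary is a hypothetical class that keeps transcripts in memory and returns a short snippet around the first match in each one.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptLibrary:
    """A toy in-memory archive mapping transcript titles to their text."""
    transcripts: dict = field(default_factory=dict)

    def add(self, title, text):
        self.transcripts[title] = text

    def search(self, query, context=40):
        """Return (title, snippet) for the first case-insensitive match
        in each transcript that contains the query."""
        hits = []
        needle = query.lower()
        for title, text in self.transcripts.items():
            position = text.lower().find(needle)
            if position == -1:
                continue
            # Keep some surrounding characters so the hit is readable.
            start = max(0, position - context)
            end = min(len(text), position + len(query) + context)
            hits.append((title, text[start:end].strip()))
        return hits
```

A real app would persist the transcripts and use a proper full-text index, but the workflow is the same: add every transcript once, then search across all of them.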

Scenario 2: You Want to Capture Your Own Speech in Real Time

This scenario is about dictation — composing text faster than you type by speaking it directly into a document, message, or note. The audio here is your own live voice, not a pre-recorded file.

Option A: Mac Built-in Dictation

Press fn twice (or use the shortcut you configure in System Settings → Keyboard → Dictation) to activate Apple's built-in dictation. The microphone activates, you speak, and text appears at the cursor. This works in most Mac applications that accept typed text. It is free, requires no installation, and has reasonable accuracy for standard English.

Limitations: no hold-to-speak hotkey, limited to Apple's recognition model, no transcription history, and relatively limited formatting intelligence.

Option B: Third-Party Dictation Apps

Apps like Steno provide a more refined real-time dictation experience. You hold a customizable hotkey anywhere on your Mac, speak, and release — the transcribed text appears at the cursor in whatever app you are working in. This hold-to-speak model is faster and more natural than the start/stop model of built-in dictation, and the underlying recognition engine typically produces higher-accuracy output with better formatting.

Steno also maintains a history of your recent dictations, which helps you recover text that landed in the wrong place or was lost the first time.

Option C: iPhone Keyboard Microphone

On iPhone, every text field in every app has access to dictation through the microphone button on the keyboard. Tap it, speak, tap it again to stop. This uses iOS's built-in speech recognition, which on recent iPhones runs entirely on-device for supported languages. The accuracy is good for typical messages and notes.

Scenario 3: You Want to Transcribe Audio Playing on Your Mac

Sometimes you need to transcribe audio that is playing through your Mac's speakers — a conference call, a streaming video, a podcast. The challenge is that most transcription tools only accept microphone input or file uploads, not system audio.

Option A: System Audio Capture

Virtual audio devices (tools like BlackHole or Loopback) route system audio as a virtual microphone input. You set the virtual device as your microphone source in a transcription app, and the app captures what is playing through your speakers. This is the most technically capable approach but requires installation and configuration of additional software.

Option B: Record Then Transcribe

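Record the system audio to a file using QuickTime Player (File → New Audio Recording), then upload the resulting file to a transcription service. QuickTime Player does not capture system audio on its own, so the recording input still has to be a virtual audio device such as BlackHole or Loopback. This adds a step, but it lets you use any file-based transcription service instead of one that accepts live input.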

Scenario 4: You Want to Extract Quotes From Recorded Meetings

Meeting recordings are among the most common audio-to-text requests. You need to find who said what, extract specific quotes, and share action items.

For this use case, a transcription service with speaker diarization is essential. Services that identify speakers label each segment in the transcript, making it straightforward to search for a specific person's contributions or extract quoted remarks accurately. Without diarization, a long meeting transcript is a dense wall of undifferentiated text.
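
To show why speaker labels matter, here is a minimal Python sketch that works on a diarized transcript. The Turn structure and the helper names are assumptions made for illustration; actual services export speaker labels in their own formats, which you would first parse into something like this.

```python
from collections import defaultdict
from typing import NamedTuple


class Turn(NamedTuple):
    """One speaker-labeled segment of a meeting (hypothetical structure)."""
    speaker: str
    start: float  # seconds from the start of the recording
    text: str


def by_speaker(turns):
    """Group turns by speaker label, preserving their original order."""
    grouped = defaultdict(list)
    for turn in turns:
        grouped[turn.speaker].append(turn)
    return dict(grouped)


def quotes_from(turns, speaker, keyword):
    """Return the turns where the given speaker mentions the keyword."""
    keyword = keyword.lower()
    return [
        turn for turn in turns
        if turn.speaker == speaker and keyword in turn.text.lower()
    ]
```

Without the speaker field, neither function is possible: you could still search for a keyword, but not attribute the quote.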

Matching Method to Content Type

The right method depends on your content type:

Voice memos and personal notes: Upload to any online service or use a local app. Quality expectations can be relaxed since the content is your own voice in controlled conditions.
Client or customer interviews: Use a service with diarization and consider the privacy implications of uploading the audio to a third-party server.
Meeting recordings: Diarization is essential. Most meeting tools (Zoom, Teams, Google Meet) now offer built-in transcription, though availability can depend on your plan, and it is worth trying before resorting to external services.
Your own live speech: Use a real-time dictation tool rather than record-then-transcribe. The efficiency gain is significant.
Podcast episodes or lectures: Batch transcription services handle long-form content well. Export as DOCX for easy editing and search.

Getting text from audio is almost always worth doing: a transcript can be searched, skimmed, and quoted in ways the original audio file cannot.

For more detail on specific audio formats and how to prepare your files for transcription, see our guide on how to transcribe audio files into text.