Voice to Text API Free: Options for Developers and Power Users

The question of which voice to text API is truly free comes up constantly among developers building speech features, researchers automating transcription workflows, and power users trying to wire together custom dictation pipelines. The honest answer is nuanced: several options exist, but "free" almost always means "free up to a limit" or "free if you self-host."

This guide covers the main free and low-cost voice to text API options, what their actual limits are, and which approach suits different types of users and projects.

Apple's SFSpeechRecognizer: Free for Mac and iOS Apps

If you are building a native Mac or iOS application, Apple's Speech framework provides a free, built-in speech recognition API that requires no external service. The SFSpeechRecognizer class handles both audio file transcription and real-time microphone transcription, with no API key, no usage limits for on-device processing, and no cost.

The capabilities include:

Real-time speech recognition from microphone input
Batch transcription from audio files
On-device processing (no network required) on Apple Silicon Macs and recent iPhones
Support for around 60 languages
Basic punctuation and capitalization in output

The main limitations concern accuracy and throttling. The on-device model is smaller than cloud-based alternatives and produces less accurate results on noisy audio, unusual accents, or domain-specific vocabulary. Server-based processing, which the framework can use unless a request sets requiresOnDeviceRecognition, is subject to per-device and per-app request limits, though Apple has not published precise numbers.

For Mac developers building productivity tools, dictation utilities, or accessibility features, SFSpeechRecognizer is frequently the right choice — zero cost, native integration, and no dependency on external services.

Web Speech API: Free in Browsers

The Web Speech API, supported in Chrome and some other browsers, provides free real-time speech recognition for web applications. Developers can use it to add voice input to web apps without any API key or cost. It works well for simple use cases: voice search, dictation fields, voice commands.

The limitations are significant for serious applications. The Web Speech API only works in browsers: you cannot use it in a native app, a command-line tool, or a server-side process. In Chrome it generally requires a network connection, because audio is sent to Google's servers for recognition. It is also built around real-time microphone input from a browser context, not transcription of audio files.

Open-Source Self-Hosted Options

For developers who want free transcription without usage limits and with full control over their data, self-hosting an open-source speech recognition model is the most powerful option.

Running Models Locally

Several high-quality open-source speech recognition models can run locally on Mac hardware. On Apple Silicon (M1/M2/M3/M4), these models can process audio many times faster than real time: at better than 10x real-time speed, a 10-minute recording transcribes in under a minute. Actual speed depends on the model size and the specific chip. The accuracy of the best open-source models approaches that of leading commercial APIs for clean audio.
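To make the speed claim concrete, here is a minimal sketch of the arithmetic. The function names and the speed factor of 15 are illustrative assumptions, not measurements of any particular model or chip.

```python
def transcription_seconds(audio_seconds: float, speed_factor: float) -> float:
    """Estimate wall-clock transcription time.

    speed_factor is how many seconds of audio the model processes per
    second of wall-clock time (an assumed, hardware-dependent number).
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor


def min_speed_factor(audio_seconds: float, target_seconds: float) -> float:
    """Speed factor needed to finish audio_seconds within target_seconds."""
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    return audio_seconds / target_seconds


ten_minutes = 10 * 60

# Speed needed to transcribe a 10-minute recording in one minute.
print(min_speed_factor(ten_minutes, 60))       # 10.0

# Time taken at an assumed 15x real-time speed.
print(transcription_seconds(ten_minutes, 15))  # 40.0
```

Measuring the speed factor on your own hardware with a representative recording is the only reliable way to fill in the second argument.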

The upfront investment is the setup work: downloading model weights, installing dependencies, and building a pipeline that accepts your audio format and returns text. For developers comfortable with command-line tools and Python, this is straightforward. For non-developers, it is a significant barrier.

Self-Hosted Server APIs

Some open-source projects wrap local speech recognition models in an HTTP API that mimics the interface of commercial services. You run the server locally or on a machine you control, and your applications call it exactly like a commercial API — but with no per-minute cost and no data leaving your infrastructure. For organizations with strict data sovereignty requirements, this pattern is increasingly common.
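As a rough sketch of what calling such a server can look like, the Python below builds a multipart upload using only the standard library. The URL, the path, the "model" and "file" field names, and the {"text": ...} response shape are all assumptions for illustration; check the documentation of whichever server you actually run.

```python
import json
import urllib.request
import uuid

# Hypothetical endpoint: adjust to whatever your self-hosted server exposes.
LOCAL_URL = "http://localhost:8000/v1/audio/transcriptions"


def build_request(audio: bytes, filename: str, url: str = LOCAL_URL,
                  model: str = "local-model") -> urllib.request.Request:
    """Build a multipart/form-data POST carrying one audio file."""
    boundary = uuid.uuid4().hex
    # Text field "model", then the header of the "file" part (assumed names).
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url,
        data=head + audio + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def transcribe(audio: bytes, filename: str, url: str = LOCAL_URL) -> str:
    """Send the audio to the local server and return the transcript text."""
    request = build_request(audio, filename, url)
    with urllib.request.urlopen(request, timeout=300) as response:
        # Assumes the server replies with JSON like {"text": "..."}.
        return json.load(response)["text"]
```

With a server running, usage would be along the lines of transcribe(open("memo.wav", "rb").read(), "memo.wav"). Because only the URL changes, the same client code can point at a local machine during development and at an internal server later.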

Commercial Free Tiers Worth Knowing

Several commercial speech recognition services offer free tiers adequate for development and light use:

AssemblyAI: Offers a free tier with limited monthly transcription minutes, good accuracy, and a well-documented REST API. Suitable for prototyping and low-volume applications.
Deepgram: Free credit on signup, real-time and batch transcription, developer-friendly API. One of the best options for developers who need streaming transcription.
Rev.ai: Limited free tier focused on batch transcription, useful for evaluating accuracy on your specific audio type before committing to a paid plan.

All of these commercial services will eventually charge based on usage volume. The economics of running large speech recognition models at scale mean that truly unlimited free tiers are not sustainable. Treat free tiers as tools for evaluation and development, not as a basis for production use without a payment plan.

The Cost of Building vs. Buying

For developers evaluating whether to use a free API or build their own transcription capability, the key questions are volume and control. If you need to transcribe a few hours per month, a commercial free tier is fine. If you need to transcribe hundreds of hours and want to control your costs, self-hosting scales better. If you need specialized accuracy for domain-specific vocabulary, custom-trained models can outperform general-purpose APIs on your specific use case.
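The volume trade-off can be sketched as a simple break-even calculation. Every number below (free allowance, hourly price, monthly self-hosting cost) is a made-up placeholder, not any provider's real pricing; substitute your own figures.

```python
def monthly_api_cost(hours: float, free_hours: float, price_per_hour: float) -> float:
    """Cost of a usage-priced API once the free allowance is used up."""
    billable_hours = max(0.0, hours - free_hours)
    return billable_hours * price_per_hour


def break_even_hours(free_hours: float, price_per_hour: float,
                     self_host_monthly: float) -> float:
    """Monthly volume at which the API costs the same as self-hosting."""
    if price_per_hour <= 0:
        raise ValueError("price_per_hour must be positive")
    return free_hours + self_host_monthly / price_per_hour


# Illustrative numbers only.
FREE_HOURS = 5.0           # hypothetical free allowance per month
PRICE_PER_HOUR = 0.50      # hypothetical paid rate per audio hour
SELF_HOST_MONTHLY = 40.0   # hypothetical hardware, power, and upkeep cost

for hours in (3, 50, 300):
    api = monthly_api_cost(hours, FREE_HOURS, PRICE_PER_HOUR)
    cheaper = "API" if api <= SELF_HOST_MONTHLY else "self-hosting"
    print(f"{hours:>4} h/month: API ${api:.2f} vs self-host "
          f"${SELF_HOST_MONTHLY:.2f} -> {cheaper}")

print(break_even_hours(FREE_HOURS, PRICE_PER_HOUR, SELF_HOST_MONTHLY))  # 85.0
```

The model deliberately ignores setup time and maintenance effort, which tend to push the real break-even point higher for small teams.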

For individual users who just want to dictate more efficiently — without building anything — the right answer is not an API at all. Steno abstracts all of this complexity behind a simple hotkey: hold to speak, release to transcribe, text appears at the cursor. No API keys, no quotas, no setup beyond installation.

Privacy and Data Considerations

Free APIs come with implicit costs in data. When you use a commercial free tier, your audio typically flows through that company's infrastructure. The terms of service govern how that audio is stored, whether it is used for model improvement, and how long it is retained. Before integrating any speech API into an application that processes sensitive content, review the provider's data processing agreement carefully.

On-device APIs and self-hosted open-source solutions avoid this entirely — your audio never leaves hardware you control.

The best free voice to text API is the one that matches your constraints: Apple's built-in API for native apps, the Web Speech API for browser prototypes, and self-hosted models for volume workloads or privacy-sensitive applications.
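As a compact restatement of that decision, the sketch below maps a few constraints to the options covered in this guide. The Constraints fields, the ordering of the checks, and the 5-hour free-tier threshold are assumptions chosen for illustration, not fixed rules.

```python
from dataclasses import dataclass


@dataclass
class Constraints:
    platform: str            # "apple-native", "browser", or "other"
    sensitive_audio: bool    # audio must stay on hardware you control
    hours_per_month: float   # expected transcription volume


def recommend(c: Constraints, free_tier_hours: float = 5.0) -> str:
    """Map constraints to one of the options discussed in this guide."""
    if c.platform == "apple-native":
        return "Apple Speech framework"
    if c.sensitive_audio:
        # Browser and commercial options send audio to third-party servers.
        return "self-hosted open-source model"
    if c.platform == "browser":
        return "Web Speech API"
    if c.hours_per_month > free_tier_hours:
        return "self-hosted open-source model"
    return "commercial free tier"


print(recommend(Constraints("browser", False, 1)))
print(recommend(Constraints("other", True, 1)))
print(recommend(Constraints("other", False, 200)))
```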

For a broader overview of the speech recognition landscape on Mac, see our guide on speech recognition APIs for Mac developers.