Private Voice to Text on Mac: How Steno Protects Your Audio Data


When you use a voice-to-text tool, you are handing over one of the most personal forms of data possible: your voice. Your voice contains not just the words you speak, but biometric information — vocal patterns that are as unique as your fingerprint. It reveals your accent, your emotional state, your speech patterns. Trusting a dictation app with this data requires knowing exactly what happens to it.

Steno was designed with a privacy-first architecture. Your audio is sent directly to the transcription provider, is not stored after processing, and is never used to train models. This article explains exactly how Steno handles your voice data at every step of the pipeline.

The Audio Data Lifecycle

Understanding Steno's privacy model starts with understanding the complete lifecycle of your audio data — from the moment you press the hotkey to the moment text appears at your cursor.

Step 1: Local Capture

When you hold the hotkey, Steno captures audio from your microphone using AVFoundation. The audio data exists only in memory — it is never written to disk on your Mac. There are no temporary audio files created in your home directory, no cache files, no log files containing audio data.

The audio buffer is held in RAM for the duration of your dictation. When you release the hotkey, this buffer is used to create the transcription request. After the request is sent, the buffer is immediately deallocated. Swift's Automatic Reference Counting ensures this deallocation is deterministic — the memory is freed the instant the last reference to it is released, not at some undefined future point when a garbage collector runs.
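The deallocation behaviour described above is specific to Swift's ARC, but CPython also reclaims objects by reference counting, so the idea can be sketched in Python as an analogy. This is not Steno's code: `AudioBuffer`, `send`, and `dictate` are hypothetical names, and the immediate-free behaviour shown holds for CPython specifically.

```python
import weakref


class AudioBuffer:
    """Hypothetical in-memory holder for captured samples (never written to disk)."""

    def __init__(self, samples: bytes) -> None:
        self.samples = samples


def send(buffer: AudioBuffer) -> int:
    # Stand-in for the network upload; reports how many bytes would be sent.
    return len(buffer.samples)


def dictate(samples: bytes) -> tuple[int, bool]:
    buffer = AudioBuffer(samples)
    probe = weakref.ref(buffer)   # observes the buffer without keeping it alive
    sent = send(buffer)
    del buffer                    # drop the last strong reference
    # On CPython the AudioBuffer object is reclaimed as soon as its count hits zero.
    return sent, probe() is None
```

Calling `dictate(b"\x00\x01" * 8)` returns the number of bytes sent and whether the buffer object was already gone when the function checked, with no wait for a collector pass.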

Step 2: Transmission

The audio data is transmitted over HTTPS to the transcription API. The connection uses TLS 1.3, which provides state-of-the-art encryption for data in transit. No intermediate server sees the audio — it goes directly from your Mac to the transcription endpoint.
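If you want to check the TLS claim independently, a client can refuse anything older than TLS 1.3 and report what was negotiated. The sketch below uses Python's standard `ssl` module; the function names are illustrative, and the hostname to test is whatever endpoint you observe Steno connecting to, since this article does not name it.

```python
import socket
import ssl
from typing import Optional


def tls13_only_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses anything below TLS 1.3."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context


def negotiated_tls_version(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Connect to host and return the negotiated protocol name."""
    context = tls13_only_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

If the server cannot speak TLS 1.3, `wrap_socket` raises an `ssl.SSLError` instead of silently falling back to an older protocol.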

Steno does not route audio through its own servers. The backend at stenofast.com handles device registration and API key management, but audio data never touches Steno's infrastructure. This is a deliberate architectural decision that minimizes the number of systems that have access to your voice data.

Step 3: Transcription

The transcription provider (Groq) processes the audio and returns text. According to Groq's data handling policies, audio data submitted through their API is not stored after processing and is not used to train or improve their models. The audio exists in their processing pipeline only for the duration of the transcription — typically under two seconds — and is then discarded.

Step 4: Text Delivery

The transcribed text is returned to Steno over the same encrypted connection. Steno injects the text at your cursor position and optionally stores a record of the transcription (text only, never audio) in your local history at ~/.steno/stats.json. This history file exists only on your Mac and is never uploaded anywhere.

What Steno Does Not Do

It is equally important to understand what Steno explicitly does not do with your data.

No Audio Storage

Steno never saves audio files to your disk: not temporarily, not in a cache, not in a debug log. Once the transcription is complete, the audio no longer exists on your Mac, it never reached Steno's servers, and, under Groq's stated data handling policy, it is not retained in the transcription provider's storage.
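One way to spot-check the no-audio-on-disk claim is to scan Steno's data directory for files with audio extensions. The sketch below is a rough check only: the extension list is an assumption, and a file could hold audio under a different name, so an empty result is supporting evidence rather than proof.

```python
from pathlib import Path

# Common audio extensions; this list is an assumption, not an exhaustive set.
AUDIO_SUFFIXES = {".wav", ".aiff", ".aif", ".caf", ".m4a", ".mp3", ".flac", ".ogg"}


def find_audio_files(root: Path) -> list[Path]:
    """Return files under root whose extension looks like an audio format."""
    if not root.is_dir():
        return []
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES
    )
```

Running `find_audio_files(Path.home() / ".steno")` after a few dictation sessions should return an empty list if the behaviour matches the description above.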

No Background Listening

Steno captures audio only while you are holding the hotkey. The microphone is not accessed at any other time. You can verify this yourself: on macOS Monterey and later, an orange indicator dot appears in the menu bar whenever any application is accessing the microphone. With no other app using the microphone, that dot appears only while you hold the Steno hotkey and disappears when you release it.

There is no ambient listening, no wake word detection, no "always on" microphone access. Steno is architecturally incapable of listening when you are not actively holding the hotkey, because the audio capture session is created on keydown and destroyed on keyup.
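The keydown/keyup lifecycle can be modelled as a small state machine in which a capture session exists only between the two events. The sketch below is a simplified model, not Steno's implementation: `CaptureSession` and `HoldToRecord` are hypothetical names, and real microphone capture is replaced by a `feed` method.

```python
from typing import Callable, Optional


class CaptureSession:
    """Stand-in for a microphone capture session (hypothetical)."""

    def __init__(self) -> None:
        self.chunks: list[bytes] = []

    def stop(self) -> bytes:
        return b"".join(self.chunks)


class HoldToRecord:
    """Owns a capture session only between key-down and key-up."""

    def __init__(self, on_audio: Callable[[bytes], None]) -> None:
        self._on_audio = on_audio
        self._session: Optional[CaptureSession] = None

    @property
    def is_listening(self) -> bool:
        return self._session is not None

    def key_down(self) -> None:
        if self._session is None:      # ignore key-repeat events
            self._session = CaptureSession()

    def feed(self, chunk: bytes) -> None:
        # Audio arriving outside a hold has no session to land in and is dropped.
        if self._session is not None:
            self._session.chunks.append(chunk)

    def key_up(self) -> None:
        session, self._session = self._session, None
        if session is not None:
            self._on_audio(session.stop())
```

Because the session reference is cleared on key-up, there is no object left that could receive audio between holds, which is the property the paragraph above describes.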

No Voice Biometric Collection

Steno does not build voice profiles from your dictation sessions. When you use the optional voice enrollment feature (which helps Steno distinguish your voice in noisy environments), the voice profile data is stored locally in ~/.steno/voice_profile.json and never leaves your Mac. The enrollment process uses on-device DSP through Apple's Accelerate framework — no audio is sent to any server during enrollment.
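To illustrate why stored spectral features differ from stored audio, the sketch below reduces one frame of samples to a handful of band energies. This is an illustration of the general idea only: the article does not describe which features Steno computes, and `band_energies` is a hypothetical function using a naive DFT in place of Accelerate.

```python
import cmath
import math


def band_energies(frame: list[float], bands: int = 4) -> list[float]:
    """Summarise one audio frame as energy per frequency band (a lossy summary)."""
    n = len(frame)
    half = n // 2
    # Naive DFT power for the frequency bins below Nyquist.
    power = []
    for k in range(half):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        power.append(abs(coeff) ** 2)
    # Group bins into equal-width bands; leftover bins at the top are ignored.
    width = half // bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(bands)]
```

A 32-sample frame becomes four numbers, so the original waveform cannot be reconstructed from the summary.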

No Analytics on Speech Content

Steno does not analyze, categorize, or index the content of your transcriptions. It does not know whether you are dictating a legal brief, a love letter, or a shopping list. The text passes through the application on its way to your cursor and is optionally logged in your local history. No semantic analysis, no content classification, no keyword extraction.

Comparing Privacy Models

To put Steno's privacy architecture in context, it helps to understand the alternative approaches taken by other voice-to-text tools.

Cloud Dictation with Data Retention

Some major tech companies' dictation services store your audio recordings on their servers for "quality improvement." This means your voice data sits in cloud storage indefinitely, potentially accessible to human reviewers who listen to recordings to improve transcription accuracy. Several companies have faced public scrutiny for this practice, with reports of contractors listening to sensitive personal recordings.

On-Device with Model Training

Some tools perform transcription on-device but collect anonymized usage data, which may include speech patterns, vocabulary frequency, and error rates, to train or tune their models. While this data is less sensitive than raw audio, it still reveals information about how and what you dictate.

Steno's Approach: Transient Processing

Steno's model is what we call transient processing. Audio exists only for the duration of processing — typically under three seconds — and is then gone. No storage, no retention, no secondary use. This is the minimum data handling necessary to provide the service: audio must be processed to become text, but it does not need to be stored, analyzed, or retained after that processing is complete.

Local Data: What Stays on Your Mac

Steno stores some data locally on your Mac in the ~/.steno/ directory. Here is a complete inventory of what is stored and why.

config.json — Your preferences (hotkey choice, audio settings, display options). Contains no personal data beyond your configuration choices.
stats.json — Usage statistics and transcription history (last 100 entries). Contains the text of your transcriptions but never audio data. You can delete this file at any time to clear your history.
voice_profile.json — If you use voice enrollment, this contains your voice profile data (spectral features, not raw audio). Used locally for voice isolation, never uploaded.
.onboarded — A zero-byte flag file indicating you have completed onboarding. Contains no data.

Your API key is stored in the macOS Keychain, protected by the operating system's encryption and access control mechanisms. It is not stored in a plain text file or in UserDefaults.
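The file inventory above can be checked against what is actually on disk, and the history can be cleared from a script. The sketch below assumes only the file names listed in this article; the function names are illustrative.

```python
from pathlib import Path

# File names taken from the inventory in this article.
DOCUMENTED_FILES = {"config.json", "stats.json", "voice_profile.json", ".onboarded"}


def undocumented_entries(steno_dir: Path) -> list[str]:
    """Names in steno_dir that are not part of the documented inventory."""
    if not steno_dir.is_dir():
        return []
    return sorted(p.name for p in steno_dir.iterdir() if p.name not in DOCUMENTED_FILES)


def clear_history(steno_dir: Path) -> bool:
    """Delete stats.json; return True if a file was removed."""
    stats = steno_dir / "stats.json"
    if stats.is_file():
        stats.unlink()
        return True
    return False
```

Calling `undocumented_entries(Path.home() / ".steno")` lists anything beyond the four documented files. Note that `clear_history` removes usage statistics along with the transcription text, because both live in the same file.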

macOS Privacy Protections

Steno works within macOS's privacy framework, which provides additional layers of protection.

Microphone access requires explicit user permission. The first time Steno tries to access the microphone, macOS prompts you with a system dialog asking for permission. You can revoke this permission at any time in System Settings under Privacy and Security. Without microphone permission, Steno cannot capture audio, because the operating system, not the app, enforces the restriction.

The orange menu bar indicator provides a real-time signal of microphone access. This indicator is controlled by macOS, not by Steno, so Steno cannot suppress or hide it. If you see the orange dot when you are not holding the Steno hotkey, another application is using the microphone, because Steno only accesses it during active recording. Opening Control Center shows which app that is.

Open Questions and Honest Limitations

No privacy architecture is perfect, and we believe in being transparent about the limitations of ours.

The primary limitation is that Steno sends audio to an external API for transcription. This means your audio does travel over the internet, encrypted, to a third-party server. While we have selected a transcription provider with strong data handling practices, you are ultimately trusting both Steno and the provider to handle your data as described.

For users who need the absolute maximum privacy guarantee, the only fully private option is on-device transcription with no network access. We support this direction and are evaluating local transcription capabilities that would keep all data on your Mac. The tradeoff is accuracy: current on-device models are less accurate than cloud-based ones, but the gap is closing rapidly.

For most users, Steno's transient processing model provides a strong privacy guarantee: your audio is encrypted in transit, typically processed in under three seconds, and then discarded. No storage, no training, no retention.

If privacy matters to you — and it should — try Steno and see how voice-to-text can work without compromising your data.