
A therapist's most important tool is attention. When a client is describing a panic attack at work, recounting a childhood memory, or sitting in silence gathering the courage to say something difficult, the therapist's presence is the intervention. Looking down at a keyboard to type notes fractures that presence. It signals, even subtly, that documentation matters more than the person in the room.

This is the fundamental tension of clinical documentation. Insurance requires it. Licensing boards require it. Good clinical practice requires it. But the act of producing notes during a session undermines the therapeutic relationship that makes the session effective in the first place. Voice to text offers a way out of this bind.

The Documentation Burden in Therapy

The average therapist sees between 20 and 30 clients per week. Each session requires documentation: presenting issues, clinical observations, interventions used, client responses, treatment plan updates, and risk assessments. A typical progress note takes 10 to 15 minutes to write after a session. That adds up to roughly three to seven and a half hours per week of unpaid documentation work.
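
The weekly total is simply caseload multiplied by minutes per note. As a rough check, here is a minimal Python sketch. The function name `weekly_documentation_hours` is illustrative, it assumes one note per client session, and the inputs are the caseload and per-note ranges quoted above.

```python
def weekly_documentation_hours(clients_per_week: int, minutes_per_note: float) -> float:
    """Hours per week spent writing one progress note per client session."""
    return clients_per_week * minutes_per_note / 60


# Low end: 20 clients at 10 minutes each. High end: 30 clients at 15 minutes each.
low = weekly_documentation_hours(20, 10)
high = weekly_documentation_hours(30, 15)
print(f"{low:.1f} to {high:.1f} hours per week")  # prints "3.3 to 7.5 hours per week"
```

Substituting your own caseload and typical note length gives a personal estimate.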

Many therapists respond by taking notes during sessions. They keep a laptop open, glance down to type key phrases, and flesh out the full note later. But even brief glances create micro-ruptures in the therapeutic alliance. Research on nonverbal communication consistently shows that eye contact, head nodding, and facial mirroring are core components of empathic attunement. Every time a therapist looks at a screen, those signals pause.

Some therapists avoid in-session notes entirely and rely on memory, writing everything after the last client leaves. This preserves the therapeutic relationship but introduces a different problem: note quality degrades rapidly with time. By the fifth client of the day, the details of the first session have blurred. Was it the second client or the third who mentioned suicidal ideation in passing? Memory is unreliable, and unreliable notes create clinical and legal risk.

How Voice to Text Changes the Workflow

Voice to text solves this problem by allowing therapists to dictate notes without looking at a screen or touching a keyboard. With an app like Steno, the workflow looks like this: once the session ends and the client leaves, the therapist holds a hotkey, speaks their clinical observations aloud, and releases the key. The transcribed text appears in whatever app they use for documentation, whether that is an EHR like SimplePractice or TherapyNotes, a word processor, or even a plain text file.

The entire note can be dictated in two to three minutes. Speaking is roughly three times faster than typing for most people, and clinical observations are especially well-suited to dictation because therapists are already trained to think in narrative terms. Describing what happened in a session comes naturally when spoken aloud. It often feels more like a verbal case consultation than a documentation chore.

Between-Session Documentation

The most practical approach for most therapists is dictating notes in the five-minute gap between sessions. The client has just left, observations are fresh, and the next client has not yet arrived. Hold the hotkey, speak for two minutes, release. The note is done before the next knock on the door.

This is where Steno's hold-to-speak model is particularly useful. There is no dictation mode to toggle on and off, no button to find and click. You hold a key, talk, and let go. If you need to pause and think mid-note, you simply release the key. When you are ready to continue, hold it again. The text from each segment appears sequentially at your cursor.

SOAP Notes by Voice

Many therapists structure their progress notes using the SOAP format: Subjective, Objective, Assessment, and Plan. Voice dictation maps naturally to this structure. You can dictate each section in sequence, and Steno's Smart Rewrite feature will clean up the language, capitalize properly, and format the note for clinical readability.

A dictated SOAP note might sound like this: "Subjective. Client reports increased anxiety over the past week, particularly related to a conflict with her supervisor. She describes difficulty sleeping, racing thoughts at night, and avoidance of morning meetings. Objective. Client appeared alert and oriented. Affect was anxious with constricted range. Speech was rapid but coherent. No psychomotor agitation observed. Assessment. Symptoms consistent with generalized anxiety disorder, exacerbated by workplace stressors. Plan. Continue weekly CBT sessions. Introduce progressive muscle relaxation as a coping strategy. Client will track anxiety triggers in a journal before next session."

That takes roughly 45 seconds to speak. Composing and typing the same note would take five to eight minutes.
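
Because each section is announced by name, a dictated transcript like the one above can be split into labeled fields mechanically. The following is a minimal Python sketch of that idea, not a description of how Steno or Smart Rewrite works. The function name `split_soap` is hypothetical, and it assumes each heading is spoken as a capitalized word followed by a period.

```python
import re

# Matches a spoken section heading such as "Subjective." and any space after it.
HEADING = re.compile(r"\b(Subjective|Objective|Assessment|Plan)\.\s*")


def split_soap(transcript: str) -> dict[str, str]:
    """Split a dictated SOAP transcript into a {section name: text} mapping."""
    # With a capture group, re.split returns:
    # [text before first heading, name1, body1, name2, body2, ...]
    parts = HEADING.split(transcript)
    return {name: body.strip() for name, body in zip(parts[1::2], parts[2::2])}


transcript = (
    "Subjective. Client reports increased anxiety over the past week. "
    "Objective. Affect was anxious with constricted range. "
    "Assessment. Symptoms consistent with generalized anxiety disorder. "
    "Plan. Continue weekly CBT sessions."
)
note = split_soap(transcript)
for section, text in note.items():
    print(f"{section}: {text}")
```

The match is case-sensitive, so a phrase like "treatment plan." inside a section is left alone. A sentence that happens to begin with one of the capitalized heading words followed by a period would still be misread as a new section, so a real workflow would need a review step.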

Privacy and Confidentiality Considerations

Therapists rightly have heightened concerns about privacy. Client information is protected by HIPAA, state licensing laws, and professional ethics codes. Any tool that handles clinical content must meet strict confidentiality standards.

Steno addresses this in several ways. Audio is processed and immediately discarded. No recordings are stored on Steno's servers or anywhere else. The transcription happens in a single pass, and the audio data is deleted the moment the text is returned. There is no audio log to subpoena and no recording archive to breach.

The transcribed text goes directly into whatever application the therapist is using. If that application is an EHR system that is itself HIPAA-compliant, then the text lives within that compliant system from the moment it appears. Steno never stores the output text. It functions as a conduit, not a repository.

For therapists in private practice who want an additional layer of security, Steno also offers an offline mode that processes speech entirely on-device using a local Whisper model. No audio leaves the Mac at all. This is the most conservative option for clinicians who want to eliminate any possibility of data transmission.

Maintaining Therapeutic Presence

The deeper benefit of voice to text for therapists is not just time savings. It is the preservation of therapeutic presence during sessions. When you know that documentation will take two minutes instead of fifteen, and that it requires only your voice rather than your eyes and hands, you can be fully present with your client during the session itself.

No more mental note-taking during emotional moments. No more dividing attention between listening and typing. No more choosing between clinical accuracy and relational attunement. You listen fully during the session, then capture everything in a quick dictation afterwards.

Some therapists report that dictating notes actually improves their clinical thinking. Speaking observations aloud engages a different cognitive process than typing. It feels more like a verbal case formulation, which can surface patterns and insights that might not emerge when staring at a blank progress note template.

Getting Started

If you are a therapist considering voice to text for your documentation workflow, the setup is simple. Download Steno from stenofast.com, grant microphone and accessibility permissions, and choose a hotkey. The free tier gives you enough to try it with a few clients. Pro, at $4.99 per month, unlocks unlimited dictation and Smart Rewrite, which polishes your spoken notes into clean clinical prose.

Start by dictating one progress note after your next session. Hold the key, speak your observations, release. If two minutes of talking replaces fifteen minutes of typing, the math speaks for itself.

The best clinical documentation is the kind that does not interfere with clinical work. Voice to text lets therapists be therapists first and documentarians second.