I Built a Voice Dictation App Because the Existing Ones Frustrated Me
I think faster than I type. Always have.
My brain runs at 150 words per minute. My fingers max out at 60 on a good day. That gap - 90 words per minute of lost thought - has frustrated me for years.
So I tried every voice dictation app I could find. Apple’s built-in dictation. Superwhisper. Wispr Flow. A half-dozen others. And every single one fell short in a different way.
The Problems I Kept Running Into
Apple’s built-in dictation stops after 60 seconds. Just… stops. Mid-sentence. If you’re on an Intel Mac, it sends your audio to Apple’s servers. The accuracy is decent but the workflow is clunky - you have to position your cursor, activate dictation, speak, wait, then manually edit.
Cloud-based apps like Wispr Flow are impressive technically. The accuracy is great, the style-matching is clever. But they upload your voice to a server. Every word you speak. Every thought you dictate. Every half-formed idea. I’m not paranoid, but I am pragmatic - I don’t want my voice data on someone else’s server, period.
Local apps like Superwhisper are closer to what I want. On-device Whisper, good accuracy, no cloud. But Superwhisper gives you so many options, modes, and configurations that using it feels like operating a recording studio. I don’t want to choose between “voice mode” and “super mode” and “custom mode.” I want to hold a key, talk, and see my words appear.
That’s the gap I kept coming back to: no app was simultaneously private, simple, and invisible.
The Fn Key Revelation
Most dictation apps use keyboard shortcuts - ⌥Space, ⌘⇧D, double-tap Ctrl. The problem is these shortcuts inevitably conflict with something. Your text editor. Your window manager. Some app you installed three years ago.
Then I realized: the Fn key sits in the corner of every Mac keyboard, and almost nobody uses it for anything meaningful.
Hold Fn → start recording. Release Fn → stop recording, transcribe, paste.
No modifier combos. No conflicts. One key. It felt right immediately.
What I Actually Built
LexaWrite is built around three convictions:
1. Your voice never leaves your Mac.
OpenAI’s Whisper model runs entirely on-device via whisper.cpp with Metal GPU acceleration. The audio is captured, transcribed, and discarded locally. Zero network requests during transcription. Not even a DNS lookup.
I chose whisper.cpp over Core ML because it gives users control over model size - from the 75MB “tiny” model (fast, good enough for quick notes) to the 3GB “large” model (incredibly accurate, 99 languages). You pick the tradeoff that works for your hardware.
2. Dictation should be invisible.
When you’re dictating into Slack, you shouldn’t have to leave Slack. When you’re writing code in Cursor, you shouldn’t have to switch windows.
LexaWrite transcribes and auto-pastes into whatever app is in the foreground. It preserves your clipboard (saves what was there, pastes the transcription, restores the original). A small floating widget shows recording status - that’s it. No window to manage, no app to switch to.
3. Your words should sound like you.
Raw Whisper output is… robotic. It’s accurate but it doesn’t match how you actually write. So LexaWrite includes style matching - it adjusts the transcription to fit the context. Casual for iMessage. Professional for email. Technical for docs.
And for domain-specific terms that Whisper consistently gets wrong (your company name, technical jargon, that colleague with an unusual name), there’s a custom dictionary that automatically replaces misheard words.
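A minimal sketch of how such a replacement pass might work - the function name and sample corrections are illustrative, not LexaWrite’s actual code:

```python
import re

def apply_dictionary(text: str, corrections: dict[str, str]) -> str:
    """Replace commonly misheard words with their intended spellings.

    Keys are the misheard forms Whisper tends to emit; values are the
    correct terms. Matches are case-insensitive and whole-word only, so
    a rule for "right" won't mangle "copyright".
    """
    for heard, intended in corrections.items():
        pattern = re.compile(r"\b" + re.escape(heard) + r"\b", re.IGNORECASE)
        text = pattern.sub(intended, text)
    return text

corrections = {"lexa right": "LexaWrite", "whisper cpp": "whisper.cpp"}
print(apply_dictionary("I built lexa right on whisper cpp.", corrections))
# → I built LexaWrite on whisper.cpp.
```

Word boundaries matter here: a naive substring replace would happily rewrite the middle of unrelated words.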
The Hard Parts
Building a local-first voice app for Mac is harder than it looks.
Audio format conversion: Whisper requires 16kHz mono Float32 PCM. Mac microphones capture at 44.1kHz or 48kHz. Getting the conversion right without latency or quality loss took more iterations than I’d like to admit.
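As a rough illustration of the data flow only - a real converter low-pass filters before decimating to avoid aliasing - here is a naive 48 kHz stereo to 16 kHz mono pass:

```python
def downmix_and_decimate(frames: list[tuple[float, float]], factor: int = 3) -> list[float]:
    """Naive 48 kHz stereo -> 16 kHz mono Float32-style conversion.

    Averages left/right into mono, then averages each group of `factor`
    consecutive samples (48000 / 16000 = 3). Shown only to make the data
    flow concrete; without a proper anti-aliasing filter this degrades
    quality on real audio.
    """
    mono = [(left + right) / 2.0 for left, right in frames]
    out = []
    for i in range(0, len(mono) - factor + 1, factor):
        out.append(sum(mono[i:i + factor]) / factor)
    return out
```

The awkward cases - 44.1 kHz input, where the ratio isn’t an integer - are exactly where the “more iterations than I’d like to admit” time goes.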
Model download and management: Whisper models range from 75MB to 3GB. Downloading them, storing them, switching between them, handling interrupted downloads - all of this needs to work seamlessly. Users shouldn’t need to think about model management.
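One piece of that puzzle, sketched with illustrative names and sizes: deciding whether to resume an interrupted download from a partial file using an HTTP Range header.

```python
from pathlib import Path

# Illustrative registry; sizes match the rough figures mentioned above.
MODEL_SIZES = {
    "tiny": 75 * 1024 * 1024,
    "large": 3 * 1024 * 1024 * 1024,
}

def resume_plan(path: Path, total_bytes: int) -> dict:
    """Decide how to continue an interrupted model download.

    If a partial file exists on disk, resume from its current size via an
    HTTP Range request; otherwise start from byte zero.
    """
    have = path.stat().st_size if path.exists() else 0
    if have >= total_bytes:
        return {"done": True}
    return {"done": False, "headers": {"Range": f"bytes={have}-"}}
```

The rest - integrity checks on the finished file, switching the active model atomically - follows the same principle: the user never sees any of it.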
The paste mechanism: Auto-pasting into the foreground app sounds simple. It’s not. You need to save the current clipboard, put the transcription on the clipboard, simulate ⌘V, wait for the paste, then restore the original clipboard - all without the user noticing. Race conditions everywhere.
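The five-step sequence can be sketched abstractly. The callbacks below stand in for NSPasteboard access and a synthetic ⌘V keystroke on macOS; everything here is illustrative rather than LexaWrite’s real code:

```python
import time

def paste_preserving_clipboard(transcript, read_clip, write_clip, send_cmd_v,
                               settle_delay=0.05):
    """Paste `transcript` into the frontmost app without clobbering the clipboard.

    read_clip / write_clip / send_cmd_v stand in for pasteboard access and
    a simulated Cmd-V. The delay gives the target app time to read the
    pasteboard before the original contents come back - restoring too
    early is exactly the race condition described above.
    """
    original = read_clip()      # 1. save whatever the user had copied
    write_clip(transcript)      # 2. put the transcription on the clipboard
    send_cmd_v()                # 3. simulate Cmd-V in the foreground app
    time.sleep(settle_delay)    # 4. wait for the paste to land
    write_clip(original)        # 5. restore the user's clipboard
```

A fixed sleep is the crude version; a robust implementation would watch the pasteboard’s change count instead of guessing at timing.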
Minimum recording length: Whisper needs at least 0.5 seconds of audio to produce meaningful output. If someone taps and releases Fn too quickly, you need to handle that gracefully instead of producing garbage text.
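A sketch of that guard, assuming 16 kHz audio and the 0.5-second threshold mentioned above:

```python
MIN_SECONDS = 0.5  # below this, Whisper output is mostly noise

def should_transcribe(sample_count: int, sample_rate: int = 16000) -> bool:
    """Drop recordings shorter than the minimum useful length.

    Guards against a quick tap-and-release of the Fn key producing
    garbage text instead of a transcription.
    """
    return sample_count / sample_rate >= MIN_SECONDS

# 0.2 s of audio at 16 kHz -> too short to bother transcribing
print(should_transcribe(3200))   # False
print(should_transcribe(16000))  # True
```

The rejected recording should fail silently - flashing an error at someone who grazed the key is worse than doing nothing.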
Why Now
Two things came together in 2025-2026 to make this possible:

- whisper.cpp maturity: Georgi Gerganov’s C/C++ port of Whisper is now stable, fast, and well-optimized for Apple Silicon. The “small” model transcribes faster than real-time on an M1 - that wasn’t true two years ago.
- Apple Silicon ubiquity: Metal GPU acceleration makes on-device inference practical on every modern Mac. The “tiny” model runs in real-time even on base M1 hardware.
What’s Next
LexaWrite is free to start. I’m an indie developer building this because I use it every day - this is the dictation app I wanted to exist.
If you think faster than you type, give it a try.