Which Whisper Model Should You Use? Tiny vs Base vs Small vs Medium vs Large
OpenAI’s Whisper comes in five sizes. Pick too small and you’ll get sloppy transcriptions. Pick too large and you’ll wait seconds for every sentence. The right model depends on your hardware, your language, and what you’re dictating.
This guide cuts through the confusion.
The Five Models at a Glance
| Model | Parameters | Disk Size | RAM Usage | Relative Speed | English Accuracy |
|---|---|---|---|---|---|
| Tiny | 39M | 75MB | ~400MB | 1x (baseline) | ~88% |
| Base | 74M | 142MB | ~500MB | ~0.8x | ~91% |
| Small | 244M | 466MB | ~1GB | ~0.5x | ~95% |
| Medium | 769M | 1.5GB | ~2.5GB | ~0.25x | ~97% |
| Large | 1.55B | 3GB | ~4.5GB | ~0.1x | ~98% |
Speed is relative to Tiny: 1x is Tiny’s throughput, and 0.5x means half that speed (the same audio takes twice as long to process). Higher is faster. For reference, Tiny runs at roughly 10x real-time on Apple Silicon, so Large at ~0.1x works out to about real-time. Benchmarks were run on an M1 MacBook Air with Metal GPU acceleration via whisper.cpp.
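To make the speed column concrete, here is a rough sketch that converts the table’s relative figures into wait times. It assumes Tiny runs at about 10x real-time on Apple Silicon (the figure quoted in the Tiny section); the names `RELATIVE_SPEED` and `processing_seconds` are illustrative only.

```python
# Relative speed from the table above (Tiny = 1.0, the baseline).
RELATIVE_SPEED = {
    "tiny": 1.0,
    "base": 0.8,
    "small": 0.5,
    "medium": 0.25,
    "large": 0.1,
}

# Assumption: Tiny transcribes about 10x faster than real-time on Apple Silicon.
TINY_REALTIME_FACTOR = 10.0


def processing_seconds(model, audio_seconds):
    """Estimate how long `model` needs to transcribe `audio_seconds` of audio."""
    realtime_factor = TINY_REALTIME_FACTOR * RELATIVE_SPEED[model]
    return audio_seconds / realtime_factor


for name in RELATIVE_SPEED:
    seconds = processing_seconds(name, 30)
    print(f"{name:<6} ~{seconds:.1f}s for a 30-second dictation")
```

These are estimates built from rounded table values, so they can differ by a few seconds from the per-model figures later in this guide.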
Quick Decision Tree
Not sure which to pick? Answer one question:
What do you mostly dictate?
- Quick messages, Slack, notes → Small (fast enough to feel instant, accurate enough for everyday English)
- Long-form writing, emails, documents → Small or Medium (accuracy matters more for longer content)
- Technical terms, jargon-heavy → Medium (better at uncommon words)
- Non-English languages → Medium or Large (smaller models are English-biased)
- Maximum accuracy, don’t care about speed → Large
- Older Mac or limited disk space → Tiny or Base
TL;DR: Start with Small. It’s the right answer for most people.
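If you want the decision tree in a script or a settings default, it can be sketched as a simple lookup. The use-case keys below are hypothetical labels for the bullets above, not LexaWrite settings.

```python
# Hypothetical use-case labels mapped to the model sizes suggested above.
RECOMMENDATIONS = {
    "quick_messages": ["small"],
    "long_form": ["small", "medium"],
    "technical": ["medium"],
    "non_english": ["medium", "large"],
    "max_accuracy": ["large"],
    "limited_hardware": ["tiny", "base"],
}

DEFAULT_MODELS = ["small"]  # the TL;DR: start with Small


def recommend_models(use_case):
    """Return candidate model sizes for a use case, falling back to Small."""
    return RECOMMENDATIONS.get(use_case, DEFAULT_MODELS)


print(recommend_models("technical"))      # ['medium']
print(recommend_models("podcast_notes"))  # ['small'] (unknown key falls back)
```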
Model-by-Model Breakdown
Tiny (75MB)
The smallest Whisper model. It runs near-instantly on any Mac, including base M1 with 8GB RAM.
Accuracy: Good for simple, clear English. Struggles with:
- Technical terms and proper nouns
- Accented speech
- Background noise
- Non-English languages (accuracy drops significantly)
Speed: Transcribes ~10x faster than real-time on Apple Silicon. A 30-second dictation processes in about 3 seconds.
Best for:
- Quick Slack messages and chat
- Simple notes and reminders
- Macs with limited storage
- When you need instant results above all else
Not recommended for:
- Professional writing
- Technical or medical dictation
- Non-English languages
- Noisy environments
Base (142MB)
A modest step up from Tiny. Noticeably better accuracy with minimal speed penalty.
Accuracy: Handles everyday English well. Better than Tiny at:
- Common proper nouns
- Slightly accented speech
- Standard punctuation
Speed: ~8x faster than real-time on Apple Silicon.
Best for:
- Daily dictation on older hardware
- Users who find Tiny too inaccurate but don’t want to wait for Small
Honest take: Base is the awkward middle child. For most users, the jump from Base to Small is worth the extra 324MB of disk space. If your Mac can handle Small, skip Base.
Small (466MB) - The Sweet Spot
This is the model we recommend to almost everyone. It hits the intersection of “fast enough to feel responsive” and “accurate enough for real work.”
Accuracy: Excellent for English. Handles:
- Most proper nouns and brand names
- Light technical vocabulary
- Moderate background noise
- Clear non-English speech (good, not great)
Speed: ~4-5x faster than real-time on Apple Silicon. A 30-second dictation processes in about 6-7 seconds. In practice, text starts appearing while you’re still speaking.
Best for:
- Everyday dictation - emails, messages, documents
- English-primary users
- Apple Silicon Macs with 8GB+ RAM
- Users who want a “set it and forget it” model
The case for Small: It’s the model where further accuracy gains require disproportionate increases in size and processing time. Small to Medium is a 3x increase in model size for maybe 2% more accuracy in English. That tradeoff only makes sense if you have specific needs.
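The diminishing returns are easy to check against the table at the top of this guide. The sketch below uses the table’s approximate disk sizes and accuracy figures; the function name is illustrative.

```python
# (name, disk size in MB, approximate English accuracy in %) from the table above.
MODELS = [
    ("tiny", 75, 88),
    ("base", 142, 91),
    ("small", 466, 95),
    ("medium", 1500, 97),
    ("large", 3000, 98),
]


def upgrade_costs(models):
    """For each step up, return (from, to, size multiplier, accuracy points gained)."""
    steps = []
    for (prev_name, prev_mb, prev_acc), (name, mb, acc) in zip(models, models[1:]):
        steps.append((prev_name, name, mb / prev_mb, acc - prev_acc))
    return steps


for prev_name, name, size_ratio, gain in upgrade_costs(MODELS):
    print(f"{prev_name} -> {name}: {size_ratio:.1f}x the size for +{gain} points")
```

With these numbers, Base to Small and Small to Medium each cost roughly 3x the disk space, but the second step buys about half the accuracy gain of the first.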
Medium (1.5GB)
The professional-grade model. Noticeably more accurate than Small, especially for non-English and technical content, at the cost of speed and disk space.
Accuracy: Near-professional transcription quality:
- Excellent with technical terminology
- Handles accented English very well
- Good multilingual support (especially European and East Asian languages)
- Better at disambiguating similar-sounding words
- More reliable punctuation and formatting
Speed: ~2x faster than real-time on M1/M2, ~3x on M3/M4. A 30-second dictation processes in about 15 seconds. There’s a noticeable pause after you stop speaking.
Best for:
- Professional writing - articles, reports, documentation
- Medical or legal terminology (pair with a custom dictionary)
- Multilingual dictation
- Apple Silicon Macs with 16GB+ RAM
- Users who don’t mind a short wait for better quality
Trade-off: The 15-second wait after a 30-second dictation breaks the “instant” feel. If you dictate in short bursts (messages, quick notes), this lag is frustrating. If you dictate in longer sessions (paragraphs, emails), it’s acceptable.
Large (3GB)
The full-size Whisper model. Maximum accuracy, maximum resource usage.
Accuracy: The best Whisper can offer:
- 98%+ English accuracy
- Best multilingual performance across all 99 languages
- Most reliable with rare words, names, and technical terms
- Best in noisy environments and with accented speech
Speed: ~1x real-time on M1/M2 (just barely keeping up), ~1.5-2x on M3/M4 Pro. A 30-second dictation takes 15-30 seconds to process.
Best for:
- Professional transcription of recorded audio
- Languages where smaller models underperform
- Academic or highly technical content
- M3/M4 Pro or Max Macs with 32GB+ RAM
Honest take: For live dictation (speak → get text), Large is usually overkill. The wait time undermines the workflow. Large shines for batch transcription - processing recorded audio files where you don’t need instant results.
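For batch work outside LexaWrite, a minimal sketch might look like the following. It assumes the open-source `openai-whisper` Python package (`pip install openai-whisper`) rather than whisper.cpp, and the folder name in the usage comment is hypothetical. The `load_model` parameter exists only so the loop can be tried with a stand-in model.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac"}


def find_audio_files(folder):
    """Return the audio files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir() if p.suffix.lower() in AUDIO_EXTENSIONS
    )


def transcribe_folder(folder, model_name="large", load_model=None):
    """Transcribe every audio file in `folder` and return {filename: text}."""
    if load_model is None:
        # Assumption: the openai-whisper package is installed.
        import whisper

        load_model = whisper.load_model
    model = load_model(model_name)  # loaded once, reused for every file
    transcripts = {}
    for path in find_audio_files(folder):
        result = model.transcribe(str(path))
        transcripts[path.name] = result["text"].strip()
    return transcripts


# Usage (hypothetical folder):
# transcripts = transcribe_folder("recordings")
```

Loading the model once and looping over files keeps the slow startup cost out of the per-file time, which is where Large’s speed penalty matters least.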
Hardware Recommendations
Apple Silicon (M1/M2/M3/M4)
| Mac | RAM | Recommended Model | Notes |
|---|---|---|---|
| MacBook Air (M1, 8GB) | 8GB | Small | Comfortable for dictation |
| MacBook Pro (M1/M2, 16GB) | 16GB | Small or Medium | Medium works well |
| MacBook Pro (M3/M4, 16GB+) | 16-36GB | Medium | Fast enough to feel snappy |
| Mac Studio / Mac Pro | 32GB+ | Medium or Large | Can handle anything |
Intel Macs
| Mac | Recommended Model | Notes |
|---|---|---|
| Any Intel Mac | Tiny or Base | No Apple Silicon GPU acceleration - much slower inference |
| Intel with discrete GPU | Base or Small | Slightly better, still slow vs. Apple Silicon |
Rule of thumb: On Apple Silicon, your model choice is limited by patience, not hardware. On Intel, your hardware limits you to smaller models.
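The two hardware tables can be condensed into a small helper. This is a simplification: it ignores chip generation (M1 vs. M4) and keys only on architecture and RAM, and the function and parameter names are illustrative.

```python
def recommend_for_hardware(apple_silicon, ram_gb, discrete_gpu=False):
    """Map a Mac's hardware to the model sizes suggested in the tables above."""
    if not apple_silicon:
        # Intel Macs: inference is much slower, so stay with the small models.
        return ["base", "small"] if discrete_gpu else ["tiny", "base"]
    if ram_gb >= 32:
        return ["medium", "large"]
    if ram_gb >= 16:
        return ["small", "medium"]
    return ["small"]


print(recommend_for_hardware(apple_silicon=True, ram_gb=8))    # ['small']
print(recommend_for_hardware(apple_silicon=False, ram_gb=16))  # ['tiny', 'base']
```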
The Custom Dictionary Factor
Raw Whisper accuracy isn’t the whole story. Every model - even Tiny - gets dramatically better with a custom dictionary.
If Whisper consistently mishears your company name, a colleague’s name, or domain jargon, adding those terms to a custom dictionary fixes the problem regardless of model size. A Small model with a good dictionary often outperforms a Large model without one for domain-specific dictation.
Practical tip: After your first week of dictation, review your transcripts for repeated errors. Add the correct terms to your dictionary. This 10-minute investment pays off permanently.
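This guide doesn’t describe how a custom dictionary works internally, so the following is only one plausible approach: a post-processing pass that swaps known mis-transcriptions for the correct term. The dictionary entries and function name are hypothetical.

```python
import re

# Hypothetical entries: what the model tends to write -> what you meant.
CUSTOM_DICTIONARY = {
    "lexa right": "LexaWrite",
    "cooper netties": "Kubernetes",
}


def apply_dictionary(text, dictionary):
    """Replace known mis-transcriptions, matching whole phrases case-insensitively."""
    for wrong, right in dictionary.items():
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function replacement inserts `right` literally, with no escape handling.
        text = pattern.sub(lambda match, right=right: right, text)
    return text


print(apply_dictionary("I opened Lexa Right to write about cooper netties.", CUSTOM_DICTIONARY))
# I opened LexaWrite to write about Kubernetes.
```

Because the fix happens after transcription, it works the same way for every model size, which is why a small model with a good dictionary can hold its own on domain terms.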
How to Switch Models
In LexaWrite, switching models takes about 30 seconds, plus a one-time download for any model you haven’t used before:
- Open LexaWrite settings
- Go to the Model section
- Select a new model size
- Wait for the download (one-time, models are cached)
- Start dictating
You can keep multiple models downloaded and switch between them. Use Small for quick messages and Medium for long-form writing.
Our Recommendation
Start with Small. Use it for a week. If you find yourself frequently correcting transcription errors that a larger model would catch, try Medium. If Small feels perfect, stay there - there’s no prize for using a bigger model.
Most LexaWrite users never leave Small. It’s that good.
Download LexaWrite and try it for yourself - model switching is free.