Whisper Model Comparison: Tiny vs Base vs Small vs Medium vs Large (2026)

OpenAI’s Whisper comes in five sizes. Pick too small and you’ll get sloppy transcriptions. Pick too large and you’ll wait seconds for every sentence. The right model depends on your hardware, your language, and what you’re dictating.

This guide cuts through the confusion.

The Five Models at a Glance

Model	Parameters	Disk Size	RAM Usage	Relative Speed	English Accuracy
Tiny	39M	75MB	~400MB	1x (baseline)	~88%
Base	74M	142MB	~500MB	~0.8x	~91%
Small	244M	466MB	~1GB	~0.5x	~95%
Medium	769M	1.5GB	~2.5GB	~0.25x	~97%
Large	1.55B	3GB	~4.5GB	~0.1x	~98%

Relative Speed is measured against Tiny, which sets the 1x baseline. Tiny transcribes roughly 10x faster than real-time, so 0.5x means half Tiny’s speed (about 5x faster than real-time) and 0.1x means roughly real-time. Higher is faster. Benchmarks were run on an M1 MacBook Air with Metal GPU acceleration via whisper.cpp.
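To sanity-check the timings quoted for each model, the arithmetic is: processing time = audio length ÷ (Tiny’s real-time multiple × relative speed). The Python sketch below assumes Tiny runs about 10x faster than real-time (the M1 figure from the Tiny section). Real numbers will vary with chip, audio, and whisper.cpp version.

```python
# Approximate figures from the table above (M1 MacBook Air, whisper.cpp + Metal).
# Assumption: Tiny runs ~10x faster than real-time, and the
# "Relative Speed" column scales that baseline.
TINY_REALTIME_MULTIPLE = 10.0

RELATIVE_SPEED = {
    "tiny": 1.0,
    "base": 0.8,
    "small": 0.5,
    "medium": 0.25,
    "large": 0.1,
}


def processing_seconds(model: str, audio_seconds: float) -> float:
    """Estimate how long a clip takes to transcribe with the given model."""
    # How many times faster than real-time this model runs.
    speedup = TINY_REALTIME_MULTIPLE * RELATIVE_SPEED[model]
    return audio_seconds / speedup


if __name__ == "__main__":
    for name in RELATIVE_SPEED:
        print(f"{name:>6}: {processing_seconds(name, 30):.1f} s for a 30 s dictation")
```

For a 30-second clip this gives 3 s for Tiny, 6 s for Small, and 30 s for Large, in line with the per-model sections below. Medium comes out at 12 s, a little faster than the ~15 s quoted for M1/M2, so treat these as ballpark estimates.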

Quick Decision Tree

Not sure which to pick? Answer one question:

What do you mostly dictate?

Quick messages, Slack, notes → Small (fast enough to feel instant, accurate enough for everyday English)
Long-form writing, emails, documents → Small or Medium (accuracy matters more for longer content)
Technical terms, jargon-heavy → Medium (better at uncommon words)
Non-English languages → Medium or Large (smaller models are English-biased)
Maximum accuracy, don’t care about speed → Large
Older Mac or limited disk space → Tiny or Base

TL;DR: Start with Small. It’s the right answer for 80% of people.
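The decision tree is simple enough to write down as a lookup. This is a hypothetical helper, not part of LexaWrite, and the use-case labels (quick, long_form, and so on) are made up for illustration.

```python
def pick_model(use_case: str, older_mac: bool = False) -> str:
    """Map the decision tree above to a model suggestion (hypothetical helper)."""
    # Hardware constraints override everything else.
    if older_mac:
        return "tiny or base"
    choices = {
        "quick": "small",                # Slack, messages, notes
        "long_form": "small or medium",  # emails, documents
        "technical": "medium",           # jargon-heavy dictation
        "non_english": "medium or large",
        "max_accuracy": "large",
    }
    # Anything unrecognised falls back to the TL;DR default.
    return choices.get(use_case, "small")


if __name__ == "__main__":
    print(pick_model("technical"))
    print(pick_model("quick", older_mac=True))
```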

Model-by-Model Breakdown

Tiny (75MB)

The smallest Whisper model. It runs near-instantly on any Apple Silicon Mac, including a base M1 with 8GB RAM.

Accuracy: Good for simple, clear English. Struggles with:

Technical terms and proper nouns
Accented speech
Background noise
Non-English languages (accuracy drops significantly)

Speed: Transcribes ~10x faster than real-time on Apple Silicon. A 30-second dictation processes in about 3 seconds.

Best for:

Quick Slack messages and chat
Simple notes and reminders
Macs with limited storage
When you need instant results above all else

Not recommended for:

Professional writing
Technical or medical dictation
Non-English languages
Noisy environments

Base (142MB)

A modest step up from Tiny. Noticeably better accuracy with minimal speed penalty.

Accuracy: Handles everyday English well. Better than Tiny at:

Common proper nouns
Slightly accented speech
Standard punctuation

Speed: ~8x faster than real-time on Apple Silicon.

Best for:

Daily dictation on older hardware
Users who find Tiny too inaccurate but don’t want to wait for Small

Honest take: Base is the awkward middle child. For most users, the jump from Base to Small is worth the extra 324MB of disk space. If your Mac can handle Small, skip Base.

Small (466MB) - The Sweet Spot

This is the model we recommend to almost everyone. It sits at the intersection of “fast enough to feel responsive” and “accurate enough for real work.”

Accuracy: Excellent for English. Handles:

Most proper nouns and brand names
Light technical vocabulary
Moderate background noise
Clear non-English speech (good, not great)

Speed: ~4-5x faster than real-time on Apple Silicon. A 30-second dictation processes in about 6-7 seconds. In practice, text starts appearing while you’re still speaking.

Best for:

Everyday dictation - emails, messages, documents
English-primary users
Apple Silicon Macs with 8GB+ RAM
Users who want a “set it and forget it” model

The case for Small: past this point, further accuracy gains require disproportionate increases in size and processing time. Going from Small to Medium roughly triples the model size for about 2 percentage points of English accuracy (~95% to ~97%). That tradeoff only makes sense if you have specific needs.
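One way to see the diminishing returns is to compute, for each step up in size, how many times more parameters you pay for and how many accuracy points you get back. The sketch below uses the approximate figures from the table at the top of this guide, so its outputs inherit their uncertainty.

```python
# (model, parameters in millions, approximate English accuracy %) from the table above
MODELS = [
    ("tiny", 39, 88),
    ("base", 74, 91),
    ("small", 244, 95),
    ("medium", 769, 97),
    ("large", 1550, 98),
]


def step_costs(models):
    """For each step up in size, return (from, to, parameter ratio, accuracy points gained)."""
    steps = []
    # Pair each model with the next-larger one.
    for (a, params_a, acc_a), (b, params_b, acc_b) in zip(models, models[1:]):
        steps.append((a, b, params_b / params_a, acc_b - acc_a))
    return steps


if __name__ == "__main__":
    for a, b, ratio, gain in step_costs(MODELS):
        print(f"{a} -> {b}: {ratio:.1f}x parameters for +{gain} points")
```

Small to Medium comes out at about 3.2x the parameters for +2 points, and Medium to Large at about 2x for +1 point, which is the pattern the paragraph above describes.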

Medium (1.5GB)

The professional-grade model. Noticeably more accurate than Small, especially for non-English and technical content, at the cost of speed and disk space.

Accuracy: Near-professional transcription quality:

Excellent with technical terminology
Handles accented English very well
Good multilingual support (especially European and East Asian languages)
Better at disambiguating similar-sounding words
More reliable punctuation and formatting

Speed: ~2x faster than real-time on M1/M2, ~3x on M3/M4. A 30-second dictation processes in about 15 seconds on M1/M2, or about 10 seconds on M3/M4. There’s a noticeable pause after you stop speaking.

Best for:

Professional writing - articles, reports, documentation
Medical or legal terminology (pair with a custom dictionary)
Multilingual dictation
Apple Silicon Macs with 16GB+ RAM
Users who don’t mind a short wait for better quality

Trade-off: The 15-second wait after a 30-second dictation breaks the “instant” feel. If you dictate in short bursts (messages, quick notes), this lag is frustrating. If you dictate in longer sessions (paragraphs, emails), it’s acceptable.

Large (3GB)

The full-size Whisper model. Maximum accuracy, maximum resource usage.

Accuracy: The best Whisper can offer:

~98% English accuracy
Best multilingual performance across all 99 languages
Most reliable with rare words, names, and technical terms
Best with noisy audio and accented speech

Speed: ~1x real-time on M1/M2 (just barely keeping up), ~1.5-2x on M3/M4 Pro. A 30-second dictation takes 15-30 seconds to process.

Best for:

Professional transcription of recorded audio
Languages where smaller models underperform
Academic or highly technical content
M3/M4 Pro or Max Macs with 32GB+ RAM

Honest take: For live dictation (speak → get text), Large is usually overkill. The wait time undermines the workflow. Large shines for batch transcription - processing recorded audio files where you don’t need instant results.

Hardware Recommendations

Apple Silicon (M1/M2/M3/M4)

Mac	RAM	Recommended Model	Notes
MacBook Air (M1, 8GB)	8GB	Small	Comfortable for dictation
MacBook Pro (M1/M2, 16GB)	16GB	Small or Medium	Medium works well
MacBook Pro (M3/M4, 16GB+)	16-36GB	Medium	Noticeably faster than on M1/M2
Mac Studio / Mac Pro	32GB+	Medium or Large	Can handle anything

Intel Macs

Mac	Recommended Model	Notes
Any Intel Mac	Tiny or Base	No Apple Silicon acceleration, much slower inference
Intel with discrete GPU	Base or Small	Slightly better, still slow vs. Apple Silicon

Rule of thumb: On Apple Silicon, your model choice is limited by patience, not hardware. On Intel, your hardware limits you to smaller models.
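The hardware tables reduce to a few rules keyed on chip and RAM. The helper below is a hypothetical sketch, not part of LexaWrite: the chip labels ("m1" to "m4", "intel") are assumptions, and it simplifies the tables by treating 32GB+ on any Apple Silicon chip as Medium-or-Large territory (following the Large section’s 32GB+ guidance) rather than looking at the exact Mac model.

```python
def recommend_for_mac(chip: str, ram_gb: int, discrete_gpu: bool = False) -> str:
    """Suggest a Whisper model size from chip family and RAM (hypothetical helper)."""
    chip = chip.lower()
    if chip == "intel":
        # No Apple Silicon acceleration: stick to the smaller models.
        return "base or small" if discrete_gpu else "tiny or base"
    if ram_gb >= 32:
        # Mac Studio / Mac Pro class machines, or M3/M4 Pro/Max with 32GB+.
        return "medium or large"
    if ram_gb >= 16:
        # Newer chips run Medium fast enough to make it the default.
        return "medium" if chip in {"m3", "m4"} else "small or medium"
    # 8GB machines such as the base M1 MacBook Air.
    return "small"


if __name__ == "__main__":
    print(recommend_for_mac("m1", 8))
    print(recommend_for_mac("m4", 24))
```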

The Custom Dictionary Factor

Raw Whisper accuracy isn’t the whole story. Every model - even Tiny - gets dramatically better with a custom dictionary.

If Whisper consistently mishears your company name, a colleague’s name, or domain jargon, adding those terms to a custom dictionary fixes the problem regardless of model size. A Small model with a good dictionary often outperforms a Large model without one for domain-specific dictation.

Practical tip: After your first week of dictation, review your transcripts for repeated errors. Add the correct terms to your dictionary. This 10-minute investment pays off permanently.
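This guide doesn’t say how LexaWrite applies its dictionary internally. One common approach is a find-and-replace pass over the transcript after Whisper finishes, and the Python sketch below assumes that approach; the dictionary entries are hypothetical. (Whisper can also be nudged toward rare terms at decode time, for example via the initial_prompt argument in OpenAI’s Python package, which this sketch does not use.)

```python
import re

# Hypothetical dictionary: phrases Whisper tends to mishear -> the correct term.
CUSTOM_DICTIONARY = {
    "lexa right": "LexaWrite",
    "whisper cpp": "whisper.cpp",
    "cooper netties": "Kubernetes",
}


def apply_dictionary(text: str, dictionary: dict[str, str]) -> str:
    """Replace misheard phrases (case-insensitive, whole words) in a single pass."""
    if not dictionary:
        return text
    lookup = {wrong.lower(): fix for wrong, fix in dictionary.items()}
    # Longest phrases first so a long entry wins over a shorter overlapping one.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b",
        re.IGNORECASE,
    )
    # A function replacement avoids backslash handling in the correction strings.
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


if __name__ == "__main__":
    print(apply_dictionary("I pushed the fix to whisper cpp today", CUSTOM_DICTIONARY))
```

Doing all replacements in one compiled pattern means a correction is never re-scanned and accidentally rewritten by a later entry.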

How to Switch Models

In LexaWrite, switching models takes about 30 seconds:

Open LexaWrite settings
Go to the Model section
Select a new model size
Wait for the download (one-time, models are cached)
Start dictating

You can keep multiple models downloaded and switch between them. Use Small for quick messages and Medium for long-form writing.
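Keeping several models downloaded costs only disk space. A quick estimate from the table’s approximate sizes, treating 1.5GB as 1,500MB and 3GB as 3,000MB:

```python
# Approximate on-disk sizes in MB, from the table at the top of this guide.
DISK_MB = {"tiny": 75, "base": 142, "small": 466, "medium": 1500, "large": 3000}


def cache_size_mb(models):
    """Total disk space needed to keep every listed model downloaded."""
    return sum(DISK_MB[m] for m in models)


if __name__ == "__main__":
    print(cache_size_mb(["small", "medium"]))  # 1966
```

Small plus Medium comes to roughly 2GB; adding Large brings the total to about 5GB.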

Our Recommendation

Start with Small. Use it for a week. If you find yourself frequently correcting transcription errors that a larger model would catch, try Medium. If Small feels perfect, stay there - there’s no prize for using a bigger model.

Most LexaWrite users never leave Small. It’s that good.

Download LexaWrite and try for yourself - model switching is free.
