
Which Whisper Model Should You Use? Tiny vs Base vs Small vs Medium vs Large

OpenAI’s Whisper comes in five sizes. Pick too small and you’ll get sloppy transcriptions. Pick too large and you’ll wait seconds for every sentence. The right model depends on your hardware, your language, and what you’re dictating.

This guide cuts through the confusion.

The Five Models at a Glance

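| Model | Parameters | Disk Size | RAM Usage | Relative Speed | English Accuracy |
|---|---|---|---|---|---|
| Tiny | 39M | 75MB | ~400MB | 1x (baseline) | ~88% |
| Base | 74M | 142MB | ~500MB | ~0.8x | ~91% |
| Small | 244M | 466MB | ~1GB | ~0.5x | ~95% |
| Medium | 769M | 1.5GB | ~2.5GB | ~0.25x | ~97% |
| Large | 1.55B | 3GB | ~4.5GB | ~0.1x | ~98% |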

Speed is relative to Tiny, the 1x baseline: 0.5x means half of Tiny’s throughput, so the same audio takes twice as long to process. Higher is faster. Benchmarks were run on an M1 MacBook Air with Metal GPU acceleration via whisper.cpp.

Quick Decision Tree

Not sure which to pick? Answer one question:

What do you mostly dictate?

  • Quick messages, Slack, notes: Small (fast enough to feel instant, accurate enough for everyday English)
  • Long-form writing, emails, documents: Small or Medium (accuracy matters more for longer content)
  • Technical terms, jargon-heavy: Medium (better at uncommon words)
  • Non-English languages: Medium or Large (smaller models are English-biased)
  • Maximum accuracy, don’t care about speed: Large
  • Older Mac or limited disk space: Tiny or Base

TL;DR: Start with Small. It’s the right answer for 80% of people.
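
If you want to encode the decision tree in a script or settings helper, a lookup table is enough. This is a minimal sketch: the use-case keys and the `recommend` function are hypothetical names for the bullets above, and unknown inputs fall back to Small, following the TL;DR.

```python
# Hypothetical use-case labels mapped to the model sizes suggested above.
RECOMMENDATIONS = {
    "quick_messages": ["small"],
    "long_form": ["small", "medium"],
    "technical": ["medium"],
    "non_english": ["medium", "large"],
    "max_accuracy": ["large"],
    "older_mac": ["tiny", "base"],
}


def recommend(use_case: str) -> list[str]:
    """Return the suggested model sizes, defaulting to Small when unsure."""
    return RECOMMENDATIONS.get(use_case, ["small"])


print(recommend("non_english"))
print(recommend("something_else"))
```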

Model-by-Model Breakdown

Tiny (75MB)

The smallest Whisper model. It runs near-instantly on any Mac, including base M1 with 8GB RAM.

Accuracy: Good for simple, clear English. Struggles with:

  • Technical terms and proper nouns
  • Accented speech
  • Background noise
  • Non-English languages (accuracy drops significantly)

Speed: Transcribes ~10x faster than real-time on Apple Silicon. A 30-second dictation processes in about 3 seconds.

Best for:

  • Quick Slack messages and chat
  • Simple notes and reminders
  • Macs with limited storage
  • When you need instant results above all else

Not recommended for:

  • Professional writing
  • Technical or medical dictation
  • Non-English languages
  • Noisy environments

Base (142MB)

A modest step up from Tiny. Noticeably better accuracy with minimal speed penalty.

Accuracy: Handles everyday English well. Better than Tiny at:

  • Common proper nouns
  • Slightly accented speech
  • Standard punctuation

Speed: ~8x faster than real-time on Apple Silicon.

Best for:

  • Daily dictation on older hardware
  • Users who find Tiny too inaccurate but don’t want to wait for Small

Honest take: Base is the awkward middle child. For most users, the jump from Base to Small is worth the extra 324MB of disk space. If your Mac can handle Small, skip Base.


Small (466MB) - The Sweet Spot

This is the model we recommend to almost everyone. It hits the intersection of “fast enough to feel responsive” and “accurate enough for real work.”

Accuracy: Excellent for English. Handles:

  • Most proper nouns and brand names
  • Light technical vocabulary
  • Moderate background noise
  • Clear non-English speech (good, not great)

Speed: ~4-5x faster than real-time on Apple Silicon. A 30-second dictation processes in about 6-8 seconds. In practice, text starts appearing while you’re still speaking.

Best for:

  • Everyday dictation - emails, messages, documents
  • English-primary users
  • Apple Silicon Macs with 8GB+ RAM
  • Users who want a “set it and forget it” model

The case for Small: It’s the point past which further accuracy gains require disproportionate increases in size and processing time. Going from Small to Medium roughly triples the model size (466MB to 1.5GB) for about 2 percentage points of extra English accuracy. That tradeoff only makes sense if you have specific needs.
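
The diminishing returns are easy to check with a few lines of arithmetic. The sketch below uses the disk sizes and approximate English accuracy figures from the table at the top of this article. The numbers are rough, so treat the output as an illustration, not a benchmark.

```python
# (name, disk size in MB, approximate English accuracy in %), from the table above.
MODELS = [
    ("tiny", 75, 88),
    ("base", 142, 91),
    ("small", 466, 95),
    ("medium", 1500, 97),
    ("large", 3000, 98),
]


def upgrade_steps(models):
    """For each step up, return (from, to, size ratio, accuracy points gained)."""
    steps = []
    for (a, a_mb, a_acc), (b, b_mb, b_acc) in zip(models, models[1:]):
        steps.append((a, b, b_mb / a_mb, b_acc - a_acc))
    return steps


for src, dst, ratio, gain in upgrade_steps(MODELS):
    print(f"{src} -> {dst}: {ratio:.1f}x the size for +{gain} points")
```

Each step from Small upward buys fewer accuracy points than the step before it, while the download keeps growing by a factor of two or three.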


Medium (1.5GB)

The professional-grade model. Noticeably more accurate than Small, especially for non-English and technical content, at the cost of speed and disk space.

Accuracy: Near-professional transcription quality:

  • Excellent with technical terminology
  • Handles accented English very well
  • Good multilingual support (especially European and East Asian languages)
  • Better at disambiguating similar-sounding words
  • More reliable punctuation and formatting

Speed: ~2x faster than real-time on M1/M2, ~3x on M3/M4. A 30-second dictation processes in about 15 seconds on M1/M2 (about 10 seconds on M3/M4). There’s a noticeable pause after you stop speaking.

Best for:

  • Professional writing - articles, reports, documentation
  • Medical or legal terminology (pair with a custom dictionary)
  • Multilingual dictation
  • Apple Silicon Macs with 16GB+ RAM
  • Users who don’t mind a short wait for better quality

Trade-off: The 15-second wait after a 30-second dictation breaks the “instant” feel. If you dictate in short bursts (messages, quick notes), this lag is frustrating. If you dictate in longer sessions (paragraphs, emails), it’s acceptable.
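
To see how the lag scales with dictation length, you can divide the clip length by each model's speedup. This is a rough sketch using the approximate M1/M2 figures quoted in this article, with Small taken at the midpoint of its 4-5x range. It assumes transcription only starts once you stop speaking, so it is an upper bound on the wait.

```python
# Approximate speedups over real time on M1/M2, as quoted in this article.
# Small uses the midpoint of the quoted 4-5x range.
REALTIME_SPEEDUP = {
    "tiny": 10.0,
    "base": 8.0,
    "small": 4.5,
    "medium": 2.0,
    "large": 1.0,
}


def wait_seconds(model: str, audio_seconds: float) -> float:
    """Processing time for a clip, assuming work starts after you stop speaking."""
    return audio_seconds / REALTIME_SPEEDUP[model]


# Compare a short burst (10 s of speech) with a longer dictation (30 s).
for model in REALTIME_SPEEDUP:
    short, long_ = wait_seconds(model, 10), wait_seconds(model, 30)
    print(f"{model:>6}: {short:4.1f} s after 10 s of speech, {long_:4.1f} s after 30 s")
```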


Large (3GB)

The full-size Whisper model. Maximum accuracy, maximum resource usage.

Accuracy: The best Whisper can offer:

  • 98%+ English accuracy
  • Best multilingual performance across all 99 languages
  • Most reliable with rare words, names, and technical terms
  • Best in noisy environments and with accented speech

Speed: ~1x real-time on M1/M2 (just barely keeping up), ~1.5-2x on M3/M4 Pro. A 30-second dictation takes 15-30 seconds to process.

Best for:

  • Professional transcription of recorded audio
  • Languages where smaller models underperform
  • Academic or highly technical content
  • M3/M4 Pro or Max Macs with 32GB+ RAM

Honest take: For live dictation (speak → get text), Large is usually overkill. The wait time undermines the workflow. Large shines for batch transcription - processing recorded audio files where you don’t need instant results.
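
For batch work outside a dictation app, a short script can loop over a folder of recordings. The sketch below is not how LexaWrite does it. It is a generic example where `transcribe_folder` and `whisper_transcriber` are illustrative names, and the second function assumes you have separately installed the open-source `openai-whisper` Python package (which also needs ffmpeg).

```python
from pathlib import Path

AUDIO_SUFFIXES = {".wav", ".mp3", ".m4a", ".flac"}


def transcribe_folder(folder, transcribe):
    """Call transcribe(path) -> str on each audio file and save a .txt beside it."""
    written = []
    for audio in sorted(Path(folder).iterdir()):
        if audio.suffix.lower() not in AUDIO_SUFFIXES:
            continue  # skip anything that is not a known audio format
        out = audio.with_suffix(".txt")
        out.write_text(transcribe(audio), encoding="utf-8")
        written.append(out)
    return written


def whisper_transcriber(model_name="large"):
    """Build a transcribe function backed by the openai-whisper package."""
    import whisper  # assumption: installed with `pip install openai-whisper`

    model = whisper.load_model(model_name)
    return lambda path: model.transcribe(str(path))["text"]
```

A call such as `transcribe_folder("recordings", whisper_transcriber("large"))` would then process every file in a hypothetical `recordings` folder. Because nobody is waiting on each result, the slow model costs you nothing but wall-clock time.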

Hardware Recommendations

Apple Silicon (M1/M2/M3/M4)

| Mac | RAM | Recommended Model | Notes |
|---|---|---|---|
| MacBook Air (M1, 8GB) | 8GB | Small | Comfortable for dictation |
| MacBook Pro (M1/M2, 16GB) | 16GB | Small or Medium | Medium works well |
| MacBook Pro (M3/M4, 16GB+) | 16-36GB | Medium | Fast enough to feel snappy |
| Mac Studio / Mac Pro | 32GB+ | Medium or Large | Can handle anything |

Intel Macs

| Mac | Recommended Model | Notes |
|---|---|---|
| Any Intel Mac | Tiny or Base | No Metal GPU - much slower inference |
| Intel with discrete GPU | Base or Small | Slightly better, still slow vs. Apple Silicon |

Rule of thumb: On Apple Silicon, your model choice is limited by patience, not hardware. On Intel, your hardware limits you to smaller models.
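
The hardware tables can be condensed into a small helper. This is a simplified sketch: the function name and parameters are hypothetical, and it keys only on chip family and RAM, so it ignores the M1/M2 versus M3/M4 distinction made in the table.

```python
def recommend_for_hardware(
    apple_silicon: bool, ram_gb: int = 8, discrete_gpu: bool = False
) -> list[str]:
    """Suggest model sizes from chip family and RAM, following the tables above."""
    if not apple_silicon:
        # Intel Macs: no Metal GPU path, so stay with the smaller models.
        return ["base", "small"] if discrete_gpu else ["tiny", "base"]
    if ram_gb >= 32:
        return ["medium", "large"]
    if ram_gb >= 16:
        return ["small", "medium"]
    return ["small"]


print(recommend_for_hardware(apple_silicon=True, ram_gb=8))
print(recommend_for_hardware(apple_silicon=False))
```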

The Custom Dictionary Factor

Raw Whisper accuracy isn’t the whole story. Every model - even Tiny - gets dramatically better with a custom dictionary.

If Whisper consistently mishears your company name, a colleague’s name, or domain jargon, adding those terms to a custom dictionary fixes the problem regardless of model size. A Small model with a good dictionary often outperforms a Large model without one for domain-specific dictation.

Practical tip: After your first week of dictation, review your transcripts for repeated errors. Add the correct terms to your dictionary. This 10-minute investment pays off permanently.
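
This article does not describe how LexaWrite applies its dictionary internally, so the following is only one possible approach: a post-processing pass that replaces known mishearings in the finished transcript. The function name and the sample corrections are made up for illustration.

```python
import re


def apply_dictionary(text: str, corrections: dict[str, str]) -> str:
    """Replace each misheard phrase with its correct form, ignoring case."""
    for wrong, right in corrections.items():
        # Word boundaries keep "cube control" from matching inside longer words.
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function replacement inserts `right` literally, backslashes included.
        text = pattern.sub(lambda _match, fixed=right: fixed, text)
    return text


# Hypothetical mishearings collected from a week of transcripts.
corrections = {"lexa right": "LexaWrite", "cube control": "kubectl"}
print(apply_dictionary("Open Lexa Right and run cube control get pods", corrections))
```

A pass like this is independent of model size, which is why a dictionary can help Tiny as much as Large on the specific terms it covers.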

How to Switch Models

In LexaWrite, switching models takes about 30 seconds:

  1. Open LexaWrite settings
  2. Go to the Model section
  3. Select a new model size
  4. Wait for the download (one-time, models are cached)
  5. Start dictating

You can keep multiple models downloaded and switch between them. Use Small for quick messages and Medium for long-form writing.

Our Recommendation

Start with Small. Use it for a week. If you find yourself frequently correcting transcription errors that a larger model would catch, try Medium. If Small feels perfect, stay there - there’s no prize for using a bigger model.

Most LexaWrite users never leave Small. It’s that good.

Download LexaWrite and try for yourself - model switching is free.

Written by Salih Caglar Ispirli

Independent developer and creator of LexaWrite. Building privacy-first Mac apps with Swift and on-device AI.