Industry-Leading Benchmarks
Voxtral sets new standards in speech recognition accuracy

[Figure: Comprehensive performance comparison across multiple speech recognition models and benchmarks]
Beyond Traditional Speech Recognition
Voxtral combines transcription, understanding, and action in a single model
State-of-the-Art Transcription
Outperforms Whisper large-v3 on every benchmark reported below, including noisy audio (CHiME-4: 6.4% vs. 9.7% WER)
Native Understanding
Built-in question answering and summarization over audio, without chaining separate ASR and language models
Multilingual Intelligence
Automatic language detection with high accuracy for English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian
Voice-to-Function Execution
Triggers backend functions and API calls directly from spoken commands
Flexible Deployment
Run locally on-premise, access it via API, or integrate it with existing infrastructure. Available in 24B and 3B sizes.
Enterprise Ready
Apache 2.0 license with support for private deployment, domain-specific fine-tuning, and secure on-premise hosting
Choose the Right Model for Your Needs
Two powerful models optimized for different use cases
Voxtral 24B
- 24 billion parameters
- Best-in-class accuracy
- Multi-GPU deployment
- ~55 GB of GPU memory required
- Competitive with GPT-4o mini
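The ~55 GB memory figure for Voxtral 24B can be roughly reconstructed with a back-of-the-envelope estimate. This assumes the weights are stored in bf16 (2 bytes per parameter) and the model is split across two GPUs. The breakdown of the remaining headroom is also an assumption, not a published figure:

```latex
\begin{aligned}
\text{weights} &\approx 24 \times 10^{9}\ \text{params} \times 2\ \text{bytes/param} = 48 \times 10^{9}\ \text{bytes} \approx 48\ \text{GB} \\
\text{headroom} &\approx 55\ \text{GB} - 48\ \text{GB} = 7\ \text{GB} \quad \text{(KV cache, activations; assumed)} \\
\text{per GPU} &\approx \frac{55\ \text{GB}}{2} = 27.5\ \text{GB} \quad \text{(2-way tensor parallelism; assumed)}
\end{aligned}
```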
Comprehensive Benchmark Results
Voxtral 24B achieves a lower word error rate than Whisper large-v3 on every dataset and language tested
Dataset | Whisper large-v3 WER | Voxtral 24B WER |
---|---|---|
LibriSpeech Clean | 1.9% | 1.2% |
LibriSpeech Other | 3.4% | 2.1% |
GigaSpeech | 5.8% | 3.9% |
VoxPopuli | 4.1% | 2.6% |
CHiME-4 (noisy) | 9.7% | 6.4% |

Language | Whisper large-v3 WER | Voxtral 24B WER |
---|---|---|
French | 4.9% | 3.2% |
German | 5.7% | 3.8% |
Spanish | 5.2% | 3.4% |
Italian | 6.5% | 4.1% |
Portuguese | 4.6% | 2.9% |
Dutch | 6.0% | 3.9% |
Hindi | 11.4% | 7.8% |

Lower word error rate (WER) is better.
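WER is computed as (S + D + I) / N, where S, D, and I count word substitutions, deletions, and insertions against a reference transcript of N words. The baselines above range from 1.9% to 11.4%, so absolute differences can mislead. Relative reduction, (baseline - new) / baseline, gives a fairer comparison across rows. The sketch below computes it with `awk` from the figures copied out of the tables. The helper name `rel_reduction` is hypothetical and not part of any Voxtral tooling.

```shell
#!/usr/bin/env bash
# Relative WER reduction of Voxtral 24B over Whisper large-v3.
# Figures are copied from the benchmark tables; rel_reduction is a hypothetical helper.

# Print (baseline - new) / baseline as a percentage with one decimal place.
rel_reduction() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (base - new) / base * 100 }'
}

# Each row: name|Whisper large-v3 WER (%)|Voxtral 24B WER (%)
while IFS='|' read -r name whisper voxtral; do
  printf '%-18s %5s%% -> %5s%%  (%s%% relative)\n' \
    "$name" "$whisper" "$voxtral" "$(rel_reduction "$whisper" "$voxtral")"
done <<'EOF'
LibriSpeech Clean|1.9|1.2
LibriSpeech Other|3.4|2.1
GigaSpeech|5.8|3.9
VoxPopuli|4.1|2.6
CHiME-4 (noisy)|9.7|6.4
French|4.9|3.2
German|5.7|3.8
Spanish|5.2|3.4
Italian|6.5|4.1
Portuguese|4.6|2.9
Dutch|6.0|3.9
Hindi|11.4|7.8
EOF
```

For example, CHiME-4 improves by 3.3 points absolute but about 34% relative. LibriSpeech Clean improves by only 0.7 points, yet about 37% relative.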
Quick Start with Voxtral
Get up and running with Voxtral in minutes
# Install vLLM with audio support
uv pip install -U "vllm[audio]" --torch-backend=auto --extra-index-url https://wheels.vllm.ai/nightly
# Install Mistral Common
pip install --upgrade "mistral_common[audio]"
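The install steps above stop short of serving a model. Below is a minimal sketch of the next step: assembling a `vllm serve` command that exposes an OpenAI-compatible server. The Hugging Face model IDs (`mistralai/Voxtral-Mini-3B-2507`, `mistralai/Voxtral-Small-24B-2507`), the two-GPU split for 24B, and the flag set are assumptions based on the public model cards at the time of writing. `voxtral_serve_cmd` is a hypothetical helper. It prints the command rather than running it, so you can review it first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a `vllm serve` command for a Voxtral checkpoint.
# ASSUMPTION: model IDs and flags mirror the public model cards; verify before use.
voxtral_serve_cmd() {
  local size="${1:-3b}" model tp extra=""
  case "$size" in
    3b)
      model="mistralai/Voxtral-Mini-3B-2507"
      tp=1
      ;;
    24b)
      model="mistralai/Voxtral-Small-24B-2507"
      tp=2  # ~55 GB of GPU memory, split across two GPUs (assumed)
      # Tool-calling flags back the voice-to-function feature (assumed 24B-only here).
      extra="--tool-call-parser mistral --enable-auto-tool-choice"
      ;;
    *)
      echo "unknown size '$size' (expected 3b or 24b)" >&2
      return 1
      ;;
  esac
  echo "vllm serve $model --tokenizer-mode mistral --config-format mistral --load-format mistral --tensor-parallel-size $tp${extra:+ $extra}"
}

# Review the printed command, then launch it, e.g.: eval "$(voxtral_serve_cmd 24b)"
voxtral_serve_cmd 3b
```

Printing instead of executing keeps the GPU-heavy launch an explicit step and makes the chosen flags easy to inspect.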
Real-World Use Cases
Transform your applications with Voxtral's advanced speech intelligence
Meeting Intelligence
Transcribe, summarize, and extract action items from business meetings in real time
Customer Support Analytics
Analyze calls, identify sentiment, and generate coaching insights automatically
Real-Time Translation
Build seamless communication across language barriers with native multilingual support
Voice Assistants
Create intelligent agents that understand context and execute complex workflows
Medical Transcription
Secure on-premise deployment that supports HIPAA-compliant medical documentation workflows
Educational Tools
Transcribe lectures, generate study notes, and create accessible content
Frequently Asked Questions
Everything you need to know about Voxtral