Open-Source • Apache 2.0 License

Voxtral: Open-Source Speech Intelligence by Mistral AI

An open-source audio model family that bridges the gap between accessibility and performance. It outperforms Whisper large-v3 on the benchmarks reported below and competes with GPT-4o-mini at about half the cost of proprietary APIs.

Mistral AI • 24B & 3B Models • 32k Context • 8+ Languages
Performance

Industry-Leading Benchmarks

Voxtral sets new standards in speech recognition accuracy

  • 50% lower cost vs. proprietary APIs
  • 32k token context: process up to 40 min of audio
  • 8+ languages: native multilingual support
  • Apache 2.0 license: full commercial use
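
Recordings longer than the roughly 40 minutes that fit in the 32k context have to be split before processing. The following is a minimal sketch of one way to plan those splits, not part of Voxtral itself. The function name `plan_chunks` is hypothetical, and cutting into back-to-back windows with no overlap is an assumption; a real pipeline may prefer to cut at silences or overlap adjacent windows.

```python
from typing import List, Tuple


def plan_chunks(duration_s: float, max_chunk_s: float = 40 * 60) -> List[Tuple[float, float]]:
    """Split [0, duration_s] into consecutive windows of at most max_chunk_s seconds.

    The 40-minute default mirrors the limit quoted on this page; everything
    else here is an illustrative assumption.
    """
    duration_s = float(duration_s)
    max_chunk_s = float(max_chunk_s)
    if duration_s < 0 or max_chunk_s <= 0:
        raise ValueError("duration must be >= 0 and max_chunk_s must be > 0")

    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


if __name__ == "__main__":
    # A 90-minute recording is planned as 40 + 40 + 10 minutes.
    for start, end in plan_chunks(90 * 60):
        print(f"{start / 60:.0f} min -> {end / 60:.0f} min")
```

Each `(start, end)` pair can then be handed to whatever audio-slicing tool the pipeline already uses.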
[Figure: Voxtral performance comparison chart. Comprehensive performance comparison across multiple speech recognition models and benchmarks.]

Features

Beyond Traditional Speech Recognition

Voxtral combines transcription, understanding, and action in a single unified model

State-of-the-Art Transcription

Outperforms Whisper large-v3 on every benchmark listed below, including noisy audio (6.4% vs. 9.7% WER on CHiME-4)

Native Understanding

Built-in Q&A and summarization without needing separate ASR and language models

Multilingual Intelligence

Automatic language detection with high accuracy for English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian

Voice-to-Function Execution

Triggers backend functions and API calls directly from spoken commands
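
To make the idea concrete, here is a minimal sketch of the application side of voice-to-function execution: routing a structured tool call to a backend function. It assumes the model's output has already been turned into a JSON object with `name` and `arguments` fields. That shape, the `tool` decorator, `dispatch`, and `set_thermostat` are all hypothetical and are not Voxtral's documented output format.

```python
import json
from typing import Any, Callable, Dict

# Registry of backend functions that spoken commands are allowed to trigger.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name so dispatch() can find it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def set_thermostat(room: str, celsius: float) -> str:
    # Hypothetical backend action; a real one would call a device API.
    return f"{room} set to {celsius:.1f} C"


def dispatch(tool_call_json: str) -> Any:
    """Run the registered function named in a (hypothetical) tool-call payload."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


if __name__ == "__main__":
    # Assumed payload for the spoken command "set the kitchen to 21 degrees".
    payload = '{"name": "set_thermostat", "arguments": {"room": "kitchen", "celsius": 21}}'
    print(dispatch(payload))
```

Keeping an explicit registry means only functions that were deliberately exposed can be triggered by voice.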

Flexible Deployment

Run locally or on-premise, call it via API, or integrate it with existing infrastructure. Available in 24B and 3B sizes.

Enterprise Ready

Apache 2.0 license with support for private deployment, domain-specific fine-tuning, and secure on-premise hosting

Model Variants

Choose the Right Model for Your Needs

Two powerful models optimized for different use cases

Voxtral Small (24B): Production Scale
Flagship model for highest accuracy
  • 24 billion parameters
  • Best-in-class accuracy
  • Multi-GPU deployment
  • ~55 GB GPU RAM required
  • Competes with GPT-4o-mini

Voxtral Mini (3B): Edge Optimized
Efficient model for local deployment
  • 3 billion parameters
  • Real-time processing
  • Single GPU capable
  • ~9.5 GB GPU RAM required
  • Perfect for edge devices
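
As a rough illustration of how the memory figures above can drive the choice, the sketch below picks the largest variant that fits in a given GPU memory budget. The helper `pick_variant` is hypothetical, and the thresholds are simply the approximate figures quoted on this page. Actual requirements depend on precision, context length, and serving settings.

```python
from typing import List, Optional, Tuple

# (variant name, approximate GPU RAM in GB as quoted on this page), largest first.
VARIANTS: List[Tuple[str, float]] = [
    ("Voxtral Small (24B)", 55.0),
    ("Voxtral Mini (3B)", 9.5),
]


def pick_variant(available_gb: float) -> Optional[str]:
    """Return the largest variant whose quoted memory need fits, or None.

    available_gb is the total GPU memory budget; for the 24B model this may
    be spread across several GPUs.
    """
    for name, required_gb in VARIANTS:
        if available_gb >= required_gb:
            return name
    return None


if __name__ == "__main__":
    for budget in (80, 24, 8):
        print(f"{budget} GB -> {pick_variant(budget)}")
```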
Performance

Comprehensive Benchmark Results

Voxtral 24B posts a lower word error rate than Whisper large-v3 on every dataset and language listed below

English Short-Form Transcription (WER %)
Performance on audio clips under 30 seconds

| Dataset | Whisper large-v3 | Voxtral 24B |
|---|---|---|
| LibriSpeech Clean | 1.9% | 1.2% |
| LibriSpeech Other | 3.4% | 2.1% |
| GigaSpeech | 5.8% | 3.9% |
| VoxPopuli | 4.1% | 2.6% |
| CHiME-4 (noisy) | 9.7% | 6.4% |
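
The absolute gaps in the English short-form figures are easier to compare as relative reductions. The sketch below computes them from the numbers quoted above. The helper name `relative_reduction` is an illustrative choice, and the figures are copied from this page rather than re-measured.

```python
# (Whisper large-v3 WER %, Voxtral 24B WER %) as quoted in the table above.
ENGLISH_WER = {
    "LibriSpeech Clean": (1.9, 1.2),
    "LibriSpeech Other": (3.4, 2.1),
    "GigaSpeech": (5.8, 3.9),
    "VoxPopuli": (4.1, 2.6),
    "CHiME-4 (noisy)": (9.7, 6.4),
}


def relative_reduction(baseline: float, candidate: float) -> float:
    """Percentage by which candidate WER is lower than baseline WER."""
    if baseline <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline - candidate) / baseline * 100


if __name__ == "__main__":
    for dataset, (whisper, voxtral) in ENGLISH_WER.items():
        print(f"{dataset}: {relative_reduction(whisper, voxtral):.1f}% fewer errors")
```

On these figures the relative reduction is roughly a third on each dataset.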
Multilingual Performance (WER %)
Mozilla Common Voice 15.1 benchmark results

| Language | Whisper large-v3 | Voxtral 24B |
|---|---|---|
| French | 4.9% | 3.2% |
| German | 5.7% | 3.8% |
| Spanish | 5.2% | 3.4% |
| Italian | 6.5% | 4.1% |
| Portuguese | 4.6% | 2.9% |
| Dutch | 6.0% | 3.9% |
| Hindi | 11.4% | 7.8% |

Lower WER (Word Error Rate) is better
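
For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcript and the model output (substitutions, deletions, and insertions), divided by the number of reference words. The sketch below is a minimal implementation of that standard definition. It splits on whitespace only, which is an assumption: published benchmarks usually apply text normalization (casing, punctuation, number formats) first, so it will not reproduce the table figures exactly.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(f"{wer('the cat sat on the mat', 'the cat sit on mat'):.3f}")
```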

Get Started

Quick Start with Voxtral

Get up and running with Voxtral in minutes

1. Install Dependencies
Set up vLLM with audio support

# Install vLLM with audio support
uv pip install -U "vllm[audio]" --torch-backend=auto --extra-index-url https://wheels.vllm.ai/nightly

# Install Mistral Common (quoted so shells such as zsh do not expand the brackets)
pip install --upgrade "mistral_common[audio]"

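
After installing, it can help to confirm that both packages are visible to the Python interpreter you plan to use. The sketch below checks this with the standard library only. The helper name `check_install` is hypothetical, and it only verifies that the modules can be found, not that audio support or GPU backends are working.

```python
from importlib.util import find_spec
from typing import Dict, Iterable


def check_install(packages: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether the interpreter can locate it."""
    return {name: find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    # Import names of the two packages installed in the step above.
    for name, found in check_install(["vllm", "mistral_common"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it with the same environment that `uv pip` and `pip` installed into; a `MISSING` line usually means the two commands targeted different environments.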
Applications

Real-World Use Cases

Transform your applications with Voxtral's advanced speech intelligence

Meeting Intelligence

Transcribe, summarize, and extract action items from business meetings in real time

Customer Support Analytics

Analyze calls, identify sentiment, and generate coaching insights automatically

Real-Time Translation

Build seamless communication across language barriers with native multilingual support

Voice Assistants

Create intelligent agents that understand context and execute complex workflows

Medical Transcription

Secure on-premise deployment to support HIPAA-compliant medical documentation workflows

Educational Tools

Transcribe lectures, generate study notes, and create accessible content

Ready to Get Started?

Transform Your Applications with Voxtral

Join the open-source speech AI revolution