Open-Source • Apache 2.0 License

Voxtral: Open-Source Speech Intelligence by Mistral AI

The first open-source audio model family that bridges the gap between accessibility and performance. Outperforms Whisper large-v3 and rivals GPT-4o-mini at half the cost.

Mistral AI • 24B & 3B Models • 32k Context • 8+ Languages

Performance

Industry-Leading Benchmarks

Voxtral sets new standards in speech recognition accuracy

50%

Lower Cost

vs. proprietary APIs

32k

Token Context

Process up to 40 minutes of audio

8+

Languages

Native multilingual support

Apache 2.0

License

Full commercial use
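Recordings longer than the 40-minute figure quoted above would need to be split before processing. The sketch below plans equal-length windows that each stay within that limit. The function name `plan_chunks` and the 95-minute recording are hypothetical, and the 40-minute bound is taken from this page rather than measured.

```python
import math

MAX_MINUTES_PER_REQUEST = 40  # figure quoted on this page; treated as an upper bound


def plan_chunks(total_seconds: float, max_minutes: float = MAX_MINUTES_PER_REQUEST):
    """Split a recording into equal-length (start, end) windows within the limit."""
    if total_seconds <= 0:
        return []
    max_seconds = max_minutes * 60
    n_chunks = math.ceil(total_seconds / max_seconds)
    chunk_len = total_seconds / n_chunks
    return [
        (i * chunk_len, min((i + 1) * chunk_len, total_seconds))
        for i in range(n_chunks)
    ]


if __name__ == "__main__":
    # A hypothetical 95-minute recording, expressed in seconds.
    for start, end in plan_chunks(95 * 60):
        print(f"{start / 60:.1f} min -> {end / 60:.1f} min")
```

Equal-length windows avoid a very short final chunk. A real pipeline would probably cut at silences instead of fixed offsets so that words are not split across chunks.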

Comprehensive performance comparison across multiple speech recognition models and benchmarks

Features

Beyond Traditional Speech Recognition

Voxtral combines transcription, understanding, and action in a single unified model

State-of-the-Art Transcription

Outperforms Whisper large-v3 across all benchmarks with superior accuracy in noisy environments

Native Understanding

Built-in Q&A and summarization without needing separate ASR and language models

Multilingual Intelligence

Automatic language detection with high accuracy for English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian

Voice-to-Function Execution

Trigger backend functions and API calls directly from spoken commands

Flexible Deployment

Run on-premise, via API, or integrated with existing infrastructure. Available in 24B and 3B sizes.

Enterprise Ready

Apache 2.0 license with support for private deployment, domain-specific fine-tuning, and secure on-premise hosting
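To make Voice-to-Function Execution concrete, here is a minimal sketch of the application side: routing a tool call to a registered backend function. The tool-call shape (`name` plus a JSON string of `arguments`), the `set_thermostat` function, and the registry are all assumptions for illustration. This page does not specify the format Voxtral emits, so the sample call is a hand-written stand-in.

```python
import json

# Hypothetical registry of backend functions a spoken command may trigger.
TOOLS = {}


def tool(fn):
    """Register a function under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def set_thermostat(room: str, celsius: float) -> str:
    return f"{room} set to {celsius:.1f} C"


def dispatch(tool_call: dict) -> str:
    """Run one tool call of the assumed shape {"name": str, "arguments": JSON string}."""
    name = tool_call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    kwargs = json.loads(tool_call["arguments"])
    return TOOLS[name](**kwargs)


if __name__ == "__main__":
    # Stand-in for what a model might emit after hearing
    # "set the living room to twenty-one degrees".
    call = {
        "name": "set_thermostat",
        "arguments": '{"room": "living room", "celsius": 21}',
    }
    print(dispatch(call))
```

Keeping an explicit registry means the model can only trigger functions the application has deliberately exposed.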

Model Variants

Choose the Right Model for Your Needs

Two powerful models optimized for different use cases

Voxtral Small (24B)

Production Scale

Flagship model for highest accuracy

24 billion parameters
Best-in-class accuracy
Multi-GPU deployment
~55 GB GPU RAM required
Competes with GPT-4o-mini

Voxtral Mini (3B)

Edge Optimized

Efficient model for local deployment

3 billion parameters
Real-time processing
Single GPU capable
~9.5 GB GPU RAM required
Perfect for edge devices
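A rough way to read the GPU RAM figures above is to compare them with the memory the weights alone would occupy. The sketch below assumes 16-bit weights (2 bytes per parameter) and uses the nominal 24B and 3B parameter counts; both are assumptions, and the actual checkpoints may differ. Whatever remains covers activations, cache, and runtime overhead.

```python
BYTES_PER_PARAM_16BIT = 2  # assumption: weights held in 16-bit precision

# Nominal parameter counts and GPU RAM figures as stated on this page.
MODELS = {
    "Voxtral Small (24B)": {"params": 24e9, "stated_gb": 55.0},
    "Voxtral Mini (3B)": {"params": 3e9, "stated_gb": 9.5},
}


def weight_gb(params: float, bytes_per_param: int = BYTES_PER_PARAM_16BIT) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, spec in MODELS.items():
        weights = weight_gb(spec["params"])
        headroom = spec["stated_gb"] - weights
        print(f"{name}: ~{weights:.0f} GB weights, ~{headroom:.1f} GB for everything else")
```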

Performance

Comprehensive Benchmark Results

Voxtral 24B records a lower word error rate than Whisper large-v3 on every dataset and language listed below

English Short-Form Transcription (WER %)

Performance on audio clips under 30 seconds

Dataset	Whisper large-v3	Voxtral 24B
LibriSpeech Clean	1.9%	1.2%
LibriSpeech Other	3.4%	2.1%
GigaSpeech	5.8%	3.9%
VoxPopuli	4.1%	2.6%
CHiME-4 (noisy)	9.7%	6.4%
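Absolute WER gaps are easier to compare as relative reductions. The short sketch below derives them from the table above; the figures are copied from that table as-is, and the helper name `relative_reduction` is only illustrative.

```python
# (dataset, Whisper large-v3 WER %, Voxtral 24B WER %) copied from the table above.
ENGLISH_SHORT_FORM = [
    ("LibriSpeech Clean", 1.9, 1.2),
    ("LibriSpeech Other", 3.4, 2.1),
    ("GigaSpeech", 5.8, 3.9),
    ("VoxPopuli", 4.1, 2.6),
    ("CHiME-4 (noisy)", 9.7, 6.4),
]


def relative_reduction(baseline: float, candidate: float) -> float:
    """Percentage by which the candidate WER is lower than the baseline WER."""
    return (baseline - candidate) / baseline * 100


if __name__ == "__main__":
    for dataset, whisper, voxtral in ENGLISH_SHORT_FORM:
        reduction = relative_reduction(whisper, voxtral)
        print(f"{dataset:<20} {reduction:5.1f}% relative WER reduction")
```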

Multilingual Performance (WER %)

Mozilla Common Voice 15.1 benchmark results

Language	Whisper large-v3	Voxtral 24B
French	4.9%	3.2%
German	5.7%	3.8%
Spanish	5.2%	3.4%
Italian	6.5%	4.1%
Portuguese	4.6%	2.9%
Dutch	6.0%	3.9%
Hindi	11.4%	7.8%

Lower WER (Word Error Rate) is better
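For readers new to the metric, WER is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. A minimal sketch follows. It splits on whitespace only, whereas published benchmarks usually normalize casing and punctuation first, so it will not reproduce the table figures exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "turn on the kitchen lights"
    hyp = "turn on kitchen light"
    print(f"WER: {word_error_rate(ref, hyp):.0%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.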

Get Started

Quick Start with Voxtral

Get up and running with Voxtral in minutes

Install Dependencies

Set up vLLM with audio support

# Install vLLM with audio support
uv pip install -U "vllm[audio]" --torch-backend=auto --extra-index-url https://wheels.vllm.ai/nightly

# Install Mistral Common
pip install --upgrade "mistral_common[audio]"
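After installing, a quick check that both packages are visible to the interpreter can save a confusing error later. This sketch uses only the standard library and assumes the top-level module names are `vllm` and `mistral_common`, matching the package names above.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names assumed to match the packages installed above.
    missing = missing_packages(["vllm", "mistral_common"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("vLLM and mistral_common are both importable")
```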

Applications

Real-World Use Cases

Transform your applications with Voxtral's advanced speech intelligence

Meeting Intelligence

Transcribe, summarize, and extract action items from business meetings in real time

Customer Support Analytics

Analyze calls, identify sentiment, and generate coaching insights automatically

Real-Time Translation

Build seamless communication across language barriers with native multilingual support

Voice Assistants

Create intelligent agents that understand context and execute complex workflows

Medical Transcription

Secure on-premise deployment to support HIPAA-compliant medical documentation workflows

Educational Tools

Transcribe lectures, generate study notes, and create accessible content

Ready to Get Started?

Transform Your Applications with Voxtral

Join the open-source speech AI revolution