Voxtral TTS
Powered by Mistral AI — 32+ languages supported

Mistral Text‑to‑Speech.
In your hands.

Voxtral TTS transforms text into natural, expressive speech with voice cloning, emotion control, and broadcast-quality audio. It is the best Mistral-based TTS alternative to ElevenLabs and Kokoro TTS.

  • 32+ languages
  • <200ms realtime latency
  • 5-second voice cloning
  • 44.1kHz maximum quality

Mistral Voxtral TTS Features — What It Can Do for You

Everything you need for production-grade Mistral text-to-speech in one API: a powerful alternative to ElevenLabs, Kokoro TTS, and local Ollama-based voice pipelines.

Zero-Shot Voice Cloning

Clone a voice from as little as 5 seconds of reference audio. No training or fine-tuning is needed, and replication is instant.
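Before uploading a reference clip, it can help to confirm locally that it meets the 5-second minimum. The sketch below is a minimal example, assuming a PCM WAV clip and a hypothetical `reference_audio` request field; this page does not document the cloning request format. It uses only Python's standard `wave` and `base64` modules.

```python
import base64
import wave

# Minimum reference length quoted on this page.
MIN_REFERENCE_SECONDS = 5.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def build_clone_request(text: str, reference_wav: str) -> dict:
    """Build a voice-cloning payload from a local WAV clip.

    The "reference_audio" field name is a hypothetical placeholder,
    not a documented Voxtral TTS parameter.
    """
    duration = wav_duration_seconds(reference_wav)
    if duration < MIN_REFERENCE_SECONDS:
        raise ValueError(
            f"reference clip is {duration:.1f}s; need at least {MIN_REFERENCE_SECONDS:.0f}s"
        )
    # Base64-encode the raw file bytes so they fit in a JSON body.
    with open(reference_wav, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return {"text": text, "reference_audio": encoded, "format": "mp3"}
```

Checking duration on the client keeps a too-short clip from costing a round trip; the server remains the authority on what it accepts.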

Emotion Control

Choose from 7 emotions — happy, calm, sad, angry, fearful, disgusted, surprised. More expressive than Kokoro TTS or ElevenLabs.
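A client can reject typos before sending a request by validating the emotion against the seven names listed above. This is a minimal sketch: the `emotion` field and the `casual_male` voice come from the quick-start example further down this page, and the assumption that the API expects exactly these lowercase strings is not confirmed here.

```python
from enum import Enum


class Emotion(str, Enum):
    """The seven emotions listed for Voxtral TTS."""
    HAPPY = "happy"
    CALM = "calm"
    SAD = "sad"
    ANGRY = "angry"
    FEARFUL = "fearful"
    DISGUSTED = "disgusted"
    SURPRISED = "surprised"


def with_emotion(payload: dict, emotion: str) -> dict:
    """Return a copy of payload with a validated, lowercase "emotion" field."""
    try:
        value = Emotion(emotion.strip().lower()).value
    except ValueError:
        allowed = ", ".join(e.value for e in Emotion)
        raise ValueError(f"unknown emotion {emotion!r}; expected one of: {allowed}") from None
    # Copy rather than mutate, so the caller's base payload can be reused.
    return {**payload, "emotion": value}


request = with_emotion({"text": "We did it!", "voice": "casual_male"}, "Happy")
```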

32+ Languages — Multilingual TTS

Native support for 32+ languages, including English, Chinese, Japanese, Korean, Spanish, French, German, and Arabic. Wider language coverage than Voxtral Mini.

Voxtral Realtime — Ultra-Low Latency

Median latency under 200ms with streaming output, suited to live voice agents and other real-time applications.

Natural Interjections

Add (laughs), (sighs), (coughs), and 20+ other nonverbal sounds that render naturally, a feature that many local setups, including Ollama-based pipelines, lack.
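Because interjections are written inline as parenthesized words, a quick scan can flag tags the engine may not recognize. A minimal sketch, assuming tags are a single lowercase word in parentheses; only three tags are named on this page, so the known set below is deliberately incomplete and would need the full list from the API documentation.

```python
import re

# Only these three tags are named on this page; the remaining 20+ are not
# listed here, so this set is intentionally partial.
KNOWN_INTERJECTIONS = {"laughs", "sighs", "coughs"}

# Matches a single lowercase word in parentheses, e.g. "(laughs)".
INTERJECTION_TAG = re.compile(r"\(([a-z]+)\)")


def unknown_interjections(text: str, known: set[str] = KNOWN_INTERJECTIONS) -> list[str]:
    """Return interjection-style tags found in text that are not in the known set."""
    return [tag for tag in INTERJECTION_TAG.findall(text) if tag not in known]


script = "That's hilarious (laughs). Well (sighs), let's go (shrugs)."
print(unknown_interjections(script))  # ['shrugs']
```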

Broadcast Quality Audio

Studio-grade output up to 44.1kHz. Ranked #1 on Artificial Analysis and Hugging Face TTS Arena, outperforming Kokoro.

Fine-Grained Control

Adjust speed (0.5x to 2x), pitch (-12 to +12), volume, and custom pauses. Compatible with vLLM and vLLM Omni serving pipelines.
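The speed and pitch ranges above translate directly into client-side checks. This sketch assumes hypothetical `speed` and `pitch` request fields (the quick-start example does not show them) and leaves out volume and pauses, since their ranges and syntax are not given on this page.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VoiceControls:
    """Prosody settings checked against the ranges quoted on this page."""
    speed: float = 1.0  # 0.5x to 2x
    pitch: int = 0      # -12 to +12 (unit not stated on this page)

    def __post_init__(self) -> None:
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError(f"speed must be between 0.5 and 2.0, got {self.speed}")
        if not -12 <= self.pitch <= 12:
            raise ValueError(f"pitch must be between -12 and 12, got {self.pitch}")


def build_payload(text: str, voice: str, controls: VoiceControls) -> dict:
    """Merge text, voice, and controls into one request body (field names assumed)."""
    return {"text": text, "voice": voice, "format": "mp3", **asdict(controls)}


payload = build_payload("Slow and low.", "casual_male", VoiceControls(speed=0.8, pitch=-3))
```

Validating in `__post_init__` means an out-of-range value fails when the settings object is created, not later inside a request.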

Production Ready — Mistral AI Powered

Enterprise-grade Mistral AI TTS API with high throughput. Deploy via Hugging Face, vLLM, or our managed cloud.

Mistral TTS Playground — Try Voxtral Text to Speech

Type or paste text, pick a voice or clone your own, and hear Mistral text to speech come to life. Free to use — no API key required.


Voxtral TTS Pricing — Mistral Text to Speech Plans

Start free. Scale as you grow. More affordable than ElevenLabs with better quality than Kokoro TTS.

Free

$0 forever

Get started with text-to-speech for personal projects.

  • 10,000 characters/month
  • 5 preset voices
  • 3 languages
  • MP3 output
  • Community support
Pro (Most Popular)

$29/month

For developers and creators who need more power.

  • 500,000 characters/month
  • All preset voices
  • 32+ languages
  • Voice cloning (5 voices)
  • Emotion control
  • Streaming API
  • All audio formats
  • Priority support

Enterprise

Custom

Unlimited scale with dedicated infrastructure.

  • Unlimited characters
  • Unlimited voice clones
  • Custom fine-tuning
  • On-premise deployment
  • SLA guarantee
  • Dedicated account manager
  • SSO & audit logs
  • Custom integrations
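The listed quotas make it easy to estimate which plan a workload needs. This sketch considers only the monthly character quota and price from the plans above; it ignores feature differences (for example, voice cloning and 32+ languages require Pro), and anything over 500,000 characters falls to Enterprise's custom pricing. At full use, Pro works out to $29 / 500 = $0.058 per 1,000 characters.

```python
from typing import Optional

# (name, monthly character quota, monthly price in USD) from the plans above.
PLANS = [
    ("Free", 10_000, 0.0),
    ("Pro", 500_000, 29.0),
]


def cheapest_plan(monthly_chars: int) -> tuple[str, Optional[float]]:
    """Return the cheapest listed plan whose character quota covers monthly_chars."""
    for name, quota, price in PLANS:
        if monthly_chars <= quota:
            return name, price
    return "Enterprise", None  # custom pricing


def price_per_1k_chars(price: float, quota: int) -> float:
    """Effective price per 1,000 characters if the full quota is used."""
    return price * 1000 / quota


print(cheapest_plan(120_000))             # ('Pro', 29.0)
print(price_per_1k_chars(29.0, 500_000))  # 0.058
```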

Mistral Voxtral TTS API — Quick Start in Minutes

A few lines of code generate your first speech with the Voxtral TTS API. Works with vLLM, vLLM Omni, or our hosted endpoint.

generate_speech.py
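import base64

import requests

# Request speech synthesis from the hosted Voxtral TTS endpoint
response = requests.post(
    "https://voxtralttsai.com/api/tts",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "text": "Hello! Welcome to Voxtral TTS.",
        "voice": "casual_male",
        "emotion": "happy",
        "format": "mp3",
    },
    timeout=30,
)
response.raise_for_status()  # fail fast on auth or validation errors

# The response JSON carries the audio as a base64 string in "audio"
audio = base64.b64decode(response.json()["audio"])
with open("output.mp3", "wb") as f:
    f.write(audio)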

REST API

Simple HTTP endpoints with JSON payloads, compatible with the Mistral AI ecosystem.

Voxtral Realtime Streaming

WebSocket & SSE for real-time audio delivery. Under 200ms latency.
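For the SSE transport, a client reads the response line by line and decodes each event as it arrives. This is a minimal sketch of the standard SSE framing with a hypothetical payload format: it assumes each event's `data` field holds a base64 audio chunk and that a `[DONE]` event ends the stream. Neither assumption is documented on this page.

```python
import base64
from typing import Iterable, Iterator

# Hypothetical end-of-stream marker; not documented for Voxtral TTS.
DONE_MARKER = "[DONE]"


def parse_sse_events(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete Server-Sent Event.

    Handles the subset of SSE framing needed here: events end at a blank
    line, lines starting with ":" are comments, and multiple data lines
    are joined with newlines. Other fields (event, id, retry) are ignored,
    and an event left unterminated at end of stream is discarded.
    """
    data_lines: list[str] = []
    for line in lines:
        if line == "":
            if data_lines:
                yield "\n".join(data_lines)
                data_lines = []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # SSE strips one leading space from the value
        if field == "data":
            data_lines.append(value)


def iter_audio_chunks(lines: Iterable[str]) -> Iterator[bytes]:
    """Decode base64 audio chunks from an SSE stream (payload format assumed)."""
    for payload in parse_sse_events(lines):
        if payload == DONE_MARKER:
            break
        yield base64.b64decode(payload)
```

With the `requests` library, a response opened with `stream=True` can be fed in as `iter_audio_chunks(resp.iter_lines(decode_unicode=True))`, after setting `resp.encoding = "utf-8"` so lines arrive as text. The streaming URL and request body still need to come from the API documentation.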

SDKs & vLLM Omni

Python, TypeScript, and cURL examples. Deploy with vLLM or Hugging Face.

Deploy Voxtral TTS with Hugging Face & vLLM

Self-host Mistral Voxtral TTS on your own infrastructure using Hugging Face open weights and vLLM Omni. The model runs on a single GPU with 16GB+ VRAM — no Ollama required. Alternatively, use our managed API for zero-setup deployment, or compare with Kokoro TTS and ElevenLabs on the playground above.

The next chapter of Mistral voice AI
is yours.

Start building with Voxtral TTS today. A free tier is available, no credit card required. Outperforms Kokoro TTS, Ollama, and Gemini 3.1 Flash Live in voice quality benchmarks.