Production-grade voice AI. Priced like a startup. Open like a community.
Ship lifelike speech, voice cloning, and transcription with one API. Official Python and TypeScript SDKs. Sub-second latency. Pay-as-you-go pricing from your first call.

S2 Pro running live. Pick a voice, type a line, hear it back. The same model behind HeyGen, Retell, and Sanas in production — no signup, no sales call, no demo environment.
# The same call. The [direction] tags travel with the text.
curl https://api.fish.audio/v1/tts \
-H "Authorization: Bearer $FISH_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"text": "[chuckle] When you’re creating something new, there’s this [emphasis] beautiful mix of wonder and fear.",
"reference_id": "933563129e564b19a115bedd57b7406a",
"format": "mp3"
}' --output speech.mp3
From signup to first audio in 5 minutes.
No sales call required. Get an API key, install the SDK, and ship.
# Text to speech in one call
curl -X POST https://api.fish.audio/v1/tts \
-H "Authorization: Bearer $FISH_API_KEY" \
-H "Content-Type: application/json" \
-H "model: s1" \
-d '{"text": "Hello! Welcome to Fish Audio."}' \
--output welcome.mp3
# Text to speech with the Python SDK
from fish_audio_sdk import Session, TTSRequest

session = Session("YOUR_API_KEY")
request = TTSRequest(text="Hello! Welcome to Fish Audio.")

# Stream the audio chunks to disk as they arrive
with open("welcome.mp3", "wb") as f:
    for chunk in session.tts(request):
        f.write(chunk)
What teams ship on Fish.
Voice that holds up on camera
# Avatar video
Lip-syncable, emotion-aware TTS for AI avatar products. Inline direction tags drive performance, not just words.
Realtime conversational AI
# Voice agent
Sub-second turn-taking over WebSocket. Streaming TTS and ASR in one stack. Interruption-aware.
Dynamic spoken content.
# Audio content & companions
Notes-to-audio, prep tools, AI companions. Per-character pricing that scales with usage, not seats.
Clone in 30 seconds. Or skip cloning entirely.
# Character apps
IVC from 30 seconds of audio. PVC for studio-grade replicas. Or browse the voice library and ship without cloning.
Built for the realtime stack.
Open weights. Paid commercial license.
Our open-source models — fish-speech, S1, and S2 — ship as open weights with a paid commercial license. Self-host in your VPC, on-prem, sovereign cloud, or air-gapped environment when production demands it. Self-host is an Enterprise-tier engagement — see below.
15,000+ direction tags. Inline in any call.
[warm], [near-whisper], [reassuring] — direction travels with the text itself. No separate parameter, no list to pick from, no schema migration when the tag set grows.
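As a minimal sketch of what "direction travels with the text" looks like on the client side, the Python below places tags inline in the `text` field of a request body. The helpers `direct` and `build_tts_body` are hypothetical names used for illustration and are not part of any SDK; the `text` and `format` fields are taken from the curl example on this page, and the tags are the ones quoted above.

```python
import json


def direct(tag: str, line: str) -> str:
    """Prefix a line with an inline direction tag, e.g. "[warm] Hello."."""
    return f"[{tag}] {line}"


def build_tts_body(text: str, fmt: str = "mp3") -> str:
    # Hypothetical helper: serializes a JSON body shaped like the curl example.
    # The tags stay inside "text"; no separate direction parameter is added.
    return json.dumps({"text": text, "format": fmt}, ensure_ascii=False)


text = " ".join([
    direct("warm", "Welcome back."),
    direct("near-whisper", "I saved your seat."),
])
body = build_tts_body(text)
print(body)
```

Because the direction lives in the string itself, the request shape stays the same no matter which tags are used.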
Audio Turing Test: 0.515.
Listeners can't reliably distinguish S2 Pro from a human speaker in blind evaluation. 581 head-to-head comparisons. Methodology and raw audio published.
$15 per million characters. From your first call.
Same model behind HeyGen, Pictoria, Dubbing AI, and Plaud. Pay-as-you-go from your first call. No "contact us" for production rates.
Use our API. Or self-host the model.
Cloud API for any team building today. Self-host as a premium Enterprise engagement when production demands it.
Hosted API · Any team
Cloud API, pay-as-you-go, $15 per million characters. The fastest path to production for teams that don't need to operate the model themselves.
- WebSocket streaming, REST, Python + TypeScript SDKs
- $15 / 1M UTF-8 bytes — no commit
- Inline direction syntax in every call
- Same model that ships open-weight
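To get a rough feel for the pay-as-you-go rate, here is a small Python sketch that estimates the cost of a piece of text. It assumes billing is measured in UTF-8 encoded bytes, as the list above states; for plain ASCII text, bytes and characters are the same count, but non-ASCII text encodes to more bytes than characters. The function name `estimate_cost_usd` is hypothetical, and actual billing rules may differ.

```python
PRICE_PER_MILLION_BYTES_USD = 15.0  # rate quoted in the list above


def estimate_cost_usd(text: str) -> float:
    """Estimate hosted-API cost, assuming billing by UTF-8 encoded bytes."""
    n_bytes = len(text.encode("utf-8"))
    return n_bytes * PRICE_PER_MILLION_BYTES_USD / 1_000_000


for sample in ["Hello! Welcome to Fish Audio.", "こんにちは"]:
    n_chars = len(sample)
    n_bytes = len(sample.encode("utf-8"))
    print(f"{n_chars} chars, {n_bytes} bytes, ~${estimate_cost_usd(sample):.6f}")
```

Under this assumption, a script in a language written mostly outside ASCII costs more per character than the same number of English characters.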
Self-host the model.
Our open-source models — fish-speech, S1, S2 — ship as open weights with a paid commercial license. Deploy to your VPC, data center, sovereign cloud, or air-gapped environment. A premium engagement for high-volume teams needing data residency, fine-tuning, or regulated deployment.
- WebSocket streaming, REST, Python + TypeScript SDKs
- $10K/month
- Effective floor: $120–150K/year
- Direct access to our research team
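For teams weighing the two options, the arithmetic behind the figures on this page can be sketched in a few lines of Python. This is only a back-of-the-envelope comparison using the two quoted numbers ($15 per million on the hosted API, $10K per month for self-host); it ignores infrastructure, staffing, and any other contract terms, and the function names are hypothetical.

```python
HOSTED_USD_PER_MILLION = 15.0       # hosted rate quoted on this page
SELF_HOST_USD_PER_MONTH = 10_000.0  # self-host figure quoted above


def breakeven_millions_per_month(
    hosted_rate: float = HOSTED_USD_PER_MILLION,
    self_host_monthly: float = SELF_HOST_USD_PER_MONTH,
) -> float:
    """Monthly volume (in millions of billed units) where hosted spend
    equals the self-host fee, ignoring all other costs."""
    return self_host_monthly / hosted_rate


def annual_fee_usd(monthly: float = SELF_HOST_USD_PER_MONTH) -> float:
    """Twelve months of the quoted monthly fee."""
    return monthly * 12


breakeven = breakeven_millions_per_month()
annual = annual_fee_usd()
print(f"Breakeven: ~{breakeven:.2f}M units/month; annual fee: ${annual:,.0f}")
```

The annual figure lines up with the lower end of the effective floor listed above; below the breakeven volume, the hosted API is the cheaper of the two on license cost alone.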
Pricing that doesn't punish growth
Pay-as-you-go from day one. No seat fees. No annual commits. No "contact us" for production rates.
See full pricing
Frequently Asked Questions
Coming from ElevenLabs, Cartesia, or Rime?
Head-to-head breakdowns by capability, price, and contract terms. Same-shape API; most production migrations finish in under a week.
The benchmarks, methodology, and raw audio
Audio Turing Test results, blind-evaluation methodology, and the open-weights license. The proofs behind every claim on this page.
Get to production this weekend
Free credits to start. No card required. Same tier from prototype to scale.