Production-grade voice AI. Priced like a startup. Open like a community.

Ship lifelike speech, voice cloning, and transcription with one API. Official Python and TypeScript SDKs. Sub-second latency. Pay-as-you-go pricing from your first call.

Get API key · Read the docs

S2 Pro running live. Pick a voice, type a line, hear it back. The same model behind HeyGen, Retell, and Sanas in production — no signup, no sales call, no demo environment.

# The same call. The [direction] tags travel with the text.
curl https://api.fish.audio/v1/tts \
  -H "Authorization: Bearer $FISH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "[chuckle] When you’re creating something new, there’s this [emphasis] beautiful mix of wonder and fear.",
    "reference_id": "933563129e564b19a115bedd57b7406a",
    "format": "mp3"
  }' --output speech.mp3
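
For teams that would rather not shell out to curl, the same request can be sketched with Python's standard library alone. The endpoint, headers, and JSON fields are copied from the curl call above. The function names `build_tts_request` and `synthesize` are illustrative, not part of any official SDK, and the sketch assumes the endpoint returns the audio bytes directly in the response body, as the `--output` flag suggests.

```python
import json
import os
import urllib.request

API_URL = "https://api.fish.audio/v1/tts"  # endpoint taken from the curl call above


def build_tts_request(text, api_key, reference_id=None, audio_format="mp3"):
    """Build (but do not send) the POST request for the TTS endpoint."""
    payload = {"text": text, "format": audio_format}
    if reference_id is not None:
        # Voice to speak with; same field as in the curl payload.
        payload["reference_id"] = reference_id
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, out_path, **kwargs):
    """Send the request and write the response body to out_path.

    Assumption: the response body is the encoded audio itself.
    """
    request = build_tts_request(text, os.environ["FISH_API_KEY"], **kwargs)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

With `FISH_API_KEY` set in the environment, `synthesize("[chuckle] Hello there.", "speech.mp3", reference_id="933563129e564b19a115bedd57b7406a")` would mirror the curl call. Building the request separately from sending it keeps the payload easy to inspect and test without network access.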

Trusted by teams building voice in production

Voice Agents & Conversational AI

Video Voiceover, Dubbing & Music

Interactive & Social

Education & Learning

From signup to first audio in 5 minutes.

No sales call required. Get an API key, install the SDK, and ship.

CURL · TEXT TO SPEECH

# Text to speech in one call
curl -X POST https://api.fish.audio/v1/tts \
  -H "Authorization: Bearer $FISH_API_KEY" \
  -H "Content-Type: application/json" \
  -H "model: s1" \
  -d '{"text": "Hello! Welcome to Fish Audio."}' \
  --output welcome.mp3

PYTHON SDK

# Text to speech with the Python SDK
from fish_audio_sdk import Session, TTSRequest

# Authenticate with your API key
session = Session("YOUR_API_KEY")
request = TTSRequest(text="Hello! Welcome to Fish Audio.")

# session.tts() yields audio chunks; write each one to disk as it arrives
with open("welcome.mp3", "wb") as f:
    for chunk in session.tts(request):
        f.write(chunk)

What teams ship on Fish.

Voice that holds up on camera

# Avatar video

Lip-syncable, emotion-aware TTS for AI avatar products. Inline direction tags drive performance, not just words.

Realtime conversational AI

# Voice agent

Sub-second turn-taking over WebSocket. Streaming TTS and ASR in one stack. Interruption-aware.

Dynamic spoken content.

# Audio content & companions

Notes-to-audio, prep tools, AI companions. Per-character pricing that scales with usage, not seats.

Clone in 30 seconds. Or skip cloning entirely.

# Character apps

Instant voice cloning (IVC) from 30 seconds of audio. Professional voice cloning (PVC) for studio-grade replicas. Or browse the voice library and ship without cloning.

Built for the realtime stack.

Open weights. Paid commercial license.

Our open-source models — fish-speech, S1, and S2 — ship as open weights with a paid commercial license. Self-host in your VPC, on-prem, sovereign cloud, or air-gapped environment when production demands it. Self-host is an Enterprise-tier engagement — see below.

Read the licensing terms

15,000+ direction tags. Inline in any call.

[warm], [near-whisper], [reassuring] — direction travels with the text itself. No separate parameter, no list to pick from, no schema migration when the tag set grows.
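
Because direction lives inside the text, an application that also displays the line (captions, transcripts, logs) may want to separate tags from spoken words. A minimal sketch of that split follows. The tag grammar is an assumption inferred from the examples on this page (square brackets around letters, digits, spaces, or hyphens), and `split_directions` is a hypothetical helper, not an API function.

```python
import re

# Assumed tag grammar: [name], where name starts with a letter and may
# contain letters, digits, spaces, or hyphens (e.g. [near-whisper]).
TAG_PATTERN = re.compile(r"\[([A-Za-z][A-Za-z0-9 \-]*)\]")


def split_directions(text):
    """Return (tags, plain_text) for a line with inline [direction] tags."""
    tags = TAG_PATTERN.findall(text)
    # Remove the tags, then collapse the whitespace they leave behind.
    plain = TAG_PATTERN.sub("", text)
    plain = re.sub(r"\s+", " ", plain).strip()
    return tags, plain
```

The tagged string is still what gets sent to the API; the plain version is only for display.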

Browse the direction library

Audio Turing Test: 0.515.

Listeners can't reliably distinguish S2 Pro from a human voice in blind evaluation. 581 head-to-head comparisons. Methodology and raw audio published.

Read the research

$15 per million characters. From your first call.

Same model behind HeyGen, Pictoria, Dubbing AI, and Plaud. Pay-as-you-go from your first call. No "contact us" for production rates.

See full pricing

Use our API. Or self-host the model.

Cloud API for any team building today. Self-host as a premium Enterprise engagement when production demands it.

Hosted API · Any team

Cloud API, pay-as-you-go, $15 per million characters. The fastest path to production for teams that don't need to operate the model themselves.

WebSocket streaming, REST, Python + TypeScript SDKs
$15 / 1M UTF-8 bytes — no commit
Inline direction syntax in every call
Same model that ships open-weight
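
To budget a workload against the rate in the list above, a rough estimator can count UTF-8 bytes rather than characters. This is a sketch under stated assumptions: it takes the $15 per 1M UTF-8 bytes figure at face value and assumes the whole string, including any inline direction tags, is billed. `estimate_cost_usd` is a hypothetical helper; check the pricing page for the authoritative rules.

```python
PRICE_USD_PER_MILLION_BYTES = 15.0  # rate quoted in the list above


def estimate_cost_usd(text):
    """Estimate hosted-API cost for text, assuming billing per UTF-8 byte."""
    billable_bytes = len(text.encode("utf-8"))
    return billable_bytes * PRICE_USD_PER_MILLION_BYTES / 1_000_000
```

For plain ASCII text, bytes and characters are the same count. Accented Latin letters take two bytes each and most CJK characters take three, so non-English scripts cost more per character under byte-based billing.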

Get an API key

Self-host the model.

Our open-source models — fish-speech, S1, S2 — ship as open weights with a paid commercial license. Deploy to your VPC, data center, sovereign cloud, or air-gapped environment. A premium engagement for high-volume teams needing data residency, fine-tuning, or regulated deployment.