
Production-grade voice AI. Priced like a startup. Open like a community.

Ship lifelike speech, voice cloning, and transcription with one API. Official Python and TypeScript SDKs. Sub-second latency. Pay-as-you-go pricing from your first call.

S2 Pro running live. Pick a voice, type a line, hear it back. The same model behind HeyGen, Retell, and Sanas in production — no signup, no sales call, no demo environment.

Trusted by teams building voice in production

Voice Agents & Conversational AI
Video Voiceover, Dubbing & Music
Interactive & Social
Education & Learning

From signup to first audio in 5 minutes.

No sales call required. Get an API key, install the SDK, and ship.

CURL · TEXT TO SPEECH
# Text to speech in one call
curl -X POST https://api.fish.audio/v1/tts \
-H "Authorization: Bearer $FISH_API_KEY" \
-H "Content-Type: application/json" \
-H "model: s1" \
-d '{"text": "Hello! Welcome to Fish Audio."}' \
--output welcome.mp3
PYTHON SDK
# Text to speech with the Python SDK
from fish_audio_sdk import Session, TTSRequest
 
session = Session("YOUR_API_KEY")
request = TTSRequest(text="Hello! Welcome to Fish Audio.")
with open("welcome.mp3", "wb") as f:
    # session.tts() yields audio chunks; write each one as it arrives
    for chunk in session.tts(request):
        f.write(chunk)
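
Teams that would rather not add the SDK can mirror the curl call with Python's standard library. The sketch below only builds the request and sends nothing. The helper name build_tts_request is hypothetical, and the endpoint, headers, and body are copied from the curl example above rather than checked against the API reference.

```python
import json
import urllib.request

# Endpoint copied from the curl example; treat it as an assumption.
API_URL = "https://api.fish.audio/v1/tts"


def build_tts_request(text: str, api_key: str, model: str = "s1") -> urllib.request.Request:
    """Build (but do not send) a POST request that mirrors the curl example.

    Hypothetical helper: the header names and JSON body follow the curl
    sample on this page, not a verified API specification.
    """
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "model": model,
        },
    )


# To send the request and save the audio (requires a valid key and network):
#   req = build_tts_request("Hello! Welcome to Fish Audio.", "YOUR_API_KEY")
#   with urllib.request.urlopen(req) as resp, open("welcome.mp3", "wb") as f:
#       f.write(resp.read())
```

Separating request construction from sending keeps the headers and body easy to inspect or unit-test before any credits are spent.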

What teams ship on Fish.

Voice that holds up on camera

# Avatar video

Lip-syncable, emotion-aware TTS for AI avatar products. Inline direction tags drive performance, not just words.

HeyGen
VIGGLE
Pictoria

Realtime conversational AI

# Voice agent

Sub-second turn-taking over WebSocket. Streaming TTS and ASR in one stack. Interruption-aware.

Retell
Sanas
Dubbing AI

Dynamic spoken content.

# Audio content & companions

Notes-to-audio, prep tools, AI companions. Per-character pricing that scales with usage, not seats.

PLAUD
Final Round AI

Clone in 30 seconds. Or skip cloning entirely.

# Character apps

IVC from 30 seconds of audio. PVC for studio-grade replicas. Or browse the voice library and ship without cloning.

OpenArt

Use our API. Or self-host the model.

Cloud API for any team building today. Self-host as a premium Enterprise engagement when production demands it.

Hosted API · Any team

Cloud API, pay-as-you-go, $15 per million UTF-8 bytes. The fastest path to production for teams that don't need to operate the model themselves.

  • WebSocket streaming, REST, Python + TypeScript SDKs
  • $15 / 1M UTF-8 bytes — no commit
  • Inline direction syntax in every call
  • Same model that ships open-weight
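
Because the rate is quoted per UTF-8 byte rather than per character, non-ASCII text costs more per character than English text. The sketch below shows the difference. It assumes billing counts only the UTF-8 bytes of the submitted text, with no rounding or minimum charge; the function names are illustrative and not part of any SDK.

```python
# Hosted TTS rate quoted on this page.
PRICE_PER_MILLION_BYTES_USD = 15.0


def utf8_bytes(text: str) -> int:
    """Return the size of the text once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def tts_cost_usd(text: str) -> float:
    """Estimate TTS cost, assuming only the text's UTF-8 bytes are billed."""
    return utf8_bytes(text) * PRICE_PER_MILLION_BYTES_USD / 1_000_000


# ASCII characters take one byte each; CJK characters take three.
english = "Hello! Welcome to Fish Audio."
chinese = "你好"
print(len(english), utf8_bytes(english))  # 29 characters, 29 bytes
print(len(chinese), utf8_bytes(chinese))  # 2 characters, 6 bytes
```

For multilingual workloads, estimate budgets from encoded byte counts rather than character counts.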

Self-host the model.

Our open-source models — fish-speech, S1, S2 — ship as open weights with a paid commercial license. Deploy to your VPC, data center, sovereign cloud, or air-gapped environment. A premium engagement for high-volume teams needing data residency, fine-tuning, or regulated deployment.

  • WebSocket streaming, REST, Python + TypeScript SDKs
  • $10k/month
  • Effective floor: $120–150K/year
  • Direct access to our research team

Pricing that doesn't punish growth

Pay-as-you-go from day one. No seat fees. No annual commits. No "contact us" for production rates.

See full pricing
Model        TTS                    TTS                    ASR
Model name   S2 Pro                 S1                     Transcribe-1
Pricing      $15 / 1M UTF-8 bytes   $15 / 1M UTF-8 bytes   $0.36 / hour
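
A combined monthly bill can be estimated from the listed rates. This is a minimal sketch that assumes transcription is billed pro rata by the second and that there are no minimums or volume discounts; neither assumption is confirmed on this page. The function name monthly_bill_usd is illustrative.

```python
TTS_USD_PER_MILLION_BYTES = 15.0  # S2 Pro and S1, per the pricing above
ASR_USD_PER_HOUR = 0.36           # Transcribe-1, per the pricing above


def monthly_bill_usd(tts_bytes: int, asr_seconds: float) -> float:
    """Estimate a month's spend from TTS bytes and transcribed audio seconds.

    Assumes pro-rata ASR billing and no minimum charge (unverified).
    """
    tts_cost = tts_bytes * TTS_USD_PER_MILLION_BYTES / 1_000_000
    asr_cost = asr_seconds / 3600 * ASR_USD_PER_HOUR
    return tts_cost + asr_cost


# Example: 2 MB of synthesized text plus one hour of transcription.
print(round(monthly_bill_usd(2_000_000, 3600), 2))  # 30.36
```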

Frequently Asked Questions

Coming from ElevenLabs, Cartesia, or Rime?

Head-to-head breakdowns by capability, price, and contract terms. Same-shape API; most production migrations finish in under a week.

See the comparison

The benchmarks, methodology, and raw audio

Audio Turing Test results, blind-evaluation methodology, and the open-weights license. The proofs behind every claim on this page.

Read the research

Get to production this weekend

Free credits to start. No card required. Same tier from prototype to scale.