$ man text-to-speech

/text-to-speech

PRICE / CALL  $0.05 (USDC · base mainnet · scheme: exact)
METHOD        POST
CLUSTER       synthforge
CATEGORY      ai
STATUS        live
NAME
text-to-speech converts text to speech with 30+ voices and 5 audio formats
SYNOPSIS
POST https://x402.agentutility.ai/text-to-speech
     Content-Type: application/json
     X-PAYMENT:    <signed-transferWithAuthorization>

     { ... }
↳ first call → 402 Payment Required. Sign a USDC transferWithAuthorization, then retry with the X-PAYMENT header.
DESCRIPTION

Converts text to speech with 30+ voices and 5 audio formats. Morpheus is the primary provider for Kokoro, with Venice as the fallback and for the alternate TTS models (xAI / ElevenLabs / Orpheus / MiniMax / Gemini). fal.ai storage provides the hosted audio URLs. Use it as a TTS API or voice generator.

INPUT  request schema
property  type    description                          req?
text      string                                       required
voice     string                                       optional
model     string                                       optional
speed     number                                       optional
format    string  enum: mp3 · wav · opus · aac · flac  optional
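
The request schema above can be turned into a small body builder. This is a minimal Python sketch, assuming the only constraints are the ones listed (text is required, format is limited to the enum). The helper name `build_tts_request` is hypothetical and not part of any official SDK, and rejecting blank text is an assumption, since the schema only marks the field as required.

```python
import json

# Formats listed in the request schema's enum.
ALLOWED_FORMATS = {"mp3", "wav", "opus", "aac", "flac"}


def build_tts_request(text, voice=None, model=None, speed=None, format=None):
    """Build the JSON body for POST /text-to-speech (hypothetical helper)."""
    # Assumption: blank text is rejected client-side; the schema only says "required".
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text is required and must be a non-empty string")
    if format is not None and format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    if speed is not None and (isinstance(speed, bool) or not isinstance(speed, (int, float))):
        raise ValueError("speed must be a number")
    body = {"text": text}
    # Optional fields are sent only when set, so the service's defaults apply otherwise.
    optional = {"voice": voice, "model": model, "speed": speed, "format": format}
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


print(json.dumps(build_tts_request("Hello, world", speed=1.0, format="wav")))
```

Leaving unset optional fields out of the body, rather than sending nulls, avoids guessing how the service treats explicit null values.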
OUTPUT  response shape
field            type    description
audio_url        string  Hosted MP3 URL pointing to the generated speech audio file.
file_size_bytes  number  Size of the generated MP3 file in bytes.
content_type     string  MIME type of the audio file, typically audio/mpeg for MP3 output.
format           string  Audio container format returned, one of 6 supported formats (mp3, opus, aac, flac, wav, pcm).
voice            string  Voice identifier used for synthesis, drawn from the 30+ available voices.
model            string  TTS model that produced the audio (Kokoro, xAI, ElevenLabs, Orpheus, MiniMax, or Gemini).
speed            number  Playback speed multiplier applied during synthesis, where 1.0 is normal pace.
input_chars      number  Character count of the input text that was synthesized into speech.
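
The response fields listed above map naturally onto a typed record. This is a minimal Python sketch, assuming the response is a flat JSON object keyed by exactly these field names. `TtsResult` and `parse_tts_response` are hypothetical names, and every value in `sample` is made up for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class TtsResult:
    audio_url: str
    file_size_bytes: int
    content_type: str
    format: str
    voice: str
    model: str
    speed: float
    input_chars: int


def parse_tts_response(payload):
    """Pick the documented fields out of a decoded JSON response."""
    names = [field.name for field in fields(TtsResult)]
    missing = [name for name in names if name not in payload]
    if missing:
        raise KeyError(f"response is missing fields: {missing}")
    # Unknown extra keys are ignored so added fields do not break the parser.
    return TtsResult(**{name: payload[name] for name in names})


# Made-up sample values, for illustration only.
sample = {
    "audio_url": "https://example.com/audio.mp3",
    "file_size_bytes": 20480,
    "content_type": "audio/mpeg",
    "format": "mp3",
    "voice": "example-voice",
    "model": "Kokoro",
    "speed": 1.0,
    "input_chars": 12,
}
result = parse_tts_response(sample)
print(result.audio_url, result.file_size_bytes)
```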
EXAMPLES  two ways to call
EXAMPLE 1 · curl
curl -X POST https://x402.agentutility.ai/text-to-speech \
  -H 'Content-Type: application/json' \
  -d '{ "text": "Hello, world" }'
The first response is 402 Payment Required with the payment requirements; sign them and retry with the X-PAYMENT header.
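
The 402-then-retry flow around the curl call can be sketched in Python as follows. This is an illustration under stated assumptions, not the service's client. The transport (`send`) and the signer (`sign`) are injected callables with hypothetical signatures, the 402 body is assumed to be JSON, and the actual transferWithAuthorization signing needs a wallet and is not shown. A stand-in transport lets the flow run offline.

```python
import json


def post_with_payment(send, sign, url, body):
    """POST once; on 402, sign the payment requirements and retry once.

    send(url, headers, data) -> (status, response_text)   injected transport
    sign(requirements) -> str                             injected signer (hypothetical)
    """
    headers = {"Content-Type": "application/json"}
    data = json.dumps(body)
    status, text = send(url, headers, data)
    if status != 402:
        return status, text
    # Assumption: the 402 body is JSON describing the payment requirements.
    requirements = json.loads(text)
    paid_headers = dict(headers, **{"X-PAYMENT": sign(requirements)})
    return send(url, paid_headers, data)


# Stand-in transport: answers 402 until an X-PAYMENT header is present.
def fake_send(url, headers, data):
    if "X-PAYMENT" not in headers:
        return 402, json.dumps({"note": "placeholder payment requirements"})
    return 200, json.dumps({"audio_url": "https://example.com/audio.mp3"})


status, text = post_with_payment(
    fake_send,
    lambda requirements: "signed-placeholder",
    "https://x402.agentutility.ai/text-to-speech",
    {"text": "Hello, world"},
)
print(status, text)
```

Retrying only once keeps a rejected payment from looping; a real client would surface the second response's status to the caller.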
EXAMPLE 2 · mcp
# Install the MCP package for this endpoint's cluster
npx -y @agentutility/mcp-<cluster>

# Required: EVM private key with USDC on Base
export X402_PRIVATE_KEY=0x...

# Then call the text-to-speech tool from your MCP-aware agent.
The MCP server handles payment automatically; your coding agent just calls the tool by name.
METADATA
tags     tts · speech · audio · voice · ai
env      VENICE_API_KEY · FAL_KEY
methods  POST
cluster  synthforge
price    $0.05 USDC per call
ADJACENT  other endpoints in synthforge
endpoint            price  description
music-generate      $0.05  Generates music from a text prompt via Venice using the minimax-music-v26 model.
voice               $0.05  Converts text to speech with 30+ voices and MP3/WAV/OPUS/AAC/FLAC output.
image-generate-pro  $0.06  Premium text-to-image generation across margin-safe Venice models at a competitive $0.06/call.
recraft             $0.06  Generates SFW design and illustration images with Venice's recraft-v4 model on a dedicated endpoint.
seedream            $0.06  Generates SFW images with Venice's seedream-v4 model on a dedicated endpoint.
flux-2-pro          $0.04  Generates SFW images with Venice's flux-2-pro model on a dedicated endpoint.
qwen-image          $0.04  Generates SFW images with Venice's qwen-image model on a dedicated endpoint.
background-remove   $0.08  Removes the background from a public image URL and returns the subject with alpha transparency.
SEE ALSO
agentutility · synthforge · x402 · mcp · llms.txt · registry.json · bazaar.x402.org