$ man text-to-speech
/text-to-speech
PRICE / CALL
$0.05
USDC · base mainnet · scheme: exact
METHOD
POST
CLUSTER
synthforge
CATEGORY
ai
STATUS
● live
NAME
text-to-speech — converts text to speech with 30+ voices and 5 audio formats
SYNOPSIS
POST https://x402.agentutility.ai/text-to-speech
Content-Type: application/json
X-PAYMENT: <signed-transferWithAuthorization>
{ ... }
↳ first call → 402 Payment Required. Sign USDC transferWithAuthorization, retry with the X-PAYMENT header.
DESCRIPTION
Converts text to speech with 30+ voices and 5 audio formats. Morpheus is the primary provider for the Kokoro model; Venice serves as the fallback and supplies the alternate TTS models (xAI / ElevenLabs / Orpheus / MiniMax / Gemini). fal.ai storage hosts the returned audio URLs. Use it as a TTS API or voice generator.
INPUT — request schema
| property | type | description | req? |
|---|---|---|---|
| text | string | — | required |
| voice | string | — | optional |
| model | string | — | optional |
| speed | number | — | optional |
| format | string | — enum: mp3 · wav · opus · aac · flac | optional |
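
The schema above can be checked on the client before a paid call is made. The following is a minimal sketch, assuming only what the table states: `text` is a required string, the other fields are optional, and `format` must be one of the five listed values. The helper name `build_tts_request` is hypothetical, and rejecting an empty `text` is an assumption of this sketch rather than documented behavior. Valid voices, models, and speed ranges are not listed on this page, so they pass through unchecked.

```python
# Allowed values for `format`, copied from the request schema above.
ALLOWED_FORMATS = ("mp3", "wav", "opus", "aac", "flac")


def build_tts_request(text, voice=None, model=None, speed=None, format=None):
    """Return a JSON-serializable body for POST /text-to-speech.

    Hypothetical helper: validates only the types and the `format` enum
    given in the schema table; everything else is left to the server.
    """
    # Assumption: an empty string is treated as a missing `text`.
    if not isinstance(text, str) or not text:
        raise ValueError("text is required and must be a non-empty string")
    body = {"text": text}

    # Optional string fields are copied only when the caller supplies them.
    for name, value in (("voice", voice), ("model", model)):
        if value is not None:
            if not isinstance(value, str):
                raise TypeError(f"{name} must be a string")
            body[name] = value

    if speed is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(speed, bool) or not isinstance(speed, (int, float)):
            raise TypeError("speed must be a number")
        body["speed"] = speed

    if format is not None:
        if format not in ALLOWED_FORMATS:
            raise ValueError(f"format must be one of {', '.join(ALLOWED_FORMATS)}")
        body["format"] = format

    return body
```

Calling `build_tts_request("Hello, world.", format="wav")` yields a body containing only the `text` and `format` keys, which can be serialized with `json.dumps` and sent as the POST payload.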
OUTPUT — response shape
| field | type | description |
|---|---|---|
| audio_url | string | Hosted URL pointing to the generated speech audio file. |
| file_size_bytes | number | Size of the generated audio file in bytes. |
| content_type | string | MIME type of the audio file, typically audio/mpeg for MP3 output. |
| format | string | Audio container format returned, one of 6 supported formats (mp3, opus, aac, flac, wav, pcm). |
| voice | string | Voice identifier used for synthesis, drawn from the 30+ available voices. |
| model | string | TTS model that produced the audio (Kokoro, xAI, ElevenLabs, Orpheus, MiniMax, or Gemini). |
| speed | number | Playback speed multiplier applied during synthesis, where 1.0 is normal pace. |
| input_chars | number | Character count of the input text that was synthesized into speech. |
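
A client may want to load the response into a typed structure rather than pass a raw dictionary around. The sketch below assumes the response body is a JSON object carrying exactly the fields listed in the table above. `TTSResult` and `parse_tts_response` are hypothetical names, not part of any published SDK.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TTSResult:
    """Mirror of the response fields listed in the table above."""
    audio_url: str
    file_size_bytes: int
    content_type: str
    format: str
    voice: str
    model: str
    speed: float
    input_chars: int


def parse_tts_response(raw: str) -> TTSResult:
    """Parse a JSON response string into a TTSResult (hypothetical helper)."""
    data = json.loads(raw)
    names = [f.name for f in fields(TTSResult)]
    missing = [n for n in names if n not in data]
    if missing:
        raise KeyError("response is missing fields: " + ", ".join(missing))
    # Extra keys are ignored so that added server-side fields do not break the client.
    return TTSResult(**{n: data[n] for n in names})
```

The dataclass is frozen so that a parsed result cannot be mutated by accident after it has been handed to downstream code.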
EXAMPLES — two ways to call
EXAMPLE 1 · curl
curl -X POST https://x402.agentutility.ai/text-to-speech \
-H 'Content-Type: application/json' \
-d '{ }'
first response = 402 Payment Required with payment requirements; sign + retry with X-PAYMENT.
EXAMPLE 2 · mcp
# Install the MCP package for this endpoint's cluster
npx -y @agentutility/mcp-<cluster>
# Required: EVM private key with USDC on Base
export X402_PRIVATE_KEY=0x...
# Then call the text-to-speech tool from your MCP-aware agent.
The MCP server handles payment automatically; your coding agent just calls the tool by name.
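
For callers who do not use the MCP server, the handshake from Example 1 (unpaid request, 402 response, signed retry) can be sketched without committing to a particular HTTP or wallet library. In this sketch `send` and `sign_payment` are hypothetical callables supplied by the caller: `send(headers, body)` is assumed to return a `(status, parsed_json)` pair, and `sign_payment` is assumed to turn the payment requirements into the `X-PAYMENT` header value. This page does not describe the layout of the 402 payload or how the signature is built, so both are treated as opaque here. Treating HTTP 200 as the only success status is likewise an assumption.

```python
from typing import Callable, Dict, Tuple

# Hypothetical transport: takes headers and a JSON body, returns (status, parsed JSON).
Send = Callable[[Dict[str, str], dict], Tuple[int, dict]]
# Hypothetical signer: takes the 402 payload, returns the X-PAYMENT header value.
Sign = Callable[[dict], str]


def call_with_payment(send: Send, sign_payment: Sign, body: dict) -> dict:
    """Send the request; on 402, sign the requirements and retry once."""
    headers = {"Content-Type": "application/json"}
    status, payload = send(headers, body)

    if status == 402:
        # The 402 payload carries the payment requirements. Its exact
        # structure is not documented here, so it goes to the signer unchanged.
        paid_headers = dict(headers)
        paid_headers["X-PAYMENT"] = sign_payment(payload)
        status, payload = send(paid_headers, body)

    # Assumption: anything other than 200 after the retry is a failure.
    if status != 200:
        raise RuntimeError(f"request failed with HTTP {status}")
    return payload
```

Passing the transport and signer in as arguments keeps the retry logic testable with fakes, and it retries at most once so a rejected payment cannot loop.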
METADATA
- tags
- tts · speech · audio · voice · ai
- env
- VENICE_API_KEY · FAL_KEY
- methods
- POST
- cluster
- synthforge
- price
- $0.05 USDC per call
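
Because the price is a flat $0.05 per call, budgeting is simple arithmetic. The sketch below uses `Decimal` to avoid floating-point rounding and relies on the fact that the USDC token uses 6 decimal places, so one call corresponds to 50,000 base units. The function names are hypothetical, and whether the payment payload expects base units or another encoding is not stated on this page.

```python
from decimal import Decimal

PRICE_PER_CALL = Decimal("0.05")  # USDC per call, from the price listed above
USDC_DECIMALS = 6                 # the USDC token uses 6 decimal places


def cost_in_base_units(calls: int) -> int:
    """Total cost of `calls` requests, in USDC base units (hypothetical helper)."""
    return int(PRICE_PER_CALL * calls * 10 ** USDC_DECIMALS)


def affordable_calls(balance_usdc: str) -> int:
    """Number of whole calls a USDC balance covers (hypothetical helper)."""
    return int(Decimal(balance_usdc) // PRICE_PER_CALL)
```

For instance, `cost_in_base_units(1)` evaluates to `50000`, and `affordable_calls("1.00")` evaluates to `20`.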
ADJACENT — other endpoints in synthforge
| endpoint | description | price |
|---|---|---|
| music-generate | Generates music from a text prompt via Venice using the minimax-music-v26 model. | $0.05 |
| voice | Converts text to speech with 30+ voices and MP3/WAV/OPUS/AAC/FLAC output. | $0.05 |
| image-generate-pro | Premium text-to-image generation across margin-safe Venice models at a competitive $0.06/call. | $0.06 |
| recraft | Generates SFW design and illustration images with Venice's recraft-v4 model on a dedicated endpoint. | $0.06 |
| seedream | Generates SFW images with Venice's seedream-v4 model on a dedicated endpoint. | $0.06 |
| flux-2-pro | Generates SFW images with Venice's flux-2-pro model on a dedicated endpoint. | $0.04 |
| qwen-image | Generates SFW images with Venice's qwen-image model on a dedicated endpoint. | $0.04 |
| background-remove | Removes the background from a public image URL and returns the subject with alpha transparency. | $0.08 |
SEE ALSO