$ man audio-transcribe

/audio-transcribe

PRICE / CALL
$0.01
USDC · base mainnet · scheme: exact
METHOD
POST
CLUSTER
mediakit
CATEGORY
uncategorized
STATUS
live
NAME
audio-transcribe transcribes audio to text with whisper-large-v3
SYNOPSIS
POST https://x402.agentutility.ai/audio-transcribe
     Content-Type: application/json
     X-PAYMENT:    <signed-transferWithAuthorization>

     { ... }
↳ first call → 402 Payment Required. Sign USDC transferWithAuthorization, retry with the X-PAYMENT header.
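The two-step flow above (unpaid call, 402, signed retry) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the service's client: `transport` and `sign_payment` are hypothetical callables supplied by the caller, and the 402 body is assumed to be JSON describing the payment requirements. The actual USDC transferWithAuthorization signing is left to `sign_payment`.

```python
import json
from typing import Callable, Tuple

# Hypothetical transport: takes (url, headers, body_bytes) and
# returns (status_code, response_text). Any HTTP client can be wrapped this way.
Transport = Callable[[str, dict, bytes], Tuple[int, str]]


def call_with_payment(url, payload, transport, sign_payment):
    """POST once; on 402, sign the returned requirements and retry once."""
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}

    status, text = transport(url, headers, body)
    if status != 402:
        # Either the call succeeded without payment or failed for another reason.
        return status, text

    # Assumption: the 402 response body is JSON with the payment requirements.
    requirements = json.loads(text)

    # sign_payment is a hypothetical callback that wraps wallet signing and
    # returns the value to send in the X-PAYMENT header.
    paid_headers = dict(headers)
    paid_headers["X-PAYMENT"] = sign_payment(requirements)
    return transport(url, paid_headers, body)
```

Injecting the transport keeps the retry logic independent of any particular HTTP library and makes it easy to exercise with a fake server.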
DESCRIPTION

Transcribes audio to text with whisper-large-v3. The server fetches the audio URL (max 25 MB), relays it to Venice's audio/transcriptions endpoint, and returns the transcript with detected language, duration, and per-segment timestamps when response_format is 'verbose_json' (the default). Raw text, SRT, and VTT outputs are also supported. The endpoint is compatible with the OpenAI Whisper API and can be used for speech-to-text or multi-language ASR.

INPUT request schema
property | type | description | req?
audio_url | string | Public http(s) URL of the audio file (mp3, wav, m4a, ogg, flac, webm). Up to 25 MB. | required
language | string | BCP-47 language hint (e.g. 'en', 'es'). 'auto' or omitted = auto-detect. | optional
model | string | Override the model. Default 'openai/whisper-large-v3'. | optional
response_format | string | Output format. Default 'verbose_json' (transcript + segments). enum: json · text · verbose_json · srt · vtt | optional
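A request body matching the schema above could be assembled and checked client-side before paying for a call. The sketch below is illustrative only: `build_request` is a hypothetical helper, and the checks mirror just what the schema states (an http(s) `audio_url` and the `response_format` enum). The server's real validation may differ.

```python
from urllib.parse import urlparse

# Allowed values copied from the response_format enum in the schema.
RESPONSE_FORMATS = {"json", "text", "verbose_json", "srt", "vtt"}


def build_request(audio_url, language=None, model=None, response_format=None):
    """Return a JSON-serializable body; omit optional fields left as None."""
    parsed = urlparse(audio_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("audio_url must be a public http(s) URL")
    if response_format is not None and response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")

    body = {"audio_url": audio_url}
    optional = (
        ("language", language),
        ("model", model),
        ("response_format", response_format),
    )
    for key, value in optional:
        if value is not None:
            body[key] = value
    return body
```

Leaving optional fields out entirely lets the documented defaults (auto-detected language, 'verbose_json' output) apply.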
OUTPUT response shape
field | type | description
transcript | string | Full transcribed text of the audio, concatenated across all detected speech segments.
language_detected | string | ISO 639-1 code of the language Whisper auto-detected in the audio (e.g. 'en', 'es', 'fr').
duration_seconds | string | Length of the source audio in seconds, as reported by Whisper after decoding.
segments | string | Array of per-segment objects with start/end timestamps and text, present when response_format is verbose_json.
response_format | string | Output format used: verbose_json (default), json, text, srt, or vtt.
model | string | Whisper model used for transcription, fixed to 'whisper-large-v3' via Venice's audio/transcriptions endpoint.
bytes_in | string | Size in bytes of the audio file fetched from the source URL before relay to Whisper.
source | string | Original audio URL the server fetched and transcribed (echoed back from the request).
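To illustrate how the `segments` field of a verbose_json response might be consumed, the sketch below turns segments into SRT-style cues. It assumes each segment is an object with `start` and `end` (seconds) and `text` keys, following the usual Whisper verbose_json layout; the exact key names are not confirmed by this page, and the helper names are hypothetical. The endpoint can also return SRT directly via response_format, so this is only meant to show the shape of the data.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm."""
    ms = int(round(float(seconds) * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(response):
    """Build SRT text from a parsed verbose_json response dict."""
    cues = []
    # Assumption: 'segments' is a list of dicts with start, end, and text.
    for index, seg in enumerate(response.get("segments") or [], start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)
```

A response without segments (for example, one requested with response_format 'json') yields an empty string rather than an error.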
EXAMPLES two ways to call
EXAMPLE 1 · curl
curl -X POST https://x402.agentutility.ai/audio-transcribe \
  -H 'Content-Type: application/json' \
  -d '{"audio_url": "https://example.com/audio.mp3"}'
The audio_url shown is a placeholder. The first response is 402 Payment Required with the payment requirements; sign and retry with X-PAYMENT.
EXAMPLE 2 · mcp
# Install the MCP package for this endpoint's cluster
npx -y @agentutility/mcp-<cluster>

# Required: EVM private key with USDC on Base
export X402_PRIVATE_KEY=0x...

# Then call the audio-transcribe tool from your MCP-aware agent.
The MCP server handles payment automatically; your coding agent just calls the tool by name.
METADATA
tags
mediakit · audio · transcription · speech-to-text · asr · whisper · subtitles · whisper-large-v3
methods
POST
cluster
mediakit
price
$0.01 USDC per call
ADJACENT other endpoints in mediakit
endpoint | description | price
csv-to-ics | Converts a CSV of events into an RFC 5545 compliant ICS calendar file (VCALENDAR/VEVENT) for Google Calendar, Outlook, and Apple Calendar… | $0.01
image-convert | Universal image format converter (PNG, JPG, WEBP, AVIF, GIF, BMP, TIFF, ICO, HEIC, HEIF, PSD, SVG). | $0.01
image-format-convert | Image converter. | $0.01
merge-pdf | Combines 2-50 input PDFs from URLs into one PDF, preserving bookmarks. | $0.01
movie-database | Finds movies or TV shows by title, with optional year and region, and returns release year, poster, overview, and language. | $0.01
movie-database-api | Searches movies and TV shows by title and optional year, returning release date, rating, popularity, overview, poster URLs, TMDB links, a… | $0.01
movie-info | Looks up movie and TV metadata: title, release year, rating, overview, poster, and optional streaming providers. | $0.01
pdf-merge | Merges 2-50 PDFs from URLs into a single PDF, preserving bookmarks. | $0.01
SEE ALSO
agentutility · mediakit · x402 · mcp · llms.txt · registry.json · bazaar.x402.org