$ man audio-transcribe
/audio-transcribe
PRICE / CALL
$0.01
USDC · base mainnet · scheme: exact
METHOD
POST
CLUSTER
mediakit
CATEGORY
uncategorized
STATUS
● live
NAME
audio-transcribe — transcribes audio to text with whisper-large-v3
SYNOPSIS
POST https://x402.agentutility.ai/audio-transcribe
Content-Type: application/json
X-PAYMENT: <signed-transferWithAuthorization>
{ ... }
↳ first call → 402 Payment Required. Sign USDC transferWithAuthorization, retry with the X-PAYMENT header.
DESCRIPTION
Transcribes audio to text with whisper-large-v3. The server fetches the audio from the given URL (max 25 MB), relays it to Venice's audio/transcriptions endpoint, and returns the transcript. When response_format='verbose_json' (the default), the response also includes the detected language, the duration, and per-segment timestamps. Raw text, SRT, and VTT outputs are also supported. Use it as a speech-to-text or multi-language ASR endpoint compatible with the OpenAI Whisper API.
INPUT — request schema
| property | type | description | req? |
|---|---|---|---|
| audio_url | string | Public http(s) URL of the audio file (mp3, wav, m4a, ogg, flac, webm). Up to 25 MB. | required |
| language | string | BCP-47 language hint (e.g. 'en', 'es'). 'auto' or omitted = auto-detect. | optional |
| model | string | Override the model. Default 'openai/whisper-large-v3'. | optional |
| response_format | string | Output format. Default 'verbose_json' (transcript + segments). enum: json · text · verbose_json · srt · vtt | optional |
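The request schema above can be sketched as a small Python helper that assembles and sanity-checks the JSON body before it is sent. This is a minimal sketch, not the service's own client: `build_request` and `RESPONSE_FORMATS` are hypothetical names, and the checks only mirror what the table states (an http(s) `audio_url` and the listed `response_format` values).

```python
from urllib.parse import urlparse

# Allowed values, copied from the response_format row of the schema table.
RESPONSE_FORMATS = {"json", "text", "verbose_json", "srt", "vtt"}


def build_request(audio_url, language=None, model=None, response_format=None):
    """Return a JSON-serialisable body for POST /audio-transcribe (hypothetical helper)."""
    if urlparse(audio_url).scheme not in ("http", "https"):
        raise ValueError("audio_url must be a public http(s) URL")
    if response_format is not None and response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")

    body = {"audio_url": audio_url}
    # Optional fields are left out when unset so the server applies its defaults.
    optional = (
        ("language", language),
        ("model", model),
        ("response_format", response_format),
    )
    for key, value in optional:
        if value is not None:
            body[key] = value
    return body
```

The 25 MB size limit is not checked here, since the file size is only known once the server fetches the URL.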
OUTPUT — response shape
| field | type | description |
|---|---|---|
| transcript | string | Full transcribed text of the audio, concatenated across all detected speech segments. |
| language_detected | string | ISO 639-1 code of the language Whisper auto-detected in the audio (e.g. 'en', 'es', 'fr'). |
| duration_seconds | string | Length of the source audio in seconds, as reported by Whisper after decoding. |
| segments | array | Array of per-segment objects with start/end timestamps and text, present when response_format is verbose_json. |
| response_format | string | Output format used: verbose_json (default), json, text, srt, or vtt. |
| model | string | Whisper model used for transcription, fixed to 'whisper-large-v3' via Venice's audio/transcriptions endpoint. |
| bytes_in | string | Size in bytes of the audio file fetched from the source URL before relay to Whisper. |
| source | string | Original audio URL the server fetched and transcribed (echoed back from the request). |
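As an illustration of consuming the response shape above, the following Python sketch turns the `segments` array into timestamped lines. It assumes each segment object carries `start`, `end` (in seconds), and `text` keys; the table only says "start/end timestamps and text", so the exact key names are an assumption, and `format_timestamp` and `segment_lines` are hypothetical helpers.

```python
def format_timestamp(seconds):
    """Render a number of seconds (int, float, or numeric string) as HH:MM:SS.mmm."""
    total_ms = round(float(seconds) * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segment_lines(response):
    """Return one '[start -> end] text' line per segment; empty list if none."""
    lines = []
    # 'segments' is only present for verbose_json, so tolerate its absence.
    for seg in response.get("segments") or []:
        # Assumed key names: 'start', 'end', 'text'.
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} -> {end}] {seg['text'].strip()}")
    return lines
```

For a response whose `segments` holds one entry with `start` 0, `end` 1.25, and `text` "hi", `segment_lines` returns `["[00:00:00.000 -> 00:00:01.250] hi"]`.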
EXAMPLES — two ways to call
EXAMPLE 1 · curl
curl -X POST https://x402.agentutility.ai/audio-transcribe \
-H 'Content-Type: application/json' \
-d '{ }'
first response = 402 Payment Required with payment requirements; sign + retry with X-PAYMENT.
EXAMPLE 2 · mcp
# Install the MCP package for this endpoint's cluster
npx -y @agentutility/mcp-<cluster>
# Required: EVM private key with USDC on Base
export X402_PRIVATE_KEY=0x...
# Then call the audio-transcribe tool from your MCP-aware agent.
MCP server handles payment automatically — your coding agent just calls the tool by name.
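For callers not using the MCP server, the pay-then-retry flow from example 1 could be sketched in Python roughly as follows. This is a sketch under stated assumptions, not the official client: `sign_payment` is a hypothetical callback that receives the 402 response body and returns the value for the X-PAYMENT header (the actual USDC transferWithAuthorization signing is not shown), and the code assumes that both the 402 and the success response carry JSON bodies and that success is HTTP 200.

```python
import json
import urllib.error
import urllib.request

ENDPOINT = "https://x402.agentutility.ai/audio-transcribe"


def http_post(body, headers):
    """POST a JSON body and return (status, parsed_json) without raising on 4xx/5xx."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", **headers},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # urllib raises on 402; the payment requirements are in the error body.
        return err.code, json.loads(err.read() or b"{}")


def call_with_payment(body, sign_payment, post=http_post):
    """Send the request, and on 402 sign the requirements and retry once."""
    status, payload = post(body, {})
    if status == 402:
        # sign_payment is a hypothetical caller-supplied signer.
        header_value = sign_payment(payload)
        status, payload = post(body, {"X-PAYMENT": header_value})
    if status != 200:
        raise RuntimeError(f"audio-transcribe failed with HTTP {status}")
    return payload
```

The transport is passed in as `post` so the flow can be exercised with a stub instead of the live endpoint, which also avoids spending USDC while testing.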
METADATA
- tags: mediakit, audio, transcription, speech-to-text, asr, whisper, subtitles, whisper-large-v3
- methods: POST
- cluster: mediakit
- price: $0.01 USDC per call
ADJACENT — other endpoints in mediakit
| endpoint | description | price |
|---|---|---|
| csv-to-ics | Converts a CSV of events into an RFC 5545 compliant ICS calendar file (VCALENDAR/VEVENT) for Google Calendar, Outlook, and Apple Calendar… | $0.01 |
| image-convert | Universal image format converter (PNG, JPG, WEBP, AVIF, GIF, BMP, TIFF, ICO, HEIC, HEIF, PSD, SVG). | $0.01 |
| image-format-convert | Image converter. | $0.01 |
| merge-pdf | Combines 2-50 input PDFs from URLs into one PDF, preserving bookmarks. | $0.01 |
| movie-database | Finds movies or TV shows by title, with optional year and region, and returns release year, poster, overview, and language. | $0.01 |
| movie-database-api | Searches movies and TV shows by title and optional year, returning release date, rating, popularity, overview, poster URLs, TMDB links, a… | $0.01 |
| movie-info | Looks up movie and TV metadata: title, release year, rating, overview, poster, and optional streaming providers. | $0.01 |
| pdf-merge | Merges 2-50 PDFs from URLs into a single PDF, preserving bookmarks. | $0.01 |
SEE ALSO