$ man video-to-text
/video-to-text
NAME
video-to-text — transcribe any video url to text with whisper v3 large — audio is extracted internally
SYNOPSIS
POST https://x402.agentutility.ai/video-to-text
Content-Type: application/json
X-PAYMENT: <signed-transferWithAuthorization>
{ ... }
↳ first call → 402 Payment Required. Sign a USDC transferWithAuthorization, then retry with the X-PAYMENT header.
DESCRIPTION
Transcribe any video URL to text with Whisper v3 large — audio is extracted internally. Auto-detects 90+ languages, offers a translate-to-English mode and optional speaker diarization, and handles files up to 60 minutes / 500MB. Covers video transcription, video speech-to-text, and video ASR in one call.
INPUT — request schema
| property | type | description | req? |
|---|---|---|---|
| media_url | string | — | required |
| language | string | — | optional |
| task | string | — enum: transcribe · translate | optional |
| diarize | boolean | — | optional |
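The request schema above can be turned into a small client-side builder. This is a minimal sketch, not part of the service: the function name `build_request` and the `example.com` URL are hypothetical, and the only rules it enforces are the ones the table states (a required `media_url` string, a `task` limited to `transcribe` or `translate`, a boolean `diarize`).

```python
import json

# Enum values taken from the request schema table.
ALLOWED_TASKS = {"transcribe", "translate"}


def build_request(media_url, language=None, task=None, diarize=None):
    """Return a JSON-serializable body holding only the fields that were set.

    Hypothetical helper: it checks only what the schema table states.
    """
    if not isinstance(media_url, str) or not media_url:
        raise ValueError("media_url is required and must be a non-empty string")
    body = {"media_url": media_url}
    if language is not None:
        if not isinstance(language, str):
            raise ValueError("language must be a string")
        body["language"] = language
    if task is not None:
        if task not in ALLOWED_TASKS:
            raise ValueError(f"task must be one of {sorted(ALLOWED_TASKS)}")
        body["task"] = task
    if diarize is not None:
        if not isinstance(diarize, bool):
            raise ValueError("diarize must be a boolean")
        body["diarize"] = diarize
    return body


if __name__ == "__main__":
    # Placeholder URL for illustration; it does not point at a real video.
    print(json.dumps(build_request("https://example.com/talk.mp4",
                                   task="translate", diarize=True)))
```

Optional fields are omitted rather than sent as null, so the server's own defaults (which this page does not document) still apply.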
OUTPUT — response shape
| field | type | description |
|---|---|---|
| text | string | Full transcript text |
| chunks | array | Time-segmented chunks with timestamps |
| detected_languages | array | Languages auto-detected in the audio |
| duration_seconds | number | Source media duration in seconds |
| task | string | Echo of the task performed |
| source_url | string | Echo of the input URL |
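A caller might check a decoded response against the fields listed above before using it. The sketch below assumes nothing beyond the table: `Transcript` and `parse_response` are hypothetical names, and because the layout of each entry in `chunks` and `detected_languages` is not specified here, those entries are kept as opaque values.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Transcript:
    text: str
    chunks: List[Any]              # per-chunk layout is not documented; kept opaque
    detected_languages: List[Any]  # element type is not documented; kept opaque
    duration_seconds: float
    task: str
    source_url: str


# Field names and JSON types copied from the response table.
_FIELDS = {
    "text": str,
    "chunks": list,
    "detected_languages": list,
    "duration_seconds": (int, float),
    "task": str,
    "source_url": str,
}


def parse_response(payload: dict) -> Transcript:
    """Validate the documented fields and ignore any undocumented extras."""
    for name, expected in _FIELDS.items():
        if name not in payload:
            raise ValueError(f"missing field: {name}")
        value = payload[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"unexpected type for field: {name}")
    return Transcript(**{name: payload[name] for name in _FIELDS})
```

Failing early on a missing or mistyped field keeps a malformed payload from propagating into downstream subtitle or indexing code.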
EXAMPLES — two ways to call
EXAMPLE 1 · curl
curl -X POST https://x402.agentutility.ai/video-to-text \
-H 'Content-Type: application/json' \
-d '{ }'
first response = 402 Payment Required with payment requirements; sign + retry with X-PAYMENT.
EXAMPLE 2 · mcp
# Install the MCP package for this endpoint's cluster
npx -y @agentutility/mcp-<cluster>
# Required: EVM private key with USDC on Base
export X402_PRIVATE_KEY=0x...
# Then call the video-to-text tool from your MCP-aware agent.
The MCP server handles payment automatically; your coding agent just calls the tool by name.
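For callers that do not use the MCP server, the 402-then-retry flow from Example 1 can be sketched as a small wrapper. This is an illustration under stated assumptions, not the service's client: `send` and `sign` are hypothetical callables supplied by the caller (an HTTP transport and a wallet signer), and the shape of the 402 payload is not described on this page, so the wrapper passes it to `sign` untouched.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, dict]                         # (HTTP status, decoded JSON body)
Send = Callable[[Dict[str, str], dict], Response]   # hypothetical HTTP transport
Sign = Callable[[dict], str]                        # hypothetical wallet signer


def call_with_payment(send: Send, sign: Sign, body: dict) -> Response:
    """POST once; on 402, sign the returned requirements and retry once."""
    headers = {"Content-Type": "application/json"}
    status, payload = send(headers, body)
    if status != 402:
        return status, payload
    # The 402 body carries the payment requirements. How they are signed
    # (a USDC transferWithAuthorization) is left to the caller's sign().
    paid_headers = {**headers, "X-PAYMENT": sign(payload)}
    return send(paid_headers, body)


if __name__ == "__main__":
    # Fake transport for illustration only; no network traffic, no real payment.
    def fake_send(headers, body):
        if "X-PAYMENT" not in headers:
            return 402, {"note": "placeholder payment requirements"}
        return 200, {"text": "placeholder transcript"}

    print(call_with_payment(fake_send, lambda reqs: "placeholder-signature",
                            {"media_url": "https://example.com/talk.mp4"}))
```

Injecting the transport and signer keeps the control flow testable offline and keeps private-key handling out of the request logic.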
METADATA
- tags: transcription, whisper, video, audio, subtitles
- methods: POST
- cluster: mediakit
- price: $0.10 USDC per call
ADJACENT — other endpoints in mediakit
| endpoint | description | price |
|---|---|---|
| doc-to-json | Converts any document (PDF, DOCX, PPT, XLSX, or image) into structured JSON matching a caller-supplied schema. | $0.10 |
| extract-tables | Detects and extracts every table from a PDF document, returning structured JSON or CSV per table. | $0.10 |
| pdf-extract-tables | Extracts every table from a PDF, digital or scanned, and returns row-by-column text matrices page-by-page. | $0.10 |
| pdf-table-extract | Extracts tables from digital or scanned PDFs, returning row/column matrices, CSV output, page numbers, and optional cell boxes. | $0.10 |
| pdf-table-extractor | Finds tables in digital or scanned PDFs and returns row-by-column matrices, page numbers, and optional cell bounding boxes. | $0.10 |
| pdf-to-jpg | Converts a PDF to JPG, PNG, or WEBP images, rendering every page at configurable DPI (36-600) and returning one image URL per page. | $0.10 |
| speaker-diarize | Speaker diarization / who-said-what transcription. | $0.10 |
| transcribe | Transcribe video to text. | $0.10 |
SEE ALSO