Name: speaker-diarize
Price: 0.10 USDC
Availability: InStock

$ man speaker-diarize

PRICE / CALL

$0.10

USDC · base mainnet · scheme: exact

METHOD

POST

CLUSTER

mediakit

STATUS

● live

NAME

speaker-diarize — speaker diarization / who-said-what transcription

SYNOPSIS

POST https://x402.agentutility.ai/speaker-diarize
     Content-Type: application/json
     X-PAYMENT:    <signed-transferWithAuthorization>

     { ... }

↳ first call → 402 Payment Required. Sign a USDC transferWithAuthorization, then retry with the X-PAYMENT header.
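The two-step flow above (unpaid call, 402, signed retry) can be sketched with the Python standard library. This is a minimal sketch, not the official client. `sign_payment` is a hypothetical hook: it receives the 402 body and must return whatever your x402 signer produces for the X-PAYMENT header. The pluggable `transport` parameter is an illustrative design choice that makes the retry logic testable without network access.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Dict, Tuple

ENDPOINT = "https://x402.agentutility.ai/speaker-diarize"

# A transport takes (url, body bytes, headers) and returns (HTTP status, parsed JSON).
Transport = Callable[[str, bytes, Dict[str, str]], Tuple[int, dict]]


def urllib_transport(url: str, data: bytes, headers: Dict[str, str]) -> Tuple[int, dict]:
    """POST with urllib; non-2xx responses (including 402) are returned, not raised."""
    req = urllib.request.Request(url, data=data, headers=headers, method="POST")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.loads(resp.read() or b"{}")
    except urllib.error.HTTPError as err:
        return err.code, json.loads(err.read() or b"{}")


def call_with_payment(body: dict, transport: Transport,
                      sign_payment: Callable[[dict], str]) -> Tuple[int, dict]:
    """Make the call unpaid first; on 402, sign the returned requirements and retry once."""
    data = json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    status, payload = transport(ENDPOINT, data, headers)
    if status != 402:
        return status, payload
    # payload carries the payment requirements. sign_payment is a hypothetical hook
    # that must return the encoded, signed USDC transferWithAuthorization.
    headers["X-PAYMENT"] = sign_payment(payload)
    return transport(ENDPOINT, data, headers)
```

Usage would look like `call_with_payment({"media_url": "https://example.com/meeting.mp3"}, urllib_transport, my_signer)`, where `my_signer` is your own signing function.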

DESCRIPTION

Speaker diarization with who-said-what transcription: Whisper v3 transcription plus speaker labels. Returns utterances grouped by speaker, plus per-speaker stats (utterance count, seconds spoken, word count). Maximum media length: 60 minutes.

INPUT — request schema

property	type	description	req?
media_url	string	URL of the media to diarize and transcribe (60 min max).	required
language	string	—	optional
num_speakers	number	—	optional
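A request body can be assembled client-side by sending `media_url` and including the optional fields only when set. This is a sketch under stated assumptions. The http(s) check on `media_url` is an assumption, since the schema only says "string". The use of a code such as "en" for `language` is also assumed (mirroring `detected_languages`). Finally, the sketch assumes `num_speakers` should be a positive integer.

```python
from typing import Optional


def build_request(media_url: str, language: Optional[str] = None,
                  num_speakers: Optional[int] = None) -> dict:
    """Build a speaker-diarize body, omitting optional fields that are not set."""
    # Assumption: the service fetches media_url over HTTP(S).
    if not media_url.startswith(("http://", "https://")):
        raise ValueError("media_url must be an http(s) URL")
    body = {"media_url": media_url}
    if language is not None:
        body["language"] = language  # assumed format, e.g. "en"
    if num_speakers is not None:
        # Assumption: a speaker-count hint must be at least 1.
        if num_speakers < 1:
            raise ValueError("num_speakers must be at least 1")
        body["num_speakers"] = num_speakers
    return body
```

`json.dumps(build_request(...))` gives a string suitable for curl's `-d` argument.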

OUTPUT — response shape

field	type	description
text	string	Full transcript as a single string with all speaker turns concatenated in chronological order.
utterances	array	Array of speaker turns, each with speaker label, start/end timestamps, and the spoken text.
speaker_count	number	Number of distinct speakers detected in the audio.
speaker_stats	array	Per-speaker rollup with speaker label, utterance count, total seconds spoken, and word count.
duration_seconds	number	Total length of the input media in seconds.
detected_languages	array	Array of ISO-639-1 language codes Whisper detected across the audio.
source_url	string	Echo of the media_url that was transcribed, for request/response correlation.
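The `utterances` array lends itself to a "who said what" rendering and to a per-speaker rollup. The sketch below assumes each utterance is an object with the keys `speaker`, `start`, `end` (seconds), and `text`. The schema above describes these fields but does not name them, so adjust the keys to match a real response. The rollup mirrors what `speaker_stats` reports (utterance count, seconds spoken, word count), but it uses whitespace word splitting, which may not match the service's own word count.

```python
from collections import defaultdict


def who_said_what(utterances):
    """Render utterances as 'SPEAKER [start-end]: text' lines in time order."""
    lines = []
    for u in sorted(utterances, key=lambda u: u["start"]):
        lines.append(f'{u["speaker"]} [{u["start"]:.1f}-{u["end"]:.1f}]: {u["text"]}')
    return lines


def speaker_rollup(utterances):
    """Per-speaker utterance count, seconds spoken, and whitespace-split word count."""
    stats = defaultdict(lambda: {"count": 0, "seconds": 0.0, "words": 0})
    for u in utterances:
        s = stats[u["speaker"]]
        s["count"] += 1
        s["seconds"] += u["end"] - u["start"]
        s["words"] += len(u["text"].split())
    return dict(stats)
```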

EXAMPLES — two ways to call

EXAMPLE 1 · curl

curl -X POST https://x402.agentutility.ai/speaker-diarize \
  -H 'Content-Type: application/json' \
  -d '{"media_url": "https://example.com/meeting.mp3"}'

The first response is 402 Payment Required with the payment requirements; sign them and retry the same request with the X-PAYMENT header.

EXAMPLE 2 · mcp

# Install the MCP package for this endpoint's cluster
npx -y @agentutility/mcp-mediakit

# Required: EVM private key with USDC on Base
export X402_PRIVATE_KEY=0x...

# Then call the speaker-diarize tool from your MCP-aware agent.

The MCP server handles payment automatically; your coding agent just calls the tool by name.

METADATA

tags: transcription, diarization, speakers, whisper, podcast, meeting
env: FAL_KEY_TRANSCRIBE
methods: POST
cluster: mediakit
price: $0.10 USDC per call
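USDC uses 6 decimal places on-chain, so the $0.10 price corresponds to 100000 base units in the signed transfer. The 60-minute cap also affects budgeting for longer recordings. The sketch below converts the price and estimates the cost of a recording, assuming the caller splits anything over 60 minutes into separate calls of at most 60 minutes each. That splitting strategy is an assumption, not a documented feature.

```python
import math
from decimal import Decimal

PRICE_USDC = Decimal("0.10")        # per call, from the listing
USDC_DECIMALS = 6                   # USDC token precision
MAX_SECONDS_PER_CALL = 60 * 60      # 60 min max per call


def price_in_base_units(price: Decimal = PRICE_USDC) -> int:
    """Convert a USDC amount to integer base units (1 USDC = 10**6 units)."""
    return int(price * (10 ** USDC_DECIMALS))


def estimate_cost(total_seconds: float):
    """Number of calls and total USDC, assuming media is split into <=60 min chunks."""
    calls = max(1, math.ceil(total_seconds / MAX_SECONDS_PER_CALL))
    return calls, calls * PRICE_USDC
```

For a 3-hour recording, `estimate_cost(3 * 3600)` yields 3 calls and 0.30 USDC.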

ADJACENT — other endpoints in mediakit

endpoint	description	price
doc-to-json	Converts any document (PDF, DOCX, PPT, XLSX, or image) into structured JSON matching a caller-supplied schema.	$0.10
extract-tables	Detects and extracts every table from a PDF document, returning structured JSON or CSV per table.	$0.10
pdf-extract-tables	Extracts every table from a PDF, digital or scanned, and returns row-by-column text matrices page-by-page.	$0.10
pdf-table-extract	Extracts tables from digital or scanned PDFs, returning row/column matrices, CSV output, page numbers, and optional cell boxes.	$0.10
pdf-table-extractor	Finds tables in digital or scanned PDFs and returns row-by-column matrices, page numbers, and optional cell bounding boxes.	$0.10
pdf-to-jpg	Converts a PDF to JPG, PNG, or WEBP images, rendering every page at configurable DPI (36-600) and returning one image URL per page.	$0.10
transcribe	Transcribe video to text.	$0.10
video-summarize	Summarizes videos, podcasts, and lectures in one call: Whisper v3 transcribes, then Mistral summarizes.	$0.10