$ man describe-image
/describe-image
NAME
describe-image — describes images with a vision llm across five modes: describe, alt_text (accessibility, <=125 chars), ocr (extract visible text), tags (…
SYNOPSIS
POST https://x402.agentutility.ai/describe-image
Content-Type: application/json
X-PAYMENT: <signed-transferWithAuthorization>
{ ... }
↳ first call → 402 Payment Required. Sign a USDC transferWithAuthorization, then retry with the X-PAYMENT header.
DESCRIPTION
Describes images with a vision LLM across five modes: describe, alt_text (accessibility, <=125 chars), ocr (extract visible text), tags (8-15 keywords), and caption (single sentence). Use it as an AI image descriptor or describe-image endpoint.
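The two-step payment handshake from the SYNOPSIS (a first call answered with 402, then a retry carrying X-PAYMENT) can be sketched as below. This is a minimal sketch under stated assumptions, not the service's reference client: `send` and `sign` are hypothetical callables supplied by the caller, and the shape of the 402 body and the encoding of the X-PAYMENT value are not specified on this page.

```python
import json
from typing import Callable, Dict, Tuple

ENDPOINT = "https://x402.agentutility.ai/describe-image"

# (status_code, body_text) as returned by the transport.
Response = Tuple[int, str]
# Hypothetical transport signature: (url, headers, body) -> Response.
# Swap in any real HTTP client that can be wrapped this way.
Transport = Callable[[str, Dict[str, str], str], Response]


def call_with_payment(payload: dict, send: Transport,
                      sign: Callable[[str], str]) -> dict:
    """POST once; on 402, sign the payment requirements and retry."""
    body = json.dumps(payload)
    headers = {"Content-Type": "application/json"}

    status, text = send(ENDPOINT, headers, body)
    if status == 402:
        # `sign` stands in for wallet code that produces the signed USDC
        # transferWithAuthorization. Its input (the raw 402 body) and its
        # output format are assumptions made for this sketch.
        paid_headers = {**headers, "X-PAYMENT": sign(text)}
        status, text = send(ENDPOINT, paid_headers, body)

    if status != 200:
        raise RuntimeError(f"describe-image failed with HTTP {status}")
    return json.loads(text)
```

Keeping the transport and signer as parameters means the retry logic can be exercised with stubs before any real wallet or network code is attached.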
INPUT — request schema
| property | type | description | req? |
|---|---|---|---|
| image_url | string | — | required |
| mode | string | One of: describe, alt_text, ocr, tags, caption. | optional |
| prompt | string | — | optional |
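A request body matching the schema above could be assembled and checked client-side before paying for a call. The helper name `build_request` is hypothetical, and omitting unset optional fields is a choice made for this sketch; the page does not say what the server does when `mode` is absent.

```python
from typing import Optional

# Mode identifiers exactly as listed in the request schema.
MODES = ("describe", "alt_text", "ocr", "tags", "caption")


def build_request(image_url: str,
                  mode: Optional[str] = None,
                  prompt: Optional[str] = None) -> dict:
    """Return a JSON-serializable body for POST /describe-image."""
    if not image_url:
        raise ValueError("image_url is required")
    if mode is not None and mode not in MODES:
        raise ValueError(f"mode must be one of: {', '.join(MODES)}")

    payload = {"image_url": image_url}
    # Optional fields are left out entirely when not provided.
    if mode is not None:
        payload["mode"] = mode
    if prompt is not None:
        payload["prompt"] = prompt
    return payload
```

For example, `build_request("https://example.com/cat.png", mode="ocr")` yields a body with only `image_url` and `mode` set.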
OUTPUT — response shape
| field | type | description |
|---|---|---|
| text | string | Generated output for the selected mode: prose description, alt text, extracted OCR text, keyword list, or caption. |
| mode | string | Mode used to generate the output: describe, alt_text, ocr, tags, or caption. |
| image_url | string | URL of the source image that was analyzed by the vision LLM. |
| model | string | Vision LLM model name that produced the description (e.g. claude-haiku-4-5). |
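The four response fields map naturally onto a small typed record. The sketch below assumes the response is a flat JSON object with exactly these keys; the class and function names are illustrative, and the length check simply mirrors the <=125 character limit stated for alt_text mode.

```python
from dataclasses import dataclass

FIELDS = ("text", "mode", "image_url", "model")
ALT_TEXT_LIMIT = 125  # limit stated for alt_text mode


@dataclass(frozen=True)
class DescribeImageResult:
    text: str
    mode: str
    image_url: str
    model: str


def parse_result(data: dict) -> DescribeImageResult:
    """Build a typed result, failing loudly if a documented field is absent."""
    missing = [name for name in FIELDS if name not in data]
    if missing:
        raise ValueError(f"response is missing fields: {', '.join(missing)}")
    return DescribeImageResult(**{name: data[name] for name in FIELDS})


def alt_text_within_limit(result: DescribeImageResult) -> bool:
    """True unless an alt_text result exceeds the documented length limit."""
    return result.mode != "alt_text" or len(result.text) <= ALT_TEXT_LIMIT
```

Checking the length client-side is a defensive choice; whether the service itself enforces the limit is not stated here.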
EXAMPLES — two ways to call
EXAMPLE 1 · curl
curl -X POST https://x402.agentutility.ai/describe-image \
-H 'Content-Type: application/json' \
-d '{ }'
first response = 402 Payment Required with payment requirements; sign + retry with X-PAYMENT.
EXAMPLE 2 · mcp
# Install the MCP package for this endpoint's cluster
npx -y @agentutility/mcp-<cluster>
# Required: EVM private key with USDC on Base
export X402_PRIVATE_KEY=0x...
# Then call the describe-image tool from your MCP-aware agent.
MCP server handles payment automatically — your coding agent just calls the tool by name.
METADATA
- tags
- image, vision, ocr, alt-text, caption, ai, llm
- env
- VENICE_API_KEY
- methods
- POST
- cluster
- wordmint
- price
- $0.02 USDC per call
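With a flat per-call price, budgeting a batch is simple arithmetic. The sketch below assumes the listed $0.02 is the full cost per successful call (no extra fees are mentioned on this page) and uses USDC's 6 decimal places to convert to on-chain base units; the function names are illustrative.

```python
from decimal import Decimal

PRICE_PER_CALL_USDC = Decimal("0.02")  # from the price entry above
USDC_DECIMALS = 6                      # USDC uses 6 decimal places


def batch_cost(calls: int) -> Decimal:
    """Total USDC needed for `calls` paid requests."""
    return PRICE_PER_CALL_USDC * calls


def to_base_units(amount: Decimal) -> int:
    """Convert a USDC amount to integer base units (1 USDC = 10**6)."""
    return int(amount * (10 ** USDC_DECIMALS))
```

Using `Decimal` rather than `float` keeps amounts such as 0.02 exact, which matters once values are converted to integer token units.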
ADJACENT — other endpoints in wordmint
| endpoint | description | price |
|---|---|---|
| alt-text-generator | Alt text generator / accessibility image description API. | $0.02 |
| classify | Zero-shot text classifier. | $0.02 |
| classify-text | Classifies text into caller-supplied labels (2-25), with multi-label mode. | $0.02 |
| detect-pii | Detects PII in text: emails, phones, SSNs, credit cards, addresses, names, IPs, and API tokens. | $0.02 |
| email-draft | Writes emails with AI: subject, body, salutation, and sign-off. | $0.02 |
| extract | Named entity extractor / NER. | $0.02 |
| image-describe | Returns detailed image descriptions, short captions, alt text, OCR text, or tags from a public image URL. | $0.02 |
| image-description | Takes a public image URL and returns an AI vision description, alt text, OCR text, tags, or caption depending on mode. | $0.02 |
SEE ALSO