$ man pdf-to-text

PRICE / CALL

$0.0025

USDC · base mainnet · scheme: exact

METHOD

POST

CLUSTER

mediakit

CATEGORY

uncategorized

STATUS

● live

NAME

pdf-to-text — extracts text from digital or scanned pdfs, preserving reading order across multi-column layouts with an ai + ocr pipeline (datalab marker)

synonym alias of pdf-to-markdown — reuses the canonical handler.

SYNOPSIS

POST https://x402.agentutility.ai/pdf-to-text
     Content-Type: application/json
     X-PAYMENT:    <signed-transferWithAuthorization>

     { ... }

↳ first call → 402 Payment Required. Sign a USDC transferWithAuthorization, then retry with the X-PAYMENT header.
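
The two-step flow above (unpaid call, 402 reply, signed retry) can be sketched in Python roughly as follows. This is a minimal sketch, not the service's client: `post_json` and `call_with_payment` are hypothetical helper names, and `sign_payment` is a caller-supplied function that turns the 402 payment requirements into the `X-PAYMENT` header value. How that signature is produced, and the exact shape of the 402 body, are not specified here and are left to the wallet or x402 library you use.

```python
import json
import urllib.error
import urllib.request

ENDPOINT = "https://x402.agentutility.ai/pdf-to-text"


def post_json(url, body, headers):
    """POST a JSON body and return (status, parsed JSON), including for 4xx replies."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", **headers},
    )
    try:
        with urllib.request.urlopen(request) as response:
            raw = response.read()
            return response.status, json.loads(raw) if raw else {}
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the 402 body carries the payment requirements.
        raw = err.read()
        return err.code, json.loads(raw) if raw else {}


def call_with_payment(body, sign_payment, post=post_json):
    """Send the request unpaid; on 402, sign and retry once with X-PAYMENT."""
    status, payload = post(ENDPOINT, body, {})
    if status != 402:
        return status, payload
    # sign_payment is an assumption: it must return the X-PAYMENT header value.
    header_value = sign_payment(payload)
    return post(ENDPOINT, body, {"X-PAYMENT": header_value})
```

Passing the transport in as `post` keeps the retry logic separate from the network call, so the flow can be exercised with a stub before any USDC is spent.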

DESCRIPTION

Extracts text from digital or scanned PDFs, preserving reading order across multi-column layouts with an AI + OCR pipeline (Datalab Marker). Recognizes scanned pages. Returns Markdown by default (clean text with structure), or HTML or JSON on request. Each call processes at most 30 pages. Suitable as a PDF to plain text converter, a pdftotext or pdf2txt replacement, a scanned-PDF OCR step, or a general PDF text extractor.

INPUT — request schema

property	type	description	req?
pdf_url	string	Public URL of a PDF file (http or https). Must be directly fetchable, not behind auth or a viewer redirect. Max 30 pages.	required
output_format	string	'markdown' (default — best for LLM downstream), 'html' (preserves more layout structure), or 'json' (per-page blocks with type + bbox). enum: markdown · html · json	optional
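
A request body matching the schema above could be built and checked client-side before paying for a call. The sketch below is illustrative only: `build_request_body` is a hypothetical helper, and the checks mirror just what the table states (an http or https URL, and the three-value enum). It cannot verify that the URL is publicly fetchable or that the PDF is within the 30-page limit.

```python
from urllib.parse import urlparse

ALLOWED_FORMATS = ("markdown", "html", "json")  # enum from the schema table


def build_request_body(pdf_url: str, output_format: str = "markdown") -> dict:
    """Return a JSON-serializable request body, or raise ValueError."""
    parsed = urlparse(pdf_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("pdf_url must be an absolute http or https URL")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {ALLOWED_FORMATS}")
    return {"pdf_url": pdf_url, "output_format": output_format}
```

Rejecting a malformed body locally avoids spending a paid call on a request the schema would not accept.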

OUTPUT — response shape

field	type	description
markdown	string	Extracted text from the PDF as plain text or layout-aware markdown preserving tables, equations, and columns.
page_count	string	Number of pages processed from the source PDF (capped at the 30-page per-call limit).
source_url	string	URL of the PDF that was fetched and processed for text extraction.
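
The response fields above can be mapped onto a small typed record. This is a sketch under assumptions: `PdfTextResult` and `parse_response` are hypothetical names, and the code assumes the three fields arrive as top-level JSON keys. Because the table lists `page_count` as a string, it is converted to an integer here. Whether the `markdown` key is also used when `output_format` is `html` or `json` is not stated in the table, so this covers only the default Markdown case.

```python
from dataclasses import dataclass


@dataclass
class PdfTextResult:
    markdown: str
    page_count: int
    source_url: str


def parse_response(payload: dict) -> PdfTextResult:
    """Build a PdfTextResult from the decoded JSON response body."""
    return PdfTextResult(
        markdown=payload["markdown"],
        # Documented as a string; int() also accepts an actual integer.
        page_count=int(payload["page_count"]),
        source_url=payload["source_url"],
    )
```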

EXAMPLES — two ways to call

EXAMPLE 1 · curl

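# pdf_url below is a placeholder; substitute a publicly fetchable PDF
curl -X POST https://x402.agentutility.ai/pdf-to-text \
  -H 'Content-Type: application/json' \
  -d '{"pdf_url": "https://example.com/sample.pdf", "output_format": "markdown"}'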

first response = 402 Payment Required with payment requirements; sign + retry with X-PAYMENT.

EXAMPLE 2 · mcp

# Install the MCP package for this endpoint's cluster
npx -y @agentutility/mcp-<cluster>

# Required: EVM private key with USDC on Base
export X402_PRIVATE_KEY=0x...

# Then call the pdf-to-text tool from your MCP-aware agent.

MCP server handles payment automatically — your coding agent just calls the tool by name.

METADATA

tags: pdf · pdf-to-text · text-extraction · ocr · document-parsing · markdown · mediakit · datalab-marker
methods: POST
cluster: mediakit
price: $0.0025 USDC per call

ADJACENT — other endpoints in mediakit

endpoint	description	price
convert-pdf	Converts PDFs to Markdown, HTML, JSON, or structured text with the Datalab Marker AI pipeline, preserving headings, tables, equations, an…	$0.0025
ocr	Runs OCR on scanned PDFs and image-based documents, returning clean Markdown or plain text.	$0.0025
pdf-parser-api	Parses a public PDF URL into Markdown, HTML, or JSON blocks with layout-aware text, headings, tables, and equations.	$0.0025
pdf-text-extractor	Extracts clean Markdown, HTML, or structured JSON from digital or scanned PDFs while preserving reading order, tables, and equations.	$0.0025
pdf-to-markdown	Converts digital or scanned PDFs to clean Markdown with AI-powered, layout-aware extraction on the Datalab Marker engine.	$0.0025
pdf-to-markdown-api	Converts a public PDF URL into clean Markdown, HTML, or structured JSON while preserving headings, tables, equations, and reading order.	$0.0025
pdf-to-text-api	Extracts text from digital and scanned PDFs as Markdown, plain text, HTML, or JSON with layout-aware reading order.	$0.0025
compress-pdf	PDF compressor / PDF size reducer.	$0.005