Integrate PDF to JSON conversion into your applications with our RESTful API
Get started with our API in minutes. All requests require authentication with either a JWT bearer token or an API key.
```text
# Base URL
https://api.example.com/v1

# Authentication
Authorization: Bearer YOUR_JWT_TOKEN
# OR
X-API-Key: YOUR_API_KEY
```

For web application users, authenticate using a JWT token obtained from Supabase Auth.
```bash
curl -X POST https://api.example.com/v1/upload \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -F "file=@document.pdf"
```

For programmatic access, use an API key. Create API keys in your dashboard under API Settings.
```bash
curl -X POST https://api.example.com/v1/upload \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "file=@document.pdf"
```

POST /v1/upload

Upload a PDF file for processing. The file is queued and processed according to the options you pass as query parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| extraction_mode | string | No | text, tables, ocr, or hybrid (default: text) |
| structure_normalize | boolean | No | Enable AI-powered structure normalization (default: false) |
| llm_provider | string | No | LLM provider: openai or deepseek (default: openai) |
| repair_tables | boolean | No | Enable AI-powered table repair to fix split rows and wrapped cells (default: false) |
| detect_pii | boolean | No | Enable PII detection (default: false) |
| redact_pii | boolean | No | Enable PII redaction (requires detect_pii=true; default: false) |
| infer_schema | boolean | No | Enable automatic schema inference and document type detection (default: false) |
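The table implies a few rules a client can check before uploading: `extraction_mode` and `llm_provider` accept only the listed values, and `redact_pii` needs `detect_pii`. The helper below is a minimal sketch, assuming all options go in the query string as in this endpoint's request examples; `build_upload_params` is a hypothetical name, not part of the API.

```python
# Allowed values as listed in the upload parameter table
EXTRACTION_MODES = {"text", "tables", "ocr", "hybrid"}
LLM_PROVIDERS = {"openai", "deepseek"}


def build_upload_params(
    extraction_mode: str = "text",
    structure_normalize: bool = False,
    llm_provider: str = "openai",
    repair_tables: bool = False,
    detect_pii: bool = False,
    redact_pii: bool = False,
    infer_schema: bool = False,
) -> dict[str, str]:
    """Validate upload options and return them as query-string values."""
    if extraction_mode not in EXTRACTION_MODES:
        raise ValueError(f"extraction_mode must be one of {sorted(EXTRACTION_MODES)}")
    if llm_provider not in LLM_PROVIDERS:
        raise ValueError(f"llm_provider must be one of {sorted(LLM_PROVIDERS)}")
    if redact_pii and not detect_pii:
        # The table states that redaction requires detection
        raise ValueError("redact_pii=true requires detect_pii=true")

    flags = {
        "structure_normalize": structure_normalize,
        "repair_tables": repair_tables,
        "detect_pii": detect_pii,
        "redact_pii": redact_pii,
        "infer_schema": infer_schema,
    }
    params = {"extraction_mode": extraction_mode, "llm_provider": llm_provider}
    # Serialize booleans as lowercase "true"/"false", as in the curl example
    params.update({name: str(value).lower() for name, value in flags.items()})
    return params
```

Pass the returned dictionary as `params=` to `requests.post`. Sending every option explicitly, instead of relying on server defaults, makes each request self-describing.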
Multipart form data with PDF file:

```text
Content-Type: multipart/form-data
file: [PDF file]
```

Response:

```json
{
  "id": "uuid",
  "owner_id": "uuid",
  "status": "pending",
  "file_name": "document.pdf",
  "file_size": 123456,
  "page_count": null,
  "upload_path": "path/to/file",
  "result_path": null,
  "extraction_mode": "text",
  "structure_normalize": false,
  "created_at": "2024-01-01T00:00:00Z",
  "updated_at": "2024-01-01T00:00:00Z"
}
```

Example requests. Send either the `Authorization` header or the `X-API-Key` header, not both.

cURL:

```bash
curl -X POST "https://api.example.com/v1/upload?extraction_mode=hybrid&structure_normalize=true" \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "file=@document.pdf"
```

Python:

```python
import requests

url = "https://api.example.com/v1/upload"
headers = {
    "X-API-Key": "YOUR_API_KEY"
}
params = {
    "extraction_mode": "hybrid",
    # Send booleans as lowercase strings; requests would encode True as "True"
    "structure_normalize": "true"
}

# Keep the file open until the request has been sent
with open("document.pdf", "rb") as f:
    files = {"file": ("document.pdf", f, "application/pdf")}
    response = requests.post(url, headers=headers, params=params, files=files)

response.raise_for_status()  # raise on 4xx/5xx responses, including 429
print(response.json())
```

JavaScript:

```javascript
// fileInput is an <input type="file"> element on your page
const formData = new FormData();
formData.append('file', fileInput.files[0]);

const params = new URLSearchParams({
  extraction_mode: 'hybrid',
  structure_normalize: 'true'
});

fetch(`https://api.example.com/v1/upload?${params}`, {
  method: 'POST',
  headers: {
    // Do not set Content-Type; the browser adds the multipart boundary
    'X-API-Key': 'YOUR_API_KEY'
  },
  body: formData
})
  .then(response => {
    if (!response.ok) throw new Error(`Upload failed: ${response.status}`);
    return response.json();
  })
  .then(data => console.log(data))
  .catch(error => console.error(error));
```

API requests are rate-limited to ensure fair usage and system stability.
Note: When the rate limit is exceeded, the API returns a 429 Too Many Requests response.
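A client can recover from 429 responses by waiting and retrying. The helper below is a minimal sketch of exponential backoff with jitter; `call_with_retry` is a hypothetical name, and honoring a `Retry-After` header is an assumption, since this page does not say whether the API sends one.

```python
import random
import time
from typing import Any, Callable


def call_with_retry(
    send: Callable[[], Any],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() and retry with exponential backoff while it returns HTTP 429.

    send must return an object with .status_code and .headers, such as a
    requests.Response. The last response is returned once attempts run out.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429 or attempt == max_attempts - 1:
            return response
        # Assumption: Retry-After may be absent, so use it only when it is
        # a plain number of seconds.
        retry_after = response.headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)
        else:
            # Exponential backoff (1s, 2s, 4s, ...) plus random jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
        sleep(delay)
```

With `requests`, wrap the call in a lambda. For uploads, open the file inside `send` so that a retry does not reuse a file object that has already been read to the end.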
POST /v1/documents/{id}/summarize

Generate an AI-powered summary of a processed document. The result includes an executive summary, key points, a document type classification, and extracted entities.
| Parameter | Type | Required | Description |
|---|---|---|---|
| llm_provider | string | No | LLM provider to use (default: system default) |
| force | boolean | No | Regenerate the summary even if one already exists (default: false) |
Response:

```json
{
  "summary": "Executive summary text...",
  "key_points": ["Point 1", "Point 2"],
  "document_type": "invoice",
  "entities": {
    "dates": ["2024-01-01"],
    "names": ["John Doe"],
    "amounts": ["$100.00"],
    "references": ["INV-001"]
  },
  "word_count": 1500,
  "language": "en"
}
```

GET /v1/documents/{id}/summary

Retrieve an existing document summary. Returns the same structure as the POST endpoint.
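Because the GET endpoint returns the same structure as the POST endpoint, one client-side type can hold either response. A minimal sketch using a dataclass; `DocumentSummary` is a hypothetical name, and treating every field except `summary` as optional is an assumption, since this page does not say which fields are always present.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DocumentSummary:
    """Typed view of the JSON returned by the summarize and summary endpoints."""
    summary: str
    key_points: list[str] = field(default_factory=list)
    document_type: str = ""
    # Entity lists keyed by kind, e.g. "dates", "names", "amounts", "references"
    entities: dict[str, list[str]] = field(default_factory=dict)
    word_count: int = 0
    language: str = ""

    @classmethod
    def from_json(cls, data: dict[str, Any]) -> "DocumentSummary":
        return cls(
            summary=data["summary"],
            key_points=list(data.get("key_points", [])),
            document_type=data.get("document_type", ""),
            entities={kind: list(values) for kind, values in data.get("entities", {}).items()},
            word_count=int(data.get("word_count", 0)),
            language=data.get("language", ""),
        )
```

Call `DocumentSummary.from_json(response.json())` on either endpoint's response.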
POST /v1/chat/documents/{id}/chat

Send a chat message about a document. The endpoint uses RAG (Retrieval-Augmented Generation) to answer questions from the document content.

Rate limit: 10 requests per minute and 100 requests per hour (stricter than other endpoints because each request calls an LLM).
Request body (`conversation_id` is optional):

```json
{
  "message": "What is the total amount?",
  "conversation_id": "optional-conversation-id"
}
```

Response:

```json
{
  "message": "AI response text...",
  "conversation_id": "uuid",
  "timestamp": "2024-01-01T00:00:00Z"
}
```

GET /v1/chat/documents/{id}/chat

Get the chat history for a document.
DELETE /v1/chat/documents/{id}/chat

Clear the chat history for a document.
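To stay under the chat limits (10 per minute, 100 per hour) instead of reacting to 429 responses, a client can track its own request timestamps in a sliding window. A minimal sketch; `SlidingWindowLimiter` is a hypothetical helper, and because the server's window algorithm is not documented, the client's count may differ slightly from the server's.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Client-side limiter enforcing several (max_requests, window_seconds) limits at once."""

    def __init__(self, limits: list[tuple[int, float]], clock: Callable[[], float] = time.monotonic):
        self.limits = limits
        self.clock = clock
        self.sent: deque[float] = deque()  # timestamps of sent requests, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next request would stay within every limit."""
        now = self.clock()
        longest = max(window for _, window in self.limits)
        # Drop timestamps older than the longest window; no limit counts them
        while self.sent and now - self.sent[0] >= longest:
            self.sent.popleft()
        wait = 0.0
        for max_requests, window in self.limits:
            recent = [t for t in self.sent if now - t < window]
            if len(recent) >= max_requests:
                # Enough of the oldest requests must age out to free one slot
                blocking = recent[len(recent) - max_requests]
                wait = max(wait, blocking + window - now)
        return wait

    def record(self) -> None:
        """Call right after sending a request."""
        self.sent.append(self.clock())
```

Create one limiter with `SlidingWindowLimiter([(10, 60), (100, 3600)])`, then before each chat request sleep for `wait_time()` seconds, send the request, and call `record()`.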
POST /v1/schemas

Create a custom schema with defined fields for structured data extraction. Request body:
```json
{
  "name": "Invoice Schema",
  "description": "Schema for invoice documents",
  "fields": [
    {
      "name": "invoice_number",
      "type": "string",
      "required": true
    },
    {
      "name": "total_amount",
      "type": "number",
      "required": true
    }
  ]
}
```

GET /v1/schemas

List all your custom schemas.
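The create-schema body is plain JSON, so it can be generated from a compact field list. A minimal sketch; `schema_payload` is a hypothetical helper, and the duplicate-name check is a client-side precaution we assume is sensible, not a documented API rule. Called with the two invoice fields, it reproduces the Invoice Schema example body.

```python
from typing import Any


def schema_payload(name: str, description: str, fields: list[tuple[str, str, bool]]) -> dict[str, Any]:
    """Build the create-schema JSON body from (field name, type, required) tuples."""
    names = [field_name for field_name, _, _ in fields]
    if len(names) != len(set(names)):
        # Duplicate field names would make mapping ambiguous
        raise ValueError("field names must be unique")
    return {
        "name": name,
        "description": description,
        "fields": [
            {"name": field_name, "type": field_type, "required": required}
            for field_name, field_type, required in fields
        ],
    }
```

Send the result as the JSON body of the create request, for example with `requests.post(url, json=payload, headers=headers)`.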
POST /v1/schemas/map

Map a document's extracted data to a custom schema. Uses AI to match document fields to schema fields.
Request body:

```json
{
  "document_id": "uuid",
  "schema_id": "uuid",
  "use_llm": true
}
```

Response (each entry in `mappings` names the source field and a confidence score):

```json
{
  "mapping_id": "uuid",
  "document_id": "uuid",
  "schema_id": "uuid",
  "mappings": {
    "invoice_number": {
      "source_field": "document.invoice_no",
      "confidence": 0.95
    }
  },
  "mapped_data": {
    "invoice_number": "INV-001",
    "total_amount": 100.00
  }
}
```

Manage your API keys through the dashboard or these API endpoints:
- POST /v1/api-keys - Create a new API key
- GET /v1/api-keys - List your API keys
- GET /v1/api-keys/{id} - Get API key details
- PATCH /v1/api-keys/{id} - Update an API key (name, rate limits)
- DELETE /v1/api-keys/{id} - Revoke an API key

Security: API keys are shown only once, when they are created. Store them securely and never commit them to version control.
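Following the security note, a common pattern is to read credentials from the environment instead of source code. A minimal sketch; the variable names `PDF_API_KEY` and `PDF_API_JWT` are hypothetical, and preferring the API key when both are set is an arbitrary choice. It returns exactly one header, since each request authenticates with either a JWT or an API key.

```python
import os
from typing import Mapping, Optional


def auth_headers(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Build exactly one auth header from environment variables.

    PDF_API_KEY and PDF_API_JWT are hypothetical names; use whatever
    names your deployment defines.
    """
    env = os.environ if env is None else env
    api_key = env.get("PDF_API_KEY")
    jwt = env.get("PDF_API_JWT")
    if api_key:
        return {"X-API-Key": api_key}
    if jwt:
        return {"Authorization": f"Bearer {jwt}"}
    raise RuntimeError("Set PDF_API_KEY or PDF_API_JWT; do not hard-code credentials")
```

Pass `headers=auth_headers()` to each request, and keep the variables in a secrets manager or an untracked `.env` file.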
Explore the full API in our interactive OpenAPI documentation, where you can try out endpoints directly in your browser. Swagger UI lets you send test requests to each endpoint, and ReDoc presents the same reference in a clean, readable format.
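If you save the OpenAPI document behind Swagger UI and ReDoc, a script can list its operations. A minimal sketch, assuming an OpenAPI 3 JSON document already loaded into a dict; this page does not state where the spec is served, so download it from the documentation site first. `list_operations` is a hypothetical helper.

```python
from typing import Any

# Keys of an OpenAPI path item that describe operations
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec: dict[str, Any]) -> list[str]:
    """Return "METHOD /path - summary" lines for each operation in an OpenAPI 3 document."""
    lines = []
    for path, path_item in sorted(spec.get("paths", {}).items()):
        for method, operation in path_item.items():
            # Path items may also hold shared keys such as "parameters"; skip them
            if method.lower() not in HTTP_METHODS:
                continue
            line = f"{method.upper()} {path}"
            summary = operation.get("summary")
            if summary:
                line += f" - {summary}"
            lines.append(line)
    return lines
```

Load a saved copy with `json.load` and print the returned lines to get a quick index of every endpoint.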