API Documentation

Integrate PDF-to-JSON conversion into your applications with our RESTful API.

Quick Start

Get started with our API in minutes. Every request must be authenticated with either a JWT Bearer token or an API key.

# Base URL
https://api.example.com/v1

# Authentication
Authorization: Bearer YOUR_JWT_TOKEN
# OR
X-API-Key: YOUR_API_KEY

Authentication

JWT Bearer Token

For web application users, authenticate using a JWT token obtained from Supabase Auth.

curl -X POST https://api.example.com/v1/upload \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -F "file=@document.pdf"

API Key

For programmatic access, use an API key. Create API keys in your dashboard under API Settings.

curl -X POST https://api.example.com/v1/upload \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "file=@document.pdf"
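The two schemes can be wrapped in a small helper so each request carries exactly one of them. This is a minimal sketch, assuming a request should use one scheme rather than both; `build_auth_headers` is a hypothetical name and not part of any official client.

```python
from typing import Dict, Optional


def build_auth_headers(jwt_token: Optional[str] = None,
                       api_key: Optional[str] = None) -> Dict[str, str]:
    """Return the header for exactly one authentication scheme.

    Hypothetical helper: the header names come from the examples above,
    the "exactly one" rule is an assumption of this sketch.
    """
    if (jwt_token is None) == (api_key is None):
        raise ValueError("Provide exactly one of jwt_token or api_key")
    if jwt_token is not None:
        return {"Authorization": f"Bearer {jwt_token}"}
    return {"X-API-Key": api_key}
```

The returned dictionary can be passed as the `headers` argument of whichever HTTP client you use.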

Upload Document

POST /v1/upload

Upload a PDF file for processing. The file is queued and then processed according to the options you specify.

Query Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| extraction_mode | string | No | text, tables, ocr, or hybrid (default: text) |
| structure_normalize | boolean | No | Enable AI-powered structure normalization (default: false) |
| llm_provider | string | No | LLM provider: openai or deepseek (default: openai) |
| repair_tables | boolean | No | Enable AI-powered table repair to fix split rows and wrapped cells (default: false) |
| detect_pii | boolean | No | Enable PII detection (default: false) |
| redact_pii | boolean | No | Enable PII redaction (requires detect_pii=true, default: false) |
| infer_schema | boolean | No | Enable automatic schema inference and document type detection (default: false) |
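Checking the options on the client side catches invalid combinations, such as `redact_pii` without `detect_pii`, before a file is uploaded. The sketch below is one way to do that; `build_upload_query` is a hypothetical helper, and serializing booleans as lowercase `true`/`false` is an assumption about what the server accepts.

```python
from urllib.parse import urlencode

EXTRACTION_MODES = {"text", "tables", "ocr", "hybrid"}
LLM_PROVIDERS = {"openai", "deepseek"}
BOOLEAN_FLAGS = {"structure_normalize", "repair_tables", "detect_pii",
                 "redact_pii", "infer_schema"}


def build_upload_query(extraction_mode="text", llm_provider="openai", **flags):
    """Validate upload options and return them as a URL query string."""
    if extraction_mode not in EXTRACTION_MODES:
        raise ValueError(f"unknown extraction_mode: {extraction_mode!r}")
    if llm_provider not in LLM_PROVIDERS:
        raise ValueError(f"unknown llm_provider: {llm_provider!r}")
    unknown = set(flags) - BOOLEAN_FLAGS
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    # Documented constraint: redaction only works when detection is enabled.
    if flags.get("redact_pii") and not flags.get("detect_pii"):
        raise ValueError("redact_pii requires detect_pii=true")

    params = {"extraction_mode": extraction_mode, "llm_provider": llm_provider}
    for name in sorted(flags):
        # Assumption: the server expects lowercase "true"/"false".
        params[name] = "true" if flags[name] else "false"
    return urlencode(params)
```

For example, `build_upload_query("hybrid", structure_normalize=True)` produces a string that can be appended to `/v1/upload?`.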

Request Body

Multipart form data with PDF file:

Content-Type: multipart/form-data

file: [PDF file]

Response

{
  "id": "uuid",
  "owner_id": "uuid",
  "status": "pending",
  "file_name": "document.pdf",
  "file_size": 123456,
  "page_count": null,
  "upload_path": "path/to/file",
  "result_path": null,
  "extraction_mode": "text",
  "structure_normalize": false,
  "created_at": "2024-01-01T00:00:00Z",
  "updated_at": "2024-01-01T00:00:00Z"
}
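A client usually keeps only a few of these fields. The following sketch parses the response into a small dataclass; `UploadedDocument` and `parse_upload_response` are hypothetical names. It assumes `page_count` and `result_path` stay null until processing finishes, which the example above suggests but does not state.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class UploadedDocument:
    id: str
    status: str
    file_name: str
    file_size: int
    page_count: Optional[int]   # null in the example pending response
    result_path: Optional[str]  # null in the example pending response

    @property
    def has_result(self) -> bool:
        # Assumption: a non-null result_path means output is available.
        return self.result_path is not None


def parse_upload_response(body: str) -> UploadedDocument:
    """Parse the JSON body returned by POST /v1/upload."""
    data = json.loads(body)
    return UploadedDocument(
        id=data["id"],
        status=data["status"],
        file_name=data["file_name"],
        file_size=data["file_size"],
        page_count=data.get("page_count"),
        result_path=data.get("result_path"),
    )
```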

cURL Example

curl -X POST "https://api.example.com/v1/upload?extraction_mode=hybrid&structure_normalize=true" \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "file=@document.pdf"

Python Example

import requests

url = "https://api.example.com/v1/upload"
headers = {
    "X-API-Key": "YOUR_API_KEY"
}
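params = {
    "extraction_mode": "hybrid",
    # Pass booleans as lowercase strings; requests would otherwise send "True"
    "structure_normalize": "true"
}

# Open in binary mode so the PDF bytes are uploaded unchanged
with open("document.pdf", "rb") as f:
    files = {"file": f}
    response = requests.post(url, headers=headers, params=params, files=files)

response.raise_for_status()  # Raise on 4xx/5xx responses, e.g. 429 when rate-limited
print(response.json())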

JavaScript Example

const formData = new FormData();
formData.append('file', fileInput.files[0]);

const params = new URLSearchParams({
  extraction_mode: 'hybrid',
  structure_normalize: 'true'
});

fetch(`https://api.example.com/v1/upload?${params}`, {
  method: 'POST',
  headers: {
    'X-API-Key': 'YOUR_API_KEY'
  },
  body: formData
})
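  .then(response => {
    // fetch() also resolves on HTTP error statuses, so check explicitly
    if (!response.ok) {
      throw new Error(`Upload failed with status ${response.status}`);
    }
    return response.json();
  })
  .then(data => console.log(data))
  .catch(error => console.error(error));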

Rate Limiting

API requests are rate-limited to ensure fair usage and system stability.

  • Default limits: 60 requests per minute, 1000 requests per hour
  • API keys: Can have custom rate limits configured
  • Response headers: Rate limit information is included in response headers

Note: When the rate limit is exceeded, you'll receive a 429 Too Many Requests response.
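Clients typically handle a 429 by waiting and retrying. The sketch below shows one common pattern, exponential backoff. It assumes the server may send the standard `Retry-After` header with a value in seconds; this page does not name the rate limit headers, so treat that as a guess and check the actual responses. `call_with_retry` and the `send` callable are hypothetical.

```python
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, body) as returned by your HTTP client wrapper
Response = Tuple[int, Mapping[str, str], bytes]


def call_with_retry(send: Callable[[], Response],
                    max_attempts: int = 5,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        try:
            # Assumption: Retry-After, if present, is a number of seconds.
            delay = float(headers.get("Retry-After"))
        except (TypeError, ValueError):
            # Header missing or not numeric: back off 1s, 2s, 4s, ...
            delay = base_delay * (2 ** attempt)
        sleep(delay)
    raise RuntimeError(f"rate limit still exceeded after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.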

Document Summarization

POST /v1/documents/{id}/summarize

Generate an AI-powered summary of a processed document. The response includes an executive summary, key points, a document type classification, and extracted entities.

Query Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| llm_provider | string | No | LLM provider to use (default: system default) |
| force | boolean | No | Force regeneration even if a summary already exists (default: false) |

Response

{
  "summary": "Executive summary text...",
  "key_points": ["Point 1", "Point 2"],
  "document_type": "invoice",
  "entities": {
    "dates": ["2024-01-01"],
    "names": ["John Doe"],
    "amounts": ["$100.00"],
    "references": ["INV-001"]
  },
  "word_count": 1500,
  "language": "en"
}

GET /v1/documents/{id}/summary

Retrieve an existing document summary. Returns the same structure as the POST endpoint.
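As an illustration of the summarize call, the sketch below builds the POST request with the standard library and leaves sending it to the caller. `summarize_request` is a hypothetical helper, and lowercase `true` for the `force` flag is an assumption about the accepted boolean format.

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://api.example.com/v1"


def summarize_request(document_id: str, api_key: str, force: bool = False,
                      llm_provider: Optional[str] = None) -> Request:
    """Build (but do not send) a POST /documents/{id}/summarize request."""
    params = {}
    if force:
        params["force"] = "true"  # assumption: lowercase boolean strings
    if llm_provider is not None:
        params["llm_provider"] = llm_provider
    url = f"{BASE_URL}/documents/{quote(document_id, safe='')}/summarize"
    if params:
        url += "?" + urlencode(params)
    return Request(url, method="POST", headers={"X-API-Key": api_key})
```

To send it, pass the result to `urllib.request.urlopen` and decode the JSON body. Omitting `llm_provider` leaves the choice to the system default described above.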

Document Chat

POST /v1/chat/documents/{id}/chat

Send a chat message about a document. Uses RAG (Retrieval-Augmented Generation) to answer questions based on the document content.

Rate Limit: 10 requests per minute, 100 requests per hour (stricter limits due to LLM usage)

Request Body

{
  "message": "What is the total amount?",
  "conversation_id": "optional-conversation-id"
}

Response

{
  "message": "AI response text...",
  "conversation_id": "uuid",
  "timestamp": "2024-01-01T00:00:00Z"
}

GET /v1/chat/documents/{id}/chat

Get chat history for a document.

DELETE /v1/chat/documents/{id}/chat

Clear chat history for a document.
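A follow-up question can reuse the `conversation_id` returned by an earlier response. The sketch below builds the chat request body accordingly. `chat_request` is a hypothetical helper, and the idea that resending `conversation_id` continues the same conversation is an assumption; this page only marks the field as optional.

```python
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "https://api.example.com/v1"


def chat_request(document_id: str, api_key: str, message: str,
                 conversation_id: Optional[str] = None) -> Request:
    """Build (but do not send) a POST /chat/documents/{id}/chat request."""
    payload = {"message": message}
    if conversation_id is not None:
        # Assumption: echoing the id from a previous response continues that thread.
        payload["conversation_id"] = conversation_id
    return Request(
        f"{BASE_URL}/chat/documents/{document_id}/chat",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )
```

Because this endpoint has stricter rate limits than the rest of the API, it is worth handling 429 responses around whatever code sends these requests.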

Schema Management

POST /v1/schemas

Create a custom schema with defined fields for structured data extraction.

Request Body

{
  "name": "Invoice Schema",
  "description": "Schema for invoice documents",
  "fields": [
    {
      "name": "invoice_number",
      "type": "string",
      "required": true
    },
    {
      "name": "total_amount",
      "type": "number",
      "required": true
    }
  ]
}

GET /v1/schemas

List all your custom schemas.

POST /v1/schemas/map

Map a document's extracted data to a custom schema. Uses AI to intelligently match fields.

Request Body

{
  "document_id": "uuid",
  "schema_id": "uuid",
  "use_llm": true
}

Response

{
  "mapping_id": "uuid",
  "document_id": "uuid",
  "schema_id": "uuid",
  "mappings": {
    "invoice_number": {
      "source_field": "document.invoice_no",
      "confidence": 0.95
    }
  },
  "mapped_data": {
    "invoice_number": "INV-001",
    "total_amount": 100.00
  }
}
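Since the mapping is AI-assisted, it can help to review the result before trusting it. The sketch below flags required schema fields that are missing from `mapped_data` and mappings whose confidence is low. `review_mapping` is a hypothetical helper, and the 0.8 threshold and the 0 to 1 confidence scale are assumptions based on the example value.

```python
from typing import List, Tuple


def review_mapping(result: dict, schema: dict,
                   min_confidence: float = 0.8) -> Tuple[List[str], List[str]]:
    """Return (missing_required, low_confidence) lists of field names.

    `result` is a /v1/schemas/map response; `schema` is the schema definition
    that was sent to POST /v1/schemas.
    """
    mapped = result.get("mapped_data", {})
    mappings = result.get("mappings", {})

    # Required fields that have no value in the mapped output.
    missing = [field["name"] for field in schema["fields"]
               if field.get("required") and mapped.get(field["name"]) is None]

    # Fields whose match confidence falls below the (arbitrary) threshold.
    low = [name for name, info in mappings.items()
           if info.get("confidence", 0.0) < min_confidence]
    return missing, low
```

Fields in either list are good candidates for manual review.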

API Key Management

Manage your API keys through the dashboard or API endpoints:

  • POST /v1/api-keys - Create a new API key
  • GET /v1/api-keys - List your API keys
  • GET /v1/api-keys/{id} - Get API key details
  • PATCH /v1/api-keys/{id} - Update API key (name, rate limits)
  • DELETE /v1/api-keys/{id} - Revoke an API key

Security: API keys are only shown once when created. Store them securely and never commit them to version control.
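One common way to keep a key out of version control is to read it from an environment variable at startup. This is a minimal sketch; the variable name `PDF_API_KEY` and the `load_api_key` helper are hypothetical.

```python
import os


def load_api_key(env_var: str = "PDF_API_KEY") -> str:
    """Read the API key from the environment and fail loudly if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

The returned value can then be used for the `X-API-Key` header instead of a hard-coded string.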

Interactive Documentation

Explore the full API with our interactive OpenAPI documentation, which lets you try out endpoints directly in your browser.

Swagger UI lets you send test requests to each endpoint. ReDoc provides a clean, readable documentation format.

Support

Need help? Check out our resources: