API Documentation

Authentication

JWT Bearer Token

For web application users, authenticate using a JWT obtained from Supabase Auth.

curl -X POST https://api.example.com/v1/upload \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -F "file=@document.pdf"

API Key

For programmatic access, use an API key. Create API keys in your dashboard under API Settings.

curl -X POST https://api.example.com/v1/upload \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "file=@document.pdf"

Upload Document

POST /v1/upload

Upload a PDF file for processing. The file is queued and then processed according to the options you specify.

Query Parameters

Parameter	Type	Required	Description
`extraction_mode`	string	No	text, tables, ocr, or hybrid (default: text)
`structure_normalize`	boolean	No	Enable AI-powered structure normalization (default: false)
`llm_provider`	string	No	LLM provider: openai or deepseek (default: openai)
`repair_tables`	boolean	No	Enable AI-powered table repair to fix split rows and wrapped cells (default: false)
`detect_pii`	boolean	No	Enable PII detection (default: false)
`redact_pii`	boolean	No	Enable PII redaction (requires detect_pii=true, default: false)
`infer_schema`	boolean	No	Enable automatic schema inference and document type detection (default: false)
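
The parameter rules in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: build_upload_query is a hypothetical helper, and it assumes booleans are sent as lowercase true/false, as in the cURL example on this page.

```python
from urllib.parse import urlencode

# Allowed values copied from the parameter table above.
EXTRACTION_MODES = {"text", "tables", "ocr", "hybrid"}
LLM_PROVIDERS = {"openai", "deepseek"}
BOOLEAN_FLAGS = {
    "structure_normalize", "repair_tables",
    "detect_pii", "redact_pii", "infer_schema",
}


def build_upload_query(extraction_mode="text", llm_provider="openai", **flags):
    """Return a query string for POST /v1/upload (hypothetical helper)."""
    if extraction_mode not in EXTRACTION_MODES:
        raise ValueError(f"unknown extraction_mode: {extraction_mode!r}")
    if llm_provider not in LLM_PROVIDERS:
        raise ValueError(f"unknown llm_provider: {llm_provider!r}")
    unknown = set(flags) - BOOLEAN_FLAGS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # The table states that redact_pii requires detect_pii=true.
    if flags.get("redact_pii") and not flags.get("detect_pii"):
        raise ValueError("redact_pii requires detect_pii=true")

    params = {"extraction_mode": extraction_mode, "llm_provider": llm_provider}
    # Assumption: booleans are serialized as lowercase "true"/"false".
    for name, value in flags.items():
        params[name] = "true" if value else "false"
    return urlencode(params)


print(build_upload_query("hybrid", detect_pii=True, redact_pii=True))
```

Validating locally avoids spending a rate-limited request on a combination the server would presumably reject.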

Request Body

Multipart form data containing the PDF file:

Content-Type: multipart/form-data

file: [PDF file]

Response

{
  "id": "uuid",
  "owner_id": "uuid",
  "status": "pending",
  "file_name": "document.pdf",
  "file_size": 123456,
  "page_count": null,
  "upload_path": "path/to/file",
  "result_path": null,
  "extraction_mode": "text",
  "structure_normalize": false,
  "created_at": "2024-01-01T00:00:00Z",
  "updated_at": "2024-01-01T00:00:00Z"
}

cURL Example

curl -X POST "https://api.example.com/v1/upload?extraction_mode=hybrid&structure_normalize=true" \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "file=@document.pdf"

Python Example

import requests

url = "https://api.example.com/v1/upload"
headers = {
    "X-API-Key": "YOUR_API_KEY"
}
params = {
    "extraction_mode": "hybrid",
    "structure_normalize": "true"  # lowercase string, matching the cURL example
}

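# Open the PDF in binary mode and send it as the multipart "file" field
with open("document.pdf", "rb") as f:
    files = {"file": f}
    response = requests.post(url, headers=headers, params=params, files=files)

# Raise an exception on 4xx/5xx responses (for example 429) before parsing the body
response.raise_for_status()
print(response.json())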

JavaScript Example

const formData = new FormData();
formData.append('file', fileInput.files[0]);

const params = new URLSearchParams({
  extraction_mode: 'hybrid',
  structure_normalize: 'true'
});

fetch(`https://api.example.com/v1/upload?${params}`, {
  method: 'POST',
  headers: {
    'X-API-Key': 'YOUR_API_KEY'
  },
  body: formData
})
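  .then(response => {
    // fetch does not reject on HTTP errors, so check the status explicitly
    if (!response.ok) throw new Error(`Upload failed: ${response.status}`);
    return response.json();
  })
  .then(data => console.log(data))
  .catch(error => console.error(error));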

Rate Limiting

API requests are rate-limited to ensure fair usage and system stability.

Default limits: 60 requests per minute, 1000 requests per hour
API keys: Custom rate limits can be configured per key
Response headers: Each response includes rate limit information

Note: When the rate limit is exceeded, you'll receive a 429 Too Many Requests response.
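
A client can respond to a 429 by waiting and retrying. The sketch below is one possible approach, under assumptions: this page does not name the rate limit headers, so the code looks for the standard HTTP Retry-After header and otherwise falls back to exponential backoff. The function names and the send callable are hypothetical.

```python
import time


def retry_delay(headers, attempt, base=1.0, cap=60.0):
    """Seconds to wait before the next attempt after a 429."""
    # Assumption: the server may send the standard Retry-After header.
    # A plain dict lookup is case-sensitive; adapt it to your HTTP client.
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return min(float(retry_after), cap)
        except ValueError:
            pass  # Retry-After can also be an HTTP date; use backoff instead.
    return min(base * (2 ** attempt), cap)


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send is any callable returning (status_code, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            break
        if attempt < max_attempts - 1:
            sleep(retry_delay(headers, attempt))
    return status, body
```

Passing send as a callable keeps the retry logic independent of the HTTP library, so the same loop can wrap a requests call or any other client.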

Document Summarization

POST /v1/documents/{id}/summarize

Generate an AI-powered summary of a processed document. The response includes an executive summary, key points, a document type classification, and extracted entities.

Query Parameters

Parameter	Type	Required	Description
`llm_provider`	string	No	LLM provider to use (default: system default)
`force`	boolean	No	Regenerate the summary even if one already exists (default: false)

Response

{
  "summary": "Executive summary text...",
  "key_points": ["Point 1", "Point 2"],
  "document_type": "invoice",
  "entities": {
    "dates": ["2024-01-01"],
    "names": ["John Doe"],
    "amounts": ["$100.00"],
    "references": ["INV-001"]
  },
  "word_count": 1500,
  "language": "en"
}

GET /v1/documents/{id}/summary

Retrieve an existing document summary. Returns the same structure as the POST endpoint.
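
Because both endpoints return the same structure, a single parser can serve either one. This is a sketch only: DocumentSummary and parse_summary are hypothetical names, and treating every field except summary as optional is an assumption, not documented behavior.

```python
from dataclasses import dataclass


@dataclass
class DocumentSummary:
    summary: str
    key_points: list
    document_type: str
    entities: dict
    word_count: int
    language: str


def parse_summary(payload):
    """Convert a decoded summary response (a dict) into a DocumentSummary."""
    return DocumentSummary(
        summary=payload["summary"],
        # Assumption: optional fields fall back to empty values when absent.
        key_points=list(payload.get("key_points", [])),
        document_type=payload.get("document_type", ""),
        entities=dict(payload.get("entities", {})),
        word_count=int(payload.get("word_count", 0)),
        language=payload.get("language", ""),
    )


example = {
    "summary": "Executive summary text...",
    "key_points": ["Point 1", "Point 2"],
    "document_type": "invoice",
    "entities": {"dates": ["2024-01-01"], "references": ["INV-001"]},
    "word_count": 1500,
    "language": "en",
}
parsed = parse_summary(example)
print(parsed.document_type, parsed.entities["references"])
```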

Document Chat

POST /v1/chat/documents/{id}/chat

Send a chat message about a document. Uses RAG (Retrieval-Augmented Generation) to answer questions based on the document content.

Rate Limit: 10 requests per minute, 100 requests per hour (stricter limits due to LLM usage)

Request Body

{
  "message": "What is the total amount?",
  "conversation_id": "optional-conversation-id"
}

Response

{
  "message": "AI response text...",
  "conversation_id": "uuid",
  "timestamp": "2024-01-01T00:00:00Z"
}
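
To keep follow-up questions in the same conversation, a client can store the conversation_id from each response and send it with the next message. The sketch below illustrates this. DocumentChat and the post callable are hypothetical, and it assumes that omitting conversation_id starts a new conversation.

```python
class DocumentChat:
    """Tracks conversation_id across messages for one document."""

    def __init__(self, document_id, post):
        self.path = f"/v1/chat/documents/{document_id}/chat"
        # post is any callable(path, json_body) -> decoded JSON dict,
        # for example a thin wrapper around your HTTP client.
        self.post = post
        self.conversation_id = None

    def ask(self, message):
        body = {"message": message}
        # Assumption: leaving out conversation_id starts a new conversation.
        if self.conversation_id is not None:
            body["conversation_id"] = self.conversation_id
        reply = self.post(self.path, body)
        # Remember the id so the next question continues the same thread.
        self.conversation_id = reply.get("conversation_id", self.conversation_id)
        return reply["message"]
```

Since this endpoint has stricter rate limits, it is worth routing the post callable through whatever retry handling the rest of the client uses.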

GET /v1/chat/documents/{id}/chat

Get chat history for a document.

DELETE /v1/chat/documents/{id}/chat

Clear chat history for a document.

Schema Management

POST /v1/schemas

Create a custom schema with defined fields for structured data extraction.

Request Body

{
  "name": "Invoice Schema",
  "description": "Schema for invoice documents",
  "fields": [
    {
      "name": "invoice_number",
      "type": "string",
      "required": true
    },
    {
      "name": "total_amount",
      "type": "number",
      "required": true
    }
  ]
}
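
The request body can be assembled programmatically when a schema has many fields. A minimal sketch follows. make_field and build_schema are hypothetical helpers, and the duplicate-name check is a client-side precaution rather than a documented API rule.

```python
import json


def make_field(name, type_, required=False):
    """Build one entry of the "fields" array."""
    return {"name": name, "type": type_, "required": required}


def build_schema(name, description, fields):
    """Build the JSON body for POST /v1/schemas."""
    names = [f["name"] for f in fields]
    # Assumption: field names should be unique within a schema.
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"duplicate field names: {duplicates}")
    return {"name": name, "description": description, "fields": list(fields)}


body = build_schema(
    "Invoice Schema",
    "Schema for invoice documents",
    [
        make_field("invoice_number", "string", required=True),
        make_field("total_amount", "number", required=True),
    ],
)
print(json.dumps(body, indent=2))
```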

GET /v1/schemas

List all your custom schemas.

POST /v1/schemas/map

Map a document's extracted data to a custom schema. Uses AI to intelligently match fields.

Request Body

{
  "document_id": "uuid",
  "schema_id": "uuid",
  "use_llm": true
}

Response

{
  "mapping_id": "uuid",
  "document_id": "uuid",
  "schema_id": "uuid",
  "mappings": {
    "invoice_number": {
      "source_field": "document.invoice_no",
      "confidence": 0.95
    }
  },
  "mapped_data": {
    "invoice_number": "INV-001",
    "total_amount": 100.00
  }
}
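
Since each mapping carries a confidence score, a client may want to flag uncertain matches for manual review before using mapped_data. The sketch below shows one way to do that. review_mapping is a hypothetical helper, and the 0.8 threshold is an arbitrary illustrative value, not an API default.

```python
def review_mapping(result, required_fields, min_confidence=0.8):
    """Return (missing_required, low_confidence) for a mapping response."""
    mapped = result.get("mapped_data", {})
    # Required schema fields that have no value in mapped_data.
    missing = [name for name in required_fields if mapped.get(name) is None]
    # Mappings whose confidence falls below the chosen threshold.
    low_confidence = {
        name: info["confidence"]
        for name, info in result.get("mappings", {}).items()
        if info.get("confidence", 0.0) < min_confidence
    }
    return missing, low_confidence


response = {
    "mappings": {
        "invoice_number": {"source_field": "document.invoice_no", "confidence": 0.95}
    },
    "mapped_data": {"invoice_number": "INV-001", "total_amount": 100.00},
}
missing, low = review_mapping(response, ["invoice_number", "total_amount"])
print(missing, low)
```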

API Key Management

Manage your API keys through the dashboard or API endpoints:

POST /v1/api-keys - Create a new API key
GET /v1/api-keys - List your API keys
GET /v1/api-keys/{id} - Get API key details
PATCH /v1/api-keys/{id} - Update API key (name, rate limits)
DELETE /v1/api-keys/{id} - Revoke an API key

Security: API keys are only shown once when created. Store them securely and never commit them to version control.
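
One common way to keep a key out of version control is to read it from an environment variable at runtime. The following is a minimal sketch. The variable name DOCS_API_KEY and both helper functions are assumptions chosen for illustration. Only the X-API-Key header name comes from this page.

```python
import os


def load_api_key(env_var="DOCS_API_KEY", environ=os.environ):
    """Read the API key from the environment instead of hardcoding it."""
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"set the {env_var} environment variable")
    return key


def auth_headers(api_key):
    """Headers for API key authentication, as described under Authentication."""
    return {"X-API-Key": api_key}
```

For example, headers = auth_headers(load_api_key()) can be passed to the upload request in place of a literal key.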
