Help Center

Find answers to your questions

Browse by Category

Getting Started

Learn the basics of using the PDF to JSON Platform

Extraction Modes

Learn about different extraction options

API & Integration

Integrate PDF conversion into your applications

Troubleshooting

Solve common issues and errors

Frequently Asked Questions

What file formats are supported?
We support PDF files up to 50 MB in size. For best results, use PDFs with selectable text. Scanned documents are also supported through the OCR extraction mode.
Which extraction mode should I use?
  • Text: Best for documents with primarily text content (reports, articles, contracts)
  • Tables: Best for spreadsheet-like documents with tabular data
  • OCR: Required for scanned documents or image-based PDFs
  • Hybrid: Combines text and table extraction - ideal for complex documents with mixed content
What is AI structure normalization?
AI structure normalization uses large language models (OpenAI or DeepSeek) to clean and organize extracted data. It corrects formatting issues, standardizes field names, and improves the overall structure of the JSON output. This is especially useful for complex documents with inconsistent formatting.
How long does processing take?
Processing time depends on the document size and extraction mode. Typical times are:
  • Text extraction: 1-5 seconds per page
  • Table extraction: 2-10 seconds per page
  • OCR: 5-15 seconds per page
  • AI normalization: Adds 5-30 seconds depending on content size
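
As a rough worked example, the ranges above can be turned into a low/high estimate for a whole document. This is only a sketch: the function name and structure are illustrative, the figures are the typical values listed above rather than guarantees, and Hybrid mode is left out because no timing is listed for it.

```python
from typing import Tuple

# Typical per-page ranges in seconds, copied from the list above.
PER_PAGE_SECONDS = {
    "text": (1, 5),
    "tables": (2, 10),
    "ocr": (5, 15),
}
# AI normalization is a one-off addition per document, not per page.
AI_NORMALIZATION_SECONDS = (5, 30)


def estimate_seconds(pages: int, mode: str, ai_normalization: bool = False) -> Tuple[int, int]:
    """Return a (low, high) processing-time estimate in seconds."""
    low, high = PER_PAGE_SECONDS[mode]
    total_low, total_high = pages * low, pages * high
    if ai_normalization:
        total_low += AI_NORMALIZATION_SECONDS[0]
        total_high += AI_NORMALIZATION_SECONDS[1]
    return total_low, total_high


# A 20-page scanned document with AI normalization enabled.
print(estimate_seconds(20, "ocr", ai_normalization=True))  # (105, 330)
```

So a 20-page scan with AI normalization would typically take somewhere between about 2 and 5.5 minutes.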
How do I create an API key?
  1. Log in to your dashboard
  2. Navigate to Settings → API Keys
  3. Click "Create New API Key"
  4. Give it a descriptive name
  5. Copy and securely store your key (it won't be shown again)

Use your API key in requests with the header: X-API-Key: your-key-here
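
A minimal Python sketch of attaching the header is shown here. The endpoint URL is a placeholder (this page does not list the real API URL), and the request is only built, not sent.

```python
import urllib.request

# Placeholder endpoint: replace with the URL from your API documentation.
API_URL = "https://api.example.com/v1/documents"


def build_request(api_key: str, url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request that carries the API key header."""
    return urllib.request.Request(url, headers={"X-API-Key": api_key}, method="GET")


request = build_request("your-key-here")
# urllib normalizes the stored header name to "X-api-key";
# HTTP header names are case-insensitive, so the server sees the same header.
print(request.get_header("X-api-key"))  # your-key-here
```

To actually send the request you would pass it to `urllib.request.urlopen(request)`. In practice, load the key from an environment variable or a secrets manager rather than hard-coding it.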

What are the API rate limits?
Default rate limits are:
  • 60 requests per minute
  • 1,000 requests per hour

Rate limit information is included in response headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset
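
One way a client might use these headers to pause before retrying is sketched below. It assumes that X-RateLimit-Reset is a Unix timestamp in seconds, which this page does not confirm; if the API returns a number of seconds instead, use the value directly. The function name is illustrative.

```python
import time
from typing import Mapping, Optional


def seconds_until_retry(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before the next request (0.0 if quota remains)."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # ASSUMPTION: X-RateLimit-Reset is a Unix timestamp in seconds.
    reset_at = float(headers.get("X-RateLimit-Reset", "0"))
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)


example_headers = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1700000030",
}
# "now" is fixed here so the example is reproducible.
print(seconds_until_retry(example_headers, now=1700000000))  # 30.0
```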

Why is my PDF extraction returning empty results?
This usually happens with scanned PDFs or image-based documents. Try the following:
  • Switch to OCR extraction mode
  • Use Hybrid mode for mixed content
  • Ensure the PDF isn't password-protected
  • Check that the file isn't corrupted by opening it locally
What does "Processing Failed" mean?
Processing can fail for several reasons:
  • File corruption: The PDF may be damaged or malformed
  • Password protection: Encrypted PDFs cannot be processed
  • Timeout: Very large documents may exceed processing limits
  • Unsupported features: Some PDF features may not be supported

Check the error details in your document history for specific information.

How is my data protected?
We take data security seriously:
  • All uploads are encrypted in transit (HTTPS/TLS)
  • Files are stored securely in Supabase Storage
  • Documents are only accessible to the owning account
  • API keys use secure hashing
  • Row-level security ensures data isolation
How do I use document chat?
Document chat allows you to ask questions about your processed documents using AI:
  1. Open a processed document in the dashboard
  2. Click on the "Chat" tab or button
  3. Ask questions about the document content
  4. The AI will answer based on the extracted text using RAG (Retrieval-Augmented Generation)

Note: Document embeddings must be generated first (this happens automatically when you start chatting).
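
To illustrate the retrieval step of RAG, here is a toy sketch that ranks text chunks against a question. It uses simple word-count vectors and cosine similarity as a stand-in for real embeddings; the platform's actual embedding model and retrieval logic are not described on this page, and all names are illustrative.

```python
import math
import re
from collections import Counter
from typing import List


def vectorize(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (real systems use learned vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], top_k: int = 1) -> List[str]:
    """Return the top_k chunks most similar to the question."""
    query = vectorize(question)
    ranked = sorted(chunks, key=lambda chunk: cosine(query, vectorize(chunk)), reverse=True)
    return ranked[:top_k]


chunks = [
    "The invoice total is 1,250 USD due on 1 March.",
    "Shipping is handled by the supplier.",
]
# The best-matching chunk would be passed to the language model as context.
print(retrieve("What is the invoice total?", chunks)[0])
```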

How do I create and use custom schemas?
Custom schemas let you define the exact structure you want for extracted data:
  1. Go to Dashboard → Schema Management
  2. Click "Create Schema"
  3. Define your fields (name, type, required status)
  4. Save the schema
  5. When uploading documents, use field mapping to map extracted data to your schema

Schemas can be reused across multiple documents for consistent output formats.
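
Conceptually, a schema is a list of fields with a name, a type, and a required flag. The sketch below shows how extracted data could be checked against such a schema. It is illustrative only: the class, function, and field names are hypothetical and do not reflect the platform's internal schema format.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class SchemaField:
    name: str
    type: type
    required: bool = True


def validate(record: Dict[str, Any], schema: List[SchemaField]) -> List[str]:
    """Return a list of problems found when checking a record against a schema."""
    errors = []
    for field in schema:
        if field.name not in record:
            if field.required:
                errors.append(f"missing required field: {field.name}")
            continue
        if not isinstance(record[field.name], field.type):
            errors.append(f"{field.name}: expected {field.type.__name__}")
    return errors


# Hypothetical invoice schema with one optional field.
invoice_schema = [
    SchemaField("invoice_number", str),
    SchemaField("total", float),
    SchemaField("notes", str, required=False),
]

# "total" was extracted as text, so it fails the type check.
print(validate({"invoice_number": "INV-1", "total": "12.50"}, invoice_schema))
# ['total: expected float']
```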

How does PII detection work?
PII (Personally Identifiable Information) detection automatically finds sensitive data:
  • Detection: Uses pattern matching and AI to identify emails, phone numbers, SSNs, credit cards, etc.
  • Redaction strategies: Choose how to handle detected PII:
    • Mask: Replace with asterisks (e.g., ***@***.com)
    • Remove: Delete the PII entirely
    • Hash: Replace with a hash value
    • Label: Mark with a label but keep the value
  • Compliance: Set compliance flags (GDPR, HIPAA, PCI) for automatic handling

Enable PII detection when uploading documents in the Advanced Options section.
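
The four redaction strategies can be illustrated with a small sketch that handles email addresses only. The regular expression, the label format, and the truncated SHA-256 hash are assumptions made for this example; the platform's actual detection and output formats may differ.

```python
import hashlib
import re

# Simplified email pattern, for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
STRATEGIES = {"mask", "remove", "hash", "label"}


def redact_emails(text: str, strategy: str) -> str:
    """Apply one redaction strategy to every email address found in text."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")

    def replace(match: re.Match) -> str:
        value = match.group(0)
        if strategy == "mask":
            return "***@***.com"
        if strategy == "remove":
            return ""
        if strategy == "hash":
            # ASSUMPTION: a 12-character SHA-256 prefix stands in for the hash value.
            return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
        return f"[EMAIL: {value}]"  # "label" keeps the value but marks it

    return EMAIL.sub(replace, text)


sample = "Contact jane@example.com today."
print(redact_emails(sample, "mask"))   # Contact ***@***.com today.
print(redact_emails(sample, "label"))  # Contact [EMAIL: jane@example.com] today.
```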

What is table repair and when should I use it?
Table repair uses AI to fix common table extraction issues:
  • Split rows: Fixes rows that are split across pages
  • Wrapped cells: Repairs cells with text that wraps incorrectly
  • Column alignment: Corrects misaligned columns
  • Missing borders: Infers table structure when borders are unclear

Use table repair when processing documents with complex tables, especially those spanning multiple pages or with irregular formatting. Enable it in Advanced Options when uploading.

How do I generate document summaries?
Document summaries provide quick insights into document content:
  1. Process a document (summaries require completed extraction)
  2. Open the document detail page
  3. Click "Generate Summary" or use the API endpoint
  4. View the summary, which includes:
    • Executive summary
    • Key points
    • Document type classification
    • Extracted entities (dates, names, amounts, references)

Summaries are cached. To get a fresh summary, regenerate it with the "force" parameter.
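
The caching behavior can be pictured with the following sketch. It is a simplified in-memory model, not the platform's implementation: the function names are hypothetical, and `fake_generate` stands in for the real summarization call.

```python
from typing import Callable, Dict, List

_summary_cache: Dict[str, str] = {}


def get_summary(document_id: str, generate: Callable[[str], str], force: bool = False) -> str:
    """Return the cached summary, regenerating only when missing or when force=True."""
    if force or document_id not in _summary_cache:
        _summary_cache[document_id] = generate(document_id)
    return _summary_cache[document_id]


calls: List[str] = []


def fake_generate(document_id: str) -> str:
    """Stand-in for the real (slow) summary generation."""
    calls.append(document_id)
    return f"summary #{len(calls)} for {document_id}"


print(get_summary("doc-1", fake_generate))              # summary #1 for doc-1
print(get_summary("doc-1", fake_generate))              # summary #1 for doc-1 (served from cache)
print(get_summary("doc-1", fake_generate, force=True))  # summary #2 for doc-1
```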

Still Need Help?

Can't find what you're looking for? Our support team is ready to assist you.