Help Center

Frequently Asked Questions

What file formats are supported?

We support PDF files up to 50 MB in size. For best results, use PDFs with selectable text. Scanned documents are also supported through the OCR extraction mode.
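If you upload programmatically, a quick local check can catch oversized or non-PDF files before they are sent. The sketch below is illustrative only: the function name `precheck_pdf` is hypothetical, and it assumes "50 MB" means 50 × 1024 × 1024 bytes, which this page does not specify. It relies on the fact that PDF files begin with the `%PDF-` marker.

```python
MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" read as 50 MiB


def precheck_pdf(data: bytes, max_bytes: int = MAX_UPLOAD_BYTES) -> list[str]:
    """Return a list of problems found; an empty list means the file looks uploadable."""
    problems = []
    if len(data) > max_bytes:
        problems.append("file exceeds the size limit")
    # PDF files start with a "%PDF-" header followed by the version number.
    if not data.startswith(b"%PDF-"):
        problems.append("file does not start with a PDF header")
    return problems
```

For a file on disk, you could call `precheck_pdf(open(path, "rb").read())` and skip the upload when the returned list is not empty.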

Which extraction mode should I use?

Text: Best for documents with primarily text content (reports, articles, contracts)
Tables: Best for spreadsheet-like documents with tabular data
OCR: Required for scanned documents or image-based PDFs
Hybrid: Combines text and table extraction; ideal for complex documents with mixed content
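The guidance above can be read as a small decision rule. The following sketch is one possible reading, not the service's own logic: `suggest_mode` and its three flags are hypothetical names, and the returned strings simply reuse the mode names from this page.

```python
def suggest_mode(is_scanned: bool, has_prose: bool, has_tables: bool) -> str:
    """Pick an extraction mode from three yes/no facts about a document."""
    if is_scanned:
        # Scanned or image-based PDFs require OCR.
        return "OCR"
    if has_prose and has_tables:
        # Mixed content: combine text and table extraction.
        return "Hybrid"
    if has_tables:
        return "Tables"
    return "Text"
```

For example, a digitally generated contract with no tables maps to Text, while an annual report with both narrative and financial tables maps to Hybrid.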

What is AI structure normalization?

AI structure normalization uses large language models (OpenAI or DeepSeek) to clean and organize extracted data. It corrects formatting issues, standardizes field names, and improves the overall structure of the JSON output. This is especially useful for complex documents with inconsistent formatting.

How long does processing take?

Processing time depends on the document size and extraction mode. Typical times are:

Text extraction: 1-5 seconds per page
Table extraction: 2-10 seconds per page
OCR: 5-15 seconds per page
AI normalization: Adds 5-30 seconds depending on content size
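To turn these figures into a rough estimate for a whole document, multiply the per-page range by the page count. The sketch below does that arithmetic. The function name and the lowercase mode keys are hypothetical, and it assumes the AI normalization overhead applies once per document; Hybrid mode is left out because no timing is listed for it.

```python
# Per-page ranges in seconds, taken from the list above.
PER_PAGE_SECONDS = {"text": (1, 5), "tables": (2, 10), "ocr": (5, 15)}
# Assumption: this overhead is added once per document, not per page.
AI_NORMALIZATION_SECONDS = (5, 30)


def estimate_seconds(pages: int, mode: str, ai_normalization: bool = False) -> tuple[int, int]:
    """Return a (low, high) estimate in seconds; raises KeyError for unlisted modes."""
    low, high = PER_PAGE_SECONDS[mode]
    total_low, total_high = low * pages, high * pages
    if ai_normalization:
        total_low += AI_NORMALIZATION_SECONDS[0]
        total_high += AI_NORMALIZATION_SECONDS[1]
    return total_low, total_high
```

For a 10-page scanned document with AI normalization enabled, this gives 10 × 5 + 5 = 55 seconds at the low end and 10 × 15 + 30 = 180 seconds at the high end.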

How do I create an API key?

1. Log in to your dashboard
2. Navigate to Settings → API Keys
3. Click "Create New API Key"
4. Give it a descriptive name
5. Copy and securely store your key (it won't be shown again)

Use your API key in requests with the header: X-API-Key: your-key-here
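As a minimal sketch of attaching that header from Python's standard library: the URL below is a placeholder, not a real endpoint of this service, and `build_request` is a hypothetical helper name. Only the `X-API-Key` header name comes from this page.

```python
import urllib.request

# Placeholder URL for illustration; substitute the endpoint from the API reference.
API_URL = "https://api.example.com/v1/documents"


def build_request(api_key: str, url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request that carries the API key header."""
    return urllib.request.Request(url, headers={"X-API-Key": api_key}, method="GET")
```

Sending the request is then a matter of passing it to `urllib.request.urlopen`. Reading the key from an environment variable rather than hard-coding it keeps it out of source control.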

What are the API rate limits?

Default rate limits are:

60 requests per minute
1,000 requests per hour

Rate limit information is included in the response headers X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset.
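A client can use these headers to pause before it runs out of quota. The sketch below is one way to do that and rests on an assumption this page does not confirm: that X-RateLimit-Reset holds a Unix timestamp in seconds. Some APIs send the number of seconds until reset instead, so check a real response first. The function name is hypothetical.

```python
from typing import Mapping


def seconds_until_allowed(headers: Mapping[str, str], now: float) -> float:
    """Return how long to wait before the next request, based on rate limit headers.

    Note: a plain dict is case-sensitive; HTTP client libraries usually
    provide a case-insensitive headers mapping.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # Assumption: the reset value is a Unix timestamp in seconds.
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)
```

In a request loop, you might call `time.sleep(seconds_until_allowed(response.headers, time.time()))` after each response.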

Why is my PDF extraction returning empty results?

This usually happens with scanned or image-based PDFs, which typically have no selectable text layer. Try the following:

Switch to OCR extraction mode
Use Hybrid mode for mixed content
Ensure the PDF isn't password-protected
Check that the file isn't corrupted by opening it locally

What does "Processing Failed" mean?

Processing can fail for several reasons:

File corruption: The PDF may be damaged or malformed
Password protection: Encrypted PDFs cannot be processed
Timeout: Very large documents may exceed processing limits
Unsupported features: Some PDF features may not be supported

Check the error details in your document history for specific information.

How is my data protected?

We take data security seriously:

All uploads are encrypted in transit (HTTPS/TLS)
Files are stored securely in Supabase Storage
Documents are only accessible to the owning account
API keys are stored as secure hashes
Row-level security ensures data isolation

How do I use document chat?

Document chat allows you to ask questions about your processed documents using AI:

1. Open a processed document in the dashboard
2. Click the "Chat" tab or button
3. Ask questions about the document content

The AI answers based on the extracted text using RAG (Retrieval-Augmented Generation).

Note: Document embeddings must be generated first (this happens automatically when you start chatting).
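To illustrate the retrieval step in RAG, the toy sketch below ranks document chunks by similarity to a question and returns the best match, which would then be passed to the language model as context. It is not the service's implementation: word counts and cosine similarity stand in for the learned embeddings a real system would use, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Turn text into a bag-of-words vector (a stand-in for a real embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the question."""
    query = vectorize(question)
    ranked = sorted(chunks, key=lambda chunk: cosine(query, vectorize(chunk)), reverse=True)
    return ranked[:k]
```

Because answers are built from retrieved chunks, questions that reuse the document's own wording tend to retrieve better context.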

How do I create and use custom schemas?

Custom schemas let you define the exact structure you want for extracted data:

1. Go to Dashboard → Schema Management
2. Click "Create Schema"
3. Define your fields (name, type, required status)
4. Save the schema
5. When uploading documents, use field mapping to map extracted data to your schema

Schemas can be reused across multiple documents for consistent output formats.
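As a sketch of what a schema with name, type, and required status means in practice, the code below checks an extracted record against a list of fields. The `Field` class, the type names ("string", "number", "boolean"), and the example field names are all assumptions made for illustration; the actual schema format may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    type: str  # assumed type names: "string", "number", "boolean"
    required: bool = False


_PY_TYPES = {"string": (str,), "number": (int, float), "boolean": (bool,)}


def validate(record: dict, schema: list[Field]) -> list[str]:
    """Return a list of validation errors; an empty list means the record conforms."""
    errors = []
    for field in schema:
        value = record.get(field.name)
        if value is None:
            if field.required:
                errors.append(f"missing required field: {field.name}")
            continue
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        wrong_bool = isinstance(value, bool) and field.type != "boolean"
        if wrong_bool or not isinstance(value, _PY_TYPES[field.type]):
            errors.append(f"{field.name}: expected {field.type}")
    return errors
```

Running the same check on every document that uses a schema is one way to confirm the output format stays consistent.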

How does PII detection work?

PII (Personally Identifiable Information) detection automatically finds sensitive data:

Detection: Uses pattern matching and AI to identify emails, phone numbers, SSNs, credit card numbers, and similar data
Redaction strategies: Choose how to handle detected PII:
- Mask: Replace with asterisks (e.g., ***@***.com)
- Remove: Delete the PII entirely
- Hash: Replace with a hash value
- Label: Mark with a label but keep the value
Compliance: Set compliance flags (GDPR, HIPAA, PCI) for automatic handling

Enable PII detection when uploading documents in the Advanced Options section.
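To make the four redaction strategies concrete, the sketch below applies each one to email addresses found with a regular expression. It covers only the pattern-matching side for a single PII type and is not the service's detector: the regular expression, the `[EMAIL: ...]` label format, and the truncated SHA-256 hash are illustrative assumptions.

```python
import hashlib
import re

# Simplified email pattern for illustration; real detectors handle more cases.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
STRATEGIES = {"mask", "remove", "hash", "label"}


def _redact(email: str, strategy: str) -> str:
    if strategy == "mask":
        # Keep only the top-level domain, e.g. ***@***.com
        return "***@***." + email.rsplit(".", 1)[1]
    if strategy == "remove":
        return ""
    if strategy == "hash":
        # Assumption: a shortened, unsalted SHA-256 digest, for illustration only.
        return hashlib.sha256(email.encode("utf-8")).hexdigest()[:12]
    return f"[EMAIL: {email}]"  # "label": mark the value but keep it


def redact_emails(text: str, strategy: str) -> str:
    """Apply one redaction strategy to every email address in the text."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    return EMAIL_RE.sub(lambda match: _redact(match.group(0), strategy), text)
```

Mask and Remove discard the original value, Hash lets you tell whether two redacted values were the same, and Label keeps the value visible while flagging it.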

What is table repair and when should I use it?

Table repair uses AI to fix common table extraction issues:

Split rows: Fixes rows that are split across pages
Wrapped cells: Repairs cells with text that wraps incorrectly
Column alignment: Corrects misaligned columns
Missing borders: Infers table structure when borders are unclear

Use table repair when processing documents with complex tables, especially those spanning multiple pages or with irregular formatting. Enable it in Advanced Options when uploading.
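To show what a wrapped-cell problem looks like in extracted data, the sketch below merges continuation rows back into the row above them. It is a plain heuristic for illustration, not the AI-based repair the feature uses. It assumes a continuation row is one whose first cell is empty, and the function name is hypothetical.

```python
def merge_wrapped_rows(rows: list[list[str]]) -> list[list[str]]:
    """Fold rows with an empty first cell into the previous row, cell by cell."""
    merged: list[list[str]] = []
    for row in rows:
        is_continuation = bool(merged) and bool(row) and row[0].strip() == ""
        if not is_continuation:
            merged.append(list(row))
            continue
        previous = merged[-1]
        for index, cell in enumerate(row):
            # Append wrapped text to the matching cell of the previous row.
            if cell.strip() and index < len(previous):
                previous[index] = f"{previous[index]} {cell.strip()}".strip()
    return merged
```

A heuristic like this fails when a table legitimately has empty first cells, which is one reason irregular tables benefit from the AI-based repair.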

How do I generate document summaries?

Document summaries provide quick insights into document content:

1. Process a document (summaries require completed extraction)
2. Open the document detail page
3. Click "Generate Summary" or use the API endpoint
4. View the summary, which includes:
- Executive summary
- Key points
- Document type classification
- Extracted entities (dates, names, amounts, references)

Summaries are cached. To get a fresh summary, regenerate it with the "force" parameter.
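The caching behavior can be pictured with a small sketch. This models the semantics described above under assumptions, not the service's actual code: `SummaryCache` is a hypothetical class, and `generate` stands for whatever function produces a summary for a document ID.

```python
from typing import Callable


class SummaryCache:
    """Return a stored summary unless none exists yet or force is set."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate
        self._cache: dict[str, str] = {}

    def get(self, document_id: str, force: bool = False) -> str:
        if force or document_id not in self._cache:
            # Generate a new summary and overwrite any cached copy.
            self._cache[document_id] = self._generate(document_id)
        return self._cache[document_id]
```

Repeated requests for the same document are cheap because they reuse the stored result, while a forced request pays the generation cost again.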
