Documentation

Learn how to use the PDF to JSON Platform to convert your documents.

Getting Started

1. Create an Account

Sign up for a free account to get started. You can upload and process PDFs immediately after registration.

2. Upload Your First PDF

Navigate to the dashboard and click "Upload PDF" to select and upload your document. Choose your extraction preferences and let our system process it.

3. Download Results

Once processing is complete, download your structured JSON file. You can also view the results directly in the browser.

Extraction Modes

Text Only

Extracts plain text from your PDF. Best for simple documents without tables or complex formatting.

Tables

Specialized extraction for documents containing tables. Preserves table structure and relationships.

OCR

Optical Character Recognition for scanned documents or images. Converts images to searchable text.

Hybrid

Combines text extraction, table detection, and OCR for comprehensive results. Recommended for complex documents.
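
As a rough guide to picking a mode, the descriptions above can be sketched as a small decision function. This is an illustration only: `DocumentTraits`, `choose_mode`, and the mode labels are hypothetical names, not platform identifiers.

```python
from dataclasses import dataclass


@dataclass
class DocumentTraits:
    """Hypothetical description of a PDF; not a platform data type."""
    is_scanned: bool = False  # pages are images with no text layer
    has_tables: bool = False  # document contains tabular data


def choose_mode(traits: DocumentTraits) -> str:
    """Return a label for one of the four modes described above (labels are assumed)."""
    if traits.is_scanned and traits.has_tables:
        return "hybrid"      # needs OCR and table detection together
    if traits.is_scanned:
        return "ocr"         # scanned pages without tables
    if traits.has_tables:
        return "tables"      # digital text with tabular data
    return "text_only"       # simple documents


print(choose_mode(DocumentTraits(is_scanned=True, has_tables=True)))
```

When in doubt about a document's contents, Hybrid is the mode this page recommends for complex documents.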

AI-Powered Features

Structure Normalization

Enable AI-powered structure normalization to improve the quality and organization of your JSON output. Our AI analyzes the document structure and enhances the extracted content.

  • Intelligent content organization
  • Improved data structure
  • Better field recognition
  • Enhanced readability

Table Repair

AI-powered table repair automatically fixes common extraction issues such as split rows, wrapped cells, and misaligned columns. Perfect for complex tables with irregular formatting.

  • Fixes split rows across pages
  • Repairs wrapped cell content
  • Corrects column alignment
  • Preserves table relationships
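
To make "wrapped cells" concrete, here is a minimal sketch of one simple repair heuristic: a row whose first cell is empty is treated as a continuation of the row above it. This is not the platform's AI-based repair, only an assumed rule-based stand-in, and `merge_wrapped_rows` is a hypothetical name.

```python
def merge_wrapped_rows(rows: list[list[str]]) -> list[list[str]]:
    """Merge continuation rows (empty first cell) into the row above them."""
    repaired: list[list[str]] = []
    for row in rows:
        is_continuation = bool(repaired) and bool(row) and not row[0].strip()
        if is_continuation:
            previous = repaired[-1]
            for i, cell in enumerate(row):
                text = cell.strip()
                # Append wrapped text to the matching column of the previous row.
                if text and i < len(previous):
                    previous[i] = f"{previous[i]} {text}".strip()
        else:
            repaired.append([cell.strip() for cell in row])
    return repaired


extracted = [
    ["Item", "Description", "Qty"],
    ["A1", "Long product", "2"],
    ["", "name continued", ""],   # wrapped cell extracted as its own row
    ["B2", "Short", "1"],
]
for row in merge_wrapped_rows(extracted):
    print(row)
```

A heuristic like this fails on tables where an empty first cell is legitimate, which is the kind of case a context-aware repair is meant to handle.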

PII Detection & Redaction

Automatically detect and redact sensitive Personally Identifiable Information (PII) to help you meet the requirements of GDPR, HIPAA, and other privacy regulations.

  • Detects emails, phone numbers, SSNs, and more
  • Multiple redaction strategies (mask, remove, hash, label)
  • LLM-enhanced context-aware detection
  • Compliance flags for GDPR, HIPAA, PCI
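
The four redaction strategies listed above can be illustrated with a small regex-based sketch. The patterns, the `redact` function, and the truncated SHA-256 hash are assumptions made for this example; the platform's detection is described as LLM-enhanced and will differ.

```python
import hashlib
import re

# Simplified US-style patterns, for illustration only.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}
STRATEGIES = {"mask", "remove", "hash", "label"}


def redact(text: str, strategy: str = "mask") -> str:
    """Replace every detected PII value according to the chosen strategy."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")

    def replacer(label: str):
        def substitute(match: re.Match) -> str:
            value = match.group(0)
            if strategy == "mask":
                return "*" * len(value)   # keep length, hide content
            if strategy == "remove":
                return ""                 # drop the value entirely
            if strategy == "hash":
                # Stable pseudonym: same input always gives the same token.
                return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
            return f"[{label}]"           # "label" strategy
        return substitute

    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(replacer(label), text)
    return text


sample = "Contact jane@example.com or 555-123-4567, SSN 123-45-6789."
print(redact(sample, "label"))
```

Hashing is useful when the same person must stay linkable across documents without exposing the raw value; masking and labeling keep the text readable.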

Schema Inference

Automatically detect document type and infer the appropriate schema structure. Supports invoices, receipts, contracts, reports, and more.

  • Automatic document type classification
  • Pre-built schema templates
  • LLM-powered intelligent inference
  • Confidence scoring for classifications
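
Classification with a confidence score can be pictured with a deliberately simple keyword-overlap sketch. The keyword sets and the `classify` function are hypothetical; the platform's inference is described as LLM-powered and is not reproduced here.

```python
import re

# Assumed keyword lists per document type, for illustration only.
DOCUMENT_KEYWORDS = {
    "invoice": {"invoice", "due", "bill", "total"},
    "receipt": {"receipt", "paid", "cashier", "change"},
    "contract": {"agreement", "party", "hereby", "terms"},
    "report": {"summary", "findings", "analysis", "conclusion"},
}


def classify(text: str) -> tuple[str, float]:
    """Return (document_type, confidence), where confidence is the best
    type's share of all keyword hits."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {
        doc_type: len(words & keywords)
        for doc_type, keywords in DOCUMENT_KEYWORDS.items()
    }
    total = sum(scores.values())
    if total == 0:
        return "unknown", 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total


print(classify("Invoice #42: total due by March"))
```

A low confidence value is a signal to review the inferred type manually before relying on the matching schema template.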

Document Summarization

Generate AI-powered summaries of your documents with key points, document type classification, and extracted entities.

  • Executive summaries
  • Key points extraction
  • Entity extraction (dates, names, amounts)
  • Language detection

Advanced Features

Document Chat

Chat with your documents using RAG (Retrieval-Augmented Generation). Ask questions about document content and get AI-powered answers based on the extracted text.

  • RAG-based question answering
  • Vector embeddings for semantic search
  • Conversation history
  • Context-aware responses
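
The retrieval step of RAG can be sketched in a few lines. The example below uses bag-of-words vectors and cosine similarity as a stand-in for real vector embeddings, and stops at building the prompt; `embed`, `retrieve`, and `build_prompt` are hypothetical names, not platform functions.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. Real systems use dense vectors."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved context and the question into one prompt string."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


chunks = [
    "The invoice total is 420 USD.",
    "Payment is due within 30 days.",
    "The vendor is located in Berlin.",
]
print(build_prompt("What is the invoice total?", chunks))
```

Restricting the prompt to retrieved chunks is what keeps answers grounded in the extracted text rather than in the model's general knowledge.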

Schema Management

Create and manage custom schemas to define the structure you want for your extracted data. Perfect for standardizing output across multiple document types.

  • Create custom field definitions
  • Define field types and requirements
  • Manage multiple schemas
  • Reuse schemas across documents
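
As an illustration of what a custom schema expresses, the sketch below models field definitions with types and a required flag, then validates a record against them. `FieldDefinition`, `Schema`, and the field names are hypothetical; the platform's own schema format is not shown on this page.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FieldDefinition:
    name: str
    field_type: type
    required: bool = True


@dataclass
class Schema:
    name: str
    fields: list[FieldDefinition] = field(default_factory=list)

    def validate(self, record: dict) -> list[str]:
        """Return a list of problems; an empty list means the record fits."""
        errors = []
        for definition in self.fields:
            if definition.name not in record:
                if definition.required:
                    errors.append(f"missing required field: {definition.name}")
                continue
            if not isinstance(record[definition.name], definition.field_type):
                errors.append(
                    f"{definition.name}: expected {definition.field_type.__name__}"
                )
        return errors


invoice_schema = Schema(
    "invoice",
    [
        FieldDefinition("invoice_number", str),
        FieldDefinition("total", float),
        FieldDefinition("notes", str, required=False),
    ],
)
print(invoice_schema.validate({"total": "420"}))
```

Because the schema is a plain object, the same definition can be reused to check output from many documents of the same type.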

Field Mapping

Map extracted data from documents to your custom schemas automatically. AI-powered mapping matches extracted fields to schema fields and reports a confidence score for each match.

  • Automatic field mapping
  • LLM-enhanced matching
  • Confidence scoring
  • Manual override capabilities
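
Confidence-scored mapping with manual overrides can be sketched using string similarity from Python's standard `difflib` module. This is a stand-in for the LLM-enhanced matching described above; `map_fields`, the 0.6 threshold, and the field names are assumptions for the example.

```python
from difflib import SequenceMatcher
from typing import Optional


def normalize(name: str) -> str:
    """Lowercase and drop punctuation/spacing so 'Invoice No.' ~ 'invoice_number'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def map_fields(
    extracted: list[str],
    schema_fields: list[str],
    threshold: float = 0.6,
    overrides: Optional[dict[str, str]] = None,
) -> dict[str, tuple[Optional[str], float]]:
    """Map each extracted field name to (schema_field or None, confidence)."""
    overrides = overrides or {}
    mapping: dict[str, tuple[Optional[str], float]] = {}
    for source in extracted:
        if source in overrides:
            # A manual override always wins and is treated as certain.
            mapping[source] = (overrides[source], 1.0)
            continue
        best_target, best_score = None, 0.0
        for target in schema_fields:
            score = SequenceMatcher(
                None, normalize(source), normalize(target)
            ).ratio()
            if score > best_score:
                best_target, best_score = target, score
        if best_score < threshold:
            best_target = None  # too uncertain: leave unmapped for review
        mapping[source] = (best_target, round(best_score, 2))
    return mapping


print(map_fields(["Invoice No.", "Total Amount"], ["invoice_number", "total_amount"]))
```

Fields that fall below the threshold stay unmapped, which is where a manual override is most useful.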

Frequently Asked Questions

What file formats are supported?

Currently, we support PDF files only. The maximum file size is 50 MB.
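
Both limits can be checked locally before uploading. The sketch below looks for the `%PDF-` header that PDF files begin with and compares the size against the limit. `check_upload` is a hypothetical helper, and treating the limit as 50 × 1024 × 1024 bytes is an assumption, since this page does not say how the limit is measured.

```python
# Assumption: the 50 MB limit is interpreted as 50 * 1024 * 1024 bytes.
MAX_BYTES = 50 * 1024 * 1024


def check_upload(data: bytes, max_bytes: int = MAX_BYTES) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    if not data.startswith(b"%PDF-"):
        problems.append("not a PDF (missing %PDF- header)")
    if len(data) > max_bytes:
        problems.append(f"file is {len(data)} bytes; limit is {max_bytes}")
    return problems


print(check_upload(b"%PDF-1.7\n% minimal example bytes"))
print(check_upload(b"plain text, not a PDF"))
```

To check a file on disk, read it in binary mode first, for example `check_upload(open(path, "rb").read())`.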

How long does processing take?

Processing time varies based on document size and complexity. Most documents are processed within a few minutes.

Can I use the API?

Yes! We provide a comprehensive REST API for programmatic access. Check out our API documentation for details.

Is my data secure?

Yes. All documents are encrypted in transit and at rest. We follow industry best practices for data security and privacy.

What happens to my documents after processing?

Processed documents are stored securely and are accessible only by you. You can delete them at any time.

Additional Resources