VATextract uses OCR to extract structured data from invoices supplied as PDFs, scanned documents, or images.
Supported File Formats
| Format | Description |
|---|---|
| PDF | Native PDFs and scanned documents |
| JPEG/PNG | High-resolution images |
| Multi-page | Documents up to 100 pages |
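
The limits in the table above can be checked before a file is uploaded. The following is a minimal sketch, not part of VATextract itself: `is_supported_invoice_file` is a hypothetical helper, the extension set is inferred from the table, and the page count is assumed to be known to the caller.

```python
from pathlib import Path

# Extensions inferred from the formats table; the exact set the service
# accepts is an assumption.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png"}
MAX_PAGES = 100  # documented multi-page limit


def is_supported_invoice_file(filename: str, page_count: int = 1) -> bool:
    """Return True if the file extension and page count look acceptable."""
    extension = Path(filename).suffix.lower()
    return extension in SUPPORTED_EXTENSIONS and 1 <= page_count <= MAX_PAGES
```

A check like this only looks at the file name and a caller-supplied page count. It does not inspect the file contents or image resolution.
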
Extracted Fields
Header Information
| Field | Description |
|---|---|
| invoiceNumber | Unique invoice identifier |
| invoiceDate | Date the invoice was issued |
| dueDate | Payment due date |
| deliveryDate | Goods/services delivery date |
| purchaseOrder | Purchase order reference |
Financial Data
| Field | Description |
|---|---|
| netAmount | Pre-tax amount |
| vatAmount | VAT/tax amount |
| vatRate | VAT percentage |
| totalAmount | Total including tax |
| currency | ISO currency code (EUR, GBP, USD, etc.) |
| freightAmount | Shipping/freight charges |
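
Because the financial fields are arithmetically related, extracted values can be sanity-checked after extraction. The sketch below is illustrative only. `check_amounts` is a hypothetical helper, and it rests on three assumptions: that the fields arrive as a flat mapping, that `vatRate` is a percentage such as 20, and that `totalAmount` equals `netAmount` plus `vatAmount` with any freight already included in the net figure.

```python
from decimal import Decimal


def check_amounts(fields: dict, tolerance: Decimal = Decimal("0.01")) -> list[str]:
    """Return human-readable warnings; an empty list means the amounts agree."""
    # Decimal(str(...)) accepts both numeric and string inputs without
    # introducing binary floating-point error.
    net = Decimal(str(fields["netAmount"]))
    vat = Decimal(str(fields["vatAmount"]))
    rate = Decimal(str(fields["vatRate"]))  # assumed to be a percentage, e.g. 20
    total = Decimal(str(fields["totalAmount"]))

    warnings = []
    expected_vat = net * rate / Decimal("100")
    if abs(expected_vat - vat) > tolerance:
        warnings.append(f"vatAmount {vat} does not match {rate}% of netAmount {net}")
    # Assumption: freight is already part of netAmount. Adjust this check
    # if freightAmount is billed on top of the net figure.
    if abs(net + vat - total) > tolerance:
        warnings.append(f"totalAmount {total} does not equal netAmount + vatAmount")
    return warnings
```

An invoice that mixes several VAT rates will not satisfy the single-rate check, so a warning here is a prompt for review rather than proof of an extraction error.
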
Supplier Details
| Field | Description |
|---|---|
| supplierName | Company name |
| supplierTaxId | VAT/Tax identification number |
| supplierAddress | Full address |
| supplierContact | Email, phone, IBAN, etc. |
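
The header, financial, and supplier fields listed above could be modeled in client code roughly as follows. This is a sketch under stated assumptions: the `ExtractedInvoice` class is hypothetical, the result is assumed to be a flat JSON object keyed by the field names in the tables, and every field is treated as optional because a given invoice may not contain it. Value types are not documented on this page, so amounts and contact details are left untyped.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class ExtractedInvoice:
    # Header information
    invoiceNumber: Optional[str] = None
    invoiceDate: Optional[str] = None  # date format is not specified here
    dueDate: Optional[str] = None
    deliveryDate: Optional[str] = None
    purchaseOrder: Optional[str] = None
    # Financial data (numeric representation is an assumption, hence Any)
    netAmount: Optional[Any] = None
    vatAmount: Optional[Any] = None
    vatRate: Optional[Any] = None
    totalAmount: Optional[Any] = None
    currency: Optional[str] = None
    freightAmount: Optional[Any] = None
    # Supplier details
    supplierName: Optional[str] = None
    supplierTaxId: Optional[str] = None
    supplierAddress: Optional[str] = None
    supplierContact: Optional[Any] = None  # shape not documented here

    @classmethod
    def from_dict(cls, data: dict) -> "ExtractedInvoice":
        """Build an instance from a mapping, ignoring keys not listed above."""
        known = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in data.items() if key in known})
```

Ignoring unknown keys keeps the sketch tolerant of fields that are not covered in these tables, such as line items.
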
Line Items
Each line item contains:

VAT codes are automatically assigned based on the line item’s VAT rate and your accounting software. See VAT Codes for details.
OCR Providers
VATextract supports multiple OCR engines:

- Google Document AI
- AWS Textract

Default provider. Best for European invoices and complex layouts.

- Excellent multi-language support
- Strong table extraction
- High accuracy on scanned documents
Configure your preferred OCR provider in Settings → Preferences, or set the
OCR_PROVIDER environment variable for self-hosted deployments.
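
For self-hosted deployments, a startup script might read the variable as sketched below. Only the name `OCR_PROVIDER` comes from this page. The helper `configured_ocr_provider` is hypothetical, the accepted values are not listed here, and treating an unset or blank variable as "use the default provider" is an assumption.

```python
import os
from typing import Mapping, Optional


def configured_ocr_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the OCR_PROVIDER value, or None when it is unset or blank."""
    value = env.get("OCR_PROVIDER", "").strip()
    # None is assumed to mean "fall back to the default provider".
    return value or None
```

Passing the environment as a parameter keeps the function easy to test without modifying the real process environment.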