VATextract uses advanced OCR technology to extract structured data from invoices supplied as PDFs, scanned documents, or images.

Supported File Formats

Format      Description
PDF         Native PDFs and scanned documents
JPEG/PNG    High-resolution images
Multi-page  Documents up to 100 pages

Extracted Fields

Header Information

Field          Description
invoiceNumber  Unique invoice identifier
invoiceDate    Date the invoice was issued
dueDate        Payment due date
deliveryDate   Goods/services delivery date
purchaseOrder  Purchase order reference

Financial Data

Field          Description
netAmount      Pre-tax amount
vatAmount      VAT/tax amount
vatRate        VAT percentage
totalAmount    Total including tax
currency       ISO currency code (EUR, GBP, USD, etc.)
freightAmount  Shipping/freight charges
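
A common first step after extraction is a sanity check on the financial fields. The sketch below is illustrative only: it assumes the amounts arrive as plain numbers, that vatRate is a percentage such as 20, and that netAmount already includes any freight charges. None of these assumptions is confirmed by this page, and check_totals is a hypothetical helper, not part of VATextract.

```python
from decimal import Decimal


def check_totals(fields, tolerance=Decimal("0.01")):
    """Return a list of problems found in the extracted financial fields.

    Assumes (not confirmed by the docs) that amounts are plain numbers and
    that vatRate is expressed as a percentage, e.g. 20 for 20%.
    """
    # Convert via str() so float inputs do not carry binary rounding noise.
    net = Decimal(str(fields["netAmount"]))
    vat = Decimal(str(fields["vatAmount"]))
    rate = Decimal(str(fields["vatRate"]))
    total = Decimal(str(fields["totalAmount"]))

    problems = []
    if abs(net + vat - total) > tolerance:
        problems.append("netAmount + vatAmount does not equal totalAmount")
    if abs(net * rate / 100 - vat) > tolerance:
        problems.append("vatAmount does not match netAmount * vatRate")
    return problems


# Hypothetical extraction result used only for illustration.
invoice = {
    "netAmount": 1000.00,
    "vatAmount": 200.00,
    "vatRate": 20,
    "totalAmount": 1200.00,
    "currency": "EUR",
}
print(check_totals(invoice))
```

An empty list means both checks passed. Invoices with several VAT rates or separately taxed freight would need a more detailed check than this.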

Supplier Details

Field            Description
supplierName     Company name
supplierTaxId    VAT/Tax identification number
supplierAddress  Full address
supplierContact  Email, phone, IBAN, etc.

Line Items

Each line item contains:
{
  "description": "Product or service name",
  "quantity": 10,
  "unitPrice": 25.00,
  "amount": 250.00,
  "productCode": "SKU-12345"
}
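
Because each line item carries quantity, unitPrice, and amount, the three can be cross-checked to catch OCR misreads. The following is a minimal sketch that assumes amount is meant to equal quantity times unitPrice (no per-line discounts or taxes); line_item_mismatches is a hypothetical helper name.

```python
from decimal import Decimal


def line_item_mismatches(items, tolerance=Decimal("0.01")):
    """Return the indexes of line items where quantity * unitPrice != amount.

    Assumes amount is the plain product of quantity and unitPrice, which
    this page does not state explicitly.
    """
    mismatched = []
    for index, item in enumerate(items):
        expected = Decimal(str(item["quantity"])) * Decimal(str(item["unitPrice"]))
        actual = Decimal(str(item["amount"]))
        if abs(expected - actual) > tolerance:
            mismatched.append(index)
    return mismatched


items = [
    {
        "description": "Product or service name",
        "quantity": 10,
        "unitPrice": 25.00,
        "amount": 250.00,
        "productCode": "SKU-12345",
    }
]
print(line_item_mismatches(items))
```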

OCR Providers

VATextract supports multiple OCR engines:
Default provider. Best for European invoices and complex layouts.
  • Excellent multi-language support
  • Strong table extraction
  • High accuracy on scanned documents
Configure your preferred OCR provider in Settings → Preferences, or set the OCR_PROVIDER environment variable for self-hosted deployments.

Extraction Confidence

Each extracted field includes a confidence score (0-100%). Low-confidence extractions are highlighted in the review interface for manual verification.
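
An integration can apply the same idea as the review interface by filtering on the score. This sketch assumes each field is an object with fieldName and confidence keys, as in the Geometry Data example on this page. The threshold of 80 is an arbitrary illustrative value, not a documented VATextract default.

```python
def fields_needing_review(fields, threshold=80.0):
    """Return the names of fields whose confidence is below the threshold.

    The field shape and the default threshold are assumptions made for
    this example.
    """
    return [field["fieldName"] for field in fields if field["confidence"] < threshold]


# Hypothetical extraction output used only for illustration.
extracted = [
    {"fieldName": "TOTAL", "text": "$1,250.00", "confidence": 98.5},
    {"fieldName": "INVOICE_DATE", "text": "2O24-01-15", "confidence": 61.2},
]
print(fields_needing_review(extracted))
```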

Geometry Data

For advanced integrations, VATextract provides bounding box coordinates for each extracted field:
{
  "fieldName": "TOTAL",
  "text": "$1,250.00",
  "confidence": 98.5,
  "geometry": {
    "boundingBox": {
      "left": 0.72,
      "top": 0.85,
      "width": 0.15,
      "height": 0.02
    },
    "pageNumber": 1
  }
}
This enables document overlay highlighting and programmatic field location.
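
To draw an overlay, the bounding box has to be mapped onto the rendered page. The sketch below assumes the coordinates are fractions of the page width and height, measured from the top-left corner. The sample values suggest this, but the page does not state it. to_pixels is a hypothetical helper, and the page size is made up for the example.

```python
def to_pixels(bounding_box, page_width, page_height):
    """Convert a bounding box to (left, top, right, bottom) pixel coordinates.

    Assumes left/top/width/height are fractions of the page size with the
    origin at the top-left corner (an assumption, not documented behaviour).
    """
    left = round(bounding_box["left"] * page_width)
    top = round(bounding_box["top"] * page_height)
    right = round((bounding_box["left"] + bounding_box["width"]) * page_width)
    bottom = round((bounding_box["top"] + bounding_box["height"]) * page_height)
    return left, top, right, bottom


field = {
    "fieldName": "TOTAL",
    "geometry": {
        "boundingBox": {"left": 0.72, "top": 0.85, "width": 0.15, "height": 0.02},
        "pageNumber": 1,
    },
}

# Page rendered at a hypothetical 1000 x 2000 pixels.
box = to_pixels(field["geometry"]["boundingBox"], page_width=1000, page_height=2000)
print(box)
```

The resulting tuple can be passed to most drawing libraries as a rectangle, provided the page image was rendered at the same pixel size.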