
VLM (Vision Language Model) Inference

The VLM API provides advanced document processing capabilities using Vision Language Models. It can intelligently classify and extract structured data from various document types including shipping labels, item labels, bills of lading, receipts, and invoices. The API supports both images and PDF documents with multi-page processing.

Supported File Types

Images

  • JPEG/JPG, PNG, BMP, TIFF, GIF, SVG, WebP, HEIC

PDF Documents

  • Maximum Pages: 100 pages per PDF
  • Multi-page Support: Each page is processed individually and returned in a structured response

Supported Document Types

The VLM API automatically classifies and processes the following document types:

  • Shipping Labels - Extract tracking numbers, courier information, sender/recipient details
  • Item Labels - Extract product information, SKUs, batch numbers, dimensions
  • Bills of Lading - Extract logistics information, container details, shipping data
  • Receipts - Extract merchant information, transaction details, itemized purchases
  • Invoices - Extract billing information, line items, payment terms
  • Other Documents - Flexible extraction for custom document types

Available Models

The VLM API supports multiple model sizes for different use cases:

| Model | Description | Use Case |
| --- | --- | --- |
| orion_small | Fast, lightweight generic model | Quick processing, high volume |
| orion_medium | Balanced performance generic model | General purpose processing |
| orion_large | High accuracy generic model | Complex documents, maximum accuracy |
| vscan_small | Specialized logistics model (small) | Logistics documents, shipping labels |
| vscan_medium | Specialized logistics model (medium) | Logistics documents, balanced performance |
| vscan_large | Specialized logistics model (large) | Complex logistics documents, maximum accuracy |

New Inference

POST
`/v1/inferences/images/vlm`

Create a new VLM inference to process and extract structured data from document images.

Parameters

image string (required)
Base64 encoded data URL or public web URL of the image or PDF to process. Supports images (JPEG, PNG, etc.) and PDF documents (max 100 pages).

  • Image format: data:image/jpeg;base64,... or image URL
  • PDF format: data:application/pdf;base64,... or PDF URL
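
For local files, the data-URL form can be built from the raw bytes. A minimal Node.js sketch (the `toDataUrl` helper name and the `label.jpg` path are illustrative, not part of the API):

```js
// Wrap raw file bytes in the data-URL form the API accepts.
// The MIME type must match the content, e.g. image/jpeg, image/png,
// or application/pdf.
function toDataUrl(buffer, mimeType) {
  return `data:${mimeType};base64,${buffer.toString("base64")}`;
}

// Example (Node.js):
// const fs = require("fs");
// const image = toDataUrl(fs.readFileSync("label.jpg"), "image/jpeg");
```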

prompt string (required if prompt_id is not provided) (max 8000 characters)
Custom prompt to guide the VLM processing. Must be between 5 and 8000 characters and contain meaningful content.

prompt_id string
The ID of a saved prompt from the Prompts API to use for this inference. Either prompt or prompt_id is required.

model string (optional)
VLM model to use for processing. Defaults to vscan_small. Options: orion_small, orion_medium, orion_large, vscan_small, vscan_medium, vscan_large.

location_id string (optional)
The ID of the location to attribute this inference to for filtering and organization.

metadata object (optional)
Custom metadata to associate with the inference.

Example Request (Image)

js
const data = {
  image: "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAASA...", // Base64 encoded image
  prompt: "Extract all shipping information from this label including tracking number, sender, and recipient details.",
  model: "orion_medium",
  location_id: "loc_123456789",
  metadata: {
    source: "mobile_app",
    batch_id: "batch_001"
  }
};

const response = await fetch("https://api.packagex.io/v1/inferences/images/vlm", {
  method: "POST",
  headers: {
    "PX-API-KEY": process.env.PX_API_KEY,
    "Content-Type": "application/json",
  },
  body: JSON.stringify(data),
}).then((res) => res.json());

const inference = response.data;

      

Example Request (PDF)

js
const data = {
  image: "data:application/pdf;base64,JVBERi0xLjQK...", // Base64 encoded PDF
  prompt: "Extract shipping information from each page of this document.",
  model: "vscan_large",
  metadata: {
    document_type: "shipping_manifest"
  }
};

const response = await fetch("https://api.packagex.io/v1/inferences/images/vlm", {
  method: "POST",
  headers: {
    "PX-API-KEY": process.env.PX_API_KEY,
    "Content-Type": "application/json",
  },
  body: JSON.stringify(data),
}).then((res) => res.json());

const inference = response.data;

// model_response is always an array of data objects (one per page)
console.log(`Processed ${inference.model_response.length} page(s)`);

inference.model_response.forEach((pageData, index) => {
  console.log(`Page ${index + 1}:`, pageData);
});

      

Response Format

The VLM API returns structured JSON data based on the detected document type. The model_response is always an array of data objects, where each element represents extracted data from one page.

VLM Response Model (Single Page)

{
  "object": "vlm_inference",
  "id": "vlm_1234567890abcdef",
  "organization_id": "org_1234567890abcdef",
  "location_id": "loc_1234567890abcdef",
  "organization": {
    "id": "org_1234567890abcdef",
    "name": "Acme Corp",
    "logo_url": "https://example.com/logo.png"
  },
  "location": {
    "id": "loc_1234567890abcdef",
    "name": "Main Warehouse"
  },
  "status": "completed",
  "image_hash": "a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6",
  "image_url": "https://example.com/images/vlm_1234567890abcdef.jpg",
  "model": "vscan_medium",
  "prompt": "You are a high-performing OCR scanner and information extractor...",
  "model_response": [
    {
      "document_type": "shipping_label",
      "courier_name": "FedEx",
      "tracking_number": "1234567890123456",
      "dimensions": "12x8x6 inches",
      "weight": "2.5 lbs",
      "recipient": {
        "name": "John Doe",
        "address": {
          "line1": "123 Main St",
          "city": "New York",
          "state": "NY",
          "postal_code": "10001",
          "country": "USA"
        }
      },
      "sender": {
        "name": "Jane Smith",
        "address": {
          "line1": "456 Oak Ave",
          "city": "Los Angeles",
          "state": "CA",
          "postal_code": "90210",
          "country": "USA"
        }
      }
    }
  ],
  "token_count": 1250,
  "metadata": {},
  "created_at": "2024-01-15T10:30:00.000Z",
  "created_by": "user_1234567890abcdef",
  "updated_at": "2024-01-15T10:30:05.000Z",
  "updated_by": "user_1234567890abcdef",
  "checksum": "a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6q7r8s9t0"
}

      

Multi-Page PDF Response

When processing a PDF with multiple pages, the model_response array contains one object per page:

{
  "object": "vlm_inference",
  "id": "vlm_1234567890abcdef",
  "status": "completed",
  "model_response": [
    {
      "document_type": "shipping_label",
      "tracking_number": "1234567890123456",
      "recipient": {
        "name": "John Doe",
        "address": {
          "line1": "123 Main St",
          "city": "New York",
          "state": "NY",
          "postal_code": "10001"
        }
      }
    },
    {
      "document_type": "shipping_label",
      "tracking_number": "9876543210987654",
      "recipient": {
        "name": "Jane Smith",
        "address": {
          "line1": "456 Oak Ave",
          "city": "Los Angeles",
          "state": "CA",
          "postal_code": "90210"
        }
      }
    }
  ]
}

      

Response Fields:

| Field | Type | Description |
| --- | --- | --- |
| model_response | array | Always an array of extracted data objects, one per page (index 0 = page 1, etc.) |
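
Because each element carries a document_type field, a client can branch on it page by page. A hypothetical handler sketch (the shipping-label fields mirror the examples above; the invoice line_items field name is an assumption):

```js
// Route each page's extracted data by its classified document type.
function summarizePage(pageData) {
  switch (pageData.document_type) {
    case "shipping_label":
      return `Tracking ${pageData.tracking_number} for ${pageData.recipient?.name}`;
    case "invoice":
      // "line_items" is an assumed field name for invoice extractions
      return `Invoice with ${(pageData.line_items || []).length} line item(s)`;
    default:
      return `Unhandled document type: ${pageData.document_type}`;
  }
}
```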

Retrieve Inference

GET
`/v1/inferences/images/vlm/:inference_id`

Retrieve a specific VLM inference by its ID.

Parameters

inference_id string (required)
The unique identifier of the VLM inference to retrieve.

Example Request

js
const response = await fetch("https://api.packagex.io/v1/inferences/images/vlm/inf_vlm_123456789", {
  method: "GET",
  headers: {
    "PX-API-KEY": process.env.PX_API_KEY,
  },
}).then((res) => res.json());

const inference = response.data;

      

List Inferences

GET
`/v1/inferences/images/vlm`

Retrieve a paginated list of VLM inferences with optional filtering.

Query Parameters

page number (optional)
Page number for pagination. Default: 1.

limit number (optional)
Number of inferences per page. Default: 20, max: 100.

order_by string (optional)
Field to sort by. Options: created_at. Default: created_at.

location_id string (optional)
Filter inferences by location ID.

models string (optional)
Filter inferences by model used.

status string (optional)
Filter inferences by status.

Example Request

js
const response = await fetch("https://api.packagex.io/v1/inferences/images/vlm?page=1&limit=10&location_id=loc_123456789", {
  method: "GET",
  headers: {
    "PX-API-KEY": process.env.PX_API_KEY,
  },
}).then((res) => res.json());

const inferences = response.data;
const pagination = response.pagination;
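
The query parameters above can be assembled with the standard URL API. A small sketch that also clamps limit to the documented maximum of 100:

```js
// Build a list-endpoint URL from the supported query parameters.
function buildListUrl({ page = 1, limit = 20, location_id, models, status } = {}) {
  const url = new URL("https://api.packagex.io/v1/inferences/images/vlm");
  url.searchParams.set("page", String(page));
  url.searchParams.set("limit", String(Math.min(limit, 100))); // max 100 per page
  if (location_id) url.searchParams.set("location_id", location_id);
  if (models) url.searchParams.set("models", models);
  if (status) url.searchParams.set("status", status);
  return url.toString();
}
```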

      

VLM Inference Model

object "vlm_inference"
The object type. Always "vlm_inference" for this resource.

id string
Unique identifier for the VLM inference.

organization_id string
Unique identifier for the organization that owns this inference. This will always be your organization ID.

location_id string | null
The hub location to which this inference is assigned, if specified.

organization object
Details about the organization that owns this inference.

| Field | Type | Description |
| --- | --- | --- |
| organization.id | string | Organization ID |
| organization.name | string | Organization name |
| organization.logo_url | string \| null | URL to organization logo |

location object | null
Details about the location this inference is assigned to.

| Field | Type | Description |
| --- | --- | --- |
| location.id | string | Location ID |
| location.name | string | Location name |

status string
Processing status of the inference. Possible values: inferring, completed, error.

image_url string
The URL to the processed image used for this inference.

image_hash string
A unique hash for this image that can be used to identify duplicate processed images.

model string
The VLM model used for processing. Options: vscan_small, vscan_medium, vscan_large, orion_small, orion_medium, orion_large.

prompt string
The prompt used to guide the VLM processing.

model_response array
Always an array of extracted data objects, one per page (index 0 = page 1, etc.).

token_count number | null
Number of tokens consumed during processing.

metadata object
Key-value pairs of custom metadata associated with this inference.

created_at string
Creation timestamp in ISO 8601 format.

created_by string | null
User ID who created this inference.

updated_at string
Last update timestamp in ISO 8601 format.

updated_by string
User ID who last updated this inference.

checksum string
MD5 checksum for data integrity verification.

Best Practices

Image Quality

  • Use high-resolution images (minimum 300 DPI)
  • Ensure good lighting and contrast
  • Avoid blurry or distorted images
  • Crop images to focus on the document content

PDF Documents

  • Keep PDFs under 100 pages (maximum limit)
  • Ensure PDF pages are clear and readable
  • Use text-based PDFs when possible for better accuracy
  • Consider splitting very large documents into smaller batches
  • Processing time scales with page count

Prompt Engineering

  • Be specific about what information you want extracted
  • Include context about the document type
  • Use clear, concise language
  • Avoid ambiguous instructions
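
Applying these guidelines, a prompt for a shipping label might look like the following. The wording is illustrative, not a required template:

```js
// A specific, unambiguous prompt that names the document type and the
// exact fields to extract.
const prompt = [
  "This image is a shipping label.",
  "Extract the tracking number, the courier name,",
  "and the recipient's full name and address.",
  "Return null for any field that is not present on the label.",
].join(" ");
```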

Model Selection

  • Use orion_small for high-volume, simple generic documents
  • Use orion_medium for balanced performance on generic documents
  • Use orion_large for complex generic documents requiring high accuracy
  • Use vscan_small for logistics documents and shipping labels
  • Use vscan_medium for balanced performance on logistics documents
  • Use vscan_large for complex logistics documents requiring maximum accuracy
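
This guidance can be encoded in a small helper. A hypothetical sketch (the logistics/accuracy parameters are illustrative; only the returned model names come from the documentation):

```js
// Pick a model name from the document family and desired accuracy tier.
function pickModel({ logistics = false, accuracy = "medium" } = {}) {
  const family = logistics ? "vscan" : "orion"; // vscan models are logistics-specialized
  const size = { low: "small", medium: "medium", high: "large" }[accuracy];
  return `${family}_${size}`;
}
```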

Error Handling

Common Errors

| Status | Code | Description |
| --- | --- | --- |
| 400 | pdf.too_many_pages | PDF exceeds maximum allowed pages (100) |
| 400 | image.invalid | Invalid image format or corrupted file |
| 400 | image.safety_violation | Image content violates safety guidelines |
| 400 | image.no_text | No extractable text found in image |
| 404 | prompt.not_found | Specified prompt_id not found |
| 408 | api.timeout | Request timed out |
| 429 | api.quota_exceeded | API quota exceeded |

Example Error Response

{
  "error": {
    "message": "PDF exceeds maximum allowed pages. Maximum: 100, Found: 150",
    "code": "pdf.too_many_pages",
    "status": 400
  }
}
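
Given this error shape, a client can separate transient failures (worth retrying) from permanent validation errors. A hedged sketch, assuming the codes listed in the table above:

```js
// Decide whether an error response is worth retrying.
// Timeouts and quota errors are transient; validation errors are not.
function shouldRetry(errorBody) {
  const code = errorBody?.error?.code;
  return code === "api.timeout" || code === "api.quota_exceeded";
}
```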