Vision SDK

Android

A barcode and QR code scanner framework for Android. VisionSDK detects barcodes and QR codes. It can also extract information from several kinds of logistics labels and documents, such as shipping labels, inventory labels, bills of lading, receipts, and invoices.

Installation

Vision SDK is hosted on JitPack.io.

First, add the JitPack repository to your root project:

        maven { url "https://jitpack.io" }

      

Then add the following dependency to your project's build.gradle file:

        implementation 'com.github.packagexlabs:vision-sdk-android:v2.0.18'

      

Usage

Initialization

To use the OCR API for information extraction, set the following parameters:

  • Constants.apiKey: your API key.
  • Constants.apiEnvironment: the API environment that your API key belongs to (sandbox or production).
NOTE

These parameters must be set before making any API call. You can generate your own API key at cloud.packagex.io. You can find the instruction guide here.

Initialize the SDK first. There are two ways to authenticate:

  1. Via API key
  2. Via token

        VisionSDK.getInstance().initialize(
            environment = environment,       // TODO: supply your environment
            authentication = authentication, // TODO: supply your authentication
        )

      

Basic Usage

You can find sample code for using the Vision SDK on our GitHub.

To start scanning for barcodes and QR codes, use the startScanning method and specify the view type:

        private fun startScanning() {
            // Set the scanning window configuration
            binding.customScannerView.startScanning(
                viewType = screenState.scanningWindow,
                scanningMode = screenState.scanningMode,
                detectionMode = screenState.detectionMode,
                scannerCallbacks = this
            )
        }

      

Configuration

Set the initial settings and configuration of the camera:

        visionCameraView.setStateAndFocusSettings(ScreenState(), FocusSettings())

        val nthFrameToProcess = 15
        visionCameraView.shouldOnlyProcessNthFrame(nthFrameToProcess)
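
The idea behind processing only every nth frame can be illustrated with a small counter. This is a minimal sketch of the general technique, not the SDK's implementation: FrameThrottle is a hypothetical helper, and whether the SDK counts frames in exactly this way is an assumption.

```kotlin
// Illustrative only: FrameThrottle is a hypothetical helper, not part of VisionSDK.
class FrameThrottle(private val n: Int) {
    init {
        require(n > 0) { "n must be positive" }
    }

    private var frameCount = 0L

    // Returns true for frames n, 2n, 3n, ... and false for all other frames.
    fun shouldProcess(): Boolean {
        frameCount++
        return frameCount % n == 0L
    }
}
```

With n = 15, only one in every fifteen camera frames would be analyzed, which trades detection latency for lower CPU and battery use.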

      

To start scanning for barcodes, QR codes, text, or documents, call the startCamera method after the camera view has been initialized, as in the following example:

        private fun startScanning() {

            visionCameraView.setObjectDetectionConfiguration(
                ObjectDetectionConfiguration(
                    isTextIndicationOn = true,
                    isBarcodeOrQRCodeIndicationOn = true,
                    isDocumentIndicationOn = true,
                    secondsToWaitBeforeDocumentCapture = 3
                )
            )
            visionCameraView.setScannerCallback(object : ScannerCallback {
                override fun detectionCallbacks(
                    barcodeDetected: Boolean,
                    qrCodeDetected: Boolean,
                    textDetected: Boolean,
                    documentDetected: Boolean
                ) {
                }

                override fun onBarcodesDetected(barcodeList: List<String>) {
                }

                override fun onFailure(exception: ScannerException) {
                }

                override fun onImageCaptured(bitmap: Bitmap, imageFile: File?, value: List<String>) {
                    // bitmap: The bitmap of the captured image.
                    // imageFile: Optional image file, present if the user also asked for the image to be saved as a file.
                    // value: List of barcodes that were detected in the given image.
                }
            })

            visionCameraView.setCameraLifecycleCallback(object : CameraLifecycleCallback {
                override fun onCameraStarted() {}
                override fun onCameraStopped() {}
            })

            // Start the camera once initialization has completed
            visionCameraView.initialize {
                visionCameraView.startCamera()
            }
        }

      

Note that once the onBarcodesDetected or onImageCaptured callback has been called, VisionSDK stops analyzing the camera feed for text, barcodes, and QR codes. This prevents extra processing and battery consumption. To start analyzing the camera feed again after consuming the results of the previous scan, the client needs to call the following function:

          visionCameraView.rescan()
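
The stop-then-rescan behaviour described above can be modelled as a simple gate. This is a hedged sketch for illustration only: ScanGate is a hypothetical class that is not part of VisionSDK, and it only mirrors the documented behaviour of delivering one result and then pausing until rescan is requested.

```kotlin
// Hypothetical model of the documented behaviour; ScanGate is not an SDK class.
class ScanGate {
    // True while camera frames should be analyzed.
    var isAnalyzing = true
        private set

    // Delivers a result only while analysis is active, then pauses analysis.
    // Returns true if the result was delivered, false if it was dropped.
    fun deliver(result: List<String>, consumer: (List<String>) -> Unit): Boolean {
        if (!isAnalyzing) return false
        isAnalyzing = false
        consumer(result)
        return true
    }

    // Plays the role that visionCameraView.rescan() plays in the SDK: resume analysis.
    fun rescan() {
        isAnalyzing = true
    }
}
```

In an app, the equivalent step is to finish handling the scan result first and call visionCameraView.rescan() only when the screen is ready for the next scan.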

      

Scanning Modes

There are two scanning modes:

  1. Auto mode automatically detects any barcode or QR code, based on the detection mode.
  2. Manual mode detects a barcode or QR code only when capture is called.

Detection Modes

The detection mode specifies what the user is trying to detect:

  1. Barcode detects only barcodes.
  2. QRCode detects only QR codes.
  3. OCR is for OCR detection. This mode captures an image for the user, who can then extract logistics data from it using either an API call or the on-device OCR capabilities.
  4. PriceTag is still in development.
  5. Photo is used to take images without any processing.

Trigger Manual Capture

As mentioned above, the SDK has a manual mode. To trigger a capture in manual mode, call visionCameraView.capture(). The callback you receive depends on the detection mode:

  1. DetectionMode.Barcode triggers the onBarcodesDetected callback if a barcode is detected, or throws an exception.
  2. DetectionMode.QRCode returns the QR code or throws an exception.
  3. DetectionMode.OCR captures an image along with the current barcodes and returns them in onImageCaptured.

Make sure the scanning mode is set to manual before calling capture.

Capturing image:

You can capture an image when the detection mode is OCR. When capture is called in OCR mode, the image is returned in the callback.

        visionCameraView.captureImage()

        fun onImageCaptured(bitmap: Bitmap, imageFile: File?, value: List<String>) {
            // Image along with the barcodes
        }

The callback returns the image bitmap along with the list of barcodes found in the current frame.

Making OCR Call:

To make an API call, first make sure that VisionSDK is initialized. See Initialization for details.

Once VisionSDK has been initialized successfully, you can use the ApiManager class to make API calls.

The following APIs are available through VisionSDK:

  1. Shipping Label (both online and on-device)
  2. Bill of Lading (online only)

Logistics information can be extracted from an image in these two contexts. You can post images using the following suspend functions of the ApiManager class.

Shipping Label

        val jsonResponse = ApiManager().shippingLabelApiCallSync(
            bitmap = bitmap,
            barcodeList = list,
            locationId = "OPTIONAL_LOCATION_ID",
            recipient = mapOf(
                "contact_id" to "CONTACT_ID_HERE"
            ),
            sender = mapOf(
                "contact_id" to "CONTACT_ID_HERE"
            ),
            options = mapOf(
                "match" to mapOf(
                    "search" to listOf("recipients"),
                    "location" to true
                ),
                "postprocess" to mapOf(
                    "require_unique_hash" to false
                ),
                "transform" to mapOf(
                    "tracker" to "outbound",
                    "use_existing_tracking_number" to false
                )
            ),
            metadata = mapOf(
                "Test" to "Pass"
            )
        )

      

Bill of Lading

        val jsonResponse = ApiManager().manifestApiCallSync(
            bitmap = bitmap,
            barcodeList = list
        )

      

Making On-Device Shipping Label Call (usage without internet):

An internet connection is needed the first time, to download the files that are used for processing images and extracting data from them. To download them, use the configure() function of the OnDeviceOCRManager class. Start by creating an instance:

        val onDeviceOCRManager = OnDeviceOCRManager(
            context = context,
            platformType = PlatformType.Native,
            modelClass = ModelClass.ShippingLabel,
            modelSize = ModelSize.Large
        )

      

PlatformType is an enum with the following values:

        Native // For Android native apps
        Flutter // For Flutter apps
        ReactNative // For React Native apps

      

ModelClass is an enum that tells VisionSDK which kind of OCR capabilities you require in your app. Its options are:

        ShippingLabel
        BillOfLading
        PriceTag

      

PriceTag is not currently supported.

ModelSize is another enum (an optional parameter here) that tells VisionSDK which model files it should download. The available model sizes are:

        Nano
        Micro
        Small
        Medium
        Large
        XLarge

      

Currently, only Micro and Large are supported.
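
An app that lets users or configuration pick a model size may want to check the choice before constructing the manager. The following is a minimal sketch based only on the support note above: the enum is a stand-in that copies the names listed in this guide, and isSupportedModelSize is a hypothetical helper, not an SDK function.

```kotlin
// Stand-in enum that copies the names listed in this guide; use the SDK's own ModelSize in an app.
enum class ModelSize { Nano, Micro, Small, Medium, Large, XLarge }

// Hypothetical guard reflecting the current support note: only Micro and Large are supported.
fun isSupportedModelSize(size: ModelSize): Boolean =
    size == ModelSize.Micro || size == ModelSize.Large
```

Because the supported set may change between SDK releases, a check like this should be kept in one place and revisited when upgrading.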

After creating the OnDeviceOCRManager instance, call its configure() function. This suspend function downloads the required files, if needed, and then loads them into memory.

        onDeviceOCRManager.configure(
            executionProvider: ExecutionProvider = ExecutionProvider.NNAPI,
            progressListener: (suspend (Float) -> Unit)? = null
        )

      

The executionProvider parameter is optional, and the default value usually works fine.

The progressListener parameter (also optional) can be used to track the progress of downloading and loading the files. Its value ranges from 0.0 to 1.0.
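
Since the progress value is a fraction between 0.0 and 1.0, it usually needs converting before it is shown to the user. The following is a small sketch of one way to do that; progressToPercent is a hypothetical helper and not part of VisionSDK.

```kotlin
// Hypothetical helper, not part of VisionSDK.
// Clamps the fraction to 0.0..1.0 and rounds it to the nearest whole percentage.
fun progressToPercent(progress: Float): Int {
    val clamped = progress.coerceIn(0f, 1f)
    return (clamped * 100f + 0.5f).toInt()
}
```

A listener passed as progressListener could call this helper and forward the result to a progress bar or a status label.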

After configuration succeeds, call the getPredictions() function. Pass it the bitmap of the image you want to run OCR on and the list of barcodes, from which it tries to extract information using regex.

        onDeviceOCRManager.getPredictions(bitmap: Bitmap, barcodes: List<String>): String

      

The result of the getPredictions() function is a JSON response in String form.

Once you are done with the OnDeviceOCRManager, make sure to destroy the instance by calling the following method:

        onDeviceOCRManager.destroy()

      

Report an issue

VisionSDK has an internal error reporting mechanism for issues that it encounters. In addition, if you get a response from the on-device models that you consider incorrect, you can report it using the following function:

        val reportResult: ReportResult = ApiManager().reportAnIssueSync(
            context = context,
            apiKey = "YOUR_API_KEY_HERE", // You only need to pass one of the two params (apiKey or token)
            token = "YOUR_TOKEN_HERE",
            platformType = PlatformType.Native,
            modelClass = modelClass, // ModelClass
            modelSize = modelSize, // ModelSize
            report = report, // String
            customData = null, // Optional Map<String, Any?>
            base64ImageToReportOn = null // Optional String
        )