Friday, February 2, 2024

Extract key values with Oracle Analytics and OCI Document Understanding

Example passport images used in Oracle Analytics to recognize text and expiration dates based on a pre-trained OCI AI model.

Oracle Analytics solutions now integrate with Oracle Cloud Infrastructure (OCI) Document Understanding!

OCI Document Understanding is an OCI AI service that enables developers to extract text, tables, and other key data from document files through APIs and command-line interface tools. With OCI Document Understanding, you can automate tedious business processing tasks with prebuilt AI models and customize document extraction to fit your industry-specific needs.

You can use pre-trained models for text extraction, table extraction, key value extraction, and document classification. Once you choose a pre-trained model, upload your images, register the model in Oracle Analytics Cloud (OAC), and apply it in a data flow to extract the key values from a resume, a passport, a receipt, or an invoice.

Here's how you can create a data visualization project based on OCI Document Understanding in four steps and in under five minutes. As a prerequisite, you need an Oracle Analytics Cloud instance with a connection to OCI. The steps are to:

1. Upload sample passport images to a private (non-public) Object Storage bucket in Oracle Cloud.
2. Register the OCI Document Understanding model in Oracle Analytics Cloud.
3. Create a data flow to apply the AI model to the example passport images.
4. Add the recognized key values to a dataset and use that dataset in a workbook to visualize the data.

Step 1 – Create a bucket in OCI

1. Connect to OCI at this URL: https://www.oracle.com/cloud/sign-in.html
2. Open the menu, select Storage, create a bucket, and give it a name.
3. Upload all the document images you want to analyze into the bucket.
4. Ensure the bucket is in the same tenancy as OAC. In this example, I used a previously created bucket named “Bucket-vision-ai” and uploaded 9 sample passport pictures to a folder named AID.

This step creates a location where OAC can access the uploaded images and apply the AI model.
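If you prefer to script the upload instead of using the console, the following is a minimal sketch that assumes the OCI Python SDK (the `oci` package) is installed and a default `~/.oci/config` profile exists. The helper names `object_names` and `upload_images`, the `AID/` prefix default, and the list of image extensions are illustrative assumptions, not part of the original walkthrough.

```python
from pathlib import Path


def object_names(local_dir: str, prefix: str = "AID/") -> dict[str, str]:
    """Map each local image path to the object name it will get in the bucket."""
    # Assumed set of image extensions; adjust to the files you actually have.
    exts = {".jpg", ".jpeg", ".png"}
    return {
        str(p): f"{prefix}{p.name}"
        for p in sorted(Path(local_dir).iterdir())
        if p.is_file() and p.suffix.lower() in exts
    }


def upload_images(local_dir: str, bucket: str, prefix: str = "AID/") -> None:
    """Upload every image in local_dir to the bucket under the given prefix."""
    # Third-party SDK, imported here so object_names() works without it.
    import oci

    config = oci.config.from_file()  # assumes the default config file and profile
    client = oci.object_storage.ObjectStorageClient(config)
    namespace = client.get_namespace().data  # tenancy's Object Storage namespace
    for path, name in object_names(local_dir, prefix).items():
        with open(path, "rb") as fh:
            client.put_object(namespace, bucket, name, fh)


# Example call (hypothetical local folder name):
# upload_images("passport_samples", "Bucket-vision-ai")
```

Calling `object_names` first is a cheap way to preview which files would be uploaded, and under which names, before any request is sent.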

Step 2 – Register your model in Oracle Analytics Cloud

1. On the OAC Home page, click the ellipsis menu (“...”) in the top-right corner.
2. Select “Register Model/Function” and select “OCI Document Understanding Models”.
3. Once selected, choose your OCI connection. If it does not exist, you will need to create a new connection in OAC (Create > Connection > OCI Resource).
4. The window “Select a Model” will pop up. Select the model type “Pretrained Document Key Value Extraction”.
5. In the right-side panel, select your OCI Bucket and select the document type. In this example, it's “Passport”.

Example of a Selection of Pre-trained Document Key Value Extraction AI models in OAC.

Step 3 – Apply the AI model to your images

1. Create a new data flow in OAC.
2. Create a dataset from a CSV file that contains your bucket URL.
3. Add the dataset to the data flow.
4. Add an “Apply AI Model” step to apply your pre-trained AI model to the images. In the Parameters, select the bucket URL, or File Location if your dataset lists the images as individual line items.
5. Select “Documents” as the Input Type if you use itemized images, or “Buckets” if you use your bucket URL.
6. Add a step to save the data in a new dataset. The saved data should contain the image names and URLs as well as the extracted key values (text or numbers).

The data flow loads the images, analyzes them with the pre-trained AI model, and extracts the key values from them (in this example, from the passport documents). It then loads all the information into a dataset that you can use to explore and visualize the data.

This screenshot shows an example of the data flow with the results generated by the AI model.
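The CSV file used as the data flow input in Step 3 can be created by hand, but as a rough sketch it could also be generated with Python's standard `csv` module. The function name `write_bucket_csv` and the column header "Bucket URL" are assumptions made for illustration; use whatever column name you plan to select in the “Apply AI Model” step, and paste in your own bucket URL.

```python
import csv


def write_bucket_csv(csv_path: str, bucket_urls: list[str], column: str = "Bucket URL") -> None:
    """Write a one-column CSV with a header row and one bucket URL per line."""
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow([column])  # header row; the column name is an assumption
        for url in bucket_urls:
            writer.writerow([url])


# Example call (replace the placeholder with your own bucket URL):
# write_bucket_csv("bucket_url.csv", ["<your bucket URL>"])
```

The resulting file can then be uploaded to OAC as the dataset in step 2 of the data flow.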

Step 4 – Visualize the results in Oracle Analytics

1. Create a new Workbook in OAC.
2. Add the new dataset generated by the data flow.
3. Add an Image plugin visualization object to see all the images in your bucket.
4. Use this Image object as a filter by clicking the filter icon in its top-left corner.
5. Create a new table to show all columns of your dataset.
6. Click one of the passport images to see its extracted key values in the table.
7. You can now use this workbook as a starting point to create additional calculations to filter and analyze your documents. In this example, I analyzed the expiry date and created a donut chart with conditional formatting to show whether each passport is expired (red) or still valid (green).

Example of the OAC workbook using the images plugin, OCI Document Understanding, and conditional formatting on calculation metrics.
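The logic behind the expired/valid flag in step 7 is a simple date comparison. In OAC you would write it as a workbook calculation; the Python sketch below only illustrates the rule. It assumes the extracted expiry date arrives as text in `YYYY-MM-DD` form (the real output format may differ, so `fmt` is a parameter) and treats a passport as still valid on its expiry date. The function name `expiry_status` is hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def expiry_status(expiry_text: str, today: Optional[date] = None, fmt: str = "%Y-%m-%d") -> str:
    """Return "Expired" if the expiry date is before today, otherwise "Valid"."""
    expiry = datetime.strptime(expiry_text, fmt).date()  # assumed date format
    today = today or date.today()
    # Assumption: a document is still valid on the expiry date itself.
    return "Expired" if expiry < today else "Valid"


# Example with a fixed reference date so the result is reproducible:
# expiry_status("2020-01-31", today=date(2024, 2, 2))
```

The two labels map directly onto the red and green conditional-formatting rules in the donut chart.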

Customers are using OCI Document Understanding to recognize multiple types of documents at scale, from passports to invoices to receipts and resumes. Oracle Analytics allows you to apply and visualize the data in a matter of minutes, helping you go quickly from data to insights, actions, and decisions.

Source: oracle.com
