Thursday, January 11, 2024

A Practical Guide to Using Sequences in Oracle Analytics

Understanding Sequences


Sequences in Oracle Analytics serve as a powerful tool for organizing and executing data flows, datasets, and other sequences in a logical manner. Sequences are particularly beneficial if you need to execute these items on a set schedule or in a particular order, or you want to leverage parallel execution for optimized performance. In this article, we’ll explore the technical advantages of sequences through a fitness-related use case.

Fitness Use Case


Imagine you have data streaming from your wearable device that populates new records in an Oracle Autonomous Data Warehouse (ADW) table every week. Your goal is to transform and cleanse this data to create a curated dataset that you can visualize in a workbook. You also want to train a machine learning model that predicts the number of calories burned during workouts, and to retrain that model periodically. Here is an overview of the steps:

  1. Data Preparation and Transformation: Use data flows to cleanse the raw wearable device data to create datasets to use in visualizations and for machine learning training and testing.
  2. No Code Machine Learning Model Training: Use the no code machine learning features in data flows to create a model to predict caloric burn.
  3. Model Performance Evaluation: Examine how the model performs on a test dataset and visualize the results in a workbook.
  4. Incorporate External Datasets: Reload a cached weather-related dataset to use in the workbook to analyze trends such as average run pace based on outside temperatures and common running conditions.

The following high-level architecture diagram depicts the solution using wearable device data to address the requirements in the previous list. This solution involves multiple artifacts and requires various job runs.

To simplify and automate this process, we can group these items into a sequence that runs on a set schedule. Relying on a sequence eliminates the need to configure an individual schedule for each artifact. Sequences also make sharing with other users much faster: when you share a sequence, its contents and associated artifacts are shared automatically with a few simple clicks.

Building an Efficient Workflow


The following sections explain how to construct the wearable device data solution. The solution runs as a scheduled sequence in a personal Oracle Analytics Cloud (OAC) environment so that the data stays current.

Step 1: Data Preparation and Transformation

We create a data flow that cleans the wearable device data and creates a curated dataset to use for creating visualizations in a workbook. This data flow also creates training and testing datasets for machine learning purposes. The following screenshot illustrates various transformation steps that were applied, and the three output datasets that were generated.

I used this data flow to create the test and train datasets for machine learning. After the data is cleansed, I used a Branch step to create a branch and an Add Columns step with the RAND() function, which creates a column of pseudo-random numbers between 0 and 1. I then created another branch to produce the two distinct testing and training datasets. To create the testing dataset, I used a Filter step to keep rows where the new column exceeds 0.7. To create the training dataset, I used a Filter step to keep rows where the new column is less than or equal to 0.7. Because every row gets its own random value, this assigns rows to the two datasets at random, with roughly 30% going to testing and 70% to training.
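The split logic described above can be sketched outside Oracle Analytics in a few lines of Python. This is only an illustration of the technique, not what the data flow runs internally: `split_train_test`, the `rand` column name, and the sample `workouts` rows are all hypothetical, and Python's `random.random()` stands in for RAND().

```python
import random


def split_train_test(rows, threshold=0.7, seed=None):
    """Tag each row with a pseudo-random number and split on a threshold.

    Rows whose random value exceeds the threshold go to the test set;
    all other rows go to the training set.
    """
    rng = random.Random(seed)  # a seed makes the split reproducible
    train, test = [], []
    for row in rows:
        value = rng.random()  # stands in for the RAND() column
        tagged = {**row, "rand": value}
        if value > threshold:
            test.append(tagged)
        else:
            train.append(tagged)
    return train, test


# Hypothetical workout records, for illustration only.
workouts = [{"workout_id": i, "duration_min": 20 + i % 40} for i in range(1000)]

train, test = split_train_test(workouts, seed=42)
print(f"train rows: {len(train)}, test rows: {len(test)}")
```

Every row lands in exactly one of the two sets, and the exact sizes vary from run to run unless a seed is fixed.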

Step 2: No Code Machine Learning Model Training

The second data flow involved in the solution uses the training dataset created in the data flow above to generate a numeric prediction model to predict the number of calories burned in each workout. In other words, the output of data flow 1 is used as input in data flow 2.

Step 3: Model Performance Evaluation

The third and final data flow applies the machine learning model generated above to the test dataset generated in the first data flow. The purpose is to validate how well the machine learning model predicts calories burned.
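To make the train-then-evaluate idea concrete, here is a minimal sketch in Python. It is not the algorithm Oracle Analytics uses (the article does not say which numeric prediction model was chosen); a one-feature least-squares line is swapped in as a stand-in. The helper names `fit_line` and `evaluate`, the choice of MAE and RMSE as metrics, and the sample (duration, calories) pairs are all assumptions made up for illustration.

```python
import math


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def evaluate(actual, predicted):
    """Return mean absolute error and root mean squared error."""
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}


# Made-up (duration in minutes, calories burned) pairs, for illustration only.
train_rows = [(20, 180), (30, 270), (45, 400), (60, 545)]
test_rows = [(25, 230), (50, 450)]

# "Data flow 2": train on the training set.
slope, intercept = fit_line([d for d, _ in train_rows], [c for _, c in train_rows])

# "Data flow 3": apply the model to the held-out test set and score it.
predicted = [slope * d + intercept for d, _ in test_rows]
metrics = evaluate([c for _, c in test_rows], predicted)
print(metrics)
```

Scoring on rows the model never saw during training is what makes the evaluation meaningful, which is why the test dataset is kept separate in the first data flow.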

Step 4: Incorporate External Datasets and Group Items in Sequence

The data flows above have clear dependencies: data flow 1 generates artifacts used by data flows 2 and 3, so it needs to be executed first. This step involves adding these data flows to a sequence, along with a cached weather dataset that requires a refresh to pull up-to-date weather information. The following screenshot shows these items in the sequence. Notice that the sequence items aren't listed in execution order and that the Ordered toggle at the top of the page is unchecked. When this toggle is unchecked, the system uses the artifact dependencies to determine which items must run before others, and executes as many items as possible in parallel to optimize performance. When the Ordered toggle is checked, placement matters: the items are executed in the order in which they are listed.
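The idea behind the unordered mode, running everything whose dependencies are already satisfied at the same time, can be sketched with Python's standard `graphlib` module. This is an illustration of dependency-aware batching in general, not Oracle Analytics' actual scheduler; the item names and the `parallel_batches` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Each item maps to the set of items it depends on (hypothetical names).
dependencies = {
    "data_flow_1_prepare": set(),
    "data_flow_2_train": {"data_flow_1_prepare"},
    "data_flow_3_evaluate": {"data_flow_1_prepare", "data_flow_2_train"},
    "weather_dataset_reload": set(),
}


def parallel_batches(deps):
    """Group items into batches; items within a batch can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything currently unblocked
        batches.append(ready)
        sorter.done(*ready)  # mark the batch complete to unblock dependents
    return batches


batches = parallel_batches(dependencies)
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {', '.join(batch)}")
```

In this sketch the weather reload shares the first batch with data flow 1 because neither depends on anything else, while data flows 2 and 3 wait their turn. An ordered run would instead walk the list from top to bottom, one item at a time.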

As mentioned earlier, this sequence is running on a schedule.

Visualizing the Results

The following screenshot shows the visualizations created as part of this solution. Because the sequence runs on a schedule, the data stays up to date. The first canvas contains visualizations that illustrate the most common workout type, how the running pace has varied throughout training, and how the pace varies with the outdoor temperature.

The next canvas contains charts generated from the machine learning model predictions. From these visualizations, it’s clear that the model performs well at predicting the calories burned for certain workouts. The visualizations depict total caloric expenditure vs. predicted caloric expenditure for different workout types and specific activities.

Call to Action


I encourage you to draw inspiration from this article and to use data flows and sequences to streamline your analytic workflows. With sequences, you can optimize data processing, enhance automation, and accelerate time-to-value.

Source: oracle.com
