Thursday, February 25, 2021

Getting started with Oracle Machine Learning for Python


As noted in Introducing Oracle Machine Learning for Python, OML4Py is included with Oracle Autonomous Database, making the open source Python scripting language and environment ready for the enterprise and big data. 

To get started with OML4Py, log into your Oracle Machine Learning Notebooks account and create a new notebook. If you don't have one yet, you can create an Autonomous Database account using your Oracle Always Free Services and follow this OML Notebooks tutorial.

Load the OML package

In the first paragraph of the notebook, specify %python as the interpreter. At this point you can already run Python code, but to use OML4Py you must import the oml package. Click the "run this paragraph" button. Optionally, call oml.isconnected() to verify your connection; it should return True.

%python

import oml

oml.isconnected()

Load a Pandas DataFrame to the database

There are several ways to load data into Oracle Autonomous Database. In this first example, we create a table from the sklearn iris data set. We combine the target and predictors into a single Pandas DataFrame and load this DataFrame into an Oracle Autonomous Database table using the oml.create function.

from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()

# Predictors: the four numeric measurements
x = pd.DataFrame(iris.data,
                 columns = ["SEPAL_LENGTH", "SEPAL_WIDTH",
                            "PETAL_LENGTH", "PETAL_WIDTH"])

# Target: map the integer class codes to species names
y = pd.DataFrame(list(map(lambda x: {0:'setosa', 1: 'versicolor',
                                     2:'virginica'}[x], iris.target)),
                 columns = ['Species'])

iris_df = pd.concat([x,y], axis=1)

# Create the database table IRIS and return a proxy object for it
IRIS = oml.create(iris_df, table="IRIS")

print("Shape:",IRIS.shape)
print("Columns:",IRIS.columns)
IRIS.head(4)

The script above produces the following output. IRIS is a proxy object that references the database table. Note that we access the shape and columns properties on the proxy object just as we would on a Pandas DataFrame. Similarly, we invoke the overloaded head function on the IRIS proxy object.

Shape: (150, 5)
Columns: ['SEPAL_LENGTH', 'SEPAL_WIDTH', 'PETAL_LENGTH', 'PETAL_WIDTH', 'Species']

Out[6]:
   SEPAL_LENGTH  SEPAL_WIDTH  PETAL_LENGTH  PETAL_WIDTH Species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa

This table is also readily available in the user schema under the name IRIS, just like any other database table.

Using overloaded functions

Using the numeric columns, we compute the correlation matrix on the in-database table IRIS using the overloaded corr function. Here, we see that petal length and petal width are highly correlated.

IRIS.corr()

With the output:

              SEPAL_LENGTH  SEPAL_WIDTH  PETAL_LENGTH  PETAL_WIDTH
SEPAL_LENGTH      1.000000    -0.109369      0.871754     0.817954
SEPAL_WIDTH      -0.109369     1.000000     -0.420516    -0.356544
PETAL_LENGTH      0.871754    -0.420516      1.000000     0.962757
PETAL_WIDTH       0.817954    -0.356544      0.962757     1.000000
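
To make the numbers in this matrix concrete, here is a minimal pure-Python sketch of the Pearson correlation coefficient. It assumes that corr uses the Pearson method by default, as pandas does. The pearson helper is hypothetical and is not part of OML4Py. It runs on only the first four rows shown by IRIS.head(4), so its result will not match the full-table values above.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of centered cross-products and centered squares
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)

# First four rows of IRIS, copied from the head(4) output (illustration only)
sepal_length = [5.1, 4.9, 4.7, 4.6]
sepal_width = [3.5, 3.0, 3.2, 3.1]

r = pearson(sepal_length, sepal_width)
print(round(r, 3))
```

Each cell of the matrix is this quantity for one pair of columns, which is why the diagonal is 1 and the matrix is symmetric.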

OML4Py overloads graphics functions as well. Here, we use boxplot to show the distribution of the numeric columns. In such overloaded functions, the statistical computations take place in the database, which avoids data movement and uses Autonomous Database as a high-performance compute engine. Only the summary statistics needed to produce the plot are returned.

import matplotlib.pyplot as plt

plt.style.use('seaborn')
plt.figure(figsize=[10,5])

oml.graphics.boxplot(IRIS[:, :4], notch=True, showmeans = True,
                     labels=IRIS.columns[:4])

plt.title('Distribution of IRIS Attributes')
plt.ylabel('cm');
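
To illustrate why only a few summary statistics need to leave the database, the sketch below reduces one numeric column to the handful of values a box plot is drawn from. It is an assumption-laden illustration in plain Python, not OML4Py's implementation: the box_stats helper, the sample data, and the 1.5 x IQR whisker rule are chosen for the example. The dictionary keys follow the naming that matplotlib's Axes.bxp accepts for precomputed statistics.

```python
from statistics import quantiles

def box_stats(values, whisker=1.5):
    """Summarize a numeric column (at least 2 values) for a box plot."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence = q1 - whisker * iqr
    hi_fence = q3 + whisker * iqr
    # Whiskers end at the most extreme data points inside the fences
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    return {
        "q1": q1, "med": med, "q3": q3,
        "whislo": inside[0], "whishi": inside[-1],
        "fliers": [v for v in data if v < lo_fence or v > hi_fence],
    }

# Hypothetical sample column with one outlier
sample = [1, 2, 3, 4, 5, 20]
stats = box_stats(sample)
print(stats)
```

However many rows the column has, the summary stays this small apart from the outlier list, which is the data-movement saving described above.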


In-database attribute importance


Let's rank the relative importance of each variable (a.k.a., attribute or predictor) to predict the target 'Species' from the IRIS table.

We define the ai (attribute importance) object, compute the result, and show the attribute importance ranking.

In the result, notice that petal width is most predictive of the target species. The importance values produced by this algorithm are meant to be compared with one another: they provide a relative ranking that distinguishes the more important variables from the less important ones.

from oml import ai

# Use sync to get a proxy object for the existing IRIS table
IRIS = oml.sync(table = "IRIS")
IRIS_x = IRIS.drop('Species')
IRIS_y = IRIS['Species']

ai_obj = ai()  # Create attribute importance object
ai_obj = ai_obj.fit(IRIS_x, IRIS_y)
ai_obj 

With the output:

Algorithm Name: Attribute Importance

Mining Function: ATTRIBUTE_IMPORTANCE

Settings: 
                   setting name            setting value
0                     ALGO_NAME              ALGO_AI_MDL
1                  ODMS_DETAILS              ODMS_ENABLE
2  ODMS_MISSING_VALUE_TREATMENT  ODMS_MISSING_VALUE_AUTO
3                 ODMS_SAMPLING    ODMS_SAMPLING_DISABLE
4                     PREP_AUTO                       ON

Global Statistics: 
  attribute name  attribute value
0       NUM_ROWS              150

Attributes: 
PETAL_LENGTH
PETAL_WIDTH
SEPAL_LENGTH
SEPAL_WIDTH

Partition: NO

Importance: 

       variable  importance  rank
0   PETAL_WIDTH    1.050935     1
1  PETAL_LENGTH    1.030633     2
2  SEPAL_LENGTH    0.454824     3
3   SEPAL_WIDTH    0.191514     4
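
As a small worked example of reading these values as a relative ranking, the sketch below sorts the importance figures copied from the output above and expresses each as a share of the top score. The rank_attributes helper and the share-of-top ratio are hypothetical additions for illustration; OML4Py does not report them.

```python
# Importance values copied from the attribute importance output
importance = {
    "SEPAL_LENGTH": 0.454824,
    "SEPAL_WIDTH": 0.191514,
    "PETAL_LENGTH": 1.030633,
    "PETAL_WIDTH": 1.050935,
}

def rank_attributes(scores):
    """Return (name, score, rank, share_of_top) tuples, highest score first."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top = ordered[0][1]
    return [(name, score, rank, score / top)
            for rank, (name, score) in enumerate(ordered, start=1)]

for name, score, rank, share in rank_attributes(importance):
    print(f"{rank}. {name:<13}{score:.6f}  ({share:.0%} of top)")
```

Read this way, the two petal measurements are nearly tied, while the sepal measurements contribute far less.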

Change your service level


In your notebook, you can change the service level of your connection to Oracle Autonomous Database to take advantage of different parallelism options. The available parallelism depends on the compute resources allocated to your Autonomous Database. Click the gear icon in the upper right (as indicated by the arrow in the figure), then click individual interpreters to turn them on or off, and click and drag the interpreter boxes to change the default service level. The 'low' binding runs your functions and queries without parallelism, 'medium' allows limited parallelism, and 'high' allows your functions and queries to use up to the maximum number of compute resources allocated to your Autonomous Database.
