Thursday, February 25, 2021

Oracle Machine Learning

Getting started with Oracle Machine Learning for Python

February 25, 2021 By DB Exam Study

Oracle Machine Learning, Python, Oracle Database Exam Prep, Oracle Database Certification, Oracle Database Preparation, Database Prep

As noted in Introducing Oracle Machine Learning for Python, OML4Py is included with Oracle Autonomous Database, making the open source Python scripting language and environment ready for the enterprise and big data.

To get started with OML4Py, log in to your Oracle Machine Learning Notebooks account and create a new notebook. If you don't have an account yet, you can create an Autonomous Database using Oracle Always Free Services and then follow this OML Notebooks tutorial.

Load the OML package

In the first paragraph of the notebook, specify %python as the interpreter. You can now run Python code; to use OML4Py, import the oml package and click the "run this paragraph" button. Optionally, call oml.isconnected() to verify the database connection. It should return True.

%python
import oml
oml.isconnected()

Load a Pandas DataFrame to the database

There are several ways to load data into Oracle Autonomous Database. In this first example, we create a table from the scikit-learn iris data set. We combine the predictors and the target into a single pandas DataFrame and load that DataFrame into an Oracle Autonomous Database table with the oml.create function, which returns a proxy object for the new table.

from sklearn.datasets import load_iris
import pandas as pd
import oml

iris = load_iris()

# Predictor columns
x = pd.DataFrame(iris.data,
                 columns=["SEPAL_LENGTH", "SEPAL_WIDTH",
                          "PETAL_LENGTH", "PETAL_WIDTH"])

# Target column: map the numeric class codes 0, 1, 2 to species names
y = pd.DataFrame([{0: 'setosa', 1: 'versicolor', 2: 'virginica'}[t]
                  for t in iris.target],
                 columns=['Species'])

iris_df = pd.concat([x, y], axis=1)

# Drop any existing IRIS table first, since oml.create fails if the table already exists
try:
    oml.drop(table="IRIS")
except Exception:
    pass

# Create the IRIS table in the database and get a proxy object for it
IRIS = oml.create(iris_df, table="IRIS")

print("Shape:", IRIS.shape)
print("Columns:", IRIS.columns)
IRIS.head(4)

The script above produces the following output. Note that we access the shape and columns properties of the IRIS proxy object just as we would with a pandas DataFrame, and we call the overloaded head function on it in the same way.

Shape: (150, 5)
Columns: ['SEPAL_LENGTH', 'SEPAL_WIDTH', 'PETAL_LENGTH', 'PETAL_WIDTH', 'Species']

   SEPAL_LENGTH  SEPAL_WIDTH  PETAL_LENGTH  PETAL_WIDTH  Species
0           5.1          3.5           1.4          0.2   setosa
1           4.9          3.0           1.4          0.2   setosa
2           4.7          3.2           1.3          0.2   setosa
3           4.6          3.1           1.5          0.2   setosa

The table is also available in your user schema under the name IRIS, like any other database table.
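To make the proxy idea concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Oracle Autonomous Database. The TableProxy class and create function are hypothetical illustrations, not the OML4Py API. The assumption they model is the one described above: the proxy holds only a connection and a table name, and answers shape, columns, and head with SQL queries, so rows stay in the database until you ask for them.

```python
import sqlite3


class TableProxy:
    """Hypothetical stand-in for an in-database DataFrame proxy.

    It stores only a connection and a table name; every property is
    answered by a SQL query, so the full data never leaves the database.
    """

    def __init__(self, conn: sqlite3.Connection, table: str):
        self.conn = conn
        self.table = table

    @property
    def columns(self):
        # PRAGMA table_info returns one row per column; index 1 is the name
        rows = self.conn.execute(f'PRAGMA table_info("{self.table}")').fetchall()
        return [r[1] for r in rows]

    @property
    def shape(self):
        # The row count is computed by the database, not by fetching rows
        (n,) = self.conn.execute(f'SELECT COUNT(*) FROM "{self.table}"').fetchone()
        return (n, len(self.columns))

    def head(self, n=5):
        # Only the first n rows are transferred to the client
        return self.conn.execute(
            f'SELECT * FROM "{self.table}" LIMIT ?', (n,)
        ).fetchall()


def create(conn, rows, columns, table):
    """Hypothetical analogue of a create() call: write rows, return a proxy."""
    col_defs = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return TableProxy(conn, table)


# Usage with the four rows shown in the head output above
conn = sqlite3.connect(":memory:")
sample = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (4.6, 3.1, 1.5, 0.2, "setosa"),
]
cols = ["SEPAL_LENGTH", "SEPAL_WIDTH", "PETAL_LENGTH", "PETAL_WIDTH", "Species"]
proxy = create(conn, sample, cols, "IRIS")
print("Shape:", proxy.shape)
print("Columns:", proxy.columns)
print(proxy.head(2))
```

The design choice worth noticing is that nothing is cached on the client: each property call is a fresh query, which keeps the proxy cheap no matter how large the table is.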

Using overloaded functions

Using the numeric columns, we compute the correlation matrix of the in-database IRIS table with the overloaded corr function. The output shows that petal length and petal width are highly correlated (0.962757).

IRIS.corr()

With the output:

              SEPAL_LENGTH  SEPAL_WIDTH  PETAL_LENGTH  PETAL_WIDTH
SEPAL_LENGTH      1.000000    -0.109369      0.871754     0.817954
SEPAL_WIDTH      -0.109369     1.000000     -0.420516    -0.356544
PETAL_LENGTH      0.871754    -0.420516      1.000000     0.962757
PETAL_WIDTH       0.817954    -0.356544      0.962757     1.000000
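A correlation can be computed inside the database because Pearson's r needs only a handful of aggregates. For columns x and y with n rows, r = (n*sum(xy) - sum(x)*sum(y)) / sqrt((n*sum(x^2) - sum(x)^2) * (n*sum(y^2) - sum(y)^2)). The sketch below, again using sqlite3 as a stand-in and a hypothetical corr_in_db helper, fetches those six numbers with one query and combines them on the client. It illustrates the general technique and is not a description of how OML4Py implements corr.

```python
import math
import sqlite3


def corr_in_db(conn, table, col_x, col_y):
    """Pearson correlation of two columns from a single aggregate query.

    The database returns n and five sums; the client only combines them,
    so no individual rows are transferred. Table and column names are
    interpolated into the SQL, so they must come from trusted code.
    """
    n, sx, sy, sxy, sxx, syy = conn.execute(
        f'SELECT COUNT(*), SUM("{col_x}"), SUM("{col_y}"), '
        f'SUM("{col_x}" * "{col_y}"), SUM("{col_x}" * "{col_x}"), '
        f'SUM("{col_y}" * "{col_y}") FROM "{table}"'
    ).fetchone()
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return num / den


# Usage with a few illustrative petal measurements
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "IRIS" ("PETAL_LENGTH" REAL, "PETAL_WIDTH" REAL)')
conn.executemany(
    'INSERT INTO "IRIS" VALUES (?, ?)',
    [(1.4, 0.2), (4.7, 1.4), (4.5, 1.5), (6.0, 2.5), (5.1, 1.9)],
)
print(round(corr_in_db(conn, "IRIS", "PETAL_LENGTH", "PETAL_WIDTH"), 4))
```

This one-pass formula can lose precision when values are large relative to their spread; centering the data first is numerically safer. Oracle SQL also provides a built-in CORR aggregate function for the same calculation.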

OML4Py overloads graphics functions as well. Here, we use boxplot to show the distribution of the numeric columns. In overloaded functions like this one, the statistical computations run in the database, which avoids moving data and uses Autonomous Database as a high-performance compute engine; only the summary statistics needed to draw the plot are returned.

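import matplotlib.pyplot as plt

# On matplotlib 3.6 and later this style is named 'seaborn-v0_8'
plt.style.use('seaborn')
plt.figure(figsize=[10, 5])

# Box plot of the four numeric columns; the statistics are computed in the database
oml.graphics.boxplot(IRIS[:, :4], notch=True, showmeans=True,
                     labels=IRIS.columns[:4])
plt.title('Distribution of IRIS Attributes')
plt.ylabel('cm');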
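Only a few numbers per column are needed to draw a box. The following sketch computes them in plain Python for one column: the quartiles, whisker ends at the most extreme points within 1.5 * IQR of the box, the mean (drawn when showmeans=True), and notch bounds at median +/- 1.57 * IQR / sqrt(n) (drawn when notch=True), which is the convention matplotlib's boxplot uses. The box_stats name is hypothetical and this is not OML4Py's internal code; the dictionary keys follow the format that matplotlib's Axes.bxp accepts, so a client could draw the plot from these statistics alone.

```python
import math
import statistics


def box_stats(values, whis=1.5):
    """Summary statistics for one box of a box plot (hypothetical helper).

    Keys match the dict format accepted by matplotlib's Axes.bxp.
    Quartiles use linear interpolation (statistics 'inclusive' method).
    Requires at least two values.
    """
    data = sorted(values)
    n = len(data)
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Whiskers reach the most extreme data points within whis * IQR of the box
    lo_limit = q1 - whis * iqr
    hi_limit = q3 + whis * iqr
    whislo = min(v for v in data if v >= lo_limit)
    whishi = max(v for v in data if v <= hi_limit)
    # Notch half-width used when notch=True: 1.57 * IQR / sqrt(n)
    notch = 1.57 * iqr / math.sqrt(n)
    return {
        "med": med,
        "q1": q1,
        "q3": q3,
        "whislo": whislo,
        "whishi": whishi,
        "mean": statistics.fmean(data),  # drawn when showmeans=True
        "cilo": med - notch,
        "cihi": med + notch,
        "fliers": [v for v in data if v < whislo or v > whishi],
    }


# Usage: the four sepal lengths shown in the head output earlier
print(box_stats([5.1, 4.9, 4.7, 4.6]))
```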

In-database attribute importance

Let's rank the relative importance of each variable (a.k.a., attribute or predictor) to predict the target 'Species' from the IRIS table.

We define the ai (attribute importance) object, compute the result, and show the attribute importance ranking.

In the result, notice that petal width is the most predictive of the target species. The importance values produced by this algorithm are meant to be compared with one another: they provide a relative ranking that distinguishes the importance of the variables.

from oml import ai

# Use oml.sync to get a proxy object for the existing IRIS table
IRIS = oml.sync(table="IRIS")

IRIS_x = IRIS.drop('Species')   # predictors
IRIS_y = IRIS['Species']        # target

ai_obj = ai()                   # create an attribute importance object
ai_obj = ai_obj.fit(IRIS_x, IRIS_y)
ai_obj

With the output:

Algorithm Name: Attribute Importance

Mining Function: ATTRIBUTE_IMPORTANCE

Settings:
                   setting name            setting value
0                     ALGO_NAME              ALGO_AI_MDL
1                  ODMS_DETAILS              ODMS_ENABLE
2  ODMS_MISSING_VALUE_TREATMENT  ODMS_MISSING_VALUE_AUTO
3                 ODMS_SAMPLING    ODMS_SAMPLING_DISABLE
4                     PREP_AUTO                       ON

Global Statistics:
  attribute name  attribute value
0       NUM_ROWS              150

Attributes:
PETAL_LENGTH
PETAL_WIDTH
SEPAL_LENGTH
SEPAL_WIDTH

Partition: NO

Importance:
       variable  importance  rank
0   PETAL_WIDTH    1.050935     1
1  PETAL_LENGTH    1.030633     2
2  SEPAL_LENGTH    0.454824     3
3   SEPAL_WIDTH    0.191514     4
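The settings above show that OML4Py uses Oracle's MDL-based attribute importance algorithm (ALGO_AI_MDL); the sketch below does not reproduce it. It swaps in a simpler, related technique, information gain on equal-width binned attributes, to show how a relative importance ranking can be derived from how well each attribute separates the target classes. The function names, the bin count, and the small data set are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(values, labels, bins=3):
    """Information gain on the target after equal-width binning of one attribute."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant column
    groups = {}
    for v, y in zip(values, labels):
        b = min(int((v - lo) / width), bins - 1)  # clamp the max value into the last bin
        groups.setdefault(b, []).append(y)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_attributes(columns, labels, bins=3):
    """Return (name, score, rank) tuples, highest information gain first."""
    scores = {name: info_gain(vals, labels, bins) for name, vals in columns.items()}
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, score, rank) for rank, (name, score) in enumerate(ordered, start=1)]


# Made-up data: one attribute separates the classes cleanly, the other does not
labels = ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3
columns = {
    "PETAL_WIDTH": [0.2, 0.2, 0.3, 1.3, 1.4, 1.5, 2.0, 2.3, 2.5],
    "SEPAL_WIDTH": [3.5, 2.8, 3.0, 3.2, 2.9, 3.1, 3.0, 2.8, 3.3],
}
for name, score, rank in rank_attributes(columns, labels):
    print(rank, name, round(score, 4))
```

As with the OML4Py output, the scores are only meaningful relative to each other; an attribute whose bins each contain a single class reaches the maximum gain, the entropy of the target.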

Change your service level

In your notebook, you can change the service level of your connection to Oracle Autonomous Database to take advantage of different levels of parallelism. The parallelism available depends on the compute resources allocated to your Autonomous Database. Click the gear icon in the upper right (indicated by the arrow in the figure), click individual interpreters to turn them on or off, and click and drag an interpreter box to change the default service level. The 'low' binding runs your functions and queries without parallelism, 'medium' allows limited parallelism, and 'high' lets your functions and queries use up to all of the compute resources allocated to your Autonomous Database.
