As noted in Introducing Oracle Machine Learning for Python, OML4Py is included with Oracle Autonomous Database, making the open source Python scripting language and environment ready for the enterprise and big data.
To get started with OML4Py, log in to your Oracle Machine Learning Notebooks account and create a new notebook. If you don't have an account yet, you can create an Autonomous Database instance using Oracle Always Free Services and follow the OML Notebooks tutorial.
Load the OML package
In the first paragraph, specify %python as the interpreter. At this point you can run Python code, but to use OML4Py you must also import the oml package. Click the "run this paragraph" button. Optionally, call oml.isconnected() to verify the database connection; it should return True.
%python
import oml
oml.isconnected()
Load a Pandas DataFrame to the database
There are several ways to load data into Oracle Autonomous Database. In this first example, we create a table from the scikit-learn iris data set. We combine the predictors and the target into a single pandas DataFrame and load it into an Oracle Autonomous Database table using the oml.create function, which returns a proxy object that references the new table.
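import oml  # already imported above; repeated so this paragraph is self-contained
import pandas as pd
from sklearn.datasets import load_iris

# Load the iris data set bundled with scikit-learn
iris = load_iris()

# Predictors: the four numeric measurements (in cm)
x = pd.DataFrame(iris.data,
                 columns=["SEPAL_LENGTH", "SEPAL_WIDTH",
                          "PETAL_LENGTH", "PETAL_WIDTH"])

# Target: map the integer class codes 0, 1, 2 to species names
species = {0: 'setosa', 1: 'versicolor', 2: 'virginica'}
y = pd.DataFrame({'Species': [species[t] for t in iris.target]})

# Combine predictors and target column-wise into one DataFrame
iris_df = pd.concat([x, y], axis=1)

# Create the database table IRIS and get a proxy object that references it
IRIS = oml.create(iris_df, table="IRIS")

print("Shape:", IRIS.shape)
print("Columns:", IRIS.columns)
IRIS.head(4)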
The script above produces the following output. Note that we access the shape and columns properties of the IRIS proxy object just as we would with a pandas DataFrame. Similarly, we invoke the overloaded head function on the proxy object.
Shape: (150, 5)
Columns: ['SEPAL_LENGTH', 'SEPAL_WIDTH', 'PETAL_LENGTH', 'PETAL_WIDTH', 'Species']

   SEPAL_LENGTH  SEPAL_WIDTH  PETAL_LENGTH  PETAL_WIDTH Species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
The table is also available in your user schema under the name IRIS, like any other database table, so it can be queried with SQL as well.
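The IRIS object behaves like a pandas DataFrame, but it is only a proxy: the data stays in the database table. To make that pattern concrete, here is a toy sketch in plain Python that uses the standard-library sqlite3 module as a stand-in for the database. The TableProxy class and its methods are hypothetical and are not how OML4Py is implemented; they only illustrate the idea that a proxy stores a table name and translates shape and head into SQL, so only the requested results leave the database.

```python
import sqlite3


class TableProxy:
    """Hypothetical proxy: holds a table name, not the data itself."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table

    @property
    def columns(self):
        # Column names come from the query description, not from reading rows
        cur = self.conn.execute(f'SELECT * FROM "{self.table}" LIMIT 0')
        return [d[0] for d in cur.description]

    @property
    def shape(self):
        # The database counts the rows; only the count is returned
        (n_rows,) = self.conn.execute(
            f'SELECT COUNT(*) FROM "{self.table}"').fetchone()
        return (n_rows, len(self.columns))

    def head(self, n=5):
        # Fetch only the first n rows instead of the whole table
        return self.conn.execute(
            f'SELECT * FROM "{self.table}" LIMIT ?', (n,)).fetchall()


# Build a small in-memory table holding the four iris rows shown above
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IRIS (SEPAL_LENGTH REAL, SEPAL_WIDTH REAL, "
             "PETAL_LENGTH REAL, PETAL_WIDTH REAL, Species TEXT)")
conn.executemany("INSERT INTO IRIS VALUES (?, ?, ?, ?, ?)", [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (4.6, 3.1, 1.5, 0.2, "setosa"),
])

iris_proxy = TableProxy(conn, "IRIS")
print("Shape:", iris_proxy.shape)  # Shape: (4, 5)
print("Columns:", iris_proxy.columns)
print(iris_proxy.head(2))
```

The design choice worth noting is that every property issues a query instead of caching rows, which is what keeps memory use on the client independent of table size.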
Using overloaded functions
Using the numeric columns, we compute the correlation matrix of the in-database IRIS table with the overloaded corr function. Petal length and petal width are highly correlated, with a coefficient of about 0.96.
IRIS.corr()
With the output:
              SEPAL_LENGTH  SEPAL_WIDTH  PETAL_LENGTH  PETAL_WIDTH
SEPAL_LENGTH      1.000000    -0.109369      0.871754     0.817954
SEPAL_WIDTH      -0.109369     1.000000     -0.420516    -0.356544
PETAL_LENGTH      0.871754    -0.420516      1.000000     0.962757
PETAL_WIDTH       0.817954    -0.356544      0.962757     1.000000
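Assuming corr uses the Pearson method (the default in pandas), each entry above is the covariance of two columns divided by the product of their standard deviations. The following plain-Python sketch computes that coefficient for two short lists taken from a handful of iris rows; the pearson function is illustrative only, whereas OML4Py performs the equivalent computation inside the database.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n (or 1/(n-1)) normalization factors cancel in this ratio
    return cov / math.sqrt(var_x * var_y)


# Petal length and width (cm) of seven iris flowers, three per species or fewer
petal_length = [1.4, 1.4, 1.3, 4.7, 4.5, 6.0, 5.1]
petal_width = [0.2, 0.2, 0.2, 1.4, 1.5, 2.5, 1.9]
print(round(pearson(petal_length, petal_width), 3))
```

On this small sample the coefficient is also close to 1, consistent with the strong petal length/width relationship in the full table.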
OML4Py overloads graphics functions as well. Here, we use boxplot to show the distribution of the numeric columns. With such overloaded functions, the statistical computations take place in the database, which avoids data movement and uses Autonomous Database as a high-performance compute engine; only the summary statistics needed to produce the plot are returned to the client.
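import matplotlib.pyplot as plt

# Matplotlib 3.6 renamed the built-in 'seaborn' style to 'seaborn-v0_8';
# on older versions use plt.style.use('seaborn') instead
plt.style.use('seaborn-v0_8')
plt.figure(figsize=[10, 5])

# Notched box plot of the four numeric columns; statistics are computed in the database
oml.graphics.boxplot(IRIS[:, :4], notch=True, showmeans=True,
                     labels=IRIS.columns[:4])
plt.title('Distribution of IRIS Attributes')
plt.ylabel('cm');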
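The summary statistics behind one notched box are only a handful of numbers. The sketch below computes them locally in plain Python for a single column, assuming matplotlib's default conventions: linearly interpolated percentiles, whiskers at the most extreme points within 1.5 times the IQR of the box, and notches at the median plus or minus 1.57 × IQR / √n. The box_stats helper is hypothetical; it shows the kind of result that needs to reach the client, not OML4Py's actual in-database implementation. Its keys mirror the per-box dictionary that matplotlib's Axes.bxp accepts.

```python
import math


def percentile(sorted_vals, p):
    """Linearly interpolated percentile (0 <= p <= 100) of a sorted list."""
    pos = (len(sorted_vals) - 1) * p / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def box_stats(values, whis=1.5):
    """Hypothetical helper: the few numbers needed to draw one notched box."""
    vals = sorted(values)
    n = len(vals)
    q1, med, q3 = (percentile(vals, p) for p in (25, 50, 75))
    iqr = q3 - q1
    # Whiskers end at the most extreme data points within whis * IQR of the box
    whislo = min(v for v in vals if v >= q1 - whis * iqr)
    whishi = max(v for v in vals if v <= q3 + whis * iqr)
    return {
        "mean": sum(vals) / n,
        "q1": q1, "med": med, "q3": q3,
        "whislo": whislo, "whishi": whishi,
        # Notch: approximate confidence interval around the median
        "cilo": med - 1.57 * iqr / math.sqrt(n),
        "cihi": med + 1.57 * iqr / math.sqrt(n),
        # Points beyond the whiskers are drawn individually as outliers
        "fliers": [v for v in vals if v < whislo or v > whishi],
    }


# Ten illustrative petal lengths (cm) standing in for one column
stats = box_stats([1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.1])
print(stats)
```

However many rows the column has, the result stays this small, which is why computing it where the data lives saves so much data movement.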