In my previous post, we introduced some basic features of OML4Py. In this post, I'll illustrate using some in-database machine learning algorithms.
Creating in-database predictive models
In this example, we use the Random Forest algorithm with an optional cost matrix to predict Species from the IRIS table. First, we get a DataFrame proxy object for the IRIS table and use the overloaded split function to obtain training and test data sets, which are also proxy objects (the data remains in the database). We then create and display a demo cost matrix, purely for illustration, before building a Random Forest model. Note that the cost matrix is itself a named database table, RF_COST.
import oml
import pandas as pd

IRIS = oml.sync(table = "IRIS")         # obtain proxy object
IRIS_TRAIN, IRIS_TEST = IRIS.split()    # split data using default ratio
TRAIN_x = IRIS_TRAIN.drop('Species')
TRAIN_y = IRIS_TRAIN['Species']

# Create a cost matrix table in the database
cost_matrix = [['setosa',     'setosa',     0],
               ['setosa',     'virginica',  0.2],
               ['setosa',     'versicolor', 0.8],
               ['virginica',  'virginica',  0],
               ['virginica',  'setosa',     0.5],
               ['virginica',  'versicolor', 0.5],
               ['versicolor', 'versicolor', 0],
               ['versicolor', 'setosa',     0.4],
               ['versicolor', 'virginica',  0.6]]
COST = oml.create(pd.DataFrame(cost_matrix,
                               columns = ['ACTUAL_TARGET_VALUE',
                                          'PREDICTED_TARGET_VALUE',
                                          'COST']),
                  "RF_COST")
COST.pull().pivot(index = 'ACTUAL_TARGET_VALUE',
                  columns = 'PREDICTED_TARGET_VALUE',
                  values = 'COST')
With the output:
PREDICTED_TARGET_VALUE  setosa  versicolor  virginica
ACTUAL_TARGET_VALUE
setosa                     0.0         0.8        0.2
versicolor                 0.4         0.0        0.6
virginica                  0.5         0.5        0.0
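To see how such a cost matrix influences classification, here is a small, self-contained sketch, using plain pandas with no database required. The class probabilities are made up for illustration; the decision rule picks the prediction that minimizes expected cost, which can differ from simply picking the most probable class:

```python
import pandas as pd

# Cost matrix from above: rows = actual class, columns = predicted class
cost = pd.DataFrame(
    [[0.0, 0.8, 0.2],
     [0.4, 0.0, 0.6],
     [0.5, 0.5, 0.0]],
    index=['setosa', 'versicolor', 'virginica'],    # actual
    columns=['setosa', 'versicolor', 'virginica'])  # predicted

# Hypothetical class probabilities for a single row to be scored
probs = pd.Series({'setosa': 0.0, 'versicolor': 0.48, 'virginica': 0.52})

# Expected cost of each candidate prediction: weight each column's
# costs by the probability of the corresponding actual class
expected_cost = cost.mul(probs, axis=0).sum(axis=0)

# Minimum expected cost picks versicolor even though
# virginica has the highest probability
decision = expected_cost.idxmin()
```

Here probs.idxmax() would return virginica, but the asymmetric costs make versicolor the cheaper bet, which is exactly the kind of bias the RF_COST table injects into the in-database model.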
To build our Random Forest model, we create an rf object, specifying a maximum tree depth of 4, and invoke fit on our training data with the cost matrix. To display model details, we print the object. Note that the Random Forest model also reports attribute importance.
from oml import rf

rf_mod = rf(tree_term_max_depth = '4')
rf_mod = rf_mod.fit(TRAIN_x, TRAIN_y, cost_matrix = COST)
rf_mod    # show model details
With the output:
Algorithm Name: Random Forest
Mining Function: CLASSIFICATION
Target: Species
Settings:
setting name setting value
0 ALGO_NAME ALGO_RANDOM_FOREST
1 CLAS_COST_TABLE_NAME "PYQUSER"."RF_COST"
2 CLAS_MAX_SUP_BINS 32
3 CLAS_WEIGHTS_BALANCED OFF
4 ODMS_DETAILS ODMS_ENABLE
5 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
6 ODMS_RANDOM_SEED 0
7 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
8 PREP_AUTO ON
9 RFOR_NUM_TREES 20
10 RFOR_SAMPLING_RATIO .5
11 TREE_IMPURITY_METRIC TREE_IMPURITY_GINI
12 TREE_TERM_MAX_DEPTH 4
13 TREE_TERM_MINPCT_NODE .05
14 TREE_TERM_MINPCT_SPLIT .1
15 TREE_TERM_MINREC_NODE 10
16 TREE_TERM_MINREC_SPLIT 20
Computed Settings:
setting name setting value
0 RFOR_MTRY 2
Global Statistics:
AVG_DEPTH AVG_NODECOUNT MAX_DEPTH MAX_NODECOUNT MIN_DEPTH \
0 3.25 5.6 4.0 6.0 4.0
MIN_NODECOUNT NUM_ROWS
0 4.0 102.0
Attributes:
SEPAL_LENGTH
SEPAL_WIDTH
PETAL_LENGTH
PETAL_WIDTH
Partition: NO
Importance:
ATTRIBUTE_NAME ATTRIBUTE_SUBNAME ATTRIBUTE_IMPORTANCE
0 PETAL_LENGTH None 0.584287
1 PETAL_WIDTH None 0.512901
2 SEPAL_LENGTH None 0.136424
3 SEPAL_WIDTH None 0.038918
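The importance values above can also be post-processed on the client once pulled into pandas. A minimal sketch, using the numbers printed above, that ranks attributes by their share of total importance:

```python
import pandas as pd

# Attribute importance values as reported by the model above
imp = pd.DataFrame({
    'ATTRIBUTE_NAME': ['PETAL_LENGTH', 'PETAL_WIDTH',
                       'SEPAL_LENGTH', 'SEPAL_WIDTH'],
    'ATTRIBUTE_IMPORTANCE': [0.584287, 0.512901, 0.136424, 0.038918]})

# Each attribute's share of total importance, sorted descending
imp['SHARE'] = imp['ATTRIBUTE_IMPORTANCE'] / imp['ATTRIBUTE_IMPORTANCE'].sum()
imp = imp.sort_values('SHARE', ascending=False).reset_index(drop=True)
```

As expected for iris, the petal measurements dominate: together they account for well over 80% of total importance.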
Use this Random Forest in-database model to make predictions
The rf_mod object is itself a proxy object for the corresponding in-database model. We predict using the IRIS_TEST proxy object, optionally specifying supplemental columns from the original data. Supplemental columns are useful for comparing actual target values against predicted values, or for including unique keys for subsequent row identification or joins with other tables. Recall that results from a relational database have no implicit row order unless the user specifies a sort (i.e., ORDER BY). Why? Sorting is often unnecessary and, depending on data size, potentially expensive. The predicted results in pred are also a DataFrame proxy object, since the result could be large (e.g., millions or billions of rows).
pred = rf_mod.predict(IRIS_TEST.drop('Species'),
                      supplemental_cols = IRIS_TEST[:, ['SEPAL_LENGTH',
                                                        'SEPAL_WIDTH',
                                                        'PETAL_LENGTH',
                                                        'Species']])
print("Shape:", pred.shape)
pred.head(10)
With the output:
Shape: (48, 5)
SEPAL_LENGTH SEPAL_WIDTH PETAL_LENGTH Species PREDICTION
0 4.9 3.0 1.4 setosa setosa
1 4.9 3.1 1.5 setosa setosa
2 4.8 3.4 1.6 setosa setosa
3 5.8 4.0 1.2 setosa setosa
4 5.1 3.5 1.4 setosa setosa
5 5.0 3.4 1.6 setosa setosa
6 5.2 3.5 1.5 setosa setosa
7 5.4 3.4 1.5 setosa setosa
8 4.9 3.1 1.5 setosa setosa
9 5.0 3.2 1.2 setosa setosa
Next, let's compute a confusion matrix for the model predictions to assess model accuracy. Using the overloaded crosstab function, we compute the confusion matrix between the actual values in Species and the predicted values in the PREDICTION column directly in the database.
res_ct = pred.crosstab('Species', 'PREDICTION', pivot = True)
print("Type:",type(res_ct))
print("Columns:",res_ct.columns)
res_ct.sort_values(by='Species')
With the output:
Type: <class 'oml.core.frame.DataFrame'>
Columns: ['Species', 'count_(setosa)', 'count_(versicolor)', 'count_(virginica)']
Species count_(setosa) count_(versicolor) count_(virginica)
0 setosa 16 0 0
1 versicolor 0 13 2
2 virginica 0 0 17
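From the confusion matrix above, overall accuracy and per-class recall follow directly. A small client-side sketch, using plain pandas and the counts printed above:

```python
import numpy as np
import pandas as pd

# Confusion matrix as printed above: rows = actual, columns = predicted
cm = pd.DataFrame([[16,  0,  0],
                   [ 0, 13,  2],
                   [ 0,  0, 17]],
                  index=['setosa', 'versicolor', 'virginica'],
                  columns=['setosa', 'versicolor', 'virginica'])

# Overall accuracy: correct predictions (diagonal) over all predictions
accuracy = np.trace(cm.values) / cm.values.sum()

# Per-class recall: each diagonal entry over its actual class's row total
recall = pd.Series(np.diag(cm.values), index=cm.index) / cm.sum(axis=1)
```

The model misclassifies only two versicolor rows as virginica, giving an overall accuracy of 46/48, or about 96%, with perfect recall on setosa and virginica.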