Adding In-DB Scoring Demo #25

Merged on Jul 29, 2020 (13 commits)
137 changes: 137 additions & 0 deletions In_Db2_Machine_Learning/Building ML Models with Db2/README.md
@@ -0,0 +1,137 @@
# Instructions

This repository contains notebooks and datasets that allow Db2 customers to build ML models with IBM Db2's in-database machine learning capabilities.

# Table of Contents
1. [Prerequisites](#Prerequisites)
2. [Downloading the Datasets](#Downloads)
3. [Loading the Dataset into a Db2 Table](#Loading)
4. [Notebook-specific requirements](#Notebook-specific)
5. [Troubleshooting](#Troubleshooting)
6. [Other Resources](#Resources)

## 1. Prerequisites <a name="Prerequisites"></a>

You must meet the following requirements to use the machine learning functionality in Db2:
- Install the `ibm_db` Python package
- Enable IDAX Stored Procedures for ML in your Db2 instance

### 1.1 Installing the ibm_db Python package

Follow the documentation [here](https://github.com/ibmdb/python-ibmdb#-installation) to install the `ibm_db` Python package, which lets you connect to and communicate with your Db2 instance.
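
Once the package is installed, a short connection test can confirm that it works. The following is a minimal sketch and not part of this repository: the helper names `build_conn_str` and `check_connection` are hypothetical, and the keyword layout assumes the `DATABASE=...;HOSTNAME=...;` connection-string style commonly used with `ibm_db`.

```python
def build_conn_str(database, hostname, port, uid, pwd):
    """Assemble an ibm_db-style connection string (hypothetical helper)."""
    parts = {
        "DATABASE": database,
        "HOSTNAME": hostname,
        "PORT": port,
        "PROTOCOL": "TCPIP",
        "UID": uid,
        "PWD": pwd,
    }
    return "".join(f"{key}={value};" for key, value in parts.items())


def check_connection(conn_str):
    """Connect, run a trivial query, and return the single value it yields."""
    # Imported lazily so that build_conn_str can be used without the driver.
    import ibm_db

    conn = ibm_db.connect(conn_str, "", "")
    try:
        stmt = ibm_db.exec_immediate(conn, "SELECT 1 FROM SYSIBM.SYSDUMMY1")
        row = ibm_db.fetch_tuple(stmt)
        return row[0]
    finally:
        ibm_db.close(conn)
```

Calling `check_connection(build_conn_str(...))` with the details of your own instance should succeed only if the driver is installed and the database is reachable; any exception raised by `ibm_db.connect` points to a driver or connectivity problem rather than to the notebooks.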


### 1.2 Enable IDAX Stored Procedures for ML in your Db2 instance

Please follow the documentation [here](https://www.ibm.com/support/knowledgecenter/SSEPGG_11.5.0/com.ibm.db2.luw.ml.doc/doc/ml_prereqs.html) to enable ML functionality in your Db2 instance.


## 2. Downloading the Datasets <a name="Downloads"></a>
### 2.1 Regression with GoSales

Download the file [GoSales.csv](Datasets/GoSales.csv) from the `Datasets` directory.

### 2.2 Classification with Titanic

Download the file [Titanic.csv](Datasets/Titanic.csv) from the `Datasets` directory.

## 3. Loading the Dataset into a Db2 Table <a name="Loading"></a>

To load the Titanic dataset into a Db2 table, run the following commands from a shell in which the Db2 command line processor is available:

```
# Start the Db2 instance (if it is not already running) and connect to the database
db2start
db2 connect to <database_name>

# Create the target table
db2 "CREATE TABLE <table_schema>.<table_name> (
    PASSENGERID INTEGER NOT NULL,
    SURVIVED INTEGER,
    PCLASS INTEGER,
    NAME VARCHAR(255),
    SEX VARCHAR(6),
    AGE DECIMAL(5,2),
    SIBSP INTEGER,
    PARCH INTEGER,
    TICKET VARCHAR(255),
    FARE DECIMAL(30,5),
    CABIN VARCHAR(255),
    EMBARKED VARCHAR(3),
    PRIMARY KEY (PASSENGERID))
    ORGANIZE BY ROW"

# Import the CSV file, skipping its header row
db2 "IMPORT FROM \"<full_path_to_csv>\" OF DEL SKIPCOUNT 1 INSERT INTO <table_schema>.<table_name> (PASSENGERID, SURVIVED, PCLASS, NAME, SEX, AGE, SIBSP, PARCH, TICKET, FARE, CABIN, EMBARKED)"
```

To load the GoSales dataset, follow the same steps with the GoSales table definition:

```
# Start the Db2 instance (if it is not already running) and connect to the database
db2start
db2 connect to <database_name>

# Create the target table
db2 "CREATE TABLE <table_schema>.<table_name> (
    ID INTEGER NOT NULL,
    GENDER VARCHAR(3),
    AGE INTEGER,
    MARITAL_STATUS VARCHAR(30),
    PROFESSION VARCHAR(30),
    IS_TENT INTEGER,
    PRODUCT_LINE VARCHAR(30),
    PURCHASE_AMOUNT DECIMAL(30, 5),
    PRIMARY KEY (ID))
    ORGANIZE BY ROW"

# Import the CSV file, skipping its header row
db2 "IMPORT FROM \"<full_path_to_csv>\" OF DEL SKIPCOUNT 1 INSERT INTO <table_schema>.<table_name> (ID, GENDER, AGE, MARITAL_STATUS, PROFESSION, IS_TENT, PRODUCT_LINE, PURCHASE_AMOUNT)"
```
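
Because the import commands above skip the first record and list the target columns in a fixed order, it can help to confirm that the CSV header lines up with the table definition before loading. The sketch below is illustrative only: the function name `header_mismatches` is hypothetical, and it assumes the header names in the CSV files match the table column names up to letter case. If they do not, the function only reports the differences.

```python
import csv

# Column order taken from the CREATE TABLE statements in this section.
TITANIC_COLUMNS = ["PASSENGERID", "SURVIVED", "PCLASS", "NAME", "SEX", "AGE",
                   "SIBSP", "PARCH", "TICKET", "FARE", "CABIN", "EMBARKED"]
GO_SALES_COLUMNS = ["ID", "GENDER", "AGE", "MARITAL_STATUS", "PROFESSION",
                    "IS_TENT", "PRODUCT_LINE", "PURCHASE_AMOUNT"]


def header_mismatches(csv_path, expected_columns):
    """Return (position, expected, found) tuples where the CSV header differs."""
    # utf-8-sig drops a leading byte-order mark if the file has one.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    found = [name.strip().upper() for name in header]

    mismatches = []
    for position in range(max(len(found), len(expected_columns))):
        expected = expected_columns[position] if position < len(expected_columns) else None
        actual = found[position] if position < len(found) else None
        if expected != actual:
            mismatches.append((position, expected, actual))
    return mismatches
```

An empty list from `header_mismatches("<full_path_to_csv>", GO_SALES_COLUMNS)` suggests the file has the expected number and order of columns; any tuples returned show which positions to inspect.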

## 4. Notebook-specific requirements <a name="Notebook-specific"></a>
### 4.1 Using the Classification Notebook
To use the [classification demo](Notebooks/Classification_Demo.ipynb) notebook, please ensure that the following Python libraries are installed in your development environment:
- [Pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html)
- [NumPy](https://pypi.org/project/numpy/)
- [IPython](https://ipython.org/install.html)
- [SciPy](https://www.scipy.org/install.html)
- [itertools](https://docs.python.org/3/library/itertools.html) (part of the Python standard library; no installation required)
- [Matplotlib](https://matplotlib.org/users/installing.html)
- [Seaborn](https://pypi.org/project/seaborn/#description)

Once the above prerequisites have been met, ensure that:
- The parameters in the connection string variable `conn_str` have been updated for your Db2 instance (cell 2)
- The value of the variable `schema` has been changed to the schema where the ML pipeline will be executed (cell 2)
- The value `DATA.TITANIC` in cells 8, 11, 14, 17, and 18 has been changed to the `<schema_name>.<table_name>` where the CSV data was loaded (section 3)

### 4.2 Using the Regression Notebook
To use the [regression demo](Notebooks/Regression_Demo.ipynb) notebook, please ensure that the following Python libraries are installed in your development environment:
- [Pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html)
- [Numpy](https://pypi.org/project/numpy/)
- [Matplotlib](https://matplotlib.org/users/installing.html)

Also make sure that you have the [InDBMLModules.py](lib/InDBMLModules.py) file in the same directory as your notebook.

Once the above prerequisites have been met, ensure that:
- The parameters in the connection string variable `conn_str` have been updated for your Db2 instance (cell 3)
- The value `DATA.GO_SALES` in cells 6, 10, and 11 has been changed to the `<schema_name>.<table_name>` where the CSV data was loaded (section 3)

## 5. Troubleshooting <a name="Troubleshooting"></a>

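When using a Jupyter notebook, some users may find that they are unable to import a module that has been successfully installed via pip. This usually means the notebook kernel runs a different Python interpreter or environment than the one pip installed into.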

Check `sys.executable` to see which Python and environment you're running in, and `sys.path` to see where it looks to import modules:

```
import sys
print(sys.executable)
print(sys.path)
```

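If the package was installed for a different interpreter than the one shown in `sys.executable`, install it again using that interpreter (for example, by running that interpreter with `-m pip install <package>`), or add the directory that contains the installed package to `sys.path`:
`sys.path.append('/path/to/site-packages')`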
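
To see which modules the current kernel can actually find, the check above can be extended with `importlib.util.find_spec` from the standard library. This is a small sketch with hypothetical helper names (`locate_module`, `environment_report`); it only handles top-level module names.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the location a module would be imported from, or None if it cannot be located."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


def environment_report(module_names):
    """Collect the interpreter path and the import location of each module."""
    report = {"interpreter": sys.executable}
    for name in module_names:
        report[name] = locate_module(name)
    return report
```

For example, running `environment_report(["ibm_db", "pandas"])` in a notebook cell shows the kernel's interpreter alongside each module's location; a value of `None` indicates the module is not visible to that interpreter.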

## 6. Other Resources <a name="Resources"></a>

Find step-by-step demonstrations here:
- [Classification with Db2](https://youtu.be/jCgschThiRQ)
- [Linear Regression with Db2](https://youtu.be/RpX0iHL97dc)

See also the [Db2 Machine Learning documentation](https://www.ibm.com/support/knowledgecenter/SSEPGG_11.5.0/com.ibm.db2.luw.ml.doc/doc/ml_prereqs.html).
@@ -0,0 +1,42 @@
<report xmlns="http://developer.cognos.com/schemas/report/15.4/" expressionLocale="en-us" useStyleVersion="11.5">

<drillBehavior/>
<layouts>
<layout>
<reportPages>
<page name="Page1">
<style>
<defaultStyles>
<defaultStyle refStyle="pg"/>
</defaultStyles>
</style>
<pageBody>
<style>
<defaultStyles>
<defaultStyle refStyle="pb"/>
</defaultStyles>
<CSS value="padding-top:15px;padding-left:15px"/></style>
<contents><table><style><defaultStyles><defaultStyle refStyle="tb"/></defaultStyles><CSS value="border-collapse:collapse;width:100%"/></style><tableRows><tableRow><tableCells><tableCell><contents><textItem><dataSource><staticValue>Gender:</staticValue></dataSource></textItem></contents></tableCell><tableCell colSpan="4"><contents><selectValue parameter="pGender" refQuery="Query2" autoSubmit="true"><useItem refDataItem="GENDER"><displayItem refDataItem="GENDER1"/></useItem><defaultSelections><defaultSimpleSelection>M</defaultSimpleSelection></defaultSelections></selectValue></contents></tableCell></tableCells></tableRow><tableRow><tableCells><tableCell><contents><textItem><dataSource><staticValue>Age:</staticValue></dataSource></textItem></contents><style><CSS value="width:25%"/></style></tableCell><tableCell><contents><textBox parameter="pAge" numbersOnly="true"><defaultSelections><defaultSimpleSelection>33</defaultSimpleSelection></defaultSelections></textBox></contents><style><CSS value="width:25%"/></style></tableCell><tableCell colSpan="2"><contents/><style><CSS value="width:50%;text-align:left;vertical-align:top"/></style></tableCell></tableCells></tableRow><tableRow><tableCells><tableCell><contents><textItem><dataSource><staticValue>Marital Status: </staticValue></dataSource></textItem></contents></tableCell><tableCell colSpan="4"><contents><selectValue parameter="pMaritalStatus" refQuery="Query3" autoSubmit="true"><useItem refDataItem="MARITAL_STATUS"><displayItem refDataItem="MARITAL_STATUS1"/></useItem><defaultSelections><defaultSimpleSelection>Married</defaultSimpleSelection></defaultSelections></selectValue></contents></tableCell></tableCells></tableRow><tableRow><tableCells><tableCell><contents><textItem><dataSource><staticValue>Profession:</staticValue></dataSource></textItem></contents></tableCell><tableCell colSpan="4"><contents><selectValue parameter="pProfession" refQuery="Query4" autoSubmit="true"><useItem refDataItem="PROFESSION"><displayItem 
refDataItem="PROFESSION1"/></useItem><defaultSelections><defaultSimpleSelection>Professional</defaultSimpleSelection></defaultSelections></selectValue></contents></tableCell></tableCells></tableRow><tableRow><tableCells><tableCell><contents><textItem><dataSource><staticValue>Has purchased tent before:</staticValue></dataSource></textItem></contents></tableCell><tableCell colSpan="4"><contents><selectValue parameter="pIsTent" refQuery="Query5" autoSubmit="true"><useItem refDataItem="IS_TENT"><displayItem refDataItem="IS_TENT1"/></useItem><defaultSelections><defaultSimpleSelection>1</defaultSimpleSelection></defaultSelections></selectValue></contents></tableCell></tableCells></tableRow><tableRow><tableCells><tableCell><contents><textItem><dataSource><staticValue>Product line:</staticValue></dataSource></textItem></contents></tableCell><tableCell><contents><selectValue parameter="pProductLine" refQuery="Query6" autoSubmit="true"><useItem refDataItem="PRODUCT_LINE"><displayItem refDataItem="PRODUCT_LINE1"/></useItem><defaultSelections><defaultSimpleSelection>Camping Equipment</defaultSimpleSelection></defaultSelections></selectValue></contents><style><CSS value="width:33%"/></style></tableCell><tableCell><contents/><style><CSS value="width:33%;text-align:left;vertical-align:top"/></style></tableCell></tableCells></tableRow><tableRow><tableCells><tableCell><contents><table><style><defaultStyles><defaultStyle refStyle="tb"/></defaultStyles><CSS value="border-collapse:collapse;height:100%;width:100%;font-weight:bold;font-style:italic"/></style><tableRows><tableRow><tableCells><tableCell><contents><table><style><defaultStyles><defaultStyle refStyle="tb"/></defaultStyles><CSS value="border-collapse:collapse;width:100%"/></style><tableRows><tableRow><tableCells><tableCell><contents><textItem><dataSource><staticValue>Estimated purchase amount: </staticValue></dataSource></textItem></contents><style><CSS 
value="text-align:right"/></style></tableCell><tableCell><contents><singleton name="Singleton1" refQuery="Query1">
<contents><textItem><dataSource><dataItemValue refDataItem="PURCHASE_AMOUNT"/></dataSource><style><dataFormat><currencyFormat currencyCode="USD" useIntlSymbol="false" decimalDelimiter="." decimalSize="2"/></dataFormat></style></textItem></contents>
</singleton></contents><style><CSS value="text-align:left"/></style></tableCell></tableCells></tableRow></tableRows></table></contents><style><CSS value="text-align:left;vertical-align:top;width:50%"/></style></tableCell></tableCells></tableRow></tableRows></table></contents><style><defaultStyles><defaultStyle refStyle="GuidedLayoutTopPadding"/><defaultStyle refStyle="GuidedLayoutRightPadding"/></defaultStyles></style></tableCell><tableCell><contents><promptButton type="reprompt">
<contents><textItem><dataSource><staticValue>Estimate</staticValue></dataSource></textItem></contents>
<style>
<defaultStyles>
<defaultStyle refStyle="bp"/>
</defaultStyles>
</style>
</promptButton></contents><style><CSS value="width:20%;text-align:left;vertical-align:top"/></style></tableCell><tableCell><contents/><style><CSS value="text-align:left;vertical-align:top"/></style></tableCell></tableCells></tableRow></tableRows></table></contents>
</pageBody>
<pageHeader><contents><table><style><CSS value="border-collapse:collapse;width:100%"/><defaultStyles><defaultStyle refStyle="tb"/></defaultStyles></style><tableRows><tableRow><tableCells><tableCell><contents><textItem><dataSource><staticValue>Predict purchase amount</staticValue></dataSource><style><CSS value="font-size:18pt"/></style></textItem></contents></tableCell></tableCells></tableRow><tableRow><tableCells><tableCell><contents/></tableCell></tableCells></tableRow></tableRows></table></contents><style><CSS value="padding-top:15px;padding-left:15px"/></style></pageHeader></page>
</reportPages>
</layout>
</layouts>
<queries><query name="Query1"><source><model/></source><selection><dataItem aggregate="total" name="PURCHASE_AMOUNT"><expression>[C].[C_In_DB_ML_Model].[PredictSP].[PURCHASE_AMOUNT]</expression><XMLAttributes><XMLAttribute output="no" name="RS_dataType" value="9"/><XMLAttribute output="no" name="RS_dataUsage" value="2"/></XMLAttributes></dataItem></selection></query><query name="Query2"><source><model/></source><selection><dataItem aggregate="none" name="GENDER"><expression>[C].[C_In_DB_ML_Model].[GO_Sales].[GENDER]</expression></dataItem><dataItem aggregate="none" sort="ascending" name="GENDER1"><expression>case [C].[C_In_DB_ML_Model].[GO_Sales].[GENDER]
when (&apos;M&apos;) then (&apos;Male&apos;)
when (&apos;F&apos;) then (&apos;Female&apos;)
else (null)
end</expression></dataItem></selection></query><query name="Query5"><source><model/></source><selection><dataItem aggregate="none" name="IS_TENT"><expression>[C].[C_In_DB_ML_Model].[GO_Sales].[IS_TENT]</expression></dataItem><dataItem aggregate="none" sort="ascending" name="IS_TENT1"><expression>case [C].[C_In_DB_ML_Model].[GO_Sales].[IS_TENT]
when (1) then(&apos;Yes&apos;)
when (0) then (&apos;No&apos;)
when (null) then (&apos;Not Specified&apos;)
end</expression></dataItem></selection></query><query name="Query3"><source><model/></source><selection><dataItem aggregate="none" name="MARITAL_STATUS"><expression>[C].[C_In_DB_ML_Model].[GO_Sales].[MARITAL_STATUS]</expression></dataItem><dataItem aggregate="none" sort="ascending" name="MARITAL_STATUS1"><expression>[C].[C_In_DB_ML_Model].[GO_Sales].[MARITAL_STATUS]</expression></dataItem></selection></query><query name="Query4"><source><model/></source><selection><dataItem aggregate="none" name="PROFESSION"><expression>[C].[C_In_DB_ML_Model].[GO_Sales].[PROFESSION]</expression></dataItem><dataItem aggregate="none" sort="ascending" name="PROFESSION1"><expression>[C].[C_In_DB_ML_Model].[GO_Sales].[PROFESSION]</expression></dataItem></selection></query><query name="Query6"><source><model/></source><selection><dataItem aggregate="none" name="PRODUCT_LINE"><expression>[C].[C_In_DB_ML_Model].[GO_Sales].[PRODUCT_LINE]</expression></dataItem><dataItem aggregate="none" sort="ascending" name="PRODUCT_LINE1"><expression>[C].[C_In_DB_ML_Model].[GO_Sales].[PRODUCT_LINE]</expression></dataItem></selection></query></queries><XMLAttributes><XMLAttribute output="no" name="RS_CreateExtendedDataItems" value="true"/><XMLAttribute output="no" name="listSeparator" value=","/><XMLAttribute output="no" name="decimalSeparator" value="."/><XMLAttribute output="no" name="RS_modelModificationTime" value="2020-06-23T14:56:20.507Z"/></XMLAttributes><classStyles><classStyle name="GuidedLayoutLeftPadding"><CSS value="padding-left:5px;border-top-width:1px;border-bottom-width:1px;border-left-width:1px;border-right-width:1px"/></classStyle><classStyle name="GuidedLayoutTopPadding"><CSS value="padding-top:5px;border-top-width:1px;border-bottom-width:1px;border-left-width:1px;border-right-width:1px"/></classStyle><classStyle name="GuidedLayoutRightPadding"><CSS value="padding-right:5px;border-top-width:1px;border-bottom-width:1px;border-left-width:1px;border-right-width:1px"/></classStyle><classStyle 
name="GuidedLayoutBottomPadding"><CSS value="padding-bottom:5px;border-top-width:1px;border-bottom-width:1px;border-left-width:1px;border-right-width:1px"/></classStyle><classStyle name="GuidedLayoutMargin"><CSS value="margin-bottom:10px"/></classStyle></classStyles><modelPath type="module">CAMID(&quot;BluePages:u:uid=0d5135649,c=ca,ou=bluepages&quot;)/folder[@name=&apos;My Folders&apos;]/folder[@name=&apos;in-DB ML&apos;]/module[@name=&apos;In-DB ML Model&apos;]</modelPath><reportName>GoSales_predict_report</reportName></report>