diff --git a/In_Db2_Machine_Learning/Datasets/GoSales.csv b/In_Db2_Machine_Learning/Building ML Models with Db2/Datasets/GoSales.csv similarity index 100% rename from In_Db2_Machine_Learning/Datasets/GoSales.csv rename to In_Db2_Machine_Learning/Building ML Models with Db2/Datasets/GoSales.csv diff --git a/In_Db2_Machine_Learning/Datasets/Titanic.csv b/In_Db2_Machine_Learning/Building ML Models with Db2/Datasets/Titanic.csv similarity index 100% rename from In_Db2_Machine_Learning/Datasets/Titanic.csv rename to In_Db2_Machine_Learning/Building ML Models with Db2/Datasets/Titanic.csv diff --git a/In_Db2_Machine_Learning/Notebooks/Classification_Demo.ipynb b/In_Db2_Machine_Learning/Building ML Models with Db2/Notebooks/Classification_Demo.ipynb similarity index 100% rename from In_Db2_Machine_Learning/Notebooks/Classification_Demo.ipynb rename to In_Db2_Machine_Learning/Building ML Models with Db2/Notebooks/Classification_Demo.ipynb diff --git a/In_Db2_Machine_Learning/Notebooks/Regression_Demo.ipynb b/In_Db2_Machine_Learning/Building ML Models with Db2/Notebooks/Regression_Demo.ipynb similarity index 100% rename from In_Db2_Machine_Learning/Notebooks/Regression_Demo.ipynb rename to In_Db2_Machine_Learning/Building ML Models with Db2/Notebooks/Regression_Demo.ipynb diff --git a/In_Db2_Machine_Learning/Building ML Models with Db2/README.md b/In_Db2_Machine_Learning/Building ML Models with Db2/README.md new file mode 100644 index 0000000..6a747c2 --- /dev/null +++ b/In_Db2_Machine_Learning/Building ML Models with Db2/README.md @@ -0,0 +1,137 @@ +# Instructions + +This repository contains notebooks and datasets that will allow Db2 customers build ML models with IBM Db2's in-database machine learning capabilities. + +# Table of Contents +1. [Prerequistes](#Prerequisites) +2. [Downloading the Dataset](#Downloads) +3. [Loading the Dataset into a Db2 Table](#Loading) +4. [Notebook-specific requirements](#Notebook-specific) +5. [Troubleshooting](#Troubleshooting) +6. [Other Resources](#Resources) + +## 1. Prerequisites + +You must meet the following requirements to use the machine learning functionality in Db2: +- Install the `ibm_db` python package +- Enable IDAX Stored Procedures for ML in your Db2 instance + +### 1.1 Installing the ibm_db python package + +Please follow the documentation [here](https://github.com/ibmdb/python-ibmdb#-installation) to install the `ibm_db` python package. This will allow you to connect to and communicate with your Db2 instance. + + +### 1.2 Enable IDAX Stored Proceduces for ML in your Db2 instance + +Please follow the documentation [here](https://www.ibm.com/support/knowledgecenter/SSEPGG_11.5.0/com.ibm.db2.luw.ml.doc/doc/ml_prereqs.html) to enable ML functionality in your Db2 instance. + + +## 2. Downloading the Datasets +### 2.1 Regression with GoSales + +Download the file [GoSales.csv](Datasets/GoSales.csv) from the `Datasets` directory + +### 2.2 Classification with Titanic + +Download the file [Titanic.csv](Datasets/Titanic.csv) from the `Datasets` directory + +## 3. Loading the Dataset into a Db2 Table + +To load the TITANIC dataset into your Db2 table: + +``` +db2 start +db2 connect to + +CREATE TABLE . ( +PASSENGERID INTEGER NOT NULL, +SURVIVED INTEGER, +PCLASS INTEGER, +NAME VARCHAR(255), +SEX VARCHAR(6), +AGE DECIMAL(5,2), +SIBSP INTEGER, +PARCH INTEGER, +TICKET VARCHAR(255), +FARE DECIMAL(30,5), +CABIN VARCHAR(255), +EMBARKED VARCHAR(3), +PRIMARY KEY (PASSENGERID)) +ORGANIZE BY ROW; + +db2 IMPORT FROM "" OF DEL skipcount 1 INSERT INTO +.(PASSENGERID, SURVIVED, PCLASS, NAME, SEX, AGE, SIBSP, PARCH, TICKET, FARE, CABIN, EMBARKED) +``` + +For loading the GO_SALES data you can take the following steps: + +``` +db2start +connect to + +CREATE TABLE . ( +ID INTEGER NOT NULL, +GENDER VARCHAR(3), +AGE INTEGER, +MARITAL_STATUS VARCHAR(30), +PROFESSION VARCHAR(30), +IS_TENT INTEGER, +PRODUCT_LINE VARCHAR(30), +PURCHASE_AMOUNT DECIMAL(30, 5), +PRIMARY KEY (ID)) +ORGANIZE BY ROW; + +IMPORT FROM "" OF DEL skipcount 1 INSERT INTO +.(ID, GENDER, AGE, MARITAL_STATUS, PROFESSION, IS_TENT, PRODUCT_LINE, PURCHASE_AMOUNT) +``` + +## 4. Notebook-specific requirements +### 4.1 Using the Classification Notebook +To use the [classification demo](Notebooks/Classification_Demo.ipynb) notebook, please ensure that the following Python libraries are installed in your development environment: +- [Pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html) +- [Numpy](https://pypi.org/project/numpy/) +- [IPython](https://ipython.org/install.html) +- [Scipy](https://www.scipy.org/install.html) +- [Itertools](https://docs.python.org/3/library/itertools.html) +- [Matplotlib](https://matplotlib.org/users/installing.html) +- [Seaborn](https://pypi.org/project/seaborn/#description) + +Once the above prerequisites have been met, ensure that: +- The parameters in the connection string variable `conn_str` have been changed to your particular Db2 instance (cell 2) +- The value of the variable `schema` has been changed to the appropriate schema where the ML pipeline will be executed (cell 2) +- The value `DATA.TITANIC` in cells 8, 11, 14, 17, and 18 is changed to the `.` where the csv data was loaded (section 3) + +### 4.2 Using the Regression Notebook +To use the [regression demo](Notebooks/Regression_Demo.ipynb) notebook, please ensure that the following Python libraries are installed in your development environment: +- [Pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html) +- [Numpy](https://pypi.org/project/numpy/) +- [Matplotlib](https://matplotlib.org/users/installing.html) + +Also make sure that you have the [InDBMLModules.py](lib/InDBMLModules.py) file in the same directory as your notebook. + +Once the above prerequisites have been met, ensure that: +- The parameters in the connection string variable `conn_str` have been changed to your particular Db2 instance (cell 3) +- The value `DATA.GO_SALES` in cells 6,10, and 11 is changed to the `.` where the csv data was loaded (section 3) + +## 5. Troubleshooting + +When using a jupyter notebook, some users may find that they are unable to import a module that has been successfully installed via pip. + +Check `sys.executable` to see which Python and environment you're running in, and `sys.path` to see where it looks to import modules: + +``` +import sys +print(sys.executable) +print(sys.path) +``` + +If the path in `sys.executable` is not in `sys.path`, you can add it using the following: +`sys.path.append('/path/from/sys.executable')` + +## 6. Demo Videos + +Find step-by-step demonstrations here: +- [Classification with Db2](https://youtu.be/jCgschThiRQ) +- [Linear Regression with Db2](https://youtu.be/RpX0iHL97dc) + +Db2 Machine Learning [Documentation](https://www.ibm.com/support/knowledgecenter/SSEPGG_11.5.0/com.ibm.db2.luw.ml.doc/doc/ml_prereqs.html) \ No newline at end of file diff --git a/In_Db2_Machine_Learning/lib/InDBMLModules.py b/In_Db2_Machine_Learning/Building ML Models with Db2/lib/InDBMLModules.py similarity index 100% rename from In_Db2_Machine_Learning/lib/InDBMLModules.py rename to In_Db2_Machine_Learning/Building ML Models with Db2/lib/InDBMLModules.py diff --git a/In_Db2_Machine_Learning/Building a Scoring Pipeline with Db2/Cognos_Report_Specs.txt b/In_Db2_Machine_Learning/Building a Scoring Pipeline with Db2/Cognos_Report_Specs.txt new file mode 100644 index 0000000..c85009c --- /dev/null +++ b/In_Db2_Machine_Learning/Building a Scoring Pipeline with Db2/Cognos_Report_Specs.txt @@ -0,0 +1,42 @@ + + + + + + + + + + + Gender:MAge:33Marital Status: MarriedProfession:ProfessionalHas purchased tent before:1Product line:Camping Equipment
Estimated purchase amount: + +
+ Estimate + + +
+ Predict purchase amount
+
+
+
+ [C].[C_In_DB_ML_Model].[PredictSP].[PURCHASE_AMOUNT][C].[C_In_DB_ML_Model].[GO_Sales].[GENDER]case [C].[C_In_DB_ML_Model].[GO_Sales].[GENDER] +when ('M') then ('Male') +when ('F') then ('Female') +else (null) +end[C].[C_In_DB_ML_Model].[GO_Sales].[IS_TENT]case [C].[C_In_DB_ML_Model].[GO_Sales].[IS_TENT] +when (1) then('Yes') +when (0) then ('No') +when (null) then ('Not Specified') +end[C].[C_In_DB_ML_Model].[GO_Sales].[MARITAL_STATUS][C].[C_In_DB_ML_Model].[GO_Sales].[MARITAL_STATUS][C].[C_In_DB_ML_Model].[GO_Sales].[PROFESSION][C].[C_In_DB_ML_Model].[GO_Sales].[PROFESSION][C].[C_In_DB_ML_Model].[GO_Sales].[PRODUCT_LINE][C].[C_In_DB_ML_Model].[GO_Sales].[PRODUCT_LINE]CAMID("BluePages:u:uid=0d5135649,c=ca,ou=bluepages")/folder[@name='My Folders']/folder[@name='in-DB ML']/module[@name='In-DB ML Model']GoSales_predict_report
\ No newline at end of file diff --git a/In_Db2_Machine_Learning/Building a Scoring Pipeline with Db2/Dashboard.png b/In_Db2_Machine_Learning/Building a Scoring Pipeline with Db2/Dashboard.png new file mode 100644 index 0000000..744976b Binary files /dev/null and b/In_Db2_Machine_Learning/Building a Scoring Pipeline with Db2/Dashboard.png differ diff --git a/In_Db2_Machine_Learning/Building a Scoring Pipeline with Db2/Dashboard2.png b/In_Db2_Machine_Learning/Building a Scoring Pipeline with Db2/Dashboard2.png new file mode 100644 index 0000000..3261e46 Binary files /dev/null and b/In_Db2_Machine_Learning/Building a Scoring Pipeline with Db2/Dashboard2.png differ diff --git a/In_Db2_Machine_Learning/Building a Scoring Pipeline with Db2/Dashboard3.png b/In_Db2_Machine_Learning/Building a Scoring Pipeline with Db2/Dashboard3.png new file mode 100644 index 0000000..e39098e Binary files /dev/null and b/In_Db2_Machine_Learning/Building a Scoring Pipeline with Db2/Dashboard3.png differ diff --git a/In_Db2_Machine_Learning/Building a Scoring Pipeline with Db2/Gear.png b/In_Db2_Machine_Learning/Building a Scoring Pipeline with Db2/Gear.png new file mode 100644 index 0000000..b80d8f3 Binary files /dev/null and b/In_Db2_Machine_Learning/Building a Scoring Pipeline with Db2/Gear.png differ diff --git a/In_Db2_Machine_Learning/Building a Scoring Pipeline with Db2/README.md b/In_Db2_Machine_Learning/Building a Scoring Pipeline with Db2/README.md new file mode 100644 index 0000000..5e7e616 --- /dev/null +++ b/In_Db2_Machine_Learning/Building a Scoring Pipeline with Db2/README.md @@ -0,0 +1,150 @@ +# Instructions + +This folder contains assets that will allow Db2 customers to use in-Db2 machine learning functionality for single row scoring inside the database. + +## Table of Contents +1. [Prerequisites](#Prerequisites) +2. [Create the scoring pipeline stored procedure](#CreateSP) +3. [Calling the stored procedure from Cognos Analytics](#Cognos) +4. [Other resources](#Resources) + +## 1. Prerequisites + +You must meet the following requirements to use the machine learning functionality in Db2: +- Enable IDAX Stored Procedures for ML in your Db2 instance +- Have a linear regression model trained based on the GoSales dataset + +### 1.1 Enable IDAX Stored Proceduces for ML in your Db2 instance +Please follow the documentation [here](https://www.ibm.com/support/knowledgecenter/SSEPGG_11.5.0/com.ibm.db2.luw.ml.doc/doc/ml_prereqs.html) to enable ML functionality in your Db2 instance. + +### 1.2 Train a linear regression model to predict the PURCHASE_AMOUNT using the GoSales dataset + +Please follow the steps provided [here](../Building%20ML%20Models%20with%20Db2/README.md) to download the GoSales dataset and load it into your Db2 database. Then use this [notebook](../Building%20ML%20Models%20with%20Db2/Notebooks/Regression_Demo.ipynb) to pre-process the data, collect and store the statistics, train and deploy a linear regression model in your database. + +Once you take these steps, verify that the following tables are created and contain values: +- GO_SALES_SUM1000_NUM (statistics for the numerical columns) +- GO_SALES_SUM1000_CHAR (statistics for the nominal columns) +- GSLINREG_MODEL (model table) + +## 2. Create the scoring pipeline stored procedure +### 2.1 Copy the stored procedure file + +Copy the file [scoring_pipeline](scoring_pipeline) to your working directory. The steps needed to write the code in this file in described in this demo. + +### 2.2 Create the stored procedure +Execute the following command to create or update the stored procedure: + +``` +db2 -td@ -vf scoring_pipeline +``` + +The @ after -td option indicates the last character in the file, which is called the terminating character. The -v option indicates using the optional verbosity which will cause each SQL statement or command in the script to be displayed to the screen as it is run, along with any output that results from its execution. The -f option indicates that the target of the command is a file. + +### 2.3 Authorize the users who need to execute the pipeline +Once the stored procedure is created, we can call it from either Db2 command line or from a third-party application that is connected to the database, using credentials with the proper authorizations. Specifically, the credentials used must have the EXECUTE privilege on the created stored procedure. Execute the following command to grant execute privilege to `` on your stored procedure `` : + +``` +db2 "grant execute on procedure to user " +``` + +### 2.4 Test your stored procedure +Call your stored procedure using the following command: +db2 "call pipe(1, 'M', 45, 'Single', NULL, 0, NULL)" + +You should see the result printed like this: + +``` + + Result set 1 + -------------- + + PURCHASE_AMOUNT + ------------------------ + +1.21634933813948E+002 + + 1 record(s) selected. + + Return Status = 0 + +``` + +## 3. Calling the stored procedure from Cognos Analytics + +We use Cognos Analytics platform in this demo, but this solution would generalize to different applications that need to interact with Machine Learning models from Db2. In any other use-cases, you would need to: +1) Establish a connection to your Db2 database +2) Call the stored procedure by passing the input parameters +3) Retrieve and use the result which is returned as a table with PURCHASE_AMOUNT as the only column. + +In this demo we call the scoring pipeline from a Cognos Analytics report that connects to Db2 using a JDBC connection. The dashboard receives the input parameters from the user and passes them to the stored procedure. The stored procedure computes and returns the result that is then presented to the user in the dashboard. The following sections describe the steps needed to implement this solution in Cognos Analytics. + +### 3.1. Create a connection to your Db2 database +Follow the instructions provided [here](https://www.ibm.com/support/knowledgecenter/SSEP7J_11.1.0/com.ibm.swg.ba.cognos.ug_cra.doc/c_db2_ds.html#DB2_ds) to create a data source connection in your Cognos Analytics environment. + +### 3.2. Create a data module +Follow the instructions provided [here](https://www.ibm.com/support/knowledgecenter/SSEP7J_11.0.0/com.ibm.swg.ba.cognos.ca_mdlg.doc/t_ca_mldg_src_dataserver.html) to create a data module using the connection to your Db2 database. + +### 3.3. Add the scoring stored procedure to your data module +In your data module, take the following steps to add the scoring stored procedure. In your data module, click on the “…” right to the data module root node. From the menu select *New Table using SQL…* + +![Dashboard](Dashboard.png) + +In the *Edit Table* window, add the following command to the *Expression* box: + +``` +CALL ( 1, + #prompt('pGender' +,'varchar(10)' +,'') # +, + #prompt('pAge' +,'integer' +,'') # +, + #prompt('pMaritalStatus' +,'varchar(50)' +,'') # +, + #prompt('pProfession' +,'varchar(50)' +,'') # +, + #prompt('pIsTent' +,'smallint' +,'') # +, + #prompt('pProductLine' +,'varchar(50)' +,'') # + +) ; +``` + +In the above command, replace `` with your stored procedure name. +The `#prompt()#` blocks make the command parametric so that the input values can be passed from a report. +In `#prompt(,,)# `, `` is the parameter name which will be used for authoring a report, `` is the data type as declared in the stored procedure, and `` is the default value to be passed. Let us call this table *PredictSP*. + +### 3.5. Set the SQL Type +From the SQL Type drop down menu select “Pass-Through SQL”. Click OK. Click Save. + +![Dashboard2](Dashboard2.png) + +### 3.6. Add the table containing the training data +In your data module, add the table from your database that contains the training data. Let us call this table *GO_Sales*. This table will be used to create pre-populate the prompts in the report. + +### 3.7. Use the created table in a report +Open a blank report based on the data module you just created and take the following steps: +- Add the *PURCHASE_AMOUNT* from the table you created from the stored procedure *(PredictSP)* to your report as a *Singleton*. This shows the prediction returned from Db2 in your report. +- Add a *value prompt control* to your report for each input parameter. These controls receive the model input values from the user. +- For each prompt create a query that points to the relevant column of the table containing the training data (Go_Sales). Then in the *Properties* tab, choose that column as *Use Value* and *Display Value* properties. This will pre-populate the prompt when the report runs. +- Add a *Prompt button*. When the report runs, by clicking this button you pass the parameters from the prompt controls to the stored procedure. +- Click *Save*. + +Alternatively, you can copy the report specification provided [here](Cognos_Report_Specs.txt) into your clipboard, and then from *More* menu( ![Gear](Gear.png) ) select *Open report from clipboard*. Click *Save*. + +![Dashboard3](Dashboard3.png) + +## 4. Other resources +Find step-by-step demonstrations for creating the scoring pipeline stored procedure here: +- Building a scoring pipeline inside Db2 + +Db2 Machine Learning [Documentation](https://www.ibm.com/support/knowledgecenter/SSEPGG_11.5.0/com.ibm.db2.luw.ml.doc/doc/ml_prereqs.html) diff --git a/In_Db2_Machine_Learning/Building a Scoring Pipeline with Db2/scoring_pipeline b/In_Db2_Machine_Learning/Building a Scoring Pipeline with Db2/scoring_pipeline new file mode 100644 index 0000000..58ab79c --- /dev/null +++ b/In_Db2_Machine_Learning/Building a Scoring Pipeline with Db2/scoring_pipeline @@ -0,0 +1,50 @@ +CREATE OR REPLACE PROCEDURE mingpang.pipe(IN ID_VALUE INT, IN GENDER_VALUE VARCHAR(10),IN AGE_VALUE DOUBLE, IN MARITAL_STATUS_VALUE VARCHAR(50), IN PROFESSION_VALUE VARCHAR(50),IN IS_TENT_VALUE SMALLINT, IN PRODUCT_LINE_VALUE VARCHAR(50)) +LANGUAGE SQL +DYNAMIC RESULT SETS 1 +MODIFIES SQL DATA +BEGIN + +-- Declare block +declare myqry varchar(1024); +declare callsp varchar(1024); +declare dropt varchar(1024); +declare tname varchar(128); +declare cretemptab varchar(1024); +declare mystmt statement; +DECLARE C1 CURSOR WITH RETURN FOR mystmt; + +-- Delare an input table +DECLARE GLOBAL TEMPORARY TABLE SESSION.INPUT(ID INT GENERATED BY DEFAULT AS IDENTITY NOT NULL,GENDER VARCHAR(10),STD_AGE DOUBLE,MARITAL_STATUS VARCHAR(50),PROFESSION VARCHAR(50),IS_TENT SMALLINT,PRODUCT_LINE VARCHAR(50)) ON COMMIT PRESERVE ROWS WITH REPLACE; +-- Insert values into the table +INSERT INTO SESSION.INPUT(ID, GENDER, STD_AGE, MARITAL_STATUS, PROFESSION, IS_TENT, PRODUCT_LINE) + VALUES(ID_VALUE, GENDER_VALUE, AGE_VALUE, MARITAL_STATUS_VALUE, PROFESSION_VALUE, IS_TENT_VALUE, PRODUCT_LINE_VALUE); + +-- Null impute numerical columns with mean +UPDATE SESSION.INPUT SET STD_AGE = (SELECT AVERAGE FROM LINREG.GSTRAIN_STATS_NUM WHERE COLUMNNAME='AGE') WHERE STD_AGE IS NULL; +-- Null impute nominal with most frequent +UPDATE SESSION.INPUT SET GENDER = (SELECT MOSTFREQUENTVALUE FROM LINREG.GSTRAIN_STATS_CHAR WHERE COLNAME='GENDER') WHERE GENDER IS NULL; +UPDATE SESSION.INPUT SET MARITAL_STATUS = (SELECT MOSTFREQUENTVALUE FROM LINREG.GSTRAIN_STATS_CHAR WHERE COLNAME='MARITAL_STATUS') WHERE MARITAL_STATUS IS NULL; +UPDATE SESSION.INPUT SET PROFESSION = (SELECT MOSTFREQUENTVALUE FROM LINREG.GSTRAIN_STATS_CHAR WHERE COLNAME='PROFESSION') WHERE PROFESSION IS NULL; +UPDATE SESSION.INPUT SET PRODUCT_LINE = (SELECT MOSTFREQUENTVALUE FROM LINREG.GSTRAIN_STATS_CHAR WHERE COLNAME='PRODUCT_LINE') WHERE PRODUCT_LINE IS NULL; +UPDATE SESSION.INPUT SET IS_TENT = (SELECT MOSTFREQUENTVALUE FROM LINREG.GSTRAIN_STATS_CHAR WHERE COLNAME='IS_TENT') WHERE IS_TENT IS NULL; +-- Standardize Age +UPDATE SESSION.INPUT SET STD_AGE = ((CAST(STD_AGE AS FLOAT) - (SELECT AVERAGE FROM LINREG.GSTRAIN_STATS_NUM WHERE COLUMNNAME='AGE'))/(SELECT STDDEV FROM LINREG.GSTRAIN_STATS_NUM WHERE COLUMNNAME='AGE')); + +-- randomly generate output table name +set tname = TRANSLATE ( CHAR(BIGINT(RAND() * 10000000000 )), 'abcdefghij', '1234567890' ) ; + +-- Call predict stored procedure +set callsp = 'CALL IDAX.PREDICT_LINEAR_REGRESSION(''model=LINREG.GSLINREG, intable=SESSION.INPUT, outtable =LINREG.'||tname||', id=ID'')'; +execute immediate callsp; + +-- Retrieve the prediction result +set cretemptab = 'DECLARE GLOBAL TEMPORARY TABLE SESSION.TEMPTAB AS (select * from LINREG.'||tname||') with data ON COMMIT PRESERVE ROWS WITH REPLACE'; +execute immediate cretemptab; + +set myqry = 'SELECT PURCHASE_AMOUNT FROM SESSION.TEMPTAB'; +prepare mystmt from myqry; +open c1; + +set dropt = 'DROP TABLE LINREG.'||tname; +execute immediate dropt; +END@ \ No newline at end of file diff --git a/In_Db2_Machine_Learning/README.md b/In_Db2_Machine_Learning/README.md index 6da86a6..e799ed6 100644 --- a/In_Db2_Machine_Learning/README.md +++ b/In_Db2_Machine_Learning/README.md @@ -1,137 +1,5 @@ -# Instructions - This repository contains notebooks and datasets that will allow Db2 customers to use in-Db2 machine learning functionality. -# Table of Contents -1. [Prerequistes](#Prerequisites) -2. [Downloading the Dataset](#Downloads) -3. [Loading the Dataset into a Db2 Table](#Loading) -4. [Notebook-specific requirements](#Notebook-specific) -5. [Troubleshooting](#Troubleshooting) -6. [Other Resources](#Resources) - -## 1. Prerequisites - -You must meet the following requirements to use the machine learning functionality in Db2: -- Install the `ibm_db` python package -- Enable IDAX Stored Procedures for ML in your Db2 instance - -### 1.1 Installing the ibm_db python package - -Please follow the documentation [here](https://github.com/ibmdb/python-ibmdb#-installation) to install the `ibm_db` python package. This will allow you to connect to and communicate with your Db2 instance. - - -### 1.2 Enable IDAX Stored Proceduces for ML in your Db2 instance - -Please follow the documentation [here](https://www.ibm.com/support/knowledgecenter/SSEPGG_11.5.0/com.ibm.db2.luw.ml.doc/doc/ml_prereqs.html) to enable ML functionality in your Db2 instance. - - -## 2. Downloading the Datasets -### 2.1 Regression with GoSales - -Download the file [GoSales.csv](https://github.com/IBM/db2-samples/tree/master/In_Db2_Machine_Learning/Datasets/GoSales.csv) from the `Datasets` directory - -### 2.2 Classification with Titanic - -Download the file [Titanic.csv](https://github.com/IBM/db2-samples/tree/master/In_Db2_Machine_Learning/Datasets/Titanic.csv) from the `Datasets` directory - -## 3. Loading the Dataset into a Db2 Table - -To load the TITANIC dataset into your Db2 table: - -``` -db2 start -db2 connect to - -CREATE TABLE . ( -PASSENGERID INTEGER NOT NULL, -SURVIVED INTEGER, -PCLASS INTEGER, -NAME VARCHAR(255), -SEX VARCHAR(6), -AGE DECIMAL(5,2), -SIBSP INTEGER, -PARCH INTEGER, -TICKET VARCHAR(255), -FARE DECIMAL(30,5), -CABIN VARCHAR(255), -EMBARKED VARCHAR(3), -PRIMARY KEY (PASSENGERID)) -ORGANIZE BY ROW; - -db2 IMPORT FROM "" OF DEL skipcount 1 INSERT INTO -.(PASSENGERID, SURVIVED, PCLASS, NAME, SEX, AGE, SIBSP, PARCH, TICKET, FARE, CABIN, EMBARKED) -``` - -For loading the GO_SALES data you can take the following steps: - -``` -db2start -connect to - -CREATE TABLE . ( -ID INTEGER NOT NULL, -GENDER VARCHAR(3), -AGE INTEGER, -MARITAL_STATUS VARCHAR(30), -PROFESSION VARCHAR(30), -IS_TENT INTEGER, -PRODUCT_LINE VARCHAR(30), -PURCHASE_AMOUNT DECIMAL(30, 5), -PRIMARY KEY (ID)) -ORGANIZE BY ROW; - -IMPORT FROM "" OF DEL skipcount 1 INSERT INTO -.(ID, GENDER, AGE, MARITAL_STATUS, PROFESSION, IS_TENT, PRODUCT_LINE, PURCHASE_AMOUNT) -``` - -## 4 Notebook-specific requirements -### 4.1 Using the Classification Notebook -To use the [classification demo](https://github.com/IBM/db2-samples/tree/master/In_Db2_Machine_Learning/Notebooks/Classification_Demo.ipynb) notebook, please ensure that the following Python libraries are installed in your development environment: -- [Pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html) -- [Numpy](https://pypi.org/project/numpy/) -- [IPython](https://ipython.org/install.html) -- [Scipy](https://www.scipy.org/install.html) -- [Itertools](https://docs.python.org/3/library/itertools.html) -- [Matplotlib](https://matplotlib.org/users/installing.html) -- [Seaborn](https://pypi.org/project/seaborn/#description) - -Once the above prerequisites have been met, ensure that: -- The parameters in the connection string variable `conn_str` have been changed to your particular Db2 instance (cell 2) -- The value of the variable `schema` has been changed to the appropriate schema where the ML pipeline will be executed (cell 2) -- The value `DATA.TITANIC` in cells 8, 11, 14, 17, and 18 is changed to the `.` where the csv data was loaded (section 3) - -### 4.2 Using the Regression Notebook -To use the [regression demo](https://github.com/IBM/db2-samples/tree/master/In_Db2_Machine_Learning/Notebooks/Regression_Demo.ipynb) notebook, please ensure that the following Python libraries are installed in your development environment: -- [Pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html) -- [Numpy](https://pypi.org/project/numpy/) -- [Matplotlib](https://matplotlib.org/users/installing.html) - -Also make sure that you have the [InDBMLModules.py](https://github.com/IBM/db2-samples/tree/master/In_Db2_Machine_Learning/lib/InDBMLModules.py) file in the same directory as your notebook. - -Once the above prerequisites have been met, ensure that: -- The parameters in the connection string variable `conn_str` have been changed to your particular Db2 instance (cell 3) -- The value `DATA.GO_SALES` in cells 6,10, and 11 is changed to the `.` where the csv data was loaded (section 3) - -## 5. Troubleshooting - -When using a jupyter notebook, some users may find that they are unable to import a module that has been successfully installed via pip. - -Check `sys.executable` to see which Python and environment you're running in, and `sys.path` to see where it looks to import modules: - -``` -import sys -print(sys.executable) -print(sys.path) -``` - -If the path in `sys.executable` is not in `sys.path`, you can add it using the following: -`sys.path.append('/path/from/sys.executable')` - -## 6. Demo Videos - -Find step-by-step demonstrations here: -- [Classification with Db2](https://youtu.be/jCgschThiRQ) -- [Linear Regression with Db2](https://youtu.be/RpX0iHL97dc) - -Db2 Machine Learning [Documentation](https://www.ibm.com/support/knowledgecenter/SSEPGG_11.5.0/com.ibm.db2.luw.ml.doc/doc/ml_prereqs.html) \ No newline at end of file +This repository contains the following example use-cases: +- **Building ML Models with Db2**: Build machine learning models entirely in-database with IBM Db2's in-database machine learning capabilities +- **Building a Scoring Pipeline with Db2**: Use in-Db2 machine learning functionality for single row scoring inside the database \ No newline at end of file