Add spatial location jupyter notebook sample (#38)
czentgr authored Jul 3, 2021
1 parent 2dc4155 commit aa20693
Showing 6 changed files with 985 additions and 5 deletions.
31 changes: 26 additions & 5 deletions spatial/README_spatial_samples.txt
@@ -10,19 +10,24 @@ README file for Db2 Spatial Analytics Samples
 
 File: samples/spatial/README_spatial_samples.txt
 
-The Db2 Spatial Analytics samples consist of one demo program.
-- One sample is based on banking (branches, customers, employees).
-This banking demo is written in SQL scripts run by the command-line
-processor (CLP).
+The Db2 Spatial Analytics samples consist of one demo program
+and one jupyter notebook.
+1. The demo program (bank) sample is based on banking (branches, customers, employees).
+This banking demo is written in SQL scripts run by the command-line
+processor (CLP).
+2. The jupyter notebook (location) is based on using spatial data to find
+a new location for company MYCO that is expanding.
 
 This file briefly introduces the demo and indicates where to look for
 further information.
 
+Note: as of Db2 V11.5.6 these samples are not part of a Db2 installation.
+
 = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
 The Banking Demo is implemented in SQL scripts that are run with the Db2
 command line processor. You can use the demo and scripts as a tutorial.
 The scripts and README file "saBankDemoREADME.txt" are located in the
-"bank" subdirectory (sqllib/extenders/samples/spatial/bank).
+"bank" subdirectory (samples/spatial/bank).
 The following excerpt from that file gives an introduction to the demo:
 *****************************************************************************
 Banking Customer Analysis Sample
@@ -59,3 +64,19 @@ After the Banking Demo runs, the complete record of its actions can be found
 in the file "sa_bank.log" which is in the "tmp", subdirectory under the home
 directory of the user who ran the demo.
 
+= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
+The Location demo is implemented as a Jupyter notebook along with a supporting
+SQL script that loads the data.
+You can use the demo as a tutorial for working with Jupyter notebooks.
+The script and the README file "README.txt" are located in the
+"location" subdirectory (samples/spatial/location).
+
+The demo is a Spatial Analytics Jupyter notebook version of the Spatial Extender demo found here:
+https://www.ibm.com/blogs/cloud-archive/2015/08/location-location-location/
+
+Files:
+samples/spatial/location/location_demo.ipynb - the Jupyter notebook
+samples/spatial/location/load_data.sql - supporting script, run from the notebook, that creates and loads the data tables
+samples/spatial/location/README.txt - a more detailed README
+samples/spatial/data/geo_county.zip - GEO_COUNTY table dataset
+samples/spatial/data/geo_customer.zip - GEO_CUSTOMER table dataset
Binary file added spatial/data/geo_county.zip
Binary file not shown.
Binary file added spatial/data/geo_customer.zip
Binary file not shown.
69 changes: 69 additions & 0 deletions spatial/location/README.txt
@@ -0,0 +1,69 @@
README file for Db2 Spatial Analytics Location Sample

*
*
* (C) COPYRIGHT INTERNATIONAL BUSINESS MACHINES CORPORATION 2021.
*
* ALL RIGHTS RESERVED.
*


File: samples/spatial/location/README.txt

The Db2 Spatial Analytics sample consists of a Jupyter notebook with
a supporting SQL script, load_data.sql.
This file briefly introduces the demo and indicates where to look for
further information.


= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
The demo is implemented as a Jupyter notebook that creates and uses a local
database. It uses the Db2 command line processor (CLP)
and the IBM Python driver to interact with the database and the local instance.
You can use the demo and its script as a tutorial for working with a Jupyter notebook,
performing queries, and displaying data on a map.
The data is located in samples/spatial/data and consists of two tables:
- a customer table containing (fake) customer information.
- a county table containing all US counties with census information.

The script load_data.sql reads the pipe-delimited (DEL) data files from the
paths given in its LOAD statements (../data/county.del and ../data/customer.del).
Before running the script from the notebook, extract the data from
spatial/data/geo_customer.zip
spatial/data/geo_county.zip
and then either copy the extracted files to the location the script expects,
or change the paths in the SQL script to point to where the files are.
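The extraction step above can be scripted. The sketch below is a minimal Python version that assumes only the two archive names listed in this README; it does not assume the names of the files inside the archives, and instead returns them so they can be compared with the paths in the LOAD statements. The function name extract_datasets is hypothetical and not part of the sample.

```python
import zipfile
from pathlib import Path
from typing import List

# Archive names as listed in this README; the member names inside
# the archives are deliberately not assumed.
DATASETS = ("geo_county.zip", "geo_customer.zip")


def extract_datasets(data_dir: Path, target_dir: Path) -> List[str]:
    """Extract the sample dataset archives from data_dir into target_dir
    and return the names of the extracted members, so they can be checked
    against the paths used by the LOAD statements in load_data.sql."""
    target_dir.mkdir(parents=True, exist_ok=True)
    extracted: List[str] = []
    for archive_name in DATASETS:
        with zipfile.ZipFile(data_dir / archive_name) as archive:
            # extractall trusts member paths; use only with trusted archives.
            archive.extractall(target_dir)
            extracted.extend(archive.namelist())
    return extracted
```

For example, extract_datasets(Path("spatial/data"), Path("spatial/location")) would place the extracted files next to the notebook, in which case the LOAD paths in load_data.sql would need to be adjusted to match.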

The following gives an introduction to the demo:
*****************************************************************************
This demo illustrates adding a spatial dimension to an existing information system.
The existing system did not contain any explicit location (spatial) data.
However, the existing system did contain implicit location data in the
form of addresses. By spatially enabling the existing database,
the user expands the business analysis capabilities of the system.

This demo is a Jupyter notebook version of the Spatial Extender demo at
https://www.ibm.com/blogs/cloud-archive/2015/08/location-location-location/

In this scenario, a small company (MYCO) has two offices, but business has been growing and there are
now customers across the country. Many of the customers have expressed a preference to meet company
representatives in person. The company owners want to explore where to open a new office.

Some of the questions the MYCO owners want to answer are:

We already have some ideas about where to open a new office.
- How can we find out which of these potential locations can serve the most customers?
- How can we reach the customers with the highest business volume?
- Are there other locations that should be considered?

Spatial analysis functions can help find the answers.
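The notebook answers these questions with Db2 spatial functions. As a plain-Python stand-in for the first two questions, the sketch below counts the customers within a fixed radius of each candidate office and totals their INSURANCE_VALUE, which is used here as an assumed proxy for business volume. Distances use the haversine formula on a spherical Earth (mean radius 6371 km); the Customer class, rank_candidates, and any candidate names, coordinates, and radius passed to it are hypothetical, not data or code from the sample.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treats the Earth as a sphere


@dataclass
class Customer:
    lon: float            # longitude in degrees (WGS 84)
    lat: float            # latitude in degrees (WGS 84)
    insurance_value: int  # mirrors the GEO_CUSTOMER.INSURANCE_VALUE column


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in kilometers between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def rank_candidates(candidates, customers, radius_km):
    """For each candidate office (name -> (lon, lat)), count the customers
    within radius_km and sum their insurance values. Results are sorted by
    customer count, then by total insurance value, both descending."""
    results = []
    for name, (lon, lat) in candidates.items():
        served = [c for c in customers
                  if haversine_km(lon, lat, c.lon, c.lat) <= radius_km]
        results.append((name, len(served), sum(c.insurance_value for c in served)))
    return sorted(results, key=lambda r: (r[1], r[2]), reverse=True)
```

Ranking by customer count first and insurance value second is only one possible policy; sorting on the total insurance value alone would instead favor the locations with the highest business volume.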

On Db2 Warehouse on Cloud, the geospatial data used to bring this example to life can be found in the SAMPLE schema:
customer data in the GEO_CUSTOMER table and county data in the GEO_COUNTY table.
These tables are in the Spatial Extender format and must first be converted into the Spatial Analytics format.
However, this notebook also works with Spatial Extender; in that case, spatial functions in queries
only need to be qualified with the DB2GSE schema.
You can use the Tables menu to view the structure and browse the content of these tables.

For more information on Spatial Analytics visit the documentation:
https://www.ibm.com/docs/en/db2/11.5?topic=data-db2-spatial-analytics
62 changes: 62 additions & 0 deletions spatial/location/load_data.sql
@@ -0,0 +1,62 @@
--Prep the database for the location demo

-- connect to bludb;

DROP TABLE GEO_TEMP;
DROP TABLE GEO_COUNTY;
DROP TABLE GEO_CUSTOMER;

--------------------------------------
-- Create the SRS needed for the demo.
--------------------------------------

CALL ST_DROP_SRS('SAMPLE_GCS_WGS_1984');
CALL ST_CREATE_SRS('SAMPLE_GCS_WGS_1984', 1005, -400, -400, 1.111948722222222E9,-100000,10000,-100000,10000,'GCS_WGS_1984',NULL, NULL,'GEOGCS[\"GCS_WGS_1984\",DATUM[\"D_WGS_1984\",SPHEROID[\"WGS_1984\",6378137.0,298.257223563]],PRIMEM[\"Greenwich\",0.0],UNIT[\"Degree\",0.0174532925199433]]', 'location demo srs');

------------------------
-- Load GEO_COUNTY table
------------------------

-- Load raw data.
CREATE TABLE GEO_TEMP (OBJECTID INTEGER NOT NULL PRIMARY KEY, WKT CLOB, STATEFP VARCHAR(2), COUNTYFP VARCHAR(3), COUNTYNS VARCHAR(8), NAME VARCHAR(100), GEOID VARCHAR(5), NAMELSAD VARCHAR(100), LSAD VARCHAR(2), CLASSFP VARCHAR(2), MTFCC VARCHAR(5), CSAFP VARCHAR(3), CBSAFP VARCHAR(5), METDIVFP VARCHAR(5), FUNCSTAT VARCHAR(1), ALAND DECIMAL(14,0), AWATER DECIMAL(14,0), INTPTLAT VARCHAR(11), INTPTLON VARCHAR(12)) ORGANIZE BY ROW;

-- Adjust the source path and the message path as necessary.
LOAD FROM ../data/county.del OF DEL LOBS FROM ./ MODIFIED BY COLDEL| MESSAGES /tmp/county_load.log INSERT INTO GEO_TEMP(OBJECTID, WKT, STATEFP, COUNTYFP, COUNTYNS, NAME, GEOID, NAMELSAD, LSAD, CLASSFP, MTFCC, CSAFP, CBSAFP, METDIVFP, FUNCSTAT, ALAND, AWATER, INTPTLAT, INTPTLON);

-- Create and load county table.
CREATE TABLE GEO_COUNTY (OBJECTID INTEGER NOT NULL PRIMARY KEY, Shape SYSIBM.ST_MultiPolygon INLINE LENGTH 32300, STATEFP VARCHAR(2), COUNTYFP VARCHAR(3), COUNTYNS VARCHAR(8), NAME VARCHAR(100), GEOID VARCHAR(5), NAMELSAD VARCHAR(100), LSAD VARCHAR(2), CLASSFP VARCHAR(2), MTFCC VARCHAR(5), CSAFP VARCHAR(3), CBSAFP VARCHAR(5), METDIVFP VARCHAR(5), FUNCSTAT VARCHAR(1), ALAND DECIMAL(14,0), AWATER DECIMAL(14,0), INTPTLAT VARCHAR(11), INTPTLON VARCHAR(12), xmin double generated as (st_minx(shape)), xmax double generated as (st_maxx(shape)), ymin double generated as (st_miny(shape)), ymax double generated as (st_maxy(shape)) ) ORGANIZE BY COLUMN NOT LOGGED INITIALLY;

INSERT INTO GEO_COUNTY (OBJECTID, SHAPE, STATEFP, COUNTYFP, COUNTYNS, NAME, GEOID, NAMELSAD, LSAD, CLASSFP, MTFCC, CSAFP, CBSAFP, METDIVFP, FUNCSTAT, ALAND, AWATER, INTPTLAT, INTPTLON) ( SELECT OBJECTID, ST_MPolyFromText(WKT, 1005), STATEFP, COUNTYFP, COUNTYNS, NAME, GEOID, NAMELSAD, LSAD, CLASSFP, MTFCC, CSAFP, CBSAFP, METDIVFP, FUNCSTAT, ALAND, AWATER, INTPTLAT, INTPTLON FROM GEO_TEMP);

-- Create a regular index on the boxfilter columns.
CREATE INDEX GEO_COUNTY_BF_IDX ON GEO_COUNTY (xmin, ymin, xmax, ymax);

COMMIT;
DROP TABLE GEO_TEMP;

--------------------------
-- Load GEO_CUSTOMER table
--------------------------
-- Load raw data.
CREATE TABLE GEO_TEMP (OBJECTID INTEGER NOT NULL PRIMARY KEY, WKT VARCHAR(256), NAME VARCHAR(254), INSURANCE_VALUE INTEGER) ORGANIZE BY ROW;

-- Adjust the source path and message path as necessary.
LOAD FROM ../data/customer.del OF DEL MODIFIED BY COLDEL| MESSAGES /tmp/customer_load.log INSERT INTO GEO_TEMP(OBJECTID, WKT, NAME, INSURANCE_VALUE);

CREATE TABLE GEO_CUSTOMER (OBJECTID INTEGER NOT NULL PRIMARY KEY, SHAPE SYSIBM.ST_POINT, NAME VARCHAR(254), INSURANCE_VALUE INTEGER, xmin double generated as (st_minx(shape)), xmax double generated as (st_maxx(shape)), ymin double generated as (st_miny(shape)), ymax double generated as (st_maxy(shape))) ORGANIZE BY ROW NOT LOGGED INITIALLY;
INSERT INTO GEO_CUSTOMER (OBJECTID, SHAPE, NAME, INSURANCE_VALUE) ( SELECT OBJECTID ,ST_POINTFROMTEXT(WKT, 1005), NAME ,INSURANCE_VALUE FROM GEO_TEMP );

-- Create a regular index on the boxfilter columns.
CREATE INDEX GEO_CUSTOMER_BF_IDX ON GEO_CUSTOMER (xmin, ymin, xmax, ymax);

-- Create a regular index on the INSURANCE_VALUE column.
CREATE INDEX GEO_CUSTOMER_insurance_value_idx ON GEO_CUSTOMER(INSURANCE_VALUE);

COMMIT;
DROP TABLE GEO_TEMP;
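load_data.sql stores each geometry's bounding box in the generated xmin, xmax, ymin, and ymax columns and indexes them as "boxfilter" columns (GEO_COUNTY_BF_IDX, GEO_CUSTOMER_BF_IDX). The queries that use these columns are not part of this script; presumably they let a query discard most rows with cheap range comparisons before calling an exact spatial function. The Python sketch below shows that axis-aligned bounding-box overlap test, assuming inclusive boundaries; Box, boxes_overlap, and box_filter are hypothetical names. For a point in GEO_CUSTOMER, xmin equals xmax and ymin equals ymax, so its box degenerates to the point itself.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Box(NamedTuple):
    """Bounding box, mirroring the generated xmin/ymin/xmax/ymax columns."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def boxes_overlap(a: Box, b: Box) -> bool:
    """Two axis-aligned boxes intersect iff they overlap on both axes
    (boundaries count as overlapping)."""
    return (a.xmin <= b.xmax and a.xmax >= b.xmin
            and a.ymin <= b.ymax and a.ymax >= b.ymin)


def box_filter(query: Box, rows: Iterable[Tuple[int, Box]]) -> Iterator[int]:
    """Yield the ids of rows whose bounding box overlaps the query box.

    This is only a prefilter: surviving rows may still fail an exact
    geometry test, but rows it rejects can never satisfy one."""
    for row_id, box in rows:
        if boxes_overlap(query, box):
            yield row_id
```

In SQL, the same test would be expressed as range predicates on the generated columns, such as a.xmin <= b.xmax AND a.xmax >= b.xmin AND a.ymin <= b.ymax AND a.ymax >= b.ymin, followed by the exact spatial predicate.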