diff --git a/README.md b/README.md index fe7d42c..e722fdf 100755 --- a/README.md +++ b/README.md @@ -104,8 +104,9 @@ This library contains functions that cover the following areas: - Utility functions relating to areas including statistical analysis, data preprocessing and array manipulation. - A multi-processing framework to parallelize work across many cores or nodes. - Functions for seamless integration with PyKX or EmbedPy, which ensure seamless interoperability between Python and kdb+/q in either environment. +- A location for the storage and versioning of ML models on-prem along with a common model retrieval API allowing models regardless of underlying requirements to be retrieved and used on kdb+ data. This allows for enhanced team collaboration opportunities and management oversight by centralising team work to a common storage location. -These sections are explained in greater depth within the [FRESH](ml/docs/fresh.md), [cross validation](ml/docs/xval.md), [clustering](ml/docs/clustering/algos.md), [timeseries](ml/docs/timeseries/README.md), [optimization](ml/docs/optimize.md), [graph/pipeline](ml/docs/graph/README.md) and [utilities](ml/docs/utilities/metric.md) documentation. +These sections are explained in greater depth within the [FRESH](ml/docs/fresh.md), [cross validation](ml/docs/xval.md), [clustering](ml/docs/clustering/algos.md), [timeseries](ml/docs/timeseries/README.md), [optimization](ml/docs/optimize.md), [graph/pipeline](ml/docs/graph/README.md), [utilities](ml/docs/utilities/metric.md) and [registry](ml/docs/registry/README.md) documentation. ### nlp @@ -171,3 +172,4 @@ The Machine Learning Toolkit is provided here under an Apache 2.0 license. If you find issues with the interface or have feature requests, please [raise an issue](https://github.com/KxSystems/ml/issues). To contribute to this project, please follow the [contributing guide](CONTRIBUTING.md). 
+ diff --git a/automl/automl.q b/automl/automl.q index 199296d..870cf85 100644 --- a/automl/automl.q +++ b/automl/automl.q @@ -45,4 +45,3 @@ if[all`config`run in commandLineArguments; testRun:`test in commandLineArguments; runCommandLine[testRun]; exit 0] - diff --git a/docker/Dockerfile b/docker/Dockerfile index 4b84f41..d9c4e9f 100644 --- a/docker/Dockerfile +++ b/docker/Dockerfile @@ -2,6 +2,11 @@ FROM registry.gitlab.com/kxdev/kxinsights/data-science/ml-tools/automl:embedpy-gcc-deb12 +# Java and jq packages required for registry tests +RUN apt-get update && apt-get install -y openjdk-17-jdk && rm -rf /var/lib/apt/lists/* + +ENV JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64/ + COPY requirements_pinned.txt /opt/kx/automl/ USER kx diff --git a/ml/docs/registry/README.md b/ml/docs/registry/README.md new file mode 100644 index 0000000..b131e0e --- /dev/null +++ b/ml/docs/registry/README.md @@ -0,0 +1,112 @@ +# ML Registry + +The ML Model Registry defines a centralised location for the storage of the following versioned entities: + +1. Machine Learning Models +2. Model parameters +3. Performance metrics +4. Model configuration +5. Model monitoring information + +The ML Registry is intended to allow models and all important metadata information associated with them to be stored locally. + +In the context of an MLOps offering the model registry is a collaborative location allowing teams to work together on different stages of a machine learning workflow from model experimentation to publishing a model to production. It is designed to aid in this in the following ways: + +1. Provide a solution with which users can store models generated in q/Python to a centralised location on-prem. +2. A common model retrieval API allowing models regardless of underlying requirements to be retrieved and used on kdb+ data. +3. 
The ability to store information related to model training/monitoring requirements, allowing sysadmins to control the promotion of models to production environments. +4. Enhanced team collaboration opportunities and management oversight by centralising team work to a common storage location. + +## Contents + +- [Quick start](#quick-start) +- [Documentation](#documentation) +- [Testing](#testing) +- [Status](#status) + + +## Quick start + +Start by following the installation step found [here](../../../README.md) or alternatively start a q session using the code below from the `ml` folder + +``` +$ q init.q +q) +``` + +Generate a model registry in the current directory and display the contents + +``` +q).ml.registry.new.registry[::;::]; +q)\ls +"CODEOWNERS" +"CONTRIBUTING.md" +"KX_ML_REGISTRY" +... +q)\ls KX_ML_REGISTRY +"modelStore" +"namedExperiments" +"unnamedExperiments" +``` + +Add an experiment folder to the registry + +``` +q).ml.registry.new.experiment[::;"test";::]; +q)\ls KX_ML_REGISTRY/namedExperiments/ +"test" +``` + +Add a basic q model associated with the experiment + +``` +q).ml.registry.set.model[::;"test";{x+1};"mymodel";"q";::] +``` + +Check that the model has been added to the modelStore + +``` +q)modelStore +registrationTime experimentName modelName uniqueID .. +-----------------------------------------------------------------------------.. +2021.08.02D10:27:04.863096000 "test" "mymodel" 66f12a71-175b-cd56-7d0.. +``` + +Retrieve the model and model information based on the model name and version + +``` +q).ml.registry.get.model[::;::;"mymodel";1 0] +modelInfo| `major`description`experimentName`folderPath`registryPath`modelSto.. +model | {x+1} +``` + +## Documentation + +### Static Documentation + +Further information on the breakdown of the API for interacting with the ML-Registry and extended examples can be found in [Registry API](api/setting.md) and [Registry Examples](examples/basic.md). 
+ +This provides users with: + +1. A breakdown of the API for interacting with the ML-Registry +2. Examples of interacting with a registry + +## Testing + +Unit tests are provided for testing the operation of this interface as a local service. To run them, users must have embedPy or PyKX installed alongside the following additional Python requirements. It is also advisable to install the Python packages listed in `requirements_pinned.txt` before running the commands below. + +``` +$ pip install pyspark xgboost +``` + +The local tests are run using a bespoke q script and can be run standalone using the instructions outlined below. + +### Local testing + +The tests below are run from the `ml` directory, and the results are output to the console + +```bash +$ q ../test.q registry/tests/registry.t +``` + +This should present a summary of results of the unit tests. diff --git a/ml/docs/registry/api/deleting.md b/ml/docs/registry/api/deleting.md new file mode 100644 index 0000000..14c30a9 --- /dev/null +++ b/ml/docs/registry/api/deleting.md @@ -0,0 +1,326 @@ +# Deleting + +While the ML Registry provides a common location for the storage of versioned models, parameters and metrics it is often the case that models/experiments need to be deleted due to changes in team requirements or focus. The `.ml.registry.delete` namespace provides all the callable functions used for the removal of objects from a registry. All functionality within this namespace is described below. + +## `.ml.registry.delete.registry` + +_Delete a registry at a specified location_ + +```q +.ml.registry.delete.registry[folderPath;config] +``` + +**Parameters:** + +|name|type|description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. 
```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON.| +| `config` | `::` |This parameter is presently unused, provided here for future use. | + +**Returns:** + +|type |description| +|------|---| +| `::` | | + +**Examples:** + +**Example 1:** Delete a registry from the present working directory +```q +q).ml.registry.delete.registry[::;::] +./KX_ML_REGISTRY deleted. +``` + +**Example 2:** Delete the registry from a specified folder +```q +q).ml.registry.delete.registry["test/directory";::] +test/directory/KX_ML_REGISTRY deleted. +``` + +### `.ml.registry.delete.experiment` + +_Delete an experiment from a specified registry_ + +```q +.ml.registry.delete.experiment[folderPath;experimentName] +``` + +**Parameters:** + +|name|type|description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON.| +| `experimentName` | `string` |Name of the experiment to be deleted. This may contain details of a subexperiment Eg. EXP1/SUBEXP1.| + +**Returns:** + +|type|description| +|------|---| +| `::` | | + +**Examples:** + +**Example 1:** Delete experiment 'test1' +```q +// Generate a number of models associated with different experiments +q).ml.registry.set.model[::;"test";{x};"model";"q";::] +q).ml.registry.set.model[::;"test";{x+1};"model";"q";::] +q).ml.registry.set.model[::;"test1";{x+2};"model1";"q";::] +q).ml.registry.set.model[::;"test1";{x+2};"model1";"q";::] + +// Show current contents of the modelStore +q)modelStore +registrationTime experimentName modelName uniqueID .. +-----------------------------------------------------------------------------.. 
+2021.06.01D10:13:19.517546000 "test" "model" 38e69f30-8956-24a8-0bc.. +2021.06.01D10:13:19.550791000 "test" "model" 7e1eb13b-aa21-cc7f-800.. +2021.06.01D10:13:19.584704000 "test1" "model1" 466c92d0-f610-dbbd-9da.. +2021.06.01D10:13:19.620767000 "test1" "model1" d68d2286-01e0-0867-446.. + +// Delete experiment 'test1' +q).ml.registry.delete.experiment[::;"test1"] +Removing all contents of ./KX_ML_REGISTRY/namedExperiments/test1/ +q)modelStore +registrationTime experimentName modelName uniqueID .. +-----------------------------------------------------------------------------.. +2021.06.01D10:13:19.517546000 "test" "model" 38e69f30-8956-24a8-0bc.. +2021.06.01D10:13:19.550791000 "test" "model" 7e1eb13b-aa21-cc7f-800.. +``` + +### `.ml.registry.delete.model` + +_Delete a model from a specified registry_ + +```q +.ml.registry.delete.model[folderPath;experimentName;modelName;version] +``` + +**Parameters:** + +|name|type|description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON.| +| `experimentName` | `string | ::` |Name of the experiment in which the model to be deleted is located. This may contain details of a subexperiment, e.g. EXP1/SUBEXP1.| +| `modelName` | `string | ::` |The name of the model to delete. | +| `version` | `long[] | ::` |The version of the model to delete, a list of length 2 with (major;minor) version number; if (::), all versions will be deleted. 
| + +**Returns:** + +| type |description| +|------|---| +| `::` | | + +**Examples:** + +**Example 1:** Delete version 1.0 of 'model1' +```q +// Generate a number of models within a registry +q).ml.registry.set.model[::;::;{x};"model1";"q";::] +q).ml.registry.set.model[::;::;{x+1};"model1";"q";::] +q).ml.registry.set.model[::;::;{x+2};"model2";"q";::] +q).ml.registry.set.model[::;::;{x+3};"model3";"q";::] +q).ml.registry.set.model[::;::;{x+4};"model3";"q";::] + +// Display current registry contents +q)modelStore +registrationTime experimentName modelName uniqueID .. +-----------------------------------------------------------------------------.. +2021.06.01D10:22:47.360569000 "undefined" "model1" 5c279367-6eac-d645-2f0.. +2021.06.01D10:22:47.393568000 "undefined" "model1" fb56b644-d9f8-22d6-b33.. +2021.06.01D10:22:47.420959000 "undefined" "model2" c9dfd663-500f-8fbf-77e.. +2021.06.01D10:22:47.456099000 "undefined" "model3" e56f9d8f-5dc3-a043-cb9.. +2021.06.01D10:22:47.491306000 "undefined" "model3" fe0f9d6c-f774-9318-941.. + +// Delete all models named 'model3' +q).ml.registry.delete.model[::;::;"model3";::] +Removing all contents of ./KX_ML_REGISTRY/unnamedExperiments/model3 + +q)modelStore +-----------------------------------------------------------------------------.. +2021.06.01D10:22:47.360569000 "undefined" "model1" 5c279367-6eac-d645-2f0.. +2021.06.01D10:22:47.393568000 "undefined" "model1" fb56b644-d9f8-22d6-b33.. +2021.06.01D10:22:47.420959000 "undefined" "model2" c9dfd663-500f-8fbf-77e.. + +// Delete version 1.0 of 'model1' +q).ml.registry.delete.model[::;::;"model1";1 0] +Removing all contents of ./KX_ML_REGISTRY/unnamedExperiments/model1/1 + +q)modelStore +registrationTime experimentName modelName uniqueID .. +-----------------------------------------------------------------------------.. +2021.06.01D10:22:47.393568000 "undefined" "model1" fb56b644-d9f8-22d6-b33.. +2021.06.01D10:22:47.420959000 "undefined" "model2" c9dfd663-500f-8fbf-77e.. 
+``` + +### `.ml.registry.delete.parameters` + +_Delete a parameter file from a specified model_ + +```q +.ml.registry.delete.parameters[folderPath;experimentName;modelName;version;paramFile] +``` + +**Parameters:** + +|name|type|description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON.| +| `experimentName` | `string | ::` | Name of the experiment folder in which the parameters live. This may contain details of a subexperiment, e.g. EXP1/SUBEXP1.| +| `modelName` | `string` | The name of the model associated with the parameters. | +| `version` | `long[]` |The version of the model to delete parameters from, a list of length 2 with (major;minor) version number. | +| `paramFile` | `string` | Name of the parameter file to delete. | + +**Returns:** + +| type |description| +|------|---| +| `::` | | + +**Examples:** + +**Example 1:** Delete parameter file +```q +// Generate a model with a parameter set +q).ml.registry.set.model[::;::;{x};"model1";"q";::] +q).ml.registry.set.parameters[::;::;"model1";1 0;"paramFile";`param1`param2!1 2] + +// Get parameter file +q).ml.registry.get.parameters[::;::;"model1";1 0;`paramFile] +param1| 1 +param2| 2 + +// Delete parameter file +q).ml.registry.delete.parameters[::;::;"model1";1 0;"paramFile"] + +// Attempt to get the deleted parameter file (this now fails) +q).ml.registry.get.parameters[::;::;"model1";1 0;`paramFile] +'./KX_ML_REGISTRY/unnamedExperiments/model1/1/params/paramFile.json. OS reports: The system cannot find the path specified. 
+``` + +### `.ml.registry.delete.metrics` + +_Delete a metric table from a specified model_ + +```q +.ml.registry.delete.metrics[folderPath;experimentName;modelName;version] +``` + +**Parameters:** + +|name|type|description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON.| +| `experimentName` | `string | ::` | Name of the experiment folder in which the metrics live. This may contain details of a subexperiment, e.g. EXP1/SUBEXP1.| +| `modelName` | `string` | The name of the model associated with the metrics. | +| `version` | `long[]` |The version of the model to delete metrics from, a list of length 2 with (major;minor) version number. | + +**Returns:** + +| type |description| +|------|---| +| `::` | | + +**Examples:** + +**Example 1:** Delete the metrics table +```q +// Generate a model with a metrics table +q).ml.registry.set.model[::;::;{x};"model1";"q";::] +q).ml.registry.log.metric[::;::;"model1";1 0;`metricName1;1] +q).ml.registry.log.metric[::;::;"model1";1 0;`metricName2;2] + +// Get metrics +q).ml.registry.get.metric[::;::;"model1";1 0;`metricName1`metricName2] +timestamp metricName metricValue +----------------------------------------------------- +2021.06.04D17:02:38.200280000 metricName1 1 +2021.06.04D17:02:43.723946000 metricName2 2 + +// Delete the metrics table +q).ml.registry.delete.metrics[::;::;"model1";1 0] + +// Attempt to get the deleted metrics (this now fails) +q).ml.registry.get.metric[::;::;"model1";1 0;`metricName1`metricName2] +'./KX_ML_REGISTRY/unnamedExperiments/model1/1/metrics/metric. OS reports: The system cannot find the path specified. 
+``` + +### `.ml.registry.delete.code` + +_Delete a code file from a specified model_ + +```q +.ml.registry.delete.code[folderPath;experimentName;modelName;version;codeFile] +``` + +**Parameters:** + +|name|type|description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON.| +| `experimentName` | `string |::` | Name of the experiment folder in which the code lives. This may contain details of a subexperiment Eg. EXP1/SUBEXP1.| +| `modelName` | `string` | The name of the model associated to the code. | +| `version` | `long[]` |The version of the model to delete code from, a list of length 2 with version number (major;minor). | +| `codeFile` | `string` |Name of file to be deleted with file extension eg "myfile.py". | + +**Returns:** + +| type |description| +|------|---| +| `::` | | + +**Examples:** + +**Example 1:** Delete code file my.py +```q +q).ml.registry.delete.code[::;::;"model1";1 0;"my.py"] +``` + +### `.ml.registry.delete.metric` + +_Delete a metric from a specified metric table_ + +``` +.ml.registry.delete.metric[folderPath;experimentName;modelName;version;metricName] +``` + +**Parameters:** + +|name|type|description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON.| +| `experimentName` | `string | ::` | Name of the experiment folder in which the metric lives. This may contain details of a subexperiment Eg. 
EXP1/SUBEXP1.| +| `modelName` | `string` | The name of the model associated to the metric. | +| `version` | `long[]` |The version of the model to delete metric from, a list of length 2 with version number (major;minor). | +| `metricName` | `string` |Name of metric to be deleted. | + +**Returns:** + +| type |description| +|------|---| +| `::` | | + +**Examples:** + +**Example 1:** Delete first metric + +```q +// Set metric values +q).ml.registry.log.metric[::;::;"model1";1 0;`metricName1;1] +q).ml.registry.log.metric[::;::;"model1";1 0;`metricName2;2] + +// Show metric values +q).ml.registry.get.metric[::;::;"model1";1 0;`metricName1`metricName2] +timestamp metricName metricValue +----------------------------------------------------- +2021.06.07D08:49:18.296326000 metricName1 1 +2021.06.07D08:49:20.643205000 metricName2 2 + +// Delete first metric +q).ml.registry.delete.metric[::;::;"model1";1 0;"metricName1"] + +// Show metric values +q).ml.registry.get.metric[::;::;"model1";1 0;`metricName1`metricName2] +timestamp metricName metricValue +----------------------------------------------------- +2021.06.07D08:49:20.643205000 metricName2 2 +``` diff --git a/ml/docs/registry/api/doc-layout b/ml/docs/registry/api/doc-layout new file mode 100644 index 0000000..4105180 --- /dev/null +++ b/ml/docs/registry/api/doc-layout @@ -0,0 +1,4 @@ +arrange: + - setting.md + - retrieval.md + - deleting.md diff --git a/ml/docs/registry/api/retrieval.md b/ml/docs/registry/api/retrieval.md new file mode 100644 index 0000000..9d9ab5f --- /dev/null +++ b/ml/docs/registry/api/retrieval.md @@ -0,0 +1,342 @@ +# Loading + +Once saved to the ML Registry following the instructions outlined [here](./setting.md), entities that have been persisted should be accessible to any user permissioned with access to the registry save location. The `.ml.registry.get` namespace provides all the callable functions used for the retrieval of objects from a registry. 
All functionality within this namespace is described below. + +## `.ml.registry.get.model` + +_Retrieve a model from an ML Registry_ + +```q +.ml.registry.get.model[folderPath;experimentName;modelName;version] +``` + +**Parameters:** + +|name|type|description| +|------------------|---------------|-----------| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON. | +| `experimentName` | `string | ::` | The name of an experiment from which to retrieve a model, if no modelName is provided the newest model within this experiment will be used. If neither modelName or experimentName are defined the newest model within the "unnamedExperiments" section is chosen. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. | +| `modelName` | `string | ::` | The name of the model to be retrieved in the case this is null, the newest model associated with the experiment is retrieved. | +| `version` | `long[] | ::` | The specific version of a named model to retrieve, a list of length 2 with (major;minor) version number, in the case that this is null the newest model is retrieved. | + +**Returns:** + +|type|description| +|---|---| +| `dictionary` | The model and information related to the generation of the model. | + +When using [`.ml.registry.set.model`](setting.md#mlregistrysetmodel) users can include `code` files to be loaded on model retrieval, these files can be `q`,`p`,`py` or `k` extensions. On invocation of this function these files are loaded prior to model retrieval. 
+ +**Examples:** + +**Example 1:** Get the latest version of 'model' +```q +// Set a number of models within a new registry +q).ml.registry.set.model[::;::;{x};"model";"q";::] +q).ml.registry.set.model[::;::;{x+1};"model";"q";::] +q).ml.registry.set.model[::;::;{x+2};"model1";"q";::] + +// Get the latest addition to the Registry +q).ml.registry.get.model[::;::;::;::] +modelInfo| `registry`model`monitoring!(`description`modelInformation`experime.. +model | {x+2} + +// Get the latest version of 'model' +q).ml.registry.get.model[::;::;"model";::] +modelInfo| `registry`model`monitoring!(`description`modelInformation`experime.. +model | {x+1} +``` + +**Example 2:** Get version 1.0 of 'model' +```q +q).ml.registry.get.model[::;::;"model";1 0] +modelInfo| `registry`model`monitoring!(`description`modelInformation`experime.. +model | {x} +``` + +## `.ml.registry.get.modelStore` + +_Retrieve the modelStore table associated with an ML Registry_ + +```q +.ml.registry.get.modelStore[folderPath;config] +``` + +**Parameters:** + +|name|type|description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON. | +| `config` | `::` | Currently unused, must be passed as `::`. | + +**Returns:** + +|type|description| +|---|---| +|`::`|| + +**Examples:** + +**Example 1:** Retrieve the modelStore table +```q +q).ml.registry.get.modelStore[::;::] +q)modelStore +registrationTime experimentName modelName uniqueID .. +-----------------------------------------------------------------------------.. +2021.06.01D08:51:28.593730000 "undefined" "mymodel" 7a214d0a-d9d2-890e-014.. 
+``` + +## `.ml.registry.get.metric` + +_Retrieve metric information associated with a model_ + +```q +.ml.registry.get.metric[folderPath;experimentName;modelName;version;param] +``` + +**Parameters:** + +|name|type|description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON. | +| `experimentName` | `string | ::` | The name of an experiment from which to retrieve metrics associated with a model, if no modelName is provided the newest model within this experiment will be used. If neither modelName or experimentName are defined the newest model within the "unnamedExperiments" section is chosen. This may contain details of a subexperiment Eg. EXP1/SUBEXP1.| +| `modelName` | `string | ::` | The name of the model to retrieve metrics from. In the case this is null, the newest model associated with the experiment is retrieved. | +| `version` | `long[] | ::` | The specific version of a named model to retrieve metrics from, a list of length 2 with (major;minor) version number, in the case that this is null the newest model is retrieved. | +| `param` | `:: | dictionary | symbol | string` | Search parameters for the retrieval of metrics. In the case when this is a string, it is converted to a symbol. | + +**Returns:** + +|type |description| +|---------|---| +| `table` | The metric table for a specific model, which may potentially be filtered. 
| + +**Examples:** + +**Example 1:** Retrieve all metrics named `metric1` +```q +// Log a number of metrics associated with a model +q).ml.registry.set.model[::;::;{x};"mymodel";"q";::] +q).ml.registry.log.metric[::;::;::;::;`metric1;2.0] +q).ml.registry.log.metric[::;::;::;::;`metric1;2.1] +q).ml.registry.log.metric[::;::;::;::;`metric2;1.0] +q).ml.registry.log.metric[::;::;::;::;`metric2;1.0] +q).ml.registry.log.metric[::;::;::;::;`metric3;3.0] + +// Retrieve all metrics associated with the model +q).ml.registry.get.metric[::;::;::;::;::] +timestamp metricName metricValue +---------------------------------------------------- +2021.06.01D09:51:35.638489000 metric1 2 +2021.06.01D09:51:35.652863000 metric1 2.1 +2021.06.01D09:51:35.666593000 metric2 1 +2021.06.01D09:51:35.679152000 metric2 1 +2021.06.01D09:51:35.694630000 metric3 3 + +// Retrieve all metrics named `metric1 +q).ml.registry.get.metric[::;::;::;::;`metric1] +timestamp metricName metricValue +---------------------------------------------------- +2021.06.01D09:51:35.638489000 metric1 2 +2021.06.01D09:51:35.652863000 metric1 2.1 +``` + +**Example 2:** Retrieve multiple metrics +```q +q).ml.registry.get.metric[::;::;::;::;`metric2`metric3] +timestamp metricName metricValue +---------------------------------------------------- +2021.06.01D09:51:35.666593000 metric2 1 +2021.06.01D09:51:35.679152000 metric2 1 +2021.06.01D09:51:35.694630000 metric3 3 +``` + +**Example 3:** Equivalently this can be done using a dictionary input +```q +q).ml.registry.get.metric[::;::;::;::;enlist[`metricName]!enlist `metric2`metric3] +timestamp metricName metricValue +---------------------------------------------------- +2021.06.01D09:51:35.666593000 metric2 1 +2021.06.01D09:51:35.679152000 metric2 1 +2021.06.01D09:51:35.694630000 metric3 3 +``` + +## `.ml.registry.get.parameters` + +_Retrieve parameter information associated with a model_ + +```q +.ml.registry.get.parameters[folderPath;experimentName;modelName;version;paramName] 
+``` + +**Parameters:** + +|name|type|description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON. | +| `experimentName` | `string | ::` | The name of an experiment from which to retrieve parameters associated with a model. If no modelName is provided the newest model within this experiment will be used. If neither modelName or experimentName are defined the newest model within the "unnamedExperiments" section is chosen. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. | +| `modelName` | `string | ::` | The name of the model from which parameters are to be retrieved. In the case this is null, the newest model associated with the experiment is retrieved. | +| `version` | `long[] | ::` | The specific version of a named model to retrieve, a list of length 2 with (major;minor) version number, in the case that this is null the newest model is retrieved. | +| `paramName` | `symbol | string` | The name of the parameter to retrieve. | + +**Returns:** + +|type|description| +|---|---| +| `string | dictionary | table | float` | The value of the parameter associated with a named parameter saved for the model. 
| + +**Examples:** + +**Example 1:** Retrieve set parameters +```q +// Set a number of parameters associated with a model +q).ml.registry.set.parameters[::;::;"mymodel";1 0;"paramFile1";`param1`param2!1 2] +q).ml.registry.set.parameters[::;::;"mymodel";1 0;"paramFile2";("value1";"value2")] + +// Retrieve the set parameters +q).ml.registry.get.parameters[::;::;::;::;`paramFile1] +param1| 1 +param2| 2 + +q).ml.registry.get.parameters[::;::;::;::;`paramFile2] +"value1" +"value2" +``` + + +## `.ml.registry.get.predict` + +_Retrieve a model from the ML Registry wrapping in a common interface_ + +```q +.ml.registry.get.predict[folderPath;experimentName;modelName;version] +``` + +**Parameters:** + +|name|type|description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON. | +| `experimentName` | `string | ::` | The name of an experiment from which to retrieve a model, if no modelName is provided the newest model within this experiment will be used. If neither modelName or experimentName are defined the newest model within the "unnamedExperiments" section is chosen. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. | +| `modelName` | `string | ::` | The name of the model to be retrieved in the case this is null, the newest model associated with the experiment is retrieved. | +| `version` | `long[] | ::`| The specific version of a named model to retrieve, a list of length 2 with (major;minor) version number, in the case that this is null the newest model is retrieved. | + +**Returns:** + +|type|description| +|---|---| +| `function` | A wrapped version of the `model` providing a common callable interface for all models within the ML Registry. 
This model can accept vector/matrix/table/dictionary input and will return predictions generated by the model. | + +Models within the ML Registry can be of many forms `q`/`Python`/`sklearn`/`keras` etc. As such this function provides a common entry point to allow the models to be retrieved such that they are all callable using the same function call. + +When using [`.ml.registry.set.model`](setting.md#mlregistrysetmodel) users can include `code` files to be loaded on model retrieval, these files can be `q`,`p`,`py` or `k` extensions. On invocation of this function these files are loaded prior to model retrieval. + +**Examples:** + +**Example 1:** Get the latest addition to the Registry +```q +// Set a number of models within a new registry +q).ml.registry.set.model[::;::;{x};"model";"q";::] +q).ml.registry.set.model[::;::;{x+1};"model";"q";::] +q).ml.registry.set.model[::;::;{x+2};"model1";"q";::] + +// Get the latest addition to the Registry +q).ml.registry.get.predict[::;::;::;::] +{x+2}{[data;bool] + dataType:type data; + if[dataType<=20;:data]; + data:$[98h=dat.. +``` + +**Example 2:** Get the latest version of 'model' +```q +q).ml.registry.get.predict[::;::;"model";::] +{x+1}{[data;bool] + dataType:type data; + if[dataType<=20;:data]; + data:$[98h=dat.. +``` + +**Example 3:** Get version 1.0 of 'model' +```q +q).ml.registry.get.predict[::;::;"model";1 0] +{x}{[data;bool] + dataType:type data; + if[dataType<=20;:data]; + data:$[98h=dat.. +``` + +## `.ml.registry.get.update` + +_Retrieve the update method for models within the ML Registry wrapping in a common interface_ + +```q +.ml.registry.get.update[folderPath;experimentName;modelName;version;supervised] +``` + +**Parameters:** + +|Name|Type|Description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. 
```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON. | +| `experimentName` | `string | ::` | The name of an experiment from which to retrieve a model, if no modelName is provided the newest model within this experiment will be used. If neither modelName or experimentName are defined the newest model within the "unnamedExperiments" section is chosen. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. | +| `modelName`| `string | ::` | The name of the model to be retrieved in the case this is null, the newest model associated with the experiment is retrieved. | +| `version` | `long[] | ::` | The specific version of a named model to retrieve, a list of length 2 with (major;minor) version number, in the case that this is null the newest model is retrieved. | +| `supervised` | `boolean` | Is the model being retrieved a supervised (`1b`) or unsupervised (`0b`) model. This changes the number of expected inputs to the returned function. | + +**Returns:** + +|type|description| +|---|---| +| `function` | A wrapped version of the `model` providing a common callable interface for all models within the ML Registry. This model can accept vector/matrix/table/dictionary input and will return an updated version of the originally persisted model. | + +Models stored within the ML Registry can be of many forms ```q/sklearn/keras``` etc. Many of these formats can have an 'update' capability to allow these models to be updated as new data becomes available. As such this function provides a common entry point to allow the models update functionality to be retrieved in a common format. + +In order to be retrieved from the registry the model must contain the following characteristics + +| Model Type | Supported | Requirements | +|------------|-----------|--------------| +| q | Yes | Model originally saved to registry must contain an `update` key. 
| +| sklearn | Yes | Model originally saved to registry must support the `partial_fit` method. | +| keras | No | | +| PyTorch | No | | + +When using [`.ml.registry.set.model`](setting.md#mlregistrysetmodel) users can include `code` files to be loaded on model retrieval, these files can be `q`,`p`,`py` or `k` extensions. On invocation of this function these files are loaded prior to model retrieval. + +**Examples:** + +**Example 1:** Get the latest sklearn updatable model from the Registry +```q +// Fit models to be persisted to the registry +q)X:100 2#200?1f +q)yReg:100?1f +q)yClass:100?0b +q)online1:.ml.online.clust.sequentialKMeans.fit[flip X;`e2dist;3;::;::] +q)online2:.ml.online.sgd.linearRegression.fit[X;yReg;1b;::] +q)sgdClass:.p.import[`sklearn.linear_model][`:SGDClassifier] +q)sgdModel:sgdClass[pykwargs `max_iter`tol!(1000;0.003)][`:fit] . (X;yClass) + +// Set a number of models within a new registry +q).ml.registry.set.model[::;::;online1;"onlineCluster";"q";::] +q).ml.registry.set.model[::;::;online2;"onlineRegression";"q";::] +q).ml.registry.set.model[::;::;sgdModel;"SklearnSGD";"sklearn";::] + +// Get the latest sklearn updatable model from the Registry +q).ml.registry.get.update[::;::;"SklearnSGD";::;1b] +.[{[f;x]embedPy[f;x]}[foreign]enlist]{(x y;z)}[locked[;0b]] +``` + +**Example 2:** Get a q updatable supervised model from the Registry +```q +q).ml.registry.get.update[::;::;"onlineRegression";::;1b] +.[{[config;secure;features;target] + modelInfo:config`modelInfo; + theta:mode..{(x y;z)}[locked[;0b]] +``` + +**Example 3:** Get a q updatable unsupervised model from the Registry +```q +q).ml.registry.get.update[::;::;"onlineCluster";::;0b] +{[returnInfo;data] + modelInfo:returnInfo`modelInfo; + inputs:modelInfo`input..locked[;0b] +``` diff --git a/ml/docs/registry/api/setting.md b/ml/docs/registry/api/setting.md new file mode 100644 index 0000000..333cb80 --- /dev/null +++ b/ml/docs/registry/api/setting.md @@ -0,0 +1,594 @@ +# Storing + +The ML
Registry allows users to persist a variety of versioned entities to disk and cloud storage applications. The ML Registry provides this persistence functionality across a number of namespaces, namely, `.ml.registry.[new/set/log/update]`. All supported functionality within these namespaces is described below. + +## `.ml.registry.new.registry` + +_Generate a new registry_ + +```q +.ml.registry.new.registry[folderPath;config] +``` + +**Parameters:** + +| Name | Type | Description | +|--------------|-------------------|-------------| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON.| +| `config` | `dictionary | ::` | Any additional configuration needed for initialising the registry.| + +**Returns:** + +| Type | Description | +|--------------|-------------| +| `dictionary` |Updated config dictionary containing relevant registry paths| + +When generating a new registry within the context of cloud vendor interactions the `folderPath` variable is unused and a new registry will be created at the storage location provided. + +**Examples:** + +**Example 1:** Generate a registry in 'pwd' + +```q +q).ml.registry.new.registry[::;::]; +``` + +**Example 2:** Create a folder and generate a registry in that location +```q +q)system"mkdir -p test/folder/location" +q).ml.registry.new.registry["test/folder/location";::]; +``` + +**Example 3:** Generate registry in cloud storage location which is different from current .ml.registry.location +```q +q).ml.registry.location +local| . +q).ml.registry.new.registry[enlist[`aws]!enlist"s3://ml-registry-test";::]; +``` + +## `.ml.registry.new.experiment` + +_Generate a new experiment within an existing registry. 
If the registry doesn't exist it will be created._ + +```q +.ml.registry.new.experiment[folderPath;experimentName;config] +``` + +Where: + +**Parameters:** + +|Name|Type|Description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON.| +| `experimentName` | `string` |The name of the experiment to be located under the namedExperiments folder which can be populated by new models associated with the experiment. This may contain details of a subexperiment Eg. EXP1/SUBEXP1.| +| `config` |`dictionary | ::` |Any additional configuration needed for initialising the experiment.| + +**Returns:** + +|Type|Description| +|---|---| +|dictionary|Updated config dictionary containing relevant registry paths| + +**Examples:** + +**Example 1:** Create an experiment 'test' in a registry location in 'pwd' +```q +q).ml.registry.new.experiment[::;"test";::]; +``` + +**Example 2:** Create an experiment 'new_test' in a registry located at a different location +```q +q)system"mkdir -p test/folder/location" +q).ml.registry.new.experiment["test/folder/location";"new_test";::]; +``` + +**Example 3:** Create a sub-experiment 'sub_exp' under 'new_test' in the above registry +```q +q).ml.registry.new.experiment["test/folder/location";"new_test/sub_exp";::]; +``` + +**Example 4:** Generate experiment in a cloud storage location which is different from current .ml.registry.location +```q +q).ml.registry.location +local| . +q).ml.registry.new.experiment[enlist[`aws]!enlist"s3://ml-registry-test";"my_test";::]; +``` + +## `.ml.registry.set.model` + +_Add a new model to the ML Registry. 
If the registry doesn't exist it will be created._ + +```q +.ml.registry.set.model[folderPath;experimentName;model;modelName;modelType;config] +``` + +**Parameters:** + +|Name|Type|Description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON.| +| `experimentName` | `string | ::` |The name of the experiment associated with the model or generic null if none. This may contain details of a subexperiment, e.g. EXP1/SUBEXP1.| +| `model` |`embedpy | dictionary | function | projection | symbol | string` | The model to be saved to the registry. | +| `modelName` | `string` |The name to be associated with the model. | +| `modelType` | `string` |The type of model that is being saved, namely `"q"`, `"sklearn"`, `"keras"`, `"python"`, `"torch"`. | +| `config` | `dictionary` |Any additional configuration needed for setting the model. | + +**Returns:** + +|Type|Description| +|---|---| +|`guid`| Returns the unique id for the model | + +**Model Parameter:** + +The model variable defines the item that is to be saved to the registry and used as the `model` when retrieved. This can be an embedPy object defining an underlying Python model, a q function/projection/dictionary or a symbol pointing to a model saved to disk. + +Models can be added under the following qualifying conditions: + +Model Type | Saved File Type | Qualifying Conditions | +-----------|-------------------|-----------------------| +q | q-binary | Model must be a q projection, function or dictionary with a `predict` and/or `update` key. | +Python | pickled file | The model must be saved using `joblib.dump`. | +Sklearn | pickled file | The model must be saved using `joblib.dump` and contain a `predict` method i.e.
is a `fit` scikit-learn model. | +Keras | HDF5 file | The model must be saved using the `save` method provided by Keras and contain a `predict` method i.e. is a `fit` Keras model.| +PyTorch | pickled file/jit | The model must be saved using the `torch.save` functionality. | + +When adding a model from disk the ability for the model to be loaded into the current process will be validated in order to ensure that the model can be loaded into a q process and it is not being added in a manner that will corrupt the registry. + +If setting a q model to the registry the following conditions are important: + +1. When passed as a function/projection a model is expected to require one parameter only, namely the data to be passed to the model for it to be used as a prediction entity +2. If the model is a dictionary + 1. It is expected to have a `predict` key which contains a model meeting the conditions of `1` above. + 2. Optionally it can have an `update` key which defines a function/projection taking feature and target data used to update the model, retrieval of the update functions can be configured for use in supervised and unsupervised use-cases as outlined [here](retrieval.md#mlregistrygetupdate). + +When setting any of the `Python`/`Sklearn`/`Keras`/`PyTorch` models to the registry the following conditions are important: + +1. All functions when used for prediction should accept one parameter, namely the data to be passed to the model to perform a prediction. A breakdown of expectations around how these models are stored is provided in the table above. +2. Scikit-learn models are also supported for use as `updating` models, namely on retrieval of the models using [`.ml.registry.get.update`](retrieval.md#mlregistrygetupdate) when this model has been fit and contains the `partial_fit` method for example: [sklearn.linear_model.SGDClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html). 
+ +**Configuration Parameter:** + +The `config` variable within the `.ml.registry.set.model` function is used extensively within the code to facilitate advanced options within the registry code. The following keys in particular are supported for more advanced functionality, usage of these is outlined within the examples section [here](../examples/basic.md). + +| key | type | Description | +|---|---|---| +| `data` | `any` | If provided with `data` as a key the addition of the model to the registry will also attempt to parse out relevant statistical information associated with the data for use within deployment of the model. | +| `requirements` | `boolean | string[][] | symbol` | Add Python requirements information associated with a model, this can either be a boolean `1b` indicating use of `pip freeze`, a symbol indicating the path to a `requirements.txt` file or a list of strings defining the requirements to be added. | +| `major` | `boolean` | Is the incrementing of a version to be 'major' i.e. should the model be incremented from `1 0` to `2 0` rather than `1 0` to `1 1` as is default. | +| `majorVersion` | `long` | What major version is to be incremented? By default we increment major versions based on the maximal version within the registry, however users can define the major version to be incremented using this option. | +| `code` | `symbol | symbol[]` | Reference to the location of any files `*.py`/`*.p`/`*.k` or `*.q` files. These files are then loaded automatically on retrieval of the models using the `*.get.*` functionality. | +| `axis` | `boolean` | Should the data when passed to the model be `'vertical'` or `'horizontal'` i.e. should the data be retrieved from a table in `flip value flip` (`0b`) or `value flip` (`1b`) format. This allows flexibility in model design. | +| `supervise` | `string[]` | List of metrics to be used for supervised monitoring of the model. 
| + +**Examples:** + +**Example 1:** Add a vanilla model to a registry in 'pwd' +```q +q).ml.registry.set.model[::;::;{x};"model";"q";::] +440482bb-5404-b22d-6c53-c847f09acf0a +``` + +**Example 2:** Add a vanilla model to a registry in 'pwd' under experiment EXP1 +```q +q).ml.registry.set.model[::;"EXP1";{x};"model";"q";::] +440482bb-5404-b22d-6c53-c847f09acf0a +``` + +**Example 3:** Add a vanilla model to a registry in 'pwd' under sub-experiment EXP1/SUBEXP1 +```q +q).ml.registry.set.model[::;"EXP1/SUBEXP1";{x};"model";"q";::] +440482bb-5404-b22d-6c53-c847f09acf0a +``` + +**Example 4:** Add an sklearn model to a registry +```q +q)skldata:.p.import`sklearn.datasets +q)blobs:skldata[`:make_blobs;<] +q)dset:blobs[`n_samples pykw 1000;`centers pykw 2;`random_state pykw 500] +q)skmdl :.p.import[`sklearn.cluster][`:AffinityPropagation][`damping pykw 0.8][`:fit]dset 0 +q).ml.registry.set.model[::;::;skmdl;"skmodel";"sklearn";::] +6048775b-01e9-33b7-302a-8307ff8e132c +``` + +**Example 5:** Generate a major version of the "model" within the registry +```q +q).ml.registry.set.model[::;::;{x+1};"model";"q";enlist[`major]!enlist 1b] +95ed27df-072d-6bd6-713d-c49fae255840 +``` + +**Example 6:** Associate some Python requirements with the next version of the sklearn model +```q +q)requirements:enlist[`requirements]!enlist ("scikit-learn";"numpy") +q).ml.registry.set.model[::;::;skmdl;"skmodel";"sklearn";requirements] +440482bb-5404-b22d-6c53-c847f09acf0a +``` + +**Example 7:** Add a q model saved to disk (this assumes running from the root of the registry repo) +```q +q).ml.registry.set.model[::;::;`:examples/models/qModel;"qModel";"q";::] +bea225d4-f8e5-dd3a-32da-51ecc91a6d9e +``` + +## `.ml.registry.set.parameters` + +_Generate a JSON file containing parameters to be associated with a model. 
These parameters define any information that a user believes to be important to the models generation, it may include hyperparameter sets used when fitting or information about training._ + +```q +.ml.registry.set.parameters[folderPath;experimentName;modelName;version;paramName;params] +``` + +**Parameters:** + +|Name|Type|Description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON. | +| `experimentName` | `string | ::` | The name of the experiment associated with the model or generic null if none. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. | +| `modelName` | `string | ::` | The name of the model to which the parameters are to be set. If this is null, the newest model associated with the experiment is used. | +| `version` | `long[] | ::` | The specific version of a named model to set the parameters to, a list of length 2 with (major;minor) version number. If this is null the newest model is used. | +| `paramName` |` string | symbol` | The name of the parameter to be saved. | +| `params` | `dictionary | table | string` | The parameters to save to file. | + +**Returns:** + +|Type|Description| +|---|---| +|`::`|| + +When adding new parameters associated with a model within the context of cloud vendor interactions the `folderPath` variable is unused and the registry location is assumed to be the storage location provided on initialisation. 
+ +**Examples:** + +**Example 1:** Save a dictionary parameter associated with a model 'mymodel' +```q +// Add a model to the registry +q).ml.registry.set.model[::;::;{x+2};"mymodel";"q";::] + +// Save a dictionary parameter associated with a model 'mymodel' +q).ml.registry.set.parameters[::;::;"mymodel";1 0;"paramFile";`param1`param2!1 2] +``` + +**Example 2:** Save a list of strings as parameters associated with a model 'mymodel' +```q +q).ml.registry.set.parameters[::;::;"mymodel";1 0;"paramFile2";("value1";"value2")] +``` + +## `.ml.registry.log.metric` + +_Log metric values associated with a model_ + +```q +.ml.registry.log.metric[folderPath;experimentName;modelName;version;metricName;metricValue] +``` + +**Parameters:** + +|Name|Type|Description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON. | +| `experimentName` | `string | ::`| The name of the experiment associated with the model or generic null if none. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. | +| `modelName` | `string | ::` | The name of the model to which the metrics are to be associated. If this is null, the newest model associated with the experiment is used. | +| `version` | `long[] | ::` | The specific version of a named model to be used, a list of length 2 with (major;minor) version number. If this is null the newest model is used. | +| `metricName` | `symbol | string` | The name of the metric to be persisted. In the case when this is a string, it is converted to a symbol. | +| `metricValue` | `float` | The value of the metric to be persisted. 
| + +**Returns:** + +|Type|Description| +|---|---| +|`::`|| + +When logging metrics a persisted binary table is generated within the model registry containing the following information + +1. The time the metric value was added +2. The name of the persisted metric +3. The value of the persisted metric + +When adding metrics associated with a model within the context of cloud vendor interactions the `folderPath` variable is unused and the registry location is assumed to be the storage location provided on initialisation. + +**Examples:** + +**Example 1:** Log metric values associated with various metric names +```q +// Create a model within the registry +q).ml.registry.set.model[::;::;{x+1};"metricModel";"q";::] + +// Log metric values associated with various metric names +q).ml.registry.log.metric[::;::;"metricModel";1 0;`func1;2.4] +q).ml.registry.log.metric[::;::;"metricModel";1 0;`func1;3] +q).ml.registry.log.metric[::;::;"metricModel";1 0;`func2;10.2] +q).ml.registry.log.metric[::;::;"metricModel";1 0;`func3;9] +q).ml.registry.log.metric[::;::;"metricModel";1 0;`func3;11.2] +``` + +## `.ml.registry.update.latency` + +_Update monitoring config with new latency information_ + +```q +.ml.registry.update.latency[cli;folderPath;experimentName;modelName;version;model;data] +``` + +**Parameters:** + +|Name|Type|Description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON. | +| `experimentName` | `string | ::` | The name of the experiment associated with the model or generic null if none. This may contain details of a subexperiment Eg. EXP1/SUBEXP1.| +| `modelName` | `string | ::` | The name of the model to be used. 
If this is null, the newest model associated with the experiment is retrieved. | +| `version` | `long[] | ::` | The specific version of a named model to use, a list of length 2 with (major;minor) version number. If this is null the newest model is retrieved. | +| `model` | `fn` | The function whose latency is to be monitored. | +| `data` | `table` | Sample data on which to evaluate the function. | + +**Returns:** + +|Type|Description| +|---|---| +|`::`|| + +**Examples:** + +**Example 1:** Update model latency config +```q +// Create a model within the registry +q).ml.registry.set.model[::;::;{x};"configModel";"q";::] + +// Get predict function +q)p:.ml.registry.get.predict[::;::;"configModel";::] + +// Update model latency config +q).ml.registry.update.latency[::;::;"configModel";::;p;([]1000?1f)] +``` + +## `.ml.registry.update.nulls` + +_Update monitoring config with new null information_ + +```q +.ml.registry.update.nulls[cli;folderPath;experimentName;modelName;version;data] +``` + +**Parameters:** + +|Name|Type|Description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON. | +| `experimentName` | `string | ::`| The name of the experiment associated with the model or generic null if none. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. | +| `modelName` | `string | ::` | The name of the model to be used. If this is null, the newest model associated with the experiment is retrieved. | +| `version` | `long[] | ::` | The specific version of a named model to use, a list of length 2 with (major;minor) version number. If this is null the newest model is retrieved. | +| `data` | `table` | Sample data on which to evaluate the median value.
| + +**Returns:** + +|Type|Description| +|---|---| +|`::`|| + + +**Examples:** + +**Example 1:** Update model nulls config +```q +// Create a model within the registry +q).ml.registry.set.model[::;::;{x};"configModel";"q";::] + +// Update model nulls config +q).ml.registry.update.nulls[::;::;"configModel";::;([]1000?1f)] +``` + +## `.ml.registry.update.infinity` + +_Update monitoring config with new infinity information_ + +```q +.ml.registry.update.infinity[cli;folderPath;experimentName;modelName;version;data] +``` + +**Parameters:** + +|Name|Type|Description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON. | +| `experimentName` | `string | ::`| The name of the experiment associated with the model or generic null if none. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. | +| `modelName` | `string | ::` | The name of the model to be used. If this is null, the newest model associated with the experiment is retrieved. | +| `version` | `long[] | ::` | The specific version of a named model to use, a list of length 2 with (major;minor) version number. If this is null the newest model is retrieved. | +| `data` | `table` | Sample data on which to evaluate the min/max value. 
| + +**Returns:** + +|Type|Description| +|---|---| +|`::`|| + +**Examples:** + +**Example 1:** Update model infinity config +```q +// Create a model within the registry +q).ml.registry.set.model[::;::;{x};"configModel";"q";::] + +// Update model infinity config +q).ml.registry.update.infinity[::;::;"configModel";::;([]1000?1f)] +``` + +## `.ml.registry.update.csi` + +_Update monitoring config with new csi information_ + +```q +.ml.registry.update.csi[cli;folderPath;experimentName;modelName;version;data] +``` + +**Parameters:** + +|Name|Type|Description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON. | +| `experimentName` | `string | ::`| The name of the experiment associated with the model or generic null if none. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. | +| `modelName` | `string | ::` | The name of the model to be used. If this is null, the newest model associated with the experiment is retrieved. | +| `version` | `long[] | ::` | The specific version of a named model to use, a list of length 2 with (major;minor) version number. If this is null the newest model is retrieved. | +| `data` | `table` | Sample data on which to evaluate the historical distributions. 
| + +**Returns:** + +|Type|Description| +|---|---| +|`::`|| + +**Examples:** + +**Example 1:** Update model csi config +```q +// Create a model within the registry +q).ml.registry.set.model[::;::;{x};"configModel";"q";::] + +// Update model csi config +q).ml.registry.update.csi[::;::;"configModel";::;([]1000?1f)] +``` + +## `.ml.registry.update.psi` + +_Update monitoring config with new psi information_ + +```q +.ml.registry.update.psi[cli;folderPath;experimentName;modelName;version;model;data] +``` + +**Parameters:** + +|Name|Type|Description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON. | +| `experimentName` | `string | ::`| The name of the experiment associated with the model or generic null if none. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. | +| `modelName` | `string | ::` | The name of the model to be used. If this is null, the newest model associated with the experiment is retrieved. | +| `version` | `long[] | ::` | The specific version of a named model to use, a list of length 2 with (major;minor) version number. If this is null the newest model is retrieved. | +| `model` | `fn` | Prediction function. | +| `data` | `table` | Sample data on which to evaluate the historical predictions. 
| + +**Returns:** + +|Type|Description| +|---|---| +|`::`|| + +**Examples:** + +**Example 1:** Update model psi config +```q +// Create a model within the registry +q).ml.registry.set.model[::;::;{x};"configModel";"q";::] + +// Get predict function +q)p:.ml.registry.get.predict[::;::;"configModel";::] + +// Update model psi config +q).ml.registry.update.psi[::;::;"configModel";::;p;([]1000?1f)] +``` + +## `.ml.registry.update.type` + +_Update monitoring config with new type information_ + +```q +.ml.registry.update.type[cli;folderPath;experimentName;modelName;version;format] +``` + +**Parameters:** + +|Name|Type|Description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON. | +| `experimentName` | `string | ::`| The name of the experiment associated with the model or generic null if none. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. | +| `modelName` | `string | ::` | The name of the model to be used. If this is null, the newest model associated with the experiment is retrieved. | +| `version` | `long[] | ::` | The specific version of a named model to use, a list of length 2 with (major;minor) version number. If this is null the newest model is retrieved. | +| `format` | `string` | Model type. 
| + +**Returns:** + +|Type|Description| +|---|---| +|`::`|| + +**Examples:** + +**Example 1:** Update model type config +```q +// Create a model within the registry +q).ml.registry.set.model[::;::;{x};"configModel";"q";::] + +// Update model type config +q).ml.registry.update.type[::;::;"configModel";::;"sklearn"] +``` + + +## `.ml.registry.update.supervise` + +_Update monitoring config with new supervise information_ + +```q +.ml.registry.update.supervise[cli;folderPath;experimentName;modelName;version;metrics] +``` + +**Parameters:** + +|Name|Type|Description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON. | +| `experimentName` | `string | ::`| The name of the experiment associated with the model or generic null if none. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. | +| `modelName` | `string | ::` | The name of the model to be used. If this is null, the newest model associated with the experiment is retrieved. | +| `version` | `long[] | ::` | The specific version of a named model to use, a list of length 2 with (major;minor) version number. If this is null the newest model is retrieved. | +| `metrics` | `string[]` | Metrics to monitor. 
| + +**Returns:** + +|Type|Description| +|---|---| +|`::`|| + +**Examples:** + +**Example 1:** Update model supervise config +```q +// Create a model within the registry +q).ml.registry.set.model[::;::;{x};"configModel";"q";::] + +// Update model supervise config +q).ml.registry.update.supervise[::;::;"configModel";::;enlist[".ml.mse"]] +``` + +## `.ml.registry.update.schema` + +_Update monitoring config with new schema information_ + +```q +.ml.registry.update.schema[cli;folderPath;experimentName;modelName;version;data] +``` + +**Parameters:** + +|Name|Type|Description| +|---|---|---| +| `folderPath` | `dictionary | string | ::` | A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. ```enlist[`local]!enlist"myReg"```; a string indicating the local path; or a generic null to use the current `.ml.registry.location` pulled from CLI/JSON. | +| `experimentName` | `string | ::`| The name of the experiment associated with the model or generic null if none. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. | +| `modelName` | `string | ::` | The name of the model to be used. If this is null, the newest model associated with the experiment is retrieved. | +| `version` | `long[] | ::` | The specific version of a named model to use, a list of length 2 with (major;minor) version number. If this is null the newest model is retrieved. | +| `data` | `table` | Table from which to retrieve schema.
| + +**Returns:** + +|Type|Description| +|---|---| +|`::`|| + + +**Examples:** + +**Example 1:** Update model schema config +```q +// Create a model within the registry +q).ml.registry.set.model[::;::;{x};"configModel";"q";::] + +// Update model schema config +q).ml.registry.update.schema[::;::;"configModel";::;([]til 7)] +``` diff --git a/ml/docs/registry/examples/basic.md b/ml/docs/registry/examples/basic.md new file mode 100644 index 0000000..792555d --- /dev/null +++ b/ml/docs/registry/examples/basic.md @@ -0,0 +1,291 @@ +# Registry Examples + +The purpose of this page is to outline some example usage of the ML-Registry. +For most users, these examples will be the first entry point to the use of the ML-Registry and outline the function calls that are used across the interface when interacting with the Registry. + +## Basic Interactions + +After installing the relevant dependencies, we can explore the q model registry functionality by following the examples below: + +* Start up a q session +``` +$ q init.q +``` + +* Generate a new model registry +```q +q).ml.registry.new.registry[::;::]; +``` + +* Retrieve the 'modelStore' defining the current models within the registry +```q +q).ml.registry.get.modelStore[::;::]; +``` + +* Display the modelStore +```q +q)show modelStore +registrationTime experimentName modelName uniqueID modelType version +-------------------------------------------------------------------- +``` + +* Add several models to the registry +```q +// Increment minor versions +q)modelName:"basic-model" +q).ml.registry.set.model[::;::;{x} ;modelName;"q";::] +q).ml.registry.set.model[::;::;{x+1};modelName;"q";::] +q).ml.registry.set.model[::;::;{x+2};modelName;"q";::] + +// Set major version and increment from '2.0' +q).ml.registry.set.model[::;::;{x+3};modelName;"q";enlist[`major]!enlist 1b] +q).ml.registry.set.model[::;::;{x+4};modelName;"q";::] + +// Add another version of '1.x' 
+q).ml.registry.set.model[::;::;{x+5};modelName;"q";enlist[`majorVersion]!enlist 1] +``` + +* Display the modelStore +```q +q)show modelStore +registrationTime experimentName modelName uniqueID modelType version +----------------------------------------------------------------------------------------------------------------- +2021.07.20D18:26:17.904115000 "undefined" "basic-model" e1636884-f7d8-93e5-9e72-fb23f7407473 ,"q" 1 0 +2021.07.20D18:26:17.914201000 "undefined" "basic-model" edaa5221-8e4f-4aef-52df-25d8794b28fe ,"q" 1 1 +2021.07.20D18:26:17.925254000 "undefined" "basic-model" a667b0f2-ce0c-e4bd-d870-6aab04579859 ,"q" 1 2 +2021.07.20D18:26:17.932588000 "undefined" "basic-model" 56be5696-cd31-f846-57d2-86f0dd92fe2e ,"q" 2 0 +2021.07.20D18:26:17.939366000 "undefined" "basic-model" bbf3120c-d75b-4f5a-21c0-368189291792 ,"q" 2 1 +2021.07.20D18:26:21.086221000 "undefined" "basic-model" 5386500e-7cee-fdf6-a493-d7a5c03c8280 ,"q" 1 3 +``` + +* Add models associated with experiments +```q +q)modelName:"new-model" + +// Incrementing versions from '1.0' +q).ml.registry.set.model[::;"testExperiment";{x} ;modelName;"q";::] +q).ml.registry.set.model[::;"testExperiment";{x+1};modelName;"q";enlist[`major]!enlist 1b] +q).ml.registry.set.model[::;"testExperiment";{x+2};modelName;"q";::] +``` + +* Display the modelStore +```q +q)show modelStore +registrationTime experimentName modelName uniqueID modelType version +------------------------------------------------------------------------------------------------------------------- +2021.07.20D18:26:17.904115000 "undefined" "basic-model" e1636884-f7d8-93e5-9e72-fb23f7407473 ,"q" 1 0 +2021.07.20D18:26:17.914201000 "undefined" "basic-model" edaa5221-8e4f-4aef-52df-25d8794b28fe ,"q" 1 1 +2021.07.20D18:26:17.925254000 "undefined" "basic-model" a667b0f2-ce0c-e4bd-d870-6aab04579859 ,"q" 1 2 +2021.07.20D18:26:17.932588000 "undefined" "basic-model" 56be5696-cd31-f846-57d2-86f0dd92fe2e ,"q" 2 0 +2021.07.20D18:26:17.939366000 "undefined" 
"basic-model" bbf3120c-d75b-4f5a-21c0-368189291792 ,"q" 2 1 +2021.07.20D18:26:21.086221000 "undefined" "basic-model" 5386500e-7cee-fdf6-a493-d7a5c03c8280 ,"q" 1 3 +2021.07.20D18:28:15.902359000 "testExperiment" "new-model" 86423ef3-cca0-7e2b-051a-e53fbaab761d ,"q" 1 0 +2021.07.20D18:28:15.911149000 "testExperiment" "new-model" ab143727-4164-2f08-fd1f-66e1994873d7 ,"q" 2 0 +2021.07.20D18:28:19.294837000 "testExperiment" "new-model" 6fa608cc-0a87-46b5-d61c-ce2cf7abc0a6 ,"q" 2 1 +``` + +* Retrieve models from the registry +```q +// Retrieve version 1.1 of the 'basic-model' +q).ml.registry.get.model[::;::;"basic-model";1 1]`model +{x+1} + +// Retrieve the most up to date model associated with the 'testExperiment' +q).ml.registry.get.model[::;"testExperiment";"new-model";::]`model +{x+2} + +// Retrieve the last model added to the registry +q).ml.registry.get.model[::;::;::;::]`model +{x+2} +``` + +* Delete models, experiments, and the registry +```q +// Delete the experiment from the registry +q).ml.registry.delete.experiment[::;"testExperiment"] + +// Display the modelStore following experiment deletion +q)show modelStore +registrationTime experimentName modelName uniqueID modelType version +----------------------------------------------------------------------------------------------------------------- +2021.07.20D18:26:17.904115000 "undefined" "basic-model" e1636884-f7d8-93e5-9e72-fb23f7407473 ,"q" 1 0 +2021.07.20D18:26:17.914201000 "undefined" "basic-model" edaa5221-8e4f-4aef-52df-25d8794b28fe ,"q" 1 1 +2021.07.20D18:26:17.925254000 "undefined" "basic-model" a667b0f2-ce0c-e4bd-d870-6aab04579859 ,"q" 1 2 +2021.07.20D18:26:17.932588000 "undefined" "basic-model" 56be5696-cd31-f846-57d2-86f0dd92fe2e ,"q" 2 0 +2021.07.20D18:26:17.939366000 "undefined" "basic-model" bbf3120c-d75b-4f5a-21c0-368189291792 ,"q" 2 1 +2021.07.20D18:26:21.086221000 "undefined" "basic-model" 5386500e-7cee-fdf6-a493-d7a5c03c8280 ,"q" 1 3 + +// Delete version 1.3 of the 'basic-model' 
+q).ml.registry.delete.model[::;::;"basic-model";1 3]; + +// Display the modelStore following deletion of 1.3 of the 'basic-model' +q)show modelStore +registrationTime experimentName modelName uniqueID modelType version +----------------------------------------------------------------------------------------------------------------- +2021.07.20D18:26:17.904115000 "undefined" "basic-model" e1636884-f7d8-93e5-9e72-fb23f7407473 ,"q" 1 0 +2021.07.20D18:26:17.914201000 "undefined" "basic-model" edaa5221-8e4f-4aef-52df-25d8794b28fe ,"q" 1 1 +2021.07.20D18:26:17.925254000 "undefined" "basic-model" a667b0f2-ce0c-e4bd-d870-6aab04579859 ,"q" 1 2 +2021.07.20D18:26:17.932588000 "undefined" "basic-model" 56be5696-cd31-f846-57d2-86f0dd92fe2e ,"q" 2 0 +2021.07.20D18:26:17.939366000 "undefined" "basic-model" bbf3120c-d75b-4f5a-21c0-368189291792 ,"q" 2 1 + +// Delete all models associated with the 'basic-model' +q).ml.registry.delete.model[::;::;"basic-model";::] + +// Display the modelStore following deletion of 'basic-model' +q)show modelStore +registrationTime experimentName modelName uniqueID modelType version +-------------------------------------------------------------------- + +// Delete the registry +q).ml.registry.delete.registry[::;::] +``` + +## Externally generated model addition + +Not all models that a user may want to use within the registry will have been generated in the q session being used to add the model to the registry. +In reality, they may not have been generated using q/embedPy at all. +For example, in the case of Python objects/models saved as `pickled files`/`h5 files` in the case of Keras models. + +As such, the `.ml.registry.set.model` functionality also allows users to take the following file types (with appropriate limitations) and add them to the registry such that they can be retrieved. 
+ +Model Type | File Type | Qualifying Conditions +-----------|-------------------|---------------------- +q | q-binary | Retrieved model must be a q projection, function or dictionary with a predict key +Python | pickled file | The file must be loadable using `joblib.load` +Sklearn | pickled file | The file must be loadable using `joblib.load` and contain a `predict` method i.e. it is a fitted scikit-learn model +Keras | HDF5 file | The file must be loadable using `keras.models.load_model` and contain a `predict` method i.e. it is a fitted Keras model +PyTorch | pickled file/jit | The file must be loadable using `torch.jit.load` or `torch.load`; invocation of the function on load is expected to return predictions as a tensor + +The following example invocations show how previously generated q and sklearn models can be added to the registry: + +* Load the repository +```q +$ q init.q +q) +``` + +* Add a saved q model (Clustering algorithm) to the ML Registry +```q +// Generate and save to disk a q clustering model +q)`:qModel set .ml.clust.kmeans.fit[2 200#400?1f;`e2dist;3;::] + +q).ml.registry.set.model[::;::;`:qModel;"qModel";"q";::] +q).ml.registry.get.model[::;::;::;::] +modelInfo| `registry`model`monitoring!(`description`modelInformation`experime.. +model | `modelInfo`predict!(`repPts`clust`data`inputs!((0.7396003 0.256620.. +``` + +* Add a saved Sklearn model to the ML Registry +```q +// Generate and save an sklearn model to disk +q)clf:.p.import[`sklearn.svm][`:SVC][] +q)mdl:clf[`:fit][100 2#200?1f;100?3] +q).p.import[`joblib][`:dump][mdl;"skmdl.pkl"] + +q).ml.registry.set.model[::;::;`:skmdl.pkl;"skModel";"sklearn";::] +q).ml.registry.get.model[::;::;::;::] +modelInfo| `registry`model`monitoring!(`description`modelInformation`experime.. +model | {[f;x]embedPy[f;x]}[foreign]enlist +``` + +## Adding Python requirements with individually set models + +By default, the addition of models to the registry as individual analytics includes: + +1. 
Configuration outlined within `config/modelInfo.json`. +2. The model (Python/q) within a `model` folder. +3. A `metrics` folder for the storage of metrics associated with a model. +4. A `parameters` folder for the storage of parameter information associated with the model or associated data. +5. A `code` folder which can be used to populate code that will be loaded on retrieval of a model. + +Omitted from this list are the Python requirements needed to run the models. These can be added as part of the `config` parameter in the following ways: + +1. Setting the value associated with the `requirements` key to `1b` when in a virtualenv will `pip freeze` the current environment and save it as a `requirements.txt` file. +2. Setting the value associated with the `requirements` key to a `symbol`/`hsym` which points to a file will copy that file as the `requirements.txt` file for that model, thus allowing users to point to a previously generated requirements file. +3. Setting the value associated with the `requirements` key to a list of strings will populate a `requirements.txt` file for the model containing each of the strings as an independent requirement. + +The following examples show how each of the above cases would be invoked: + +* Freezing the current environment using pip freeze when in a virtualenv +```q +q).ml.registry.set.model[::;::;{x};"reqrModel";"q";enlist[`requirements]!enlist 1b] +``` + +* Pointing to an existing requirements file using relative or full path +```q +q).ml.registry.set.model[::;::;{x+1};"reqrModel";"q";enlist[`requirements]!enlist `:requirements.txt] +``` + +* Adding a list of strings as the requirements +```q +q)requirements:enlist[`requirements]!enlist ("numpy";"pandas";"scikit-learn") +q).ml.registry.set.model[::;::;{x+2};"reqrModel";"q";requirements] +``` + +## Associate metrics with a model + +Metric information can be persisted with a saved model to create a table within the model registry to which data associated 
with the model can be stored. + +The following shows how interactions with this functionality are facilitated: + +* Set a model within the model registry +```q +q).ml.registry.set.model[::;"test";{x+1};"metricModel";"q";::]; +``` + +* Log various metrics associated with a named model +```q +q).ml.registry.log.metric[::;::;"metricModel";1 0;`func1;2.4] +q).ml.registry.log.metric[::;::;"metricModel";1 0;`func1;3] +q).ml.registry.log.metric[::;::;"metricModel";1 0;`func2;10.2] +q).ml.registry.log.metric[::;::;"metricModel";1 0;`func3;9] +q).ml.registry.log.metric[::;::;"metricModel";1 0;`func3;11.2] +``` + +* Retrieve all metrics associated with the model `metricModel` +```q +q).ml.registry.get.metric[::;::;"metricModel";1 0;::] +timestamp metricName metricValue +---------------------------------------------------- +2021.04.23D10:21:46.690671000 func1 2.4 +2021.04.23D10:21:52.523227000 func1 3 +2021.04.23D10:21:57.338468000 func2 10.2 +2021.04.23D10:22:04.314963000 func3 9 +2021.04.23D10:22:08.899301000 func3 11.2 +``` + +* Retrieve metric information related to a single named metric +```q +q).ml.registry.get.metric[::;::;"metricModel";1 0;enlist[`metricName]!enlist `func1] +timestamp metricName metricValue +---------------------------------------------------- +2021.04.23D10:21:46.690671000 func1 2.4 +2021.04.23D10:21:52.523227000 func1 3 +``` + +## Associating parameters with a model + +Parameter information can be added to a saved model. This creates a JSON file within the model registry associated with a particular parameter. 
+ +* Set a model within the model registry +```q +q).ml.registry.set.model[::;::;{x+2};"paramModel";"q";::] +``` + +* Set parameters associated with the model +```q +q).ml.registry.set.parameters[::;::;"paramModel";1 0;"paramFile";`param1`param2!1 2] + +q).ml.registry.set.parameters[::;::;"paramModel";1 0;"paramFile2";`value1`value2] +``` + +* Retrieve saved parameters associated with a model +```q +q).ml.registry.get.parameters[::;::;"paramModel";1 0;"paramFile"] +param1| 1 +param2| 2 + +q).ml.registry.get.parameters[::;::;"paramModel";1 0;"paramFile2"] +"value1" +"value2" +``` diff --git a/ml/docs/registry/index.md b/ml/docs/registry/index.md new file mode 100644 index 0000000..8aba8c0 --- /dev/null +++ b/ml/docs/registry/index.md @@ -0,0 +1,18 @@ +# Introduction + +The _KX ML Registry Library_ contains functionality to create centralized registry locations for the storage of versioned machine learning models, workflows and advanced analytics, alongside parameters, metrics and other important artefacts. + +The ML Registry functionality, provided within the `.ml.registry` namespace in q, is intended to provide a key component in any MLOps stack built upon KX technology. Registries provide a location in which information required for model monitoring can be stored, to which retrained pipelines can be pushed and from which models for deployment can be retrieved. + +The functionality aims to enhance our offering and provide users of kdb Insights with: + +1. A method of introducing a user's own generated models, with wrapped functionality allowing these models to be integrated seamlessly within specified limitations. +2. A method to understand stored models. +3. A single storage location for all `q/python models`. 
+ +## Sections + +Documentation is broken into the following sections: + +* [Registry API](api/setting.md) +* [Examples](examples/basic.md) diff --git a/ml/examples/code/torch/torch.p b/ml/examples/code/torch/torch.p new file mode 100644 index 0000000..3944dcf --- /dev/null +++ b/ml/examples/code/torch/torch.p @@ -0,0 +1,34 @@ +import torch.nn as nn +import torch.nn.functional as F + +class classifier(nn.Module): + + def __init__(self,input_dim, hidden_dim, dropout = 0.4): + super().__init__() + + self.fc1 = nn.Linear(input_dim, hidden_dim) + self.fc2 = nn.Linear(hidden_dim, hidden_dim) + self.fc3 = nn.Linear(hidden_dim, 1) + self.dropout = nn.Dropout(p = dropout) + + + def forward(self,x): + x = self.dropout(F.relu(self.fc1(x))) + x = self.dropout(F.relu(self.fc2(x))) + x = self.fc3(x) + + return x + +def runmodel(model,optimizer,criterion,dataloader,n_epoch): + for epoch in range(n_epoch): + train_loss=0 + for idx, data in enumerate(dataloader, 0): + inputs, labels = data + model.train() + optimizer.zero_grad() + outputs = model(inputs) + loss = criterion(outputs,labels.view(-1,1)) + loss.backward() + optimizer.step() + train_loss += loss.item()/len(dataloader) + return model diff --git a/ml/examples/code/torch/torch.q b/ml/examples/code/torch/torch.q new file mode 100644 index 0000000..ca9959e --- /dev/null +++ b/ml/examples/code/torch/torch.q @@ -0,0 +1,16 @@ +\d .torch + +// Example invocation of a torch model being fit using embedPy +fitModel:{[xtrain;ytrain;model] + optimArg:enlist[`lr]!enlist 0.9; + optimizer:.p.import[`torch.optim][`:Adam][model[`:parameters][];pykwargs optimArg]; + criterion:.p.import[`torch.nn][`:BCEWithLogitsLoss][]; + dataX:.p.import[`torch][`:from_numpy][.p.import[`numpy][`:array][xtrain]][`:float][]; + dataY:.p.import[`torch][`:from_numpy][.p.import[`numpy][`:array][ytrain]][`:float][]; + tensorXY:.p.import[`torch.utils.data][`:TensorDataset][dataX;dataY]; + modelValues:(count first xtrain;1b;0); + 
modelArgs:`batch_size`shuffle`num_workers!$[.pykx.loaded;.pykx.topy each modelValues;modelValues]; + dataLoader:.p.import[`torch.utils.data][`:DataLoader][tensorXY;pykwargs modelArgs]; + nEpochs:10|`int$(count[xtrain]%1000); + .p.get[`runmodel][model;optimizer;criterion;dataLoader;nEpochs] + } diff --git a/ml/examples/models/kerasModel.h5 b/ml/examples/models/kerasModel.h5 new file mode 100644 index 0000000..b060006 Binary files /dev/null and b/ml/examples/models/kerasModel.h5 differ diff --git a/ml/examples/models/model.graph b/ml/examples/models/model.graph new file mode 100644 index 0000000..600cf55 Binary files /dev/null and b/ml/examples/models/model.graph differ diff --git a/ml/examples/models/pythonModel.pkl b/ml/examples/models/pythonModel.pkl new file mode 100644 index 0000000..7ddacd8 Binary files /dev/null and b/ml/examples/models/pythonModel.pkl differ diff --git a/ml/examples/models/qModel b/ml/examples/models/qModel new file mode 100644 index 0000000..e652e87 Binary files /dev/null and b/ml/examples/models/qModel differ diff --git a/ml/examples/models/sklearnModel.pkl b/ml/examples/models/sklearnModel.pkl new file mode 100644 index 0000000..45f6cce Binary files /dev/null and b/ml/examples/models/sklearnModel.pkl differ diff --git a/ml/examples/models/theanoModel.pkl b/ml/examples/models/theanoModel.pkl new file mode 100644 index 0000000..1038671 Binary files /dev/null and b/ml/examples/models/theanoModel.pkl differ diff --git a/ml/examples/models/torchModel.pt b/ml/examples/models/torchModel.pt new file mode 100644 index 0000000..b71929e Binary files /dev/null and b/ml/examples/models/torchModel.pt differ diff --git a/ml/examples/q/deploy.q b/ml/examples/q/deploy.q new file mode 100644 index 0000000..2a75bd9 --- /dev/null +++ b/ml/examples/q/deploy.q @@ -0,0 +1,102 @@ +\l init.q + +// Retrieve command line arguments and ensure a user is +// cognizant that they will delete the current registry +// if they invoke the example by accident +cmdLine:.Q.opt 
.z.x +if[not `run in key cmdLine; + -1"This example will delete the registry", + " in your current folder, use '-run' command line arg"; + exit 1; + ]; + +.[.ml.registry.delete.registry;(::;::);{}] + +// All models solving the clustering problem are associated with the +// "cluster" experiment +experiment:enlist[`experimentName]!enlist "cluster" + +// Generate and format the dataset + +skldata:.p.import`sklearn.datasets +blobs:skldata[`:make_blobs;<] +dset:blobs[`n_samples pykw 1000;`centers pykw 2;`random_state pykw 500] + +// Generate two separate Affinity Propagation models using the ML Toolkit +qmdl :.ml.clust.ap.fit[flip dset 0;`nege2dist;0.8;min;::] +qmdl2:.ml.clust.ap.fit[flip dset 0;`nege2dist;0.5;min;::] + +// Add the two q models to the KX_ML_REGISTRY +.ml.registry.set.model[::;"cluster";qmdl ;"qAPmodel";"q";enlist[`axis]!enlist 1b] +.ml.registry.set.model[::;"cluster";qmdl2;"qAPmodel";"q";enlist[`axis]!enlist 1b] + +// Generate equivalent Affinity Propagation models using Scikit-Learn +skmdl :.p.import[`sklearn.cluster][`:AffinityPropagation][`damping pykw 0.8][`:fit]dset 0 +skmdl2:.p.import[`sklearn.cluster][`:AffinityPropagation][`damping pykw 0.5][`:fit]dset 0 + +// Add the two models to the KX_ML_REGISTRY with the second model version 2.0 not 1.1 +.ml.registry.set.model[::;"cluster";skmdl ;"skAPmodel";"sklearn";::] +.ml.registry.set.model[::;"cluster";skmdl2;"skAPmodel";"sklearn";enlist[`major]!enlist 1b] + +// Generate and fit two Keras models adding these to the registry +if[@[{.p.import[x];1b};`keras;0b]; + seq :.p.import[`keras.models][`:Sequential]; + dense:.p.import[`keras.layers][`:Dense]; + nparray:.p.import[`numpy]`:array; + + kerasModel:seq[]; + kerasModel[`:add]dense[4;pykwargs `input_dim`activation!(2;`relu)]; + kerasModel[`:add]dense[4;`activation pykw `relu]; + kerasModel[`:add]dense[1;`activation pykw `sigmoid]; + kerasModel[`:compile][pykwargs `loss`optimizer!`binary_crossentropy`adam]; + kerasModel[`:fit][nparray dset 0;dset 
1;pykwargs `epochs`verbose!200 0]; + + kerasModel2:seq[]; + kerasModel2[`:add]dense[4;pykwargs `input_dim`activation!(2;`relu)]; + kerasModel2[`:add]dense[4;`activation pykw `relu]; + kerasModel2[`:add]dense[1;`activation pykw `sigmoid]; + kerasModel2[`:compile][pykwargs `loss`optimizer!`mse`adam]; + kerasModel2[`:fit][nparray dset 0;dset 1;pykwargs `epochs`verbose!10 0]; + + // Add the two models to the KX_ML_REGISTRY + .ml.registry.set.model[::;"cluster";kerasModel ;"kerasModel";"keras";::]; + .ml.registry.set.model[::;"cluster";kerasModel2;"kerasModel";"keras";::]; + ]; + + +// Generate and add two Python functions to the KX_ML_REGISTRY. +// These are not associated with a named experiment or solve the problem that +// the above do, they are purely for demonstration +if[@[{.p.import x;1b};`statsmodels;0b]; + pyModel :.p.import[`statsmodels.api][`:OLS]; + pyModel2:.p.import[`statsmodels.api][`:WLS]; + + // Add the two functions to the KX_ML_REGISTRY. + .ml.registry.set.model[::;::;pyModel ;"pythonModel";"python";::]; + .ml.registry.set.model[::;::;pyModel2;"pythonModel";"python";::] + ] + + +// Online/out-of-core Models + +// Generate and add two q 'online' models to the KX_ML_REGISTRY. +// These models contain an 'update' key which allows the models to +// be updated as new data becomes available +online1:.ml.online.clust.sequentialKMeans.fit[2 200#400?1f;`e2dist;3;::;::] +online2:.ml.online.sgd.linearRegression.fit[100 2#400?1f;100?1f;1b;::] +online3:.ml.online.sgd.logClassifier.fit[100 2#400?1f;100?0b;1b;::] + +.ml.registry.set.model[::;::;online1;"onlineCluster" ;"q";::] +.ml.registry.set.model[::;::;online2;"onlineRegression";"q";::] +.ml.registry.set.model[::;::;online3;"onlineClassifier";"q";::] + +// Generate and add two Python 'online' models to the KX_ML_REGISTRY. 
+// These models must contain a 'partial_fit' method in order to be +// considered suitable for retrieval as update functions + +sgdClass:.p.import[`sklearn.linear_model][`:SGDClassifier] +sgdModel:sgdClass[pykwargs `max_iter`tol!(1000;0.003) ][`:fit] . dset 0 1 + +.ml.registry.set.model[::;::;sgdModel;"SklearnSGD";"sklearn";::] + +exit 0 diff --git a/ml/examples/q/registry.q b/ml/examples/q/registry.q new file mode 100644 index 0000000..f40f040 --- /dev/null +++ b/ml/examples/q/registry.q @@ -0,0 +1,82 @@ +// Initialize all relevant functionality +\l init.q + +// Set the screen width/lengths for better display +\c 200 200 + +// Retrieve command line arguments and ensure a user is +// cognizant that they will delete the current registry +// if they invoke the example by accident +cmdLine:.Q.opt .z.x +if[not `run in key cmdLine; + -1"This example will delete the registry", + " in your current folder, use '-run' command line arg"; + exit 1; + ]; + +.[.ml.registry.delete.registry;(::;::);{}] + +-1"Generate a model registry and retrieve the 'modelStore'"; +.ml.registry.new.registry[::;::]; +.ml.registry.get.modelStore[::;::]; +show modelStore; + +-1"\nAdd several 'basic q models' to the registry\n"; +modelName:"basic-model" +// Incrementing versions from '1.0' +.ml.registry.set.model[::;{x} ;modelName;"q";::] +.ml.registry.set.model[::;{x+1};modelName;"q";::] +.ml.registry.set.model[::;{x+2};modelName;"q";::] + +// Set major version and increment from '2.0' +.ml.registry.set.model[::;{x+3};modelName;"q";enlist[`major]!enlist 1b] +.ml.registry.set.model[::;{x+4};modelName;"q";::] + +// Add another version of '1.x' +.ml.registry.set.model[::;{x+5};modelName;"q";enlist[`majorVersion]!enlist 1] + +-1"Display the modelStore following model addition"; +show modelStore; + +-1"\nAdd models associated with an experiment\n"; +modelName:"new-model" +experiment:enlist[`experimentName]!enlist "testExperiment" +// Incrementing versions from '1.0' +.ml.registry.set.model[::;{x} 
;modelName;"q";experiment] +.ml.registry.set.model[::;{x+1};modelName;"q";experiment,enlist[`major]!enlist 1b] +.ml.registry.set.model[::;{x+2};modelName;"q";experiment] + +-1"Display the modelStore following experiment addition"; +show modelStore; + +-1"\nRetrieve version 1.1 of the 'basic-model':\n"; +.ml.registry.get.model[::;::;"basic-model";1 1]`model + +-1"\nRetrieve the most up to date model associated with the 'testExperiment':\n"; +.ml.registry.get.model[::;"testExperiment";"new-model";::]`model + +-1"\nRetrieve the last model added to the registry:\n"; +.ml.registry.get.model[::;::;::;::]`model + +-1"\nDelete the experiment from the registry"; +.ml.registry.delete.experiment[::;"testExperiment"] + +-1"\nDisplay the modelStore following experiment deletion"; +show modelStore + +-1"\nDelete version 1.3 of the 'basic-model'"; +.ml.registry.delete.model[::;::;"basic-model";1 3]; + +-1"\nDisplay the modelStore following deletion of 1.3 of the 'basic-model'"; +show modelStore + +-1"\nDelete all models associated with the 'basic-model'"; +.ml.registry.delete.model[::;::;"basic-model";::] + +-1"\nDisplay the modelStore following deletion of 'basic-model'"; +show modelStore + +// Delete the registry +.ml.registry.delete.registry[::;::] + +exit 0 diff --git a/ml/fresh/tests/sigtests.t b/ml/fresh/tests/sigtests.t index 7da6911..f8ccfda 100644 --- a/ml/fresh/tests/sigtests.t +++ b/ml/fresh/tests/sigtests.t @@ -11,6 +11,7 @@ In each case significance tests implemented within freshq are compared to equivalent significance tests implemented previously in python. 
\ +\S -314159 \l ml.q \l fresh/init.q \l fresh/tests/significancetests.p diff --git a/ml/init.q b/ml/init.q index c17e0f2..ebd73e1 100644 --- a/ml/init.q +++ b/ml/init.q @@ -14,3 +14,6 @@ loadfile`:xval/init.q loadfile`:graph/init.q loadfile`:optimize/init.q loadfile`:timeseries/init.q +loadfile`:mlops/init.q +loadfile`:registry/init.q + diff --git a/ml/ml.q b/ml/ml.q index 71813f2..c20cd9d 100644 --- a/ml/ml.q +++ b/ml/ml.q @@ -22,6 +22,9 @@ csym:coerse 0b; // Ensure plain python string (avoid b' & numpy arrays) pydstr:$[.pykx.loaded;{.pykx.eval["lambda x:x.decode()"].pykx.topy x};::] +// Return python library version +pygetver:$[.pykx.loaded;{string .pykx.eval["lambda x:str(x)";<].p.import[`$x][`:__version__]};{.p.import[`$x][`:__version__]`}] + version:@[{TOOLKITVERSION};`;`development] path:{string`ml^`$@[{"/"sv -1_"/"vs ssr[;"\\";"/"](-3#get .z.s)0};`;""]}` loadfile:{$[.z.q;;-1]"Loading ",x:_[":"=x 0]x:$[10=type x;;string]x;system"l ",path,"/",x;} @@ -39,3 +42,10 @@ i.ignoreWarning:0b // @category utilities // @fileoverview Change ignoreWarnings updateIgnoreWarning:{[]i.ignoreWarning::not i.ignoreWarning} + +@[value;".log.initns[]";{::}] +logging.info :{@[{log.info x};x;{[x;y] -1 x;}[x]]} +logging.warn :{@[{log.warn x};x;{[x;y] -1 x;}[x]]} +logging.error:{@[{log.error x};x;{::}];'x} +logging.fatal:{@[{log.fatal x};x;{[x;y] -2 x;}[x]];exit 2} + diff --git a/ml/mlops/README.md b/ml/mlops/README.md new file mode 100644 index 0000000..11cf60e --- /dev/null +++ b/ml/mlops/README.md @@ -0,0 +1,91 @@ +# MLOps Tools + +The purpose of this repository is to act as a central location for common utilities used across the MLOps functionality - Model Training, Monitoring and Packaging. 
+ +## Contents + +- [Requirements](#requirements) +- [Quick start](#quick-start) +- [File structure](#file-structure) +- [Examples](#examples) + + +## Requirements + +- kdb+ > 3.5 +- embedPy +- pykx + +## Quick Start + +This quick start guide is intended to show how the functionality can be initialized and run. + +### Initialize the code base + +Run the following, from the root of this repository only, to initialize the code base: + +```bash +$ q init.q +``` + +## File structure + +The application consists of an _init.q_ as the entrypoint script. + +```bash +$ tree -a +. +├── init.q +├── README.md +├── src +│   ├── lint.config +│   └── q +│   ├── check.q +│   ├── create.q +│   ├── init.q +│   ├── misc.q +│   ├── paths.q +│   ├── search.q +│   └── update.q +└── tests + ├── main.q + ├── performance + │   ├── benchmark1 + │   │   └── performance.q + │   └── load.q + └── template.quke +``` + +## Examples + +```q +$ q init.q + | :: +init | ()!() +path | "/home/deanna/2021projects/mlops-tools" +loadfile | { + filePath:_[":"=x 0]x:$[10=type x;;string]x; + @[system"l ",; +.. +check | ``registry`folderPath`config!(::;{[folderPath;config] + folderPat.. +create | ``binExpected`splitData`percSplit!(::;{[expected;nGroups] + expec.. +percentile| {[array;perc] + array:array where not null array; + percent:perc*.. +util | ``ap!(::;{[func;data] + $[0=type data; + func each data; + .. +infReplace| {[func;data] + $[0=type data; + func each data; + 98=type d.. +paths | ``modelFolder!(::;{[registryPath;config;folderType] + folder:$[fo.. +search | ``model!(::;{[experimentName;modelName;version;config] + infoKeys.. +update | ``latency`nulls`infinity`csi!(::;{[config;model;data] + func:{{sy.. 
+``` diff --git a/ml/mlops/init.q b/ml/mlops/init.q new file mode 100644 index 0000000..f1c1876 --- /dev/null +++ b/ml/mlops/init.q @@ -0,0 +1,28 @@ +\d .ml + +@[system"l ",;"p.q";{::}] + +// @desc Retrieve initial command line configuration +mlops.init:.Q.opt .z.x + +// @desc Define root path from which scripts are to be loaded +mlops.path:{ + module:`$"mlops-tools"; + string module^`$@[{"/"sv -1_"/"vs ssr[;"\\";"/"](-3#get .z.s)0};`;""] + }` + +// @kind function +// @desc Load an individual file +// @param x {symbol} '.q/.p/.k' file which is to be loaded into the current +// process. Failure to load the file at location 'path,x' or 'x' will +// result in an error message +// @return {null} +mlops.loadfile:{ + filePath:_[":"=x 0]x:$[10=type x;;string]x; + @[system"l ",; + mlops.path,"/",filePath; + {@[system"l ",;y;{'"Library load failed with error :",x}]}[;filePath] + ]; + } + +mlops.loadfile`:src/q/init.q diff --git a/ml/mlops/src/lint.config b/ml/mlops/src/lint.config new file mode 100644 index 0000000..ef91b48 --- /dev/null +++ b/ml/mlops/src/lint.config @@ -0,0 +1 @@ +INSUFFICIENT_INDENT : false, warning { tab_size : 2 } diff --git a/ml/mlops/src/q/check.q b/ml/mlops/src/q/check.q new file mode 100644 index 0000000..48dcf9f --- /dev/null +++ b/ml/mlops/src/q/check.q @@ -0,0 +1,166 @@ +\d .ml + +// Check that the q model being set/retrieved from the model registry +// is of an appropriate type +// +// @param model {fn|proj|dictionary} The model to be saved to the registry. 
+// In the case this is a dictionary it is assumed that a 'predict' key +// exists such that the model can be used on retrieval +// @param getOrSet {boolean} Is the model being retrieved or persisted, this +// modifies the error statement on issue with invocation +// @return {::} Function will error on unsuccessful invocation otherwise +// generic null is returned +mlops.check.q:{[model;getOrSet] + if[not type[model]in 99 100 104h; + printString:$[getOrSet;"retrieved is not";"must be"]; + '"model ",printString," a q function/projection/dictionary" + ]; + if[99h=type model; + if[not `predict in key model; + printString:$[getOrSet;"retrieved";"saved"]; + '"q dictionaries being ",printString," must contain a 'predict' key" + ]; + ]; + } + +// Check that the Python object model being set/retrieved from the model +// registry is of an appropriate type +// +// @param model {<} The model to be saved to the registry. This must be +// an embedPy or foreign object +// @param getOrSet {boolean} Is the model being retrieved or persisted, this +// modifies the error statement on issue with invocation +// @return {::} Function will error on unsuccessful invocation otherwise +// generic null is returned +mlops.check.python:{[model;getOrSet] + if[not type[model]in 105 112h; + printString:$[getOrSet;"retrieved is not";"must be"]; + '"model ",printString," an embedPy object" + ]; + } + +// Check that a model that is being added to the or retrieved from the +// registry is an sklearn model with a predict method +// +// @param model {<} The model to be saved to or retrieved from the registry. 
+// This must be an embedPy or foreign object +// @param getOrSet {boolean} Is the model being retrieved or persisted, this +// modifies the error statement on issue with invocation +// @return {::} Function will error on unsuccessful invocation otherwise +// generic null is returned +mlops.check.sklearn:{[model;getOrSet] + mlops.check.python[model;getOrSet]; + mlops.check.pythonlib[model;"sklearn"]; + @[{x[`:predict]};model;{[x]'"model must contain a predict method"}] + } + +// Check that a model that is being added to or retrieved from the +// registry is an xgboost model with a predict method +// +// @param model {<} The model to be saved to or retrieved from the registry. +// This must be an embedPy or foreign object +// @param getOrSet {boolean} Is the model being retrieved or persisted, this +// modifies the error statement on issue with invocation +// @return {::} Function will error on unsuccessful invocation otherwise +// generic null is returned +mlops.check.xgboost:{[model;getOrSet] + mlops.check.python[model;getOrSet]; + mlops.check.pythonlib[model;"xgboost"]; + @[{x[`:predict]};model;{[x]'"model must contain a predict method"}] + } + +// Check that a model that is being added to or retrieved from the +// registry is a Keras model with a predict method +// +// @param model {<} The model to be saved to or retrieved from the registry.
+// This must be an embedPy or foreign object +// @param getOrSet {boolean} Is the model being retrieved or persisted, this +// modifies the error statement on issue with invocation +// @return {::} Function will error on unsuccessful invocation otherwise +// generic null is returned +mlops.check.keras:{[model;getOrSet] + mlops.check.python[model;getOrSet]; + mlops.check.pythonlib[model;"keras"]; + @[{x[`:predict]};model;{[x]'"model must contain a predict method"}] + } + +// Check that a model that is being added to or retrieved from the +// registry is a Theano model with a predict method +// +// @param model {<} The model to be saved to or retrieved from the registry. +// This must be an embedPy or foreign object +// @param getOrSet {boolean} Is the model being retrieved or persisted, this +// modifies the error statement on issue with invocation +// @return {::} Function will error on unsuccessful invocation otherwise +// generic null is returned +mlops.check.theano:{[model;getOrSet] + mlops.check.python[model;getOrSet]; + mlops.check.pythonlib[model;"theano"] + } + +// Check that a model that is being added to or retrieved from the +// registry is a PyTorch model +// +// TO-DO +// - Increase type checking on torch objects +// +// @param model {<} The model to be saved to or retrieved from the registry.
+// This must be an embedPy or foreign object +// @param getOrSet {boolean} Is the model being retrieved or persisted, this +// modifies the error statement on issue with invocation +// @return {::} Function will error on unsuccessful invocation otherwise +// generic null is returned +mlops.check.torch:{[model;getOrSet] + mlops.check.python[model;getOrSet]; + } + +// Check that a DAG being saved/retrieved is appropriately formatted +// +// @param model {dictionary} The DAG to be saved/retrieved +// @param getOrSet {boolean} Is the model being retrieved or persisted, this +// modifies the error statement on issue with invocation +// @return {::} Function will error on unsuccessful invocation otherwise +// generic null is returned +mlops.check.graph:{[model;getOrSet] + if[not 99h=type model; + printString:$[getOrSet;"retrieved is not";"must be"]; + '"graph ",printString," a q dictionary" + ]; + if[not `vertices`edges~key model; + '"graph does not contain 'vertices' and 'edges' keys expected" + ]; + } + +// Check that a model that is being added to or retrieved from the +// registry is a pyspark pipeline with a transform method +// +// @param model {<} The model to be saved to or retrieved from the registry. +// This must be an embedPy or foreign object +// @param getOrSet {boolean} Is the model being retrieved or persisted, this +// modifies the error statement on issue with invocation +// @return {::} Function will error on unsuccessful invocation otherwise +// generic null is returned +mlops.check.pyspark:{[model;getOrSet] + .ml.mlops.check.python[model;getOrSet]; + @[{x[`:transform]};model;{[x]'"model/pipeline must contain a transform method"}] + } + +// Check that the python object that is retrieved contains an appropriate +// indication that it comes from the library that it is expected to come +// from. +// +// @param model {<} The model to be saved to or retrieved from the registry. 
+// This must be an embedPy or foreign object +// @param library {string} The name of the library that is being checked +// against, this is sufficient in the case of fit sklearn/xgboost/keras models +// but may not be generally applicable +// @return {::} Function will error on unsuccessful invocation otherwise +// generic null is returned +mlops.check.pythonlib:{[model;library] + builtins:.p.import[`builtins]; + stringRepr:builtins[`:str;<][builtins[`:type]model]; + if[not stringRepr like "*",library,"*"; + '"Model retrieved not a python object derived from the library '", + library,"'." + ]; + } diff --git a/ml/mlops/src/q/create.q b/ml/mlops/src/q/create.q new file mode 100644 index 0000000..9cffdbd --- /dev/null +++ b/ml/mlops/src/q/create.q @@ -0,0 +1,33 @@ +\d .ml + +// .ml.registry.util.create.binExpected - Separate the expected values into bins +// @param expected {float[]} The expected data +// @param nGroups {long} The number of groups +// @returns {dict} The splitting values and training distributions +mlops.create.binExpected:{[expected;nGroups] + expected:@["f"$;expected;{'"Cannot convert the data to floats"}]; + splits:mlops.create.splitData[expected;nGroups],0w; + expectDist:mlops.create.percSplit[expected;splits]; + (`$string splits)!expectDist + } + +// .ml.registry.util.create.splitData - Split the data into equally distributed +// bins +// @param expected {float[]} The expected predictions +// @param nGroups {int} The number of data groups +// @return {float[]} The splitting points in the expected set +mlops.create.splitData:{[expected;nGroups] + n:1%nGroups; + mlops.percentile[expected;-1_n*1+til nGroups] + } + +// .ml.registry.util.create.percSplit - Get the percentage of data points that +// are in each distribution bin +// @param data {float[]} The data to be split +// @param split {float[]} The splitting values defining how the data is to be +// distributed +// @return {float[]} The splitting values and training distributions
+mlops.create.percSplit:{[data;splits] + groups:deltas 1+bin[asc data;splits]; + groups%count data + } diff --git a/ml/mlops/src/q/get.q b/ml/mlops/src/q/get.q new file mode 100644 index 0000000..f8530c4 --- /dev/null +++ b/ml/mlops/src/q/get.q @@ -0,0 +1,109 @@ +\d .ml + +// Retrieve a q model from disk +// +// @param filePath {string} The full path to the model to be retrieved +// @return {dict|fn|proj} The model previously saved to disk +// registry +mlops.get.q:{[filePath] + mlops.get.typedModel[`q;filePath;get] + } + +// Retrieve a Python model from disk +// +// @param filePath {string} The full path to the model to be retrieved +// @return {<} The embedPy object associated with the model saved +mlops.get.python:{[filePath] + func:.p.import[`joblib]`:load; + mlops.get.typedModel[`python;filePath;func] + } + +// Retrieve a sklearn model from disk +// +// @param filePath {string} The full path to the model to be retrieved +// @return {<} The embedPy object associated with the model saved +mlops.get.sklearn:{[filePath] + func:.p.import[`joblib]`:load; + mlops.get.typedModel[`sklearn;filePath;func] + } + +// Retrieve a xgboost model from disk +// +// @param filePath {string} The full path to the model to be retrieved +// @return {<} The embedPy object associated with the model saved +mlops.get.xgboost:{[filePath] + func:.p.import[`joblib]`:load; + mlops.get.typedModel[`xgboost;filePath;func] + } + +// Retrieve a Keras model from disk +// +// @param filePath {string} The full path to the model to be retrieved +// @return {<} The embedPy object associated with the model saved +mlops.get.keras:{[filePath] + func:.p.import[`keras.models]`:load_model; + mlops.get.typedModel[`keras;filePath;func] + } + +// Retrieve a Theano model from disk +// +// @param filePath {string} The full path to the model to be retrieved +// @return {<} The embedPy object associated with the model saved +mlops.get.theano:{[filePath] + func:.p.import[`joblib]`:load; + mlops.get.typedModel[`theano;filePath;func]
+ } + +// Retrieve a PyTorch model from disk +// +// @param filePath {string} The full path to the model to be retrieved +// @return {<} The embedPy object associated with the model saved +mlops.get.torch:{[filePath] + torch:.p.import`torch; + model:@[torch`:load; + filePath; + {[torch;filePath;err] + @[torch`:jit.load; + filePath; + {[x;y]'"Could not retrieve the requested model at ",x}[filePath] + ] + }[torch;filePath] + ]; + mlops.check.torch[model;1b]; + model + } + +// Retrieve a DAG from a location on disk +// +// @param filePath {string} The full path to the model to be retrieved +// @return {dictionary} The dictionary defining a saved workflow +mlops.get.graph:{[filePath] + func:.dag.loadGraph; + mlops.get.typedModel[`graph;filePath;func] + } + +// Retrieve a pyspark model from a location on disk +// +// @param filePath {string} The full path to the model to be retrieved +// @return {<} The embedPy object associated with the model saved +.ml.mlops.get.pyspark:{[modelPath] + pipe:.p.import[`pyspark.ml]`:PipelineModel; + func:pipe`:load; + @[func;modelPath;{[x;y]'"Could not retrieve the requested model at ",x}[modelPath]] + }; + +// Retrieve a model from disk.
+// +// @param typ {symbol} Type of model being retrieved +// @param filePath {string} The full path to the desired model +// @param func {function} Function used to retrieve model object +// @return {dict|fn|proj} The model previously saved to disk within the +// registry +mlops.get.typedModel:{[typ;filePath;func] + mdl:$[typ~`q;hsym`$filePath;filePath]; + model:@[func;mdl; + {[x;y]'"Could not retrieve the requested model at ",x}[filePath] + ]; + mlops.check[typ][model;1b]; + model + } diff --git a/ml/mlops/src/q/init.q b/ml/mlops/src/q/init.q new file mode 100644 index 0000000..9c88d34 --- /dev/null +++ b/ml/mlops/src/q/init.q @@ -0,0 +1,11 @@ +// init.q - Initialise functionality related to MLOps Tools +// Copyright (c) 2021 Kx Systems Inc + +// Functionality relating to MLOps Tools + +// Load all functionality +mlops.loadfile`:src/q/create.q +mlops.loadfile`:src/q/check.q +mlops.loadfile`:src/q/get.q +mlops.loadfile`:src/q/misc.q +mlops.loadfile`:src/q/update.q diff --git a/ml/mlops/src/q/misc.q b/ml/mlops/src/q/misc.q new file mode 100644 index 0000000..cdae936 --- /dev/null +++ b/ml/mlops/src/q/misc.q @@ -0,0 +1,169 @@ +\d .ml + +// .ml.registry.util.percentile - Functionality from the ml-toolkit. 
Percentile +// calculation for an array +// @param array {number[]} A numerical array +// @param perc {float} Percentile of interest +// @returns {float} The value below which `perc` percent of the observations +// within the array are found +mlops.percentile:{[array;perc] + array:array where not null array; + percent:perc*-1+count array; + i:0 1+\:floor percent; + iDiff:0^deltas asc[array]i; + iDiff[0]+(percent-i 0)*last iDiff + } + +// Apply function to data of various types +// @param func {fn} Function to apply to data +// @param data {any} Data of various types +// @return {fn} function to apply to data +mlops.ap:{[func;data] + $[0=type data; + func each data; + 98=type data; + flip func each flip data; + 99<>type data; + func data; + 98=type key data; + key[data]!.z.s[func] value data; + func each data + ] + } + +// Replace +/- infinities with data min/max +// @param data {table|dictionary|number[]} Numerical data +// @return {table|dictionary|number[]} Data with positive/negative +// infinities are replaced by max/min values +mlops.infReplace:mlops.ap{[data;inf;func] + t:.Q.t abs type first first data; + if[not t in "hijefpnuv";:data]; + i:$[t;]@/:(inf;0n); + @[data;i;:;func@[data;i:where data=i 0;:;i 1]] + }/[;-0w 0w;min,max] + +// Load code with the file extension '*.py' +// +// @param codePath {string} The absolute path to the 'code' +// folder containing any source code +// @param files {symbol|symbol[]} Python files which should be loadable +// return {::} +mlops.load.py:{[codePath;files] + sys:.p.import`sys; + sys[`:path.append][codePath]; + pyfiles:string files; + {.p.e "import ",x}each -3_/:$[10h=type pyfiles;enlist;]pyfiles + } + +// Wrap models such that they all have a predict key regardless of where +// they originate +// +// @param mdlType {symbol} Form of model being used `q`sklearn`xgboost`keras`torch`theano, +// this defines how the model gets interpreted in the case it is Python code +// in particular. 
+// @param model {dictionary|fn|proj|<|foreign} Model retrieved from registry +// @return {fn|proj|<|foreign} The predict function +mlops.format:{[mdlType;model] + $[99h=type model; + model[`predict]; + type[model]in 105 112h; + $[mdlType in `sklearn`xgboost; + {[model;data] + model[`:predict;<]$[98h=type data;tab2df;]data + }[model]; + mdlType~`keras; + raze model[`:predict;<] .p.import[`numpy][`:array]::; + mdlType~`torch; + (raze/){[model;data] + data:$[type data<0;enlist;]data; + prediction:model .p.import[`torch][`:Tensor][data]; + prediction[`:cpu][][`:detach][][`:numpy][]` + }[model]each::; + mdlType~`theano; + {x`}model .p.import[`numpy][`:array]::; + mdlType~`pyspark; + {[model;data] + $[.pykx.loaded; + {.pykx.eval["lambda x: x.asDict()"][x]`} each model[`:transform][data][`:select][`prediction][`:collect][]`; + first flip model[`:transform][data][`:select]["prediction"][`:collect][]` + ] + }[model]; + model + ]; + model + ] + } + +// Transform data incoming into an appropriate format +// this is important because data that is being passed to the Python +// models and data that is being passed to the KX models relies on a +// different 'formats' for the data (Custom models in q would expect data) +// in 'long' format rather than 'wide' in current implementation +// +// @param data {any} Input data being passed to the model +// @param axis {boolean} Whether the data is to be in a 'long' or 'wide' format +// @param mdlType {symbol} Form of model being used `q`sklearn`xgboost`keras`torch`theano, +// this defines how the model gets interpreted in the case it is Python code +// in particular. 
+// @return {any} The data in the appropriate format +.ml.mlops.transform:{[data;axis;mdlType] + dataType:type data; + if[mdlType=`pyspark; + :.ml.mlops.pysparkInput data]; + if[dataType<=20;:data]; + if[mdlType in `xgboost`sklearn; + $[(98h=type data); + :tab2df data; + :data]]; + data:$[98h=dataType; + value flip data; + 99h=dataType; + value data; + dataType in 105 112h; + @[{value flip .ml.df2tab x};data;{'"This input type is not supported"}]; + '"This input type is not supported" + ]; + if[98h<>type data; + data:$[axis;;flip]data + ]; + data + } + +// Utility function to transform data suitable for a pyspark model +// +// @param data {table|any[][]} Input data +// @param {<} An embedPy object representing a Spark dataframe +mlops.pysparkInput:{[data] + if[not type[data] in 0 98h; + '"This input type is not supported" + ]; + $[98h=type data; + [df:.p.import[`pyspark.sql][`:SparkSession.builder.getOrCreate][] + [`:createDataFrame] .ml.tab2df data; + :df:.p.import[`pyspark.ml.feature][`:VectorAssembler] + [`inputCols pykw df[`:columns];`outputCol pykw `features] + [`:transform][df] + ]; + [data:flip (`$string each til count data[0])!flip data; + .z.s data] + ] + } + +// Wrap models retrieved such that they all have the same format regardless of +// from where they originate, the data passed to the model will also be transformed +// to the appropriate format +// +// @param mdlType {symbol} Form of model being used `q`sklearn`xgboost`keras`torch`theano, +// this defines how the model gets interpreted in the case it is Python code +// in particular. 
+// @param model {dictionary|fn|proj|<|foreign} Model retrieved from registry +// @param axis {boolean} Whether the data should be in a 'long' (0b ) or +// 'wide' (1b) format +// @return {fn|proj|<|foreign} The predict function wrapped with a transformation +// function +mlops.wrap:{[mdlType;model;axis] + model:mlops.format[mdlType;model]; + transform:mlops.transform[;axis;mdlType]; + model transform:: + } + diff --git a/ml/mlops/src/q/update.q b/ml/mlops/src/q/update.q new file mode 100644 index 0000000..a6214d9 --- /dev/null +++ b/ml/mlops/src/q/update.q @@ -0,0 +1,227 @@ +\d .ml + +// Update the latency monitoring details of a saved model +// @param config {dictionary} Any additional configuration needed for +// setting the model + +// @param cli {dictionary} Command line arguments as passed to the system on +// initialisation, this defines how the fundamental interactions of +// the interface are expected to operate. +// @param model {function} Model to be applied +// @param data {table} The data which is to be used to calculate the model +// latency +// @return {::} +mlops.update.latency:{[fpath;model;data] + fpath:hsym $[-11h=ty:type fpath;;10h=ty;`$;'"unsupported fpath"]fpath; + config:@[.j.k raze read0::; + fpath; + {'"Could not load configuration file at ",x," with error: ",y}[1_string fpath] + ]; + func:{{system"sleep 0.0005";t:.z.p;x y;(1e-9)*.z.p-t}[x]each 30#y}model; + updateData:@[`avg`std!(avg;dev)@\:func::; + data; + {'"Unable to generate appropriate configuration for latency with error: ",x} + ]; + config[`monitoring;`latency;`values]:updateData; + config[`monitoring;`latency;`monitor]:1b; + // need to add deps for .com_kx_json + .[{[fpath;config] fpath 0: enlist .j.j config}; + (fpath;config); + {} + ]; + } + +// Update configuration information related to null value replacement from +// within a q process +// @param fpath {string|symbol|hsym} Path to a JSON file to be used to +// overwrite initially defined configuration +// @param data {table} 
Representative/training data suitable for providing +// statistics about expected system behaviour +// @return {::} +mlops.update.nulls:{[fpath;data] + fpath:hsym $[-11h=ty:type fpath;;10h=ty;`$;'"unsupported fpath"]fpath; + config:@[.j.k raze read0::; + fpath; + {'"Could not load configuration file at ",x," with error: ",y}[1_string fpath] + ]; + if[98h<>type data; + -1"Updating schema information only supported for tabular data"; + :(::) + ]; + func:{med each flip mlops.infReplace x}; + updateData:@[func; + data; + {'"Unable to generate appropriate configuration for nulls with error: ",x} + ]; + config[`monitoring;`nulls;`values]:updateData; + config[`monitoring;`nulls;`monitor]:1b; + // need to add deps for .com_kx_json + .[{[fpath;config] fpath 0: enlist .j.j config}; + (fpath;config); + {'"Could not persist configuration to JSON file with error: ",x} + ]; + + } + +// Update configuration information related to infinity replacement from +// within a q process +// @param fpath {string|symbol|hsym} Path to a JSON file to be used to +// overwrite initially defined configuration +// @param data {table} Representative/training data suitable for providing +// statistics about expected system behaviour +// @return {::} +mlops.update.infinity:{[fpath;data] + fpath:hsym $[-11h=ty:type fpath;;10h=ty;`$;'"unsupported fpath"]fpath; + config:@[.j.k raze read0::; + fpath; + {'"Could not load configuration file at ",x," with error: ",y}[1_string fpath] + ]; + if[98h<>type data; + -1"Updating schema information only supported for tabular data"; + :(::) + ]; + func:{(`negInfReplace`posInfReplace)!(min;max)@\:mlops.infReplace x}; + updateData:@[func; + data; + {'"Unable to generate appropriate configuration for infinities with error: ",x} + ]; + config[`monitoring;`infinity;`values]:updateData; + config[`monitoring;`infinity;`monitor]:1b; + // need to add deps for .com_kx_json + .[{[fpath;config] fpath 0: enlist .j.j config}; + (fpath;config); + {'"Could not persist 
configuration to JSON file with error: ",x} + ]; + } + +// Update configuration information related to CSI from within a q process +// @param fpath {string|symbol|hsym} Path to a JSON file to be used to +// overwrite initially defined configuration +// @param data {table} Representative/training data suitable for providing +// statistics about expected system behaviour +// @return {::} +mlops.update.csi:{[fpath;data] + fpath:hsym $[-11h=ty:type fpath;;10h=ty;`$;'"unsupported fpath"]fpath; + config:@[.j.k raze read0::; + fpath; + {'"Could not load configuration file at ",x," with error: ",y}[1_string fpath] + ]; + if[98h<>type data; + -1"Updating CSI information only supported for tabular data"; + :(::) + ]; + bins:first 10^@["j"$;(count data)&@[{"J"$.ml.monitor.config.args x};`bins;{0N}];{0N}]; + updateData:@[{mlops.create.binExpected[;y]each flip x}[;bins]; + data; + {'"Unable to generate appropriate configuration for CSI with error: ",x} + ]; + config[`monitoring;`csi;`values]:updateData; + config[`monitoring;`csi;`monitor]:1b; + .[{[fpath;config] fpath 0: enlist .j.j config}; + (fpath;config); + {} + ]; + } + +// Update configuration information related to PSI from within a q process +// +// @param fpath {string|symbol|hsym} Path to a JSON file to be used to +// overwrite initially defined configuration +// @param model {function} Prediction function to be used to generate +// representative predictions for population stability calculation +// @param data {table} Representative/training data suitable for providing +// statistics about expected system behaviour +// @return {::} +mlops.update.psi:{[fpath;model;data] + fpath:hsym $[-11h=ty:type fpath;;10h=ty;`$;'"unsupported fpath"]fpath; + config:@[.j.k raze read0::; + fpath; + {'"Could not load configuration file at ",x," with error: ",y}[1_string fpath] + ]; + if[98h<>type data; + -1"Updating PSI information only supported for tabular data"; + :(::) + ]; + bins:first 10^@["j"$;(count 
data)&@[{"J"$.ml.monitor.config.args x};`bins;{0N}];{0N}]; + func:{mlops.create.binExpected[raze x y;z]}[model;;bins]; + updateData:@[func; + data; + {'"Unable to generate appropriate configuration for PSI with error: ",x} + ]; + config[`monitoring;`psi;`values]:updateData; + config[`monitoring;`psi;`monitor]:1b; + .[{[fpath;config] fpath 0: enlist .j.j config}; + (fpath;config); + {} + ]; + .ml.monitor.config.model::config; + } + +// Update configuration information related to type information for models +// retrieved from disk +// +// @param fpath {string|symbol|hsym} Path to a JSON file to be used to +// overwrite initially defined configuration +// @param format {string} Type/format of the model that is being retrieved from disk +// @return {::} +mlops.update.type:{[fpath;format] + fpath:hsym $[-11h=ty:type fpath;;10h=ty;`$;'"unsupported fpath"]fpath; + config:@[.j.k raze read0::; + fpath; + {'"Could not load configuration file at ",x," with error: ",y}[1_string fpath] + ]; + config[`model;`type]:format; + .[{[fpath;config] fpath 0: enlist .j.j config}; + (fpath;config); + {} + ]; + .ml.monitor.config.model::config; + } + +// Update supervised monitoring information +// +// @param fpath {string|symbol|hsym} Path to a JSON file to be used to +// overwrite initially defined configuration +// @param metrics {string[]} Type/format of the model that is being retrieved from disk +// @return {::} +mlops.update.supervise:{[fpath;metrics] + fpath:hsym $[-11h=ty:type fpath;;10h=ty;`$;'"unsupported fpath"]fpath; + config:@[.j.k raze read0::; + fpath; + {'"Could not load configuration file at ",x," with error: ",y}[1_string fpath] + ]; + config[`monitoring;`supervised;`values]:metrics; + config[`monitoring;`supervised;`monitor]:1b; + .[{[fpath;config] fpath 0: enlist .j.j config}; + (fpath;config); + {} + ]; + .ml.monitor.config.model::config; + } + + +// Update configuration information related to expected from within a q process +// +// @param fpath {string|symbol|hsym} 
Path to a JSON file to be used to +// overwrite initially defined configuration +// @param data {table} Representative/training data suitable for providing +// statistics about expected system behaviour +// @return {::} +mlops.update.schema:{[fpath;data] + fpath:hsym $[-11h=ty:type fpath;;10h=ty;`$;'"unsupported fpath"]fpath; + config:@[.j.k raze read0::; + fpath; + {'"Could not load configuration file at ",x," with error: ",y}[1_string fpath] + ]; + if[98h<>type data; + -1"Updating schema information only supported for tabular data"; + :(::) + ]; + config[`monitoring;`schema;`values]:(!) . (select c,t from meta data)`c`t; + config[`monitoring;`schema;`monitor]:1b; + .[{[fpath;config] fpath 0: enlist .j.j config}; + (fpath;config); + {'"Could not persist configuration to JSON file with error: ",x} + ]; + .ml.monitor.config.model::config; + } diff --git a/ml/mlops/tests/main.q b/ml/mlops/tests/main.q new file mode 100644 index 0000000..732e0a4 --- /dev/null +++ b/ml/mlops/tests/main.q @@ -0,0 +1 @@ +\l init.q diff --git a/ml/mlops/tests/performance/benchmark1/performance.q b/ml/mlops/tests/performance/benchmark1/performance.q new file mode 100644 index 0000000..fe4640c --- /dev/null +++ b/ml/mlops/tests/performance/benchmark1/performance.q @@ -0,0 +1,30 @@ +\l tests/performance/load.q + +// @desc All data required for running the expected analytic, this should +// be structured such that each dictionary key has the same count +// and each 'vertical slice' describes an individual set of parameters to +// be passed to the function +data:`X`y!(til 5;2+til 5) + +// @desc Number of unique evaluations being completed +idxs:count first data + +// @desc Function to be invoked when generating metrics +// @param iter {long} Number of iterations of the function to be completed +// @param idx {long} Index of the dictionary 'data' to use +.performance.func:{[iter;idx] + n:string count data[;idx]`X; + timing:first system "ts:",string[iter]," .template.example . 
value data[;",string[idx],"]"; + .performance.metricStore,:.performance.metric[".template.example";n;timing;iter] + } + +// Execute the performance function +.performance.func[100] each til idxs + +// Publish the metrics in accordance with requirements for prometheus +-1 "===METRICS==="; +-1 @'.performance.metricStore; +-1 "===END METRICS==="; + +// Exit execution +exit 0; diff --git a/ml/mlops/tests/performance/load.q b/ml/mlops/tests/performance/load.q new file mode 100644 index 0000000..4bf193c --- /dev/null +++ b/ml/mlops/tests/performance/load.q @@ -0,0 +1,18 @@ +\l init.q + +// Load the functionality and variables required when running individual performance tests +// NOTE: This is configurable and developers should modify/add functions that are needed for +// specific benchmarks as required + +// @desc Global for metric storage +.performance.metricStore:() + +// @desc Function to be invoked for the generation of metrics to be published to gitlab +// @param name {string} The name of the metric which is to be published +// @param n {string} The number of datapoints being used in the evaluation +// @param duration {long} The total length of time taken to run the metric evaluation +// @param times {long} The total number of repetitions of the function being evaluated +// @return {string} The metric information that is to be published to gitlab +.performance.metric:{[name;n;duration;times] + enlist name,"_performance{count=",n,",times=",string[times],"} ",string duration % times + } diff --git a/ml/mlops/tests/template.quke b/ml/mlops/tests/template.quke new file mode 100644 index 0000000..250a25d --- /dev/null +++ b/ml/mlops/tests/template.quke @@ -0,0 +1,5 @@ +feature Hello + should Say hello + expect Correct message + "Hello world" ~ "Hello world" + diff --git a/ml/registry/config/README.md b/ml/registry/config/README.md new file mode 100644 index 0000000..375d712 --- /dev/null +++ b/ml/registry/config/README.md @@ -0,0 +1,7 @@ +# Config + +This section of the
repository contains all information relating to predefined configuration needed for setting models/experiments to the registry and deploying models. All configuration should be written as JSON dictionaries which can be parsed by `.j.k`. At present there are three distinct configurations being defined within this section of the repository. + +1. `command-line.json` - Default information relating to command line retrieval of the model and how data being passed to these models should be managed. +2. `default.json` - Default model information and definition of how by default model versions are to be incremented. +3. `model.json` - Basic model information used to define model behaviour and monitoring configuration. diff --git a/ml/registry/config/command-line.json b/ml/registry/config/command-line.json new file mode 100644 index 0000000..1574703 --- /dev/null +++ b/ml/registry/config/command-line.json @@ -0,0 +1,8 @@ +{ + "modelName":"", + "version":"inf", + "vendor":"", + "bins":10, + "deployType":false, + "code":"" +} diff --git a/ml/registry/config/config.q b/ml/registry/config/config.q new file mode 100644 index 0000000..35c176a --- /dev/null +++ b/ml/registry/config/config.q @@ -0,0 +1,101 @@ +// config.q - Configuration used by the default usage of the registry functions +// Copyright (c) 2021 Kx Systems Inc +// +// @category Model-Registry +// @subcategory Configuration + +\d .ml + +// @kind function +// @category config +// +// @overview +// Retrieve default dictionary values from JSON file +// +// @param file {string} File to retrieve +// +// @return {dict} Default JSON values +getJSON:{[file] + registry.config.util.getJSON"registry/config/",file,".json" + } + +// @private +registry.config.default:getJSON"model" + +// @private +registry.config.model:getJSON"default" + +// @private +/registry.config.cloudDefault:getJSON"cloud" + +// @private +registry.config.cliDefault:getJSON"command-line" + +// @private +symConvert:`modelName`version`vendor`code + +// @private
+registry.config.cliDefault[symConvert]:`$registry.config.cliDefault symConvert + + +// @kind function +// @category config +// +// @overview +// Convert CLI version to correct format +// +// @param cfg {dict} CLI config dictionary +// +// @return {string|null} Updated version +convertVersion:{[cfg] + $[`inf~cfg`version;(::);raze"J"$"."vs string cfg`version] + } + +// @private +registry.config.commandLine:.Q.def[registry.config.cliDefault].Q.opt .z.x + +// @private +registry.config.commandLine[`version]:convertVersion registry.config.commandLine + +// Ensure only one cloud vendor is to be used +// @private +cloudVendors:`aws`azure`gcp +if[1type aws;`$;]aws; + } + +// @private +// +// @overview +// Update the GCP default configuration if required and validate +// configuration is suitable, error if configuration is not appropriate +// as command line input or within the default configuration. +// +// @return {null} +registry.config.util.updateGCP:{[] + cli :registry.config.commandLine`gcp; + json:registry.config.cloudDefault[`gcp;`bucket]; + bool:`~.ml.registry.config.commandLine`gcp; + gcp :$[bool;json;cli]; + if[not gcp like "gs://*"; + .ml.log.fatal "GCP bucket must be defined via command line or in JSON config in the form gs://*"; + ]; + .ml.registry.config.commandLine[`gcp]:$[-11h<>type gcp;`$;]gcp; + } + +// @private +// +// @overview +// Update the Azure default configuration if required and validate +// configuration is suitable, error if configuration is not appropriate +// as command line input or within the default configuration. 
+// +// @return {null} +registry.config.util.updateAzure:{[] + cli :registry.config.commandLine`azure; + json :`${x[0],"?",x 1}registry.config.cloudDefault[`azure;`blob`token]; + bool :`~.ml.registry.config.commandLine`azure; + azure:$[bool;json;cli]; + if[not like[azure;"ms://*"]|all like[azure]each("*?*";"http*"); + .ml.log.fatal "Azure blob definition via command line or in JSON config in the form http*?* | ms://*"; + ]; + .ml.registry.config.commandLine[`azure]:$[-11h<>type azure;`$;]azure; + } diff --git a/ml/registry/init.q b/ml/registry/init.q new file mode 100644 index 0000000..09293d0 --- /dev/null +++ b/ml/registry/init.q @@ -0,0 +1,16 @@ +\d .ml + +restinit:0b; //Not applicable functionality + +if[not @[get;".ml.registry.init";0b]; + /loadfile`:src/analytics/util/init.q; + registry.config.init:.Q.opt .z.x; + loadfile`:registry/config/utils.q; + loadfile`:registry/config/config.q; + loadfile`:registry/q/init.q; + if[restinit; + loadfile`:registry/q/rest/init.q + ] + ] + +.ml.registry.init:1b diff --git a/ml/registry/q/README.md b/ml/registry/q/README.md new file mode 100644 index 0000000..261ff30 --- /dev/null +++ b/ml/registry/q/README.md @@ -0,0 +1,10 @@ +# q functionality + +This section of the repository contains all code related to the most fundamental usage of this repository. It does not contain anything related to use case/optional functionality specific code. As such the code here relates to the following: + +1. Users can generate `new` registries and experiments +2. Users can `set` q/Python models and config within the registry +3. Users can `get` q/Python models and config from the registry +4. Users can `delete` models, experiments and the registry from specified locations +5. Users can `log` metrics relating to specific models within the registry +6. 
Users can `update` config within the registry diff --git a/ml/registry/q/init.q b/ml/registry/q/init.q new file mode 100644 index 0000000..ec11b86 --- /dev/null +++ b/ml/registry/q/init.q @@ -0,0 +1,17 @@ +// init.q - Initialise q functionality related to the model registry +// Copyright (c) 2021 Kx Systems Inc +// +// Functionality relating to all basic interactions with the registry + +\d .ml + +if[not @[get;"registry.q.init";0b]; + // Load all utilities + loadfile`:registry/q/main/utils/init.q; + // Load all functionality + loadfile`:registry/q/main/init.q; + loadfile`:registry/q/local/init.q; + /loadfile`:registry/q/cloud/init.q; + ] + +registry.q.init:1b diff --git a/ml/registry/q/local/delete.q b/ml/registry/q/local/delete.q new file mode 100644 index 0000000..98f6608 --- /dev/null +++ b/ml/registry/q/local/delete.q @@ -0,0 +1,32 @@ +// delete.q - Functionality for the deletion of items locally +// Copyright (c) 2021 Kx Systems Inc +// +// @overview +// Delete local items +// +// @category Model-Registry +// @subcategory Functionality +// +// @end + +\d .ml + +// @kind function +// @category local +// @subcategory delete +// +// @overview +// Delete a registry and the entirety of its contents locally +// +// @param cli {dict} UNUSED +// @param folderPath {string|null} A folder path indicating the location +// of the registry to be deleted, or generic null to remove the registry in the current +// directory +// @param config {dict} Information relating to registry being deleted +// +// @return {null} +registry.local.delete.registry:{[folderPath;config] + config:registry.util.getRegistryPath[folderPath;config]; + registry.util.delete.folder config`registryPath; + -1 config[`registryPath]," deleted."; + } diff --git a/ml/registry/q/local/init.q b/ml/registry/q/local/init.q new file mode 100644 index 0000000..91c00dd --- /dev/null +++ b/ml/registry/q/local/init.q @@ -0,0 +1,19 @@ +// init.q - Initialise functionality for local FS interactions +// Copyright (c) 2021 Kx 
Systems Inc +// +// Functionality relating to all interactions with local +// file system storage + +\d .ml + +if[not @[get;".ml.registry.q.local.init";0b]; + // Load all utilities + loadfile`:registry/q/local/utils/init.q; + // Load all functionality + loadfile`:registry/q/local/new.q; + loadfile`:registry/q/local/set.q; + loadfile`:registry/q/local/update.q; + loadfile`:registry/q/local/delete.q + ] + +registry.q.local.init:1b diff --git a/ml/registry/q/local/new.q b/ml/registry/q/local/new.q new file mode 100644 index 0000000..5464a02 --- /dev/null +++ b/ml/registry/q/local/new.q @@ -0,0 +1,57 @@ +// new.q - Generation of new elements of the ML registry locally +// Copyright (c) 2021 Kx Systems Inc +// +// @overview +// This functionality is intended to provide the ability to generate new +// registries and experiments within these registries. +// +// @category Model-Registry +// @subcategory Functionality +// +// @end + +\d .ml + +// @kind function +// @category local +// @subcategory new +// +// @overview +// Generates a new model registry at a user specified location on-prem. +// +// @param config {dict|null} Any additional configuration needed for +// initialising the registry +// +// @return {dict} Updated config dictionary containing relevant +// registry paths +registry.local.new.registry:{[config] + config:registry.util.create.registry config; + config:registry.util.create.modelStore config; + registry.util.create.experimentFolders config; + config + } + +// @kind function +// @category local +// @subcategory new +// +// @overview +// Generates a new named experiment within the specified registry +// locally without adding a model +// +// @todo +// It should be possible via configuration to add descriptive information +// about an experiment. 
+// +// @param experimentName {string} The name of the experiment to be located +// under the namedExperiments folder which can be populated by new models +// associated with the experiment +// @param config {dict|null} Any additional configuration needed for +// initialising the experiment +// +// @return {dict} Updated config dictionary containing relevant +// registry paths +registry.local.new.experiment:{[experimentName;config] + config:registry.local.util.check.registry config; + registry.util.create.experiment[experimentName;config] + } diff --git a/ml/registry/q/local/set.q b/ml/registry/q/local/set.q new file mode 100644 index 0000000..5b662de --- /dev/null +++ b/ml/registry/q/local/set.q @@ -0,0 +1,77 @@ +// set.q - Callable functions for the publishing of items to local file system +// Copyright (c) 2021 Kx Systems Inc +// +// @overview +// Publish items to local file system +// +// @category Model-Registry +// @subcategory Functionality +// +// @end + +\d .ml + +// @kind function +// @category local +// @subcategory set +// +// @overview +// Set a model within local file-system storage +// +// @param experimentName {string} The name of the experiment to which a model +// being added to the registry is associated +// @param model {any} `(<|dict|fn|proj)` The model to be saved to the registry. 
+// @param modelName {string} The name to be associated with the model +// @param modelType {string} The type of model that is being saved, namely +// "q"|"sklearn"|"keras"|"python" +// @param config {dict} Any additional configuration needed for +// setting the model +// +// @return {null} +registry.local.set.model:{[experimentName;model;modelName;modelType;config] + config:registry.util.check.registry config; + $[experimentName in ("undefined";""); + config[`experimentPath]:config[`registryPath],"/unnamedExperiments"; + config:registry.new.experiment[config`folderPath;experimentName;config] + ]; + config:(enlist[`major]!enlist 0b),config; + config:registry.util.update.config[modelName;modelType;config]; + function:registry.util.set.model; + arguments:(model;modelType;config); + registry.util.protect[function;arguments;config] + } + +// @kind function +// @category local +// @subcategory set +// +// @overview +// Set parameter information associated with a model locally +// +// @param experimentName {string|null} The name of an experiment from which +// to retrieve a model, if no modelName is provided the newest model +// within this experiment will be used. 
If neither modelName nor +// experimentName is defined, the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// @param paramName {string} The name of the parameter to be saved +// @param params {dict|table|string} The parameters to save to file +// +// @return {null} +registry.local.set.parameters:{[experimentName;modelName;version;paramName;params;config] + config:registry.util.check.registry config; + // Retrieve the model from the store meeting the user specified conditions + modelDetails:registry.util.search.model[experimentName;modelName;version;config]; + if[not count modelDetails; + logging.error"No model meeting your provided conditions was available" + ]; + // Construct the path to model folder containing the model to be retrieved + config,:flip modelDetails; + paramPath:registry.util.path.modelFolder[config`registryPath;config;`params]; + paramPath:paramPath,paramName,".json"; + registry.util.set.params[paramPath;params] + } diff --git a/ml/registry/q/local/update.q b/ml/registry/q/local/update.q new file mode 100644 index 0000000..6db4df7 --- /dev/null +++ b/ml/registry/q/local/update.q @@ -0,0 +1,48 @@ +// update.q - Callable functions for updating information related to a model +// on local file-system +// Copyright (c) 2021 Kx Systems Inc +// +// @overview +// Update local model information +// +// @category Model-Registry +// @subcategory Functionality +// +// @end + +\d .ml + +// @kind function +// @category local +// @subcategory update +// +// @overview +// Prepare information for local updates +// +// @param folderPath {string|null} A folder path indicating the location +// of the registry or generic null 
if in the current directory +// @param experimentName {string|null} The name of an experiment within which +// the model having additional information added is located. +// @param modelName {string|null} The name of the model to which additional +// information is being added. In the case this is null, the newest model +// associated with the experiment is retrieved +// @param version {long[]|null} The specific version of a named model to add the +// new parameters to. In the case that this is null the newest model is retrieved. +// Generally expressed as a duple (major;minor) +// @param config {dict} Any additional configuration needed for updating +// the parameter information associated with a model +// +// @return {dict} All information required for setting new configuration/ +// requirements information associated with a model +registry.local.update.prep:{[folderPath;experimentName;modelName;version;config] + config:registry.util.check.registry config; + modelDetails:registry.util.search.model[experimentName;modelName;version;config]; + if[not count modelDetails; + logging.error"No model meeting your provided conditions was available" + ]; + // Construct the path to model folder containing the model to be retrieved + config,:flip modelDetails; + config[`versionPath]:registry.util.path.modelFolder[config`registryPath;config;::]; + config:registry.config.model,config; + config + } diff --git a/ml/registry/q/local/utils/check.q b/ml/registry/q/local/utils/check.q new file mode 100644 index 0000000..c061645 --- /dev/null +++ b/ml/registry/q/local/utils/check.q @@ -0,0 +1,37 @@ +// check.q - Utilities relating to checking of suitability of registry items +// Copyright (c) 2021 Kx Systems Inc +// +// @overview +// Utilities for checking items locally +// +// @category Model-Registry +// @subcategory Utilities +// +// @end + +\d .ml + +// @private +// +// @overview +// Check if the registry which is being manipulated exists, if it does not +// generate the registry at 
the specified location +// +// @param config {dict|null} Any additional configuration needed for +// initialising the registry +// +// @return {dict} Updated config dictionary containing registry path +registry.local.util.check.registry:{[config] + registryPath:config[`folderPath],"/KX_ML_REGISTRY"; + config:$[()~key hsym`$registryPath; + [logging.info"Registry does not exist at: '",registryPath, + "'. Creating registry in that location."; + registry.new.registry[config`folderPath;config] + ]; + [modelStorePath:hsym`$registryPath,"/modelStore"; + paths:`registryPath`modelStorePath!(registryPath;modelStorePath); + config,paths + ] + ]; + config + } diff --git a/ml/registry/q/local/utils/init.q b/ml/registry/q/local/utils/init.q new file mode 100644 index 0000000..df280d3 --- /dev/null +++ b/ml/registry/q/local/utils/init.q @@ -0,0 +1,13 @@ +// init.q - Initialise Utilities for local FS interactions +// Copyright (c) 2021 Kx Systems Inc +// +// Utilities relating to all interactions with local file +// system storage + +\d .ml + +if[not @[get;".ml.registry.q.local.util.init";0b]; + loadfile`:registry/q/local/utils/check.q + ] + +registry.q.local.util.init:1b diff --git a/ml/registry/q/main/delete.q b/ml/registry/q/main/delete.q new file mode 100644 index 0000000..e66a2ac --- /dev/null +++ b/ml/registry/q/main/delete.q @@ -0,0 +1,298 @@ +// delete.q - Main callable functions for deleting items from the model registry +// Copyright (c) 2021 Kx Systems Inc +// +// @overview +// Delete items from the registry +// +// @category Model-Registry +// @subcategory Functionality +// +// @end + +\d .ml + +// @kind function +// @category main +// @subcategory delete +// +// @overview +// Delete a registry and the entirety of its contents +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. 
A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param config {dict} Information relating to registry being deleted +// +// @return {null} +registry.delete.registry:{[folderPath;config] + config:registry.util.check.config[folderPath;config]; + if[`local<>storage:config`storage;storage:`cloud]; + registry[storage;`delete;`registry][folderPath;config] + } + +// @kind function +// @category main +// @subcategory delete +// +// @overview +// Delete an experiment and its associated models from the registry +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param experimentName {string} Name of the experiment to be deleted +// +// @return {null} +registry.delete.experiment:{[folderPath;experimentName] + config:registry.util.check.config[folderPath;()!()]; + $[`local<>config`storage; + registry.cloud.delete.experiment[config`folderPath;experimentName;config]; + [config:`folderPath`experimentName!(config`folderPath;experimentName); + registry.util.delete.object[config;`experiment]; + ] + ]; + } + +// @kind function +// @category main +// @subcategory delete +// +// @overview +// Delete a version of a model/all models associated with a name +// from the registry and modelStore table +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. 
+// @param experimentName {string} Name of the experiment to be deleted +// @param modelName {string|null} The name of the model to retrieve +// @param version {long[]|null} The version of the model to retrieve (major;minor) +// +// @return {null} +registry.delete.model:{[folderPath;experimentName;modelName;version] + config:registry.util.check.config[folderPath;()!()]; + if[not`local~storage:config`storage;storage:`cloud]; + // Locate/retrieve the registry locally or from the cloud + config:$[storage~`local; + registry.local.util.check.registry config; + [checkFunction:registry.cloud.util.check.model; + checkFunction[experimentName;modelName;version;config`folderPath;config] + ] + ]; + modelDetails:registry.util.search.model[experimentName;modelName;version;config]; + modelName:first modelDetails `modelName; + config:registry.util.check.config[folderPath;()!()]; + if[not count modelDetails; + logging.error"No model meeting your provided conditions was available" + ]; + $[`local<>config`storage; + registry.cloud.delete.model[config;experimentName;modelName;version]; + [configKeys:`folderPath`experimentName`modelName`version; + configVals:(config`folderPath;experimentName;modelName;version); + config:configKeys!configVals; + objectType:$[(::)~version;`allModels;`modelVersion]; + registry.util.delete.object[config;objectType] + ] + ]; + } + +// @kind function +// @category main +// @subcategory delete +// +// @overview +// Delete a parameter file associated with a name +// from the registry +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. 
+// @param experimentName {string} Name of the experiment to be deleted +// @param modelName {string|null} The name of the model to retrieve +// @param version {long[]} The version of the model to retrieve (major;minor) +// @param paramFile {string} Name of the parameter file to delete +// +// @return {null} +registry.delete.parameters:{[folderPath;experimentName;modelName;version;paramFile] + config:registry.util.check.config[folderPath;()!()]; + if[not`local~storage:config`storage;storage:`cloud]; + // Locate/retrieve the registry locally or from the cloud + config:$[storage~`local; + registry.local.util.check.registry config; + [checkFunction:registry.cloud.util.check.model; + checkFunction[experimentName;modelName;version;config`folderPath;config] + ] + ]; + modelDetails:registry.util.search.model[experimentName;modelName;version;config]; + modelName:first modelDetails `modelName; + version:first modelDetails `version; + config:registry.util.check.config[folderPath;()!()]; + $[`local<>config`storage; + [function:registry.cloud.delete.parameters; + params:(config;experimentName;modelName;version;paramFile); + function . params; + ]; + [function:registry.util.getFilePath; + params:(config`folderPath;experimentName;modelName;version;`params;enlist[`paramFile]!enlist paramFile); + location:function . params; + if[()~key location;logging.error"No parameter files exists with the given name, unable to delete."]; + hdel location; + ] + ]; + } + +// @kind function +// @category main +// @subcategory delete +// +// @overview +// Delete the metric table associated with a name +// from the registry +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. 
+// @param experimentName {string} Name of the experiment to be deleted +// @param modelName {string|null} The name of the model to retrieve +// @param version {long[]} The version of the model to retrieve (major;minor) +// +// @return {null} +registry.delete.metrics:{[folderPath;experimentName;modelName;version] + config:registry.util.check.config[folderPath;()!()]; + if[not`local~storage:config`storage;storage:`cloud]; + // Locate/retrieve the registry locally or from the cloud + config:$[storage~`local; + registry.local.util.check.registry config; + [checkFunction:registry.cloud.util.check.model; + checkFunction[experimentName;modelName;version;config`folderPath;config]] + ]; + modelDetails:registry.util.search.model[experimentName;modelName;version;config]; + modelName:first modelDetails `modelName; + version:first modelDetails `version; + config:registry.util.check.config[folderPath;()!()]; + if[not`local~storage:config`storage;storage:`cloud]; + folderPath:config`folderPath; + $[`local<>storage; + registry.cloud.delete.metrics[config;experimentName;modelName;version]; + [function:registry.util.getFilePath; + params:(folderPath;experimentName;modelName;version;`metrics;()!()); + location:function . params; + if[()~key location;logging.error"No metric table exists at this location, unable to delete."]; + hdel location; + ] + ]; + } + +// @kind function +// @category main +// @subcategory delete +// +// @overview +// Delete the code associated with a name +// from the registry +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. 
+// @param experimentName {string} Name of the experiment to be deleted +// @param modelName {string|null} The name of the model to retrieve +// @param version {long[]} The version of the model to retrieve (major;minor) +// @param codeFile {string} Name of the code file to delete +// +// @return {null} +registry.delete.code:{[folderPath;experimentName;modelName;version;codeFile] + config:registry.util.check.config[folderPath;()!()]; + if[not`local~storage:config`storage;storage:`cloud]; + // Locate/retrieve the registry locally or from the cloud + config:$[storage~`local; + registry.local.util.check.registry config; + [checkFunction:registry.cloud.util.check.model; + checkFunction[experimentName;modelName;version;config`folderPath;config]] + ]; + modelDetails:registry.util.search.model[experimentName;modelName;version;config]; + modelName:first modelDetails `modelName; + version:first modelDetails `version; + config:registry.util.check.config[folderPath;()!()]; + if[not`local~storage:config`storage;storage:`cloud]; + folderPath:config`folderPath; + $[`local<>storage; + [function:registry.cloud.delete.code; + params:(config;experimentName;modelName;version;codeFile); + function . params + ]; + [function:registry.util.getFilePath; + params:(folderPath;experimentName;modelName;version;`code;enlist[`codeFile]!enlist codeFile); + location:function . params; + if[()~key location;logging.error"No such code exists at this location, unable to delete."]; + hdel location + ] + ]; + } + +// @kind function +// @category main +// @subcategory delete +// +// @overview +// Delete a metric from the metric table associated with a name +// from the registry +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. 
A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param experimentName {string} Name of the experiment to be deleted +// @param modelName {string|null} The name of the model to retrieve +// @param version {long[]} The version of the model to retrieve (major;minor) +// @param metricName {string} The name of the metric +// +// @return {null} +registry.delete.metric:{[folderPath;experimentName;modelName;version;metricName] + if[-11h=type metricName;metricName:string metricName]; + config:registry.util.check.config[folderPath;()!()]; + if[not`local~storage:config`storage;storage:`cloud]; + // Locate/retrieve the registry locally or from the cloud + config:$[storage~`local; + registry.local.util.check.registry config; + [checkFunction:registry.cloud.util.check.model; + checkFunction[experimentName;modelName;version;config`folderPath;config]] + ]; + modelDetails:registry.util.search.model[experimentName;modelName;version;config]; + modelName:first modelDetails `modelName; + version:first modelDetails `version; + config:registry.util.check.config[folderPath;()!()]; + if[not`local~storage:config`storage;storage:`cloud]; + folderPath:config`folderPath; + $[`local<>storage; + [function:registry.cloud.delete.metric; + params:(config;experimentName;modelName;version;metricName); + function . params + ]; + [function:registry.util.getFilePath; + params:(folderPath;experimentName;modelName;version;`metrics;()!()); + location:function . 
params; + if[()~key location;logging.error"No metric table exists at this location, unable to delete."]; + location set ?[location;enlist (not;(like;`metricName;metricName));0b;`symbol$()]; + ] + ]; + } diff --git a/ml/registry/q/main/get.q b/ml/registry/q/main/get.q new file mode 100644 index 0000000..1175f10 --- /dev/null +++ b/ml/registry/q/main/get.q @@ -0,0 +1,329 @@ +// get.q - Main callable functions for retrieving information from the +// model registry +// Copyright (c) 2021 Kx Systems Inc +// +// @overview +// Retrieve items from the registry including +// 1. Models: +// - q (functions/projections/appropriate dictionaries) +// - Python (python functions + sklearn/keras specific functionality) +// 2. Configuration +// 3. Model registry +// +// @category Model-Registry +// @subcategory Functionality +// +// @end + +\d .ml + +// @kind function +// @category main +// @subcategory get +// +// @overview +// Retrieve a q/python/sklearn/keras model from the registry +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param experimentName {string|null} The name of an experiment from which +// to retrieve a model, if no modelName is provided the newest model +// within this experiment will be used. 
If neither modelName or +// experimentName are defined the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// +// @return {dict} The model and information related to the +// generation of the model +registry.get.model:registry.util.get.object[`model;;;;;::] + +// @kind function +// @category main +// @subcategory get +// +// @overview +// Retrieve a keyed q/python/sklearn/keras model from the registry +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param experimentName {string|null} The name of an experiment from which +// to retrieve a model, if no modelName is provided the newest model +// within this experiment will be used. 
If neither modelName or +// experimentName are defined the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// @param key {symbol} key from the model to retrieve +// +// @return {dict} The model and information related to the +// generation of the model +registry.get.keyedmodel:registry.util.get.object[`model] + +// @kind function +// @category main +// @subcategory get +// +// @overview +// Retrieve language/library version information associated with a model stored in the registry +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param experimentName {string|null} The name of an experiment from which +// to retrieve model information, if no modelName is provided the newest model +// within this experiment will be used. 
If neither modelName or +// experimentName are defined the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model from which to retrieve +// version information in the case this is null, the newest model associated +// with the experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// +// @return {dict} Information about the model stored in the registry including +// q version/date and if applicable Python version and Python library versions +registry.get.version:registry.util.get.object[`version;;;;;::] + +// @kind function +// @category main +// @subcategory get +// +// @overview +// Load the metric table for a specific model +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param experimentName {string|null} The name of an experiment from which +// to retrieve a model, if no modelName is provided the newest model +// within this experiment will be used. 
If neither modelName or +// experimentName are defined the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// @param param {null|dict|symbol|string} Search parameters for the retrieval +// of metrics +// in the case when this is a string, it is converted to a symbol +// +// @return {table} The metric table for a specific model, which may +// potentially be filtered +registry.get.metric:registry.util.get.object[`metric] + +// @kind function +// @category main +// @subcategory get +// +// @overview +// Load the parameter information for a specific model +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param experimentName {string|null} The name of an experiment from which +// to retrieve a model, if no modelName is provided the newest model +// within this experiment will be used. 
If neither modelName or +// experimentName are defined the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// @param param {symbol|string} The name of the parameter to retrieve +// +// @return {string|dict|table|float} The value of the parameter associated +// with a named parameter saved for the model. +registry.get.parameters:registry.util.get.object[`params] + +// @kind function +// @category main +// @subcategory get +// +// @overview +// Retrieve a q/python/sklearn/keras model from the registry for prediction +// +// @todo +// Add type checking for modelName/experimentName/version +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param experimentName {string|null} The name of an experiment from which +// to retrieve a model, if no modelName is provided the newest model +// within this experiment will be used. 
If neither modelName or +// experimentName are defined the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// +// @return {any} `(<|dict|fn|proj)` Model retrieved from the registry. +registry.get.predict:{[folderPath;experimentName;modelName;version] + getModel:registry.get.model[folderPath;experimentName;modelName;version]; + if[registry.config.commandLine[`deployType];:getModel`model]; + modelType:`$getModel[`modelInfo;`model;`type]; + if[`graph~modelType; + logging.error"Retrieval of prediction function not supported for 'graph'" + ]; + axis:getModel[`modelInfo;`model;`axis]; + if[""~axis;axis:0b]; + model:getModel`model; + mlops.wrap[modelType;model;axis] + } + +// @kind function +// @category main +// @subcategory get +// +// @overview +// Retrieve a q/python/sklearn/keras model from the registry for update +// +// @todo +// Add type checking for modelName/experimentName/version +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param experimentName {string|null} The name of an experiment from which +// to retrieve a model, if no modelName is provided the newest model +// within this experiment will be used. 
If neither modelName nor +// experimentName are defined the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// @param supervised {boolean} Whether the model update is supervised +// +// @return {any} `(<|dict|fn|proj)` Model retrieved from the registry. +registry.get.update:{[folderPath;experimentName;modelName;version;supervised] + getModel:registry.get.model[folderPath;experimentName;modelName;version]; + if[registry.config.commandLine[`deployType];:getModel`model]; + modelType:`$getModel[`modelInfo;`model;`type]; + if[`graph~modelType; + logging.error"Retrieval of update function not supported for 'graph'" + ]; + axis:getModel[`modelInfo;`model;`axis]; + model:getModel`model; + mlops.wrapUpdate[modelType;model;axis;supervised] + } + +// @kind function +// @category main +// @subcategory get +// +// @overview +// Extract the update function from a model regardless of where +// the model originates +// +// @param mdlType {symbol} Form of model being used ```(`q/`sklearn/`keras)```, this +// defines how the model gets interpreted in the case it is Python code +// in particular. +// @param model {any} `(<|dict|fn|proj|foreign)` Model retrieved from registry. +// +// @return {any} `(<|fn|proj|foreign)` Update function. 
+mlops.formatUpdate:{[mdlType;model] + $[99h=type model; + $[`update in key model; + model[`update]; + logging.error"model does not come with update function"]; + mdlType~`sklearn; + $[`partial_fit in .ml.csym model[`:__dir__][]`; + model[`:partial_fit]; + logging.error"No update function available for sklearn model"]; + logging.error"Update functionality not available for requested model" + ] + } + +// @kind function +// @category main +// @subcategory get +// +// @overview +// Wrap retrieved models such that they all have the same format regardless of +// where they originate. The data passed to the model is also transformed +// to the appropriate format +// +// @param mdlType {symbol} Form of model being used ```(`q/`sklearn/`keras)```. This +// defines how the model gets interpreted in the case it is Python code +// in particular. +// @param model {any} `(<|dict|fn|proj|foreign)` Model retrieved from the registry. +// @param axis {boolean} Data in a 'long' or 'wide' format (`0b/1b`) +// @param supervised {boolean} Whether the model update is supervised +// +// @return {any} `(<|fn|proj|foreign)` The update function wrapped with a transformation +// function. +mlops.wrapUpdate:{[mdlType;model;axis;supervised] + model:mlops.formatUpdate[mdlType;model]; + transform:mlops.transform[;axis;mdlType]; + $[supervised; + model . {(x y;z)}[transform]::; + model transform::] + } + +// @kind function +// @category main +// @subcategory get +// +// @overview +// Load the model registry at the user specified location into the process. +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. 
+// @param config {dict} Any additional configuration needed for +// retrieving the modelStore +// +// @return {table} Most recent version of the modelStore +registry.get.modelStore:{[folderPath;config] + config:registry.util.check.config[folderPath;config]; + if[not`local~storage:config`storage;storage:`cloud]; + $[storage~`local; + [modelStorePath:registry.util.check.registry[config]`modelStorePath; + load modelStorePath; + ?[modelStorePath;();0b;()] + ]; + [modelStore:get hsym`$config[`folderPath],"/KX_ML_REGISTRY/modelStore"; + key hsym` sv `$#[3;("/") vs ":",config`folderPath],"_"; + modelStore + ] + ] + } diff --git a/ml/registry/q/main/init.q b/ml/registry/q/main/init.q new file mode 100644 index 0000000..bb83a73 --- /dev/null +++ b/ml/registry/q/main/init.q @@ -0,0 +1,16 @@ +// init.q - Initialise the main q functionality for the model registry +// Copyright (c) 2021 Kx Systems Inc + +\d .ml + +if[not @[get;".ml.registry.q.main.init";0b]; + loadfile`:registry/q/main/new.q; + loadfile`:registry/q/main/log.q; + loadfile`:registry/q/main/set.q; + loadfile`:registry/q/main/delete.q; + loadfile`:registry/q/main/get.q; + loadfile`:registry/q/main/update.q; + loadfile`:registry/q/main/query.q + ] + +registry.q.main.init:1b diff --git a/ml/registry/q/main/log.q b/ml/registry/q/main/log.q new file mode 100644 index 0000000..aeccc77 --- /dev/null +++ b/ml/registry/q/main/log.q @@ -0,0 +1,66 @@ +// log.q - Main callable functions for logging information to the +// model registry +// Copyright (c) 2021 Kx Systems Inc +// +// @overview +// Log information to the registry +// +// @category Model-Registry +// @subcategory Functionality +// +// @end + +\d .ml + +// @kind function +// @category main +// @subcategory log +// +// @overview +// Log metric values for a model +// +// @todo +// Add type checking for modelName/experimentName/version +// Improve function efficiency when dealing with cloud vendors presently this is limited +// by retrieval of registry and republish. 
+// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param experimentName {string|null} The name of an experiment from which +// to retrieve a model, if no modelName is provided the newest model +// within this experiment will be used. If neither modelName or +// experimentName are defined the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// @param metricName {symbol|string} The name of the metric to be persisted +// in the case when this is a string, it is converted to a symbol +// @param metricValue {float} The value of the metric to be persisted +// +// @return {null} +registry.log.metric:{[folderPath;experimentName;modelName;version;metricName;metricValue] + metricName: $[10h=abs[type metricName]; `$; ]metricName; + config:registry.util.check.config[folderPath;()!()]; + if[not`local~storage:config`storage;storage:`cloud]; + config:$[storage~`local; + registry.local.util.check.registry config; + [checkFunction:registry.cloud.util.check.model; + checkFunction[experimentName;modelName;version;config`folderPath;config] + ] + ]; + logParams:(storage;experimentName;modelName;version;config;metricName;metricValue); + .[registry.util.set.metric; + logParams; + {[x;y;z] + $[`local~x;;registry.util.delete.folder]y; + logging.error z + }[storage;config`folderPath] + ] + } diff --git a/ml/registry/q/main/new.q 
b/ml/registry/q/main/new.q new file mode 100644 index 0000000..8cddf12 --- /dev/null +++ b/ml/registry/q/main/new.q @@ -0,0 +1,71 @@ +// new.q - Functionality for generation of new elements of the ML registry +// Copyright (c) 2021 Kx Systems Inc +// +// @overview +// This functionality is intended to provide the ability to generate new +// registries and experiments within these registries. +// +// @category Model-Registry +// @subcategory Functionality +// +// @end + +\d .ml + +// @kind function +// @category main +// @subcategory new +// +// @overview +// Generates a new model registry at a user specified location on-prem +// or within a supported cloud providers storage solution +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param config {dict|null} Any additional configuration needed for +// initialising the registry +// +// @return {dict} Updated config dictionary containing relevant +// registry paths +registry.new.registry:{[folderPath;config] + config:registry.util.check.config[folderPath;config]; + if[not`local~storage:config`storage;storage:`cloud]; + registry[storage;`new;`registry]config + } + +// @kind function +// @category main +// @subcategory new +// +// @overview +// Generates a new named experiment within the specified registry without +// adding a model on-prem or within a supported cloud providers storage solution +// +// @todo +// It should be possible via configuration to add descriptive information +// about an experiment. +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. 
+// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param experimentName {string} The name of the experiment to be located +// under the namedExperiments folder which can be populated by new models +// associated with the experiment +// @param config {dict|null} Any additional configuration needed for +// initialising the experiment +// +// @return {dict} Updated config dictionary containing relevant +// registry paths +registry.new.experiment:{[folderPath;experimentName;config] + config:registry.util.check.config[folderPath;config]; + if[not`local~storage:config`storage;storage:`cloud]; + experimentName:registry.util.check.experiment experimentName; + registry[storage;`new;`experiment][experimentName;config] + } diff --git a/ml/registry/q/main/query.q b/ml/registry/q/main/query.q new file mode 100644 index 0000000..bc78f67 --- /dev/null +++ b/ml/registry/q/main/query.q @@ -0,0 +1,49 @@ +// query.q - Main callable functions for querying the modelStore +// Copyright (c) 2021 Kx Systems Inc +// +// @overview +// Querying the modelStore table. Currently, the below features can +// be referenced by users to query the modelStore table: +// 1. registrationTime +// 2. experimentName +// 3. modelName +// 4. modelType +// 5. version +// 6. uniqueID +// +// @category Model-Registry +// @subcategory Functionality +// +// @end + +\d .ml + +// @kind function +// @category main +// @subcategory query +// +// @overview +// Query the modelStore +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. 
+// @param config {dict} Any additional configuration needed for +// retrieving the modelStore. Can also be empty dictionary `()!()`. +// +// @return {table} Most recent version of the modelStore +registry.query.modelStore:{[folderPath;config] + if[config~(::);config:()!()]; + // Retrieve entire modelStore + modelStore:registry.get.modelStore[folderPath;config]; + // If no user-defined config return entire modelStore + k:`modelName`experimentName`modelType`version`registrationTime`uniqueID; + if[not any k in key config;:modelStore]; + // Generate where clause and query modelStore + keys2check:(`modelName`experimentName`modelType;enlist`version;`registrationTime`uniqueID); + whereClause:registry.util.query.checkKey[config]/[();keys2check;(like;{all each x=\:y};=)]; + ?[modelStore;whereClause;0b;()] + } diff --git a/ml/registry/q/main/set.q b/ml/registry/q/main/set.q new file mode 100644 index 0000000..f183eea --- /dev/null +++ b/ml/registry/q/main/set.q @@ -0,0 +1,280 @@ +// set.q - Main callable functions for adding information to the model registry +// Copyright (c) 2021 Kx Systems Inc +// +// @overview +// Setting items within the registry including +// 1. Models: +// - q (functions/projections/appropriate dictionaries) +// - Python (python functions + sklearn/keras specific functionality) +// 2. Configuration +// 3. Model information table +// +// @category Model-Registry +// @subcategory Functionality +// +// @end + +\d .ml + +// @kind function +// @category main +// @subcategory set +// +// @overview +// Add a q object, Python function, Keras model or sklearn model +// to the registry so that it can be retrieved and applied to new data. +// In the current iteration there is an assumption of complete +// independence for the q functions/files i.e. 
q functions/workflows +// explicitly don't use Python, which makes it easier to store and +// reintroduce models +// +// @todo +// Improve the configuration information that is being persisted +// presently this contains all information within the config folder +// however this is not particularly informative and may be confusing +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param experimentName {string|null} Name of experiment model belongs to +// @param model {any} `(<|dict|fn|proj)` Model to be saved to the registry. +// @param modelName {string} The name to be associated with the model +// @param modelType {string} The type of model that is being saved, namely +// "q"|"sklearn"|"keras"|"python" +// @param config {dict} Any additional configuration needed for +// setting the model +// +// @return {guid} The uniqueID of the model added to the registry +registry.set.model:{[folderPath;experimentName;model;modelName;modelType;config] + config:registry.util.check.config[folderPath;config]; + if[not`local~storage:config`storage;storage:`cloud]; + experimentName:$[(any experimentName ~/: (::;""))|10h<>abs type experimentName; + "undefined"; + experimentName + ]; + c:registry[storage;`set;`model][experimentName;model;modelName;modelType;config]; + first c`uniqueID + } + +// @kind function +// @category main +// @subcategory set +// +// @overview +// Add a q object to the registry. This should be a q object in the +// current process which is either a function/projection/dictionary +// containing a predict key +// +// @param registryPath {string} Full/relative path to the model registry +// @param model {any} `(dict|fn|proj)` Model to be saved to the registry. 
+// @param config {dict} Information relating to the model that is +// to be saved, this includes version, experiment and model names +// +// @return {null} +registry.set.object:{[typ;registryPath;model;config] + toSet:$[type[model]in 10 11 -11h;"File";"Model"]; + registry.util.set[`$typ,toSet][registryPath;model;config] + } + +// @kind function +// @category main +// @subcategory set +// +// @overview +// Set the configuration associated with a specified model version such +// that all relevant information needed to redeploy the model is present +// with a packaged model +// +// @param config {dict} Information relating to the model +// being saved, this includes version, experiment and model names +// +// @return {null} +registry.set.modelConfig:{[model;modelType;config] + safeWrite:{[config;path] + if[not count key hsym `$config[`versionPath],"/config/",path,".json"; + registry.util.set.json[config;`config;path;enlist config] + ]}; + $[99h=type model; + $[not (("q"~modelType)&((`predict in key model)|(`modelInfo in key model))); + {[safeWrite;config;sym;model] + safeWrite[config;string[sym],"/modelInfo"] + }[safeWrite;config]'[key model;value model]; + safeWrite[config;"modelInfo"]]; + safeWrite[config;"modelInfo"] + ] + } + +// @kind function +// @category main +// @subcategory set +// +// @overview +// Set the configuration associated with monitoring a specified model version +// such that all relevant information needed to monitor the model is present +// with a packaged model +// +// @param model {any} `(<|dict|fn|proj)` Model to be monitored. 
+// @param modelType {string} The type of model that is being saved, namely +// "q"|"sklearn"|"keras" +// @param data {table} Historical data to understand model behaviour +// @param config {dict} Information relating to the model +// being saved, this includes version, experiment and model names +// +// @return {null} +registry.set.monitorConfig:{[model;modelType;data;config] + func : {[sym;model;modelType;data;config] + if[not 98h~type data;:(::)]; + $[sym~(::); + newConfig:.j.k raze read0 hsym `$config[`versionPath],"/config/modelInfo.json"; + newConfig:.j.k raze read0 hsym `$config[`versionPath],"/config/",string[sym],"/modelInfo.json" + ]; + newConfig[`monitoring;`schema;`values]:registry.util.create.schema data; + newConfig[`monitoring;`schema;`monitor]:1b; + newConfig[`monitoring;`nulls;`values]:registry.util.create.null data; + newConfig[`monitoring;`nulls;`monitor]:1b; + newConfig[`monitoring;`infinity;`values]:registry.util.create.inf data; + newConfig[`monitoring;`infinity;`monitor]:1b; + newConfig[`monitoring;`latency;`values]:registry.util.create.latency[model;modelType;data]; + newConfig[`monitoring;`latency;`monitor]:1b; + newConfig[`monitoring;`csi;`values]:registry.util.create.csi data; + newConfig[`monitoring;`csi;`monitor]:1b; + newConfig[`monitoring;`psi;`values]:registry.util.create.psi[model;modelType;data]; + newConfig[`monitoring;`psi;`monitor]:1b; + params:`maxDepth`indent!(10;" "); + $[sym~(::); + (hsym `$config[`versionPath],"/config/modelInfo.json") 0: enlist .j.j newConfig; + (hsym `$config[`versionPath],"/config/",string[sym],"/modelInfo.json") 0: enlist .j.j newConfig] + }[;;modelType;;config]; + $[all 99h=(type[model];type[data]); + [k:key[model] inter key[data];func'[k;model k;data k]]; + not 99h=type[model]; + func[::;model;data]; + '"data to fit monitoring statistics is not partitioned on model key" + ] + } + +// @kind function +// @category main +// @subcategory set +// +// @overview +// Set the configuration associated with 
supervised monitoring +// +// @param config {dict} Information relating to the model +// being saved, this includes version, experiment and model names +// +// @return {null} +registry.set.superviseConfig:{[model;config] + func:{[sym;model;config] + $[sym~(::); + newConfig:.j.k raze read0 hsym `$config[`versionPath],"/config/modelInfo.json"; + newConfig:.j.k raze read0 hsym `$config[`versionPath],"/config/",string[sym],"/modelInfo.json" + ]; + newConfig[`monitoring;`supervised;`values]:config `supervise; + newConfig[`monitoring;`supervised;`monitor]:1b; + params:`maxDepth`indent!(10;" "); + $[sym~(::); + (hsym `$config[`versionPath],"/config/modelInfo.json") 0: enlist .j.j newConfig; + (hsym `$config[`versionPath],"/config/",string[sym],"/modelInfo.json") 0: enlist .j.j newConfig + ]; + }[;;config]; + $[99h~type[model]; + func'[key[model];value[model]]; + func[::;model]] + } + +// @kind function +// @category main +// @subcategory set +// +// @overview +// Upsert relevant data from current run to modelStore +// +// @param config {dict} Information relating to the model +// being saved, this includes version, experiment and model names +// +// @return {null} +registry.set.modelStore:{[config] + enlistCols:`experimentName`modelName`modelType`version`description; + regularCols:`registrationTime`uniqueID!config`registrationTime`uniqueID; + experimentName:config`experimentName; + experimentName:$[0h=type experimentName;;enlist]experimentName; + modelName:enlist config`modelName; + modelType:config`modelType; + modelType:enlist$[-10h=type modelType;enlist;]modelType; + description:config`description; + if[0=count description;description:""]; + description:enlist$[-10h=type description;enlist;]description; + version:enlist config`version; + info:regularCols,enlistCols! 
+ (experimentName;modelName;modelType;version;description); + // check if model already exists + whereClause:enlist (&;(&;(~\:;`version;config[`version]);(~\:;`modelName;config[`modelName])); + (~\:;`experimentName;config[`experimentName])); + columns:enlist `uniqueID; + if[not count ?[config[`modelStorePath];whereClause;0b;columns!columns]`uniqueID; + config[`modelStorePath]upsert flip info + ]; + } + +// @kind function +// @category main +// @subcategory set +// +// @overview +// Save parameter information for a model +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param experimentName {string|null} The name of an experiment from which +// to retrieve a model, if no modelName is provided the newest model +// within this experiment will be used. 
If neither modelName or +// experimentName are defined the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// @param paramName {string|symbol} The name of the parameter to be saved +// @param params {dict|table|string} The parameters to save to file +// +// @return {null} +registry.set.parameters:{[folderPath;experimentName;modelName;version;paramName;params] + config:registry.util.check.config[folderPath;()!()]; + if[not`local~storage:config`storage;storage:`cloud]; + paramName:$[-11h=type paramName; + string paramName; + 10h=type paramName; + paramName; + logging.error"ParamName must be of type string or symbol" + ]; + setParams:(experimentName;modelName;version;paramName;params;config); + registry[storage;`set;`parameters]. 
setParams + } + +// @kind function +// @category main +// @subcategory set +// +// @overview +// Upsert relevant data from current run to metric table +// +// @param metricName {string} The name of the metric to be persisted +// @param metricValue {float} The value of the metric to be persisted +// @param metricPath {string} The path to the metric table +// +// @return {null} +registry.set.modelMetric:{[metricName;metricValue;metricPath] + enlistCols:`timestamp`metricName`metricValue; + metricDict:enlistCols!(.z.P;metricName;metricValue); + metricPath:hsym`$metricPath,"metric"; + metricPath upsert metricDict; + } diff --git a/ml/registry/q/main/update.q b/ml/registry/q/main/update.q new file mode 100644 index 0000000..09f4684 --- /dev/null +++ b/ml/registry/q/main/update.q @@ -0,0 +1,354 @@ +// update.q - Main callable functions for retrospectively adding information +// to the model registry +// Copyright (c) 2021 Kx Systems Inc +// +// @overview +// Update information within the registry +// +// @category Model-Registry +// @subcategory Functionality +// +// @end + +\d .ml + +// @kind function +// @category main +// @subcategory update +// +// @overview +// Update the config of a model that's already saved +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param experimentName {string|null} Name of experiment model belongs to 
+// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// @param config {dict} Any additional configuration needed for +// updating the model +// +// @return {null} +registry.update.config:{[folderPath;experimentName;modelName;version;config] + config:registry.util.update.checkPrep[folderPath;experimentName;modelName;version;config]; + modelType:first config`modelType; + config:registry.config.model,config; + modelPath:registry.util.path.modelFolder[config`registryPath;config;`model]; + model:registry.get[`$modelType]modelPath; + registry.util.set.requirements config; + if[`data in key config; + registry.set.monitorConfig[model;modelType;config`data;config] + ]; + if[`supervise in key config; + registry.set.superviseConfig[model;config] + ]; + if[`local<>config`storage;registry.cloud.update.publish config]; + } + +// @kind function +// @category main +// @subcategory update +// +// @overview +// Update the requirement details of a saved model +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param experimentName {string|null} The name of an experiment from which +// to retrieve a model, if no modelName is provided the newest model +// within this experiment will be used. 
If neither modelName or +// experimentName are defined the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// @param requirements {string[][];hsym;boolean} The location of a saved +// requirements file, list of user specified requirements or a boolean +// indicating if the virtual environment of a user is to be 'frozen' +// +// @return {null} +registry.update.requirements:{[folderPath;experimentName;modelName;version;requirements] + config:registry.util.update.checkPrep[folderPath;experimentName;modelName;version;()!()]; + config[`requirements]:requirements; + registry.util.set.requirements config; + if[`local<>config`storage;registry.cloud.update.publish config]; + } + +// @kind function +// @category main +// @subcategory update +// +// @overview +// Update the latency details of a saved model +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param experimentName {string|null} The name of an experiment from which +// to retrieve a model, if no modelName is provided the newest model +// within this experiment will be used. 
If neither modelName or +// experimentName are defined the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// @param model {fn} The model on which the latency is to be evaluated +// @param data {table} Data on which to evaluate the model +// +// @return {null} +registry.update.latency:{[folderPath;experimentName;modelName;version;model;data] + config:registry.util.update.checkPrep[folderPath;experimentName;modelName;version;()!()]; + fpath:hsym `$config[`versionPath],"/config/modelInfo.json"; + mlops.update.latency[fpath;model;data]; + if[`local<>config`storage;registry.cloud.update.publish config]; + } + +// @kind function +// @category main +// @subcategory update +// +// @overview +// Update the null replacement details of a saved model +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param experimentName {string|null} The name of an experiment from which +// to retrieve a model, if no modelName is provided the newest model +// within this experiment will be used. 
If neither modelName or +// experimentName are defined the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// @param data {table} Data on which to determine the null replacement +// +// @return {null} +registry.update.nulls:{[folderPath;experimentName;modelName;version;data] + config:registry.util.update.checkPrep[folderPath;experimentName;modelName;version;()!()]; + fpath:hsym `$config[`versionPath],"/config/modelInfo.json"; + mlops.update.nulls[fpath;data]; + if[`local<>config`storage;registry.cloud.update.publish config]; + } + +// @kind function +// @category main +// @subcategory update +// +// @overview +// Update the infinity replacement details of a saved model +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param experimentName {string|null} The name of an experiment from which +// to retrieve a model, if no modelName is provided the newest model +// within this experiment will be used. 
If neither modelName or +// experimentName are defined the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// @param data {table} Data on which to determine the infinity replacement +// +// @return {null} +registry.update.infinity:{[folderPath;experimentName;modelName;version;data] + config:registry.util.update.checkPrep[folderPath;experimentName;modelName;version;()!()]; + fpath:hsym `$config[`versionPath],"/config/modelInfo.json"; + mlops.update.infinity[fpath;data]; + if[`local<>config`storage;registry.cloud.update.publish config]; + } + +// @kind function +// @category main +// @subcategory update +// +// @overview +// Update the csi details of a saved model +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param experimentName {string|null} The name of an experiment from which +// to retrieve a model, if no modelName is provided the newest model +// within this experiment will be used. 
If neither modelName or +// experimentName are defined the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// @param data {table} Data on which to determine historical distribution of the +// features +// +// @return {null} +registry.update.csi:{[folderPath;experimentName;modelName;version;data] + config:registry.util.update.checkPrep[folderPath;experimentName;modelName;version;()!()]; + fpath:hsym `$config[`versionPath],"/config/modelInfo.json"; + .ml.mlops.update.csi[fpath;data]; + if[`local<>config`storage;registry.cloud.update.publish config]; + } + +// @kind function +// @category main +// @subcategory update +// +// @overview +// Update the psi details of a saved model +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param experimentName {string|null} The name of an experiment from which +// to retrieve a model, if no modelName is provided the newest model +// within this experiment will be used. 
If neither modelName or +// experimentName are defined the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// @param model {fn} The model serving the predictions +// @param data {table} Data on which to determine historical distribution of the +// predictions +// +// @return {null} +registry.update.psi:{[folderPath;experimentName;modelName;version;model;data] + config:registry.util.update.checkPrep[folderPath;experimentName;modelName;version;()!()]; + fpath:hsym `$config[`versionPath],"/config/modelInfo.json"; + mlops.update.psi[fpath;model;data]; + if[`local<>config`storage;registry.cloud.update.publish config]; + } + +// @kind function +// @category main +// @subcategory update +// +// @overview +// Update the type details of a saved model +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param experimentName {string|null} The name of an experiment from which +// to retrieve a model, if no modelName is provided the newest model +// within this experiment will be used. 
If neither modelName or +// experimentName are defined the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// @param format {string} Type of the given model +// +// @return {null} +registry.update.type:{[folderPath;experimentName;modelName;version;format] + config:registry.util.update.checkPrep + [folderPath;experimentName;modelName;version;()!()]; + fpath:hsym `$config[`versionPath],"/config/modelInfo.json"; + mlops.update.type[fpath;format]; + if[`local<>config`storage;registry.cloud.update.publish config]; + } + +// @kind function +// @category main +// @subcategory update +// +// @overview +// Update the supervised metrics of a saved model +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param experimentName {string|null} The name of an experiment from which +// to retrieve a model, if no modelName is provided the newest model +// within this experiment will be used. 
If neither modelName or +// experimentName are defined the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// @param metrics {string[]} Supervised metrics to monitor +// +// @return {null} +registry.update.supervise:{[folderPath;experimentName;modelName;version;metrics] + config:registry.util.update.checkPrep[folderPath;experimentName;modelName;version;()!()]; + fpath:hsym `$config[`versionPath],"/config/modelInfo.json"; + .ml.mlops.update.supervise[fpath;metrics]; + if[`local<>config`storage;registry.cloud.update.publish config]; + } + +// @kind function +// @category main +// @subcategory update +// +// @overview +// Update the schema details of a saved model +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. +// @param experimentName {string|null} The name of an experiment from which +// to retrieve a model, if no modelName is provided the newest model +// within this experiment will be used. 
If neither modelName or +// experimentName are defined the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// @param data {table} The data which provides the new schema +// +// @return {null} +registry.update.schema:{[folderPath;experimentName;modelName;version;data] + config:registry.util.update.checkPrep[folderPath;experimentName;modelName;version;()!()]; + fpath:hsym `$config[`versionPath],"/config/modelInfo.json"; + mlops.update.schema[fpath;data]; + if[`local<>config`storage;registry.cloud.update.publish config]; + } diff --git a/ml/registry/q/main/utils/check.q b/ml/registry/q/main/utils/check.q new file mode 100644 index 0000000..894b5af --- /dev/null +++ b/ml/registry/q/main/utils/check.q @@ -0,0 +1,192 @@ +// check.q - Utilities relating to checking of suitability of registry items +// Copyright (c) 2021 Kx Systems Inc +// +// @overview +// Check that the information provided for adding items to the registry is +// suitable, this includes but is not limited to checking if the model name +// provided already exists, that the configuration is appropriately typed etc. 
+// +// @category Model-Registry +// @subcategory Utilities +// +// @end + +\d .ml + +// @private +// +// @overview +// Correct syntax for path dependent on OS +// +// @param path {string} A path name +// +// @return {string} Path suitable for OS +registry.util.check.osPath:{[path] + $[.z.o like"w*";{@[x;where"/"=x;:;"\\"]};]path + }; + +// @private +// +// @overview +// Check to ensure that the folder path for the registry is appropriately +// typed +// +// @param folderPath {string|null} A folder path indicating the location the +// registry is to be located or generic null to place in the current +// directory +// +// @return {string} type checked folderPath +registry.util.check.folderPath:{[folderPath] + if[not((::)~folderPath)|10h=type folderPath; + logging.error"Folder path must be a string or ::" + ]; + $[(::)~folderPath;enlist".";folderPath] + } + +// @private +// +// @overview +// Check to ensure that the experiment name provided is suitable and return +// an appropriate surrogate in the case the model name is undefined +// +// @param experimentName {string} Name of the experiment to be saved +// +// @return {string} The name of the experiment +registry.util.check.experiment:{[experimentName] + $[""~experimentName; + "undefined"; + $[10h<>type experimentName; + logging.error"'experimentName' must be a string"; + experimentName + ] + ] + } + +// @private +// +// @overview +// Check that the model type that the user is providing to save the model +// against is within the list of approved types +// +// @param config {dict} Configuration provided by the user to +// customize the experiment +// +// @return {null} +registry.util.check.modelType:{[config] + modelType:config`modelType; + approvedTypes:("sklearn";"xgboost";"q";"keras";"python";"torch";"pyspark"); + if[10h<>abs type[modelType]; + logging.error"'modelType' must be a string" + ]; + if[not any modelType~/:approvedTypes; + logging.error"'",modelType,"' not in approved types for KX model registry" + ]; 
+ } + +// @private +// +// @overview +// Check if the registry which is being manipulated exists +// +// @param config {dict|null} Any additional configuration needed for +// initialising the registry +// +// @return {dict} Updated config dictionary containing registry path +registry.util.check.registry:{[config] + folderPath:config`folderPath; + registryPath:folderPath,"/KX_ML_REGISTRY"; + config:$[()~key hsym`$registryPath; + [logging.info"Registry does not exist at: '",registryPath, + "'. Creating registry in that location."; + registry.new.registry[folderPath;config] + ]; + [modelStorePath:hsym`$registryPath,"/modelStore"; + paths:`registryPath`modelStorePath!(registryPath;modelStorePath); + config,paths + ] + ]; + config + } + +// @private +// +// @overview +// Check that a list of files that are attempting to be added to the +// registry exist and that they are either '*.q', '*.p' and '*.py' files +// +// @param files {symbol|symbol[]} The absolute/relative path to a file or +// list of files that are to be added to the registry associated with a +// model. These must be '*.p', '*.q' or '*.py' +// +// @return {symbol|symbol[]} All files which could be added to the registry +registry.util.check.code:{[files] + fileExists:{x where {x~key x}each x}$[-11h=type files;enlist;]hsym files; + // TO-DO + // - Add print to indicate what files couldnt be added + fileType:fileExists where any fileExists like/:("*.q";"*.p";"*.py"); + // TO-DO + // - Add print to indicate what files didn't conform to supported types + fileType + } + +// @private +// +// @overview +// Check user provided config has correct format +// +// @param folderPath {dict|string|null} Registry location, can be: +// 1. A dictionary containing the vendor and location as a string, e.g. +// ```enlist[`local]!enlist"myReg"``` or +// ```enlist[`aws]!enlist"s3://ml-reg-test"``` etc; +// 2. A string indicating the local path; +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON. 
+// @param config {dict} Configuration provided by the user to +// customize the pipeline +// +// @returns {dict} Returns config in correct format +registry.util.check.config:{[folderPath;config] + config:$[any[config~/:(();()!())]|101h=type config; + ()!(); + type[config]~99h; + config; + logging.error"config should be null or prepopulated dictionary" + ]; + loc:$[10h=abs type folderPath; + $[like[(),folderPath;"s3://*"]; + enlist[`aws]!; + like[(),folderPath;"ms://*"]; + enlist[`azure]!; + like[(),folderPath;"gs://*"]; + enlist[`gcp]!; + enlist[`local]! + ]enlist folderPath; + 99h=type folderPath; + folderPath; + any folderPath~/:((::);()); + registry.location; + logging.error"Unsupported folderPath provided" + ]; + locInfo:`storage`folderPath!first@'(key;value)@\:loc; + config,locInfo + } + +// @private +// +// @overview +// Define which form of storage is to be used by the interface +// +// @param cli {dict} Command line arguments as passed to the system on +// initialisation, this defines how the fundamental interactions of +// the interface are expected to operate. 
+// +// @returns {symbol} The form of storage to which all functions are expected +// to interact +registry.util.check.storage:{[cli] + vendorList:`gcp`aws`azure; + vendors:vendorList in key cli; + if[not any vendors;:`local]; + if[1 should return itself + // if the file exists at the correct location + src:key src; + if[()~src; + logging.error"File expected at '",string[src],"' did not exist" + ]; + if[not(1=count src)&all src like":*"; + logging.error"src must be an individual file not a directory" + ]; + if[(not all(src;dest)like":*") & not all -11h=type each (src;dest); + logging.error"Both src and dest directories must be a hsym like path" + ]; + system sv[" "]enlist["cp"],1_/:string(src;dest) + } + +// @private +// +// @overview +// Copy a directory from one location to another +// +// @todo +// Update this to use the axfs OS agnostic functionality provided by Analyst +// this should ensure that the functionality will operate on Windows/MacOS/Linux +// +// @param src {#hsym} Source destination to be copied. +// @param dest {#hsym} Destination to which to be copied. 
+// @return {null} +registry.util.copy.dir:{[src;dest] + // Expecting an individual file for copying -> should return itself + // if the file exists at the correct location + if[(not all(src;dest)like":*") & not all -11h=type each (src;dest); + logging.error"Both src and dest directories must be a hsym like path" + ]; + system sv[" "]enlist["cp -r"],1_/:string(src;dest) + } diff --git a/ml/registry/q/main/utils/create.q b/ml/registry/q/main/utils/create.q new file mode 100644 index 0000000..94bb9ae --- /dev/null +++ b/ml/registry/q/main/utils/create.q @@ -0,0 +1,285 @@ +// create.q - Create new objects within the registry +// Copyright (c) 2021 Kx Systems Inc +// +// @overview +// Create new objects within the registry +// +// @category Model-Registry +// @subcategory Utilities +// +// @end + +\d .ml + +// @private +// +// @overview +// Create the registry folder within which models will be stored +// +// @todo +// Update for Windows Compliance +// +// @param folderPath {string|null} A folder path indicating the location the +// registry is to be located or generic null to place in the current +// directory +// @param config {dict} Any additional configuration needed for +// initialising the registry (Not presently used but for later use) +// +// @return {dict} Updated config with registryPath added +registry.util.create.registry:{[config] + registryPath:config[`folderPath],"/KX_ML_REGISTRY"; + if[not()~key hsym`$registryPath;logging.error"'",registryPath,"' already exists"]; + system"mkdir ",$[.z.o like"w*";"";"-p "],registry.util.check.osPath registryPath; + config,enlist[`registryPath]!enlist registryPath + } + +// @private +// +// @overview +// Create the splayed table within the registry folder which will be used +// to store information about the models that are present within the registry +// +// @param config {dict} Any additional configuration needed for +// initialising the registry (Not presently used but for later use) +// +// @return {dict} Updated 
config with modelStorePath added +registry.util.create.modelStore:{[config] + modelStoreKeys:`registrationTime`experimentName`modelName`uniqueID`modelType`version`description; + modelStoreVals:(`timestamp$();();();`guid$();();();()); + modelStoreSchema:flip modelStoreKeys!modelStoreVals; + modelStorePath:hsym`$config[`registryPath],"/modelStore"; + modelStorePath set modelStoreSchema; + config,enlist[`modelStorePath]!enlist modelStorePath + } + +// @private +// +// @overview +// Create the base folder structure used for storage of models associated +// with an experiment and models which have been generated independently +// +// @param config {dict} Any additional configuration needed for +// initialising the registry (Not presently used but for later use) +// +// @return {null} +registry.util.create.experimentFolders:{[config] + folders:("/namedExperiments";"/unnamedExperiments"); + experimentPaths:config[`registryPath],/:folders; + {system"mkdir ",$[.z.o like"w*";"";"-p "],registry.util.check.osPath x + }each experimentPaths; + // The following is required to upload the folders to cloud vendors + hiddenFiles:hsym`$experimentPaths,\:"/.hidden"; + {x 0:enlist()}each hiddenFiles; + } + +// @private +// +// @overview +// Add a folder associated to a named experiment provided +// +// @param experimentName {string} Name of the experiment to be saved +// @param config {dict|null} Any additional configuration needed for +// initialising the experiment +// +// @return {dict} Updated config dictionary containing experiment path +registry.util.create.experiment:{[experimentName;config] + if[experimentName~"undefined";logging.error"experimentName must be defined"]; + experimentString:config[`registryPath],"/namedExperiments/",experimentName; + experimentPath:hsym`$experimentString; + if[()~key experimentPath; + system"mkdir ",$[.z.o like"w*";"";"-p "],registry.util.check.osPath experimentString + ]; + // The following is required to upload the folders to cloud vendors + 
hiddenFiles:hsym`$experimentString,"/.hidden"; + {x 0:enlist()}each hiddenFiles; + config,`experimentPath`experimentName!(experimentString;experimentName) + } + +// @private +// +// @overview +// Add all the folders associated with a specific model to the +// correct location on disk +// +// @param config {dict} Information relating to the model +// being saved, this includes version, experiment and model names +// +// @return {dict} Updated config dictionary containing relevant paths +registry.util.create.modelFolders:{[model;modelType;config] + folders:$[99h=type model; + $[not (("q"~modelType)&((`predict in key[model])|(`modelInfo in key model))); + ("params";"metrics";"code"),raze enlist["model/"],/:\:string[key[model]]; + ("model";"params";"metrics";"code")]; + ("model";"params";"metrics";"code") + ]; + newFolders:"/",/:folders; + modelFolder:config[`experimentPath],"/",config`modelName; + if[(1;0)~config`version;system"mkdir ",$[.z.o like"w*";"";"-p "], + registry.util.check.osPath modelFolder]; + versionFolder:modelFolder,"/",/registry.util.strVersion config`version; + newFolders:versionFolder,/:newFolders; + paths:enlist[versionFolder],newFolders; + {system"mkdir ",$[.z.o like"w*";"";"-p "], registry.util.check.osPath x + }each paths; + config,(`versionPath,`$folders,\:"Path")!paths + } + +// @private +// +// @overview +// Generate the configuration information which is to be saved +// with the model +// +// @param config {dict} Configuration information provided by the user +// +// @return {dict} A modified version of the run information +// dictionary with information formatted in a structure that is more sensible +// for persistence +registry.util.create.config:{[config] + newConfig:.ml.registry.config.default; + newConfig[`registry;`description]:config`description; + newConfig[`registry;`experimentInformation;`experimentName]:config`experimentName; + modelInfo:`modelName`version`requirements`registrationTime`uniqueID; + 
newConfig:{y[`registry;`modelInformation;z]:x z;y}[config]/[newConfig;modelInfo]; + newConfig[`model;`type]:config[`modelType]; + newConfig[`model;`axis]:config[`axis]; + newConfig + } + +// @private +// +// @overview +// Generate latency configuration information which is to be saved +// with the model +// +// @param model {any} `(dict|fn|proj)` Model retrieved from registry. +// @param modelType {string} The type of model that is being saved, namely +// "q"|"sklearn"|"keras"|"python" +// @param data {table} Historical data for evaluating behaviour of model +// @param config {dict} Configuration information provided by the user +// +// @return {dict} A dictionary containing information on the average +// time to serve a prediction together with the standard deviation +registry.util.create.latency:{[model;modelType;data] + function:{[model;modelType;data] + // get predict function + predictor:.ml.mlops.wrap[`$modelType;model;1b]; + // Latency information + L:{system"sleep 0.0005";zz:enlist value x;a:.z.p;y zz;(1e-9)*.z.p-a}[;predictor] each 30#data; + `avg`std!(avg L;dev L)}[model;modelType]; + @[function;data;{show "unable to generate latency config due to error: ",x, + " latency monitoring cannot be supported"}] + } + +// @private +// +// @overview +// Generate schema configuration information which is to be saved +// with the model +// +// @param data {table} Historical data for evaluating behaviour of model +// @param config {dict} Configuration information provided by the user +// +// @return {dict} A dictionary containing information on the schema +// of the data provided to the prediction service +registry.util.create.schema:{[data] + // Schema information + (!). 
(select c,t from (meta data))`c`t + } + +// @private +// +// @overview +// Generate nulls configuration information which is to be saved +// with the model +// +// @param data {table} Historical data for evaluating behaviour of model +// @param config {dict} Configuration information provided by the user +// +// @return {dict} A dictionary containing the values for replacement of +// null values. +registry.util.create.null:{[data] + // Null information + function:{med each flip mlops.infReplace x}; + @[function;data;{show "unable to generate null config due to error: ",x, + " null replacement cannot be supported"}] + } + +// @private +// +// @overview +// Generate infs configuration information which is to be saved +// with the model +// +// @param data {table} Historical data for evaluating behaviour of model +// @param config {dict} Configuration information provided by the user +// +// @return {dict} A dictionary containing the values for replacement of +// infinite values +registry.util.create.inf:{[data] + // Inf information + function:{(`negInfReplace`posInfReplace)!(min;max)@\:mlops.infReplace x}; + @[function;data;{show "unable to generate inf config due to error: ",x, + " inf replacement cannot be supported"}] + } + +// @private +// +// @overview +// Generate csi configuration information which is to be saved +// with the model +// +// @param data {table} Historical data for evaluating behaviour of model +// +// @return {dict} A dictionary containing the values for replacement of +// infinite values +registry.util.create.csi:{[data] + bins:@["j"$;(count data)&registry.config.commandLine`bins; + {logging.error"Cannot convert 'bins' to an integer"}]; + @[{mlops.create.binExpected[;y] each flip x}[;bins];data;{show "unable ", + "to generate csi config due to error: ",x," csi monitoring cannot be ", + "supported"}] + } + +// @private +// +// @overview +// Generate psi configuration information which is to be saved +// with the model +// +// @param model {any} 
`(dict|fn|proj)` Model retrieved from registry. +// @param modelType {string} The type of model that is being saved, namely +// "q"|"sklearn"|"keras"|"python" +// @param data {table} Historical data for evaluating behaviour of model +// +// @return {dict} A dictionary containing information on the average +// time to serve a prediction together with the standard deviation +registry.util.create.psi:{[model;modelType;data] + bins:@["j"$;(count data)&registry.config.commandLine`bins; + {logging.error"Cannot convert 'bins' to an integer"}]; + function:{[bins;model;modelType;data] + // get predict function + predictor:.ml.mlops.wrap[`$modelType;model;0b]; + preds:predictor data; + mlops.create.binExpected[raze preds;bins] + }[bins;model;modelType]; + @[function;data;{show "unable to generate psi config due to error: ",x, + " psi monitoring cannot be supported"}] + } + +// @private +// +// @overview +// Create a table within the registry folder which will be used +// to store information about the metrics of the model +// +// @param metricPath {string} The path to the metrics file +// +// @return {null} +registry.util.create.modelMetric:{[metricPath] + modelMetricKeys:`timestamp`metricName`metricValue; + modelMetricVals:(enlist 0Np;`; ::); + modelMetricSchema:flip modelMetricKeys!modelMetricVals; + modelMetricPath:hsym`$metricPath,"metric"; + modelMetricPath set modelMetricSchema; + } diff --git a/ml/registry/q/main/utils/delete.q b/ml/registry/q/main/utils/delete.q new file mode 100644 index 0000000..a8ee6b6 --- /dev/null +++ b/ml/registry/q/main/utils/delete.q @@ -0,0 +1,89 @@ +// delete.q - Delete items from the model registry and folder structure +// Copyright (c) 2021 Kx Systems Inc +// +// @overview +// Delete items from the registry +// +// @category Model-Registry +// @subcategory Utilities +// +// @end + +\d .ml + +// @private +// +// @overview +// Delete all files contained within a specified directory recursively +// +// @param folderPath {symbol} Folder to be 
deleted +// +// @return {null} +registry.util.delete.folder:{[folderPath] + ty:type folderPath; + folderPath:hsym$[10h=ty;`$;-11h=ty;;logging.error"type"]folderPath; + orderedPaths:(),{$[11h=type d:key x;raze x,.z.s each` sv/:x,/:d;d]}folderPath; + hdel each desc orderedPaths; + } + +// @private +// +// @overview +// Delete all folders relating to an experiment or to 1/all versions of a model +// +// @param config {dict} Configuration information provided by the user +// @param objectType {symbol} ``` `experiment `allModels or `modelVersion``` +// +// @return {null} +registry.util.delete.object:{[config;objectType] + // Required variables + folderPath:config`folderPath; + experimentName:config`experimentName; + modelName:config`modelName; + version:config`version; + // Generate modelStore and object paths based on objectType + paths:registry.util.getObjectPaths + [folderPath;objectType;experimentName;modelName;version;config]; + modelStorePath:paths`modelStorePath; + checkPath:objectPath:paths`objectPath; + objectString:1_string objectPath; + // Check if object exists before attempting to delete + if["*"~last objectString;checkPath:hsym`$-1_objectString]; + if[emptyPath:()~key checkPath; + logging.info"No artifacts created for ",objectString,". Unable to delete." 
+ ]; + // Where clause relative to each object type + objectCondition:registry.util.delete.where + [experimentName;modelName;version;objectType]; + whereClause:enlist(not;objectCondition); + // Update the modelStore with remaining models + newModels:?[modelStorePath;whereClause;0b;()]; + modelStorePath set newModels; + // Delete relevant folders + if[not emptyPath; + logging.info"Removing all contents of ",objectString; + registry.util.delete.folder objectPath + ]; + // Load new modelStore + load modelStorePath; + } + +// @private +// +// @overview +// Functional where clause required to delete objects from the modelStore +// +// @param experimentName {string} Name of experiment +// @param modelName {string} Name of model +// @param version {long[]} Model version number (major;minor) +// @param objectType {symbol} ``` `experiment `allModels or `modelVersion``` +// +// @return {(fn;symbol;symbol)} Where clause in functional form +registry.util.delete.where:{[experimentName;modelName;version;objectType] + $[objectType~`allModels; + (like;`modelName;modelName); + objectType~`modelVersion; + (&;(like;`modelName;modelName);({{x~y}[y]'[x]};`version;version)); + (like;`experimentName;experimentName) + ] + } diff --git a/ml/registry/q/main/utils/get.q b/ml/registry/q/main/utils/get.q new file mode 100644 index 0000000..bbdbdac --- /dev/null +++ b/ml/registry/q/main/utils/get.q @@ -0,0 +1,243 @@ +// get.q - Utilities relating to retrieval of objects from the registry +// Copyright (c) 2021 Kx Systems Inc +// +// @overview +// Utilities for object retrieval within the registry +// +// @category Model-Registry +// @subcategory Utilities +// +// @end + +\d .ml + +// @private +// +// @overview +// Retrieve a model from the registry, this is a wrapped version of +// this functionality to facilitate protected execution in the case +// that issues arise with retrieval and loading of a model from +// cloud providers or an on-prem location +// +// @param storage {symbol} The form of 
storage from which the model is +// being retrieved +// @param experimentName {string|null} The name of an experiment from which +// to retrieve a model, if no modelName is provided the newest model +// within this experiment will be used. If neither modelName or +// experimentName are defined the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// @param config {dict} Configuration containing information surrounding +// the location of the registry and associated files +// @param optionalKey {sym} Optional symbol for loading model +// +// @return {dict} The model and information related to the +// generation of the model +registry.util.get.model:{[storage;experimentName;modelName;version;config;optionalKey] + // Retrieve the model from the store meeting the user specified conditions + modelDetails:registry.util.search.model[experimentName;modelName;version;config]; + if[not count modelDetails; + logging.error"No model meeting your provided conditions was available" + ]; + // Construct the path to model folder containing the model to be retrieved + config,:flip modelDetails; + configPath:registry.util.path.modelFolder[config`registryPath;config;::]; + modelPath:registry.util.path.modelFolder[config`registryPath;config;`model]; + codePath:registry.util.path.modelFolder[config`registryPath;config;`code]; + registry.util.load.code codePath; + func:{[k;configPath;modelDetails;modelPath;config;storage] + $[k~(::); + modelConfig:configPath,"/config/modelInfo.json"; + modelConfig:configPath,"/config/",string[k],"/modelInfo.json" + ]; + modelInfo:.j.k raze read0 hsym`$modelConfig; + // Retrieve the model based on the form of saved 
model + modelType:first`$modelDetails`modelType; + modelPath,:$[k~(::);"";string[k],"/"],$[modelType~`q; + "mdl"; + modelType~`keras; + "mdl.h5"; + modelType~`torch; + "mdl.pt"; + modelType~`pyspark; + "mdl.model"; + "mdl.pkl" + ]; + model:mlops.get[modelType] $[modelType in `q;modelPath;pydstr modelPath]; + if[registry.config.commandLine`deployType; + axis:modelInfo[`modelInformation;`axis]; + model:mlops.wrap[`python;model;axis]; + ]; + returnInfo:`modelInfo`model!(modelInfo;model); + returnInfo + }[;configPath;modelDetails;modelPath;config;storage]; + if[b:()~key hsym `$configPath,"/config/modelInfo.json"; + k:key hsym `$configPath,"/config"]; + r:$[b;$[optionalKey~(::);k!func'[k];func optionalKey];func[::]]; + if[`local<>storage;registry.util.delete.folder config`folderPath]; + r + } + +// @private +// +// @overview +// Retrieve metrics from the registry, this is a wrapped version of this +// functionality to facilitate protected execution in the case that issues +// arise with retrieval or loading of metrics from cloud providers or +// an on-prem location +// +// @param storage {symbol} The form of storage from which the model is +// being retrieved +// @param experimentName {string|null} The name of an experiment from which +// to retrieve a model, if no modelName is provided the newest model +// within this experiment will be used. 
If neither modelName or +// experimentName are defined the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// @param config {dictionary} Configuration containing information surrounding +// the location of the registry and associated files +// @param param {null|dict|symbol} Search parameters for the retrieval +// of metrics +// +// @return {table} The metric table for a specific model, which may +// potentially be filtered +registry.util.get.metric:{[storage;experimentName;modelName;version;config;param] + modelDetails:registry.util.search.model[experimentName;modelName;version;config]; + if[not count modelDetails; + logging.error"No model meeting your provided conditions was available" + ]; + // Construct the path to model folder containing the model to be retrieved + config,:flip modelDetails; + metricPath:registry.util.path.modelFolder[config`registryPath;config;`metrics]; + metricPath:metricPath,"metric"; + metric:1_get hsym`$metricPath; + returnInfo:registry.util.search.metric[metric;param]; + if[`local<>storage;registry.util.delete.folder config`folderPath]; + returnInfo + } + +// @private +// +// @overview +// Retrieve parameters from the registry, this is a wrapped version of this +// functionality to facilitate protected execution in the case that issues +// arise with retrieval or loading of metrics from cloud providers or +// an on-prem location +// +// @param storage {symbol} The form of storage from which the model is +// being retrieved +// @param experimentName {string|null} The name of an experiment from which +// to retrieve a model, if no modelName is provided the newest model +// within this experiment 
will be used. If neither modelName or +// experimentName are defined the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// @param config {dictionary} Configuration containing information surrounding +// the location of the registry and associated files +// @param paramName {symbol|string} The name of the parameter to retrieve +// +// @return {string|dict|table|float} The value of the parameter associated +// with a named parameter saved for the model. +registry.util.get.params:{[storage;experimentName;modelName;version;config;paramName] + modelDetails:registry.util.search.model[experimentName;modelName;version;config]; + if[not count modelDetails; + logging.error"No model meeting your provided conditions was available" + ]; + // Construct the path to model folder containing the model to be retrieved + config,:flip modelDetails; + paramPath:registry.util.path.modelFolder[config`registryPath;config;`params]; + paramName:$[-11h=type paramName; + string paramName; + 10h=type paramName; + paramName; + logging.error"ParamName must be of type string or symbol" + ]; + paramPath,:paramName,".json"; + returnInfo:registry.util.search.params[paramPath]; + if[`local<>storage;registry.util.delete.folder config`folderPath]; + returnInfo + } + +registry.util.get.version:{[storage;experimentName;modelName;version;config;param] + modelDetails:registry.util.search.model[experimentName;modelName;version;config]; + if[not count modelDetails; + logging.error"No model meeting your provided conditions was available" + ]; + config,:flip modelDetails; + rootPath:registry.util.path.modelFolder[config`registryPath;config;::]; + 
versionInfo:@[read0;hsym `$rootPath,"/.version.info";{'"Version information not found for model"}]; + .j.k raze versionInfo + }; + + +// @private +// +// @overview +// Retrieve a q/python/sklearn/keras model or parameters/metrics related to a +// specific model from the registry. +// +// @todo +// Add type checking for modelName/experimentName/version +// +// @param cli {dict} Command line arguments as passed to the system on +// initialisation, this defines how the fundamental interactions of +// the interface are expected to operate. +// @param folderPath {dict|string|null} Registry location. +// 1. Can be a dictionary containing the vendor and location as a string, e.g.: +// - enlist[`local]!enlist"myReg" +// - enlist[`aws]!enlist"s3://ml-reg-test" +// 2. A string indicating the local path +// 3. A generic null to use the current .ml.registry.location pulled from CLI/JSON +// @param experimentName {string|null} The name of an experiment from which +// to retrieve a model, if no modelName is provided the newest model +// within this experiment will be used. 
If neither modelName or +// experimentName are defined the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// @param param {null|dict|symbol|string} Parameter required for parameter/ +// metric retrieval +// in the case when this is a string, it is converted to a symbol +// +// @return {dict} The model and information related to the +// generation of the model +registry.util.get.object:{[typ;folderPath;experimentName;modelName;version;param] + if[(typ~`metric)&abs[type param] in 10 11h; + param:enlist[`metricName]!enlist $[10h=abs[type param];`$;]param + ]; + config:registry.util.check.config[folderPath;()!()]; + if[not`local~storage:config`storage;storage:`cloud]; + // Locate/retrieve the registry locally or from the cloud + config:$[storage~`local; + registry.local.util.check.registry config; + [checkFunction:registry.cloud.util.check.model; + checkFunction[experimentName;modelName;version;config`folderPath;config] + ] + ]; + getParams:$[(typ~`model)&param~(::); + (storage;experimentName;modelName;version;config;::); + (storage;experimentName;modelName;version;config;param) + ]; + .[registry.util.get typ; + getParams; + {[x;y;z] + $[`local~x;;registry.util.delete.folder]y; + 'z + }[storage;config`folderPath] + ] + } diff --git a/ml/registry/q/main/utils/init.q b/ml/registry/q/main/utils/init.q new file mode 100644 index 0000000..8296a54 --- /dev/null +++ b/ml/registry/q/main/utils/init.q @@ -0,0 +1,25 @@ +// init.q - Initialise main q utilities for the model registry +// Copyright (c) 2021 Kx Systems Inc +// +// Utilities relating to all basic interactions with the registry + +\d .ml + +// Load all utilities +if[not
@[get;".ml.registry.q.main.utils.init";0b]; + loadfile`:registry/q/main/utils/requirements.q; + loadfile`:registry/q/main/utils/check.q; + loadfile`:registry/q/main/utils/create.q; + loadfile`:registry/q/main/utils/copy.q; + loadfile`:registry/q/main/utils/delete.q; + loadfile`:registry/q/main/utils/misc.q; + loadfile`:registry/q/main/utils/path.q; + loadfile`:registry/q/main/utils/search.q; + loadfile`:registry/q/main/utils/set.q; + loadfile`:registry/q/main/utils/update.q; + loadfile`:registry/q/main/utils/load.q; + loadfile`:registry/q/main/utils/get.q; + loadfile`:registry/q/main/utils/query.q + ] + +registry.q.main.utils.init:1b diff --git a/ml/registry/q/main/utils/load.q b/ml/registry/q/main/utils/load.q new file mode 100644 index 0000000..9c159e6 --- /dev/null +++ b/ml/registry/q/main/utils/load.q @@ -0,0 +1,68 @@ +// load.q - Utilities related to loading items into the registry +// Copyright (c) 2021 Kx Systems Inc +// +// @overview +// Utilities relating to object loading within the registry +// +// @category Model-Registry +// @subcategory Utilities +// +// @end + +\d .ml + +// @private +// +// @overview +// Load any code with a file extension '*.p','*.py','*.q' +// that has been saved with a model. NB: at the moment there is +// no notion of precedence within this load process so files should +// not be relied on to be loaded in a specific order.
+// +// @todo +// Add some level of load ordering to the process +// +// @param codePath {string} The absolute path to the 'code' +// folder containing any source code +// +// @return {null} +registry.util.load.code:{[codePath] + files:key hsym`$codePath; + if[0~count key hsym`$codePath;:(::)]; + qfiles:files where files like "*.q"; + registry.util.load.q[codePath;qfiles]; + pfiles:files where files like "*.p"; + registry.util.load.p[codePath;pfiles]; + pyfiles:files where files like "*.py"; + mlops.load.py[codePath;pyfiles]; + } + +// @private +// +// @overview +// Load code with the file extension '*.q' +// +// @param codePath {string} The absolute path to the 'code' +// folder containing any source code +// @param files {symbol|symbol[]} q files which should be loadable +// +// @return {null} +registry.util.load.q:{[codePath;files] + sfiles:string files; + {system "l ",x,y}[codePath]each $[10h=type sfiles;enlist;]sfiles + } + +// @private +// +// @overview +// Load code with the file extension '*.p' +// +// @param codePath {string} The absolute path to the 'code' +// folder containing any source code +// @param files {symbol|symbol[]} Python files which should be loadable +// +// @return {null} +registry.util.load.p:{[codePath;files] + pfiles:string files; + {system "l ",x,y}[codePath]each $[10h=type pfiles;enlist;]pfiles; + } diff --git a/ml/registry/q/main/utils/misc.q b/ml/registry/q/main/utils/misc.q new file mode 100644 index 0000000..46924d4 --- /dev/null +++ b/ml/registry/q/main/utils/misc.q @@ -0,0 +1,156 @@ +// misc.q - Miscellaneous utilities for interacting with the registry +// Copyright (c) 2021 Kx Systems Inc +// +// @overview +// Miscellaneous utilities for interacting with the registry +// +// @category Model-Registry +// @subcategory Utilities +// +// @end + +\d .ml + +// @private +// +// @overview +// Protected execution of set function. If an error occurs, any created +// folders will be deleted.
+// +// @param function {fn} Function to be applied +// @param arguments {list} Arguments to be applied +// @param config {dict} Configuration information provided by the user +// +// @return {null} +registry.util.protect:{[function;arguments;config] + $[`debug in key .Q.opt .z.x; + function . arguments; + .[function;arguments;registry.util.checkDepth[config;]] + ] + } + +// @private +// +// @overview +// Check the depth of the failing model. If first model in experiment, remove +// the entire experiment, otherwise simply remove all folders associated with +// the failing model. +// +// @param config {dict} Configuration information provided by the user +// @param err {string} Error string generated when upserting to table +// +// @return {null} +registry.util.checkDepth:{[config;err] + logging.warn "'",err,"' flagged when adding new model to modelStore."; + // Check if experiment is already in modelStore + modelStoreExperiments:?[config`modelStorePath;();();`experimentName]; + $[any config[`experimentName]in distinct modelStoreExperiments; + // Yes: delete current model version as other models will be + // present within the experiment + registry.delete.model . config`folderPath`experimentName`modelName`version; + // No: delete the entire experiment + registry.delete.experiment . 
config`folderPath`experimentName + ]; + } + +// @private +// +// @overview +// Generate paths to object and modelStore +// +// @param folderPath {string|null} A folder path indicating the location +// the registry containing the model to be deleted +// or generic to remove registry in the current directory +// @param objectType {symbol} ````experiment `allModels or `modelVersion``` +// @param experimentName {string} Name of experiment +// @param modelName {string} Name of model +// @param modelVersion {long[]} Model version number (major;minor) +// @param config {dict} Configuration information provided by the user +// +// @return {dict} Paths to object and modelStore +registry.util.getObjectPaths:{[folderPath;objectType;experimentName;modelName;modelVersion;config] + paths:registry.util.getRegistryPath[folderPath;config]; + registryPath:paths`registryPath; + modelStorePath:paths`modelStorePath; + if[any experimentName ~/: (::;"");experimentName:"undefined"]; + experimentName:"",experimentName; + experimentPath:$[unnamed:experimentName in("undefined";""); + "/unnamedExperiments"; + "/namedExperiments/",experimentName + ]; + additionalFolders:$[objectType~`allModels; + modelName; + objectType~`modelVersion; + modelName,"/",registry.util.strVersion modelVersion; + unnamed&modelName~""; + string first key hsym`$registryPath,experimentPath; + "" + ]; + objectPath:hsym`$registryPath,experimentPath,"/",additionalFolders; + `objectPath`modelStorePath!(objectPath;modelStorePath) + } + +// @private +// +// @overview +// Generate path to file +// +// @param folderPath {string|null} A folder path indicating the location +// the registry containing the file to be deleted +// or generic to remove registry in the current directory +// @param experimentName {string} Name of experiment +// @param modelName {string} Name of model +// @param modelVersion {long[]} Model version number (major;minor) +// @param localFolder {symbol} Local folder code/metrics/params/config +// @param 
config {dict} Extra details on file to be located +// +// @return {#hsym} Path to file. +registry.util.getFilePath:{[folderPath;experimentName;modelName;modelVersion;localFolder;config] + cfg:registry.util.check.config[folderPath;()!()]; + registryPath:registry.util.getRegistryPath[folderPath;cfg]`registryPath; + if[any experimentName ~/: (::;"");experimentName:"undefined"]; + experimentName:"",experimentName; + experimentPath:$[unnamed:experimentName in("undefined";""); + "/unnamedExperiments"; + "/namedExperiments/",experimentName + ]; + prefix:registryPath,experimentPath,"/",modelName,"/",registry.util.strVersion[modelVersion]; + $[localFolder~`code; + hsym `$prefix,"/code/",config`codeFile; + localFolder~`metrics; + hsym `$prefix,"/metrics/","metric"; + localFolder~`params; + hsym `$prefix,"/params/",(config`paramFile),".json"; + localFolder~`config; + hsym `$prefix,"/config/",string[config`configType],".json"; + logging.error"No such local folder in model registry"] + } + +// @private +// +// @overview +// Check user specified folder path and generate corresponding registry path +// +// @param folderPath {string|null} A folder path indicating the location +// the registry containing the model to be deleted +// or generic to remove registry in the current directory +// @param config {dict} Configuration information provided by the user +// +// @return {string} Path to registry folder +registry.util.getRegistryPath:{[folderPath;config] + registry.util.check.registry[config] + } + +// @private +// +// @overview +// Parse version as a string +// +// @param version {long[]} Version number represented as a duple of +// major and minor version +// +// @return {string} Version number provided as a string +registry.util.strVersion:{[version] + if[0h=type version;version:first version]; + "."
sv string each version + } diff --git a/ml/registry/q/main/utils/path.q b/ml/registry/q/main/utils/path.q new file mode 100644 index 0000000..da3392a --- /dev/null +++ b/ml/registry/q/main/utils/path.q @@ -0,0 +1,51 @@ +// path.q - Utilities for generation of registry paths +// Copyright (c) 2021 Kx Systems Inc +// +// @overview +// Utilities for generation of registry paths +// +// @category Model-Registry +// @subcategory Utilities +// +// @end + +\d .ml + +// @private +// +// @overview +// Generate the path to the model/parameter/metric/version folder based +// on provided registry path and model information +// +// @param registryPath {string} Full/relative path to the model registry +// @param config {dict} Information relating to the model +// being saved, this includes version, experiment and model names +// @param folderType {symbol|null} Which folder is to be accessed? 'model'/ +// 'params'/'metrics', if '::' then the path to the versioned model is +// returned +// +// @return {string} The full path to the requested folder within a versioned +// model +registry.util.path.modelFolder:{[registryPath;config;folderType] + folder:$[folderType~`model; + "/model/"; + folderType~`params; + "/params/"; + folderType~`metrics; + "/metrics/"; + folderType~`code; + "/code/"; + folderType~(::); + ""; + logging.error"Unsupported folder type" + ]; + experiment:config`experimentName; + expBool:any experiment like "undefined"; + experimentType:$[expBool;"un",;]"namedExperiments/"; + if[not expBool; + experimentType:experimentType,/experiment,"/" + ]; + modelName:raze config`modelName; + modelVersion:"/",/registry.util.strVersion config`version; + registryPath,"/",experimentType,modelName,modelVersion,folder + } diff --git a/ml/registry/q/main/utils/query.q b/ml/registry/q/main/utils/query.q new file mode 100644 index 0000000..9738243 --- /dev/null +++ b/ml/registry/q/main/utils/query.q @@ -0,0 +1,37 @@ +// query.q - Utilities relating to querying the modelStore +// 
Copyright (c) 2021 Kx Systems Inc +// +// @overview +// Utilities relating to querying the modelStore +// +// @category Model-Registry +// @subcategory Utilities +// +// @end + +\d .ml + +// @private +// +// @overview +// Check user-defined keys in config and generate the correct format for +// where clauses +// +// @param config {dict} Any additional configuration needed for +// retrieving the modelStore. Can also be an empty dictionary `()!()`. +// @param whereClause {(fn;symbol;any)[]|()} List of whereClauses. Can +// initially be an empty list which will be populated below. +// Individual clauses will contain the function (like/=) to use in +// the where clause, followed by the column name as a symbol and the +// associated value to check. +// @param key2check {symbol[]} List of config keys to check +// @param function {function} `like/=` to be used in where clause +// +// @return {(fn;symbol;any)[]|()} Updated whereClause +registry.util.query.checkKey:{[config;whereClause;key2check;function] + if[any b:key2check in key config; + key2check@:where b; + whereClause,:{(x;z;y z)}[function;config]each key2check + ]; + whereClause + } diff --git a/ml/registry/q/main/utils/requirements.q b/ml/registry/q/main/utils/requirements.q new file mode 100644 index 0000000..7b00324 --- /dev/null +++ b/ml/registry/q/main/utils/requirements.q @@ -0,0 +1,80 @@ +// requirements.q - Utilities for the addition of requirements with a model +// Copyright (c) 2021 Kx Systems Inc +// +// @overview +// Utilities for the addition of requirements with a model +// +// @category Model-Registry +// @subcategory Utilities +// +// @end + +\d .ml + +// @private +// +// @overview +// Generate a requirements file using pip freeze and save to the +// model folder, this requires the user to be using a virtual environment +// as allowing ad-hoc pip freeze results in incompatible requirements due +// to on-prem files generated over time +// +// @param config {dict} Configuration provided
by the user to +// customize the experiment +// +// @return {::} +registry.util.requirements.pipfreeze:{[config] + sys:.p.import`sys; + if[(sys[`:prefix]`)~sys[`:base_prefix]`; + logging.error"Cannot execute a pip freeze when not in a virtualenv" + ]; + destPath:config[`versionPath],"/requirements.txt"; + requirements:system"pip freeze"; + hsym[`$destPath]0:requirements + } + +// @private +// +// @overview +// Generate a copy of a requirements file that a user has pointed to +// to the model folder. There are no checks made on the validity of these +// files other than that they exist, as such it is on the user to point +// to the appropriate file +// +// @param srcPath {string} Full/relative path to the requirements +// file to be copied +// @param config {dict} Configuration provided by the user to +// customize the experiment +// +// @return {null} +registry.util.requirements.copyfile:{[config] + srcPath:hsym config`requirements; + if[not srcPath~key srcPath; + logging.error"Requirements file you are attempting to copy does not exist" + ]; + srcPath:registry.util.check.osPath 1_string srcPath; + destPath:registry.util.check.osPath config[`versionPath],"/requirements.txt"; + copyCommand:$[.z.o like"w*";"copy /b/y";"cp"]; + system sv[" ";(copyCommand;srcPath;destPath)] + } + +// @private +// +// @overview +// Add a user defined list of lists as a requirements file, this includes +// checking that the requirements provided are all strings but does not +// validate that they are valid pip requirements, it is assumed that the +// user will supply compliant values for this +// +// @param config {dict} Configuration provided by the user to +// customize the experiment +// +// @return {null} +registry.util.requirements.list:{[config] + requirements:config`requirements; + if[not all 10h=type each requirements; + logging.error"User provided list of arguments must be a list of strings" + ]; + destPath:config[`versionPath],"/requirements.txt"; + 
hsym[`$destPath]0:requirements + } diff --git a/ml/registry/q/main/utils/search.q b/ml/registry/q/main/utils/search.q new file mode 100644 index 0000000..0bd4b63 --- /dev/null +++ b/ml/registry/q/main/utils/search.q @@ -0,0 +1,125 @@ +// search.q - Search the model store for specific information +// Copyright (c) 2021 Kx Systems Inc +// +// @overview +// Utilities for searching the modelStore +// +// @category Model-Registry +// @subcategory Utilities +// +// @end + +\d .ml + +// @private +// +// @overview +// Select the model most closely matching the user's request for a model based +// on associated experiment, name of the model and version of the model. +// +// - If no experiment, model name or version are provided, retrieve the most +// recently added model. +// - If no experiment or version are provided but a model name is, retrieve the +// highest versioned experiment associated with that name +// - If no experiment is provided but a model name and version are, retrieve the +// version of the model requested +// - If no model name or version are provided but an experiment name is, +// retrieve the most recent model added to that experiment +// +// @param experimentName {string|null} The name of the experiment to retrieve +// from +// @param modelName {string|null} The name of the model to retrieve +// @param version {long[]|null} The version of the model to retrieve +// initialising the experiment +// +// @return {table} A table containing the entry matching the user provided +// information +registry.util.search.model:{[experimentName;modelName;version;config] + infoKeys:`experimentName`modelName`modelType`version; + fmax:{xx:x where x[;0]=max x[;0];first xx where xx[;1]=max xx[;1]}; + modelNoVersion:( + (like;`modelName;modelName); + ({{x~y}[x y]'[y]};fmax;`version) + ); + modelVersion:( + (like;`modelName;modelName); + ({y~/:x};`version;version) + ); + whereCond:$[modelName~(::); + enlist(=;`registrationTime;(max;`registrationTime)); +
$[version~(::);modelNoVersion;modelVersion] + ]; + whereCond,:$[any experimentName ~/: (::;"");();(like;`experimentName;experimentName)]; + ?[config`modelStorePath;whereCond;0b;infoKeys!infoKeys] + } + +// @private +// +// @overview +// Retrieve and increment the version of the model being saved within the +// registry based on previous versions +// +// @param config {dict} Configuration provided by the user to customize +// the experiment +// +// @return {dict} The updated version number to be used when persisting +// the model +registry.util.search.version:{[config] + if[`version in key config;:config]; + whereClause:( + (like;`experimentName;config`experimentName); + (like;`modelName;config`modelName) + ); + if[(`majorVersion in key config)&config[`major]; + logging.error"cant select majorVersion while incrementing version" + ]; + if[`majorVersion in key config; + mV:floor config`majorVersion; + whereClause,:(=;(`version;::;0);mV) + ]; + fmax:{xx:x where x[;0]=max x[;0];first xx where xx[;1]=max xx[;1]}; + selectClause:(fmax;`version); + currentVersion:?[config`modelStorePath;whereClause;();selectClause]; + if[(not count currentVersion)&`majorVersion in key config; + logging.error"cant select majorVersion if no models present" + ]; + if[not count currentVersion;:config,enlist[`version]!enlist (1;0)]; + if[bool:config[`major]; + :config,enlist[`version]!enlist(currentVersion[0]+1;0) + ]; + config,enlist[`version]!enlist currentVersion+(0;1) + } + +// @private +// +// @overview +// Search for a particular parameter in the metrics table +// +// @todo +// Add additional search parameters other than just metric name? 
+// +// @param metricTab {table} The table of metric information +// @param param {dict} Search parameters for config table +// +// @return {table} Metric table +registry.util.search.metric:{[metricTab;param] + if[not 99=type param;:metricTab]; + if[`metricName in key param; + metricName:enlist param`metricName; + metricTab:{[tab;metricName]?[tab;enlist(in;`metricName;metricName);0b;()] + }[metricTab;metricName] + ]; + metricTab + } + +// @private +// +// @overview +// Search for the parameter JSON file +// +// @param paramPath {string} The path to the param JSON file +// +// @return {table|dict|string} The information within the parameter JSON file +registry.util.search.params:{[paramPath] + .j.k raze read0 hsym`$paramPath + } diff --git a/ml/registry/q/main/utils/set.q b/ml/registry/q/main/utils/set.q new file mode 100644 index 0000000..a741cd4 --- /dev/null +++ b/ml/registry/q/main/utils/set.q @@ -0,0 +1,694 @@ +// set.q - Utilities relating to setting objects in the registry +// Copyright (c) 2021 Kx Systems Inc +// +// @overview +// Registry object setting utilities +// +// @category Model-Registry +// @subcategory Utilities +// +// @end + +\d .ml + +// @private +// +// @overview +// Add q/Python objects to the registry so that they can be retrieved and +// applied to new data. In the current iteration there is an assumption of +// complete independence for the q functions/files i.e. q function/workflows +// explicitly don't use Python to make it easier to store and +// reintroduce models +// +// @param model {any} `(<|dict|fn|proj)` The model to be saved to the registry.
+// @param modelType {string} The type of model that is being saved, namely +// "q"|"sklearn"|"keras" +// @param config {dict} Any additional configuration needed for +// initialising the experiment +// +// @return {dict} Updated config dictionary containing relevant +// registry paths +registry.util.set.model:{[model;modelType;config] + load config`modelStorePath; + config:registry.util.create.modelFolders[model;modelType] config; + registry.set.object[modelType;config`registryPath;model;config]; + if[not count key hsym `$config`codePath; + registry.util.set.code[config`code;config`registryPath;config] + ]; + registry.util.set.version[modelType;config]; + registry.util.set.requirements config; + registry.set.modelConfig[model;modelType] config; + registry.set.modelStore config; + if[`data in key config; + registry.set.monitorConfig[model;modelType;config`data;config] + ]; + if[`supervise in key config; + registry.set.superviseConfig[model;config] + ]; + load config`modelStorePath; + whereClause: enlist (&;(&;(~\:;`version;config[`version]);(~\:;`modelName;config[`modelName])); + (~\:;`experimentName;config[`experimentName])); + columns:enlist `uniqueID; + config[`uniqueID]:first ?[config`modelStorePath;whereClause;0b;columns!columns]`uniqueID; + config + } + +// @private +// +// @overview +// General function for setting a file within the +// ML Registry such that it can be deployed in the same way as the +// functions added to the registry from within process +// +// @param extension {string} The name to be associated with the new copied file +// @param registryPath {string} Full/relative path to the model registry +// @param model {string|#hsym|symbol} Full/relative path to the model being copied. 
+// @param modelInfo {dict} Information relating to the model which is +// being saved, this includes version, experiment and model names +// +// @return {null} +registry.util.set.file:{[extension;registryPath;model;modelInfo] + model:hsym $[10h=type model;`$;]model; + modelPath:registry.util.path.modelFolder[registryPath;modelInfo;`model]; + registry.util.copy.file[model;hsym`$modelPath,extension] + } + +// @private +// +// @overview +// General function for setting a directory within the +// ML Registry such that it can be deployed in the same way as the +// functions added to the registry from within process +// +// @param extension {string} The name to be associated with the new copied dir +// @param registryPath {string} Full/relative path to the model registry +// @param model {string|#hsym|symbol} Full/relative path to the model being copied. +// @param modelInfo {dict} Information relating to the model which is +// being saved, this includes version, experiment and model names +// +// @return {null} +registry.util.set.dir:{[extension;registryPath;model;modelInfo] + model:hsym $[10h=type model;`$;]model; + modelPath:registry.util.path.modelFolder[registryPath;modelInfo;`model]; + registry.util.copy.dir[model;hsym`$modelPath,extension] + } + +// @private +// +// @overview +// Set a file associated with a model to the ML Registry such that it +// can be deployed in the same way as the functions added to the registry +// from within process +// +// @param typ {symbol} Type of model being saved +// @param func {fn} Function used to load model +// @param mdl {symbol} Model name +// @param args {any} Arguments required for `registry.util.set.file`. +// +// @return {null} +registry.util.set.modelFile:{[typ;func;mdl;args] + err:"Could not retrieve ",string[typ]," model"; + mdl:@[func;mdl;{[x;y]'x," with error: ",y}err]; + mlops.check[typ][mdl;1b]; + registry.util.set.file . 
args + } + +// @private +// +// @overview +// Set a dir associated with a model to the ML Registry such that it +// can be deployed in the same way as the functions added to the registry +// from within process +// +// @param typ {symbol} Type of model being saved +// @param func {fn} Function used to load model +// @param mdl {symbol} Model name +// @param args {any} Arguments required for `registry.util.set.dir`. +// +// @return {null} +registry.util.set.modelDir:{[typ;func;mdl;args] + err:"Could not retrieve ",string[typ]," model"; + mdl:@[func;mdl;{[x;y]'x," with error: ",y}err]; + registry.util.set.dir . args + } + +// @private +// +// @overview +// Set a file associated with a q binary file to the +// ML Registry such that it can be deployed in the same way as the +// functions added to the registry from within process +// +// @todo +// Check that the file can be retrieved and is a suitable q object to be +// introduced to the system +// +// @param extension {string} The name to be associated with the new copied file +// @param registryPath {string} Full/relative path to the model registry +// @param model {string|#hsym|symbol} Full/relative path to the model being copied +// @param modelInfo {dict} Information relating to the model which is +// being saved, this includes version, experiment and model names +// +// @return {null} +registry.util.set.qFile:{[extension;registryPath;model;modelInfo] + func:get; + mdl:hsym$[10h=type model;`$;]model; + args:(extension;registryPath;model;modelInfo); + registry.util.set.modelFile[`q;func;mdl;args] + }["/mdl"] + +// @private +// +// @overview +// Set a file associated with a Python object model to the +// ML Registry such that it can be deployed in the same way as the +// functions added to the registry from within process +// +// @todo +// Check that the file can be unpickled using joblib such that it can be +// introduced to the system appropriately +// +// @param extension {string} The name to be associated with
the new copied file +// @param registryPath {string} Full/relative path to the model registry +// @param model {string|#hsym|symbol} Full/relative path to the model being copied +// @param modelInfo {dict} Information relating to the model which is +// being saved, this includes version, experiment and model names +// +// @return {null} +registry.util.set.pythonFile:{[extension;registryPath;model;modelInfo] + func:.p.import[`joblib]`:load; + mdl:$[10h=type model;;1 _ string hsym@]model; + args:(extension;registryPath;model;modelInfo); + registry.util.set.modelFile[`python;func;pydstr mdl;args] + }["/mdl.pkl"] + +// @private +// +// @overview +// Set a file associated with a Sklearn pickled fit model to the +// ML Registry such that it can be deployed in the same way as the +// functions added to the registry from within process +// +// @todo +// Check that the file being added to the registry complies with +// being a fit sklearn model +// +// @param extension {string} The name to be associated with the new copied file +// @param registryPath {string} Full/relative path to the model registry +// @param model {string|#hsym|symbol} Full/relative path to the model being copied +// @param modelInfo {dict} Information relating to the model which is +// being saved, this includes version, experiment and model names +// +// @return {null} +registry.util.set.sklearnFile:{[extension;registryPath;model;modelInfo] + func:.p.import[`joblib]`:load; + mdl:$[10h=type model;;1_string hsym@]model; + args:(extension;registryPath;model;modelInfo); + registry.util.set.modelFile[`sklearn;func;pydstr mdl;args] + }["/mdl.pkl"] + +// @private +// +// @overview +// Set a file associated with an XGBoost pickled fit model to the +// ML Registry such that it can be deployed in the same way as the +// functions added to the registry from within process +// +// @todo +// Check that the file being added to the registry complies with +// being a fit xgboost model +// +// @param extension 
{string} The name to be associated with the new copied file +// @param registryPath {string} Full/relative path to the model registry +// @param model {string|#hsym|symbol} Full/relative path to the model being copied +// @param modelInfo {dict} Information relating to the model which is +// being saved, this includes version, experiment and model names +// +// @return {null} +registry.util.set.xgboostFile:{[extension;registryPath;model;modelInfo] + func:.p.import[`joblib]`:load; + mdl:$[10h=type model;;1_string hsym@]model; + args:(extension;registryPath;model;modelInfo); + registry.util.set.modelFile[`xgboost;func;pydstr mdl;args] + }["/mdl.pkl"] + +// @private +// +// @overview +// Set a file associated with a Keras model (.h5) to the +// ML Registry such that it can be deployed in the same way as the +// functions added to the registry from within process +// +// @todo +// Check that the file being added to the registry complies with +// being a fit keras model +// +// @param extension {string} The name to be associated with the new copied file +// @param registryPath {string} Full/relative path to the model registry +// @param model {string|#hsym|symbol} Full/relative path to the model being copied +// @param modelInfo {dict} Information relating to the model which is +// being saved, this includes version, experiment and model names +// +// @return {null} +registry.util.set.kerasFile:{[extension;registryPath;model;modelInfo] + func:.p.import[`keras.models]`:load_model; + mdl:$[10h=type model;;1_string hsym@]model; + args:(extension;registryPath;model;modelInfo); + registry.util.set.modelFile[`keras;func;pydstr mdl;args] + }["/mdl.h5"] + +// @private +// +// @overview +// Set a file associated with a PyTorch jit saved model to the +// ML Registry such that it can be deployed +// +// @todo +// Check that the file can be retrieved and is a suitable torch object to be +// introduced to the system +// +// @param extension {string} The name to be associated with 
the new copied file +// @param registryPath {string} Full/relative path to the model registry +// @param model {string|#hsym|symbol} Full/relative path to the model being copied +// @param modelInfo {dict} Information relating to the model which is +// being saved, this includes version, experiment and model names +// +// @return {null} +registry.util.set.torchFile:{[extension;registryPath;model;modelInfo] + torch:@[.p.import; + `torch; + {[x]logging.error"PyTorch not installed, cannot add PyTorch models to registry"} + ]; + modelPath:$[10h=type model;;1_string hsym@]model; + mdl:@[torch[`:jit.load]; + modelPath; + {[torch;modelPath;err] + @[torch[`:load]; + pydstr modelPath; + {[x] + logging.error"Torch models saved must be loadable using 'torch.load'|'torch.jit.load'" + }] + }[torch;modelPath] + ]; + mlops.check.torch[mdl;1b]; + registry.util.set.file[extension;registryPath;model;modelInfo] + }["/mdl.pt"] + + +// @private +// +// @overview +// Set a file associated with a PySpark pipeline saved model to the +// ML Registry such that it can be deployed +// +// @todo +// NOTE CAN ONLY LOAD FIT PIPELINES NOT MODELS +// +// @param extension {string} The name to be associated with the new copied file +// @param registryPath {string} Full/relative path to the model registry +// @param model {string|#hsym|symbol} Full/relative path to the model being copied +// @param modelInfo {dict} Information relating to the model which is +// being saved, this includes version, experiment and model names +// +// @return {null} +registry.util.set.pysparkFile:{[extension;registryPath;model;modelInfo] + pipe:.p.import[`pyspark.ml]`:PipelineModel; + func:pipe`:load; + mdl:$[10h=type model;;1_string hsym@]model; + args:(extension;registryPath;model;modelInfo); + registry.util.set.modelDir[`pyspark;func;mdl;args] + }["/mdl.model"] + +// @private +// +// @overview +// Protected writing of a model +// +// @param writer {fn} Function to write model to disk +// @param path {string} Path to 
write model to +// +// @return {null} +registry.util.set.write:{[writer;path] if[not count key hsym `$path;writer path]} + +// @private +// +// @overview +// Set an underlying Sklearn embedPy object within the ML Registry +// +// @param registryPath {string} Full/relative path to the model registry +// @param model {any} `(<|foreign)` The sklearn model to be saved as a pickle file. +// @param modelInfo {dict} Information relating to the model which is +// being saved, this includes version, experiment and model names +// +// @return {null} +registry.util.set.sklearnModel:{[registryPath;model;modelInfo] + $[99h=type model; + [ + {[registryPath;modelInfo;sym;model] + mlops.check.sklearn[model;0b]; + modelPath:registry.util.path.modelFolder[registryPath;modelInfo;`model]; + registry.util.set.write[{.p.import[`joblib][`:dump][x;y]}[model];modelPath,string[sym],"/mdl.pkl"]; + }[registryPath;modelInfo]'[key model;value model]; + ]; + [ + mlops.check.sklearn[model;0b]; + modelPath:registry.util.path.modelFolder[registryPath;modelInfo;`model]; + registry.util.set.write[{.p.import[`joblib][`:dump][x;pydstr y]}[model];modelPath,"/mdl.pkl"]; + ] + ] + } + +// @private +// +// @overview +// Set an underlying XGBoost embedPy object within the ML Registry +// +// @param registryPath {string} Full/relative path to the model registry +// @param model {any} `(<|foreign)` The xgboost model to be saved as a pickle file. 
+// @param modelInfo {dict} Information relating to the model which is +// being saved, this includes version, experiment and model names +// +// @return {null} +registry.util.set.xgboostModel:{[registryPath;model;modelInfo] + $[99h=type model; + [ + {[registryPath;modelInfo;sym;model] + mlops.check.xgboost[model;0b]; + modelPath:registry.util.path.modelFolder[registryPath;modelInfo;`model]; + registry.util.set.write[{.p.import[`joblib][`:dump][x;y]}[model];modelPath,"/",string[sym],"/mdl.pkl"]; + }[registryPath;modelInfo]'[key model;value model]; + ]; + [ + mlops.check.xgboost[model;0b]; + modelPath:registry.util.path.modelFolder[registryPath;modelInfo;`model]; + registry.util.set.write[{.p.import[`joblib][`:dump][x;pydstr y]}[model];modelPath,"/mdl.pkl"]; + ] + ] + } + + +// @private +// +// @overview +// Set an underlying PySpark embedPy object within the ML Registry +// +// @param registryPath {string} Full/relative path to the model registry +// @param model {any} `(<|foreign)` The PySpark model to be saved to disk. 
+// @param modelInfo {dict} Information relating to the model which is +// being saved, this includes version, experiment and model names +// +// @return {null} +registry.util.set.pysparkModel:{[registryPath;model;modelInfo] + $[99h=type model; + [{[registryPath;modelInfo;sym;model] + mlops.check.pyspark[model;0b]; + modelPath:registry.util.path.modelFolder[registryPath;modelInfo;`model]; + if[not (model[`:__class__][`:__name__]`) like "*Pipeline*"; + pipe:.p.import[`pyspark.ml]`:Pipeline; + model:pipe[`stages pykw enlist model`][`:fit][()]; + ]; + registry.util.set.write[model[`:save];modelPath,"/",string[sym],"/mdl.model"]; + }[registryPath;modelInfo]'[key model;value model]; + ]; + [ + mlops.check.pyspark[model;0b]; + modelPath:registry.util.path.modelFolder[registryPath;modelInfo;`model]; + if[not (model[`:__class__][`:__name__]`) like "*Pipeline*"; + pipe:.p.import[`pyspark.ml]`:Pipeline; + model:pipe[`stages pykw enlist model`][`:fit][()]; + ]; + registry.util.set.write[{x[pydstr y]}[model[`:save]];modelPath,"/mdl.model"]; + ] + ] + } + + +// @private +// +// @overview +// Set a q function object within the ML Registry +// +// @param registryPath {string} Full/relative path to the model registry +// @param model {any} `(dict|fn|proj)` The model to be saved +// @param modelInfo {dict} Information relating to the model that is +// to be saved, this includes version, experiment and model names +// +// @return {null} +registry.util.set.qModel:{[registryPath;model;modelInfo] + func1:{[registryPath;modelInfo;model] + mlops.check.q[model;0b]; + modelPath:registry.util.path.modelFolder[registryPath;modelInfo;`model]; + registry.util.set.write[{hsym[`$y]set x}[model];modelPath,"mdl"]; + }[registryPath;modelInfo]; + + func2:{[registryPath;modelInfo;sym;model] + mlops.check.q[model;0b]; + modelPath:registry.util.path.modelFolder[registryPath;modelInfo;`model]; + registry.util.set.write[{hsym[`$y]set x}[model];modelPath,string[sym],"/mdl"]; + }[registryPath;modelInfo]; 
+ + $[(99h=type[model]); + $[not(`predict in key model)|(`modelInfo in key model);func2'[key model;value model];func1[model]]; + func1[model] + ]; + } + +// @private +// +// @overview +// Set a Python embedPy object within the ML Registry +// +// @param registryPath {string} Full/relative path to the model registry +// @param model {any} `(<|foreign)` The Python object to be saved as a pickle file. +// @param modelInfo {dict} Information relating to the model which is +// being saved, this includes version, experiment and model names +// +// @return {null} +registry.util.set.pythonModel:{[registryPath;model;modelInfo] + $[99h=type model; + [ + {[registryPath;modelInfo;sym;model] + mlops.check.python[model;0b]; + modelPath:registry.util.path.modelFolder[registryPath;modelInfo;`model]; + registry.util.set.write[{.p.import[`joblib][`:dump][x;y]}[model];modelPath,"/",string[sym],"/mdl.pkl"]; + }[registryPath;modelInfo]'[key model;value model]; + ]; + [ + mlops.check.python[model;0b]; + modelPath:registry.util.path.modelFolder[registryPath;modelInfo;`model]; + registry.util.set.write[{.p.import[`joblib][`:dump][x;y]}[model];modelPath,"/mdl.pkl"]; + ] + ] + } + +// @private +// +// @overview +// Set a Keras model within the ML Registry +// +// @param registryPath {string} Full/relative path to the model registry +// @param model {any} `(<|foreign)` The Keras object to be saved as an h5 file. 
+// @param modelInfo {dict} Information relating to the model which is +// being saved, this includes version, experiment and model names +// +// @return {null} +registry.util.set.kerasModel:{[registryPath;model;modelInfo] + $[99h=type model; + [{[registryPath;modelInfo;sym;model] + mlops.check.keras[model;0b]; + modelPath:registry.util.path.modelFolder[registryPath;modelInfo;`model]; + registry.util.set.write[model[`:save];modelPath,"/",string[sym],"/mdl.h5"]; + }[registryPath;modelInfo]'[key model;value model]; + ]; + [ + mlops.check.keras[model;0b]; + modelPath:registry.util.path.modelFolder[registryPath;modelInfo;`model]; + registry.util.set.write[model[`:save];modelPath,"/mdl.h5"]; + ] + ] + } + +// @private +// +// @overview +// Set a Torch model within the ML Registry +// +// @param registryPath {string} Full/relative path to the model registry +// @param model {any} `(<|foreign)` The Torch object to be saved as a pt file. +// @param modelInfo {dict} Information relating to the model which is +// being saved, this includes version, experiment and model names +// +// @return {null} +registry.util.set.torchModel:{[registryPath;model;modelInfo] + $[99h=type model; + [{[registryPath;modelInfo;sym;model] + mlops.check.torch[model;0b]; + modelPath:registry.util.path.modelFolder[registryPath;modelInfo;`model]; + registry.util.set.write[{.p.import[`torch][`:save][x;y]}[model];modelPath,"/",string[sym],"/mdl.pt"]; + }[registryPath;modelInfo]'[key model;value model]; + ]; + [ + mlops.check.torch[model;0b]; + modelPath:registry.util.path.modelFolder[registryPath;modelInfo;`model]; + registry.util.set.write[{.p.import[`torch][`:save][x;pydstr y]}[model];modelPath,"/mdl.pt"]; + ] + ] + } + +// @private +// +// @overview +// Add a code file with extension '*.p','*.py','*.q' to a specific +// model such that the code can be loaded on retrieval of the model. +// This is required to facilitate comprehensive support for PyTorch +// models being persisted and usable. 
+// +// @param files {symbol|symbol[]} The absolute/relative path to a file or +// list of files that are to be added to the registry associated with a +// model. These must be '*.p', '*.q' or '*.py' +// @param registryPath {string} Full/relative path to the model registry +// @param modelInfo {dict} Information relating to the model which is +// being saved, this includes version, experiment and model names +// +// @return {null} +registry.util.set.code:{[files;registryPath;modelInfo] + if[(11h<>abs type files)|all null files;:(::)]; + files:registry.util.check.code[files]; + if[0~count files;:(::)]; + codePath:registry.util.path.modelFolder[registryPath;modelInfo;`code]; + registry.util.copy.file[;hsym`$codePath]each files; + } + +// @private +// +// @overview +// Add a requirements file associated with a model to the versioned model +// folder. This can be either a 'pip freeze' of the current environment, +// a user supplied list of requirements which can be pip installed or the +// path to an existing requirements.txt file which can be used. +// +// 'pip freeze' is only suitable for users running within venvs and as such +// is not supported within environments which are not inferred to be venvs as +// running within 'well' established environments can cause irreconcilable +// requirements. 
+// +// @param folderPath {string|null} A folder path indicating the location +// the registry containing the model which is to be populated with a requirements +// file +// @param config Configuration provided by the user to +// customize the experiment +// +// @return {null} +registry.util.set.requirements:{[config] + requirement:config[`requirements]; + $[0b~requirement; + :(::); + 1b~requirement; + registry.util.requirements.pipfreeze config; + -11h=type requirement; + registry.util.requirements.copyfile config; + 0h=type requirement; + registry.util.requirements.list config; + logging.error"requirements config key must be a boolean, symbol or list of strings" + ]; + } + +// @private +// +// @overview +// Set the parameters to a json file +// +// @param paramPath {string} The path to the parameter file +// @param params {dict|table|string} The parameters to save to file +// +// @return {null} +registry.util.set.params:{[paramPath;params] + (hsym `$paramPath) 0: enlist .j.j params + } + +// @private +// +// @overview +// Set a metric associated with a model to a supported cloud +// vendor or on-prem. This is a wrapper function used to facilitate +// protected execution. 
+// +// @param storage {symbol} Type of registry storage - local or cloud +// @param experimentName {string|null} The name of an experiment +// @param modelName {string|null} The name of the model to be retrieved +// @param version {long[]|null} The specific version of a named model +// @param metricName {string} The name of the metric to be persisted +// @param metricValue {float} The value of the metric to be persisted +// +// @return {null} +registry.util.set.metric:{[storage;experimentName;modelName;version;config;metricName;metricValue] + modelDetails:registry.util.search.model[experimentName;modelName;version;config]; + if[not count modelDetails; + logging.error"No model meeting your provided conditions was available" + ]; + // Construct the path to metric folder containing the config to be updated + config,:flip modelDetails; + metricPath:registry.util.path.modelFolder[config`registryPath;config;`metrics]; + fileExists:`metric in key hsym`$metricPath; + if[not fileExists;registry.util.create.modelMetric[metricPath]]; + registry.set.modelMetric[metricName;metricValue;metricPath]; + if[`local<>storage; + registry.cloud.update.publish config + ]; + } + +// @private +// +// @overview +// Set JSON file for specified object +// +// @param config {dict} Information relating to the model +// being saved, this includes version, experiment and model names +// @param jsonTyp {symbol} `registry.util.create` function to call +// @param jsonStr {string} Name of JSON file +// @param args {any} Arguments to apply to `registry.util.create` function. +// +// @return {null} +registry.util.set.json:{[config;jsonTyp;jsonStr;args] + jsonConfig:registry.util.create[jsonTyp]. 
args; + if[not(::)~jsonConfig; + (hsym `$config[`versionPath],"/config/",jsonStr,".json") 0: enlist .j.j jsonConfig + ]; + } + +// @private +// +// @overview +// Set Python library and q/Python language versions with persisted models +// +// @param modelType {string} User provided model type defining whether the model was "q"/"sklearn" etc +// @param config {dict} Information relating to the model +// being saved, this includes version, experiment and model names along with +// path information relating to the saved location of model +// +// @return {null} +registry.util.set.version:{[modelType;config] + // Information about Python/q version used in model saving + versionFile:config[`versionPath],"/.version.info"; + + // Define q version used when persisting the model + versionInfo:enlist[`q_version]!enlist "Version: ",string[.z.K]," | Release Date: ",string .z.k; + + // Add model type to version info + versionInfo,:enlist[`model_type]!enlist modelType; + + // If the model isn't q save version of Python used + if[`q<>`$modelType;versionInfo,:enlist[`python_version]!enlist .p.import[`sys;`:version]`]; + + // Information about the Python library version used in the process of generating the model + if[(`$modelType) in `sklearn`keras`torch`xgboost`pyspark; + versionInfo,:enlist[`python_library_version]!enlist pygetver modelType; + ]; + // Do not allow the same model to be written with different versions of q/Python + $[count key hsym `$versionFile; + $[(.j.k raze read0 hsym `$versionFile)~.j.k raze .j.j versionInfo; + (hsym `$versionFile) 0: enlist .j.j versionInfo; + '"Error writing same model with two environments see .version.info file" + ]; + (hsym `$versionFile) 0: enlist .j.j versionInfo]; + } diff --git a/ml/registry/q/main/utils/update.q b/ml/registry/q/main/utils/update.q new file mode 100644 index 0000000..b525f3d --- /dev/null +++ b/ml/registry/q/main/utils/update.q @@ -0,0 +1,63 @@ +// update.q - Functionality for updating information related to the registry +// Copyright (c) 2021 Kx 
Systems Inc +// +// @overview +// Utilities for updating registry information +// +// @category Model-Registry +// @subcategory Utilities +// +// @end + +\d .ml + +// @private +// +// @overview +// Update the configuration supplied by a user so as to include +// all relevant information for the saving of a model and its +// associated configuration +// +// @param modelName {string} The name to be associated with the model +// @param modelType {string} The type of model that is being saved, for example +// "q"|"sklearn"|"keras" +// @param config {dict} Configuration information provided by the user +// +// @return {dict} Default configuration defined by +// '.ml.registry.config.model' updated with user supplied information +registry.util.update.config:{[modelName;modelType;config] + config:registry.config.model,config; + config[`experimentName]:registry.util.check.experiment config`experimentName; + config,:`modelName`modelType!(modelName;modelType); + registry.util.check.modelType config; + config,:`registrationTime`uniqueID!(enlist .z.p;-1?0Ng); + registry.util.search.version config + } + +// @private +// +// @overview +// Check folder paths, storage type and configuration and prepare the +// ML Registry for publishing to the appropriate vendor +// +// @param folderPath {string|null} A folder path indicating the location +// of the registry or generic null if in the current directory +// @param experimentName {string|null} The name of an experiment from which +// to retrieve a model, if no modelName is provided the newest model +// within this experiment will be used. 
If neither modelName nor +// experimentName is defined the newest model within the +// "unnamedExperiments" section is chosen +// @param modelName {string|null} The name of the model to be retrieved +// in the case this is null, the newest model associated with the +// experiment is retrieved +// @param version {long[]|null} The specific version of a named model to retrieve +// in the case that this is null the newest model is retrieved (major;minor) +// @param config {dict|null} Configuration information provided by the user +// +// @return {dict} Updated configuration information +registry.util.update.checkPrep:{[folderPath;experimentName;modelName;version;config] + config,:registry.util.check.config[folderPath;config]; + if[`local<>storage:config`storage;storage:`cloud]; + prepParams:(folderPath;experimentName;modelName;version;config); + registry[storage;`update;`prep]. prepParams + } diff --git a/ml/registry/tests/registry.t b/ml/registry/tests/registry.t new file mode 100644 index 0000000..61db20e --- /dev/null +++ b/ml/registry/tests/registry.t @@ -0,0 +1,672 @@ + +\l ml.q +.ml.loadfile`:util/init.q +.ml.loadfile`:clust/init.q +.ml.loadfile`:mlops/init.q +.ml.loadfile`:registry/init.q + +registry:"RegistryTests"; +registryDict:enlist[`local]!enlist registry; +@[system"mkdir -p ",;registry;{}]; + +/ Have set .ml.registry.location to expected location +/ Default registry to be root local directory +.ml.registry.location~enlist[`local]!enlist"." 
+ +/ Create a new registry at a supplied location +/ A folder KX_ML_REGISTRY to exist in the 'RegistryTests' folder +.ml.registry.new.registry[registryDict;::]; +`KX_ML_REGISTRY~first key`:RegistryTests + +/ Populate a modelStore within a new registry +/ The modelStore to be a table +.ml.registry.get.modelStore[registry;::]; +98h=type modelStore + +/ The modelStore to contain the expected columns +~[cols modelStore;`registrationTime`experimentName`modelName`uniqueID`modelType`version`description] + +/ Add a new experiment to the registry +/ To be able to list a new experiment within the ML Registry +.ml.registry.new.experiment[registry;"ExperimentTest";::]; +`ExperimentTest in key`:RegistryTests/KX_ML_REGISTRY/namedExperiments + +/ The newly created experiment to contain only one file +1=count key`:RegistryTests/KX_ML_REGISTRY/namedExperiments/ExperimentTest + +/ Set q functions generated within a q process to a registry + +registry:"RegistryTests"; +@[system"mkdir -p ",;registry;{}]; +basicName:"basic-model"; +basicModel1:{x} ;basicModel2:{x+1}; +basicModel3:{x+2};basicModel4:{x+3}; +basicModel5:{x+4};basicModel6:{x+5}; +major:enlist[`major]!enlist 1b; +majorVersion:enlist[`majorVersion]!enlist 1; + +/ Add q models to a registry and be appropriately versioned +/ In sequence major/minor versioning of q models is appropriately applied +.ml.registry.set.model[registry;::;basicModel1;basicName;"q";enlist[`description]!enlist"test description"]; +.ml.registry.set.model[registry;::;basicModel2;basicName;"q";::]; +.ml.registry.set.model[registry;::;basicModel3;basicName;"q";::]; +.ml.registry.set.model[registry;::;basicModel4;basicName;"q";major]; +.ml.registry.set.model[registry;::;basicModel5;basicName;"q";::]; +.ml.registry.set.model[registry;::;basicModel6;basicName;"q";majorVersion]; +~[exec version from modelStore;(1 0; 1 1; 1 2; 2 0; 2 1; 1 3)] + +/ Add q models to a registry and be appropriately versioned +/ Guid type returned from set model 
+g:.ml.registry.set.model[registry;::;basicModel1;"testName";"q";::]; +type[g]=-2h + +/ Set Sklearn models generated within a q process to a registry + +registry:"RegistryTests"; +@[system"mkdir -p ",;registry;{}]; +major:enlist[`major]!enlist 1b; +blobs :.p.import[`sklearn.datasets][`:make_blobs;<]; +skdata:blobs[`n_samples pykw 1000;`centers pykw 2;`random_state pykw 500]; +skAP :.p.import[`sklearn.cluster][`:AffinityPropagation]; +sklearnAP1:skAP[`damping pykw 0.8][`:fit]skdata 0; +sklearnAP2:skAP[`damping pykw 0.5][`:fit]skdata 0; +expName:"cluster"; + +/ Add models generated using Pythons Scikit-Learn package to the registry +/ That two major versioned Scikit-learn models are added to the registry +.ml.registry.set.model[registry;expName;sklearnAP1;"skAPmodel";"sklearn";::]; +.ml.registry.set.model[registry;"cluster";sklearnAP2;"skAPmodel";"sklearn";major]; +~[exec version from modelStore where modelName like "skAPmodel";(1 0;2 0)] + +/ Set and retrieve a XGBoost model from the registry and use it for prediction + +.ml.registry.delete.registry[::;::]; +X: ([]1000?1f;1000?1f;1000?1f); +y: 1000?2; +Xtest: ([]1000?1f;1000?1f;1000?1f); +clf: .p.import[`xgboost;`:XGBClassifier][][`:fit][flip X`x`x1`x2; y]; + +/ Set and retrieve a model and use it for prediction +/ Predictions to be the same after retrieving the model from the registry +.ml.registry.set.model[::;::;clf;"xgb";"xgboost";::]; +mdl: .ml.registry.get.model[::;::;"xgb";::]; +predict: .ml.registry.get.predict[::;::;"xgb";::]; +predict[Xtest]~clf[`:predict][flip Xtest`x`x1`x2]` + +/ Set and retrieve a pyspark model from the registry and use it for prediction + +.ml.registry.delete.registry[::;::]; +PySpark:.p.import`pyspark; +V:.p.import[`pyspark.ml.linalg]`:Vectors; +LR:.p.import[`pyspark.ml.classification]`:LogisticRegression; +spark:PySpark[`:sql.SparkSession.builder][`:getOrCreate][]; +training : spark[`:createDataFrame][((1.0; V[`:dense][(0.0; 1.1; 0.1)]`);(0.0; V[`:dense][(2.0; 1.0; -1.0)]`);(0.0; 
V[`:dense][(2.0; 1.3; 1.0)]`);(1.0; V[`:dense][(0.0; 1.2; -0.5)]`));$[.pykx.loaded;.pykx.topy `label`features;("label"; "features")]]; +lr:LR[`maxIter pykw 10;`regParam pykw 0.01]; +model:lr[`:fit][training]; +xtest:([]10?1f;10?1f;10?1f); + +/ Set and retrieve a model and use it for prediction +/ Predictions to be the same after retrieving the model from the registry +.ml.registry.set.model[::;::;model;"pysp";"pyspark";::]; +mdl: .ml.registry.get.model[::;::;"pysp";::]; +predict: .ml.registry.get.predict[::;::;"pysp";::]; +a:predict[xtest]; +all(all all(a in (0.0;1.0));10=count a) + +/ Set q functions generated using the ML-Toolkit to a registry + +registry:"RegistryTests"; +@[system"mkdir -p ",;registry;{}]; +axis:enlist[`axis]!enlist 1b; +blobs :.p.import[`sklearn.datasets][`:make_blobs;<]; +skdata:blobs[`n_samples pykw 1000;`centers pykw 2;`random_state pykw 500]; +qAP1:.ml.clust.ap.fit[flip skdata 0;`nege2dist;0.8;min;::]; +qAP2:.ml.clust.ap.fit[flip skdata 0;`nege2dist;0.5;min;::]; + +/ Add q Affinity propagation models associated with the ML-Toolkit to the registry +/ That two minor versioned q Affinity propagation models are added to the registry +.ml.registry.set.model[registry;"cluster";qAP1;"qAPmodel";"q";axis]; +.ml.registry.set.model[registry;"cluster";qAP2;"qAPmodel";"q";axis]; +~[exec version from modelStore where modelName like "qAPmodel";(1 0; 1 1)] + +/ Add basic q model to a subexperiment within the registry +/ model to be added to subexperiment +.ml.registry.set.model[registry;"exp/subexp";{x};"subExModel";"q";::]; +~[exec version from modelStore where modelName like "subExModel";enlist 1 0] + +/ Retrieve Models from the Registry based on different conditions + +registry:"RegistryTests"; +@[system"mkdir -p ",;registry;{}]; +basicName:"basic-model"; +skName:"skAPmodel"; + +/ Retrieve previously generated models from the registry +/ To retrieve the highest versioned 'basic-model' 
+basicModel5~.ml.registry.get.model[registry;::;basicName;::]`model + +/ To retrieve version 1.1 of 'basic-model' +basicModel2~.ml.registry.get.model[registry;::;basicName;(1 1)]`model + +/ To retrieve the most recently added model to the registry +qModel:.ml.registry.get.model[registry;::;::;::]; +qModelInfo:qModel[`modelInfo;`registry;`modelInformation;`modelName`version]; +~[qModelInfo;("subExModel";1 0f)] + +/ To retrieve version 1.0 of the 'skAPmodel' +skModel:.ml.registry.get.model[registry;"cluster";skName;1 0]; +skModelInfo:skModel[`modelInfo;`registry;`modelInformation;`modelName`version]; +~[skModelInfo;("skAPmodel";1 0f)] + +/ Set and retrieve PyTorch models from a registry + +registry:"RegistryTests"; +@[system"mkdir -p ",;registry;{}]; +torchName:"torchModel"; +system"l examples/code/torch/torch.p"; +system"l examples/code/torch/torch.q"; +torchData :flip value flip ([]100?1f;asc 100?1f;100?10); +torchTarget:100?5; +mdl:.p.get[`classifier][count first torchData;200]; +torchModel:.torch.fitModel[torchData;torchTarget;mdl]; +torchCode :enlist[`code]!enlist `:examples/code/torch/torch.p; + +/ Add and retrieve Torch models from a registry +/ The model to be added and retrieved appropriately +.ml.registry.set.model[registry;::;torchModel;torchName;"torch";torchCode]; +.ml.registry.set.model[registry;::;torchModel;torchName;"torch";torchCode]; +getTorchModel:.ml.registry.get.model[registry;::;torchName;1 0]; +torchModelInfo:getTorchModel[`modelInfo;`registry;`modelInformation;`modelName`version]; +~[torchModelInfo;(torchName;1 0f)] + +/ The prediction model when retrieved and invoked to return an appropriate type +getTorchPredict:.ml.registry.get.predict[registry;::;torchName;1 0]; +type[getTorchPredict torchData]in 8 9h + + +/Skipping any tests that use online analytics as functionality is not required + +/Update functions can be retrieved from the registry + +registry:"RegistryTests"; +@[system"mkdir -p ",;registry;{}]; +X:100 2#200?1f; +/yReg:100?1f; 
+yClass:100?0b; +/online1:.ml.online.clust.sequentialKMeans.fit[flip X;`e2dist;3;::;::]; +/online2:.ml.online.sgd.linearRegression.fit[X;yReg;1b;::]; +sgdClass:.p.import[`sklearn.linear_model][`:SGDClassifier]; +sgdModel:sgdClass[pykwargs `max_iter`tol!(1000;0.003)][`:fit] . (X;yClass); + +/ should Add and retrieve update functions from the registry +/ expect The q clustering model to be added and retrieved appropriately +/.ml.registry.set.model[registry;::;online1;"onlineCluster";"q";::]; +/mdl1:.ml.registry.get.update[registry;::;"onlineCluster";::;0b]; +/(99h;`modelInfo`predict`update)~(type;key)@\:mdl1 flip X + +/ expect The q Linear Regression model to be added and retrieved appropriately +/.ml.registry.set.model[registry;::;online2;"onlineRegression";"q";::]; +/mdl2:.ml.registry.get.update[registry;::;"onlineRegression";::;1b]; +/(99h;`modelInfo`predict`update`updateSecure)~(type;key)@\:mdl2[X;yReg] + +/ expect An sklearn model to be added and retrieved appropriately +.ml.registry.set.model[registry;::;sgdModel;"SklearnSGD";"sklearn";::]; +mdl3:.ml.registry.get.update[registry;::;"SklearnSGD";::;1b]; +105h~type mdl3[X;yClass] + +/ expect The models retrieved models once used to be suitable for registry setting +/.ml.registry.set.model[registry;::;mdl1 flip X ;"onlineCluster";"q";::]; +/.ml.registry.set.model[registry;::;mdl2[X;yReg] ;"onlineRegression";"q";::]; +.ml.registry.set.model[registry;::;mdl3[X;yClass];"SklearnSGD";"sklearn";::]; +modelNames:("SklearnSGD"); +modelTypes:1b; +models:.ml.registry.get.update[registry;::;;::;][modelNames;modelTypes]; +(105h)~type each models + + +/ Users can set non-unique models in different experiments + +registry:"RegistryTests"; +@[system"mkdir -p ",;registry;{}]; +qName:"qmodel"; +exp1Name:"experiment1"; +exp2Name:"experiment2"; + +/ Delete experiment +.ml.registry.delete.experiment[registry]each(exp1Name;exp2Name); + +/ allow multiple models to be added which have the same name +/ multiple models to be added to the 
registry in different experiments +.ml.registry.set.model[registry;exp1Name;{x};qName;"q";::]; +.ml.registry.set.model[registry;exp2Name;{x+1};qName;"q";::]; +store:.ml.registry.get.modelStore[registry;::]; +not(~/)exec experimentName from store where modelName like qName + +/ be able to retrieve models from different experiments that are named equivalently +/ multiple models to be retrieved with the same name +models:(.ml.registry.get.model[registry;exp1Name;qName;::][`modelInfo;`registry;`modelInformation;`modelName];.ml.registry.get.model[registry;exp2Name;qName;::][`modelInfo;`registry;`modelInformation;`modelName]); +all models~\:"qmodel" + +/ Prediction function wrappers can handle various data formats as input + +registry:"RegistryTests"; +@[system"mkdir -p ",;registry;{}]; +qName:"qAPmodel"; +skName:"skAPmodel"; +expName:"cluster"; +dsetNew:2 10#20?1f; +clustDict:`col1`col2!dsetNew; +clustTab:flip clustDict; + +/ Ensure that matrix, dictionary and tabular data can be supplied to Python/q registry models +/ A q model to be invoked correctly with dict/tab/matrix input +qPredict:.ml.registry.get.predict[registry;expName;qName;1 0]; +return:qPredict@/:(dsetNew;clustDict;clustTab); +all raze(7h=type@/:;not 1_differ::)@\:return + +/ An Sklearn model to be invoked correctly with tab/matrix input +skPredict:.ml.registry.get.predict[registry;expName;skName;1 0]; +return:skPredict@/:(flip dsetNew;clustTab); +all raze(7h=type@/:;not 1_differ::)@\:return + +/ Models can be added to and retrieved from a registry from disk + +registry:"RegistryTests"; +@[system"mkdir -p ",;registry;{}]; +getLatest:{[] + mlModel:.ml.registry.get.model[registry;::;::;::]; + mlModel[`modelInfo;`registry;`modelInformation;`modelName`version] + }; + +/ Ensure models can be added and retrieved based on name from disk +/ A q model to be saved and retrieved based on file +model:`:examples/models/qModel; +.ml.registry.set.model[registry;::;model;"qmdl";"q";::]; +~[getLatest[];("qmdl";1 0f)] + +/ 
A py model to be saved and retrieved based on file +model:`:examples/models/pythonModel.pkl; +.ml.registry.set.model[registry;::;model;"pymdl";"python";::]; +~[getLatest[];("pymdl";1 0f)] + +/ A keras model to be saved and retrieved based on file +model:`:examples/models/kerasModel.h5; +.ml.registry.set.model[registry;::;model;"kmdl";"keras";::]; +~[getLatest[];("kmdl";1 0f)] + +/ A sklearn model to be saved and retrieved based on file +model:`:examples/models/sklearnModel.pkl; +.ml.registry.set.model[registry;::;model;"smdl";"sklearn";::]; +~[getLatest[];("smdl";1 0f)] + +/ A PyTorch model to be saved, retrieved and run based on a file +model:`:examples/models/torchModel.pt; +.ml.registry.set.model[registry;::;model;"ptmdl";"torch";::]; +~[getLatest[];("ptmdl";1 0f)] + +/ Configuration information related to a model's characteristics can be added to a registry + +registry:"RegistryTests"; +@[system"mkdir -p ",;registry;{}]; +readBasicJson:{[metric] + cfg:`$":RegistryTests/KX_ML_REGISTRY/unnamedExperiments/basic-model/3.0/config/modelInfo.json"; + info:.j.k raze read0 cfg; + info[`monitoring;metric;`values] + }; +readSklearnJson:{[metric] + skConfig:`:RegistryTests/KX_ML_REGISTRY/namedExperiments/cluster/skAPmodel/2.0/config/modelInfo.json; + info:.j.k raze read0 skConfig; + info[`monitoring;metric;`values] + }; +configData:([] 0N,0W,0N,til 50); +majorData:`major`data!(1b;configData); +basicName:"basic-model"; +skName:"skAPmodel"; +expName:"cluster"; +skRequire:`:RegistryTests/KX_ML_REGISTRY/namedExperiments/cluster/skAPmodel/2.0/requirements.txt; +requirements:("numpy";"pandas==2.0.0";"scikit-learn>=1.0.0"); +configDict:`requirements`data`supervise!(requirements;configData;1b); + +/ Populate latency/null/infinite/schema configuration when setting a model +/ Model latency information to be persisted with a model +.ml.registry.set.model[registry;::;{x};basicName;"q";majorData]; +~[key readBasicJson`latency;`avg`std] + +/ The saved schema to contain the appropriate 
data +~[readBasicJson`schema;enlist[`x]!enlist enlist "j"] + +/ The null replacement to contain only reference to appropriate schema +~[key readBasicJson`nulls;enlist[`x]] + +/ The infinity replace functionality to have appropriate keys +~[key readBasicJson`infinity;`negInfReplace`posInfReplace] + +/ Populate latency/null/infinite/schema configuration after a model has been added to registry +/ Model latency information to be persisted with the newest Scikit-Learn AP model +.ml.registry.update.config[registry;expName;skName;::;configDict]; +~[key readSklearnJson`latency;`avg`std] + +/ The saved schema to contain the appropriate data +~[readSklearnJson`schema;enlist[`x]!enlist enlist "j"] + +/ The null replacement to contain only reference to appropriate schema +~[key readSklearnJson`nulls;enlist[`x]] + +/ The infinity replace functionality to have appropriate keys +~[key readSklearnJson`infinity;`negInfReplace`posInfReplace] + +/ Python requirements needed for execution of a model can be associated with a given model + +registry:"RegistryTests"; +@[system"mkdir -p ",;registry;{}]; +reqName:"requireModel"; +requirements:("numpy";"pandas";"scikit-learn"); +reqFile:enlist[`requirements]!enlist `$"registry/tests/requirements.txt"; +reqList:enlist[`requirements]!enlist requirements; +readRequire:{[version] + read0 hsym`$"RegistryTests/KX_ML_REGISTRY/unnamedExperiments/requireModel/", + version,"/requirements.txt" + }; + +/ Associate Python requirements with a model +/ Requirements to be added based on reference to a known requirements file location +.ml.registry.set.model[registry;::;{x};reqName;"q";reqFile]; +saved:readRequire "1.0"; +~[saved;requirements] + +/ A list of requirements to be saved with a model +.ml.registry.set.model[registry;::;{x};reqName;"q";reqList]; +saved:readRequire "1.1"; +~[saved;requirements] + +/ Parameter information can be added to and retrieved from the Model registry + +registry:"RegistryTests"; +@[system"mkdir -p ",;registry;{}]; 
+basicName:"basic-model"; +paramList:1 2 3 4f; +paramDict:`param1`param2!1 2f; + +/ Add and retrieve parameter information associated with a model +/ To retrieve dictionary parameters saved to disk +.ml.registry.set.parameters[registry;::;basicName;::;"paramFile";paramDict]; +.ml.registry.set.parameters[registry;::;basicName;::;`symParams;paramDict]; +paramList:("paramFile";"symParams"); +params:.ml.registry.get.parameters[registry;::;basicName;::;]each paramList; +all paramDict~/:params + +/ To retrieve list parameters saved to disk +.ml.registry.set.parameters[registry;::;basicName;::;"paramFile2";paramList]; +getParams:.ml.registry.get.parameters[registry;::;basicName;::;"paramFile2"]; +~[getParams;paramList] + +/ Attempts to pass paramName as an inappropriate type +err:@[.ml.registry.set.parameters[registry;::;basicName;::;;1 2];1;{x}]; +err~"ParamName must be of type string or symbol" + +/ Metrics information can be added to and retrieved from a registry + +registry:"RegistryTests"; +@[system"mkdir -p ",;registry;{}]; +basicName:"basic-model"; + +/ Add and retrieve metric information related to a model from a registry +/ To retrieve all metrics in the order they were associated with a model (1) +.ml.registry.log.metric[registry;::;basicName;::;`func1_sym;2.4]; +.ml.registry.log.metric[registry;::;basicName;::;`func1_sym;4]; +.ml.registry.log.metric[registry;::;basicName;::;`func2_sym;0.1]; +metrics:.ml.registry.get.metric[registry;::;basicName;::;::]; +all(~[exec metricValue from metrics;(2.4; 4j; 0.1)];~[type exec metricName from metrics;11h]) + +/ To retrieve all metrics in the order they were associated with a model (2) +.ml.registry.log.metric[registry;::;basicName;::;"func1_str";2.4]; +.ml.registry.log.metric[registry;::;basicName;::;"func1_str";4]; +.ml.registry.log.metric[registry;::;basicName;::;"func2_str";0.1]; +metrics:.ml.registry.get.metric[registry;::;basicName;::;::]; +all(~[exec metricValue from metrics;(2.4; 4j; 0.1; 2.4; 4j; 0.1)];~[type exec 
metricName from metrics;11h]) + +/ To retrieve all metrics in the order they were associated with a model (3) +.ml.registry.log.metric[registry;::;basicName;::;`func3;"hello"]; +.ml.registry.log.metric[registry;::;basicName;::;`func4;`world]; +.ml.registry.log.metric[registry;::;basicName;::;`func5;2021.10.05]; +metrics:.ml.registry.get.metric[registry;::;basicName;::;::]; +all(~[6_(exec metricValue from metrics);("hello"; `world; 2021.10.05)];~[type exec metricName from metrics;11h]) + +/ To retrieve only metrics related to a specific name (1) +metrics:.ml.registry.get.metric[registry;::;basicName;::;"func1_str"]; +all(~[exec metricValue from metrics;(2.4;4)];~[type exec metricName from metrics;11h]) + +/ To retrieve only metrics related to a specified name +metrics:.ml.registry.get.metric[registry;::;basicName;::;`func1_sym]; + +all(~[exec metricValue from metrics;(2.4;4)];~[type exec metricName from metrics;11h]) + +/ Delete items from a registry + +registry:"RegistryTests"; +@[system"mkdir -p ",;registry;{}]; +basicName:"basic-model"; +lsModelAll:{[file;str] + path:hsym `$"RegistryTests/KX_ML_REGISTRY/unnamedExperiments/basic-model",str; + key .Q.dd[path;file] + }; +lsModel:lsModelAll[;"/3.0/"]; +lsModel2:lsModelAll[;"/3.1/"]; + + +/system"rm -rf RegistryTests"; + + +/ Delete code associated with a model +/ Deletion of code to not be possible if code file doesnt exist +err:.[.ml.registry.delete.code;(registry;::;torchName;1 0;"test");{x}]; +err~"No such code exists at this location, unable to delete." 
+ +/ Deletion of code to be completed appropriately if the code file exists +.ml.registry.delete.code[registry;::;torchName;1 0;"torch.p"]; +path:"RegistryTests/KX_ML_REGISTRY/unnamedExperiments/torchName/1.0/code"; +show key hsym`$path; +0~count key hsym`$path + +/ Deletion of code to be completed appropriately if the code file exists using defaults +.ml.registry.delete.code[registry;::;torchName;::;"torch.p"]; +path:"RegistryTests/KX_ML_REGISTRY/unnamedExperiments/torchName/1.0/code"; +show key hsym`$path; +0~count key hsym`$path + +/ Delete information associated with a single model +/ A metric to be deleted from the metrics table with/without default +name: "xyz_123"; +num_models: count .ml.registry.get.metric[registry;::;basicName;::;::]; +.ml.registry.log.metric[registry;::;basicName;::;name;1.0]; +.ml.registry.delete.metric[registry;::;basicName;3 0;name]; +.ml.registry.log.metric[registry;::;basicName;::;name;1.0]; +.ml.registry.delete.metric[registry;::;basicName;::;name]; +~[count .ml.registry.get.metric[registry;::;basicName;::;::];num_models] + +/ The metric table to be deleted from a model +.ml.registry.delete.metrics[registry;::;basicName;3 0]; +~[lsModel`metrics;`symbol$()] + +/ The metric table to be deleted from a model defaults +name: "xyz_123"; +.ml.registry.set.model[registry;::;{x};basicName;"q";()!()]; +.ml.registry.log.metric[registry;::;basicName;::;name;1.0]; +.ml.registry.delete.metrics[registry;::;basicName;::]; +~[lsModel2`metrics;`symbol$()] + +/ Attempts to delete metrics/tables that dont exist will fail +err1:.[.ml.registry.delete.metrics;(registry;::;basicName;1 0);{x}]; +err2:.[.ml.registry.delete.metric;(registry;::;basicName;1 0;"func");{x}]; +errCode:"No metric table exists at this location, unable to delete."; +all errCode~/:(err1;err2) + +/ A parameter associated with a model to be deleted +.ml.registry.delete.parameters[registry;::;basicName;3 0;"paramFile"]; +not `paramFile.json in lsModel`params + +/ A parameter associated 
with a model to be deleted default +.ml.registry.set.parameters[registry;::;basicName;3 1;"number";7f]; +.ml.registry.delete.parameters[registry;::;basicName;::;"number"]; +not `number.json in lsModel2`params + +/ Attempts to delete parameters that dont exist to fail +params:(registry;::;basicName;1 0;"paramFile"); +err:.[.ml.registry.delete.parameters;params;{x}]; +err~"No parameter files exists with the given name, unable to delete." + +/ Delete an experiment from the registry +/ ExperimentTest to be removed from the registry +.ml.registry.delete.experiment[registry;"ExperimentTest"]; +not `ExperimentTest in key`:RegistryTests/KX_ML_REGISTRY/namedExperiments + +/ Delete models from the registry +/ A specific versioned model to be deleted from the registry +.ml.registry.delete.model[registry;::;basicName;1 3]; +not (1 3) in exec version from modelStore where modelName like basicName + +/ An entire model to be deleted from the registry +.ml.registry.delete.model[registry;::;basicName;::]; +not count select from modelStore where modelName like basicName + +/ Delete an entire registry +/ The registry located in the RegistryTests folder to be removed +.ml.registry.delete.registry[registry;::]; +not count key`:RegistryTests + +/ Config can be updated + +registry:"RegistryTests"; +@[system"mkdir -p ",;registry;{}]; +configName:"config-model"; +.ml.registry.set.model[registry;::;{x};configName;"q";::]; +readConfig:{ + cfgPath:hsym `$"RegistryTests/KX_ML_REGISTRY/unnamedExperiments/config-model/1.0/config/modelInfo.json"; + .j.k raze read0 cfgPath + }; + +/ Update requirements information associated with a model +/ To set a boolean indicating that Python requirements are required +req:("numpy";"pandas"); +.ml.registry.update.requirements[registry;::;configName;1 0;req]; +path:"RegistryTests/KX_ML_REGISTRY/unnamedExperiments/","config-model/1.0/requirements.txt"; +req~read0 hsym`$path + +/ Update null information in monitoring config +/ To retreive monitoring data for 
nulls +.ml.registry.update.nulls[registry;::;configName;1 0;([] 100?1f)]; +d:readConfig[]; +d[`monitoring;`nulls;`values;`x] within (0.2;0.8) + +/ Update infinity information in monitoring config +/ To retreive monitoring data for infinities +.ml.registry.update.infinity[registry;::;configName;1 0;([] 100?1f)]; +d:readConfig[]; +d[`monitoring;`infinity;`values;`posInfReplace;`x] within (0.8;1.2) + +/ Update type information in config +/ To retreive data for type +.ml.registry.update.type[registry;::;configName;1 0;"sklearn"]; +d:readConfig[]; +d[`model;`type]~"sklearn" + +/ Update supervise information in config +/ To retreive data for supervised metrics +.ml.registry.update.supervise[registry;::;configName;1 0;enlist ".ml.mse"]; +d:readConfig[]; +d[`monitoring;`supervised;`values]~enlist[".ml.mse"] + +/ Update schema information in config +/ To retreive data for monitoring schema +.ml.registry.update.schema[registry;::;configName;1 0;([]til 100)]; +d:readConfig[]; +d[`monitoring;`schema;`values;`x]~enlist "j" + +/ Update latency information in config +/ To retreive data for monitoring latency +.ml.registry.update.latency[registry;::;configName;1 0;{x};([]til 100)]; +d:readConfig[]; +d[`monitoring;`latency;`values;`avg]<1f + +/ Update csi information in config +/ To retreive data for monitoring csi +.ml.registry.update.csi[registry;::;configName;1 0;([]1000?1f)]; +d:readConfig[]; +d[`monitoring;`csi;`monitor] + +/ Update psi information in config +/ To retreive data for monitoring psi +.ml.registry.update.psi[registry;::;configName;1 0;{flip value flip x};([]1000?1f)]; +d:readConfig[]; +d[`monitoring;`psi;`monitor] + +/ Language/Library version information is stored with a persisted model + +registry:"RegistryTests"; +@[system"mkdir -p ",;registry;{}]; + +/Delete registry +system"rm -rf ",registry; + +/ Persist a q model and have q version information persisted and associated with the model +/ a file to be persisted which contains version information 
+.ml.registry.set.model[registry;::;{x};"q-version-model";"q";::]; +path:hsym`$"RegistryTests/KX_ML_REGISTRY/unnamedExperiments/","q-version-model/1.0/.version.info"; +path~key path + +/ the version information to indicate it is a q model and contain only modelType and q information +versionInfo:.ml.registry.get.version[registry;::;::;::]; +all(`q_version`model_type~key versionInfo;enlist["q"]~versionInfo`model_type) + +/ Persist a Pythonic models and have version information persisted and associated with the model +/ a file to be persited which contains version information +.ml.registry.set.model[registry;::;sklearnAP1;"sklearn-version-model";"sklearn";::]; +path:hsym`$"RegistryTests/KX_ML_REGISTRY/unnamedExperiments/","sklearn-version-model/1.0/.version.info"; +path~key path + +/ the version information to indicate the q+Python versions along with sklearn library version +versionInfo:.ml.registry.get.version[registry;::;::;::]; +all( + `q_version`model_type`python_version`python_library_version~key versionInfo; + versionInfo[`model_type]~"sklearn"; + versionInfo[`python_library_version]~.ml.pygetver "sklearn"; + versionInfo[`python_version]~$[.pykx.loaded;string .p.import[`sys][`:version]`;.p.import[`sys][`:version]`] + ) + + + +/ Set and retrieve keyed models + +.ml.registry.delete.registry[::;::]; +/X:([] 100?1f;asc 100?1f); +/y: asc 100?1f; +/m:.ml.online.sgd.linearRegression.fit[X;y;1b;enlist[`maxIter]!enlist[10000]]; +m:{x+1}; +models:`EURUSD`GBPUSD!(m;m); +.ml.registry.set.model[::;::;models;"forex";"q";::]; + +/ write a keyed model in stages + +.ml.registry.set.model[::;::;models;"forexTri";"q";::]; +.ml.registry.set.model[::;::;models,enlist[`EURGBP]!enlist m;"forexTri";"q";enlist[`version]!enlist(1;0)]; + +/ try to overwrite a model + +.ml.registry.set.model[::;::;models,enlist[`EURGBP]!enlist {x};"forexTri";"q";enlist[`version]!enlist(1;0)]; + +/ should Retrieve a models +/ expect retrieve keyed model as a dictionary 
+models:.ml.registry.get.model[::;::;"forex";::]; +key[models]~`EURUSD`GBPUSD + +/ expect retrieve individual keyed model +model:.ml.registry.get.keyedmodel[::;::;"forex";::;`EURUSD]; +key[model]~`modelInfo`model + +/ expect retrieve keyed model set in stages +models:.ml.registry.get.model[::;::;"forexTri";::]; +`EURGBP`EURUSD`GBPUSD ~ asc key[models] + +/ expect model EURGBP was not over written +models:.ml.registry.get.model[::;::;"forexTri";::]; +not models[`EURGBP;`model]~{x} diff --git a/ml/registry/tests/requirements.txt b/ml/registry/tests/requirements.txt new file mode 100644 index 0000000..ad5443d --- /dev/null +++ b/ml/registry/tests/requirements.txt @@ -0,0 +1,3 @@ +numpy +pandas +scikit-learn diff --git a/ml/registry/tests/scripts/example.p b/ml/registry/tests/scripts/example.p new file mode 100644 index 0000000..c249bb9 --- /dev/null +++ b/ml/registry/tests/scripts/example.p @@ -0,0 +1 @@ +python_test=10 diff --git a/ml/registry/tests/scripts/example.q b/ml/registry/tests/scripts/example.q new file mode 100644 index 0000000..a6fded2 --- /dev/null +++ b/ml/registry/tests/scripts/example.q @@ -0,0 +1 @@ +.test.q.example:1 diff --git a/ml/registry/tests/scripts/monitorUtils.q b/ml/registry/tests/scripts/monitorUtils.q new file mode 100644 index 0000000..45e6933 --- /dev/null +++ b/ml/registry/tests/scripts/monitorUtils.q @@ -0,0 +1,20 @@ +monitorCols:`nulls`infinity`schema`latency`psi`csi`supervised; +monitorFeatureChecks:{[k;r] + all(type[r]~99h; + count[r]~7; + cols[r]~k; + value[r]~1111110b + ) + }[monitorCols] +monitorValueChecks:{[k;r] + all(type[r]~99h; + count[r]~7; + cols[r]~k; + key[r`nulls]~enlist`x; + key[r`infinity]~`negInfReplace`posInfReplace; + key[r`latency]~`avg`std; + key[r`csi]~enlist`x; + r[`schema]~enlist[`x]!enlist(),"f"; + r[`supervised]~() + ) + }[monitorCols] diff --git a/ml/registry/tests/scripts/test_torch.py b/ml/registry/tests/scripts/test_torch.py new file mode 100644 index 0000000..43d8b68 --- /dev/null +++ 
b/ml/registry/tests/scripts/test_torch.py @@ -0,0 +1,45 @@ +import numpy as np +import torch + + +class LinearNNModel(torch.nn.Module): + def __init__(self): + super(LinearNNModel, self).__init__() + self.linear = torch.nn.Linear(1, 1) # One in and one out + + def forward(self, x): + y_pred = self.linear(x) + return y_pred + + +def gen_data(): + # Example linear model modified to use y = 2x + # from https://github.com/hunkim/PyTorchZeroToAll + # X training data, y labels + x = torch.arange(1.0, 25.0).view(-1, 1) + y = torch.from_numpy(np.array([val.item() * 2 for val in x]).astype('float32')).view(-1, 1) + return x, y + + +# Define model, loss, and optimizer +model = LinearNNModel() +criterion = torch.nn.MSELoss() +optimizer = torch.optim.SGD(model.parameters(), lr=0.001) + +# Training loop +epochs = 250 +x, y = gen_data() +for _epoch in range(epochs): + # Forward pass: Compute predicted y by passing X to the model + y_pred = model(x) + + # Compute the loss + loss = criterion(y_pred, y) + + # Zero gradients, perform a backward pass, and update the weights. 
+ optimizer.zero_grad() + loss.backward() + optimizer.step() + +m = torch.jit.script(LinearNNModel()) +m.save("tests/torch.pt") diff --git a/ml/requirements.txt b/ml/requirements.txt index 0791074..06b1417 100644 --- a/ml/requirements.txt +++ b/ml/requirements.txt @@ -5,4 +5,5 @@ scikit-learn statsmodels matplotlib sobol-seq -pandas \ No newline at end of file +pandas +joblib diff --git a/ml/xval/tests/xval.t b/ml/xval/tests/xval.t index 0b00766..71ec762 100644 --- a/ml/xval/tests/xval.t +++ b/ml/xval/tests/xval.t @@ -1,3 +1,4 @@ +\S 22 \l ml.q \l util/init.q \l xval/utils.q diff --git a/scripts/link.sh b/scripts/link.sh index 2d6ce87..b169d2b 100644 --- a/scripts/link.sh +++ b/scripts/link.sh @@ -28,3 +28,4 @@ for dir in "${dirs[@]}"; do ln -s "$PWD/$dir" "$QHOME/$dir" echo "Linked $PWD/$dir to $QHOME/$dir" done + diff --git a/scripts/setup.sh b/scripts/setup.sh index 9285054..d5c9762 100644 --- a/scripts/setup.sh +++ b/scripts/setup.sh @@ -4,4 +4,4 @@ cp /home/kx/.theanorc ~ conda activate /home/kx/.conda/envs/kx export QHOME=/home/kx/.conda/envs/kx/q -export QLIC=/home/kx \ No newline at end of file +export QLIC=/home/kx