Releases: KxSystems/ml
4.1.0
ML Registry Functionality: A location for the on-prem storage and versioning of ML models, along with a common model retrieval API that allows models to be retrieved and used on kdb+ data regardless of their underlying requirements. This supports team collaboration and management oversight by centralising team work in a common storage location.
4.0.0
The release of ML toolkit 4.0 comes with several key changes, enhancements and improvements:
- Unified Codebase: Migrated other components of the ML toolkit (NLP & AutoML) into the same repository for improved code sharing and maintainability.
- PyKX Support: NLP, ML and AutoML now use PyKX if available, otherwise falling back to embedPy.
- Python Dependency Updates: Added support for Python 3.11, and removed several dependency version pins & limits to ensure compatibility and improved performance.
- Enhanced Testing & CI: Improved internal testing and continuous integration systems, ensuring better reliability for future releases. Includes automated Snyk scans for enhanced security.
- Multi-Processing Support Fix: Resolved issues with multi-processing support, providing more robust and efficient parallel processing capabilities.
- Examples Provided: Comprehensive examples and associated sample output reports are now available under examples/. These examples offer practical use cases and demonstrate the new features and improvements.
3.2.0
- Fix to issues relating to unsupported versions of scipy
- Updates to tests no longer supported by the equivalent Python functions
3.1.0
- Update to FRESH functionality to be more efficient in distributed applications
- Fix to df2tab to handle nulls appropriately in date columns
- Fix to tsPlot functionality
Addition of stats library in tgz releases
Addition of stats library to packaged release (#95)
- Addition of stats folder for .tgz releases
- Length update for FRESH functionality
3.0.1
Addition of stats library for docker image deployment
3.0.0
- Refactor coding/commenting style to be up to date with coding standards
- Addition of stats section. This includes functionality such as
  - OLS/WLS fit/predict functionality
  - Transfer of percentile/describe function from utility folder to stats folder
  - Expansion of the `.ml.describe` function to allow users more flexibility by having a user configurable json file
- Change function names to camel case. Any functions that were affected by this change are defined within `functionMapping.json`. These functions are still callable until the next release of the ML Toolkit. If the old versions are called a warning message will be sent to stdout
- Scaling and transformation preprocessing functions were amended to now contain a `fit/transform/fitTransform` key. Any functions affected by this change are defined within `functionMapping.json`. These functions are still callable until the next release of the ML Toolkit. If the old versions are called a warning message will be sent to stdout
- All functions containing a `predict/update/transform` key as output must now take `config` as the initial input, which is of type `dictionary` and has a `modelInfo` key
- The contents of FRESH's `hyperparam.txt` file were converted to a json file, `hyperparameters.json`
- The utility functions within the clustering library were moved to `clust/utils.q`
- `init.q` can now be loaded before initialization of `ml.q`
- All README files were updated to reflect that the toolkit is no longer in its BETA release stages
- Test script `filelength.t` was added to check that the length of code lines in files does not exceed 80 chars
- Tests are now run in appveyor/travis by calling `testFiles.bat`. This will be updated when any new test folder is added to the toolkit
- All tests were updated to reflect these changes
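The 80-character line-length check mentioned in the 3.0.0 notes can be illustrated with a minimal sketch. This is not the toolkit's test script, which is written for q; the function name `lines_over_limit` and the sample text are hypothetical and only show the idea of flagging over-long lines.

```python
def lines_over_limit(text, limit=80):
    """Return (line_number, length) pairs for every line longer than limit."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


# Hypothetical sample: one short line, one 81-char line, one 80-char line.
sample = "short line\n" + "x" * 81 + "\n" + "y" * 80
violations = lines_over_limit(sample)
```

Only the second line is reported here, since a line of exactly 80 characters is within the limit.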
2.0.0
What’s New:
Time series functionality:
- Addition of time series models implemented in q
  - AR, ARMA, ARIMA, SARIMA and ARCH.
- Time series feature engineering techniques (windowed and lagged feature generation).
- Data stationarity testing
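As a rough illustration of what lagged and windowed feature generation means, the sketch below builds both kinds of feature from a short series. It is written in Python for readability and does not reflect the toolkit's q API; the function names `lagged_features` and `windowed_features` and the sample data are assumptions made for this example.

```python
from statistics import mean


def lagged_features(series, lags):
    """For each time step, collect the values observed `k` steps earlier.

    Positions without enough history are filled with None.
    """
    return [
        [series[t - k] if t - k >= 0 else None for k in lags]
        for t in range(len(series))
    ]


def windowed_features(series, window, func=mean):
    """Apply `func` over a trailing window that ends at each time step."""
    out = []
    for t in range(len(series)):
        if t + 1 < window:
            out.append(None)  # not enough history for a full window
        else:
            out.append(func(series[t + 1 - window : t + 1]))
    return out


prices = [10, 12, 11, 15, 14]  # hypothetical data
lag_rows = lagged_features(prices, lags=[1, 2])
rolling_mean = windowed_features(prices, window=3)
```

Each row of `lag_rows` can be joined to the original series as extra columns, and `rolling_mean` is one example of a windowed aggregate; other aggregates can be passed through `func`.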
Graph/pipeline resources:
- Framework for the development of modularised kdb+ workflows and executable pipeline structures
Optimization:
- Implementation of the Broyden-Fletcher-Goldfarb-Shanno algorithm for function minimization
Grid Search:
- Random and pseudo-random (Sobol) parameter set generation, providing an alternative to exhaustive grid search.
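The idea behind random parameter search can be sketched as follows: instead of evaluating every point on a grid, a fixed number of parameter sets are drawn from the search space and the best one is kept. This is a minimal Python sketch, not the toolkit's implementation; the function `random_search`, the objective and the parameter names `alpha` and `beta` are hypothetical, and Sobol sequence generation is left out because it needs a dedicated generator.

```python
import random


def random_search(objective, space, n_trials, seed=0):
    """Sample `n_trials` parameter sets uniformly and keep the lowest score."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def objective(p):
    # Hypothetical loss with its minimum at alpha=0.3, beta=2.0.
    return (p["alpha"] - 0.3) ** 2 + (p["beta"] - 2.0) ** 2


space = {"alpha": (0.0, 1.0), "beta": (0.0, 5.0)}
best_params, best_score = random_search(objective, space, n_trials=200)
```

The cost is set by `n_trials` rather than by the product of the grid sizes, which is the main appeal when the search space has many dimensions.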
Clustering:
- Implementation of k-means clustering now uses early stopping
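The notes do not say which stopping criterion the toolkit uses. One common reading of early stopping in k-means is to halt as soon as the cluster centres stop moving, rather than always running a fixed number of iterations. The sketch below shows that interpretation on one-dimensional data; the function `kmeans_1d`, its `tol` parameter and the sample points are assumptions for illustration only.

```python
def kmeans_1d(points, centres, max_iter=100, tol=1e-9):
    """Lloyd's algorithm on 1-D data, stopping once centres move less than tol."""
    centres = list(centres)
    for iteration in range(1, max_iter + 1):
        # Assignment step: group each point with its nearest centre.
        clusters = [[] for _ in centres]
        for x in points:
            nearest = min(range(len(centres)), key=lambda i: abs(x - centres[i]))
            clusters[nearest].append(x)
        # Update step: move each centre to the mean of its cluster.
        new_centres = [
            sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)
        ]
        shift = max(abs(a - b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break  # early stop: the centres have converged
    return centres, iteration


points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.5]  # hypothetical data
centres, iterations = kmeans_1d(points, centres=[0.0, 5.0])
```

On this data the loop exits after two iterations instead of running all 100, because the second pass leaves the centres unchanged.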
Updates:
Clustering:
- Function calls now follow a fit / predict / update style rather than the previous combined fit+predict, allowing models to be deployed for classification on incoming data.
Initial release candidate for version 2.0.0 (update)
Additive update, including clustering updates
- Function calls now follow a fit / predict / update style rather than the previous combined fit+predict, allowing models to be deployed for classification on incoming data.
Initial release candidate for version 2.0.0
What’s New:
Time series functionality:
- Addition of time series models implemented in q
  - AR, ARMA, ARIMA, SARIMA and ARCH.
- Time series feature engineering techniques (windowed and lagged feature generation).
- Data stationarity testing
Graph/pipeline resources:
- Framework for the development of modularised kdb+ workflows and executable pipeline structures
Optimization:
- Implementation of the Broyden-Fletcher-Goldfarb-Shanno algorithm for function minimization
Grid Search:
- Random and pseudo-random (Sobol) parameter set generation, providing an alternative to exhaustive grid search.
Clustering:
- Implementation of k-means clustering now uses early stopping
Updates:
Clustering:
- Function calls now follow a fit / predict / update style rather than the previous combined fit+predict, allowing models to be deployed for classification on incoming data.