All notable changes to this project will be documented in this file.
- Statistics Functions: Added `safe_percentile` and `extract_statistics_dataframe_per_label` methods, including comprehensive tests and documentation.
- Correlation Removal: Removed all correlation calculations from `get_correlated_features`, `visualize_correlations`, and `plot_correlation_dendrogram`.
- Added support for GitHub Actions in tox.
- Feature Importance Plot: Updated to comply with scikit-learn conventions, improve type handling, and sort features by importance in descending order.
- Updated package versions in requirements files.
- API Change: Changed the signature of `preprocess::plot_features_interaction`.
- Improved documentation using Claude AI and updated README.md.
- Fixed imports for `plot_correlation_dendrogram`.
- Fixed various test issues.
- Fixed documentation inconsistencies.
- Temporarily excluded kaleido from tests.
- minor changes
- GitHub Actions: Implemented as the primary CI/CD infrastructure, replacing TravisCI.
- ROC and Precision-Recall Curves: Introduced new functions to generate ROC and Precision-Recall curves with threshold annotations using Plotly figures.
- Testing Infrastructure: Switched from `nose` to `pytest-mpl` for testing plots, ensuring better compatibility and features.
- Codebase, Tests, Readme, and Documentation: Refactored using Claude 3.5 Sonnet to improve readability, maintainability, and overall quality.
- Coveralls Integration: Restored Coveralls integration to track code coverage and ensure high-quality code.
- xai::generate_decision_paths: Deprecated the method and recommended using `sklearn.tree.export_text` as a more suitable alternative.
- minor changes
- Updated Matplotlib version to accommodate the change of xticks now returning a list of a numpy's ndarray.
- Tests
- minor changes
- update packages and supported python version
- minor changes
- xai::plot_features_importance method that visualizes feature importance as a bar chart.
- a new module named `unsupervised` was added. The module contains methods that calculate and/or visualize evaluation performance of an unsupervised model.
- unsupervised::plot_cluster_cardinality method that plots the number of points per cluster as a bar chart.
- unsupervised::plot_cluster_magnitude method that plots the Total Point-to-Centroid Distance per cluster as a bar chart.
- unsupervised::plot_magnitude_vs_cardinality method that plots the cardinality vs. magnitude as a scatter plot.
- unsupervised::plot_loss_vs_cluster_number method that plots the graph which helps to find the optimum parameter `k` for KMeans.
- deprecated xai::draw_tree. Use sklearn.tree.plot_tree instead.
- requirements dependencies.
- minor changes
- code examples to README.md
- visualization_aids module was merged into the preprocess module.
- avoid FutureWarning due to sklearn version upgrade (Pass labels=[1, 0], pos_label=0, average=binary, sample_weight=None as keyword args. From version 0.25 passing these as positional arguments will result in an error).
- fixed docs
- minor changes
- visualization_aids::visualize_feature method that visualizes the distribution of a single feature.
- metrics::visualize_accuracy_grouped_by_probability method that visualizes accuracy stacked by probability.
- visualization_aids::visualize_features was deprecated.
- Ravel y_train in metrics::plot_metric_growth_per_labeled_instances if the shape is (n_sample, 1) to avoid DataConversionWarning (A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().)
- minor changes
- visualization_aids::plot_features_relationship added support for datetime features.
- visualization_aids::plot_correlation_dendrogram method that plots correlation dendrogram.
- Moved visualization_aids::draw_tree and visualization_aids::draw_dot_data to xai module.
- Redesign README.md file
- Converting bool features in visualization_aids::plot_features_relationship to avoid RuntimeWarning from numpy (Converting input from bool to <class 'numpy.uint8'> for compatibility.)
- minor changes
- visualization_aids::visualize_correlations function that plots a heatmap of features' correlations.
- visualization_aids::plot_features_relationship function that plots the shared distribution of two features.
- documentation for the new methods.
- requirements dependencies.
- visualization_aids::generate_decision_paths moved to xai::generate_decision_paths.
- visualization_aids::visualize_features changed parameter name frame to data.
- styling in the readme file.
- documentation fixes.
- tests fixes.
- minor changes
- strings::extract_significant_terms_from_subset function that extracts significant terms from a data subset, like the Elasticsearch significant_text aggregation.
- automated testing and code coverage.
- deployment to conda.
- confusion matrix image in the README converted to a GitHub link.
- strings::append_tags_to_frame added parameters max_features, lowercase and tokenizer.
- visualization_aids::plot_metric_growth_per_labeled_instances moved to metrics.
- minor changes
- minor changes
- visualization_aids::plot_metric_growth_per_labeled_instances function that plots how a given metric changes as the number of labeled instances increases.
- visualization_aids::print_decision_paths can now receive a char for indentation markings.
- metrics::plot_confusion_matrix receives more seaborn parameters for better control over plotting.
- visualization_aids::draw_dot_data function that plot Graphviz's dot data.
- package name renamed to `data_science_utils`.
- visualization_aids::print_decision_paths default indent char changed from " " to "\t".
- rewrite README.md
- revamp documentation with `read the docs` theme.
- package description and keywords
- minor changes
- visualization_aids::visualize_features added parameters: features: list of features to visualize, num_columns: number of columns in the grid, and remove_na: True to ignore NA values when plotting; False otherwise.
- visualization_aids::draw_tree changed signature to matplotlib coding style (see matplotlib Usage Guide).
- all drawing methods now return matplotlib.axes.Axes instead of matplotlib.pyplot.Figure.
- Revert import change of sklearn.tree.tree to sklearn.tree due to FutureWarning.
- visualization_aids::print_decision_paths method that converts a decision tree to a Python function.
- visualization_aids::draw_tree parameter features_names changed to feature_names.
- visualization_aids::draw_tree parameters feature_names and class_names received default value.
- Changed import of sklearn.tree.tree to sklearn.tree due to FutureWarning.
- added matplotlib testing
- removed metrics::plot_precision_recall and metrics::plot_roc_curve due to duplication with the Yellowbrick package
- changed metrics::print_confusion_matrix to plot_confusion_matrix which returns a matplotlib figure
- visualization_aids no longer requires IPython
- visualization_aids returns matplotlib figure objects
- metrics returns matplotlib figure objects
- docs
- minor changes
- added install_requires, python_requires and license in setup.py
- added changelog
- add version dependencies to requirements
- changed tox virtualenv
- changed classifiers in setup.py
- fix tests for strings
- minor changes
- handle DeprecationWarning when using visualization_aids::draw_tree
- fix tests for strings
- added method for feature visualization as visualization_aids::visualize_features
- updated syntax for dropping index in preprocess::get_correlated_features
- updated documentation for new feature visualization in visualization_aids::visualize_features
- added module preprocess
- documentation for modules metrics, preprocess, strings and visualization_aids
- minor changes to setup.py
- Initial release:
- created the metrics, strings and visualization modules
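
Migration note for the xai::generate_decision_paths deprecation listed above: a minimal sketch of the recommended replacement, `sklearn.tree.export_text`. The iris dataset, the `max_depth=2` setting, and the variable names are illustrative assumptions, not part of this package.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative data and model (assumed for this sketch only).
iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# export_text renders the fitted tree's decision rules as indented plain text.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

The returned value is an ordinary string, so it can be logged or written to a file as needed.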