Releases: epigen/unsupervised_analysis
v3.0.1 - Enable module usage using `github()` directive
- to enable module usage using the `github()` directive
- comment `global.yaml` (now requires full Snakemake installation, not minimal)
- comment
- add nodefaults to all env YAML
Full Changelog: v3.0.0...v3.0.1
v3.0.0 - Snakemake 8 compatible
Breaking change: Requires Snakemake >= v8.
Full Changelog: v2.0.0...v3.0.0
v2.0.0 - Performance improvements
Enhancements and new features
- PCA: to improve performance, `n_components` and `svd_solver` can be configured.
- Heatmap: performance improvements
- distance matrix calculation done by pdist from scipy and parallelized for observations and features
- hierarchical clustering using fastcluster
- observations can be downsampled using configuration `n_observations`
- top features can be selected by variability using configuration `n_features`
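To illustrate the heatmap changes listed above, here is a minimal Python sketch, not the workflow's actual code. The helper names `prepare_heatmap_data` and `cluster_order`, the random data, the use of variance as the variability measure, and the order of operations (downsampling before feature selection) are assumptions made for this example. Only `n_observations`, `n_features`, and the use of `pdist` come from these notes. SciPy's `linkage` stands in for `fastcluster.linkage`, which offers a compatible interface, and the parallelization is not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import pdist


def prepare_heatmap_data(data, n_observations, n_features, seed=0):
    """Hypothetical helper: downsample rows, then keep the most variable columns."""
    rng = np.random.default_rng(seed)
    if n_observations < data.shape[0]:
        # Random subset of observations (rows), kept in original order.
        rows = np.sort(rng.choice(data.shape[0], size=n_observations, replace=False))
        data = data[rows]
    if n_features < data.shape[1]:
        # Rank features (columns) by variance and keep the top n_features.
        top = np.argsort(data.var(axis=0))[::-1][:n_features]
        data = data[:, np.sort(top)]
    return data


def cluster_order(matrix, metric="euclidean", method="average"):
    """Hypothetical helper: leaf order from hierarchical clustering of the rows."""
    condensed = pdist(matrix, metric=metric)  # condensed pairwise distance vector
    # Assumption: fastcluster.linkage could be swapped in here.
    return leaves_list(linkage(condensed, method=method))


rng = np.random.default_rng(42)
data = rng.normal(size=(200, 50))
data[:, :5] *= 10  # make the first five features clearly the most variable

subset = prepare_heatmap_data(data, n_observations=40, n_features=5)
row_order = cluster_order(subset)    # ordering of observations
col_order = cluster_order(subset.T)  # ordering of features
```

Reducing the matrix before computing distances is what keeps the cost manageable, because the number of pairwise distances grows quadratically with the number of rows.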
The documentation was updated accordingly.
Bug fixes and other performance improvements are not mentioned.
Full Changelog: v1.1.0...v2.0.0
v1.1.0 - small enhancements and bug fixes
Enhancements and new features
- Additional PCA diagnostics: Visualization of the top 10 loadings per principal component using lollipop plots.
- Internal cluster index calculation optional (very compute intensive).
- Enable plotting of all features using the keyword "ALL".
- Enhance Snakemake report using labels.
- Switch from panels to solo plots.
- Switch to data.table usage for accelerated read/write in R.
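The loadings diagnostic in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the workflow's implementation. The function name `top_loadings`, the feature names, and the random data are hypothetical. "Loadings" here means the raw coefficients of each principal axis, which may differ from the scaling the workflow uses.

```python
import numpy as np


def top_loadings(data, feature_names, n_top=10):
    """Hypothetical helper: the n_top features with the largest absolute
    coefficient on each principal axis."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    result = []
    for axis in vt:
        idx = np.argsort(np.abs(axis))[::-1][:n_top]
        result.append([(feature_names[i], float(axis[i])) for i in idx])
    return result


rng = np.random.default_rng(0)
data = rng.normal(size=(100, 30))
data[:, 0] *= 20  # feature_0 dominates the variance, so it should lead PC1
names = [f"feature_{i}" for i in range(data.shape[1])]

loadings = top_loadings(data, names)
pc1 = loadings[0]  # list of (feature name, signed coefficient) for PC1
```

The sign of each coefficient is kept so that a lollipop plot can show direction as well as magnitude.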
The documentation was updated accordingly.
Bug fixes and performance improvements are not mentioned.
Full Changelog: v1.0.1...v1.1.0
v1.0.1 - update author ORCID
Full Changelog: v1.0.0...v1.0.1
v1.0.0 - unsupervised analysis now includes cluster analysis methods
enhancements
- added a config flag for 2D plot coord_fixed() option
new features
- Clustering
- Leiden algorithm
- Clustification: an ML-based clustering approach that iteratively merges clusters based on misclassification
- Clustree analysis and visualization
- Cluster Validation
- External cluster indices are determined by comparing all clustering results with all categorical metadata
- Internal cluster indices are determined for each clustering and [metadata_of_interest]
- Multiple-criteria decision-making (MCDM) using TOPSIS for ranking clustering results by internal indices
- Visualization
- all clustering results as 2D and interactive 2D & 3D plots for all available embeddings/projections.
- external cluster indices as hierarchically clustered heatmaps, aggregated in one panel.
- internal cluster indices as one heatmap with clusterings and selected metadata sorted by TOPSIS ranking from top to bottom, and with cluster indices split by type (cost/benefit functions to be minimized/maximized).
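The TOPSIS ranking mentioned above can be sketched as follows. This is a generic textbook-style TOPSIS in Python, written as an illustration and not taken from the workflow. The function name `topsis_rank`, the equal default weights, the vector normalization, and the two example indices (a silhouette-like benefit score and a Davies-Bouldin-like cost score) are all assumptions.

```python
import numpy as np


def topsis_rank(scores, benefit, weights=None):
    """Hypothetical helper: rank alternatives (rows) over criteria (columns).

    benefit[j] is True if criterion j should be maximized, False if minimized.
    Returns (closeness, order) with order listing row indices from best to worst.
    """
    scores = np.asarray(scores, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)
    n_criteria = scores.shape[1]
    if weights is None:
        weights = np.full(n_criteria, 1.0 / n_criteria)  # assumption: equal weights

    # Vector-normalize each criterion, then apply the weights.
    norms = np.linalg.norm(scores, axis=0)
    norms[norms == 0] = 1.0
    weighted = scores / norms * weights

    # Ideal and anti-ideal points depend on whether a criterion is cost or benefit.
    ideal = np.where(benefit, weighted.max(axis=0), weighted.min(axis=0))
    anti = np.where(benefit, weighted.min(axis=0), weighted.max(axis=0))

    d_ideal = np.linalg.norm(weighted - ideal, axis=1)
    d_anti = np.linalg.norm(weighted - anti, axis=1)
    denom = d_ideal + d_anti
    closeness = np.divide(d_anti, denom, out=np.zeros_like(d_anti), where=denom > 0)
    return closeness, np.argsort(-closeness)


# Three clusterings scored on two illustrative internal indices:
# column 0 is to be maximized (benefit), column 1 minimized (cost).
scores = [[0.60, 0.5],
          [0.30, 1.2],
          [0.45, 0.9]]
closeness, order = topsis_rank(scores, benefit=[True, False])
```

In this toy input the first clustering is best on both criteria and the second is worst on both, so the ranking is first, third, second.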
documentation
- add scRNA-seq analysis section to the documentation
- update the documentation accordingly (Software, Methods, Features, Examples)
- update report to include all new feature outputs
- update rulegraph
Bug fixes and performance improvements are not mentioned.
Full Changelog: v0.2.0...v1.0.0
v0.2.0 - enhancements, new features and a full example added
enhancements
- 2D metadata plots: up to 10 columns per row, coordinates are fixed on both axes, numeric color scheme blue to red with midpoint 0 in grey
new features
- 2D feature plots: specify features of interest, whose values from the data will be highlighted in the 2D plots (motivated by bioinformatics: highlighting expression levels of marker genes)
- densMAP support: local density preserving regularization as an additional dimensionality reduction method
- additional PCA diagnostics:
- pairs: sequential pair-wise PCs for up to 10 PCs using scatter- and density-plots colored by metadata_of_interest
- loadings: showing the magnitude and direction of the 10 most influential features for each PC combination
- interactive 2D and 3D visualizations (self-contained HTML files) of all projections and embeddings including widgets to color by categorical and numerical metadata, respectively
- hierarchically clustered heatmaps of scaled data (z-score) with configured distance metrics and clustering methods (all combinations are computed), and annotated with metadata_of_interest
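The heatmap item above (z-scored data, every combination of distance metric and clustering method) could look roughly like this in Python. The metric and method lists, the `zscore` helper, and the random data are assumptions for illustration. The workflow's own configuration keys and plotting code are not reproduced here.

```python
from itertools import product

import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import pdist


def zscore(data):
    """Hypothetical helper: scale each feature (column) to mean 0 and std 1."""
    std = data.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (data - data.mean(axis=0)) / std


# Assumed example configuration: all combinations below are computed.
metrics = ["euclidean", "cosine"]
methods = ["average", "complete"]

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=(30, 8))
scaled = zscore(data)

orders = {}
for metric, method in product(metrics, methods):
    # Cluster observations (rows) and features (columns) separately.
    rows = leaves_list(linkage(pdist(scaled, metric=metric), method=method))
    cols = leaves_list(linkage(pdist(scaled.T, metric=metric), method=method))
    orders[(metric, method)] = (rows, cols)
```

Each entry in `orders` holds the row and column ordering for one heatmap, and metadata annotations would be reordered with the same row indices.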
documentation
- add a minimal example, using the digits dataset from sklearn, to show configuration, results, and report (.test/ folder)
- update the documentation accordingly (Software, Methods, Features, Examples)
- update report to include all new feature outputs (apart from interactive plots)
- update rulegraph
Bug fixes and performance improvements are not mentioned.
Full Changelog: v0.1.0...v0.2.0
v0.1.0 - first stable version with PCA, UMAP and 2D visualizations
skip empty metadata columns in 2D plots