v0.5.0 (April 2018)
Enhancements:
Plotting and transforming text data
- `hyp.plot` now supports plotting text data. Simply pass a string, a list of strings, or a list of lists of strings, and the text will be transformed using a semantic model and plotted. By default, the text is transformed with a topic model (LDA) fit to a selection of Wikipedia pages.
- A new `vectorizer` argument in `hyp.plot` to specify a text vectorizer. Currently supports `CountVectorizer`, `TfidfVectorizer`, or class instances (fit or unfit) of these models.
- A new `semantic` argument in `hyp.plot` that specifies the semantic model used to transform text. Currently supports `LatentDirichletAllocation`, `NMF`, or class instances (fit or unfit) of these models.
- A new `corpus` argument in `hyp.plot` that specifies the text used to fit the semantic model. Can be `'wiki'`, `'nips'`, `'sotus'`, or a custom list of text.
- An enhanced `hyp.format_data` function that takes data in various forms (numpy array, dataframe, str, list of str, or mixed list) and returns it in a standard format (a list of numpy arrays). This function can also be used to transform text data using a semantic model.
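The text pipeline described above (vectorize the text, then project it onto a semantic model fit to a corpus) can be sketched with scikit-learn directly. This is a minimal sketch, not hypertools' implementation: it assumes the default pipeline behaves like scikit-learn's `CountVectorizer` followed by `LatentDirichletAllocation`, it takes a small custom corpus in place of `'wiki'`, and the helper name `text_to_topics` is hypothetical. Like `hyp.format_data`, it returns a list of numpy arrays.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation


def text_to_topics(texts, corpus, n_topics=5, seed=0):
    """Hypothetical helper: fit a vectorizer + LDA on `corpus`, then transform `texts`.

    `texts` may be a string, a list of strings, or a list of lists of strings.
    Returns a list of 2D arrays (one per group of documents), each of shape
    (n_documents, n_topics), holding per-document topic proportions.
    """
    # Normalize the input into a list of lists of strings.
    if isinstance(texts, str):
        texts = [[texts]]
    elif all(isinstance(t, str) for t in texts):
        texts = [list(texts)]

    # Fit both models on the corpus only (assumed analogue of `corpus=`).
    vectorizer = CountVectorizer()
    counts = vectorizer.fit_transform(corpus)
    model = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    model.fit(counts)

    # Transform each group of documents with the already-fit models.
    return [model.transform(vectorizer.transform(group)) for group in texts]
```

For example, `text_to_topics(["a cat on the mat", "markets fell"], corpus, n_topics=3)` returns a one-element list holding a 2 × 3 array whose rows are topic proportions summing to 1. Fitting on a separate corpus mirrors why a large reference corpus is used: a handful of short user strings is too little text to fit a topic model on its own.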
New algorithms
- A new clustering algorithm, HDBSCAN (thanks @lmcinnes!), e.g. `hyp.plot(data, cluster='HDBSCAN')`
- A new dimensionality reduction algorithm, UMAP (thanks @lmcinnes!), e.g. `hyp.plot(data, reduce='UMAP')`
New parameters
- A new `size` param to resize the figure, e.g. `hyp.plot(data, size=[10,8])`
- A new `ax` param to add the figure to an existing axis, e.g. `hyp.plot(data, ax=ax)`
New text examples
- A new dataset of NIPS papers (from Kaggle), e.g. `hyp.load('nips')`
- A new dataset of selected Wikipedia pages, e.g. `hyp.load('wiki')`
- A new dataset of State of the Union text from 1989-2017 (from Kaggle). Can be loaded as `hyp.load('sotus')`
API changes
- In `hyp.plot`, changed the `group` arg to `hue` (`group` is still supported for now but will be deprecated in a coming release).
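For code that builds `hyp.plot` keyword arguments programmatically, the rename can be handled in one place. The sketch below uses a hypothetical helper, `rename_group_to_hue`, which is not part of hypertools; it only renames the key and emits a `DeprecationWarning`, assuming `hue` accepts exactly what `group` did.

```python
import warnings


def rename_group_to_hue(plot_kwargs):
    """Hypothetical helper: return a copy of `plot_kwargs` with `group` renamed to `hue`."""
    kwargs = dict(plot_kwargs)  # never mutate the caller's dict
    if "group" in kwargs:
        if "hue" in kwargs:
            raise ValueError("pass either 'group' or 'hue', not both")
        warnings.warn("'group' is deprecated; use 'hue' instead",
                      DeprecationWarning, stacklevel=2)
        kwargs["hue"] = kwargs.pop("group")
    return kwargs
```

Usage would look like `hyp.plot(data, **rename_group_to_hue({"group": labels}))`; once all call sites pass `hue` directly, the helper can be dropped.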
- Removed the deprecated `describe_pca` function. Please use the more general function, `describe`.
Bugs fixed
- When using `chemtrails` in `hyp.plot`, the entire timeseries would appear for the first few seconds of an animation and then disappear.
- The legend colors did not align with the data when using the `fmt` or `color` args.
- When grouping with the `group`/`hue` arg, labels were not reshuffled.
- Fixed a bug in the `describe` function where correlations between data and reduced data would asymptote below 1.
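Why the `describe` curve should reach 1: when every component is kept, PCA is only a rotation of the centered data, so pairwise distances are preserved exactly. The numpy sketch below illustrates this, assuming the correlation is computed between pairwise distances of the original and reduced data; hypertools' own `describe` may compute it differently, and `pairwise_dists` and `distance_correlation_by_dims` are hypothetical names.

```python
import numpy as np


def pairwise_dists(x):
    """Condensed vector of pairwise Euclidean distances (upper triangle)."""
    diffs = x[:, None, :] - x[None, :, :]
    d = np.sqrt((diffs ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(x), k=1)
    return d[i, j]


def distance_correlation_by_dims(data):
    """Correlate pairwise distances of `data` with those of its PCA projection
    onto the first k components, for k = 1..n_features."""
    centered = data - data.mean(axis=0)
    # PCA via SVD: the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    original = pairwise_dists(data)
    corrs = []
    for k in range(1, data.shape[1] + 1):
        reduced = centered @ vt[:k].T
        corrs.append(np.corrcoef(original, pairwise_dists(reduced))[0, 1])
    return np.array(corrs)
```

With more observations than features, the last entry of the returned array is 1 up to floating-point error; a curve that levels off below 1 at full dimensionality signals a bug of the kind fixed here.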
NOTE: If you have been using the development version of 0.5.0, please clear your data cache (/Users/yourusername/hypertools_data).
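A minimal sketch of clearing that cache from Python with the standard library. It assumes the cache is the `hypertools_data` folder in your home directory, matching the macOS path above; `clear_hypertools_cache` is a hypothetical helper, not part of hypertools. Deleting the folder by hand works just as well.

```python
import os
import shutil


def clear_hypertools_cache(cache_dir=None):
    """Hypothetical helper: delete the hypertools data cache if it exists.

    Defaults to ~/hypertools_data (assumed location); pass `cache_dir`
    to override. Returns True if a directory was removed, else False.
    """
    if cache_dir is None:
        cache_dir = os.path.join(os.path.expanduser("~"), "hypertools_data")
    if os.path.isdir(cache_dir):
        shutil.rmtree(cache_dir)  # removes the folder and everything in it
        return True
    return False
```

After clearing, datasets loaded with `hyp.load` should be fetched again on next use.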