v0.5.0 (April 2018)

@andrewheusser released this 18 Apr 20:29
· 169 commits to master since this release

Enhancements:

Plotting and transforming text data

  • hyp.plot now supports plotting text data. Pass a string, a list of strings, or a list of lists of strings, and the text will be transformed using a semantic model and plotted. By default, the text is transformed using a topic model (LDA) fit to a selection of Wikipedia pages.
  • A new vectorizer argument in hyp.plot to specify a text vectorizer. Currently supports CountVectorizer, TfidfVectorizer, or class instances (fit or unfit) of these models.
  • A new semantic argument in hyp.plot that specifies the semantic model used to transform text. Currently supports LatentDirichletAllocation, NMF, or class instances (fit or unfit) of these models.
  • A new corpus argument in hyp.plot that allows the user to specify text to fit a semantic model. Can be 'wiki', 'nips', 'sotus' or a custom list of text.
  • Enhanced hyp.format_data function that takes data in various forms (numpy array, dataframe, str, or list of str, or mixed list) and returns them in a standard format (a list of numpy arrays). This function can be used to transform text data using a semantic model.
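
A minimal sketch of how the text options above might be combined. The sample sentences and the helper name plot_and_format_text are made up for illustration; the argument names and values are the ones listed in these notes. hypertools is imported inside the function so that nothing is loaded or fetched until the helper is actually called.

```python
# Hypothetical sample text: a list of lists of strings, one inner list per group.
samples = [
    [
        "the cat sat on the mat",
        "dogs chase cats around the yard",
        "birds sing in the morning",
    ],
    [
        "stocks fell sharply today",
        "the central bank raised interest rates",
        "investors expect slower growth",
    ],
]


def plot_and_format_text(text):
    """Plot text with explicit model choices, then return the transformed data."""
    import hypertools as hyp  # lazy import; assumes hypertools >= 0.5.0 is installed

    # The same models named in the notes; 'wiki' is the corpus used to fit them.
    options = dict(
        vectorizer="CountVectorizer",
        semantic="LatentDirichletAllocation",
        corpus="wiki",
    )
    hyp.plot(text, ".", **options)

    # Per the notes, format_data returns a list of numpy arrays.
    return hyp.format_data(text, **options)
```

Calling plot_and_format_text(samples) should draw one point per sentence and hand back the numeric arrays for further analysis.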

New algorithms

  • A new clustering algorithm HDBSCAN (thanks @lmcinnes!) e.g. hyp.plot(data, cluster='HDBSCAN')
  • A new dimensionality reduction algorithm UMAP (thanks @lmcinnes!) e.g. hyp.plot(data, reduce='UMAP')
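
For a slightly fuller picture of the two calls above, here is a sketch on toy data. The data, the helper names, and the choice of '.' as the format string are illustrative assumptions; the hdbscan and umap-learn packages are assumed to be installed alongside hypertools.

```python
import random

random.seed(0)


def make_blob(center, n_points=50, n_dims=10):
    """Hypothetical helper: Gaussian noise around `center` as a list of rows."""
    return [
        [random.gauss(center, 1.0) for _ in range(n_dims)]
        for _ in range(n_points)
    ]


# Two well-separated blobs, 100 rows by 10 columns in total.
data = make_blob(0.0) + make_blob(5.0)


def plot_with_new_algorithms(rows):
    """Plot once with HDBSCAN cluster colors and once with a UMAP reduction."""
    import numpy as np
    import hypertools as hyp  # lazy import; assumes hypertools >= 0.5.0

    x = np.asarray(rows)
    hyp.plot(x, ".", cluster="HDBSCAN")  # color points by discovered cluster
    hyp.plot(x, ".", reduce="UMAP")      # reduce with UMAP before plotting
```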

New parameters

  • A new size param to resize figure e.g. hyp.plot(data, size=[10,8])
  • A new ax param to add figure to existing axis e.g. hyp.plot(data, ax=ax)
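
The sketch below shows one way the two parameters might be used. It assumes that size follows the matplotlib convention of width and height in inches, and that the existing axis should be a 3D axis because hypertools reduces data to three dimensions by default. The toy data and the helper name are made up for illustration.

```python
import random

random.seed(0)
# Hypothetical toy data: 100 observations with 10 features each.
data = [[random.gauss(0.0, 1.0) for _ in range(10)] for _ in range(100)]


def plot_resized_and_on_axis(rows):
    """Draw one resized figure, then draw the same data into a prepared axis."""
    import numpy as np
    import matplotlib.pyplot as plt
    import hypertools as hyp  # lazy import; assumes hypertools >= 0.5.0

    x = np.asarray(rows)

    # Resize the figure (assumed to be width and height in inches).
    hyp.plot(x, ".", size=[10, 8])

    # Create an axis first, then let hypertools draw into it.
    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    hyp.plot(x, ".", ax=ax, show=False)  # show=False defers display
    plt.show()
```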

New text examples

  • A new dataset of NIPS papers e.g. hyp.load('nips') (from Kaggle)
  • A new dataset of selected Wikipedia pages e.g. hyp.load('wiki')
  • A new dataset of State of the Union text from 1989-2017 e.g. hyp.load('sotus') (from Kaggle)

API changes

  • In hyp.plot, changed the group arg to hue (group is still supported but will be deprecated in a coming release).
  • Removed deprecated describe_pca function. Please use more general function, describe.
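
A short migration sketch for the two changes above. The toy data, labels, and helper name are illustrative assumptions; calling describe with only the data relies on its default settings.

```python
import random

random.seed(0)
# Hypothetical toy data: 100 observations with 10 features each.
data = [[random.gauss(0.0, 1.0) for _ in range(10)] for _ in range(100)]
# One label per observation, used to color the points.
labels = ["a"] * 50 + ["b"] * 50


def plot_with_hue_and_describe(rows, row_labels):
    """Use hue in place of group, and describe in place of describe_pca."""
    import numpy as np
    import hypertools as hyp  # lazy import; assumes hypertools >= 0.5.0

    x = np.asarray(rows)

    # Before: hyp.plot(x, '.', group=row_labels)
    hyp.plot(x, ".", hue=row_labels)

    # Before: hyp.describe_pca(x)
    hyp.describe(x)
```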

Bugs fixed

  • When using chemtrails in hyp.plot, the entire timeseries would appear for the first few seconds of an animation and then disappear.
  • The legend colors did not align with the data when using the fmt or color args.
  • When grouping with group/hue arg, labels were not reshuffled.
  • Fixed a bug in the describe function where correlations between data and reduced data would asymptote below 1.

NOTE: If you have been using the development version of 0.5.0, please clear your
data cache (/Users/yourusername/hypertools_data).