Add recipe for Gensim #3225

Merged
merged 35 commits into from
Sep 4, 2017
4f80272
Add recipe for Gensim
invalid-email-address Jun 30, 2017
1b25b9c
Update meta.yaml
souravsingh Jul 6, 2017
d96feb8
Add additional dependencies
souravsingh Jul 6, 2017
f88123a
Make fixes
souravsingh Jul 6, 2017
321e1bb
Update minimum NumPy and Scipy vers and fix tests
souravsingh Jul 7, 2017
62aef3d
Update meta.yaml
souravsingh Jul 7, 2017
454a9a5
Fix test
souravsingh Jul 14, 2017
b28f0e8
Fix problems
souravsingh Jul 14, 2017
bb4ecc5
Update meta.yaml
souravsingh Jul 17, 2017
7831ca8
Update meta.yaml
souravsingh Jul 17, 2017
22c08c7
Update meta.yaml
souravsingh Jul 17, 2017
74bda87
Small fix
souravsingh Jul 18, 2017
b6a7a3e
Fix test requirements
souravsingh Jul 18, 2017
118e8b4
Update meta.yaml
souravsingh Jul 18, 2017
e47dff8
Remove cython and update the about sections
souravsingh Jul 20, 2017
b38a352
Updates to about section
souravsingh Jul 20, 2017
aa07df0
Update version
souravsingh Aug 3, 2017
85fb030
Skip win32 builds
souravsingh Aug 3, 2017
f6e8a6c
Remove python-annoy from test dependency
souravsingh Aug 3, 2017
31e6c43
Update meta.yaml
souravsingh Aug 4, 2017
df8323e
Add command
souravsingh Aug 7, 2017
6c23113
Update meta.yaml
souravsingh Aug 16, 2017
c71e153
Make updates to recipe
souravsingh Aug 16, 2017
42ac3c5
Update meta.yaml
souravsingh Aug 16, 2017
fe9051d
Update meta.yaml
souravsingh Aug 17, 2017
a5395e8
Add test requirements
souravsingh Aug 17, 2017
6d2164a
Make changes to accomodate nosetests
souravsingh Aug 22, 2017
a43cba4
Pin scikit-learn version
souravsingh Aug 24, 2017
0d7ff05
Remove tensorflow from test requirements
souravsingh Aug 24, 2017
3b6612f
Skip builds entirely for OSX and Windows
souravsingh Aug 24, 2017
61854a0
Fix formatting
souravsingh Aug 24, 2017
4df111d
Remove keras from test requires
souravsingh Aug 26, 2017
45218e8
Test support for win64 platform
souravsingh Aug 26, 2017
1bb5ce8
Update meta.yaml
ocefpaf Sep 1, 2017
53a16a2
add the skip statement
ocefpaf Sep 1, 2017
68 changes: 68 additions & 0 deletions recipes/gensim/meta.yaml
@@ -0,0 +1,68 @@
{% set name = "gensim" %}
{% set version = "2.2.0" %}
{% set sha256 = "eb099de1e50447c42e168a1a99de4721923688afc71b12fe522f79687a4fbb13" %}

package:
  name: {{ name|lower }}
  version: {{ version }}

source:
  fn: {{ name }}-{{ version }}.tar.gz
  url: https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/{{ name }}-{{ version }}.tar.gz
  sha256: {{ sha256 }}

build:
  number: 0
  script: python setup.py install --single-version-externally-managed --record record.txt

requirements:
  build:
    - python
    - setuptools
    - numpy x.x
    - cython

Why is this needed? I don't think cython is a dependency of gensim.

    - scipy >=0.7.0
Member

Please use the numpy/scipy versions from the develop branch.

    - six >=1.5.0
    - smart_open >=1.2.1

  run:
    - python
    - numpy x.x
    - cython
Member

cython is probably not needed as a runtime dependency

    - scipy >=0.7.0
    - six >=1.5.0
    - smart_open >=1.2.1

test:
  requires:
    - nose
Member

It's probably sufficient initially to have

    imports:
        - gensim
        - gensim.foo

for all the standard importable modules
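
An `imports:` test boils down to checking that each listed module can be imported in the test environment. A rough Python sketch of that check follows. `check_imports` is a hypothetical helper, not part of conda-build, and the stdlib module names are stand-ins so the snippet runs without gensim installed (`gensim.foo` above is itself a placeholder).

```python
import importlib


def check_imports(module_names):
    """Try to import each module; return {name: error message} for failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            failures[name] = str(exc)
    return failures


# Stand-in module names so the sketch runs anywhere; in the recipe these
# would be gensim and whichever of its submodules should be importable.
print(check_imports(["json", "json.decoder", "no_such_module_for_demo"]))
```

An empty result means every module imported cleanly. This is a much cheaper check than the full test suite and needs none of the optional test dependencies.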

    - testfixtures
    - unittest2
    - scikit-learn
    - tensorflow >=1.1.0
    - keras >=2.0.4
    - pyemd
    - Morfessor==2.0.2a4
    - annoy

  commands:
Member

Why did you remove the test run?

Contributor Author
souravsingh Aug 21, 2017

There are a few reasons for removing the test run:

  1. TensorFlow does not support 32-bit Windows, so running the Gensim tests for win-32 on AppVeyor will fail. It is possible to skip win-32 builds, but that might defeat the purpose of full platform support.

  2. The test run times out on Travis (which is used to build the recipe for OSX) at the Doc2Vec parallel training test, because no output is produced for 10 minutes.

Do we want to add the test run to the recipe?

Member
menshikh-iv Aug 22, 2017

Yes, we definitely want to run the tests. As for win32, we can check Windows support with x64 instead.

Contributor Author

Tests are failing for sklearn_integration:

======================================================================
ERROR: testPipeline (gensim.test.test_sklearn_integration.TestSklLdaModelWrapper)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/staged-recipes/build_artefacts/gensim_1503413959047/_t_env/lib/python2.7/site-packages/gensim/test/test_sklearn_integration.py", line 153, in testPipeline
    text_lda.fit(corpus, data.target)
  File "/staged-recipes/build_artefacts/gensim_1503413959047/_t_env/lib/python2.7/site-packages/sklearn/pipeline.py", line 257, in fit
    Xt, fit_params = self._fit(X, y, **fit_params)
  File "/staged-recipes/build_artefacts/gensim_1503413959047/_t_env/lib/python2.7/site-packages/sklearn/pipeline.py", line 226, in _fit
    self.steps[step_idx] = (name, fitted_transformer)
TypeError: 'tuple' object does not support item assignment
-------------------- >> begin captured logging << --------------------

Member

This is a problem with sklearn==0.19.0; we have already fixed it in the develop branch. For now you can "hardcode" sklearn to 0.18.2 to avoid this (and remove the pin in the next gensim release).
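
The `TypeError` in the traceback is what Python raises whenever code assigns into a tuple. A minimal sketch of that failure mode, not sklearn's actual code: `fit_steps` is a hypothetical stand-in for the in-place assignment at `pipeline.py` line 226, and it is an assumption that the failing test passed its pipeline steps as a tuple rather than a list.

```python
def fit_steps(steps):
    """Hypothetical stand-in: write each 'fitted' step back in place."""
    for idx, (name, transformer) in enumerate(steps):
        fitted = transformer  # pretend the transformer was fitted here
        steps[idx] = (name, fitted)  # fails if steps is a tuple
    return steps


steps_as_list = [("features", "model"), ("classifier", "clf")]
fit_steps(steps_as_list)  # works: lists are mutable

steps_as_tuple = (("features", "model"), ("classifier", "clf"))
try:
    fit_steps(steps_as_tuple)
except TypeError as exc:
    print(exc)  # 'tuple' object does not support item assignment
```

Under that assumption, pinning to 0.18.2 avoids the in-place assignment, and passing the steps as a list would avoid the error on 0.19.0 as well.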

Contributor Author

Pinning has fixed the sklearn tests, but there is a new failure:

======================================================================
ERROR: Test Keras 'Embedding' layer returned by 'get_embedding_layer' function for a smaller version of the 20NewsGroup classification problem.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/staged-recipes/build_artefacts/gensim_1503567059210/_t_env/lib/python2.7/site-packages/gensim/test/test_keras_integration.py", line 100, in testEmbeddingLayer20NewsGroup
    data = fetch_20newsgroups(subset='train', categories=['alt.atheism', 'comp.graphics', 'sci.space'])
  File "/staged-recipes/build_artefacts/gensim_1503567059210/_t_env/lib/python2.7/site-packages/sklearn/datasets/twenty_newsgroups.py", line 225, in fetch_20newsgroups
    cache_path=cache_path)
  File "/staged-recipes/build_artefacts/gensim_1503567059210/_t_env/lib/python2.7/site-packages/sklearn/datasets/twenty_newsgroups.py", line 91, in download_20newsgroups
    opener = urlopen(URL)
  File "/staged-recipes/build_artefacts/gensim_1503567059210/_t_env/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/staged-recipes/build_artefacts/gensim_1503567059210/_t_env/lib/python2.7/urllib2.py", line 429, in open
    response = self._open(req, data)
  File "/staged-recipes/build_artefacts/gensim_1503567059210/_t_env/lib/python2.7/urllib2.py", line 447, in _open
    '_open', req)
  File "/staged-recipes/build_artefacts/gensim_1503567059210/_t_env/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/staged-recipes/build_artefacts/gensim_1503567059210/_t_env/lib/python2.7/urllib2.py", line 1228, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/staged-recipes/build_artefacts/gensim_1503567059210/_t_env/lib/python2.7/urllib2.py", line 1198, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 110] Connection timed out>
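
This failure comes from a test that downloads a dataset, which a build worker without reliable network access cannot do. One possible way to keep such a test from failing the build is to skip it when the download fails. The sketch below only illustrates that idea and is not what this PR did. It uses Python 3's `urllib.error.URLError` (the traceback above is from Python 2.7's `urllib2`), and `fetch_dataset` is a hypothetical stand-in that always times out.

```python
import unittest
from urllib.error import URLError


def fetch_dataset():
    """Hypothetical stand-in for a network download that times out."""
    raise URLError("Connection timed out")


class TestNeedsNetwork(unittest.TestCase):
    def test_with_download(self):
        try:
            data = fetch_dataset()
        except URLError as exc:
            # Report the test as skipped instead of as an error.
            self.skipTest("dataset download unavailable: %s" % exc)
        self.assertTrue(data)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNeedsNetwork)
result = unittest.TestResult()
suite.run(result)
print(len(result.skipped), len(result.errors))  # 1 0
```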

    - nosetests --exe -v gensim

about:
  home: http://github.com/RaRe-Technologies/gensim
  license: LGPL 3.0
  license_file: COPYING
  license_family: LGPL
  summary: 'A library for topic modelling and document indexing'
  description: |
piskvorky Jul 20, 2017

Please use the official tagline and description of gensim (see https://github.com/RaRe-Technologies/gensim).

Is the | some specific formatter?

Contributor Author

Yes

    Gensim is a topic modelling, document indexing and similarity retrieval
    library for Python 2.7 and Python 3.5+. It is focussed towards Natural
    Language Processing and Information retrieval.
  doc_url: http://radimrehurek.com/gensim/
  dev_url: https://github.com/RaRe-Technologies/gensim

extra:
  recipe-maintainers:
    - souravsingh
    - tmylk
    - menshikh-iv
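
For reference, the `source:` section of the recipe resolves to a concrete PyPI URL once the Jinja variables are filled in, and the `sha256` field is compared against the digest of the downloaded tarball. A small Python sketch of both steps, using plain string formatting in place of Jinja. `sha256_of` is a hypothetical helper, not part of conda-build.

```python
import hashlib

name = "gensim"
version = "2.2.0"

# Mirrors the recipe's url template: first letter of the name,
# then the name, then the versioned file name.
url = "https://pypi.io/packages/source/{0}/{1}/{1}-{2}.tar.gz".format(
    name[0], name, version
)
print(url)  # https://pypi.io/packages/source/g/gensim/gensim-2.2.0.tar.gz


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_of` on a locally downloaded copy of the tarball and comparing the result with the `sha256` value set at the top of the recipe is a quick way to confirm the checksum before pushing an update.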