Recent versions of smart-open (1.11.0 and 1.11.1) break gensim in Python 2.7 #2786

Closed
brukau opened this issue Apr 9, 2020 · 19 comments
Labels: bug (Issue described a bug), impact HIGH (Show-stopper for affected users), reach HIGH (Affects most or all Gensim users)

@brukau
Contributor

brukau commented Apr 9, 2020

Problem description

What are you trying to achieve? What is the expected result? What are you seeing instead?

Using gensim with Python2.7

Steps/code/corpus to reproduce

pip install gensim
...
python
>>> import gensim
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/brukau/tmp/gensim/local/lib/python2.7/site-packages/gensim/__init__.py", line 5, in <module>
    from gensim import parsing, corpora, matutils, interfaces, models, similarities, summarization, utils  # noqa:F401
  File "/home/brukau/tmp/gensim/local/lib/python2.7/site-packages/gensim/parsing/__init__.py", line 4, in <module>
    from .preprocessing import (remove_stopwords, strip_punctuation, strip_punctuation2,  # noqa:F401
  File "/home/brukau/tmp/gensim/local/lib/python2.7/site-packages/gensim/parsing/preprocessing.py", line 42, in <module>
    from gensim import utils
  File "/home/brukau/tmp/gensim/local/lib/python2.7/site-packages/gensim/utils.py", line 45, in <module>
    from smart_open import open
  File "/home/brukau/tmp/gensim/local/lib/python2.7/site-packages/smart_open/__init__.py", line 28, in <module>
    from .smart_open_lib import open, parse_uri, smart_open, register_compressor
  File "/home/brukau/tmp/gensim/local/lib/python2.7/site-packages/smart_open/smart_open_lib.py", line 23, in <module>
    import pathlib
ImportError: No module named pathlib
>>> 

Versions

Please provide the output of:

import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import gensim; print("gensim", gensim.__version__)
from gensim.models import word2vec;print("FAST_VERSION", word2vec.FAST_VERSION)

Linux-5.3.0-45-generic-x86_64-with-Ubuntu-19.10-eoan
('Python', '2.7.17 (default, Nov 7 2019, 10:07:09) \n[GCC 9.2.1 20191008]')
('NumPy', '1.16.1')
('SciPy', '1.2.3')
gensim import fails

@brukau brukau changed the title Recent version of smart_open breaks gensim in Python 2.7 Recent version of smart-open (1.11.1) breaks gensim in Python 2.7 Apr 9, 2020
@brukau brukau changed the title Recent version of smart-open (1.11.1) breaks gensim in Python 2.7 Recent versions of smart-open (1.11.0 and 1.11.1) breaks gensim in Python 2.7 Apr 9, 2020
@brukau brukau changed the title Recent versions of smart-open (1.11.0 and 1.11.1) breaks gensim in Python 2.7 Recent versions of smart-open (1.11.0 and 1.11.1) break gensim in Python 2.7 Apr 9, 2020
@piskvorky
Owner

piskvorky commented Apr 9, 2020

@mpenkov we'll have to pin an older version of smart_open for Gensim. Gensim supports 2.7, and 2.7 is still widely used in the industry, but the latest smart_open releases dropped 2.7 support.

@brukau @bavaria95 why do you use Python2.7?

@piskvorky
Owner

@mpenkov I also wonder how come CI didn't catch this? A critical blocking issue (for 2.7 users).
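One kind of check that could surface this class of failure is an import smoke test run on every supported interpreter. The sketch below is only an illustration, not gensim's actual CI; `can_import` is a hypothetical helper name, and `json` stands in for `gensim` so the snippet runs anywhere.

```python
import importlib


def can_import(module_name):
    """Return True if `module_name` imports cleanly on the running interpreter."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # Covers a missing transitive module such as `pathlib` on Python 2.7.
        return False
    return True


# In CI this would be called with "gensim" on each supported Python version.
print(can_import("json"))
```

Such a test only helps if the CI environment resolves dependencies fresh, so that it picks up the same new releases users get.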

@piskvorky piskvorky added bug Issue described a bug reach HIGH Affects most or all Gensim users impact HIGH Show-stopper for affected users labels Apr 9, 2020
@brukau
Contributor Author

brukau commented Apr 9, 2020

@piskvorky we are moving to Py3 only, but it seems not fast enough

@piskvorky
Owner

@brukau what is not fast enough?

@bavaria95

@piskvorky our migration to py3 :)

@piskvorky
Owner

piskvorky commented Apr 9, 2020

Alright :)

I see two options:

  1. We drop 2.7 from Gensim, and then this becomes a non-issue. Of course that won't help you right now, until you finish your migration. We plan to drop 2.7 from Gensim anyway in the next release.
  2. We make a quick release that pins an older version of smart_open, one that still supported py2.7.

As a quick fix for yourself now, please install an older version of smart_open manually. The README says smart_open==1.10.0 works with py2.7.
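The manual workaround amounts to `pip install smart_open==1.10.0`. As a minimal sketch (not part of gensim or smart_open), the hypothetical helper `supports_py27` below encodes the cutoff reported in this thread: releases before 1.11 still import on Python 2.7. It assumes plain numeric version strings and does not handle pre-release tags.

```python
def supports_py27(smart_open_version):
    """Return True for smart_open releases older than 1.11.

    Assumption taken from this thread: 1.11.0 and later fail to import on
    Python 2.7, while 1.10.0 still works.
    """
    major, minor = (int(part) for part in smart_open_version.split(".")[:2])
    return (major, minor) < (1, 11)


print(supports_py27("1.10.0"))  # True
print(supports_py27("1.11.1"))  # False
```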

@brukau
Contributor Author

brukau commented Apr 9, 2020

For us it is not a big deal; it can be worked around easily.

I am fine with both options. I suspect more people will be complaining soon, but these days it is fair to drop Py2 support.

Also, on PyPI Gensim still mentions 2.7 compatibility.

@piskvorky
Owner

Yes, dropping 2.7 from smart_open caused some unforeseen issues. In Gensim and perhaps other projects too.

Hopefully nothing critical as py2.7 is becoming outdated by the day.

@piskvorky
Owner

Another "failed install" report: https://groups.google.com/forum/#!topic/gensim/uGQnpQhx4hI

@gojomo
Collaborator

gojomo commented Apr 9, 2020

Shouldn't smart_open 1.11.0 and above declare something that generates an error at install time, at least, if installed to a Python version it doesn't support? It seems like python_requires might do the trick: https://packaging.python.org/guides/distributing-packages-using-setuptools/#python-requires

(This might not have saved gensim users, but it'd have created a more interpretable error.)
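A rough sketch of what the `python_requires` suggestion could look like. The package name and the `>=3` bound are placeholders, not smart_open's actual metadata, and `interpreter_supported` is a hypothetical helper that only mimics the kind of check an installer can make once the bound is declared.

```python
# Hypothetical keyword arguments for setuptools.setup(); values are illustrative.
SETUP_KWARGS = {
    "name": "example_package",  # placeholder, not a real project
    "python_requires": ">=3",   # recorded as Requires-Python in the package metadata
}

MIN_PYTHON = (3, 0)  # the same bound as a tuple, for the check below


def interpreter_supported(version_info):
    """Mirror the installer-side decision: is this interpreter new enough?"""
    return tuple(version_info[:2]) >= MIN_PYTHON


print(interpreter_supported((2, 7, 17)))  # False
print(interpreter_supported((3, 6, 0)))   # True
```

With the bound declared in metadata, the mismatch can be reported at install time instead of surfacing later as an `ImportError` deep inside an import chain.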

Further, dropping support for Python 2.7 seems large enough that it deserves an increment of the first, "major" release number, rather than just a "1.10.x" to "1.11.0" bump. I'd suggest smart_open just skip to major-version 3.0.0 - so the messaging is simple, "smart_open 3.0 needs Python 3".
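To illustrate why the major number matters, here is a small sketch assuming a downstream project capped its dependency at the next major version, for example a hypothetical constraint `smart_open>=1.8.1,<2`. The helper names are made up for this example, and the parsing handles plain numeric versions only.

```python
def parse(version):
    """Turn '1.11.0' into (1, 11, 0). Plain numeric versions only."""
    return tuple(int(part) for part in version.split("."))


def allowed_by_major_cap(version, minimum="1.8.1", next_major=2):
    """Return True if minimum <= version < next_major (hypothetical cap)."""
    return parse(minimum) <= parse(version) < (next_major,)


print(allowed_by_major_cap("1.10.0"))  # True
print(allowed_by_major_cap("1.11.0"))  # True: a minor bump passes the cap
print(allowed_by_major_cap("3.0.0"))   # False: a major bump is excluded
```

A breaking change shipped as 1.11.0 slips through such a cap, whereas the same change released as 3.0.0 would have been excluded.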

@piskvorky
Owner

I agree with both. The 2.7 drop was way too subtle. CC @mpenkov .

@mpenkov
Collaborator

mpenkov commented Apr 12, 2020

I've released 3.8.2. It contains the necessary pin:

(gensim-py2) sergeyich:~ misha$ pip show gensim
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
Name: gensim
Version: 3.8.2
Summary: Python framework for fast Vector Space Modelling
Home-page: http://radimrehurek.com/gensim
Author: Radim Rehurek
Author-email: [email protected]
License: LGPLv2.1
Location: /Users/misha/gensim-py2/lib/python2.7/site-packages
Requires: smart-open, scipy, numpy, six
Required-by:
(gensim-py2) sergeyich:~ misha$ pip show smart-open
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
Name: smart-open
Version: 1.10.0
Summary: Utils for streaming large files (S3, HDFS, GCS, gzip, bz2...)
Home-page: https://github.com/piskvorky/smart_open
Author: Radim Rehurek
Author-email: [email protected]
License: MIT
Location: /Users/misha/gensim-py2/lib/python2.7/site-packages
Requires: bz2file, requests, google-cloud-storage, boto3
Required-by: gensim
(gensim-py2) sergeyich:~ misha$ python -c 'import gensim'
(gensim-py2) sergeyich:~ misha$

Big thank you to @menshikh-iv for getting the CI builds to complete.

@piskvorky
Owner

piskvorky commented Apr 12, 2020

I'm getting this error for 3.8.2, on a fresh empty py2.7 environment:

[screenshot: Screen Shot 2020-04-12 at 10 20 44]

But pip install gensim==3.8.1 works fine:

[screenshot: Screen Shot 2020-04-12 at 10 24 05]

@mpenkov
Collaborator

mpenkov commented Apr 12, 2020

That's odd. Which platform @piskvorky ?

I cannot reproduce in clean environments on MacOS and Ubuntu.

Also, are you sure your network isn't being possessed? The SSL errors and the error message

Could not find index page for numpy

seem particularly suspicious.

@menshikh-iv
Contributor

menshikh-iv commented Apr 12, 2020

Hm, interesting, it works for me on ubuntu 19.10 too

(testp27) ivan@P50:~$ python --version
Python 2.7.17
(testp27) ivan@P50:~$ pip --version
pip 20.0.2 from /home/ivan/.virtualenvs/testp27/local/lib/python2.7/site-packages/pip (python 2.7)
(testp27) ivan@P50:~$ pip install gensim
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
Collecting gensim
  Downloading gensim-3.8.2.tar.gz (23.4 MB)
     |████████████████████████████████| 23.4 MB 5.0 MB/s 
Collecting numpy<=1.16.1,>=1.11.3
  Using cached numpy-1.16.1-cp27-cp27mu-manylinux1_x86_64.whl (17.0 MB)
Collecting scipy>=1.0.0
  Downloading scipy-1.2.3-cp27-cp27mu-manylinux1_x86_64.whl (24.8 MB)
     |████████████████████████████████| 24.8 MB 950 bytes/s 
Collecting six>=1.5.0
  Using cached six-1.14.0-py2.py3-none-any.whl (10 kB)
Collecting smart_open<1.11,>=1.8.1
  Using cached smart_open-1.10.0.tar.gz (99 kB)
Collecting requests
  Using cached requests-2.23.0-py2.py3-none-any.whl (58 kB)
Collecting boto3
  Downloading boto3-1.12.39-py2.py3-none-any.whl (128 kB)
     |████████████████████████████████| 128 kB 67.1 MB/s 
Collecting google-cloud-storage
  Downloading google_cloud_storage-1.27.0-py2.py3-none-any.whl (79 kB)
     |████████████████████████████████| 79 kB 14.6 MB/s 
Processing ./.cache/pip/wheels/70/27/4c/cd6a1b48a925dd8bb3640fe6948d2b7cbf88ef0858d5a84f59/bz2file-0.98-py2-none-any.whl
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
  Using cached urllib3-1.25.8-py2.py3-none-any.whl (125 kB)
Collecting certifi>=2017.4.17
  Downloading certifi-2020.4.5.1-py2.py3-none-any.whl (157 kB)
     |████████████████████████████████| 157 kB 35.2 MB/s 
Collecting chardet<4,>=3.0.2
  Using cached chardet-3.0.4-py2.py3-none-any.whl (133 kB)
Collecting idna<3,>=2.5
  Using cached idna-2.9-py2.py3-none-any.whl (58 kB)
Collecting jmespath<1.0.0,>=0.7.1
  Using cached jmespath-0.9.5-py2.py3-none-any.whl (24 kB)
Collecting botocore<1.16.0,>=1.15.39
  Using cached botocore-1.15.39-py2.py3-none-any.whl (6.1 MB)
Collecting s3transfer<0.4.0,>=0.3.0
  Using cached s3transfer-0.3.3-py2.py3-none-any.whl (69 kB)
Collecting google-auth<2.0dev,>=1.11.0
  Downloading google_auth-1.13.1-py2.py3-none-any.whl (87 kB)
     |████████████████████████████████| 87 kB 12.0 MB/s 
Collecting google-resumable-media<0.6dev,>=0.5.0
  Using cached google_resumable_media-0.5.0-py2.py3-none-any.whl (38 kB)
Collecting google-cloud-core<2.0dev,>=1.2.0
  Using cached google_cloud_core-1.3.0-py2.py3-none-any.whl (26 kB)
Collecting python-dateutil<3.0.0,>=2.1
  Using cached python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
Collecting docutils<0.16,>=0.10
  Using cached docutils-0.15.2-py2-none-any.whl (548 kB)
Collecting futures<4.0.0,>=2.2.0; python_version == "2.7"
  Using cached futures-3.3.0-py2-none-any.whl (16 kB)
Collecting cachetools<5.0,>=2.0.0
  Using cached cachetools-3.1.1-py2.py3-none-any.whl (11 kB)
Requirement already satisfied: setuptools>=40.3.0 in ./.virtualenvs/testp27/lib/python2.7/site-packages (from google-auth<2.0dev,>=1.11.0->google-cloud-storage->smart_open<1.11,>=1.8.1->gensim) (44.1.0)
Collecting pyasn1-modules>=0.2.1
  Using cached pyasn1_modules-0.2.8-py2.py3-none-any.whl (155 kB)
Collecting rsa<4.1,>=3.1.4
  Using cached rsa-4.0-py2.py3-none-any.whl (38 kB)
Collecting google-api-core<2.0.0dev,>=1.16.0
  Using cached google_api_core-1.16.0-py2.py3-none-any.whl (70 kB)
Collecting pyasn1<0.5.0,>=0.4.6
  Using cached pyasn1-0.4.8-py2.py3-none-any.whl (77 kB)
Collecting pytz
  Using cached pytz-2019.3-py2.py3-none-any.whl (509 kB)
Processing ./.cache/pip/wheels/56/af/44/f0c28e985bc224ffb90612f7cdeef432ba4fbd5d15485ab271/googleapis_common_protos-1.51.0-py2-none-any.whl
Collecting protobuf>=3.4.0
  Using cached protobuf-3.11.3-cp27-cp27mu-manylinux1_x86_64.whl (1.3 MB)
Building wheels for collected packages: gensim, smart-open
  Building wheel for gensim (setup.py) ... done
  Created wheel for gensim: filename=gensim-3.8.2-cp27-cp27mu-linux_x86_64.whl size=25041185 sha256=fbd6e810d34d3deeaacb2c30fd2505ca9294ee24ebf23b8d830a620e16237bf7
  Stored in directory: /home/ivan/.cache/pip/wheels/71/8e/8b/571604ba1f56fc578d240a1e7b9fd4bd00cdd8be46f83b01c5
  Building wheel for smart-open (setup.py) ... done
  Created wheel for smart-open: filename=smart_open-1.10.0-py2-none-any.whl size=90638 sha256=2407c32ef3431ccd7cfa5cb32a3971ff349aae0b8e433591d25140a6216404ce
  Stored in directory: /home/ivan/.cache/pip/wheels/ae/26/13/8172396f596bae35773e720fe117b708c7666f705c19b5ba90
Successfully built gensim smart-open
Installing collected packages: numpy, scipy, six, urllib3, certifi, chardet, idna, requests, jmespath, python-dateutil, docutils, botocore, futures, s3transfer, boto3, cachetools, pyasn1, pyasn1-modules, rsa, google-auth, google-resumable-media, pytz, protobuf, googleapis-common-protos, google-api-core, google-cloud-core, google-cloud-storage, bz2file, smart-open, gensim
Successfully installed boto3-1.12.39 botocore-1.15.39 bz2file-0.98 cachetools-3.1.1 certifi-2020.4.5.1 chardet-3.0.4 docutils-0.15.2 futures-3.3.0 gensim-3.8.2 google-api-core-1.16.0 google-auth-1.13.1 google-cloud-core-1.3.0 google-cloud-storage-1.27.0 google-resumable-media-0.5.0 googleapis-common-protos-1.51.0 idna-2.9 jmespath-0.9.5 numpy-1.16.1 protobuf-3.11.3 pyasn1-0.4.8 pyasn1-modules-0.2.8 python-dateutil-2.8.1 pytz-2019.3 requests-2.23.0 rsa-4.0 s3transfer-0.3.3 scipy-1.2.3 six-1.14.0 smart-open-1.10.0 urllib3-1.25.8

@menshikh-iv
Contributor

menshikh-iv commented Apr 12, 2020

I'm worried about these lines:
[image]

old libssl issue (typical problem on OSX)?

@piskvorky
Owner

piskvorky commented Apr 12, 2020

This is MBP El Capitan 10.11.6, the same one I've had for years.

Also, are you sure your network isn't being possessed?

:D Possibly – I created another fresh virtual env, and the exact same command went through now:

[screenshot: Screen Shot 2020-04-12 at 10 45 31]

So, disregard… I guess. Hopefully just some weird local network/caching issue. Packets eaten by virus.

@gojomo
Collaborator

gojomo commented Apr 13, 2020

Speaking of dependency versions & looking at these output examples:

The version constraint numpy<=1.16.1,>=1.11.3 looks fishy to me - staying on a 15-month-old version (numpy-1.16.1) of such an intensely-maintained and often-improved library as numpy seems unwise.

But, I don't see where in our source this is declared. (Maybe it's a side-effect of another dependency?)

If the aim is to pick the latest numpy supporting Python-2.7, that appears to be numpy-1.16.6 dated 2019-12-29.
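A quick sketch of the arithmetic behind this concern: under the bounds shown in the install log (`numpy<=1.16.1,>=1.11.3`), numpy 1.16.6 is rejected. The helper names are hypothetical, and the comparison only handles plain numeric versions rather than the full version-specifier rules.

```python
def parse(version):
    """Turn '1.16.6' into (1, 16, 6). Plain numeric versions only."""
    return tuple(int(part) for part in version.split("."))


def within(version, lower="1.11.3", upper="1.16.1"):
    """Return True if lower <= version <= upper (inclusive on both ends)."""
    return parse(lower) <= parse(version) <= parse(upper)


print(within("1.16.1"))  # True
print(within("1.16.6"))  # False: excluded by the upper bound
```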

@mpenkov
Collaborator

mpenkov commented May 1, 2020

@gojomo Created #2818 to deal with the numpy issue outside of this issue.

The original problem is solved (we pinned the smart_open version), so I'm closing this issue.
