Load word2vec exception #1170

Merged: 9 commits, Feb 24, 2017

Changes from all commits
7 changes: 4 additions & 3 deletions gensim/models/doc2vec.py
@@ -7,7 +7,7 @@

"""
Deep learning via the distributed memory and distributed bag of words models from
-[1]_, using either hierarchical softmax or negative sampling [2]_ [3]_.
+[1]_, using either hierarchical softmax or negative sampling [2]_ [3]_. See [tutorial]_

**Make sure you have a C compiler before installing gensim, to use optimized (compiled)
doc2vec training** (70x speedup [blog]_).
@@ -34,7 +34,9 @@
.. [3] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality.
In Proceedings of NIPS, 2013.
.. [blog] Optimizing word2vec in gensim, http://radimrehurek.com/2013/09/word2vec-in-python-part-two-optimizing/
-.. [tutorial] Doc2vec in gensim tutorial, http://radimrehurek.com/2013/09/word2vec-in-python-part-two-optimizing/
+.. [tutorial] Doc2vec in gensim tutorial, https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-lee.ipynb

"""
@@ -617,7 +619,6 @@ def __init__(self, documents=None, dm_mean=None,
null_word=dm_concat, **kwargs)

self.load = call_on_class_only
-self.load_word2vec_format = call_on_class_only

if dm_mean is not None:
self.cbow_mean = dm_mean
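
For context, call_on_class_only is the guard assigned over the load methods in both constructors. A minimal sketch of the pattern, assuming the gensim.utils helper of that era (the exact error wording may differ):

def call_on_class_only(*args, **kwargs):
    # Shadows a class-level loader on the instance, so that calling e.g.
    # model.load(...) on an already-initialized model raises instead of
    # silently replacing the trained state. Loading stays a class-level call:
    # Word2Vec.load(...) or KeyedVectors.load_word2vec_format(...).
    raise AttributeError('This method should be called on a class object.')

The diff above removes the instance-level guard on load_word2vec_format; the same removal appears in word2vec.py below.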
13 changes: 7 additions & 6 deletions gensim/models/word2vec.py
@@ -35,7 +35,9 @@

The word vectors can also be instantiated from an existing file on disk in the word2vec C format as a KeyedVectors instance::

-NOTE: It is impossible to continue training the vectors loaded from the C format because the binary tree is missing.
+NOTE: It is impossible to continue training vectors loaded from the C format, because the hidden weights, vocabulary frequencies and the binary tree are missing.


>>> from gensim.models.keyedvectors import KeyedVectors
>>> word_vectors = KeyedVectors.load_word2vec_format('/tmp/vectors.txt', binary=False) # C text format
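
A similar hedged sketch for the C binary format, plus a query (the /tmp/vectors.bin path is a stand-in; most_similar is the standard KeyedVectors query method):

>>> word_vectors = KeyedVectors.load_word2vec_format('/tmp/vectors.bin', binary=True)  # C binary format
>>> word_vectors.most_similar('woman', topn=3)  # similarity queries work; continued training does not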
@@ -74,18 +76,18 @@

and so on.

-If you're finished training a model (=no more updates, only querying), then switch to the :mod:`gensim.models.KeyedVectors` instance in wv
+If you're finished training a model (=no more updates, only querying), you can do

>>> model.delete_temporary_training_data(replace_word_vectors_with_normalized=True)
>>> word_vectors = model.wv
>>> del model

to trim unneeded model memory and use (much) less RAM.
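
What remains after trimming is the query-only KeyedVectors API; a brief sketch (the words are placeholders):

>>> word_vectors.similarity('woman', 'man')  # still works after delete_temporary_training_data
>>> word_vectors.doesnt_match(['breakfast', 'cereal', 'dinner', 'lunch'])  # other query methods too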

Note that there is a :mod:`gensim.models.phrases` module which lets you automatically
detect phrases longer than one word. Using phrases, you can learn a word2vec model
where "words" are actually multiword expressions, such as `new_york_times` or `financial_crisis`:

->>> bigram_transformer = gensim.models.Phraser(gensim.models.Phrases(sentences))
+>>> bigram_transformer = gensim.models.Phrases(sentences)
>>> model = Word2Vec(bigram_transformer[sentences], size=100, ...)
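
To make the transform step concrete, a hedged sketch of applying the trained Phrases model to one tokenized sentence (whether tokens get joined depends on corpus counts and the scoring threshold):

>>> bigram_transformer[['the', 'new', 'york', 'times', 'reported']]  # e.g. ['the', 'new_york', 'times', 'reported'] if 'new york' scored above threshold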

.. [1] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR, 2013.
@@ -433,7 +435,6 @@ def __init__(
"""

self.load = call_on_class_only
-self.load_word2vec_format = call_on_class_only

if FAST_VERSION == -1:
logger.warning('Slow version of {0} is being used'.format(__name__))