Save and load methods generate KeyedVector warnings word2vec and doc2vec #1069

tmylk · 2017-01-03T12:45:54Z

Introduced in #980
CC @anmol01gulati

model.save(temp_path)
WARNING:gensim.models.word2vec:direct access to syn0norm will not be supported in future gensim releases, please use model.wv.syn0norm
WARNING:gensim.models.word2vec:direct access to syn0norm will not be supported in future gensim releases, please use model.wv.syn0norm
WARNING:gensim.models.word2vec:direct access to syn0norm will not be supported in future gensim releases, please use model.wv.syn0norm
WARNING:gensim.models.word2vec:direct access to syn0norm will not be supported in future gensim releases, please use model.wv.syn0norm

new_model = gensim.models.Word2Vec.load(temp_path)  # open the model
WARNING:gensim.models.word2vec:direct access to syn0norm will not be supported in future gensim releases, please use model.wv.syn0norm
WARNING:gensim.models.word2vec:direct access to index2word will not be supported in future gensim releases, please use model.wv.index2word

The text was updated successfully, but these errors were encountered:

tmylk · 2017-01-04T01:14:57Z

The issue is partially fixed by #1072
There is still a problem that after introducing KeyedVectors now syn0, vocab, id2word are saved TWO times. Once in word2vec and once in keyedvectors. After direct access is deprecated it will be only once.

In the mean time we can add an ignore_sub parameter to utils.save_specials where it handles recursive saves. That way syn0norm and syn0 will be ignored from word2vec pickle but saved in KeyedVectors pickle.

gojomo · 2017-01-04T21:14:01Z

Rather than silencing the warnings, via a new warning-enablement state/API, shouldn't the save/load code just not do the warned-against actions? (An API where a SaveLoad object self-reports its eligible properties would make more sense than a hide-warnings workaround.)

(Such complications are why my preference, as per comment at #833 (comment) , was to avoid backward-compatibility tricks by keeping the refactor in a parallel class and/or holding it for an API-breaking major release.)

tmylk · 2017-01-04T21:22:11Z

Current behaviour of SaveLoad API is to save everything except ignored fields. To change it to save what is explicitly given will take a while. For example even calling 'hasattr' on 'model.vocab' raises a warning. The silencing will be removed in the next API breaking release

gojomo · 2017-01-05T22:23:27Z

My hunch (which could be wrong) is that generically-useful improvements to SaveLoad would fix this with fewer lines of code than this cludgey new warnings-suppression capability. For example, SaveLoad might define a method like:

def savables(self): 
    return iteritems(self.__dict__)

Its other methods that currently use iteritems(self.__dict__) directly would use this instead. And a subclass (like Word2Vec) that has local knowledge that some of its (faked) attributes shouldn't really be included in SaveLoad's list-of-savables (because it will result in wasteful double-saving or unhelpful warnings) would override savables() to prevent the wasteful activity, not just (temporarily) hide it.

tmylk · 2017-01-06T21:58:36Z

Agree with @gojomo. What do you think, @anmol01gulati?

jayantj · 2017-01-10T07:17:05Z

Actually, I looked a little bit into this, because I found it surprising syn0norm and index2word are still accessed directly from Word2Vec. Even if Word2Vec saves all attributes except the ones in ignore, iteritems(self.__dict__) doesn't even return syn0norm and index2word.

The reason they end up being accessed is these two snippets of code -

   @classmethod
    def load(cls, *args, **kwargs):
        Word2Vec.disable_keyed_vectors_warnings()
        model = super(Word2Vec, cls).load(*args, **kwargs)
        # update older models
        if hasattr(model, 'table'):
            delattr(model, 'table')  # discard in favor of cum_table
        if model.negative and hasattr(model, 'index2word'):
            model.make_cum_table()  # rebuild cum_table from vocabulary
        ...
        ...

and

    def save(self, *args, **kwargs):
        # don't bother storing the cached normalized vectors, recalculable table
        # TODO: after introducing KeyedVectors now syn0, vocab, id2word are saved TWO times. Once in word2vec and once in keyedvectors
        #       After keyedvectors are deprecated it will be only once
        Word2Vec.disable_keyed_vectors_warnings()
        kwargs['ignore'] = kwargs.get('ignore', ['syn0norm', 'table', 'cum_table'])
        ...
        ...

The first should be easily fixable (by changing to hasattr(model.wv, 'index2word')).

The second one is slightly trickier, because recursive SaveLoads are handled a little unintuitively.
If I simply remove syn0norm from this line -

kwargs['ignore'] = kwargs.get('ignore', ['syn0norm', 'table', 'cum_table'])

in Word2Vec, and add it to KeyedVecs instead, it doesn't work as intended, because Word2Vec never calls the save method of KeyedVecs while saving KeyedVecs. It only calls an internal method _save_specials.

The best fix, IMO, would be to make the default ignore fields for both KeyedVecs and Word2Vec a class variable instead of the current hack inside save, and use that in either @gojomo 's proposed savables method, or in the current handling of ignore in _save_specials.

somah1411 · 2017-02-17T09:04:05Z

can please explaine more about this problem and where can i but this hasattr(model.wv, 'index2word')).

tmylk · 2017-02-17T14:08:00Z

This is resolved by deprecating direct access to KeyedVectors in #1147

tmylk added the bug Issue described a bug label Jan 3, 2017

tmylk closed this as completed Feb 17, 2017

tmylk mentioned this issue Mar 12, 2017

Remove direct access to properties moved to KeyedVectors #1147

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Save and load methods generate KeyedVector warnings word2vec and doc2vec #1069

Save and load methods generate KeyedVector warnings word2vec and doc2vec #1069

tmylk commented Jan 3, 2017 •

edited

Loading

tmylk commented Jan 4, 2017

gojomo commented Jan 4, 2017

tmylk commented Jan 4, 2017

gojomo commented Jan 5, 2017 •

edited

Loading

tmylk commented Jan 6, 2017

jayantj commented Jan 10, 2017 •

edited

Loading

somah1411 commented Feb 17, 2017

tmylk commented Feb 17, 2017

Save and load methods generate KeyedVector warnings word2vec and doc2vec #1069

Save and load methods generate KeyedVector warnings word2vec and doc2vec #1069

Comments

tmylk commented Jan 3, 2017 • edited Loading

tmylk commented Jan 4, 2017

gojomo commented Jan 4, 2017

tmylk commented Jan 4, 2017

gojomo commented Jan 5, 2017 • edited Loading

tmylk commented Jan 6, 2017

jayantj commented Jan 10, 2017 • edited Loading

somah1411 commented Feb 17, 2017

tmylk commented Feb 17, 2017

tmylk commented Jan 3, 2017 •

edited

Loading

gojomo commented Jan 5, 2017 •

edited

Loading

jayantj commented Jan 10, 2017 •

edited

Loading