-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Save and load methods generate KeyedVector warnings word2vec and doc2vec #1069
Comments
The issue is partially fixed by #1072 In the mean time we can add an |
Rather than silencing the warnings, via a new warning-enablement state/API, shouldn't the save/load code just not do the warned-against actions? (An API where a SaveLoad object self-reports its eligible properties would make more sense than a hide-warnings workaround.) (Such complications are why my preference, as per comment at #833 (comment) , was to avoid backward-compatibility tricks by keeping the refactor in a parallel class and/or holding it for an API-breaking major release.) |
Current behaviour of SaveLoad API is to save everything except ignored fields. To change it to save what is explicitly given will take a while. For example even calling 'hasattr' on 'model.vocab' raises a warning. The silencing will be removed in the next API breaking release |
My hunch (which could be wrong) is that generically-useful improvements to SaveLoad would fix this with fewer lines of code than this cludgey new warnings-suppression capability. For example, SaveLoad might define a method like:
Its other methods that currently use |
Agree with @gojomo. What do you think, @anmol01gulati? |
Actually, I looked a little bit into this, because I found it surprising The reason they end up being accessed is these two snippets of code - @classmethod def load(cls, *args, **kwargs): Word2Vec.disable_keyed_vectors_warnings() model = super(Word2Vec, cls).load(*args, **kwargs) # update older models if hasattr(model, 'table'): delattr(model, 'table') # discard in favor of cum_table if model.negative and hasattr(model, 'index2word'): model.make_cum_table() # rebuild cum_table from vocabulary ... ... and def save(self, *args, **kwargs): # don't bother storing the cached normalized vectors, recalculable table # TODO: after introducing KeyedVectors now syn0, vocab, id2word are saved TWO times. Once in word2vec and once in keyedvectors # After keyedvectors are deprecated it will be only once Word2Vec.disable_keyed_vectors_warnings() kwargs['ignore'] = kwargs.get('ignore', ['syn0norm', 'table', 'cum_table']) ... ... The first should be easily fixable (by changing to The second one is slightly trickier, because recursive SaveLoads are handled a little unintuitively.
in The best fix, IMO, would be to make the default |
can please explaine more about this problem and where can i but this hasattr(model.wv, 'index2word')). |
This is resolved by deprecating direct access to KeyedVectors in #1147 |
Introduced in #980
CC @anmol01gulati
The text was updated successfully, but these errors were encountered: