
Segmentation Fault when calling build_vocab() #1266

Closed
KiddoZhu opened this issue Apr 7, 2017 · 1 comment

Comments

@KiddoZhu
Contributor

KiddoZhu commented Apr 7, 2017

I am trying to apply pretrained word2vec weights to a Doc2Vec model, but the process exits with a segmentation fault when I call build_vocab() after resetting the weights. No traceback is printed, so perhaps the crash happens inside the Cython code.

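from gensim.models import Doc2Vec, Word2Vec

doc_vectors = Doc2Vec(...)
pretrained = Word2Vec.load("wiki-sg/word2vec.bin")
# borrow the vocabulary and settings of the pretrained Word2Vec model
super(Doc2Vec, doc_vectors).reset_from(pretrained)
# copy the pretrained word vectors into the Doc2Vec model
doc_vectors.wv.syn0 = pretrained.syn0
# expand the vocabulary with the new corpus; this call segfaults
doc_vectors.build_vocab(sentences, update=True)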

I have also tried commenting out the line doc_vectors.wv.syn0 = pretrained.syn0, but it does not help.
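
A segmentation fault inside compiled code kills the interpreter before Python can print a traceback. One way to still see which Python-level call was active at the moment of the crash is the standard-library faulthandler module. A minimal sketch, assuming the reproduction above is pasted into a hypothetical reproduce() function:

```python
import faulthandler
import sys

# On a fatal signal such as SIGSEGV, dump the Python traceback of every thread.
# sys.__stderr__ is used because faulthandler needs a real file descriptor.
faulthandler.enable(file=sys.__stderr__, all_threads=True)


def reproduce():
    """Hypothetical placeholder: put the Doc2Vec/Word2Vec calls from the report here."""
    pass


if __name__ == "__main__":
    reproduce()
```

The same effect is available without editing the script by running it as `python -X faulthandler script.py` (Python 3.3 and later).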

@gojomo
Collaborator

gojomo commented Apr 8, 2017

This appears to be a duplicate of #1019, so further discussion/investigation/fixing of the crash should happen there.

Note that the update=True incremental vocab-expansion feature has so far only been implemented and tested with a focus on Word2Vec, so it could be crashy-buggy if applied to Doc2Vec.
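
To illustrate what an incremental expansion has to keep consistent, here is a toy model (hypothetical names, not gensim's code): every per-word array must grow in step with the vocabulary, with existing rows left untouched. The random initialization scale is an assumption chosen for illustration. Doc2Vec also keeps per-document arrays whose sizes would have to stay consistent too, which is one more place an expansion written with Word2Vec in mind could go wrong.

```python
import numpy as np


def expand_vocab(vocab, syn0, syn1neg, new_words, rng):
    """Toy sketch of incremental vocabulary expansion (hypothetical helper).

    vocab maps word -> row index; syn0 holds input vectors and syn1neg holds
    negative-sampling output weights, one row per vocabulary word.
    """
    # Keep only unseen words, de-duplicated, in first-seen order.
    added = [w for w in dict.fromkeys(new_words) if w not in vocab]
    for w in added:
        vocab[w] = len(vocab)
    dim = syn0.shape[1]
    # New words get small random input vectors...
    new_syn0 = (rng.random((len(added), dim)) - 0.5) / dim
    # ...and zero output weights; existing rows are kept unchanged.
    new_syn1neg = np.zeros((len(added), dim), dtype=syn1neg.dtype)
    return vocab, np.vstack([syn0, new_syn0]), np.vstack([syn1neg, new_syn1neg])


rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1}
syn0 = rng.random((2, 4))
syn1neg = np.zeros((2, 4))
vocab, syn0, syn1neg = expand_vocab(vocab, syn0, syn1neg, ["cat", "sat", "mat", "sat"], rng)
print(len(vocab), syn0.shape, syn1neg.shape)  # 4 (4, 4) (4, 4)
```

If any one of these arrays were left at its old size, later code indexing it by the new words' row numbers would run past its end.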

Similarly, reset_from() was intended to borrow properties from a model of the same type, so it might not work in this fashion (and could also be causing state mismatches that trigger hard seg-faults in the Cython code).
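
Why a state mismatch shows up as a seg-fault rather than an exception: compiled inner loops of this kind typically skip bounds checking for speed, so a row index past the end of an array reads arbitrary memory instead of raising IndexError. A minimal sketch of a pure-Python sanity check one could run before training, assuming a hypothetical layout where vocab maps word to row index and each named array should have one row per word:

```python
import numpy as np


def check_alignment(vocab, arrays):
    """Hypothetical sanity check: raise a Python error instead of letting
    compiled code read out of bounds.

    vocab maps word -> row index; arrays maps name -> 2-D array that must
    have exactly one row per vocabulary entry.
    """
    n = len(vocab)
    max_index = max(vocab.values(), default=-1)
    problems = []
    if max_index >= n:
        problems.append(f"vocab index {max_index} >= vocab size {n}")
    for name, arr in arrays.items():
        if arr.shape[0] != n:
            problems.append(f"{name} has {arr.shape[0]} rows, vocab has {n} words")
    if problems:
        raise ValueError("; ".join(problems))


vocab = {"king": 0, "queen": 1, "apple": 2}
ok = np.zeros((3, 5))
check_alignment(vocab, {"syn0": ok, "syn1neg": ok})  # passes silently
try:
    check_alignment(vocab, {"syn0": np.zeros((2, 5)), "syn1neg": ok})
except ValueError as err:
    print(err)  # syn0 has 2 rows, vocab has 3 words
```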

Separately, you may want to look into intersect_word2vec_format() as a different option for mixing pretrained vectors into a model with an existing vocabulary; see the method's doc-comment and prior discussion on the project discussion list for more on how it could be used.
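
The idea behind that intersection approach can be sketched in plain NumPy. This is a toy illustration with hypothetical names, not gensim's implementation: no words are added to the existing vocabulary, words present in both adopt the pretrained vectors, other words are left alone, and each imported row gets a lock factor (here 0.0, meaning "do not update during later training").

```python
import numpy as np


def intersect_vectors(vocab, vectors, lockf, pretrained, lock_value=0.0):
    """Toy sketch of merging pretrained vectors into an existing vocabulary.

    vocab: word -> row index in `vectors` (the model's existing vocabulary)
    pretrained: word -> 1-D vector loaded from elsewhere
    lockf: per-row lock factors; 0.0 would freeze a row during later training
    Returns the number of words whose vectors were copied.
    """
    copied = 0
    for word, vec in pretrained.items():
        row = vocab.get(word)
        if row is None:
            continue  # words outside the existing vocabulary are ignored
        if len(vec) != vectors.shape[1]:
            raise ValueError(f"dimension mismatch for {word!r}")
        vectors[row] = vec
        lockf[row] = lock_value
        copied += 1
    return copied


vocab = {"king": 0, "queen": 1, "zebra": 2}
vectors = np.zeros((3, 2))
lockf = np.ones(3)
n = intersect_vectors(vocab, vectors, lockf, {"king": [1.0, 2.0], "apple": [3.0, 4.0]})
print(n, vectors[0], lockf)  # one word copied; "apple" is not added to vocab
```

Because the vocabulary and array sizes never change, this route avoids the kind of size mismatch that an incremental expansion can introduce.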

gojomo closed this as completed Apr 8, 2017