Doc2Vec Segmentation Fault Windows and Linux #1578

Closed
mullenba opened this issue Sep 8, 2017 · 1 comment

@mullenba

mullenba commented Sep 8, 2017

I've tried this basic code on both Linux and Windows. I'm trying to do some online training, and after a couple of passes it throws a segmentation fault.

Code to recreate problem.

from gensim.models.doc2vec import Doc2Vec, LabeledSentence, TaggedDocument


sentences = [('food', 'I like to eat broccoli and bananas.'),
             ('food', 'I ate a banana and spinach smoothie for breakfast.'),
             ('animals', 'Chinchillas and kittens are cute.'),
             ('animals', 'My sister adopted a kitten yesterday.'),
             ('animals', 'Look at this cute hamster munching on a piece of broccoli.')]

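# Wrap each (label, text) pair as a tagged document: the label becomes
# the document tag and the whitespace-split text becomes the word list.
convSentences = []
for s in sentences:
    convSentences.append(LabeledSentence(tags=[s[0]], words=s[1].split()))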

model = Doc2Vec(size=300, window=8, min_count=1, workers=1)

print("Pass 1:")
model.build_vocab([convSentences[0]])
model.train([convSentences[0]], total_examples=model.corpus_count)

print("Pass 2:")
model.build_vocab([convSentences[1]], update=True)
model.train([convSentences[1]], total_examples=model.corpus_count)

print("Pass 3:")
model.build_vocab([convSentences[2]], update=True)
model.train([convSentences[2]], total_examples=model.corpus_count)

print("Pass 4:")
model.build_vocab([convSentences[3]], update=True)
model.train([convSentences[3]], total_examples=model.corpus_count)

print("Pass 5:")
model.build_vocab([convSentences[4]], update=True)
model.train([convSentences[4]], total_examples=model.corpus_count)

Here's the output when running in Windows IDLE with Python 3.5.2:

Warning (from warnings module):
  File "C:\Python35\lib\site-packages\gensim\utils.py", line 855
    warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
UserWarning: detected Windows; aliasing chunkize to chunkize_serial
Pass 1:
Pass 2:
Pass 3:

Passes 1-3 finish quickly. Then there is a long pause, after which Linux throws a segmentation fault and Windows throws an unspecified error.
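
To get more than a bare crash out of a run like this, one option is Python's standard-library faulthandler module, which writes the Python-level traceback when the process receives a fatal signal such as a segmentation fault. This is only a minimal sketch, assuming it is placed at the top of the script above; the log file name is a made-up placeholder.

```python
import faulthandler

# Keep the file object open for the life of the process: faulthandler
# writes to its file descriptor directly when a fatal signal arrives.
crash_log = open("crash_traceback.log", "w")  # hypothetical file name
faulthandler.enable(file=crash_log, all_threads=True)

# ... the build_vocab()/train() passes from the script above go here ...
```

Writing to a real file avoids relying on the console, which matters when the script is launched from an IDE shell. From a terminal, running the script with `python -X faulthandler` enables the same handler without editing the code.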

@gojomo
Collaborator

gojomo commented Sep 8, 2017

Duplicate of #1019 – but this is a very useful minimal triggering case, thank you! I'll be closing this as a duplicate, for further discussion to occur there.

FYI, the build_vocab(..., update=True) vocabulary-expansion feature was only developed and tested with respect to Word2Vec, hence this sort of bug when it is used via inheritance in Doc2Vec.
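
Given that, one way to sidestep update=True is to build the vocabulary once over every document and train in a single call. The following is only a sketch under stated assumptions: it assumes the whole corpus is available up front (so it is batch training, not the online training the report is after), and it assumes a later gensim release in which the constructor argument is named vector_size and the model exposes epochs, rather than the size argument used in the report. The helper names tag_documents and train_in_one_pass are made up for illustration.

```python
from collections import namedtuple

try:
    # gensim's own container: a namedtuple with fields (words, tags).
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument
except ImportError:
    # Fallback so the corpus preparation below still runs without gensim.
    Doc2Vec = None
    TaggedDocument = namedtuple("TaggedDocument", "words tags")

sentences = [
    ("food", "I like to eat broccoli and bananas."),
    ("food", "I ate a banana and spinach smoothie for breakfast."),
    ("animals", "Chinchillas and kittens are cute."),
    ("animals", "My sister adopted a kitten yesterday."),
    ("animals", "Look at this cute hamster munching on a piece of broccoli."),
]


def tag_documents(pairs):
    """Hypothetical helper: turn (label, text) pairs into tagged documents."""
    return [TaggedDocument(words=text.split(), tags=[label])
            for label, text in pairs]


def train_in_one_pass(corpus):
    """Hypothetical helper: one build_vocab() call, one train() call."""
    if Doc2Vec is None:
        raise RuntimeError("gensim is not installed")
    # Assumption: parameter names from a later gensim release than the report.
    model = Doc2Vec(vector_size=300, window=8, min_count=1, workers=1)
    model.build_vocab(corpus)  # full vocabulary and all tags in one scan
    model.train(corpus, total_examples=model.corpus_count,
                epochs=model.epochs)
    return model


corpus = tag_documents(sentences)
```

Calling train_in_one_pass(corpus) then trains on all five documents at once, so every word and tag is known before training starts and no vocabulary expansion is needed.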

gojomo closed this as completed Sep 8, 2017