Doc2Vec Segmentation Fault Windows and Linux #1578

mullenba · 2017-09-08T16:53:14Z

I've tried this basic code on both Linux and Windows. I'm trying to do some online training, and after a couple of passes it throws a segmentation fault.

Code to reproduce the problem:

```python
from gensim.models.doc2vec import Doc2Vec, LabeledSentence, TaggedDocument

# (tag, text) pairs; each one is fed to the model as a separate "online" batch.
sentences = [('food', 'I like to eat broccoli and bananas.'),
             ('food', 'I ate a banana and spinach smoothie for breakfast.'),
             ('animals', 'Chinchillas and kittens are cute.'),
             ('animals', 'My sister adopted a kitten yesterday.'),
             ('animals', 'Look at this cute hamster munching on a piece of broccoli.')]

# Wrap each pair as a tagged document: the tag is the label, the words are a plain whitespace split.
convSentences = []
for s in sentences:
    convSentences.append(LabeledSentence(tags=[s[0]], words=s[1].split()))

model = Doc2Vec(size=300, window=8, min_count=1, workers=1)

# Pass 1: build the initial vocabulary from the first document only, then train on it.
print("Pass 1:")
model.build_vocab([convSentences[0]])
model.train([convSentences[0]], total_examples=model.corpus_count)

# Passes 2-5: expand the existing vocabulary (update=True) with the next document, then train on it.
print("Pass 2:")
model.build_vocab([convSentences[1]], update=True)
model.train([convSentences[1]], total_examples=model.corpus_count)

print("Pass 3:")
model.build_vocab([convSentences[2]], update=True)
model.train([convSentences[2]], total_examples=model.corpus_count)

print("Pass 4:")
model.build_vocab([convSentences[3]], update=True)
model.train([convSentences[3]], total_examples=model.corpus_count)

print("Pass 5:")
model.build_vocab([convSentences[4]], update=True)
model.train([convSentences[4]], total_examples=model.corpus_count)
```

Here's the output when running in Windows IDLE (Python 3.5.2):

```
Warning (from warnings module):
  File "C:\Python35\lib\site-packages\gensim\utils.py", line 855
    warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
UserWarning: detected Windows; aliasing chunkize to chunkize_serial
Pass 1:
Pass 2:
Pass 3:
```
Passes 1-3 go by quickly, then there is a long pause, after which Linux throws a segmentation fault and Windows throws an unspecified error.
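One detail in the output may help narrow this down (an observation from the script's own data, not something confirmed in this thread): the crash comes right after "Pass 3:", and pass 3 is the first incremental pass whose document carries a tag the model hasn't seen yet ('animals'). Pass 2 also adds new words but reuses the existing 'food' tag, and it completes. The standalone script below (it doesn't use gensim; `new_items_per_pass` is a hypothetical helper) tallies what each pass introduces, using the same whitespace split as the reproduction script.

```python
# The (tag, text) pairs from the reproduction script above.
sentences = [('food', 'I like to eat broccoli and bananas.'),
             ('food', 'I ate a banana and spinach smoothie for breakfast.'),
             ('animals', 'Chinchillas and kittens are cute.'),
             ('animals', 'My sister adopted a kitten yesterday.'),
             ('animals', 'Look at this cute hamster munching on a piece of broccoli.')]


def new_items_per_pass(pairs):
    """For each (tag, text) pair, return the tag and words not seen in any earlier pair."""
    seen_tags, seen_words = set(), set()
    report = []
    for tag, text in pairs:
        words = text.split()  # same naive split as the reproduction script
        new_tags = [] if tag in seen_tags else [tag]
        new_words = sorted(set(words) - seen_words)
        seen_tags.add(tag)
        seen_words.update(words)
        report.append((new_tags, new_words))
    return report


report = new_items_per_pass(sentences)
for i, (new_tags, new_words) in enumerate(report, start=1):
    print("Pass {}: new tags={}, new words={}".format(i, new_tags, len(new_words)))
```

Every pass adds new words, but only passes 1 and 3 add a new tag, and pass 1 builds the vocabulary from scratch rather than with `update=True`.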


gojomo · 2017-09-08T17:32:06Z

Duplicate of #1019 – but this is a very useful minimal triggering case, thank you! I'll be closing this as a duplicate, for further discussion to occur there.

FYI, the build_vocab(..., update=True) vocabulary-expansion feature was only developed and tested with Word2Vec, hence this sort of bug when it is used via inheritance in Doc2Vec.
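The thread doesn't spell out the mechanism, so the following is only a hypothetical, pure-Python toy. The classes `ToyWord2Vec`, `ToyDoc2Vec` and `FixedToyDoc2Vec` and all their methods are invented for illustration; they are not gensim's API or gensim's code. The toy shows how a vocabulary-expansion path written and tested for a parent class can leave a subclass's extra per-tag table at its old size, and it replays the first three documents from the issue.

```python
class ToyWord2Vec:
    """Hypothetical word-vector model used only for illustration; not gensim's API."""

    def __init__(self, dim=3):
        self.dim = dim
        self.word_index = {}    # word -> row number in word_vectors
        self.word_vectors = []  # one zero-initialised row per word

    def clear_vocab(self):
        self.word_index = {}

    def scan_vocab(self, docs):
        for _tags, words in docs:
            for w in words:
                self.word_index.setdefault(w, len(self.word_index))

    def reset_weights(self):
        # Fresh vocabulary: allocate one row per known word.
        self.word_vectors = [[0.0] * self.dim for _ in self.word_index]

    def update_weights(self):
        # Vocabulary expansion: append rows only for the newly added words.
        missing = len(self.word_index) - len(self.word_vectors)
        self.word_vectors.extend([0.0] * self.dim for _ in range(missing))

    def build_vocab(self, docs, update=False):
        if not update:
            self.clear_vocab()
        self.scan_vocab(docs)
        if update:
            self.update_weights()
        else:
            self.reset_weights()


class ToyDoc2Vec(ToyWord2Vec):
    """Adds a per-tag table, but only hooks the non-update (reset) path."""

    def __init__(self, dim=3):
        super().__init__(dim)
        self.tag_index = {}
        self.tag_vectors = []

    def clear_vocab(self):
        super().clear_vocab()
        self.tag_index = {}

    def scan_vocab(self, docs):
        super().scan_vocab(docs)
        for tags, _words in docs:
            for t in tags:
                self.tag_index.setdefault(t, len(self.tag_index))

    def reset_weights(self):
        super().reset_weights()
        self.tag_vectors = [[0.0] * self.dim for _ in self.tag_index]

    # update_weights() is inherited as-is, so tag_vectors is never grown on update=True.

    def train(self, docs):
        # Stand-in for training: read every row a real update step would touch.
        for tags, words in docs:
            for w in words:
                _ = self.word_vectors[self.word_index[w]]
            for t in tags:
                _ = self.tag_vectors[self.tag_index[t]]  # IndexError if the tag table is stale


class FixedToyDoc2Vec(ToyDoc2Vec):
    """Hypothetical fix: grow the tag table on the update path as well."""

    def update_weights(self):
        super().update_weights()
        missing = len(self.tag_index) - len(self.tag_vectors)
        self.tag_vectors.extend([0.0] * self.dim for _ in range(missing))


def replay(model, docs):
    """Mimic the issue's loop: build_vocab (update=True after pass 1), then train."""
    outcomes = []
    for i, doc in enumerate(docs):
        model.build_vocab([doc], update=(i > 0))
        try:
            model.train([doc])
            outcomes.append("ok")
        except IndexError:
            outcomes.append("IndexError")
    return outcomes


docs = [(['food'], 'I like to eat broccoli and bananas.'.split()),
        (['food'], 'I ate a banana and spinach smoothie for breakfast.'.split()),
        (['animals'], 'Chinchillas and kittens are cute.'.split())]

print(replay(ToyDoc2Vec(), docs))       # -> ['ok', 'ok', 'IndexError']
print(replay(FixedToyDoc2Vec(), docs))  # -> ['ok', 'ok', 'ok']
```

In Python the stale table raises `IndexError`. Gensim's training loops run in compiled code; assuming they skip bounds checking for speed, the same out-of-range row would be read or written past the end of the array instead, which would be consistent with a segfault on Linux and an unspecified crash on Windows. `FixedToyDoc2Vec` only illustrates the general shape of a fix (grow every per-item table on the update path); it is not taken from gensim.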

gojomo mentioned this issue on Sep 8, 2017: Segmentation fault using build_vocab(..., update=True) for Doc2Vec #1019 (open)

gojomo closed this as completed on Sep 8, 2017
