Multicore NULL bug #376

ghost · 2015-07-01T00:21:12Z

ubuntu@ip-172-31-33-28:~$ python lda_en.py
2015-07-01 00:13:06,021 : INFO : initializing corpus reader from <bz2.BZ2File object at 0x7fd29bf1eb90>
2015-07-01 00:13:06,052 : INFO : accepted corpus with 3831719 documents, 100000 features, 595701551 non-zero entries
MmCorpus(3831719 documents, 100000 features, 595701551 non-zero entries)
2015-07-01 00:13:06,059 : INFO : using symmetric alpha at 0.0001
2015-07-01 00:13:06,060 : INFO : using serial LDA version on this node
2015-07-01 00:16:27,970 : INFO : running online LDA training, 10000 topics, 1 passes over the supplied corpus of 3831719 documents, updating every 3
2000 documents, evaluating every ~320000 documents, iterating 50x with a convergence threshold of 0.001000
2015-07-01 00:16:27,976 : INFO : training LDA model using 32 processes
2015-07-01 00:16:33,442 : INFO : PROGRESS: pass 0, dispatched chunk #0 = documents up to #1000/3831719, outstanding queue size 1
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/queues.py", line 264, in _feed
send(obj)
SystemError: NULL result without error in PyObject_Call
2015-07-01 00:16:40,488 : INFO : PROGRESS: pass 0, dispatched chunk #1 = documents up to #2000/3831719, outstanding queue size 2
2015-07-01 00:16:45,457 : INFO : PROGRESS: pass 0, dispatched chunk #2 = documents up to #3000/3831719, outstanding queue size 3
2015-07-01 00:16:50,220 : INFO : PROGRESS: pass 0, dispatched chunk #3 = documents up to #4000/3831719, outstanding queue size 4

ghost · 2015-07-01T03:12:42Z

This is related to this bug: https://bugs.python.org/issue17560

ghost · 2015-07-01T03:17:25Z

Another version of this error:
mingus@lsa:~/Projects/MBTI> python lda_en.py
2015-06-30 21:13:34,692 : INFO : loaded corpus index from wiki_en_tfidf.mm.index.bz2
2015-06-30 21:13:34,692 : INFO : initializing corpus reader from wiki_en_tfidf.mm.bz2
2015-06-30 21:13:34,719 : INFO : accepted corpus with 3831719 documents, 100000 features, 595701551 non-zero entries
2015-06-30 21:13:34,726 : INFO : using symmetric alpha at 0.0004
2015-06-30 21:13:34,726 : INFO : using serial LDA version on this node
2015-06-30 21:14:20,875 : INFO : running online LDA training, 2500 topics, 1 passes over the supplied corpus of 3831719 documents, updating every 2000 documents, evaluating every ~20000 documents, iterating 50x with a convergence threshold of 0.001000
2015-06-30 21:14:20,906 : INFO : training LDA model using 2 processes
2015-06-30 21:14:29,270 : INFO : PROGRESS: pass 0, dispatched chunk #0 = documents up to #1000/3831719, outstanding queue size 1
Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/queues.py", line 266, in _feed
send(obj)
IOError: bad message length
2015-06-30 21:14:38,933 : INFO : PROGRESS: pass 0, dispatched chunk #1 = documents up to #2000/3831719, outstanding queue size 2
2015-06-30 21:14:47,038 : INFO : PROGRESS: pass 0, dispatched chunk #2 = documents up to #3000/3831719, outstanding queue size 3
2015-06-30 21:14:54,713 : INFO : PROGRESS: pass 0, dispatched chunk #3 = documents up to #4000/3831719, outstanding queue size 4

piskvorky · 2015-07-01T09:17:05Z

Yes, I remember this error. It's a bug/limitation in CPython.

It's been reported here for gensim, but there's no (easy) workaround for now :(

ghost · 2015-07-02T05:10:02Z

I tried the SO monkeypatch. It results in this error:

https://gist.github.com/brianmingus/c58533bc690516a600f6

This error is not easily fixable, because it's a bug in C code:

https://github.com/python/cpython/blob/3.4/Modules/posixmodule.c#L8048-L8065

ghost · 2015-07-02T05:29:47Z

I filed a bug for this issue: http://bugs.python.org/issue24550

ghost · 2015-07-02T07:06:30Z

If you hack multiprocessing in Python 3.6 then ldamulticore works!

ghost · 2015-07-02T07:47:45Z

It's working for 5k topics but at 10k a new bug emerges:

2015-07-02 07:36:40,642 : INFO : initializing corpus reader from wiki_en_tfidf.mm.bz2
2015-07-02 07:36:40,681 : INFO : accepted corpus with 3831719 documents, 100000 features, 595701551 non-zero entries
2015-07-02 07:36:40,702 : INFO : using symmetric alpha at 0.0001
2015-07-02 07:36:40,709 : INFO : using serial LDA version on this node
2015-07-02 07:40:24,386 : INFO : running online LDA training, 10000 topics, 1 passes over the supplied corpus of 3831719 documents, updating every 310000 documents, evaluating every ~3100000 documents, iterating 50x with a convergence threshold of 0.001000
2015-07-02 07:40:24,412 : INFO : training LDA model using 31 processes
2015-07-02 07:43:12,940 : INFO : PROGRESS: pass 0, dispatched chunk #0 = documents up to #10000/3831719, outstanding queue size 1
Traceback (most recent call last):
File "/usr/local/lib/python3.6/multiprocessing/queues.py", line 241, in _feed
obj = ForkingPickler.dumps(obj)
File "/usr/local/lib/python3.6/multiprocessing/reduction.py", line 50, in dumps
cls(buf, protocol).dump(obj)
OverflowError: cannot serialize a bytes object larger than 4 GiB

piskvorky · 2015-07-02T10:50:35Z

Looks like you're on a quest Brian :)

That's exciting on its own, and will push the boundaries of what people have tried with LDA in the past. Especially if coupled with a thorough analysis of the results (human eval, not perplexity). How useful are the 10k (or 5k) topics? What is the practical applicability of such models? There is an article or two waiting in there somewhere.

tmylk · 2016-01-23T22:57:20Z

@brianmingus Do you have any interesting results to share here about breaking the topic barrier?

koustuvsinha · 2017-07-17T14:38:44Z

any updates for this? it's still an issue

ghost · 2017-07-17T15:39:27Z

If you want many topics, use an autoencoder to implement LSA. Set the size of your hidden layer to the number of topics desired.

koustuvsinha · 2017-07-17T15:53:41Z

I am just querying for 25 topics and it's still failing. the size of my individual documents is large which is the reason I suppose.

menshikh-iv · 2017-08-14T10:55:10Z

@koustuvsinha you can reduce a size of your vocab and batch_size to avoid this issue.

menshikh-iv · 2017-10-03T07:09:22Z

Unfortunately, it's a limitation from python, for this reason, we can't fix it.

menshikh-iv closed this as completed Oct 3, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multicore NULL bug #376

Multicore NULL bug #376

ghost commented Jul 1, 2015

ghost commented Jul 1, 2015

ghost commented Jul 1, 2015

piskvorky commented Jul 1, 2015

ghost commented Jul 2, 2015

ghost commented Jul 2, 2015

ghost commented Jul 2, 2015

ghost commented Jul 2, 2015

piskvorky commented Jul 2, 2015

tmylk commented Jan 23, 2016

koustuvsinha commented Jul 17, 2017

ghost commented Jul 17, 2017

koustuvsinha commented Jul 17, 2017

menshikh-iv commented Aug 14, 2017

menshikh-iv commented Oct 3, 2017

Multicore NULL bug #376

Multicore NULL bug #376

Comments

ghost commented Jul 1, 2015

ghost commented Jul 1, 2015

ghost commented Jul 1, 2015

piskvorky commented Jul 1, 2015

ghost commented Jul 2, 2015

ghost commented Jul 2, 2015

ghost commented Jul 2, 2015

ghost commented Jul 2, 2015

piskvorky commented Jul 2, 2015

tmylk commented Jan 23, 2016

koustuvsinha commented Jul 17, 2017

ghost commented Jul 17, 2017

koustuvsinha commented Jul 17, 2017

menshikh-iv commented Aug 14, 2017

menshikh-iv commented Oct 3, 2017