Multicore NULL bug #376
This is related to this bug: https://bugs.python.org/issue17560
Another version of this error:
Yes, I remember this error. It's a bug/limitation in CPython. It's been reported here for gensim, but there's no (easy) workaround for now :(
I tried the Stack Overflow monkeypatch. It results in this error: https://gist.github.com/brianmingus/c58533bc690516a600f6. This error is not easily fixable, because it's a bug in C code: https://github.com/python/cpython/blob/3.4/Modules/posixmodule.c#L8048-L8065
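For reference, the kind of monkeypatch being discussed forces multiprocessing to serialize with the highest pickle protocol (protocol 4+ uses 64-bit framing, while older protocols fail on objects of 2**31 bytes or more). This is a hedged sketch of that community workaround, not an official API, and as noted above it may still hit the remaining limit in C code; `HighProtocolPickler` and `patch_multiprocessing_pickler` are illustrative names:

```python
import pickle
from multiprocessing import reduction

# Hedged sketch of the circulated monkeypatch: make multiprocessing
# pickle with the highest protocol so large payloads use 64-bit
# framing. Whether this fully avoids the crash depends on the Python
# version (the remaining failure discussed above is in C code).
class HighProtocolPickler(reduction.ForkingPickler):
    @classmethod
    def dumps(cls, obj, protocol=None):
        # Ignore the caller's (usually default) protocol and use the best one.
        return super().dumps(obj, pickle.HIGHEST_PROTOCOL)

def patch_multiprocessing_pickler():
    """Install the high-protocol pickler globally (illustrative hack)."""
    reduction.ForkingPickler = HighProtocolPickler

# Sanity check: the patched pickler still round-trips ordinary objects.
blob = HighProtocolPickler.dumps({"topic": 42})
restored = pickle.loads(blob)
```

Call `patch_multiprocessing_pickler()` before constructing the LDA model so queue feeds pick up the patched pickler.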
I filed a bug for this issue: http://bugs.python.org/issue24550
If you hack multiprocessing in Python 3.6, then LdaMulticore works!
It's working for 5k topics, but at 10k a new bug emerges:
2015-07-02 07:36:40,642 : INFO : initializing corpus reader from wiki_en_tfidf.mm.bz2
Looks like you're on a quest Brian :) That's exciting on its own, and will push the boundaries of what people have tried with LDA in the past. Especially if coupled with a thorough analysis of the results (human eval, not perplexity). How useful are the 10k (or 5k) topics? What is the practical applicability of such models? There is an article or two waiting in there somewhere.
@brianmingus Do you have any interesting results to share here about breaking the topic barrier?
Any updates on this? It's still an issue.
If you want many topics, use an autoencoder to implement LSA. Set the size of your hidden layer to the number of topics desired. |
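The suggestion above can be sketched numerically: a linear autoencoder with tied weights and squared-error loss learns the same subspace as truncated SVD, i.e. LSA, with the hidden width playing the role of the topic count. This is a minimal illustrative sketch with made-up shapes and hyperparameters, not a drop-in replacement for gensim's LSA:

```python
import numpy as np

# Minimal sketch: train a tied-weight linear autoencoder X -> XW -> XWW^T
# by gradient descent on squared reconstruction error. Its k-dimensional
# hidden layer spans the top-k singular subspace of X (LSA with k topics).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))        # 200 documents x 50 term features
k = 5                                     # desired number of "topics"
W = 0.01 * rng.standard_normal((50, k))   # encoder; decoder is W.T (tied)

def loss(W):
    return np.sum((X @ W @ W.T - X) ** 2)

lr = 1e-4
start = loss(W)
for _ in range(500):
    E = X @ W @ W.T - X                   # reconstruction error
    grad = 2 * (X.T @ E @ W + E.T @ X @ W)
    W -= lr * grad

end = loss(W)                             # reconstruction error drops
```

Encoding a document is then just `doc_vector @ W`, which maps it into the k-dimensional topic space without any multiprocessing in the training loop.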
I am only requesting 25 topics and it's still failing. My individual documents are large, which I suppose is the reason.
@koustuvsinha you can reduce the size of your vocab and the batch_size to avoid this issue.
Unfortunately, it's a limitation of Python, so we can't fix it on our side.
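Shrinking the vocab and batch size helps because the crash fires when a single dispatched chunk pickles to 2**31 bytes or more. A rough sketch of that sizing logic, assuming documents are sparse `(token_id, count)` vectors as gensim corpora yield them; `safe_chunksize` and its constants are hypothetical helpers, not part of gensim:

```python
import pickle

# Hedged sketch: estimate the pickled size of a sample of documents and
# scale the chunksize down so one dispatched chunk stays well under the
# 32-bit pickle/transport limit being hit in this issue.
PICKLE_LIMIT = 2 ** 31   # the limit behind "SystemError: NULL result ..."
HEADROOM = 0.5           # keep the payload at half the limit, to be safe

def safe_chunksize(sample_docs, wanted_chunksize):
    """Return a chunksize whose pickled payload should fit under the limit."""
    per_doc = len(pickle.dumps(sample_docs)) / max(len(sample_docs), 1)
    max_docs = int(PICKLE_LIMIT * HEADROOM / max(per_doc, 1.0))
    return max(1, min(wanted_chunksize, max_docs))

# Documents as sparse (token_id, count) vectors.
sample = [[(i, 1) for i in range(100)] for _ in range(50)]
chunk = safe_chunksize(sample, wanted_chunksize=2000)
# Pass the result as the model's chunksize; a filtered vocabulary
# shrinks the per-chunk payload the same way.
```

The same arithmetic explains why many small topics work while huge documents or vocabularies fail at any topic count.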
ubuntu@ip-172-31-33-28:~$ python lda_en.py
2015-07-01 00:13:06,021 : INFO : initializing corpus reader from <bz2.BZ2File object at 0x7fd29bf1eb90>
2015-07-01 00:13:06,052 : INFO : accepted corpus with 3831719 documents, 100000 features, 595701551 non-zero entries
MmCorpus(3831719 documents, 100000 features, 595701551 non-zero entries)
2015-07-01 00:13:06,059 : INFO : using symmetric alpha at 0.0001
2015-07-01 00:13:06,060 : INFO : using serial LDA version on this node
2015-07-01 00:16:27,970 : INFO : running online LDA training, 10000 topics, 1 passes over the supplied corpus of 3831719 documents, updating every 32000 documents, evaluating every ~320000 documents, iterating 50x with a convergence threshold of 0.001000
2015-07-01 00:16:27,976 : INFO : training LDA model using 32 processes
2015-07-01 00:16:33,442 : INFO : PROGRESS: pass 0, dispatched chunk #0 = documents up to #1000/3831719, outstanding queue size 1
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/queues.py", line 264, in _feed
send(obj)
SystemError: NULL result without error in PyObject_Call
2015-07-01 00:16:40,488 : INFO : PROGRESS: pass 0, dispatched chunk #1 = documents up to #2000/3831719, outstanding queue size 2
2015-07-01 00:16:45,457 : INFO : PROGRESS: pass 0, dispatched chunk #2 = documents up to #3000/3831719, outstanding queue size 3
2015-07-01 00:16:50,220 : INFO : PROGRESS: pass 0, dispatched chunk #3 = documents up to #4000/3831719, outstanding queue size 4