
ParseError: no element found: line 45, column 0 #1496

Closed
fuzihaofzh opened this issue Jul 21, 2017 · 4 comments

Comments

@fuzihaofzh

Hi, I downloaded the dump from https://dumps.wikimedia.org/enwiki/20170701/enwiki-20170701-pages-articles-multistream.xml.bz2
and ran the following code:

from gensim.corpora import WikiCorpus
inp = "enwiki-20170701-pages-articles-multistream.xml.bz2"
wiki = WikiCorpus(inp, lemmatize=False, dictionary={})
a = wiki.get_texts()
next(a)  # built-in next() works on both Python 2 and Python 3 generators

and got the following error:

Process InputQueue-24:
Traceback (most recent call last):
  File "/home/tkzif/ProgramFiles/anaconda2/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/tkzif/ProgramFiles/anaconda2/lib/python2.7/site-packages/gensim/utils.py", line 843, in run
    wrapped_chunk = [list(chunk)]
  File "/home/tkzif/ProgramFiles/anaconda2/lib/python2.7/site-packages/gensim/corpora/wikicorpus.py", line 310, in <genexpr>
    ((text, self.lemmatize, title, pageid)
  File "/home/tkzif/ProgramFiles/anaconda2/lib/python2.7/site-packages/gensim/corpora/wikicorpus.py", line 215, in extract_pages
    for elem in elems:
  File "/home/tkzif/ProgramFiles/anaconda2/lib/python2.7/site-packages/gensim/corpora/wikicorpus.py", line 200, in <genexpr>
    elems = (elem for _, elem in iterparse(f, events=("end",)))
  File "<string>", line 107, in next
ParseError: no element found: line 45, column 0

Is there anything wrong with my approach?

@xurannlpr

I also hit this error. Have you already fixed it?

@fuzihaofzh
Author

@xurannlpr Yes, I fixed the error. It was caused by a problem with the file I downloaded. Please download the file exactly as described in the gensim documentation; that may help. Thanks.
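
For anyone who wants to rule out a truncated or corrupted download before rerunning WikiCorpus, here is a minimal sketch that streams the whole archive through Python's standard `bz2` module. The helper name `bz2_is_complete` and the chunk size are illustrative assumptions, not part of gensim. It only checks that the archive decompresses to the end; it does not verify that the content matches the published dump.

```python
import bz2


def bz2_is_complete(path, chunk_size=1 << 20):
    """Return True if the bz2 file at `path` decompresses to the end.

    Hypothetical helper for illustration only. A truncated archive
    raises EOFError while reading; invalid data (or an unreadable
    path) raises OSError. Both are reported as False here.
    """
    try:
        with bz2.open(path, "rb") as fh:
            # Read and discard the data chunk by chunk so that memory
            # use stays flat even for a multi-gigabyte dump.
            while fh.read(chunk_size):
                pass
    except (EOFError, OSError):
        return False
    return True
```

Usage would look like `bz2_is_complete("enwiki-20170701-pages-articles.xml.bz2")`. Expect this to take a while on a full dump, since every byte is decompressed.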

@polm
Contributor

polm commented Aug 1, 2017

Looks like the issue is that "multistream" dumps aren't supported. This seems to have confused other people too...

https://groups.google.com/forum/#!msg/gensim/vA97TdOkljk/ty1YPuZWAQAJ
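
One plausible way to see why a multistream file could produce "no element found" is the small reproduction below. It is a sketch under stated assumptions, not a trace of gensim's actual code path: the two-part XML is made up, and `try_parse` is a hypothetical helper. It builds a tiny archive from two concatenated bz2 streams, decompresses only the first stream (as a reader without multi-stream support would), and feeds the result to the XML parser.

```python
import bz2
import xml.etree.ElementTree as ET

# Two independently compressed bz2 streams that together hold one XML
# document, mimicking in miniature how a "multistream" dump is laid out.
part1 = bz2.compress(b"<mediawiki>\n<page><title>A</title></page>\n")
part2 = bz2.compress(b"<page><title>B</title></page>\n</mediawiki>\n")
multistream = part1 + part2

# BZ2Decompressor handles exactly one stream: it stops at the end of the
# first stream and leaves the remaining bytes in `unused_data`.
single = bz2.BZ2Decompressor()
first_only = single.decompress(multistream)
leftover = single.unused_data  # the second stream, never decompressed


def try_parse(data):
    """Hypothetical helper: return "ok" or the parser's error message."""
    try:
        ET.fromstring(data)
        return "ok"
    except ET.ParseError as err:
        return str(err)


# The document is cut off before </mediawiki>, so parsing fails.
truncated_result = try_parse(first_only)

# bz2.decompress() in Python 3.3+ reads all concatenated streams.
full_result = try_parse(bz2.decompress(multistream))

print(truncated_result)
print(full_result)
```

The first print shows a "no element found" parse error, the same kind of message as in the traceback above, while the second prints "ok" because the full document was recovered.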

menshikh-iv pushed a commit that referenced this issue Sep 18, 2017
* Add comment explaining lack of multistream support

See #1496, looks like this has confused some people. -POLM

* Add file patterns to documentation
@menshikh-iv
Contributor

Resolved in #1515
