Write end dead child number [34542970] in file [Comments.xml] whilst parsing archive #10

Open
sprnza opened this issue Mar 10, 2022 · 3 comments

sprnza commented Mar 10, 2022

I've downloaded these files:

stack ➜  stackoff ls stackoverflow
Sites.xml  stackoverflow.com-Comments.7z  stackoverflow.com-Posts.7z  stackoverflow.com-Users.7z

When I try to index them I get this error:

org.tools4j.stacked.index.FileInZipParserException: Write end dead child number [34542970] in file [Comments.xml]
        at org.tools4j.stacked.index.FileInZipParser.start(SeZipFileParser.kt:189)
        at org.tools4j.stacked.index.ExtractCallback$getStream$2.run(SeZipFileParser.kt:104)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.tools4j.stacked.index.XmlFileParserException: Write end dead child number [34542970]
        at org.tools4j.stacked.index.XmlFileParser.parseElements(XmlFileParser.kt:64)
        at org.tools4j.stacked.index.XmlFileParser.parse(XmlFileParser.kt:20)
        at org.tools4j.stacked.index.FileInZipParser.start(SeZipFileParser.kt:187)
        ... 6 more
Caused by: com.ctc.wstx.exc.WstxIOException: Write end dead
        at com.ctc.wstx.sr.StreamScanner.constructFromIOE(StreamScanner.java:640)
        at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:1004)
        at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:1043)
        at com.ctc.wstx.sr.StreamScanner.getNextChar(StreamScanner.java:789)
        at com.ctc.wstx.sr.BasicStreamReader.parseAttrValue(BasicStreamReader.java:1973)
        at com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:3145)
        at com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:3043)
        at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2919)
        at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1123)
        at org.codehaus.stax2.ri.Stax2EventReaderImpl.nextEvent(Stax2EventReaderImpl.java:255)
        at org.tools4j.stacked.index.XmlFileParser.parseElements(XmlFileParser.kt:37)
        ... 8 more
Caused by: java.io.IOException: Write end dead
        at java.base/java.io.PipedInputStream.read(PipedInputStream.java:310)
        at java.base/java.io.PipedInputStream.read(PipedInputStream.java:377)
        at com.ctc.wstx.io.BaseReader.readBytes(BaseReader.java:155)
        at com.ctc.wstx.io.UTF8Reader.loadMore(UTF8Reader.java:369)
        at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:112)
        at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:89)
        at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
        at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:998)
        ... 17 more
14:01:22.097 [Thread-422250] ERROR org.tools4j.stacked.index.SeDirParser - Write end dead child number [34542970] in file [Comments.xml] whilst parsing archive [/mnt/data/service-data/stackoff/stackoverflow/stackoverflow.com-Comments.7z]

Author

sprnza commented Mar 10, 2022

What could possibly be wrong?

morganrivers commented Mar 16, 2022

I got a very similar error. It seems to be the same as the first error mentioned in issue #8.

Moving the Stack Overflow data to a new hard drive might be a fix?

Here's my log:

19:55:47.696 [pool-2-thread-1] DEBUG org.tools4j.stacked.index.XmlFileParser - 61000000 comments rows read from xml...
19:56:05.055 [pool-2-thread-1] DEBUG org.tools4j.stacked.index.XmlFileParser - 62000000 comments rows read from xml...
19:56:20.022 [pool-2-thread-1] DEBUG org.tools4j.stacked.index.XmlFileParser - 63000000 comments rows read from xml...
19:56:36.413 [pool-2-thread-1] DEBUG org.tools4j.stacked.index.XmlFileParser - 64000000 comments rows read from xml...
19:56:54.268 [pool-2-thread-1] DEBUG org.tools4j.stacked.index.XmlFileParser - 65000000 comments rows read from xml...
19:57:09.460 [pool-2-thread-1] DEBUG org.tools4j.stacked.index.XmlFileParser - 66000000 comments rows read from xml...
19:57:27.304 [pool-2-thread-1] DEBUG org.tools4j.stacked.index.XmlFileParser - 67000000 comments rows read from xml...
19:57:44.881 [pool-2-thread-1] DEBUG org.tools4j.stacked.index.XmlFileParser - 68000000 comments rows read from xml...
org.tools4j.stacked.index.FileInZipParserException: Write end dead child number [68090165] in file [Comments.xml]
        at org.tools4j.stacked.index.FileInZipParser.start(SeZipFileParser.kt:189)
        at org.tools4j.stacked.index.ExtractCallback$getStream$2.run(SeZipFileParser.kt:104)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.tools4j.stacked.index.XmlFileParserException: Write end dead child number [68090165]
        at org.tools4j.stacked.index.XmlFileParser.parseElements(XmlFileParser.kt:64)
        at org.tools4j.stacked.index.XmlFileParser.parse(XmlFileParser.kt:20)
        at org.tools4j.stacked.index.FileInZipParser.start(SeZipFileParser.kt:187)
        ... 6 more
Caused by: com.ctc.wstx.exc.WstxIOException: Write end dead
        at com.ctc.wstx.sr.StreamScanner.constructFromIOE(StreamScanner.java:640)
        at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:1004)
        at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:1043)
        at com.ctc.wstx.sr.StreamScanner.getNextChar(StreamScanner.java:789)
        at com.ctc.wstx.sr.BasicStreamReader.parseAttrValue(BasicStreamReader.java:1973)
        at com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:3145)
        at com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:3043)
        at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2919)
        at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1123)
        at org.codehaus.stax2.ri.Stax2EventReaderImpl.nextEvent(Stax2EventReaderImpl.java:255)
        at org.tools4j.stacked.index.XmlFileParser.parseElements(XmlFileParser.kt:37)
        ... 8 more
Caused by: java.io.IOException: Write end dead
        at java.base/java.io.PipedInputStream.read(PipedInputStream.java:310)
        at java.base/java.io.PipedInputStream.read(PipedInputStream.java:377)
        at com.ctc.wstx.io.BaseReader.readBytes(BaseReader.java:155)
        at com.ctc.wstx.io.UTF8Reader.loadMore(UTF8Reader.java:369)
        at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:112)
        at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:89)
        at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
        at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:998)
        ... 17 more
19:57:48.327 [Thread-14] ERROR org.tools4j.stacked.index.SeDirParser - Write end dead child number [68090165] in file [Comments.xml] whilst parsing archive [/mnt/old_hdd/STACKOVERFLOW/stackoverflow.com-Comments.7z]
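
One way to test the hard drive idea before moving anything might be to hash the archive on the current disk and compare the result with the hash of a second copy (for example a fresh download). This is only a sketch: the class name `ArchiveChecksum` and its methods are made up for illustration and are not part of the indexer, and matching hashes would only rule out a damaged file, not a problem elsewhere.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the indexer: hashes a file so two copies can be compared.
class ArchiveChecksum {

    // Streams the input through SHA-256 and returns the digest as lowercase hex.
    static String sha256Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[1 << 16];
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Convenience overload that opens the file and closes it when done.
    static String sha256Hex(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            return sha256Hex(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass the paths of the copies to compare; identical files print identical hashes.
        for (String arg : args) {
            System.out.println(sha256Hex(Paths.get(arg)) + "  " + arg);
        }
    }
}
```

If the two hashes differ, or hashing the same file twice gives different results, the storage or the download would be the first suspect.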

TuringTux commented Jul 18, 2022

I'm experiencing this issue too. I've tried it a few times and it crashes at a different child number on every run, so I suppose (as did the previous posters) that it has nothing to do with the input data itself (a malformed string or something like that).

I did monitor CPU, RAM and disk space usage, all were within normal parameters for the entire run...
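
For context on the innermost exception in the traces above: `java.io.PipedInputStream` throws `IOException: Write end dead` when the thread that last wrote to the connected `PipedOutputStream` has terminated without closing it and the buffer has run empty. So the XML parser may just be the place where a failure on the producing side (presumably the thread feeding the extracted data into the pipe) becomes visible. Below is a minimal stand-alone sketch of that JDK behaviour. The class `WriteEndDeadDemo` is made up for illustration and is not the project's code.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Hypothetical demo, not project code: shows when the JDK reports "Write end dead".
class WriteEndDeadDemo {

    // Returns the message of the IOException seen by the reader, or null if none occurred.
    static String readAfterWriterDies() {
        try {
            PipedInputStream in = new PipedInputStream();
            PipedOutputStream out = new PipedOutputStream(in);

            Thread writer = new Thread(() -> {
                try {
                    out.write(new byte[] {1, 2, 3});
                    // Simulates a producer that fails: the thread ends without calling out.close().
                } catch (IOException ignored) {
                    // Not expected in this sketch.
                }
            });
            writer.start();
            writer.join();

            // The three buffered bytes are still readable; the next read finds the
            // buffer empty and the writer thread dead, so it throws.
            while (in.read() != -1) {
                // keep draining
            }
            return null;
        } catch (IOException e) {
            return e.getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(readAfterWriterDies()); // prints "Write end dead"
    }
}
```

If that reading is right, the more informative error would be whatever ended the writing thread, which may show up earlier in the log or may have been swallowed.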

Update: My mitigation for now is to host an instance of Kiwix-Serve, which is an entirely different application but can also serve an offline dump of Stack Overflow answers (among other sources). Maybe this can help some future visitors with the same problem, too :)
