
Error Building a corpus from Italian Wikipedia #7

Open
raymanrt opened this issue Aug 31, 2011 · 11 comments

@raymanrt

Hi, the command I ran is:
pig-0.8.1/bin/pig -x local -p PIGNLPROC_JAR=pignlproc/target/pignlproc-0.1.0-SNAPSHOT.jar -p LANG=it -p INPUT=/home/rayman/Scrivania/wiki_dump/itwiki-latest-pages-articles.xml -p OUTPUT=workspace pignlproc/examples/ner-corpus/01_extract_sentences_with_links.pig

With pig-0.8.1 it seems to work well on a single chunk of the dump, so I decided to process the whole dump (I have only one machine, but there's no hurry).
After a couple of hours of processing, I get the following error:

2011-08-31 11:45:25,856 [Thread-624] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0003
java.io.IOException: Illegal partition for Null: false index: 0 (http://it.wikipedia.org/wiki/Regione_di_Worodougou,Diocesi di Odienné,4) (3)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:904)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:541)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:116)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:239)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
2011-08-31 11:45:26,970 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0003 has failed! Stop running all dependent jobs
2011-08-31 11:45:26,972 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2011-08-31 11:45:26,973 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2011-08-31 11:45:26,973 [main] INFO org.apache.pig.tools.pigstats.PigStats - Detected Local mode. Stats reported below may be incomplete
2011-08-31 11:45:26,975 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features
0.20.2 0.8.1 rayman 2011-08-31 11:09:15 2011-08-31 11:45:26 ORDER_BY,FILTER

Some jobs have failed! Stop running all dependent jobs

Job Stats (time in seconds):
JobId Alias Feature Outputs
job_local_0001 noredirect,parsed,sentences,stored MAP_ONLY
job_local_0002 ordered SAMPLER

Failed Jobs:
JobId Alias Feature Message Outputs
job_local_0003 ordered ORDER_BY Message: Job failed! file:///home/rayman/ner-training-itwiki/workspace/it/sentences_with_links,

Input(s):
Successfully read records from: "/home/rayman/Scrivania/wiki_dump/itwiki-latest-pages-articles.xml"

Output(s):
Failed to produce result in "file:///home/rayman/ner-training-itwiki/workspace/it/sentences_with_links"

Job DAG:
job_local_0001 -> job_local_0002,
job_local_0002 -> job_local_0003,
job_local_0003

2011-08-31 11:45:26,975 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-08-31 11:45:26,977 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-08-31 11:45:26,978 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2011-08-31 11:45:26,980 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-08-31 11:45:26,984 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
Details at logfile: /home/rayman/ner-training-itwiki/pig_1314781753331.log

And the log file says:

Pig Stack Trace

ERROR 2244: Job failed, hadoop does not return any error message

org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, hadoop does not return any error message
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:119)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:172)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
at org.apache.pig.Main.run(Main.java:500)

at org.apache.pig.Main.main(Main.java:107)

pig_1314781753331.log (END)

What do you think about it?

Riccardo

@ogrisel
Owner

ogrisel commented Aug 31, 2011

Hmm, there do not seem to be any pignlproc-related packages in the stack trace... Is this error random or systematically reproducible?

@raymanrt
Author

Executing the same script on a different machine gives me the following exception:

2011-08-31 14:07:11,305 [Thread-622] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2011-08-31 14:07:11,325 [Thread-622] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2011-08-31 14:07:11,325 [Thread-622] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2011-08-31 14:07:11,326 [Thread-622] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-08-31 14:07:11,326 [Thread-622] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2011-08-31 14:07:11,327 [Thread-622] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2011-08-31 14:07:11,736 [Thread-622] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0003
java.io.IOException: Illegal partition for Null: false index: 0 (http://it.wikipedia.org/wiki/Eccitone,Scintillatore,15) (1)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:904)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:541)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:116)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:239)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
2011-08-31 14:07:14,917 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0003 has failed! Stop running all dependent jobs
2011-08-31 14:07:14,919 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2011-08-31 14:07:14,919 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2011-08-31 14:07:14,919 [main] INFO org.apache.pig.tools.pigstats.PigStats - Detected Local mode. Stats reported below may be incomplete
2011-08-31 14:07:14,922 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features
0.20.2 0.8.1 brainaetic 2011-08-31 13:41:30 2011-08-31 14:07:14 ORDER_BY,FILTER

Some jobs have failed! Stop running all dependent jobs

Job Stats (time in seconds):
JobId Alias Feature Outputs
job_local_0001 noredirect,parsed,sentences,stored MAP_ONLY
job_local_0002 ordered SAMPLER

Failed Jobs:
JobId Alias Feature Message Outputs
job_local_0003 ordered ORDER_BY Message: Job failed! file:///home/brainaetic/rayman/ner-training-itwiki/workspace/it/sentences_with_links,

Input(s):
Successfully read records from: "file:///home/brainaetic/rayman/itwiki-latest-pages-articles.xml"

Output(s):
Failed to produce result in "file:///home/brainaetic/rayman/ner-training-itwiki/workspace/it/sentences_with_links"

Job DAG:
job_local_0001 -> job_local_0002,
job_local_0002 -> job_local_0003,
job_local_0003

2011-08-31 14:07:14,922 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-08-31 14:07:14,924 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-08-31 14:07:14,924 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2011-08-31 14:07:14,927 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-08-31 14:07:14,932 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
Details at logfile: /home/brainaetic/rayman/ner-training-itwiki/pig_1314790889277.log

Pig Stack Trace

ERROR 2244: Job failed, hadoop does not return any error message

org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, hadoop does not return any error message
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:119)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:172)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
at org.apache.pig.Main.run(Main.java:500)
at org.apache.pig.Main.main(Main.java:107)

@ogrisel
Owner

ogrisel commented Aug 31, 2011

Unfortunately I have no idea what's happening. The best way to proceed would be to isolate the few Wikipedia articles that trigger the failure (assuming they are always the same) in a unit test, so we can use the debugger and trace the origin of the issue.
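
A rough sketch of that isolation step (the shell below is illustrative, not part of pignlproc; the titles are just the source pages from the stack traces above, so they are only a guess at the actual trigger set): copy the dump header plus a few suspect <page> blocks into a small XML file that can be replayed through the same Pig script or loaded into a unit test.

dump=itwiki-latest-pages-articles.xml
out=itwiki-suspect-pages.xml

# dump header: everything up to and including </siteinfo>
sed -n '1,/<\/siteinfo>/p' "$dump" > "$out"

# append each suspect <page> block; awk buffers one page at a time and
# prints it when the <title> matches
for title in "Regione di Worodougou" "Regno di Sardegna"; do
  awk -v t="$title" '
    /<page>/   { buf = ""; keep = 0 }
               { buf = buf $0 "\n" }
    $0 ~ ("<title>" t "</title>") { keep = 1 }
    /<\/page>/ { if (keep) printf "%s", buf }
  ' "$dump" >> "$out"
done

echo '</mediawiki>' >> "$out"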

@raymanrt
Author

The third execution failed with:

java.io.IOException: Illegal partition for Null: false index: 0 (http://it.wikipedia.org/wiki/Regno_di_Sardegna,Santa Margherita di Staffora,13) (3)

I'll try another run, but so far the failing pages are all different...

@raymanrt
Author

And again:

java.io.IOException: Illegal partition for Null: false index: 0 (http://it.wikipedia.org/wiki/Repubblica_Socialista_Federale_di_Jugoslavia,Luciano Sušanj,2) (3)

@renaud

renaud commented Feb 28, 2012

Same here:


2012-02-28 19:31:51,008 [Thread-1469] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0003
java.io.IOException: Illegal partition for Null: false index: 0 (http://fr.wikipedia.org/wiki/Casimiro_Nay,Projet:Football/Index/C,1) (1)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:904)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:541)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:116)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:239)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)

HadoopVersion   PigVersion  UserId  StartedAt   FinishedAt  Features
0.20.2  0.8.1   richarde    2012-02-28 18:24:06 2012-02-28 19:31:54 ORDER_BY,FILTER

Failed Jobs:
JobId   Alias   Feature Message Outputs
job_local_0003  ordered ORDER_BY    Message: Job failed!    file:///pignlproc/output/wiki_dump_parsed/fr/sentences_with_links,

@renaud

renaud commented Feb 28, 2012

Turning off sorting for now:

diff --git a/examples/ner-corpus/01_extract_sentences_with_links.pig b/examples/ner-corpus/01_extract_sentences_with_links.pig
index ead569e..2767b39 100644
--- a/examples/ner-corpus/01_extract_sentences_with_links.pig
+++ b/examples/ner-corpus/01_extract_sentences_with_links.pig
@@ -28,6 +28,8 @@ sentences = FOREACH projected
 stored = FOREACH sentences
   GENERATE title, sentenceOrder, linkTarget, linkBegin, linkEnd, sentence;

+STORE stored INTO '$OUTPUT/$LANG/sentences_with_links_unordered';
+
 -- Ensure ordering for fast merge with type info later
-ordered = ORDER stored BY linkTarget ASC, title ASC, sentenceOrder ASC;
-STORE ordered INTO '$OUTPUT/$LANG/sentences_with_links';
+-- ordered = ORDER stored BY linkTarget ASC, title ASC, sentenceOrder ASC;
+-- STORE ordered INTO '$OUTPUT/$LANG/sentences_with_links';

@renaud

renaud commented Feb 29, 2012

For the record, changing to hadoop-0.20.2 (I had previously tried hadoop-0.20.205.0 and hadoop-0.23.1) and switching to a single-node setup (instead of local mode) worked for me.
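
For anyone reproducing that setup, a hypothetical sketch (the HDFS paths are illustrative, not taken from this thread, and the parameters simply mirror the command at the top of the issue): put the dump into HDFS on the pseudo-distributed hadoop-0.20.2 node, then run the same script with -x mapreduce instead of -x local.

# upload the dump to the single-node cluster's HDFS
hadoop fs -mkdir /user/rayman
hadoop fs -put itwiki-latest-pages-articles.xml /user/rayman/itwiki-latest-pages-articles.xml

# same script and parameters, but against the cluster instead of Pig's local mode
pig-0.8.1/bin/pig -x mapreduce \
  -p PIGNLPROC_JAR=pignlproc/target/pignlproc-0.1.0-SNAPSHOT.jar \
  -p LANG=it \
  -p INPUT=/user/rayman/itwiki-latest-pages-articles.xml \
  -p OUTPUT=/user/rayman/workspace \
  pignlproc/examples/ner-corpus/01_extract_sentences_with_links.pig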

@ogrisel
Owner

ogrisel commented Feb 29, 2012

Hmm, so this might be a Pig / Hadoop versioning bug?

@renaud

renaud commented Feb 29, 2012

I would assume...

chrishokamp pushed a commit to chrishokamp/pignlproc that referenced this issue on May 15, 2013: "Pig local mode compatibility."
fredcons pushed a commit to fredcons/pignlproc that referenced this issue on Oct 14, 2015.
@qwaider

qwaider commented Nov 26, 2015

Setting default_parallel to 2 in that Pig file fixes the bug for the local test.
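
A minimal sketch of that change, assuming it goes near the top of 01_extract_sentences_with_links.pig (only the SET line is the suggested edit; the remaining lines are abbreviated from the existing script):

-- workaround described above: run the local test with two reduce tasks
SET default_parallel 2;

-- ... existing LOAD / FILTER / FOREACH statements ...

ordered = ORDER stored BY linkTarget ASC, title ASC, sentenceOrder ASC;
STORE ordered INTO '$OUTPUT/$LANG/sentences_with_links';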

Bests,
Mohammed Qwaider
