TEZ-4441: TezAppMaster may stuck because of reportError skip send err… #236

Merged
merged 3 commits into from
Aug 29, 2022

Conversation

zhengchenyu
Contributor

@tez-yetus

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 34m 23s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ master Compile Tests _
+1 💚 mvninstall 15m 8s master passed
+1 💚 compile 0m 54s master passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 47s master passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 1m 28s master passed
+1 💚 javadoc 1m 1s master passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 12s master passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+0 🆗 spotbugs 1m 56s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 1m 54s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
+1 💚 compile 0m 33s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 33s the patch passed
+1 💚 compile 0m 30s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 javac 0m 30s the patch passed
-0 ⚠️ checkstyle 0m 22s tez-dag: The patch generated 4 new + 112 unchanged - 1 fixed = 116 total (was 113)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 javadoc 0m 27s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 26s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 findbugs 1m 21s the patch passed
_ Other Tests _
+1 💚 unit 5m 35s tez-dag in the patch passed.
+1 💚 asflicense 0m 16s The patch does not generate ASF License warnings.
67m 47s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-236/1/artifact/out/Dockerfile
GITHUB PR #236
JIRA Issue TEZ-4441
Optional Tests dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile
uname Linux 64cf49206596 4.15.0-169-generic #177-Ubuntu SMP Thu Feb 3 10:50:38 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/tez.sh
git revision master / 621a831
Default Java Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
checkstyle https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-236/1/artifact/out/diff-checkstyle-tez-dag.txt
Test Results https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-236/1/testReport/
Max. process+thread count 217 (vs. ulimit of 5500)
modules C: tez-dag U: tez-dag
Console output https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-236/1/console
versions git=2.25.1 maven=3.6.3 findbugs=3.0.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@@ -910,7 +910,8 @@ public void reportError(int taskSchedulerIndex, ServicePluginError servicePlugin
     LOG.info("Error reported by scheduler {} - {}",
         Utils.getTaskSchedulerIdentifierString(taskSchedulerIndex, appContext) + ": " +
         diagnostics);
-    if (taskSchedulerDescriptors[taskSchedulerIndex].getClassName().equals(yarnSchedulerClassName)) {
+    if (taskSchedulerDescriptors[taskSchedulerIndex].getEntityName()
Contributor

I guess this is the actual fix

just a note, what are the values in your case:

taskSchedulerDescriptors[taskSchedulerIndex].getClassName()
yarnSchedulerClassName
taskSchedulerDescriptors[taskSchedulerIndex].getEntityName()

Contributor Author

Yes, this is the only actual fix.

  • Before this PR

| method | return value |
| --- | --- |
| taskSchedulerDescriptors[taskSchedulerIndex].getClassName() | null |
| yarnSchedulerClassName | "org.apache.tez.dag.app.rm.YarnTaskSchedulerService" |

taskSchedulerDescriptors[taskSchedulerIndex].getClassName() comes from the taskSchedulerDescriptors array that is populated in DAGAppMaster::serviceInit. In DAGAppMaster::parsePlugin, when the NamedEntityDescriptor for the Tez YARN plugin is constructed, its className is always null.

yarnSchedulerClassName is read from tez.am.yarn.scheduler.class; its default value is "org.apache.tez.dag.app.rm.YarnTaskSchedulerService".

So for the Tez YARN plugin, taskSchedulerDescriptors[taskSchedulerIndex].getClassName() never equals yarnSchedulerClassName, and the branch guarded by that check is never taken.

  • After this PR
taskSchedulerDescriptors[taskSchedulerIndex].getEntityName() will return "TezYarn"
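To make the before/after difference concrete, here is a minimal, self-contained sketch. It does not use the real Tez classes: `Descriptor` is a hypothetical stand-in for NamedEntityDescriptor with only the two fields discussed above, and the method names `isYarnByClassName` and `isYarnByEntityName` are invented for illustration. The string values are the ones quoted in this thread. The class-name comparison uses `Objects.equals` so that the sketch itself stays null-safe; that is a choice made for the sketch, not a statement about the original code.

```java
import java.util.Objects;

public class SchedulerCheckSketch {

  // Hypothetical stand-in for NamedEntityDescriptor: just a name and an optional class name.
  static final class Descriptor {
    private final String entityName;
    private final String className; // null for the built-in YARN plugin, as described above

    Descriptor(String entityName, String className) {
      this.entityName = entityName;
      this.className = className;
    }

    String getEntityName() {
      return entityName;
    }

    String getClassName() {
      return className;
    }
  }

  // Values quoted in the discussion above.
  static final String YARN_SCHEDULER_CLASS = "org.apache.tez.dag.app.rm.YarnTaskSchedulerService";
  static final String TEZ_YARN_PLUGIN_NAME = "TezYarn";

  // Pre-fix style: identify the YARN scheduler by class name.
  // Objects.equals keeps this sketch null-safe.
  static boolean isYarnByClassName(Descriptor d) {
    return Objects.equals(d.getClassName(), YARN_SCHEDULER_CLASS);
  }

  // Post-fix style: identify the YARN scheduler by its entity name.
  static boolean isYarnByEntityName(Descriptor d) {
    return TEZ_YARN_PLUGIN_NAME.equals(d.getEntityName());
  }

  public static void main(String[] args) {
    // The built-in plugin is registered with a name but no class name.
    Descriptor yarn = new Descriptor(TEZ_YARN_PLUGIN_NAME, null);
    System.out.println("match by class name:  " + isYarnByClassName(yarn));  // false
    System.out.println("match by entity name: " + isYarnByEntityName(yarn)); // true
  }
}
```

With a null class name the first check can never succeed, whereas the entity name is always set for the built-in plugin.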

Contributor

I see, nice catch:

TezConstants.getTezYarnServicePluginName(), null).setUserPayload(defaultPayload);

We simply don't fill in the class name for the built-in plugin, so we should not rely on it; it should only be used in createCustomTaskScheduler.
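The rule stated here (match built-in plugins by name, consult the class name only for custom plugins) can be sketched as follows. This is an illustration under assumptions, not the Tez implementation: `resolveSchedulerClass` and `com.example.MyScheduler` are hypothetical, and only the "TezYarn" name and the default YARN scheduler class come from this thread.

```java
import java.util.Objects;

public class SchedulerFactorySketch {

  // Values quoted in the discussion above.
  static final String TEZ_YARN_PLUGIN_NAME = "TezYarn";
  static final String YARN_SCHEDULER_CLASS = "org.apache.tez.dag.app.rm.YarnTaskSchedulerService";

  // Hypothetical helper: decide which scheduler class a plugin descriptor maps to.
  // Built-in plugins are recognised by entity name; className is read only on the custom path.
  static String resolveSchedulerClass(String entityName, String className) {
    if (TEZ_YARN_PLUGIN_NAME.equals(entityName)) {
      // Built-in plugin: className is expected to be null and is ignored.
      return YARN_SCHEDULER_CLASS;
    }
    // Custom plugin: this is the only place where a class name is required.
    return Objects.requireNonNull(className,
        "custom scheduler '" + entityName + "' must provide a class name");
  }

  public static void main(String[] args) {
    System.out.println(resolveSchedulerClass("TezYarn", null));
    // "com.example.MyScheduler" is a made-up class name used only for this example.
    System.out.println(resolveSchedulerClass("MyPlugin", "com.example.MyScheduler"));
  }
}
```

Keeping the class-name lookup on the custom path means a missing class name surfaces as an explicit error there instead of silently failing a comparison elsewhere.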

@tez-yetus

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 37m 24s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ master Compile Tests _
+1 💚 mvninstall 15m 34s master passed
+1 💚 compile 1m 0s master passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 49s master passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 1m 22s master passed
+1 💚 javadoc 0m 56s master passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 50s master passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+0 🆗 spotbugs 1m 54s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 1m 52s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 29s the patch passed
+1 💚 compile 0m 34s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 34s the patch passed
+1 💚 compile 0m 30s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 javac 0m 29s the patch passed
-0 ⚠️ checkstyle 0m 24s tez-dag: The patch generated 4 new + 112 unchanged - 1 fixed = 116 total (was 113)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 javadoc 0m 28s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 28s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 findbugs 1m 23s the patch passed
_ Other Tests _
+1 💚 unit 5m 35s tez-dag in the patch passed.
+1 💚 asflicense 0m 16s The patch does not generate ASF License warnings.
70m 52s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-236/2/artifact/out/Dockerfile
GITHUB PR #236
JIRA Issue TEZ-4441
Optional Tests dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile
uname Linux 4d56a61e7e30 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/tez.sh
git revision master / 621a831
Default Java Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
checkstyle https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-236/2/artifact/out/diff-checkstyle-tez-dag.txt
Test Results https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-236/2/testReport/
Max. process+thread count 253 (vs. ulimit of 5500)
modules C: tez-dag U: tez-dag
Console output https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-236/2/console
versions git=2.25.1 maven=3.6.3 findbugs=3.0.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

@tez-yetus

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 36s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ master Compile Tests _
+1 💚 mvninstall 15m 29s master passed
+1 💚 compile 1m 2s master passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 50s master passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 1m 20s master passed
+1 💚 javadoc 0m 58s master passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 50s master passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+0 🆗 spotbugs 1m 59s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 1m 57s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
+1 💚 compile 0m 34s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 34s the patch passed
+1 💚 compile 0m 29s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 javac 0m 29s the patch passed
-0 ⚠️ checkstyle 0m 25s tez-dag: The patch generated 1 new + 112 unchanged - 1 fixed = 113 total (was 113)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 javadoc 0m 27s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 27s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 findbugs 1m 23s the patch passed
_ Other Tests _
+1 💚 unit 5m 32s tez-dag in the patch passed.
+1 💚 asflicense 0m 16s The patch does not generate ASF License warnings.
34m 11s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-236/3/artifact/out/Dockerfile
GITHUB PR #236
JIRA Issue TEZ-4441
Optional Tests dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile
uname Linux 5ed386eed7be 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/tez.sh
git revision master / 621a831
Default Java Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
checkstyle https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-236/3/artifact/out/diff-checkstyle-tez-dag.txt
Test Results https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-236/3/testReport/
Max. process+thread count 255 (vs. ulimit of 5500)
modules C: tez-dag U: tez-dag
Console output https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-236/3/console
versions git=2.25.1 maven=3.6.3 findbugs=3.0.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

@abstractdog abstractdog merged commit 55b6031 into apache:master Aug 29, 2022
asfgit pushed a commit that referenced this pull request Aug 29, 2022
…or event (#236) (zhengchenyu reviewed by Laszlo Bodor)
udaynpusa pushed a commit to mapr/tez that referenced this pull request Jan 30, 2024
…or event (apache#236) (zhengchenyu reviewed by Laszlo Bodor)

(cherry picked from commit 55b6031)