TEZ-3363: Delete intermediate data at the vertex level for Shuffle Handler #60

Merged
merged 1 commit into from
Mar 16, 2022

Contributor

@shameersss1 shameersss1 commented Feb 20, 2020

For applications like Pig, where processing times can be very long, an application may choose to delete intermediate data for a sub-DAG. For example, once a DAG has synced data to HDFS, all upstream intermediate data can be safely deleted.

@shameersss1
Contributor Author

@abstractdog Could you please review the changes?

@abstractdog
Contributor

abstractdog commented Feb 7, 2022

@shameersss1: I'm more than interested in this patch, let me have some time to review it
this needs more thorough testing than TEZ-4129, as TEZ-4129 was on the unhappy code path (failed attempts), but this one seriously affects shuffle
could you please describe what kind of testing you have done with this patch?

@shameersss1
Contributor Author

@abstractdog - Thanks for your interest in reviewing this. It has been pending for a while now.

The high-level idea behind this feature is that whenever all the dependent (child) vertices of a vertex have succeeded, we delete the shuffle data of that parent vertex.

Testing Procedure

  1. I picked a query that spawns a big DAG (preferably a TPC-DS query) and runs for quite some time. I set the maximum number of reducers to 1 so that the final stage takes a long time.
  2. I checked that the shuffle data of the parent vertex is deleted once all of its dependent vertices have succeeded.
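
The idea described above can be sketched in a few lines of Java. This is a minimal illustration, not the code from the patch: the class `VertexShuffleCleanupTracker` and its methods are hypothetical names, vertices are plain strings, and the actual deletion request to the shuffle handler is left out. It only shows the bookkeeping: count the children of each parent that have not yet succeeded, and report the parent as deletable when that count reaches zero.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical tracker; names are illustrative and not taken from the Tez patch. */
class VertexShuffleCleanupTracker {
  // parent vertex -> number of child vertices that have not succeeded yet
  private final Map<String, Integer> incompleteChildren = new HashMap<>();
  // child vertex -> the parent vertices it reads shuffle data from
  private final Map<String, List<String>> parentsOf = new HashMap<>();

  /** Registers a DAG edge from parent to child. */
  void addEdge(String parent, String child) {
    incompleteChildren.merge(parent, 1, Integer::sum);
    parentsOf.computeIfAbsent(child, k -> new ArrayList<>()).add(parent);
  }

  /**
   * Must be called exactly once when a vertex succeeds.
   * Returns the parents whose shuffle data is no longer needed by any child.
   */
  List<String> vertexSucceeded(String vertex) {
    List<String> deletable = new ArrayList<>();
    for (String parent : parentsOf.getOrDefault(vertex, Collections.emptyList())) {
      int remaining = incompleteChildren.merge(parent, -1, Integer::sum);
      if (remaining == 0) {
        deletable.add(parent); // all children of this parent have succeeded
      }
    }
    return deletable;
  }

  public static void main(String[] args) {
    VertexShuffleCleanupTracker tracker = new VertexShuffleCleanupTracker();
    tracker.addEdge("v1", "v2");
    tracker.addEdge("v1", "v3");
    tracker.addEdge("v2", "v4");
    System.out.println(tracker.vertexSucceeded("v2")); // v1 still has a pending child
    System.out.println(tracker.vertexSucceeded("v3")); // v1 becomes deletable here
    System.out.println(tracker.vertexSucceeded("v4")); // v2 becomes deletable here
  }
}
```

In this sketch the data of `v1` is released only after both `v2` and `v3` succeed, which mirrors the behaviour checked in step 2 of the testing procedure.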

@shameersss1
Contributor Author

@abstractdog Could you please review the changes?

@abstractdog
Contributor

abstractdog commented Mar 2, 2022

@shameersss1: sorry, I haven't had the chance yet. I want to test this on a cluster too, where I'm facing some issues at the moment, and I'm also busy with other changes. Let me get back to you in 2 weeks, thanks for your patience!

@abstractdog
Contributor

please rebase on top of master, latest compilation error could be due to TEZ-4227

@shameersss1
Contributor Author

Rebased onto the latest master. Tests are pending.

@shameersss1 shameersss1 requested a review from abstractdog March 11, 2022 09:55

      vertex.appContext.getAppMaster().vertexComplete(
          vertex.vertexId, nodes);
    } else {
      LOG.debug(String.format("The number of incomplete child vertex are %s for the vertex %s",
Contributor

please use logger format here {} {}, String.format is not necessary

Contributor Author

ack. Will resolve in next revision.
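
For context on the review comment above: with a parameterized log call, the message is only formatted if the level is enabled, whereas `String.format` always builds the string first. The sketch below demonstrates the difference using only the JDK's `java.util.logging`, which is a stand-in chosen so the example runs without extra dependencies; its placeholders are `{0}`, `{1}` rather than the `{}` style the reviewer refers to. `LazyLoggingDemo` and `CountingArg` are hypothetical names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class LazyLoggingDemo {
  /** Counts toString() calls so we can observe when formatting really happens. */
  static final class CountingArg {
    int toStringCalls = 0;

    @Override
    public String toString() {
      toStringCalls++;
      return "vertex_1";
    }
  }

  /** Returns {toString calls with String.format, toString calls with placeholders}. */
  static int[] run() {
    Logger log = Logger.getLogger("LazyLoggingDemo");
    log.setLevel(Level.INFO); // FINE (debug-level) messages are disabled

    // Eager: String.format runs before the logger can check the level.
    CountingArg eager = new CountingArg();
    log.fine(String.format(
        "The number of incomplete child vertices is %s for the vertex %s", 2, eager));

    // Lazy: the logger checks the level first and skips formatting entirely.
    CountingArg lazy = new CountingArg();
    log.log(Level.FINE,
        "The number of incomplete child vertices is {0} for the vertex {1}",
        new Object[] {2, lazy});

    return new int[] {eager.toStringCalls, lazy.toStringCalls};
  }

  public static void main(String[] args) {
    int[] calls = run();
    System.out.println("eager toString calls: " + calls[0]);
    System.out.println("lazy toString calls: " + calls[1]);
  }
}
```

The same reasoning applies to a `LOG.debug("... {} ...", a, b)` call in a logging facade that supports `{}` placeholders: the arguments are passed through untouched unless debug logging is on.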

Contributor

@abstractdog abstractdog left a comment

left very minor comments, this is very close @shameersss1 !

@shameersss1
Contributor Author

I have resolved the comments. Thanks for your valuable feedback.

@shameersss1 shameersss1 requested a review from abstractdog March 13, 2022 09:19
@tez-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| +0 🆗 | reexec | 0m 28s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
| | | | _ master Compile Tests _ |
| +0 🆗 | mvndep | 5m 9s | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 9m 18s | master passed |
| +1 💚 | compile | 2m 51s | master passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 💚 | compile | 2m 41s | master passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | checkstyle | 3m 2s | master passed |
| +1 💚 | javadoc | 2m 47s | master passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 💚 | javadoc | 2m 30s | master passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +0 🆗 | spotbugs | 0m 42s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 💚 | findbugs | 4m 56s | master passed |
| | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 1m 47s | the patch passed |
| +1 💚 | compile | 1m 44s | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 💚 | javac | 1m 44s | the patch passed |
| +1 💚 | compile | 1m 33s | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | javac | 1m 33s | the patch passed |
| -0 ⚠️ | checkstyle | 0m 35s | tez-dag: The patch generated 3 new + 609 unchanged - 1 fixed = 612 total (was 610) |
| -0 ⚠️ | checkstyle | 0m 11s | tez-plugins/tez-aux-services: The patch generated 3 new + 67 unchanged - 0 fixed = 70 total (was 67) |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | javadoc | 1m 30s | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 💚 | javadoc | 1m 24s | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | findbugs | 4m 6s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 2m 7s | tez-api in the patch passed. |
| +1 💚 | unit | 0m 26s | tez-common in the patch passed. |
| +1 💚 | unit | 4m 30s | tez-runtime-library in the patch passed. |
| +1 💚 | unit | 4m 28s | tez-dag in the patch passed. |
| +1 💚 | unit | 2m 42s | tez-aux-services in the patch passed. |
| +1 💚 | asflicense | 0m 48s | The patch does not generate ASF License warnings. |
| | | 64m 27s | |

| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-60/7/artifact/out/Dockerfile |
| GITHUB PR | #60 |
| JIRA Issue | TEZ-3363 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux dd0e9beab074 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 132ea4c |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| checkstyle | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-60/7/artifact/out/diff-checkstyle-tez-dag.txt |
| checkstyle | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-60/7/artifact/out/diff-checkstyle-tez-plugins_tez-aux-services.txt |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-60/7/testReport/ |
| Max. process+thread count | 2099 (vs. ulimit of 5500) |
| modules | C: tez-api tez-common tez-runtime-library tez-dag tez-plugins/tez-aux-services U: . |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-60/7/console |
| versions | git=2.25.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@shameersss1
Contributor Author

@abstractdog - Are we good to take it forward?

@abstractdog
Contributor

abstractdog commented Mar 16, 2022

warnings here are easily addressable: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-60/7/artifact/out/diff-checkstyle-tez-plugins_tez-aux-services.txt
please fix those and then this can be merged

}

@VisibleForTesting
public VertexShuffleDataDeletionContext getVShuffleDeletionContext() {
Contributor

@abstractdog abstractdog Mar 16, 2022

this is not used currently in any of the tests, vShuffleDeletionContext can remain private to VertexImpl
I'm fine with leaving this part uncovered, can you please remove this method?
(sorry, last minute comments :) )

Contributor Author

This is used by testVertexShuffleDelete() in TestVertexImpl

Contributor

okay, sorry, I missed it. In this case, you can remove the public keyword (making the method package-private) to tighten the "VisibleForTesting" scope

Contributor Author

ack. I will fix it in next revision.
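
The suggestion in this thread can be illustrated as follows. This is a sketch, not the actual `VertexImpl` code: `VertexSketch` is a hypothetical class, the context is a plain `Object`, and the `@VisibleForTesting` annotation is declared locally as a stand-in for whichever annotation the project imports, so that the snippet compiles alone. Dropping `public` leaves the getter package-private, so tests in the same package can call it while code in other packages cannot.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Local stand-in annotation (assumption), used only to keep the sketch self-contained. */
@Retention(RetentionPolicy.CLASS)
@Target(ElementType.METHOD)
@interface VisibleForTesting {}

/** Hypothetical class standing in for the vertex implementation. */
class VertexSketch {
  private final Object vShuffleDeletionContext = new Object();

  // No 'public' modifier: the accessor is package-private, visible to same-package tests.
  @VisibleForTesting
  Object getVShuffleDeletionContext() {
    return vShuffleDeletionContext;
  }

  /** Reports whether the accessor is public, using reflection. */
  static boolean accessorIsPublic() {
    try {
      Method m = VertexSketch.class.getDeclaredMethod("getVShuffleDeletionContext");
      return Modifier.isPublic(m.getModifiers());
    } catch (NoSuchMethodException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("accessor is public: " + accessorIsPublic());
  }
}
```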

@shameersss1
Contributor Author

ack.

@shameersss1 shameersss1 force-pushed the TEZ-3363 branch 3 times, most recently from 89c05dc to db543c9 Compare March 16, 2022 15:35
@tez-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| +0 🆗 | reexec | 0m 0s | Docker mode activated. |
| -1 ❌ | patch | 0m 5s | #60 does not apply to master. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/TEZ/How+to+Contribute+to+Tez for help. |

| Subsystem | Report/Notes |
|---|---|
| GITHUB PR | #60 |
| JIRA Issue | TEZ-3363 |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-60/8/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@abstractdog abstractdog self-requested a review March 16, 2022 15:55
@abstractdog
Contributor

@shameersss1, can you check whether the latest conflict is caused by TEZ-4359 and rebase if needed?

@tez-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| +0 🆗 | reexec | 0m 34s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 1s | The patch appears to include 3 new or modified test files. |
| | | | _ master Compile Tests _ |
| +0 🆗 | mvndep | 5m 11s | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 9m 24s | master passed |
| +1 💚 | compile | 2m 50s | master passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 💚 | compile | 2m 39s | master passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | checkstyle | 2m 46s | master passed |
| +1 💚 | javadoc | 2m 47s | master passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 💚 | javadoc | 2m 32s | master passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +0 🆗 | spotbugs | 0m 42s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 💚 | findbugs | 4m 53s | master passed |
| | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 1m 47s | the patch passed |
| +1 💚 | compile | 1m 45s | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 💚 | javac | 1m 45s | the patch passed |
| +1 💚 | compile | 1m 34s | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | javac | 1m 34s | the patch passed |
| -0 ⚠️ | checkstyle | 0m 34s | tez-dag: The patch generated 3 new + 609 unchanged - 1 fixed = 612 total (was 610) |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | javadoc | 1m 30s | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 💚 | javadoc | 1m 26s | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | findbugs | 4m 8s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 2m 6s | tez-api in the patch passed. |
| +1 💚 | unit | 0m 27s | tez-common in the patch passed. |
| +1 💚 | unit | 4m 35s | tez-runtime-library in the patch passed. |
| +1 💚 | unit | 4m 27s | tez-dag in the patch passed. |
| +1 💚 | unit | 2m 42s | tez-aux-services in the patch passed. |
| +1 💚 | asflicense | 0m 48s | The patch does not generate ASF License warnings. |
| | | 64m 28s | |

| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-60/10/artifact/out/Dockerfile |
| GITHUB PR | #60 |
| JIRA Issue | TEZ-3363 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux bbae1f24de65 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 132ea4c |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| checkstyle | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-60/10/artifact/out/diff-checkstyle-tez-dag.txt |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-60/10/testReport/ |
| Max. process+thread count | 2099 (vs. ulimit of 5500) |
| modules | C: tez-api tez-common tez-runtime-library tez-dag tez-plugins/tez-aux-services U: . |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-60/10/console |
| versions | git=2.25.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

Contributor

@abstractdog abstractdog left a comment

+1

@abstractdog abstractdog merged commit 20873a3 into apache:master Mar 16, 2022
@abstractdog
Contributor

merged to master, finally! thanks @shameersss1 for your tireless work and patience on this one!
