-
Notifications
You must be signed in to change notification settings - Fork 425
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TEZ-3363: Delete intermediate data at the vertex level for Shuffle Handler #60
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
@abstractdog Could you please review the changes? |
@shameersss1: I'm more than interested in this patch, let me have some time to review it |
@abstractdog - Thanks for showing interest to review. It has been pending for a while now. The high level idea behind this feature is that, Whenever all the dependent vertex of a particular vertex have succeeded we delete the vertex shuffle data of that particular/parent vertex. Testing Procedure
|
@abstractdog Could you please review the changes? |
@shameersss1: sorry, I haven't had the chance, I want to test this on a cluster too, where I face some issues at the moment, also I'm busy with other changes, let me get back to you in 2 weeks, thanks for your patience! |
tez-plugins/tez-aux-services/src/main/java/org/apache/tez/auxservices/ShuffleHandler.java
Outdated
Show resolved
Hide resolved
tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java
Outdated
Show resolved
Hide resolved
tez-plugins/tez-aux-services/src/test/java/org/apache/tez/auxservices/TestShuffleHandler.java
Outdated
Show resolved
Hide resolved
tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/DAGImpl.java
Outdated
Show resolved
Hide resolved
tez-plugins/tez-aux-services/src/main/java/org/apache/tez/auxservices/ShuffleHandler.java
Outdated
Show resolved
Hide resolved
tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java
Outdated
Show resolved
Hide resolved
This comment was marked as outdated.
This comment was marked as outdated.
please rebase on top of master, latest compilation error could be due to TEZ-4227 |
Rebased to latest master. pending tests |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java
Outdated
Show resolved
Hide resolved
vertex.appContext.getAppMaster().vertexComplete( | ||
vertex.vertexId, nodes); | ||
} else { | ||
LOG.debug(String.format("The number of incomplete child vertex are %s for the vertex %s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use logger format here {} {}, String.format is not necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack. Will resolve in next revision.
tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/DAGImpl.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left very minor comments, this is very close @shameersss1 !
I have resolved the comments. Thanks for your valuable feedback. |
🎊 +1 overall
This message was automatically generated. |
@abstractdog - Are we good to take it forward? |
warnings here are easily addressable: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-60/7/artifact/out/diff-checkstyle-tez-plugins_tez-aux-services.txt |
} | ||
|
||
@VisibleForTesting | ||
public VertexShuffleDataDeletionContext getVShuffleDeletionContext() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not used currently in any of the tests, vShuffleDeletionContext can remain private to VertexImpl
I'm fine with leaving this part uncovered, can you please remove this method?
(sorry, last minute comments :) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used by testVertexShuffleDelete() in TestVertexImpl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, missed it sorry, in this case, you can remove public keyword (make it package visible) to enhance "VisibleForTesting" behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack. I will fix it in next revision.
ack. |
89c05dc
to
db543c9
Compare
💔 -1 overall
This message was automatically generated. |
can you check @shameersss1 if the latest conflict is because of TEZ-4359 and rebase if needed? |
This comment was marked as outdated.
This comment was marked as outdated.
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
merged to master, finally! thanks @shameersss1 for your tireless work and patience on this one! |
For applications like pig where processing times can be very long, applications may choose to delete intermediate data for a sub dag. For example if a DAG has synced data to HDFS, all upstream intermediate data can be safely deleted.