-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-46700][CORE] Count the last spilling for the shuffle disk spilling bytes metric #44709
Conversation
* @param isLastFile if true, this indicates that we're writing the final output file and that the | ||
* bytes written should be counted towards shuffle spill metrics rather than | ||
* shuffle write metrics. | ||
* @param isFinalFile if true, this indicates that we're writing the final output file and that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename the flag to be more accurate. It doesn't mean the last spill file, but should be the only spill file so that it will be used as the final shuffle output file.
* bytes written should be counted towards shuffle spill metrics rather than | ||
* shuffle write metrics. | ||
* @param isFinalFile if true, this indicates that we're writing the final output file and that | ||
* the bytes written should be counted towards shuffle write metrics rather |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the comment was wrong before. If this flag is true, we are writing the final shuffle output file and will increase the shuffle write metrics rather than the spill metrics.
private void writeSortedFile(boolean isFinalFile) { | ||
// Only emit the log if this is an actual spilling. | ||
if (!isFinalFile) { | ||
logger.info( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move the logging here so that it applies to the last spilling as well.
writeSortedFile(true); | ||
// Here we are spilling the remaining data in the buffer. If there is no spill before, this | ||
// final spill file will be the final shuffle output file. | ||
writeSortedFile(/* isFinalFile = */spills.isEmpty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the actual change of this PR. We should only set this flag if we have not spilled before.
// final write as bytes spilled (instead, it's accounted as shuffle write). The merge needs | ||
// to be counted as shuffle write, but this will lead to double-counting of the final | ||
// SpillInfo's bytes. | ||
writeMetrics.decBytesWritten(spills[spills.length - 1].file.length()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This hack is not needed anymore.
@@ -425,9 +434,8 @@ private void testMergingSpills( | |||
assertSpillFilesWereCleanedUp(); | |||
ShuffleWriteMetrics shuffleWriteMetrics = taskMetrics.shuffleWriteMetrics(); | |||
assertEquals(dataToWrite.size(), shuffleWriteMetrics.recordsWritten()); | |||
assertTrue(taskMetrics.diskBytesSpilled() > 0L); | |||
assertTrue(taskMetrics.diskBytesSpilled() < mergedOutputFile.length()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous test was simply too relaxed.
cc @mridulm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM.
…ling bytes metric ### What changes were proposed in this pull request? This PR fixes a long-standing bug in ShuffleExternalSorter about the "spilled disk bytes" metrics. When we close the sorter, we will spill the remaining data in the buffer, with a flag `isLastFile = true`. This flag means the spilling will not increase the "spilled disk bytes" metrics. This makes sense if the sorter has never spilled before, then the final spill file will be used as the final shuffle output file, and we should keep the "spilled disk bytes" metrics as 0. However, if spilling did happen before, then we simply miscount the final spill file for the "spilled disk bytes" metrics today. This PR fixes this issue, by setting that flag when closing the sorter only if this is the first spilling. ### Why are the changes needed? make metrics accurate ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? updated tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #44709 from cloud-fan/shuffle. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 4ea3742) Signed-off-by: Dongjoon Hyun <[email protected]>
…ling bytes metric ### What changes were proposed in this pull request? This PR fixes a long-standing bug in ShuffleExternalSorter about the "spilled disk bytes" metrics. When we close the sorter, we will spill the remaining data in the buffer, with a flag `isLastFile = true`. This flag means the spilling will not increase the "spilled disk bytes" metrics. This makes sense if the sorter has never spilled before, then the final spill file will be used as the final shuffle output file, and we should keep the "spilled disk bytes" metrics as 0. However, if spilling did happen before, then we simply miscount the final spill file for the "spilled disk bytes" metrics today. This PR fixes this issue, by setting that flag when closing the sorter only if this is the first spilling. ### Why are the changes needed? make metrics accurate ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? updated tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #44709 from cloud-fan/shuffle. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 4ea3742) Signed-off-by: Dongjoon Hyun <[email protected]>
Merged to master/3.5/3.4. Thank you, @cloud-fan and @gengliangwang . |
…ling bytes metric ### What changes were proposed in this pull request? This PR fixes a long-standing bug in ShuffleExternalSorter about the "spilled disk bytes" metrics. When we close the sorter, we will spill the remaining data in the buffer, with a flag `isLastFile = true`. This flag means the spilling will not increase the "spilled disk bytes" metrics. This makes sense if the sorter has never spilled before, then the final spill file will be used as the final shuffle output file, and we should keep the "spilled disk bytes" metrics as 0. However, if spilling did happen before, then we simply miscount the final spill file for the "spilled disk bytes" metrics today. This PR fixes this issue, by setting that flag when closing the sorter only if this is the first spilling. ### Why are the changes needed? make metrics accurate ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? updated tests ### Was this patch authored or co-authored using generative AI tooling? no Closes apache#44709 from cloud-fan/shuffle. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 4ea3742) Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
This PR fixes a long-standing bug in ShuffleExternalSorter about the "spilled disk bytes" metrics. When we close the sorter, we will spill the remaining data in the buffer, with a flag
isLastFile = true
. This flag means the spilling will not increase the "spilled disk bytes" metrics. This makes sense if the sorter has never spilled before, then the final spill file will be used as the final shuffle output file, and we should keep the "spilled disk bytes" metrics as 0. However, if spilling did happen before, then we simply miscount the final spill file for the "spilled disk bytes" metrics today.This PR fixes this issue, by setting that flag when closing the sorter only if this is the first spilling.
Why are the changes needed?
make metrics accurate
Does this PR introduce any user-facing change?
no
How was this patch tested?
updated tests
Was this patch authored or co-authored using generative AI tooling?
no