Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid materializing list of segment files when finding a partition file during shuffle #11903

Merged
merged 2 commits into from
Nov 11, 2021

Conversation

loquisgon
Copy link

Avoid materializing list of segment files (it can cause OOM/memory pressure) as well as looping over the files.

Description

This fixes a memory pressure/OOM bug seen in the field when the middlemanager may get an "out of memory" error because the list of segments is too large. Besides, looping through the segments files is inefficient and this commit avoids that. It does not need new unit tests, the existing unit tests cover this code already.

This PR has:

  • [ X] been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

…essure) as well as looping over the files.
Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change, @loquisgon . This greatly simplifies the logic here!

Copy link
Member

@FrankChen021 FrankChen021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@lgtm-com
Copy link

lgtm-com bot commented Nov 10, 2021

This pull request introduces 1 alert when merging 94e419e into d3914c1 - view on LGTM.com

new alerts:

  • 1 for Uncontrolled data used in path expression

Copy link
Contributor

@suneet-s suneet-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find! Are there existing unit tests that cover these 2 branches?

final File[] segmentFiles = partitionDir.listFiles();
if (segmentFiles == null) {
return Optional.empty();
final File segmentFile = new File(partitionDir, subTaskId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM is complaining because subTaskId is passed in from user input. A validation check similar to the one done for supervisorTaskId on line 375 should suppress the warning from LGTM

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new code is semantically the same, with the same branching (i.e. partition found it returns it else it returns Optional.empty() so I think we are good. The Optional.empty() return though is not being hit by any existing unit test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the existing unit tests don't validate what happens when the segment file is missing.
Screen Shot 2021-11-11 at 9 38 30 AM

This logic is simple enough, and code coverage bot is happy, so I'm ok with not adding a test case for this if it seems like overkill

Copy link
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch 👍

@loquisgon loquisgon merged commit a13a96d into apache:master Nov 11, 2021
@abhishekagarwal87 abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants