HADOOP-18106: Handle memory fragmentation in S3A Vectored IO. #4445

@mukund-thakur mukund-thakur commented Jun 15, 2022

Rebased the feature branch. Old PR link: #4427

Description of PR

part of HADOOP-18103.
Handles memory fragmentation in the S3A vectored IO implementation by
allocating smaller buffers, each sized to a user-requested range, and
filling them directly from the remote S3 stream while skipping the
undesired data between ranges.
This patch also aborts active vectored reads when the stream is
closed or unbuffer() is called.
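
The approach described above can be illustrated with a minimal sketch. This is not the S3A code from the patch: the class `RangeFillSketch`, its `Range` type, and the `readRanges` and `drain` helpers are hypothetical names, and a plain `InputStream` stands in for the remote S3 stream. It assumes the ranges are sorted and do not overlap.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not the S3A implementation. */
class RangeFillSketch {

  /** A requested range: offset into the object and length in bytes. */
  static final class Range {
    final long offset;
    final int length;

    Range(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }
  }

  /**
   * Reads sorted, non-overlapping ranges from a stream positioned at streamStart.
   * Each range gets its own buffer of exactly range.length bytes; the bytes
   * between ranges are drained and discarded instead of being buffered.
   */
  static List<ByteBuffer> readRanges(InputStream in, long streamStart, List<Range> ranges)
      throws IOException {
    List<ByteBuffer> results = new ArrayList<>();
    long position = streamStart;
    for (Range range : ranges) {
      if (range.offset < position) {
        throw new IllegalArgumentException("ranges must be sorted and non-overlapping");
      }
      drain(in, range.offset - position);
      byte[] data = new byte[range.length];
      int filled = 0;
      while (filled < data.length) {
        int n = in.read(data, filled, data.length - filled);
        if (n < 0) {
          throw new EOFException("EOF while reading range at offset " + range.offset);
        }
        filled += n;
      }
      results.add(ByteBuffer.wrap(data));
      position = range.offset + range.length;
    }
    return results;
  }

  /** Reads and discards count bytes through a small scratch buffer. */
  private static void drain(InputStream in, long count) throws IOException {
    byte[] scratch = new byte[4096];
    long remaining = count;
    while (remaining > 0) {
      int n = in.read(scratch, 0, (int) Math.min(scratch.length, remaining));
      if (n < 0) {
        throw new EOFException("EOF while skipping data between ranges");
      }
      remaining -= n;
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] object = "0123456789abcdefghij".getBytes(StandardCharsets.US_ASCII);
    List<Range> ranges = Arrays.asList(new Range(2, 3), new Range(10, 4));
    for (ByteBuffer buffer : readRanges(new ByteArrayInputStream(object), 0, ranges)) {
      System.out.println(new String(buffer.array(), StandardCharsets.US_ASCII));
    }
  }
}
```

The point of the sketch is that no buffer is ever larger than the range the caller asked for; the gap between ranges costs only a fixed-size scratch array.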

How was this patch tested?

Added new test and re-ran existing tests.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 44s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 7 new or modified test files.
_ feature-vectored-io Compile Tests _
+0 🆗 mvndep 14m 58s Maven dependency ordering for branch
+1 💚 mvninstall 25m 17s feature-vectored-io passed
+1 💚 compile 23m 19s feature-vectored-io passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 compile 20m 38s feature-vectored-io passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 4m 17s feature-vectored-io passed
+1 💚 mvnsite 5m 1s feature-vectored-io passed
+1 💚 javadoc 4m 17s feature-vectored-io passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 3m 55s feature-vectored-io passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 6m 28s feature-vectored-io passed
+1 💚 shadedclient 21m 45s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 22m 17s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 29s Maven dependency ordering for patch
+1 💚 mvninstall 2m 33s the patch passed
+1 💚 compile 22m 17s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javac 22m 17s root-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 generated 0 new + 2892 unchanged - 3 fixed = 2892 total (was 2895)
+1 💚 compile 20m 45s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 20m 45s root-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 generated 0 new + 2689 unchanged - 3 fixed = 2689 total (was 2692)
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 4m 12s the patch passed
+1 💚 mvnsite 5m 2s the patch passed
+1 💚 javadoc 4m 5s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 3m 59s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 6m 40s the patch passed
+1 💚 shadedclient 21m 46s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 18m 31s hadoop-common in the patch passed.
+1 💚 unit 3m 13s hadoop-aws in the patch passed.
+1 💚 unit 1m 19s hadoop-benchmark in the patch passed.
+1 💚 asflicense 1m 37s The patch does not generate ASF License warnings.
253m 45s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4445/3/artifact/out/Dockerfile
GITHUB PR #4445
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux 444b19562f75 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision feature-vectored-io / a848052
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4445/3/testReport/
Max. process+thread count 1296 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws hadoop-tools/hadoop-benchmark U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4445/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@apache apache deleted a comment from hadoop-yetus Jun 20, 2022
@apache apache deleted a comment from hadoop-yetus Jun 20, 2022

@steveloughran steveloughran left a comment

+1 pending the little details I've suggested

@@ -47,7 +47,7 @@
 import org.apache.hadoop.fs.LocalFileSystem;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileRange;
-import org.apache.hadoop.fs.FileRangeImpl;
+import org.apache.hadoop.fs.impl.FileRangeImpl;
@steveloughran

shouldn't be needed now

@mukund-thakur

it is required actually.

 } catch (Exception ex) {
-LOG.warn("Exception occurred while reading combined range from file {}", pathStr, ex);
+LOG.warn("Exception while reading a range {} from path {} ", combinedFileRange, pathStr, ex);
@steveloughran

Will we log noisily on an EOFException? That probably doesn't need a stack trace and can just be logged at DEBUG.

@mukund-thakur

Okay, changing to debug.
Or do you think it is better to handle EOFException in a separate catch clause?

@steveloughran

I don't know.
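
For reference, the separate-catch option raised in this thread could look roughly like the sketch below. It is only an illustration, not what the patch does: `EofLoggingSketch`, `RangeRead`, and `readAndLog` are hypothetical names, and `java.util.logging` is used so the example is self-contained, with `Level.FINE` standing in for DEBUG and `Level.WARNING` for WARN (Hadoop itself logs through SLF4J).

```java
import java.io.EOFException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical illustration of logging EOF quietly and other failures loudly. */
class EofLoggingSketch {
  private static final Logger LOG = Logger.getLogger("EofLoggingSketch");

  /** Stand-in for the code that reads one range. */
  interface RangeRead {
    void run() throws Exception;
  }

  /** Runs the read and returns the level a failure was logged at, or null on success. */
  static Level readAndLog(String range, String path, RangeRead read) {
    try {
      read.run();
      return null;
    } catch (EOFException ex) {
      // A range past the end of the file is an expected failure mode:
      // log the message only, with no stack trace, at the debug-equivalent level.
      LOG.log(Level.FINE, "EOF while reading range {0} from path {1}",
          new Object[] {range, path});
      return Level.FINE;
    } catch (Exception ex) {
      // Anything else is unexpected, so keep the stack trace.
      LOG.log(Level.WARNING,
          "Exception while reading range " + range + " from path " + path, ex);
      return Level.WARNING;
    }
  }

  public static void main(String[] args) {
    System.out.println(readAndLog("[90, 110)", "s3a://bucket/file",
        () -> { throw new EOFException("past end of file"); }));
    System.out.println(readAndLog("[0, 10)", "s3a://bucket/file",
        () -> { throw new IllegalStateException("stream closed"); }));
  }
}
```

The trade-off is one extra catch clause in exchange for keeping stack traces on failures that are actually unexpected.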

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 0s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 7 new or modified test files.
_ feature-vectored-io Compile Tests _
+0 🆗 mvndep 14m 33s Maven dependency ordering for branch
+1 💚 mvninstall 28m 2s feature-vectored-io passed
+1 💚 compile 25m 5s feature-vectored-io passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 compile 21m 43s feature-vectored-io passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 4m 32s feature-vectored-io passed
+1 💚 mvnsite 4m 12s feature-vectored-io passed
+1 💚 javadoc 3m 21s feature-vectored-io passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 2m 59s feature-vectored-io passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 5m 46s feature-vectored-io passed
+1 💚 shadedclient 24m 22s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 24m 49s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 26s Maven dependency ordering for patch
+1 💚 mvninstall 2m 24s the patch passed
+1 💚 compile 24m 26s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javac 24m 26s root-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 generated 0 new + 2892 unchanged - 3 fixed = 2892 total (was 2895)
+1 💚 compile 21m 46s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 21m 46s root-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 generated 0 new + 2691 unchanged - 3 fixed = 2691 total (was 2694)
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 4m 25s the patch passed
+1 💚 mvnsite 4m 13s the patch passed
+1 💚 javadoc 3m 14s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 2m 59s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 6m 9s the patch passed
+1 💚 shadedclient 24m 15s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 18m 13s hadoop-common in the patch passed.
+1 💚 unit 3m 4s hadoop-aws in the patch passed.
+1 💚 unit 0m 55s hadoop-benchmark in the patch passed.
+1 💚 asflicense 1m 19s The patch does not generate ASF License warnings.
258m 41s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4445/4/artifact/out/Dockerfile
GITHUB PR #4445
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux ccb919917c77 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision feature-vectored-io / ffde645
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4445/4/testReport/
Max. process+thread count 1252 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws hadoop-tools/hadoop-benchmark U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4445/4/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@mukund-thakur mukund-thakur merged this pull request into apache:feature-vectored-io Jun 20, 2022
mukund-thakur added a commit that referenced this pull request Jun 21, 2022
part of HADOOP-18103.
Handling memory fragmentation in S3A vectored IO implementation by
allocating smaller user range requested size buffers and directly
filling them from the remote S3 stream and skipping undesired
data in between ranges.
This patch also adds aborting active vectored reads when stream is
closed or unbuffer() is called.

Contributed By: Mukund Thakur
@steveloughran

on the EOF exception
what happens with

  • vectored reads which start before and end after EOF?
  • vectored reads which start and end after EOF
  • vectored reads with two ranges in draining distance, the second of which is beyond the EOF?

should all be tested in the contract tests, if not done already

@mukund-thakur

on the EOF exception what happens with

  • vectored reads which start before and end after EOF?
  • vectored reads which start and end after EOF
  • vectored reads with two ranges in draining distance, the second of which is beyond the EOF?

should all be tested in the contract tests, if not done already

All of these will cause an EOFException, because we validate ranges only at the start. There are tests, but not exactly for these cases. I can add some more next time.
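
The three cases from the question can be written down as small checks. The sketch below runs them against an in-memory byte array rather than a real filesystem, so it says nothing about how S3A behaves internally; `EofRangeCasesSketch`, `readRange`, `readMerged`, and `hitsEof` are hypothetical helpers, and treating two nearby ranges as one merged read is an assumption made for illustration.

```java
import java.io.EOFException;
import java.util.Arrays;

/** Hypothetical stand-in for contract-style EOF checks; not a Hadoop test. */
class EofRangeCasesSketch {

  /** Reads [offset, offset + length) and fails with EOFException if it passes the end. */
  static byte[] readRange(byte[] file, long offset, int length) throws EOFException {
    if (offset < 0 || length < 0) {
      throw new IllegalArgumentException("negative offset or length");
    }
    if (offset + length > file.length) {
      throw new EOFException("range [" + offset + ", " + (offset + length)
          + ") passes EOF at " + file.length);
    }
    return Arrays.copyOfRange(file, (int) offset, (int) offset + length);
  }

  /** Reads two nearby ranges, each given as {offset, length}, as one merged span. */
  static byte[] readMerged(byte[] file, long[] first, long[] second) throws EOFException {
    long start = Math.min(first[0], second[0]);
    long end = Math.max(first[0] + first[1], second[0] + second[1]);
    return readRange(file, start, (int) (end - start));
  }

  /** A read that may fail with EOFException. */
  interface Read {
    byte[] run() throws EOFException;
  }

  /** Returns true if the read failed with EOFException. */
  static boolean hitsEof(Read read) {
    try {
      read.run();
      return false;
    } catch (EOFException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    byte[] file = new byte[100];
    // Starts before EOF and ends after it.
    System.out.println(hitsEof(() -> readRange(file, 90, 20)));
    // Starts and ends after EOF.
    System.out.println(hitsEof(() -> readRange(file, 150, 10)));
    // Two nearby ranges, the second of which is beyond EOF.
    System.out.println(hitsEof(
        () -> readMerged(file, new long[] {80, 10}, new long[] {95, 10})));
    // Control: a range fully inside the file.
    System.out.println(hitsEof(() -> readRange(file, 10, 20)));
  }
}
```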

asfgit pushed a commit that referenced this pull request Jun 22, 2022
mukund-thakur added a commit that referenced this pull request Jun 27, 2022

 Conflicts:
	hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java
HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022
…#4445)
