Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HADOOP-18517. ABFS: Add fs.azure.enable.readahead option to disable readahead #5103

Merged

Conversation

steveloughran
Copy link
Contributor

@steveloughran steveloughran commented Nov 2, 2022

Adds new config option fs.azure.enable.readahead to turn off readahead

  • also allows it to be passed in through openFile(), which means that if you set it in a mr/spark/hive job conf with "mapreduce.job.input.file.option." as the prefix, it will be disabled when used through LineRecordReader, FixedLengthRecordReader and a few others.
  • extends ITestAbfsReadWriteAndSeek to use the option, including one replicated test...that shows that turning it off is slower.

How was this patch tested?

  1. modified (scale) itest.
  2. been in the cloudera builds for a long time.

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

…eadahead

Adds new config option to turn off readahead
* also allows it to be passed in through openFile(),
  which means that if you set it in a mr/spark/hive job conf with
  "mapreduce.job.input.file.option." as the prefix, it will be disabled
  when used through LineRecordReader, FixedLengthRecordReader and
  a few others.
* extends ITestAbfsReadWriteAndSeek to use the option, including one
  replicated test...that shows that turning it off is slower.

Change-Id: Icfc3b6578654aff0b1264d5e116d4bf9eb414881
@steveloughran
Copy link
Contributor Author

testing against s3 cardiff. all ok except that -ITestAbfsReadWriteAndSeek.testReadAndWriteWithDifferentBufferSizesAndSeek[Size=104,857,600-readahead=true]] is taking so long to write the file that the test can time out if other things are under way on the laptop. I'm not going to change that test though to make it configurable, add readFully() actions or similar.

i do have plans there, as it is the single "huge file" test in the codebase, but not in this PR

@steveloughran
Copy link
Contributor Author

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 11s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 1s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 42m 13s trunk passed
+1 💚 compile 0m 48s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 44s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 0m 39s trunk passed
+1 💚 mvnsite 0m 48s trunk passed
+1 💚 javadoc 0m 48s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 38s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 1m 25s trunk passed
+1 💚 shadedclient 24m 45s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 35s the patch passed
+1 💚 compile 0m 37s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 37s the patch passed
+1 💚 compile 0m 34s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 javac 0m 34s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 23s the patch passed
+1 💚 mvnsite 0m 39s the patch passed
+1 💚 javadoc 0m 29s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 24s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 1m 9s the patch passed
+1 💚 shadedclient 23m 52s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 18s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 40s The patch does not generate ASF License warnings.
106m 49s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5103/1/artifact/out/Dockerfile
GITHUB PR #5103
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux eff0725486e8 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 3db7b36
Default Java Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5103/1/testReport/
Max. process+thread count 534 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5103/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran
Copy link
Contributor Author

i'm getting test timeouts on big uploads tonight.

[ERROR] Errors: 
[ERROR]   ITestAzureBlobFileSystemE2E.testOOBWritesAndReadFail:95 » TestTimedOut test ti...
[ERROR]   ITestAzureBlobFileSystemFlush.testWriteHeavyBytesToFileSyncFlush:160 » TestTimedOut
[ERROR]   ITestAbfsInputStream.testWithNoOptimization:58->createFileWithContent:179 » TestTimedOut
[ERROR]   ITestAbfsInputStreamReadFooter>ITestAbfsInputStream.testExceptionInOptimization:103->ITestAbfsInputStream.createFileWithContent:179 » TestTimedOut
[ERROR]   ITestAbfsInputStreamReadFooter.testSeekToBeginAndReadWithConfTrue:106->testSeekAndReadWithConf:163->seekReadAndTest:196 » TestTimedOut
[ERROR]   ITestAbfsInputStreamReadFooter.testSeekToEndAndReadWithConfTrue:146->testSeekAndReadWithConf:162->ITestAbfsInputStream.createFileWithContent:179 » TestTimedOut
[ERROR]   ITestAbfsInputStreamReadFooter>ITestAbfsInputStream.testWithNoOptimization:58->ITestAbfsInputStream.createFileWithContent:179 » TestTimedOut
[ERROR]   ITestAbfsInputStreamSmallFileReads>ITestAbfsInputStream.testExceptionInOptimization:103->ITestAbfsInputStream.createFileWithContent:179 » TestTimedOut
[ERROR]   ITestAbfsInputStreamSmallFileReads.testSeekToMiddleAndReadBigFileWithConfFalse:156->testSeekAndReadWithConf:166->ITestAbfsInputStream.createFileWithContent:179 » TestTimedOut
[INFO] 

earlier today i was uploading multiGB files to the same a/c, which was triggering throttling (laptop was plugged in to firewall). its now on wifi, so either its that or some throttling is still taking place

Copy link
Contributor

@cnauroth cnauroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

I entered a couple of minor cosmetic comments that I don't consider to be blockers, so I'll leave it to your choice on whether or not to address.

Thanks, Steve!

@steveloughran
Copy link
Contributor Author

thanks; will do the changes, i'd just cherrypicked the internal code.

Copy link
Contributor

@ashutoshcipher ashutoshcipher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes makes sense. +1. Thanks for running the Integrations tests as well.

Copy link
Contributor

@mukund-thakur mukund-thakur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1

@@ -106,6 +106,7 @@ public final class FileSystemConfigurations {
public static final boolean DEFAULT_ABFS_LATENCY_TRACK = false;
public static final long DEFAULT_SAS_TOKEN_RENEW_PERIOD_FOR_STREAMS_IN_SECONDS = 120;

public static final boolean DEFAULT_ENABLE_READAHEAD = true;
Copy link
Contributor

@mukund-thakur mukund-thakur Nov 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change the default value to false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no

Removed the fs.azure.enable.readahead option from openFile(); its
a sysadmin option only.

For openFile(), I think we should focus on "policy", rather than explicit
tuning options. e.g "random/vectored" could disable prefetching; sequential
would enable it...

Change-Id: I26d2a75f8c0815bf35a9fa876c94b9cfe7b8dc5c
@steveloughran
Copy link
Contributor Author

removed openFile() option; this is cluster admin only.

tested with 14 threads and got timeout on lease and throttling out of range. clearly too many threads


[ERROR] Tests run: 13, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 94.978 s <<< FAILURE! - in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemLease
[ERROR] testTwoWritersCreateAppendWithInfiniteLeaseEnabled(org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemLease)  Time elapsed: 94.978 s  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 90000 milliseconds
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at org.apache.hadoop.thirdparty.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:537)
        at org.apache.hadoop.thirdparty.com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:88)
        at org.apache.hadoop.thirdparty.com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:62)
        at org.apache.hadoop.fs.azurebfs.services.AbfsLease.<init>(AbfsLease.java:109)

@steveloughran
Copy link
Contributor Author

test with threads=8 happy; timeout on ITestAzureBlobFileSystemLease which was fine in standalone

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 54s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 42m 6s trunk passed
+1 💚 compile 0m 47s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 42s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 0m 38s trunk passed
+1 💚 mvnsite 0m 49s trunk passed
+1 💚 javadoc 0m 44s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 36s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 1m 19s trunk passed
+1 💚 shadedclient 23m 47s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 33s the patch passed
+1 💚 compile 0m 37s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 37s the patch passed
+1 💚 compile 0m 32s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 javac 0m 32s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 21s /results-checkstyle-hadoop-tools_hadoop-azure.txt hadoop-tools/hadoop-azure: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2)
+1 💚 mvnsite 0m 34s the patch passed
+1 💚 javadoc 0m 27s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 25s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 1m 9s the patch passed
+1 💚 shadedclient 25m 3s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 27s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 46s The patch does not generate ASF License warnings.
106m 9s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5103/2/artifact/out/Dockerfile
GITHUB PR #5103
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux f2247d092ded 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 43d9471
Default Java Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5103/2/testReport/
Max. process+thread count 530 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5103/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Change-Id: Id88e40de4a2f509d2941a635df78649105bfbe97
@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 9s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 46m 44s trunk passed
+1 💚 compile 0m 57s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 50s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 0m 46s trunk passed
+1 💚 mvnsite 0m 55s trunk passed
+1 💚 javadoc 0m 52s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 43s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 1m 34s trunk passed
+1 💚 shadedclient 27m 19s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 40s the patch passed
+1 💚 compile 0m 46s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 46s the patch passed
+1 💚 compile 0m 37s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 javac 0m 37s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 24s the patch passed
+1 💚 mvnsite 0m 41s the patch passed
+1 💚 javadoc 0m 33s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 29s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 1m 39s the patch passed
+1 💚 shadedclient 26m 38s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 17s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 49s The patch does not generate ASF License warnings.
118m 33s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5103/3/artifact/out/Dockerfile
GITHUB PR #5103
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 6baf6ca9e7cd 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / c45724e
Default Java Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5103/3/testReport/
Max. process+thread count 613 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5103/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran steveloughran merged commit 7f9ca10 into apache:trunk Nov 8, 2022
asfgit pushed a commit that referenced this pull request Nov 8, 2022
…eadahead (#5103)

* HADOOP-18517. ABFS: Add fs.azure.enable.readahead option to disable readahead

Adds new config option to turn off readahead
* also allows it to be passed in through openFile(),
* extends ITestAbfsReadWriteAndSeek to use the option, including one
  replicated test...that shows that turning it off is slower.

Important: this does not address the critical data corruption issue
HADOOP-18521. ABFS ReadBufferManager buffer sharing across concurrent HTTP requests

What is does do is provide a way to completely bypass the ReadBufferManager.
To mitigate the problem, either fs.azure.enable.readahead needs to be set to false,
or set "fs.azure.readaheadqueue.depth" to 0 -this still goes near the (broken)
ReadBufferManager code, but does't trigger the bug.

For safe reading of files through the ABFS connector, readahead MUST be disabled
or the followup fix to HADOOP-18521 applied

Contributed by Steve Loughran
asfgit pushed a commit that referenced this pull request Nov 8, 2022
…eadahead (#5103)

* HADOOP-18517. ABFS: Add fs.azure.enable.readahead option to disable readahead

Adds new config option to turn off readahead
* also allows it to be passed in through openFile(),
* extends ITestAbfsReadWriteAndSeek to use the option, including one
  replicated test...that shows that turning it off is slower.

Important: this does not address the critical data corruption issue
HADOOP-18521. ABFS ReadBufferManager buffer sharing across concurrent HTTP requests

What is does do is provide a way to completely bypass the ReadBufferManager.
To mitigate the problem, either fs.azure.enable.readahead needs to be set to false,
or set "fs.azure.readaheadqueue.depth" to 0 -this still goes near the (broken)
ReadBufferManager code, but does't trigger the bug.

For safe reading of files through the ABFS connector, readahead MUST be disabled
or the followup fix to HADOOP-18521 applied

Contributed by Steve Loughran
HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022
…eadahead (apache#5103)

* HADOOP-18517. ABFS: Add fs.azure.enable.readahead option to disable readahead

Adds new config option to turn off readahead
* also allows it to be passed in through openFile(),
* extends ITestAbfsReadWriteAndSeek to use the option, including one
  replicated test...that shows that turning it off is slower.

Important: this does not address the critical data corruption issue
HADOOP-18521. ABFS ReadBufferManager buffer sharing across concurrent HTTP requests

What is does do is provide a way to completely bypass the ReadBufferManager.
To mitigate the problem, either fs.azure.enable.readahead needs to be set to false,
or set "fs.azure.readaheadqueue.depth" to 0 -this still goes near the (broken)
ReadBufferManager code, but does't trigger the bug.

For safe reading of files through the ABFS connector, readahead MUST be disabled
or the followup fix to HADOOP-18521 applied

Contributed by Steve Loughran
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants