-
Notifications
You must be signed in to change notification settings - Fork 9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-18517. ABFS: Add fs.azure.enable.readahead option to disable readahead #5103
HADOOP-18517. ABFS: Add fs.azure.enable.readahead option to disable readahead #5103
Conversation
…eadahead Adds new config option to turn off readahead * also allows it to be passed in through openFile(), which means that if you set it in a mr/spark/hive job conf with "mapreduce.job.input.file.option." as the prefix, it will be disabled when used through LineRecordReader, FixedLengthRecordReader and a few others. * extends ITestAbfsReadWriteAndSeek to use the option, including one replicated test...that shows that turning it off is slower. Change-Id: Icfc3b6578654aff0b1264d5e116d4bf9eb414881
testing against s3 cardiff. all ok except that -ITestAbfsReadWriteAndSeek.testReadAndWriteWithDifferentBufferSizesAndSeek[Size=104,857,600-readahead=true]] is taking so long to write the file that the test can time out if other things are under way on the laptop. I'm not going to change that test though to make it configurable, add readFully() actions or similar. i do have plans there, as it is the single "huge file" test in the codebase, but not in this PR |
🎊 +1 overall
This message was automatically generated. |
i'm getting test timeouts on big uploads tonight.
earlier today i was uploading multiGB files to the same a/c, which was triggering throttling (laptop was plugged in to firewall). its now on wifi, so either its that or some throttling is still taking place |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
I entered a couple of minor cosmetic comments that I don't consider to be blockers, so I'll leave it to your choice on whether or not to address.
Thanks, Steve!
...tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java
Outdated
Show resolved
Hide resolved
...adoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStreamContext.java
Outdated
Show resolved
Hide resolved
thanks; will do the changes, i'd just cherrypicked the internal code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes makes sense. +1. Thanks for running the Integrations tests as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM +1
@@ -106,6 +106,7 @@ public final class FileSystemConfigurations { | |||
public static final boolean DEFAULT_ABFS_LATENCY_TRACK = false; | |||
public static final long DEFAULT_SAS_TOKEN_RENEW_PERIOD_FOR_STREAMS_IN_SECONDS = 120; | |||
|
|||
public static final boolean DEFAULT_ENABLE_READAHEAD = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we change the default value to false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no
Removed the fs.azure.enable.readahead option from openFile(); its a sysadmin option only. For openFile(), I think we should focus on "policy", rather than explicit tuning options. e.g "random/vectored" could disable prefetching; sequential would enable it... Change-Id: I26d2a75f8c0815bf35a9fa876c94b9cfe7b8dc5c
removed openFile() option; this is cluster admin only. tested with 14 threads and got timeout on lease and throttling out of range. clearly too many threads
|
test with threads=8 happy; timeout on |
🎊 +1 overall
This message was automatically generated. |
Change-Id: Id88e40de4a2f509d2941a635df78649105bfbe97
🎊 +1 overall
This message was automatically generated. |
…eadahead (#5103) * HADOOP-18517. ABFS: Add fs.azure.enable.readahead option to disable readahead Adds new config option to turn off readahead * also allows it to be passed in through openFile(), * extends ITestAbfsReadWriteAndSeek to use the option, including one replicated test...that shows that turning it off is slower. Important: this does not address the critical data corruption issue HADOOP-18521. ABFS ReadBufferManager buffer sharing across concurrent HTTP requests What is does do is provide a way to completely bypass the ReadBufferManager. To mitigate the problem, either fs.azure.enable.readahead needs to be set to false, or set "fs.azure.readaheadqueue.depth" to 0 -this still goes near the (broken) ReadBufferManager code, but does't trigger the bug. For safe reading of files through the ABFS connector, readahead MUST be disabled or the followup fix to HADOOP-18521 applied Contributed by Steve Loughran
…eadahead (#5103) * HADOOP-18517. ABFS: Add fs.azure.enable.readahead option to disable readahead Adds new config option to turn off readahead * also allows it to be passed in through openFile(), * extends ITestAbfsReadWriteAndSeek to use the option, including one replicated test...that shows that turning it off is slower. Important: this does not address the critical data corruption issue HADOOP-18521. ABFS ReadBufferManager buffer sharing across concurrent HTTP requests What is does do is provide a way to completely bypass the ReadBufferManager. To mitigate the problem, either fs.azure.enable.readahead needs to be set to false, or set "fs.azure.readaheadqueue.depth" to 0 -this still goes near the (broken) ReadBufferManager code, but does't trigger the bug. For safe reading of files through the ABFS connector, readahead MUST be disabled or the followup fix to HADOOP-18521 applied Contributed by Steve Loughran
…eadahead (apache#5103) * HADOOP-18517. ABFS: Add fs.azure.enable.readahead option to disable readahead Adds new config option to turn off readahead * also allows it to be passed in through openFile(), * extends ITestAbfsReadWriteAndSeek to use the option, including one replicated test...that shows that turning it off is slower. Important: this does not address the critical data corruption issue HADOOP-18521. ABFS ReadBufferManager buffer sharing across concurrent HTTP requests What is does do is provide a way to completely bypass the ReadBufferManager. To mitigate the problem, either fs.azure.enable.readahead needs to be set to false, or set "fs.azure.readaheadqueue.depth" to 0 -this still goes near the (broken) ReadBufferManager code, but does't trigger the bug. For safe reading of files through the ABFS connector, readahead MUST be disabled or the followup fix to HADOOP-18521 applied Contributed by Steve Loughran
Adds new config option fs.azure.enable.readahead to turn off readahead
How was this patch tested?
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?