Add AUTO_RANDOM mode #1242
base: branch-2.2.x
/gcbrun |
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## branch-2.2.x #1242 +/- ##
==================================================
- Coverage 80.83% 80.77% -0.06%
+ Complexity 2417 2416 -1
==================================================
Files 167 167
Lines 10815 10861 +46
Branches 1197 1211 +14
==================================================
+ Hits 8742 8773 +31
- Misses 1544 1552 +8
- Partials 529 536 +7
gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorageClientReadChannel.java
Outdated
@@ -210,14 +212,41 @@ private class ContentReadChannel {
  // in-place seeks.
  private byte[] skipBuffer = null;
  private ReadableByteChannel byteChannel = null;
  // Keeps track of distance between last 2 consecutive request.
  private LimitedFifoQueue<Long> requestDistance = new LimitedFifoQueue<Long>(2);
nit: make this `final`. Same in other places.
}

@Override
public boolean add(E o) {
Can vectored IO call this concurrently?
No, VectoredRead doesn't use GcsReadChannel concurrently from multiple threads. The GcsReadChannel class is not thread-safe, and neither is the FifoQueue.
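For readers following this thread, here is a minimal sketch of a size-limited FIFO of the kind discussed above. It is not the PR's `LimitedFifoQueue`: the class name `BoundedFifo`, its use of composition over an `ArrayDeque`, and the accessor names are assumptions made for illustration. Like the class under review, it is deliberately not thread-safe.

```java
import java.util.ArrayDeque;

/**
 * Hypothetical bounded FIFO that keeps only the most recent {@code capacity}
 * elements. Not thread-safe: callers must confine it to a single thread.
 */
final class BoundedFifo<E> {
  private final int capacity;
  private final ArrayDeque<E> items = new ArrayDeque<>();

  BoundedFifo(int capacity) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    this.capacity = capacity;
  }

  /** Appends an element and evicts the oldest one once capacity is exceeded. */
  boolean add(E element) {
    items.addLast(element);
    if (items.size() > capacity) {
      items.removeFirst();
    }
    return true;
  }

  int size() {
    return items.size();
  }

  /** Oldest retained element, or null when the queue is empty. */
  E oldest() {
    return items.peekFirst();
  }

  /** Most recently added element, or null when the queue is empty. */
  E newest() {
    return items.peekLast();
  }
}
```

With a capacity of 2, adding a third element drops the first, so the queue always holds the distances for the last two requests only.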
gcs/CONFIGURATION.md
Outdated
* `AUTO_RANDOM` - in this mode connector starts with bounded range
requests when reading non gzip-encoded object and switches to streaming
request, bounded by `fs.gs.block.size`, if previous two requests follows
sequential read pattern i.e. forward seeks which are within
I think there is scope for improvement in the documentation. I did not get the gist of this flag by reading it.
It explains WHAT the feature does. If you can also add WHEN this flag makes sense, it would be useful to future readers of the documentation.
Updated
gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorageClientReadChannel.java
Outdated
private boolean isSequentialAccessPattern() {
  if (servedRequestLastIndex != -1) {
    requestDistance.add(currentPosition - servedRequestLastIndex);
This is a bit of a code smell: a "get" method updating state. Is there a way to avoid it?
It's not updating the state of the content, only the internal state about how to access the data. This is the feature we are offering with `AUTO` and `AUTO_RANDOM`.
Similarly, a `read` operation updates the `currentPosition` pointer in the file, which is also a state change.
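One possible way to separate the state update from the check, sketched under assumptions and not taken from the PR: record each request in a command method and keep the predicate side-effect free. The class name `AccessPatternTracker`, the method names, and the `maxForwardDistance` threshold are all hypothetical; the actual threshold used by the connector is not shown in this conversation.

```java
import java.util.ArrayDeque;

/** Hypothetical tracker illustrating command/query separation; not the connector's class. */
final class AccessPatternTracker {
  // Number of consecutive request distances considered, per the "last 2" in the diff.
  private static final int WINDOW = 2;

  // Assumed upper bound for a forward seek that still counts as sequential.
  private final long maxForwardDistance;
  private final ArrayDeque<Long> distances = new ArrayDeque<>();
  private long lastServedEnd = -1;

  AccessPatternTracker(long maxForwardDistance) {
    this.maxForwardDistance = maxForwardDistance;
  }

  /** Command: records a request that starts at {@code position} and serves bytes up to {@code servedEnd}. */
  void recordRequest(long position, long servedEnd) {
    if (lastServedEnd != -1) {
      distances.addLast(position - lastServedEnd);
      if (distances.size() > WINDOW) {
        distances.removeFirst();
      }
    }
    lastServedEnd = servedEnd;
  }

  /** Query: true when the last WINDOW distances are all forward seeks within the limit. Mutates nothing. */
  boolean isSequential() {
    if (distances.size() < WINDOW) {
      return false;
    }
    for (long distance : distances) {
      if (distance < 0 || distance > maxForwardDistance) {
        return false;
      }
    }
    return true;
  }
}
```

The trade-off is one extra call at each request site, in exchange for a predicate that can be called any number of times without changing the outcome.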
@@ -210,14 +212,41 @@ private class ContentReadChannel {
  // in-place seeks.
  private byte[] skipBuffer = null;
  private ReadableByteChannel byteChannel = null;
  // Keeps track of distance between last 2 consecutive request.
  private LimitedFifoQueue<Long> requestDistance = new LimitedFifoQueue<Long>(2);
Should we create this only for AUTO_RANDOM?
We can, but it adds multiple `if` checks in the regular path. Given it's limited to just two longs, I am not too worried about it.
Anyhow, it wasn't a big headache. Removed it and also made it configurable.
@@ -43,6 +44,7 @@ public enum Fadvise {
  public static final boolean DEFAULT_FAST_FAIL_ON_NOT_FOUND = true;
  public static final boolean DEFAULT_SUPPORT_GZIP_ENCODING = true;
  public static final long DEFAULT_INPLACE_SEEK_LIMIT = 8 * 1024 * 1024;
  public static final long BLOCK_SIZE = 64 * 1024 * 1024;
Shouldn't we take the connector block_size?
Here it defines its own constants. The default picked up in [GoogleHadoopFileSystemConfiguration.java](https://github.com/GoogleCloudDataproc/hadoop-connectors/pull/1242/files#diff-f06c91b66e47300ff6c940ca14f152898b99e6e48033502fd4c1dd69c07f0c68) uses the connector BLOCK_SIZE.
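To make the role of the block size concrete, here is a rough sketch of how a request's end offset could be bounded in the two modes described in the documentation hunk above. This is an illustration under assumptions, not the PR's code: the helper `RangeEnd.endPosition` and the `minRangeSize` parameter are hypothetical, and only the 64 MiB value is taken from the `BLOCK_SIZE` constant in the diff.

```java
/** Hypothetical helper; names and parameters are illustrative, not the connector's API. */
final class RangeEnd {
  // Mirrors the BLOCK_SIZE constant shown in the diff (64 MiB).
  static final long BLOCK_SIZE = 64L * 1024 * 1024;

  private RangeEnd() {}

  /**
   * Returns the exclusive end offset of the next request.
   * Sequential access: stream at most one block past the current position.
   * Otherwise: issue a bounded range request of minRangeSize bytes (assumed parameter).
   * The result never exceeds the object size.
   */
  static long endPosition(
      long currentPosition,
      long objectSize,
      boolean sequentialAccess,
      long blockSize,
      long minRangeSize) {
    long span = sequentialAccess ? blockSize : minRangeSize;
    return Math.min(objectSize, currentPosition + span);
  }
}
```

Passing the block size in as a parameter, instead of reading the constant directly, leaves room for the caller to supply the connector-level value from configuration, as the reply above describes.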
gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorageClientReadChannel.java
long endPosition = objectSize;

if (sequentialAccess) {
  endPosition = objectSize;
Why is this line required?
not required.
0e41e79 to 38fc47b
No description provided.