-
Notifications
You must be signed in to change notification settings - Fork 657
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate minio.Object ETag between read requests #662
Conversation
api-get-object.go
Outdated
|
||
// Check if etag from response matches Object's etag. | ||
case o.etag != response.objectInfo.ETag: | ||
return response, errors.New("object has been modified since last read") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use the ErrorResponse thingy here and return BadDigest instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@harshavardhana Since this error is returned by minio.Object
's Stat
, Read
and ReadAt
, I was thinking if syscall.EBADFD
would be more appropriate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use the ErrorResponse thingy here and return BadDigest instead.
@harshavardhana, The Content-Md5 you specified did not match what we received.
can be misleading to someone receiving this error for Read
or ReadAt
on minio.Object
. Further, the request headers sent during a GetObject
were correct, at that point in time.Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
syscall.EBADFD is good would work with *os.File behavior.
api-get-object.go
Outdated
|
||
// Check if etag from response matches Object's etag. | ||
case o.etag != response.objectInfo.ETag: | ||
return response, syscall.EBADF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we always return type ErrorResponse
for error it's better to:
return ErrInvalidObjectName("ETag has changed")
instead of returning syscall.EBADF
But this sounds better:
return ErrInvalidObject("ETag has changed")
we could rename ErrInvalidObjectName()
to ErrInvalidObject()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@krisis @harshavardhana mentioned that he discussed this with you. The reason ErrInvalidObject("ETag has changed")
is better than syscall.EBADF
is if application logs the error it will know exactly why error was returned (instead of a generic bad FD)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@krishnasrinivas, I agree that syscall.EBADF
doesn't provide context to the user. How about "object has been modified since last read"
? I find "ETag has changed" not entirely helpful. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be encapsulated inside ErrInvalidObjectNamet() @krisis which provides ErrorResponse compatible error. This way caller can look at the errResp.Code == NoSuchKey
and can decide to re-open the connection to discard and start reading again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@harshavardhana It doesn't make sense to return NoSuchKey
. IMO InvalidObjectState
[1] is a more apt error response code. thoughts?
[1] http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html#ErrorCodeList
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NoSuchKey meaning we don't see the same object anymore i.e based on the ETag.
1ceb3bd
to
75f0a3a
Compare
Actually now that i think about it. This is not the right fix - we should have been using http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectGET.html |
is this PR still valid? Please advise if it needs to be closed then? |
It is valid it needs to be re-done since we merged #670 |
@harshavardhana @krishnasrinivas I have updated the PR to use recent changes in |
Will test thanks @krisis |
@@ -105,7 +105,7 @@ func (c Client) GetObject(bucketName, objectName string) (*Object, error) { | |||
// Do not set objectInfo from the first readAt request because it will not get | |||
// the whole object. | |||
reqHeaders.SetRange(req.Offset, req.Offset+int64(len(req.Buffer))-1) | |||
httpReader, _, err = c.getObject(bucketName, objectName, reqHeaders) | |||
httpReader, objectInfo, err = c.getObject(bucketName, objectName, reqHeaders) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replacing objectInfo
for each ReadAt changes the Size() of the object.. Is the Size() of the object expected to change . Stat() method is supposed to return the actual object size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you test this with all other conditions such as a combination of ReadAt() and then Read()/Seek() ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replacing objectInfo for each ReadAt changes the Size() of the object.. Is the Size() of the object expected to change . Stat() method is supposed to return the actual object size
objectInfo (i.e, minio.Object.objectInfo
) is not being replaced on each ReadAt. if !o.objectInfoSet && !req.isReadAt
check in doGetRequest
ensures that objectInfo is set only the first time and never from a ReadAt call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that Size returned is lesser than the original Size of the object. So it means that if the Read() is ensued after ReadAt() - would result in an EOF without reading the whole object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you test this with all other conditions such as a combination of ReadAt() and then Read()/Seek() ?
I have added a functional test that makes a ReadAt, Stat and ReadAt call on minio.Object
in that order, in api_functional_v4_test.go. Between the two ReadAt calls the object is modified in the store. I shall see how I can add tests that will cover other sequences of minio.Object
methods, especially Read, ReadAt and Stat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need something like a user did following sequence of events tested.
- ReadAt() to read some bytes at an offset.
- Then proceeds to start Read() instead of ReadAt().
Or
- ReadAt() to read some bytes at offset.
- Seek() try to seek past the offset+length used in ReadAt().
The reason previously objectInfo
was not updated in ReadAt() was to ensure that it is updated only when a Read() or a Seek() is called. I am pretty sure setting objectInfo would cause some unknown situations during ReadAt().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now there is an additional check.. to ignore objectInfo sent from ReadAt..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason previously objectInfo was not updated in ReadAt() was to ensure that it is updated only when a Read() or a Seek() is called. I am pretty sure setting objectInfo would cause some unknown situations during ReadAt().
There is a check in doGetRequest
which ensures objectInfo
is not updated. See https://github.com/minio/minio-go/pull/662/files#diff-0afcc28c2029fa14fe9a221e90b524e5R307.
|
||
// Read again only to find object contents have been modified since last read. | ||
_, err = reader.ReadAt(b, int64(n)) | ||
if err.Error() != s3ErrorResponseMap["PreconditionFailed"] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure that err is not nil. Since you need to fail for that as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case ReadAt call should fail. If it returned nil
it implies that this test failed. I can add err != nil
for readability sake.
} | ||
|
||
// Confirm that a Stat() call in between doesn't change the Object's cached etag. | ||
_, err = reader.Stat() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't you be reading the cached value and validating it as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this sequence of method calls, namely PutObject, ReadAt, PutObject, Stat ..., we can check for object size to be len(newContent)
. If Read call replaced the (first) ReadAt, then the objectInfo.Size would be len(content)
. In summary, Stat method may return stale objectInfo while Read and ReadAt methods will return error if the object is modified in the object store.
This brings us to if we should return error on Stat if the object is modified since the call to GetObject
.
|
||
defer c.RemoveObject(bucketName, objectName) | ||
|
||
reader, err := c.GetObject(bucketName, objectName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Close the reader for posterity.
@@ -2422,3 +2423,75 @@ func TestFunctional(t *testing.T) { | |||
t.Fatal("Error: ", err) | |||
} | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// GetObject - returns an seekable, readable object.
func (c Client) GetObject(bucketName, objectName string) (*Object, error) {
// Input validation.
if err := isValidBucketName(bucketName); err != nil {
return nil, err
}
if err := isValidObjectName(objectName); err != nil {
return nil, err
}
var httpReader io.ReadCloser
var objectInfo ObjectInfo
ETag information is already available in the objectInfo, so you could use it instead of maintaining it in getRequest
@krishnasrinivas, you are right, the way we use objectInfo is not straightforward. If you see how doGetRequest uses objectInfo you will see why we need ETag saved separately. We can simplify the code structure in a different PR. |
@krisis no, see this is all we need: krishnasrinivas@fa0e640 |
@krishnasrinivas, could you test your changes with Get object followed by ReadAt on the reader, putobject modifying the object and another ReadAt? Meanwhile I shall check how it is different from what is present in this PR. |
This fix looks good to me. I will make a minor change where |
Thanks @krishnasrinivas for the simple approach. - Add a functional test case to confirm the fix.
On each new request, we compare
minio.Object
's cached ETag with the current response ETag to error out if the object has changed since the last read.Fixes #656