Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate minio.Object ETag between read requests #662

Merged
merged 1 commit into from
May 2, 2017

Conversation

krisis
Copy link
Member

@krisis krisis commented Apr 24, 2017

On each new request, we compare minio.Object's cached ETag with the current response ETag to error out if the object has changed since the last read.

Fixes #656


// Check if etag from response matches Object's etag.
case o.etag != response.objectInfo.ETag:
return response, errors.New("object has been modified since last read")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the ErrorResponse thingy here and return BadDigest instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harshavardhana Since this error is returned by minio.Object's Stat, Read and ReadAt, I was thinking if syscall.EBADFD would be more appropriate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the ErrorResponse thingy here and return BadDigest instead.

@harshavardhana, The Content-Md5 you specified did not match what we received. can be misleading to someone receiving this error for Read or ReadAt on minio.Object. Further, the request headers sent during a GetObject were correct, at that point in time.Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syscall.EBADFD is good would work with *os.File behavior.


// Check if etag from response matches Object's etag.
case o.etag != response.objectInfo.ETag:
return response, syscall.EBADF
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we always return type ErrorResponse for error it's better to:
return ErrInvalidObjectName("ETag has changed")

instead of returning syscall.EBADF

But this sounds better:
return ErrInvalidObject("ETag has changed")

we could rename ErrInvalidObjectName() to ErrInvalidObject()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@krisis @harshavardhana mentioned that he discussed this with you. The reason ErrInvalidObject("ETag has changed") is better than syscall.EBADF is if application logs the error it will know exactly why error was returned (instead of a generic bad FD)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@krishnasrinivas, I agree that syscall.EBADF doesn't provide context to the user. How about "object has been modified since last read"? I find "ETag has changed" not entirely helpful. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be encapsulated inside ErrInvalidObjectNamet() @krisis which provides ErrorResponse compatible error. This way caller can look at the errResp.Code == NoSuchKey and can decide to re-open the connection to discard and start reading again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harshavardhana It doesn't make sense to return NoSuchKey. IMO InvalidObjectState[1] is a more apt error response code. thoughts?

[1] http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html#ErrorCodeList

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NoSuchKey meaning we don't see the same object anymore i.e based on the ETag.

@krisis krisis force-pushed the issue/656 branch 2 times, most recently from 1ceb3bd to 75f0a3a Compare April 26, 2017 05:43
@harshavardhana
Copy link
Member

Actually now that i think about it. This is not the right fix - we should have been using If-Match tag in GetObject header and let server return proper error.

http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectGET.html

@deekoder
Copy link
Contributor

is this PR still valid? Please advise if it needs to be closed then?

@harshavardhana
Copy link
Member

is this PR still valid? Please advise if it needs to be closed then?

It is valid it needs to be re-done since we merged #670

@krisis
Copy link
Member Author

krisis commented May 1, 2017

@harshavardhana @krishnasrinivas I have updated the PR to use recent changes in c.getObject method to include request headers.

@harshavardhana
Copy link
Member

@harshavardhana @krishnasrinivas I have updated the PR to use recent changes in c.getObject method to include request headers.

Will test thanks @krisis

@@ -105,7 +105,7 @@ func (c Client) GetObject(bucketName, objectName string) (*Object, error) {
// Do not set objectInfo from the first readAt request because it will not get
// the whole object.
reqHeaders.SetRange(req.Offset, req.Offset+int64(len(req.Buffer))-1)
httpReader, _, err = c.getObject(bucketName, objectName, reqHeaders)
httpReader, objectInfo, err = c.getObject(bucketName, objectName, reqHeaders)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacing objectInfo for each ReadAt changes the Size() of the object.. Is the Size() of the object expected to change . Stat() method is supposed to return the actual object size.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you test this with all other conditions such as a combination of ReadAt() and then Read()/Seek() ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacing objectInfo for each ReadAt changes the Size() of the object.. Is the Size() of the object expected to change . Stat() method is supposed to return the actual object size

objectInfo (i.e, minio.Object.objectInfo) is not being replaced on each ReadAt. if !o.objectInfoSet && !req.isReadAt check in doGetRequest ensures that objectInfo is set only the first time and never from a ReadAt call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that Size returned is lesser than the original Size of the object. So it means that if the Read() is ensued after ReadAt() - would result in an EOF without reading the whole object.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you test this with all other conditions such as a combination of ReadAt() and then Read()/Seek() ?

I have added a functional test that makes a ReadAt, Stat and ReadAt call on minio.Object in that order, in api_functional_v4_test.go. Between the two ReadAt calls the object is modified in the store. I shall see how I can add tests that will cover other sequences of minio.Object methods, especially Read, ReadAt and Stat.

Copy link
Member

@harshavardhana harshavardhana May 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need something like a user did following sequence of events tested.

  • ReadAt() to read some bytes at an offset.
  • Then proceeds to start Read() instead of ReadAt().

Or

  • ReadAt() to read some bytes at offset.
  • Seek() try to seek past the offset+length used in ReadAt().

The reason previously objectInfo was not updated in ReadAt() was to ensure that it is updated only when a Read() or a Seek() is called. I am pretty sure setting objectInfo would cause some unknown situations during ReadAt().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now there is an additional check.. to ignore objectInfo sent from ReadAt..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason previously objectInfo was not updated in ReadAt() was to ensure that it is updated only when a Read() or a Seek() is called. I am pretty sure setting objectInfo would cause some unknown situations during ReadAt().

There is a check in doGetRequest which ensures objectInfo is not updated. See https://github.com/minio/minio-go/pull/662/files#diff-0afcc28c2029fa14fe9a221e90b524e5R307.


// Read again only to find object contents have been modified since last read.
_, err = reader.ReadAt(b, int64(n))
if err.Error() != s3ErrorResponseMap["PreconditionFailed"] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure that err is not nil. Since you need to fail for that as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case ReadAt call should fail. If it returned nil it implies that this test failed. I can add err != nil for readability sake.

}

// Confirm that a Stat() call in between doesn't change the Object's cached etag.
_, err = reader.Stat()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you be reading the cached value and validating it as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this sequence of method calls, namely PutObject, ReadAt, PutObject, Stat ..., we can check for object size to be len(newContent). If Read call replaced the (first) ReadAt, then the objectInfo.Size would be len(content). In summary, Stat method may return stale objectInfo while Read and ReadAt methods will return error if the object is modified in the object store.

This brings us to if we should return error on Stat if the object is modified since the call to GetObject.


defer c.RemoveObject(bucketName, objectName)

reader, err := c.GetObject(bucketName, objectName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close the reader for posterity.

@@ -2422,3 +2423,75 @@ func TestFunctional(t *testing.T) {
t.Fatal("Error: ", err)
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment..

harshavardhana
harshavardhana previously approved these changes May 1, 2017
Copy link
Contributor

@krishnasrinivas krishnasrinivas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// GetObject - returns an seekable, readable object.
func (c Client) GetObject(bucketName, objectName string) (*Object, error) {
        // Input validation.
        if err := isValidBucketName(bucketName); err != nil {
                return nil, err
        }
        if err := isValidObjectName(objectName); err != nil {
                return nil, err
        }

        var httpReader io.ReadCloser
        var objectInfo ObjectInfo

ETag information is already available in the objectInfo, so you could use it instead of maintaining it in getRequest

@krisis
Copy link
Member Author

krisis commented May 2, 2017

@krishnasrinivas, you are right, the way we use objectInfo is not straightforward. If you see how doGetRequest uses objectInfo you will see why we need ETag saved separately. We can simplify the code structure in a different PR.

@krishnasrinivas
Copy link
Contributor

@krisis no, see this is all we need: krishnasrinivas@fa0e640

@krisis
Copy link
Member Author

krisis commented May 2, 2017

@krishnasrinivas, could you test your changes with Get object followed by ReadAt on the reader, putobject modifying the object and another ReadAt? Meanwhile I shall check how it is different from what is present in this PR.

@krisis
Copy link
Member Author

krisis commented May 2, 2017

@krisis no, see this is all we need: krishnasrinivas/minio-go@fa0e640

This fix looks good to me. I will make a minor change where var etag string will be moved into the go-routine. I shall revert my changes and include this.

Thanks @krishnasrinivas for the simple approach.

- Add a functional test case to confirm the fix.
@harshavardhana harshavardhana merged commit e461683 into minio:master May 2, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants