
8422 list multipart uploads #8531

Open · wants to merge 50 commits into master from 8422-listmultipartuploads
Conversation

@ItamarYuran (Contributor) commented Jan 21, 2025

Closes #8422

Introducing the ListMultipartUploads command to the S3 gateway.

Example command:
aws s3api --endpoint-url=http://127.0.0.1:8000/ list-multipart-uploads --bucket itamar --profile default

This will return all multipart uploads (upload ID and path) that are held by the tracker.

@ItamarYuran added the include-changelog label (PR description should be included in the next release changelog) Jan 21, 2025
@ItamarYuran linked an issue Jan 21, 2025 that may be closed by this pull request

github-actions bot commented Jan 21, 2025

🎊 PR Preview 7b1500d has been successfully built and deployed to https://treeverse-lakeFS-preview-pr-8531.surge.sh

🕐 Build time: 0.01s

🤖 By surge-preview

github-actions bot commented Jan 21, 2025

E2E Test Results - DynamoDB Local - Local Block Adapter

13 passed


github-actions bot commented Jan 21, 2025

E2E Test Results - Quickstart

11 passed

@ItamarYuran requested a review from a team January 22, 2025 10:04
@idanovo (Contributor) commented Jan 22, 2025

@ItamarYuran Can you please make sure the CI passes first? 🙏

@ItamarYuran force-pushed the 8422-listmultipartuploads branch from ce35f96 to 47a576e January 23, 2025 06:39
@ItamarYuran force-pushed the 8422-listmultipartuploads branch from 47a576e to cff37f2 January 23, 2025 06:44
@ItamarYuran requested a review from nopcoder January 28, 2025 09:25
@ItamarYuran (Contributor, Author) commented:

Note: the marker works only when using both the key marker and the upload ID marker. The key marker uses the physical address to mark the pagination; the physical address of the marked upload is, of course, returned as well when there are uploads left for the next page.
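
For illustration, a minimal pagination loop against this API might look like the following sketch. It assumes aws-sdk-go-v2 (a recent version where IsTruncated is a *bool), an illustrative bucket name, and that the client is pointed at the gateway endpoint; it is not lakeFS code.

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	client := s3.NewFromConfig(cfg) // point BaseEndpoint at the lakeFS gateway as needed

	input := &s3.ListMultipartUploadsInput{Bucket: aws.String("itamar")}
	for {
		out, err := client.ListMultipartUploads(context.TODO(), input)
		if err != nil {
			log.Fatal(err)
		}
		for _, u := range out.Uploads {
			fmt.Println(aws.ToString(u.Key), aws.ToString(u.UploadId))
		}
		if !aws.ToBool(out.IsTruncated) {
			break
		}
		// As noted above: pass BOTH markers to fetch the next page.
		input.KeyMarker = out.NextKeyMarker
		input.UploadIdMarker = out.NextUploadIdMarker
	}
}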

@nopcoder (Contributor) left a comment:

I'm taking a step back and looking at the request, which accepts Path and KeyMarker.

The underlying storage can't provide this information, as it holds the physical address and not the logical path the user provides.

I suggest having a test / use case that exercises this code first, to provide a working solution.
Implementation-wise, it means that the information has to be tracked, and that you will need this capability from the tracker; in that case, consider adding it to all the adapters and not as an S3-specific feature.

@nopcoder (Contributor) commented:
Summarizing our (@ItamarYuran and myself) conversation:

  • The ListMultipartUploads implementation uses the S3 adapter, similar to how ListParts works today (S3 only)
  • We reviewed the Apache NiFi PutS3 source code and verified that it makes no use of 'prefix' or other special-case parameters
  • The API should support pagination
  • Calling the API with 'prefix' or other unsupported options should fail (see the sketch below)
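
A hedged illustration of that last point (not lakeFS's actual handler; the helper name and parameter list are assumptions):

package gateway

import (
	"fmt"
	"net/url"
)

// checkUnsupportedListMPUParams rejects query parameters the gateway does not
// implement, mirroring S3's NotImplemented error instead of silently ignoring them.
func checkUnsupportedListMPUParams(q url.Values) error {
	for _, p := range []string{"prefix", "delimiter", "encoding-type"} {
		if q.Has(p) {
			return fmt.Errorf("NotImplemented: query parameter %q is not supported", p)
		}
	}
	return nil
}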

@ItamarYuran requested a review from nopcoder January 28, 2025 14:47
@idanovo (Contributor) left a comment:

I have a few minor comments, but overall, it looks good!

esti/s3_gateway_test.go (outdated; resolved)
esti/s3_gateway_test.go (outdated; resolved)
Comment on lines 376 to 391
// unsupported headers, expect error
delimiter := aws.String("/")
prefix := aws.String("prefix")
encodingType := types.EncodingTypeUrl

output, err = s3Client.ListMultipartUploads(ctx, &s3.ListMultipartUploadsInput{Bucket: resp1.Bucket, Delimiter: delimiter})
require.Error(t, err)
require.Contains(t, err.Error(), "NotImplemented")

output, err = s3Client.ListMultipartUploads(ctx, &s3.ListMultipartUploadsInput{Bucket: resp1.Bucket, Prefix: prefix})
require.Error(t, err)
require.Contains(t, err.Error(), "NotImplemented")

output, err = s3Client.ListMultipartUploads(ctx, &s3.ListMultipartUploadsInput{Bucket: resp1.Bucket, EncodingType: encodingType})
require.Error(t, err)
require.Contains(t, err.Error(), "NotImplemented")
Contributor:

I think these checks should be in a different test @nopcoder WDYT?

Contributor (Author):

I think in general you are right, but since we are going to skip this one for now for the control plane, it might be nice to keep it in one spot. WDYT?

Contributor:

I think that the multipart upload was the only test that we used esti to check, as it requires 5MB per part.
Agree with Idan: this part can be part of the adapter package test, as it should represent the spec we support.

@@ -204,6 +205,11 @@ func (a APIErrorCode) ToAPIErr() APIError {
// Codes - error code to APIError structure, these fields carry respective
// descriptions for all the error responses.
var Codes = errorCodeMap{
ErrInvalidArgument: {
Contributor:

This name is too generic for this error.
If I hadn't read the description, I could use this error in other places.
Maybe call it ErrInvalidMaxUploads?

Contributor (Author):

This is the error I got from S3 while trying it with bad input.

Contributor:

I missed this one. The problem is not the generic name; it is the format of the message and how we map the errors in the gateway.
Each error is mapped inside:

// Codes - error code to APIError structure, these fields carry respective
// descriptions for all the error responses.
var Codes = errorCodeMap

Map the specific error message and add it into both places.

@ItamarYuran ItamarYuran mentioned this pull request Jan 29, 2025
pkg/block/s3/adapter.go (outdated; resolved)
pkg/gateway/operations/listobjects.go (resolved)
Comment on lines 474 to 475
NextKeyMarker: *mpuResp.NextKeyMarker,
NextUploadIDMarker: *mpuResp.NextUploadIDMarker,
Contributor:

Use AWS's helper function to get the value and check that it is not nil: aws.StringValue.
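
For illustration only (aws.StringValue is the aws-sdk-go v1 helper; in aws-sdk-go-v2, which the test code above uses, the equivalent is aws.ToString), a nil-safe dereference looks like this:

package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go-v2/aws"
)

func main() {
	var nextKeyMarker *string // nil when S3 returns no marker
	// aws.ToString yields "" for a nil pointer instead of panicking.
	fmt.Printf("%q\n", aws.ToString(nextKeyMarker)) // ""
}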

Comment on lines 171 to 173
NextKeyMarker string `xml:"NextKeyMarker"`
NextUploadIDMarker string `xml:"NextUploadIDMarker"`
IsTruncated bool `xml:"IsTruncated"`
Contributor:

Please verify whether S3 omits these fields when no pagination is required; if so, we need to mark them as omitempty.

Contributor (Author):

In case there is no pagination, IsTruncated does not appear in S3's answer.

Contributor:

This is why we should use omitempty
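
A minimal sketch of the omitempty behavior with Go's encoding/xml (the struct here is a trimmed-down stand-in, not the gateway's actual type):

package main

import (
	"encoding/xml"
	"fmt"
)

type ListMultipartUploadsResult struct {
	NextKeyMarker      string `xml:"NextKeyMarker,omitempty"`
	NextUploadIDMarker string `xml:"NextUploadIDMarker,omitempty"`
	IsTruncated        bool   `xml:"IsTruncated,omitempty"`
}

func main() {
	// With zero values and ,omitempty, none of the three fields is emitted,
	// matching S3's behavior when no pagination is required.
	out, _ := xml.Marshal(ListMultipartUploadsResult{})
	fmt.Println(string(out)) // <ListMultipartUploadsResult></ListMultipartUploadsResult>
}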

Comment on lines 433 to 434
maxUploads32 := int32(maxUploads)
opts.MaxUploads = &maxUploads32
Contributor:

  • We didn't verify that the value is 0-1000
  • Only set MaxUploads when the value is above 0. Pointers in the AWS APIs are used to specify optionals; when we do not pass values, we should do the same while working with the underlying SDK (at least try when it makes sense)

Contributor (Author):

Now aligned with S3: the input is checked and expects a value between 0 and 2147483647; other values will raise S3's error.
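
A sketch of that validation under the stated assumptions (the helper name is hypothetical; 2147483647 is math.MaxInt32), returning a pointer only when a value was actually passed, per the optionals comment above:

package main

import (
	"fmt"
	"math"
	"strconv"
)

// parseMaxUploads accepts 0..math.MaxInt32 and rejects anything else,
// matching the error S3 itself raises for bad input.
func parseMaxUploads(raw string) (*int32, error) {
	if raw == "" {
		return nil, nil // optional: let the SDK apply its default
	}
	v, err := strconv.ParseInt(raw, 10, 64)
	if err != nil || v < 0 || v > math.MaxInt32 {
		return nil, fmt.Errorf("InvalidArgument: max-uploads must be between 0 and %d", math.MaxInt32)
	}
	v32 := int32(v)
	return &v32, nil
}

func main() {
	if _, err := parseMaxUploads("-1"); err != nil {
		fmt.Println(err)
	}
}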

pkg/gateway/operations/listobjects.go (resolved)
@@ -289,6 +289,134 @@ func TestS3IfNoneMatch(t *testing.T) {
}
}

func TestListMultipartUploads(t *testing.T) {
Contributor:

This test should run only on S3; I assume the tests didn't pass for Azure.

Contributor (Author):

As @idanovo asked, these tests run for the other adapters as well, but only until the first list-MPU request, where they are expected to fail; then they are skipped.

Contributor:

@nopcoder I wanted to make sure we get the right error for other block stores

Contributor:

Sure, but I don't understand how it works, as we currently do not support this flow for all adapters.

Contributor:

	if blockStoreType != "s3" {
		require.Contains(t, err.Error(), "NotImplemented")
		return
	}

@nopcoder (Contributor) commented Jan 30, 2025:

OK, I see the check further down in this function, but why do we even start the upload if we can't test it on a non-S3 implementation?
We have a different test to check missing functionality.

Contributor:

It can be tested in another test.
@ItamarYuran, my suggestion is to:

  1. Change TestListMultipartUploads back to t.Skip with a proper message if blockStoreType != "s3"
  2. Rename TestListMultipartUploadsUnsupported to indicate it checks only unsupported parameters
  3. Add another test that only checks that ListMultipartUploads returns the right error if blockStoreType != "s3"

@nopcoder WDYT?

Contributor:

I think that the latest change includes the above.

Contributor:

It doesn't:

  1. We are returning and not skipping
  2. We don't test the error we get for object stores that are not S3

Comment on lines 377 to 379

}
func TestListMultipartUploadsUnsupported(t *testing.T) {
Contributor:

Suggested change (the before/after rendering collapsed to identical lines here; the diff is whitespace-only):

}
func TestListMultipartUploadsUnsupported(t *testing.T) {

Comment on lines 390 to 404
delimiter := aws.String("/")
prefix := aws.String("prefix")
encodingType := types.EncodingTypeUrl

_, err := s3Client.ListMultipartUploads(ctx, &s3.ListMultipartUploadsInput{Bucket: Bucket, Delimiter: delimiter})
require.Error(t, err)
require.Contains(t, err.Error(), "NotImplemented")

_, err = s3Client.ListMultipartUploads(ctx, &s3.ListMultipartUploadsInput{Bucket: Bucket, Prefix: prefix})
require.Error(t, err)
require.Contains(t, err.Error(), "NotImplemented")

_, err = s3Client.ListMultipartUploads(ctx, &s3.ListMultipartUploadsInput{Bucket: Bucket, EncodingType: encodingType})
require.Error(t, err)
require.Contains(t, err.Error(), "NotImplemented")
Contributor:

  1. As far as I remember, we don't capture unsupported states. I wonder if we need this extra test, as we don't do it for all the adapters.
  2. You can use t.Run to capture each case (see the sketch below)
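
A sketch of the t.Run suggestion, reusing the snippet's identifiers (s3Client, ctx, and Bucket are assumed to come from the surrounding test file, as in the quoted code above):

func TestListMultipartUploadsUnsupported(t *testing.T) {
	cases := []struct {
		name   string
		mutate func(*s3.ListMultipartUploadsInput)
	}{
		{"Delimiter", func(in *s3.ListMultipartUploadsInput) { in.Delimiter = aws.String("/") }},
		{"Prefix", func(in *s3.ListMultipartUploadsInput) { in.Prefix = aws.String("prefix") }},
		{"EncodingType", func(in *s3.ListMultipartUploadsInput) { in.EncodingType = types.EncodingTypeUrl }},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			// Each unsupported parameter gets its own named subtest.
			input := &s3.ListMultipartUploadsInput{Bucket: Bucket}
			tc.mutate(input)
			_, err := s3Client.ListMultipartUploads(ctx, input)
			require.Error(t, err)
			require.Contains(t, err.Error(), "NotImplemented")
		})
	}
}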

pkg/gateway/operations/listobjects.go (resolved)
@ItamarYuran requested a review from nopcoder January 30, 2025 08:43
@idanovo (Contributor) left a comment:

Please address all @nopcoder's comments before merging 🙏

Comment on lines 24 to 25
maxUploadsListMPU = 2147483647
minUploadsListMPU = 0
Contributor:

  1. If we leave the verification to the underlying library, you can remove maxUploadsListMPU: it is the MaxInt32 value, and the cast will not let you pass anything above it anyway.

  2. If we plan to pass 0 as a valid value and let the underlying package handle it, we just need to check whether maxUploads32 < 0

@ItamarYuran requested a review from nopcoder January 30, 2025 17:12
@nopcoder (Contributor) left a comment:

LGTM.

Labels: include-changelog (PR description should be included in next release changelog)

Successfully merging this pull request may close these issues:
[Bug]: LakeFS does not support the S3 ListMultipartUploads API