Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3: Add paging to list_object_versions() #6896

Merged
merged 7 commits into from
Dec 11, 2023

Conversation

sveniu
Copy link
Contributor

@sveniu sveniu commented Oct 10, 2023

This adds paging to list_object_versions(). The original motivation was to enable moto testing for an S3 point-in-time recovery tool, where proper handling of paging is important.

CC @bblommers

Copy link
Collaborator

@bblommers bblommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @sveniu!

Moto has a @paginator decorator to simplify this - it should allow us to keep the existing logic intact, with the decorator doing all the heavy lifting around pagination.
See the docs here: http://docs.getmoto.org/en/latest/docs/contributing/development_tips/utilities.html#paginator

@sveniu
Copy link
Contributor Author

sveniu commented Oct 10, 2023

Hi @bblommers. Thanks for the pointer, I wasn't aware of that decorator. Looking at it, it seems it doesn't support multiple input tokens such as key-marker + version-id-marker used by the list_object_versions call. I'll see if I can wrap my head around the inner workings and add support for it.

@sveniu
Copy link
Contributor Author

sveniu commented Oct 10, 2023

Hi @bblommers. Spent some time on this now, but I'm not able to figure out how to adapt the paginator to work with S3's list_object_versions, which uses two input tokens (key-marker and version-id-marker). Would you be able to help me out with some pointers: Do you think the paginator should work well with this use case, and I'm just looking at it the wrong way? Or would the paginator need a large-ish rewrite to work with multiple inputs?

@bblommers
Copy link
Collaborator

@sveniu You're right, that is not straightforward with the current implementation. Let's stick with this then for now, we can always refactor it later.

Looks like the linter is failing: https://github.com/getmoto/moto/actions/runs/6465933620/job/17577370491?pr=6896

if version_id_marker is None:
return key_marker == ver.name

return key_marker == ver.name and version_id_marker == ver.version_id
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be key_marker <= ver.name and version_id_marker <= ver.version_id?

As the docs specifically state:

Specifies the key to start with when listing objects in a bucket.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would work if version_id was sequential, but it is basically a random string. I first tried just adding version_id_marker <= ver.version_id, but it produced flaky results depending on the random version_id values.

I opted for itertools.dropwhile here, and the helper function simply needs to signal when the first match is found, using exact matching, and dropwhile will then return the remaining items. Quite suitable for this purpose, I think. Was thinking to implement that version_id "search" manually with a version_marker_found bool, but found dropwhile to be a good fit. Let me know if you have thoughts on it.

The main loop still applies the prefix and delimiter logic, and with the max_keys paging added, it looks to me like it behaves like the real S3 server does.

I'd be happy to add more tests if there are suggestions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. Would you mind merging in the latest master? That should fix the linter - if the existing tests all pass, then I'd be happy to merge then.

@bblommers
Copy link
Collaborator

Actually - it looks like the linter is failing on master as well, due to a MyPy update. I've raised #6898 to fix that.

The is_latest handling was mistakenly removed, causing all versions and
delete markers to have <IsLatest>false</IsLatest>. This restores the
earlier, correct behaviour.
Improve readability of the test.
Add a more thorough test that mixes versions and delete markers.
@sveniu sveniu force-pushed the s3-list-object-versions-paging branch from b2b8f2e to 92adb66 Compare October 12, 2023 08:35
@sveniu
Copy link
Contributor Author

sveniu commented Oct 12, 2023

I've rebased on top of master, and added some fixup commits. I assume they'll be squash-merged automatically. Otherwise I can squash on my end and force-push a single commit.

Some tests still fail, but they seem unrelated to the modified code?

@bblommers
Copy link
Collaborator

@sveniu The linter is still failing - I think you would have to use Union[..] instead of X | Y:

https://github.com/getmoto/moto/actions/runs/6493285516/job/17635218175?pr=6896

I'll have a look at the failing Terraform tests later on.

Fix linter error:

  moto/s3/models.py:1818:39: error: X | Y syntax for unions requires Python 3.10  [syntax]
@bblommers
Copy link
Collaborator

OK, so there is a change in behaviour somewhere. The existing test that now fails, test_list_object_versions_with_delimiter, succeeds when running it against AWS, so it should pass with Moto as well.

Conversely, the new test test_list_object_versions_with_paging currently fails when running it against AWS - so something isn't quite right with the implementation.

Fix paging for common prefixes.
Clean up response so that:

- VersionIdMarker is included, even if it's an empty string.

- NextVersionIdMarker is only included if it's non-empty.
@codecov
Copy link

codecov bot commented Oct 17, 2023

Codecov Report

Merging #6896 (0f6602d) into master (7e891f7) will increase coverage by 0.00%.
Report is 17 commits behind head on master.
The diff coverage is 100.00%.

@@           Coverage Diff            @@
##           master    #6896    +/-   ##
========================================
  Coverage   95.81%   95.82%            
========================================
  Files         826      826            
  Lines       81009    81126   +117     
========================================
+ Hits        77622    77741   +119     
+ Misses       3387     3385     -2     
Flag Coverage Δ
servertests 36.72% <0.00%> (-0.04%) ⬇️
unittests 95.77% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
moto/s3/models.py 97.06% <100.00%> (+0.03%) ⬆️
moto/s3/responses.py 95.57% <100.00%> (+0.03%) ⬆️

... and 17 files with indirect coverage changes

@sveniu
Copy link
Contributor Author

sveniu commented Oct 18, 2023

Finally fixed how paging is done when common prefixes are involved – at least moto now behaves the same as the real S3 server there, and tests look green now.

@bblommers Dunno if you have more thoughts on this – let me know if you do. Otherwise, the branch is full of fixup commits, and I guess you do a squash-merge usually?

@bblommers
Copy link
Collaborator

Hi @sveniu, thanks for continuing to chip away at this!

As far as I can tell, there are still two issues with this:

  1. the next_version_id_marker should actually point to the last version_id that is returned (naming is hard..)
  2. objects should be sorted by type -> name -> date, so that DeleteMarkers are returned first.

Point 2 is an issue with the existing implementation, and the existing tests also do not really verify ordering. But from what I can see, this is the order in which AWS returns things:

  1. DeleteMarkers from first object (sorted by time)
  2. First object
  3. DeleteMarkers from second object
  4. Second object

If you want to test this out, I've modified the new tests slightly so that they pass against AWS.
This s3_aws_verified-decorator will automatically create the bucket for you, and will also automatically delete all objects and the bucket itself after the test ends - so you don't have to worry about having to clean your real AWS account after running these tests.

diff --git a/tests/test_s3/test_s3.py b/tests/test_s3/test_s3.py
index 63ead96d3..3d5ce2422 100644
--- a/tests/test_s3/test_s3.py
+++ b/tests/test_s3/test_s3.py
@@ -2613,11 +2613,10 @@ def test_list_object_versions_with_versioning_enabled_late():
     assert response["Body"].read() == items[-1]
 
 
-@mock_s3
-def test_list_object_versions_with_paging():
[email protected]_verified
+@s3_aws_verified
+def test_list_object_versions_with_paging(bucket_name=None):
     s3_client = boto3.client("s3", region_name=DEFAULT_REGION_NAME)
-    bucket_name = "000" + str(uuid4())
-    s3_client.create_bucket(Bucket=bucket_name)
     s3_client.put_bucket_versioning(
         Bucket=bucket_name, VersioningConfiguration={"Status": "Enabled"}
     )
@@ -2647,7 +2646,7 @@ def test_list_object_versions_with_paging():
     assert "NextKeyMarker" in page1
     assert page1["NextKeyMarker"] == "obj1"
     assert "NextVersionIdMarker" in page1
-    assert page1["NextVersionIdMarker"] == obj1ver1["VersionId"]
+    assert page1["NextVersionIdMarker"] == obj1ver2["VersionId"]
 
     # Second page should be the last page and have the oldest version.
     page2 = s3_client.list_object_versions(
@@ -2668,11 +2667,10 @@ def test_list_object_versions_with_paging():
     assert page2["Versions"][0]["VersionId"] == obj1ver1["VersionId"]
 
 
-@mock_s3
-def test_list_object_versions_with_paging_and_delete_markers():
[email protected]_verified
+@s3_aws_verified
+def test_list_object_versions_with_paging_and_delete_markers(bucket_name=None):
     s3_client = boto3.client("s3", region_name=DEFAULT_REGION_NAME)
-    bucket_name = "000" + str(uuid4())
-    s3_client.create_bucket(Bucket=bucket_name)
     s3_client.put_bucket_versioning(
         Bucket=bucket_name, VersioningConfiguration={"Status": "Enabled"}
     )
@@ -2706,7 +2704,7 @@ def test_list_object_versions_with_paging_and_delete_markers():
     assert "NextKeyMarker" in page1
     assert page1["NextKeyMarker"] == "obj1"
     assert "NextVersionIdMarker" in page1
-    assert page1["NextVersionIdMarker"] == obj1dmk1["VersionId"]
+    assert page1["NextVersionIdMarker"] == obj1dmk2["VersionId"]
 
     page2 = s3_client.list_object_versions(
         Bucket=bucket_name,
@@ -2728,9 +2726,9 @@ def test_list_object_versions_with_paging_and_delete_markers():
 
     # The next key/version markers should point to obj2 dmk1.
     assert "NextKeyMarker" in page2
-    assert page2["NextKeyMarker"] == "obj2"
+    assert page2["NextKeyMarker"] == "obj1"
     assert "NextVersionIdMarker" in page2
-    assert page2["NextVersionIdMarker"] == obj2dmk1["VersionId"]
+    assert page2["NextVersionIdMarker"] == obj1ver1["VersionId"]
 
     page3 = s3_client.list_object_versions(
         Bucket=bucket_name,
@@ -2756,11 +2754,10 @@ def test_list_object_versions_with_paging_and_delete_markers():
     assert "NextVersionIdMarker" not in page3
 
 
-@mock_s3
-def test_list_object_versions_with_paging_and_delimiter():
[email protected]_verified
+@s3_aws_verified
+def test_list_object_versions_with_paging_and_delimiter(bucket_name=None):
     s3_client = boto3.client("s3", region_name=DEFAULT_REGION_NAME)
-    bucket_name = "000" + str(uuid4())
-    s3_client.create_bucket(Bucket=bucket_name)
     s3_client.put_bucket_versioning(
         Bucket=bucket_name, VersioningConfiguration={"Status": "Enabled"}
     )
@@ -2807,7 +2804,6 @@ def test_list_object_versions_with_paging_and_delimiter():
         Delimiter="with-",
         Prefix="key1",
         KeyMarker="key12-with-",
-        VersionIdMarker="",
         MaxKeys=5,
     )

By default, the test will still run against Moto, but run it against AWS like so:

MOTO_TEST_ALLOW_AWS_REQUEST=true pytest -sv tests/test_s3/test_s3.py -m aws_verified

@bblommers
Copy link
Collaborator

Oh, and to answer your question - Yes, I always squash-merge, so don't worry about the commit history. 🙂

@bentsku
Copy link
Contributor

bentsku commented Oct 22, 2023

Hello, just chiming in here regarding the sorting order: if you put an object then delete it without specifying the object version then running ListObjectVersions with MaxKeys=1, you will get the Versions field and not the DeleteMarkers, which shows that the time is sorted before the type. Running the test with more keys shows the order is name > time > type it seems.

@bblommers
Copy link
Collaborator

I'll just merge this as is, considering it is already a major improvement on what we have now. I'll raise a separate PR with the final touches.

Thank you for all your work on this @sveniu!

@bblommers bblommers added this to the 4.2.12 milestone Dec 11, 2023
@bblommers bblommers merged commit 2522bc4 into getmoto:master Dec 11, 2023
@sveniu sveniu deleted the s3-list-object-versions-paging branch December 11, 2023 20:22
Copy link
Contributor

This is now part of moto >= 4.2.12.dev17

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants