S3: Add paging to list_object_versions() #6896

sveniu · 2023-10-10T07:11:50Z

This adds paging to list_object_versions(). The original motivation was to enable moto testing for an S3 point-in-time recovery tool, where proper handling of paging is important.

CC @bblommers

bblommers

Hi @sveniu!

Moto has a @paginator decorator to simplify this - it should allow us to keep the existing logic intact, with the decorator doing all the heavy lifting around pagination.
See the docs here: http://docs.getmoto.org/en/latest/docs/contributing/development_tips/utilities.html#paginator

sveniu · 2023-10-10T08:18:09Z

Hi @bblommers. Thanks for the pointer, I wasn't aware of that decorator. Looking at it, it seems it doesn't support multiple input tokens such as key-marker + version-id-marker used by the list_object_versions call. I'll see if I can wrap my head around the inner workings and add support for it.

sveniu · 2023-10-10T13:04:16Z

Hi @bblommers. Spent some time on this now, but I'm not able to figure out how to adapt the paginator to work with S3's list_object_versions, which uses two input tokens (key-marker and version-id-marker). Would you be able to help me out with some pointers: Do you think the paginator should work well with this use case, and I'm just looking at it the wrong way? Or would the paginator need a large-ish rewrite to work with multiple inputs?

bblommers · 2023-10-10T20:42:44Z

@sveniu You're right, that is not straightforward with the current implementation. Let's stick with this then for now, we can always refactor it later.

Looks like the linter is failing: https://github.com/getmoto/moto/actions/runs/6465933620/job/17577370491?pr=6896

bblommers · 2023-10-10T20:44:18Z

moto/s3/models.py

+            if version_id_marker is None:
+                return key_marker == ver.name
+
+            return key_marker == ver.name and version_id_marker == ver.version_id


Should this be key_marker <= ver.name and version_id_marker <= ver.version_id?

As the docs specifically state:

Specifies the key to start with when listing objects in a bucket.

That would work if version_id was sequential, but it is basically a random string. I first tried just adding version_id_marker <= ver.version_id, but it produced flaky results depending on the random version_id values.

I opted for itertools.dropwhile here, and the helper function simply needs to signal when the first match is found, using exact matching, and dropwhile will then return the remaining items. Quite suitable for this purpose, I think. Was thinking to implement that version_id "search" manually with a version_marker_found bool, but found dropwhile to be a good fit. Let me know if you have thoughts on it.

The main loop still applies the prefix and delimiter logic, and with the max_keys paging added, it looks to me like it behaves like the real S3 server does.

I'd be happy to add more tests if there are suggestions.

Fair enough. Would you mind merging in the latest master? That should fix the linter - if the existing tests all pass, then I'd be happy to merge then.

bblommers · 2023-10-10T21:06:57Z

Actually - it looks like the linter is failing on master as well, due to a MyPy update. I've raised #6898 to fix that.

The is_latest handling was mistakenly removed, causing all versions and delete markers to have <IsLatest>false</IsLatest>. This restores the earlier, correct behaviour.

Improve readability of the test.

Add a more thorough test that mixes versions and delete markers.

sveniu · 2023-10-12T12:42:50Z

I've rebased on top of master, and added some fixup commits. I assume they'll be squash-merged automatically. Otherwise I can squash on my end and force-push a single commit.

Some tests still fail, but they seem unrelated to the modified code?

bblommers · 2023-10-12T12:58:35Z

@sveniu The linter is still failing - I think you would have to use Union[..] instead of X | Y:

https://github.com/getmoto/moto/actions/runs/6493285516/job/17635218175?pr=6896

I'll have a look at the failing Terraform tests later on.

Fix linter error: moto/s3/models.py:1818:39: error: X | Y syntax for unions requires Python 3.10 [syntax]

bblommers · 2023-10-12T18:30:27Z

OK, so there is a change in behaviour somewhere. The existing test that now fails, test_list_object_versions_with_delimiter, succeeds when running it against AWS, so it should pass with Moto as well.

Conversely, the new test test_list_object_versions_with_paging currently fails when running it against AWS - so something isn't quite right with the implementation.

Fix paging for common prefixes.

Clean up response so that: - VersionIdMarker is included, even if it's an empty string. - NextVersionIdMarker is only included if it's non-empty.

codecov · 2023-10-17T21:03:07Z

Codecov Report

Merging #6896 (0f6602d) into master (7e891f7) will increase coverage by 0.00%.
Report is 17 commits behind head on master.
The diff coverage is 100.00%.

@@           Coverage Diff            @@
##           master    #6896    +/-   ##
========================================
  Coverage   95.81%   95.82%            
========================================
  Files         826      826            
  Lines       81009    81126   +117     
========================================
+ Hits        77622    77741   +119     
+ Misses       3387     3385     -2

Flag	Coverage Δ
servertests	`36.72% <0.00%> (-0.04%)`	⬇️
unittests	`95.77% <100.00%> (+0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files	Coverage Δ
moto/s3/models.py	`97.06% <100.00%> (+0.03%)`	⬆️
moto/s3/responses.py	`95.57% <100.00%> (+0.03%)`	⬆️

... and 17 files with indirect coverage changes

sveniu · 2023-10-18T06:13:04Z

Finally fixed how paging is done when common prefixes are involved – at least moto now behaves the same as the real S3 server there, and tests look green now.

@bblommers Dunno if you have more thoughts on this – let me know if you do. Otherwise, the branch is full of fixup commits, and I guess you do a squash-merge usually?

bblommers · 2023-10-18T18:24:17Z

Hi @sveniu, thanks for continuing to chip away at this!

As far as I can tell, there are still two issues with this:

the next_version_id_marker should actually point to the last version_id that is returned (naming is hard..)
objects should be sorted by type -> name -> date, so that DeleteMarkers are returned first.

Point 2 is an issue with the existing implementation, and the existing tests also do not really verify ordering. But from what I can see, this is the order in which AWS returns things:

DeleteMarkers from first object (sorted by time)
First object
DeleteMarkers from second object
Second object

If you want to test this out, I've modified the new tests slightly so that they pass against AWS.
This s3_aws_verified-decorator will automatically create the bucket for you, and will also automatically delete all objects and the bucket itself after the test ends - so you don't have to worry about having to clean your real AWS account after running these tests.

diff --git a/tests/test_s3/test_s3.py b/tests/test_s3/test_s3.py
index 63ead96d3..3d5ce2422 100644
--- a/tests/test_s3/test_s3.py
+++ b/tests/test_s3/test_s3.py
@@ -2613,11 +2613,10 @@ def test_list_object_versions_with_versioning_enabled_late():
     assert response["Body"].read() == items[-1]
 
 
-@mock_s3
-def test_list_object_versions_with_paging():
[email protected]_verified
+@s3_aws_verified
+def test_list_object_versions_with_paging(bucket_name=None):
     s3_client = boto3.client("s3", region_name=DEFAULT_REGION_NAME)
-    bucket_name = "000" + str(uuid4())
-    s3_client.create_bucket(Bucket=bucket_name)
     s3_client.put_bucket_versioning(
         Bucket=bucket_name, VersioningConfiguration={"Status": "Enabled"}
     )
@@ -2647,7 +2646,7 @@ def test_list_object_versions_with_paging():
     assert "NextKeyMarker" in page1
     assert page1["NextKeyMarker"] == "obj1"
     assert "NextVersionIdMarker" in page1
-    assert page1["NextVersionIdMarker"] == obj1ver1["VersionId"]
+    assert page1["NextVersionIdMarker"] == obj1ver2["VersionId"]
 
     # Second page should be the last page and have the oldest version.
     page2 = s3_client.list_object_versions(
@@ -2668,11 +2667,10 @@ def test_list_object_versions_with_paging():
     assert page2["Versions"][0]["VersionId"] == obj1ver1["VersionId"]
 
 
-@mock_s3
-def test_list_object_versions_with_paging_and_delete_markers():
[email protected]_verified
+@s3_aws_verified
+def test_list_object_versions_with_paging_and_delete_markers(bucket_name=None):
     s3_client = boto3.client("s3", region_name=DEFAULT_REGION_NAME)
-    bucket_name = "000" + str(uuid4())
-    s3_client.create_bucket(Bucket=bucket_name)
     s3_client.put_bucket_versioning(
         Bucket=bucket_name, VersioningConfiguration={"Status": "Enabled"}
     )
@@ -2706,7 +2704,7 @@ def test_list_object_versions_with_paging_and_delete_markers():
     assert "NextKeyMarker" in page1
     assert page1["NextKeyMarker"] == "obj1"
     assert "NextVersionIdMarker" in page1
-    assert page1["NextVersionIdMarker"] == obj1dmk1["VersionId"]
+    assert page1["NextVersionIdMarker"] == obj1dmk2["VersionId"]
 
     page2 = s3_client.list_object_versions(
         Bucket=bucket_name,
@@ -2728,9 +2726,9 @@ def test_list_object_versions_with_paging_and_delete_markers():
 
     # The next key/version markers should point to obj2 dmk1.
     assert "NextKeyMarker" in page2
-    assert page2["NextKeyMarker"] == "obj2"
+    assert page2["NextKeyMarker"] == "obj1"
     assert "NextVersionIdMarker" in page2
-    assert page2["NextVersionIdMarker"] == obj2dmk1["VersionId"]
+    assert page2["NextVersionIdMarker"] == obj1ver1["VersionId"]
 
     page3 = s3_client.list_object_versions(
         Bucket=bucket_name,
@@ -2756,11 +2754,10 @@ def test_list_object_versions_with_paging_and_delete_markers():
     assert "NextVersionIdMarker" not in page3
 
 
-@mock_s3
-def test_list_object_versions_with_paging_and_delimiter():
[email protected]_verified
+@s3_aws_verified
+def test_list_object_versions_with_paging_and_delimiter(bucket_name=None):
     s3_client = boto3.client("s3", region_name=DEFAULT_REGION_NAME)
-    bucket_name = "000" + str(uuid4())
-    s3_client.create_bucket(Bucket=bucket_name)
     s3_client.put_bucket_versioning(
         Bucket=bucket_name, VersioningConfiguration={"Status": "Enabled"}
     )
@@ -2807,7 +2804,6 @@ def test_list_object_versions_with_paging_and_delimiter():
         Delimiter="with-",
         Prefix="key1",
         KeyMarker="key12-with-",
-        VersionIdMarker="",
         MaxKeys=5,
     )

By default, the test will still run against Moto, but run it against AWS like so:

MOTO_TEST_ALLOW_AWS_REQUEST=true pytest -sv tests/test_s3/test_s3.py -m aws_verified

bblommers · 2023-10-18T18:26:20Z

Oh, and to answer your question - Yes, I always squash-merge, so don't worry about the commit history. 🙂

bentsku · 2023-10-22T00:53:29Z

Hello, just chiming in here regarding the sorting order: if you put an object then delete it without specifying the object version then running ListObjectVersions with MaxKeys=1, you will get the Versions field and not the DeleteMarkers, which shows that the time is sorted before the type. Running the test with more keys shows the order is name > time > type it seems.

bblommers · 2023-12-11T20:02:23Z

I'll just merge this as is, considering it is already a major improvement on what we have now. I'll raise a separate PR with the final touches.

Thank you for all your work on this @sveniu!

github-actions · 2023-12-11T21:13:09Z

This is now part of moto >= 4.2.12.dev17

bblommers requested changes Oct 10, 2023

View reviewed changes

bblommers reviewed Oct 10, 2023

View reviewed changes

sveniu added 4 commits October 12, 2023 10:35

S3: Add paging to list_object_versions()

b5066a7

fixup! S3: Add paging to list_object_versions()

2430ccc

The is_latest handling was mistakenly removed, causing all versions and delete markers to have <IsLatest>false</IsLatest>. This restores the earlier, correct behaviour.

fixup! S3: Add paging to list_object_versions()

67ed151

Improve readability of the test.

fixup! S3: Add paging to list_object_versions()

92adb66

Add a more thorough test that mixes versions and delete markers.

sveniu force-pushed the s3-list-object-versions-paging branch from b2b8f2e to 92adb66 Compare October 12, 2023 08:35

fixup! S3: Add paging to list_object_versions()

2378284

Fix linter error: moto/s3/models.py:1818:39: error: X | Y syntax for unions requires Python 3.10 [syntax]

sveniu added 2 commits October 17, 2023 14:16

fixup! S3: Add paging to list_object_versions()

dc253b3

Fix paging for common prefixes.

fixup! S3: Add paging to list_object_versions()

0f6602d

Clean up response so that: - VersionIdMarker is included, even if it's an empty string. - NextVersionIdMarker is only included if it's non-empty.

bblommers approved these changes Dec 11, 2023

View reviewed changes

bblommers added this to the 4.2.12 milestone Dec 11, 2023

bblommers merged commit 2522bc4 into getmoto:master Dec 11, 2023

sveniu deleted the s3-list-object-versions-paging branch December 11, 2023 20:22

bblommers mentioned this pull request Dec 11, 2023

S3: list_object_versions() now returns the correct NextToken #7118

Merged

sveniu mentioned this pull request Jan 17, 2024

Fix paging-related bugs by reworking the strategy angeloc/s3-pit-restore#36

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

S3: Add paging to list_object_versions() #6896

S3: Add paging to list_object_versions() #6896

sveniu commented Oct 10, 2023

bblommers left a comment

sveniu commented Oct 10, 2023

sveniu commented Oct 10, 2023

bblommers commented Oct 10, 2023

bblommers Oct 10, 2023

sveniu Oct 11, 2023

bblommers Oct 11, 2023

bblommers commented Oct 10, 2023

sveniu commented Oct 12, 2023

bblommers commented Oct 12, 2023

bblommers commented Oct 12, 2023

codecov bot commented Oct 17, 2023

sveniu commented Oct 18, 2023

bblommers commented Oct 18, 2023

bblommers commented Oct 18, 2023

bentsku commented Oct 22, 2023

bblommers commented Dec 11, 2023

github-actions bot commented Dec 11, 2023

S3: Add paging to list_object_versions() #6896

S3: Add paging to list_object_versions() #6896

Conversation

sveniu commented Oct 10, 2023

bblommers left a comment

Choose a reason for hiding this comment

sveniu commented Oct 10, 2023

sveniu commented Oct 10, 2023

bblommers commented Oct 10, 2023

bblommers Oct 10, 2023

Choose a reason for hiding this comment

sveniu Oct 11, 2023

Choose a reason for hiding this comment

bblommers Oct 11, 2023

Choose a reason for hiding this comment

bblommers commented Oct 10, 2023

sveniu commented Oct 12, 2023

bblommers commented Oct 12, 2023

bblommers commented Oct 12, 2023

codecov bot commented Oct 17, 2023

Codecov Report

sveniu commented Oct 18, 2023

bblommers commented Oct 18, 2023

bblommers commented Oct 18, 2023

bentsku commented Oct 22, 2023

bblommers commented Dec 11, 2023

github-actions bot commented Dec 11, 2023