-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
S3: Add paging to list_object_versions() #6896
S3: Add paging to list_object_versions() #6896
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @sveniu!
Moto has a @paginator
decorator to simplify this - it should allow us to keep the existing logic intact, with the decorator doing all the heavy lifting around pagination.
See the docs here: http://docs.getmoto.org/en/latest/docs/contributing/development_tips/utilities.html#paginator
Hi @bblommers. Thanks for the pointer, I wasn't aware of that decorator. Looking at it, it seems it doesn't support multiple input tokens such as key-marker + version-id-marker used by the list_object_versions call. I'll see if I can wrap my head around the inner workings and add support for it. |
Hi @bblommers. Spent some time on this now, but I'm not able to figure out how to adapt the paginator to work with S3's list_object_versions, which uses two input tokens (key-marker and version-id-marker). Would you be able to help me out with some pointers: Do you think the paginator should work well with this use case, and I'm just looking at it the wrong way? Or would the paginator need a large-ish rewrite to work with multiple inputs? |
@sveniu You're right, that is not straightforward with the current implementation. Let's stick with this then for now, we can always refactor it later. Looks like the linter is failing: https://github.com/getmoto/moto/actions/runs/6465933620/job/17577370491?pr=6896 |
moto/s3/models.py
Outdated
if version_id_marker is None: | ||
return key_marker == ver.name | ||
|
||
return key_marker == ver.name and version_id_marker == ver.version_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be key_marker <= ver.name and version_id_marker <= ver.version_id
?
As the docs specifically state:
Specifies the key to start with when listing objects in a bucket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would work if version_id was sequential, but it is basically a random string. I first tried just adding version_id_marker <= ver.version_id
, but it produced flaky results depending on the random version_id values.
I opted for itertools.dropwhile here, and the helper function simply needs to signal when the first match is found, using exact matching, and dropwhile will then return the remaining items. Quite suitable for this purpose, I think. Was thinking to implement that version_id "search" manually with a version_marker_found bool, but found dropwhile to be a good fit. Let me know if you have thoughts on it.
The main loop still applies the prefix and delimiter logic, and with the max_keys paging added, it looks to me like it behaves like the real S3 server does.
I'd be happy to add more tests if there are suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. Would you mind merging in the latest master? That should fix the linter - if the existing tests all pass, then I'd be happy to merge then.
Actually - it looks like the linter is failing on |
The is_latest handling was mistakenly removed, causing all versions and delete markers to have <IsLatest>false</IsLatest>. This restores the earlier, correct behaviour.
Improve readability of the test.
Add a more thorough test that mixes versions and delete markers.
b2b8f2e
to
92adb66
Compare
I've rebased on top of master, and added some fixup commits. I assume they'll be squash-merged automatically. Otherwise I can squash on my end and force-push a single commit. Some tests still fail, but they seem unrelated to the modified code? |
@sveniu The linter is still failing - I think you would have to use https://github.com/getmoto/moto/actions/runs/6493285516/job/17635218175?pr=6896 I'll have a look at the failing Terraform tests later on. |
Fix linter error: moto/s3/models.py:1818:39: error: X | Y syntax for unions requires Python 3.10 [syntax]
OK, so there is a change in behaviour somewhere. The existing test that now fails, Conversely, the new test |
Fix paging for common prefixes.
Clean up response so that: - VersionIdMarker is included, even if it's an empty string. - NextVersionIdMarker is only included if it's non-empty.
Codecov Report
@@ Coverage Diff @@
## master #6896 +/- ##
========================================
Coverage 95.81% 95.82%
========================================
Files 826 826
Lines 81009 81126 +117
========================================
+ Hits 77622 77741 +119
+ Misses 3387 3385 -2
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Finally fixed how paging is done when common prefixes are involved – at least moto now behaves the same as the real S3 server there, and tests look green now. @bblommers Dunno if you have more thoughts on this – let me know if you do. Otherwise, the branch is full of fixup commits, and I guess you do a squash-merge usually? |
Hi @sveniu, thanks for continuing to chip away at this! As far as I can tell, there are still two issues with this:
Point 2 is an issue with the existing implementation, and the existing tests also do not really verify ordering. But from what I can see, this is the order in which AWS returns things:
If you want to test this out, I've modified the new tests slightly so that they pass against AWS.
By default, the test will still run against Moto, but run it against AWS like so:
|
Oh, and to answer your question - Yes, I always squash-merge, so don't worry about the commit history. 🙂 |
Hello, just chiming in here regarding the sorting order: if you put an object then delete it without specifying the object version then running |
I'll just merge this as is, considering it is already a major improvement on what we have now. I'll raise a separate PR with the final touches. Thank you for all your work on this @sveniu! |
This is now part of moto >= 4.2.12.dev17 |
This adds paging to list_object_versions(). The original motivation was to enable moto testing for an S3 point-in-time recovery tool, where proper handling of paging is important.
CC @bblommers