
Is there a way to add an index.html for the entire bucket? #18

Closed
LanDeQuHuXi opened this issue Mar 29, 2017 · 13 comments · Fixed by #80

Comments

@LanDeQuHuXi

Hi guys,
Thanks for your great work.
I'm wondering if there is a way to add an Index.html for the entire bucket?
Right now, s3pypi only generates an index.html for each package; is there a way to generate an index.html for the entire S3 bucket?

@pdonis

pdonis commented Sep 11, 2017

I have created a pull request, #26, that addresses this issue.

@derek-adair

Nice idea for sure.

@igable

igable commented Oct 18, 2018

It would be extremely useful to have this available.

@skwashd

skwashd commented May 18, 2021

This is a problem for tools that expect a PEP 503-compliant package repo. For example, dependabot fails if it can't read the root index.html, which it needs in order to know which packages are stored in the repo.
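For reference, PEP 503 only requires the root index page to contain one anchor per project, linking to each package's sub-index. A minimal sketch of generating such a page (the package names here are made up for illustration, not taken from any real bucket):

```python
# Hypothetical sketch of a PEP 503 root index: one anchor per project.
# Package names are illustrative only.
packages = ["my-lib", "another-pkg"]
links = "\n".join(f'    <a href="/{name}/">{name}</a>' for name in packages)
index_html = f"<!DOCTYPE html>\n<html>\n  <body>\n{links}\n  </body>\n</html>"
print(index_html)
```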

While #26 is a good start it no longer merges cleanly. The linked repo (natefoo/s3pypi-root-index) hasn't been updated in two years.

@mdwint @rubenvdb I'm happy to work on getting this functionality into the core project. I would appreciate some guidance on how you think we should approach this before kicking off the work. Should the aim be to update @pdonis' and @natefoo's work so it merges cleanly or would you like to see something different?

@mdwint
Member

mdwint commented May 18, 2021

The reason #26 did not get merged was that it did not handle concurrent updates.

One concern I have is that the top-level index might get overwritten when trying to upload multiple packages at the same time, since S3 does not provide locking.

Implementing this correctly will require some form of locking. Terraform's S3 backend, for example, solves this using a DynamoDB table. It's unfortunate that this adds a dependency on DynamoDB, since not everyone needs this feature. If you decide to go this route, I suggest making it optional, similar to basic auth in #76 (note that this hasn't been merged yet).
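The conditional-write pattern Terraform relies on (a put that succeeds only if no lock item exists yet, as DynamoDB's attribute_not_exists condition provides) can be sketched as follows. An in-memory dict stands in for the DynamoDB table here, purely for illustration:

```python
# Sketch of lock acquire/release via conditional writes. The dict stands in
# for a DynamoDB table; a real implementation would use put_item with a
# ConditionExpression and delete_item instead.
table = {}

def acquire_lock(lock_id: str, owner: str) -> bool:
    # Conditional put: succeeds only if no item with this key exists yet.
    if lock_id in table:
        return False
    table[lock_id] = owner
    return True

def release_lock(lock_id: str, owner: str) -> None:
    # Only the lock holder may release it.
    if table.get(lock_id) == owner:
        del table[lock_id]
```

With this, a second uploader's acquire fails until the first releases, which is exactly the guarantee a plain S3 put cannot give.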

A simpler solution raised in #26 is making it a separate command, without consistency guarantees, as in natefoo/s3pypi-root-index. I would not be against including such a command in s3pypi, provided that the lack of locking is clearly stated in the documentation.

@mdwint

mdwint commented May 18, 2021

Come to think of it, adding a DynamoDB table for locking might not be a bad idea in general, since package-level indexes could benefit from it as well. This could be opt-in, similar to how Terraform does it.

@mdwint

mdwint commented May 18, 2021

@skwashd If you need a short-term solution, I recommend creating a simple script like natefoo/s3pypi-root-index. We can add a general-purpose solution to s3pypi after #76 is merged, and once the DynamoDB locking idea has been worked out.

@skwashd

skwashd commented May 19, 2021

@mdwint thanks for the detailed replies.

While I like DynamoDB, it adds another moving part.

Late last year, Amazon made S3 reads after writes strongly consistent. What do you think about making the locking mechanism configurable? I'm happy to explore adding "NONE" and "S3" as possible values initially, then later we could add support for "DYNAMODB".

What are the blockers for merging #76? Should I be basing my work off that branch?

@mdwint

mdwint commented May 19, 2021

Late last year, Amazon made S3 reads after writes strongly consistent.

Strong read-after-write consistency does not solve the "lost update" problem. Multiple instances of s3pypi could be reading, modifying, and writing the same index file at the same time. The last instance to write would overwrite the updates the others had just written, and any new packages/versions in those updates would be lost. This would be a rare occurrence, but common enough to trigger the occasional bug report at a large enough scale, especially if we add top-level index pages.
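The race can be sketched in a few lines; the package names, and the set standing in for the index file, are made up for illustration:

```python
# Two uploaders read the same index, each adds a different package, and the
# second write clobbers the first. Names are illustrative, not s3pypi's API.
index = {"existing-pkg"}

view_a = set(index)   # uploader A reads the index
view_b = set(index)   # uploader B reads it concurrently
view_a.add("pkg-a")
view_b.add("pkg-b")

index = view_a        # A writes its view back
index = view_b        # B overwrites it: A's update ("pkg-a") is lost
```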

What do you think about making the locking mechanism configurable? I'm happy to explore adding "NONE" and "S3" as possible values initially, then later we could add support for "DYNAMODB".

What would "S3" entail? I don't think multiple locking implementations are necessary. A CLI option like --lock-indexes could be used to toggle it on or off.

To be clear: I don't expect you to solve locking just yet. I'll create a separate issue, and can have a look at this myself.

What are the blockers for merging #76? Should I be basing my work off that branch?

I can merge it in the next few days. If you want to base your work off something, I recommend the develop branch.

@skwashd

skwashd commented May 19, 2021

Sorry, I should have been clearer. I understand it takes time to create the index file. I was proposing that we store the lock file in the S3 bucket. I know there is a small window in which more than one process could perform a lock check, not see the file, and each write one.

That said, writing the top-level index is really only a problem when multiple processes are adding new packages to the repo. The file is just a list of links to the package directories in the bucket. I don't think the risk of collisions there is very high, except during bulk migrations.

@mdwint

mdwint commented May 19, 2021

I was proposing that we could store the lock file in the S3 bucket.

That could work if S3's new consistency model is strong enough, but I'm not sure that is the case. Someone proposed this idea for Terraform as well:

We’ve not yet done any research to see if S3’s new guarantees are sufficient for that model to be safe

@skwashd

skwashd commented May 19, 2021

Sounds like I should give it a shot. If it isn't good enough, we can switch to Dynamo.

@mdwint

mdwint commented May 19, 2021

@skwashd FYI, #76 has been merged. I've also created a starting point for this issue in #80. I'm assuming DynamoDB in this draft, but that can be swapped out for other implementations. From what I've read so far, I don't see this coming together using S3 alone, but I may be mistaken.
