-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Is there a way to add an Index.html for the entire bucket. #18
Comments
I have created a pull request, #26, that addresses this issue. |
Nice idea for sure. |
This would be extremely useful to have this available. |
This is a problem for tools that expect a PEP 503 compliant package repo. For example dependabot fails if it can't read the root index.html. It needs this index in order to know which packages are stored in the repo. While #26 is a good start it no longer merges cleanly. The linked repo (natefoo/s3pypi-root-index) hasn't been updated in two years. @mdwint @rubenvdb I'm happy to work on getting this functionality into the core project. I would appreciate some guidance on how you think we should approach this before kicking off the work. Should the aim be to update @pdonis' and @natefoo's work so it merges cleanly or would you like to see something different? |
The reason #26 did not get merged was that it did not handle concurrent updates.
Implementing this correctly will require some form of locking. Terraform's S3 backend, for example, solves this using a DynamoDB table. It's unfortunate how this adds a dependency on DynamoDB, since not everyone needs this feature. If you decide to go this route, I suggest making it optional, similar to basic auth in #76 (note that this hasn't been merged yet). A simpler solution raised in #26 is making it a separate command, without consistency guarantees, as in natefoo/s3pypi-root-index. I would not be against including such a command in s3pypi, provided that the lack of locking is clearly stated in the documentation. |
Come to think of it, adding a DynamoDB table for locking might not be a bad idea in general, since package-level indexes could benefit from it as well. This could be opt-in, similar to how Terraform does it. |
@skwashd If you need a short-term solution, I recommend creating a simple script like natefoo/s3pypi-root-index. We can add a general-purpose solution to s3pypi after #76 is merged, and once the DynamoDB locking idea has been worked out. |
@mdwint thanks for the detailed replies. While I like DynamoDB, it adds another moving part. Late last year, Amazon made S3 reads after writes strongly consistent. What do you think about making the locking mechanism configurable? I'm happy to explore adding "NONE" and "S3" as possible values initially, then later we could add support for "DYNAMODB". What are the blockers for merging #76? Should I be basing my work off that branch? |
Strong read-after-write consistency does not solve the "lost update" problem. Multiple instances of s3pypi could be reading, modifying, and writing the same index file at the same time. The final instance to write its results would overwrite the others just written. Any new packages/versions in those other updates would be lost. This would be a rare occurrence, but common enough to trigger the occasional bug report if used on a big enough scale, especially if we add top-level index pages.
What would "S3" entail? I don't think multiple locking implementations are necessary. A CLI option like To be clear: I don't expect you to solve locking just yet. I'll create a separate issue, and can have a look at this myself.
I can merge it in the next few days. If you want to base your work off something, I recommend the |
Sorry I should have been clearer. I understand it takes time to create the index file. I was proposing that we could store the lock file in the S3 bucket. I know there is a small window where more than one process to perform a lock check, not see the file and write one. That said, the top level index writing is really only problem where multiple processes are adding new packages to the repo. The file is just a list of links to the package directories in the bucket. I don't think the risk of collisions there is very high, except during bulk migrations. |
That could work if S3's new consistency model is strong enough, but I'm not sure that is the case. Someone proposed this idea for Terraform as well:
|
Sounds like I give it a shot. If it isn't good enough we can switch to Dynamo. |
Hi guys,
Thanks for your great work.
I'm wondering if there is a way to add an Index.html for the entire bucket?
Right now, s3pypi only generates one index.html for each package, is there a way to generate an index.html for the entire s3 bucket?
The text was updated successfully, but these errors were encountered: