Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Managing API tokens with pypitoken #9184

Open
ewjoachim opened this issue Mar 8, 2021 · 6 comments
Open

Managing API tokens with pypitoken #9184

ewjoachim opened this issue Mar 8, 2021 · 6 comments
Labels
needs discussion a product management/policy issue maintainers and users should discuss tokens Issues relating to API tokens

Comments

@ewjoachim
Copy link
Contributor

ewjoachim commented Mar 8, 2021

This ticket is the counterpart for that discussion for discussing the integration of pypitoken into warehouse. I might start working on the PR at some point, but if there are huge structural problems to doing so, or things that should be figured out beforehand, I'll gladly welcome remarks before working on the integration :)

I plan to remove caveats.py, and have the Macaroons service use pypitoken directly. This should simplify code. Also, the Form for creating tokens will likely be simplified too, as it currently holds some information on the caveats JSON structure, which is not needed if pypitoken takes responsibility for crafting tokens. All in all, I hope, we'll remove much more code than we'll add, while making future evolutions easier.

@di di added the needs discussion a product management/policy issue maintainers and users should discuss label Mar 11, 2021
@ewjoachim
Copy link
Contributor Author

(@rcipkins based on your excellent work on #6935 I believe you might want to have a look :) )

@ewjoachim
Copy link
Contributor Author

ewjoachim commented Mar 14, 2021

So first problem identified: The idea of pypitoken is to make an abstraction layer between the macaroon & caveat format and PyPI. While it works great for generating & checking macaroon, I've noticed that we keep a copy of one caveat in the DB, so that we can display information on the restriction in the UI, with the following problems:

  • This means PyPI needs to know the format of the caveats
  • If we add several restrictions, are they expected to be in a single caveat or in multiple caveats ? If multiple caveats, how will we store them?

Things to keep in mind:

  • We could store the information in a way less linked to the macaroon format. If we decide to change the format, this means we'll likely need a big data migration, but nothing undoable.
  • We could store multiple caveats as a list (it's a jsonb field), but we'd need to handle the migration. Again, nothing undoable, but it will be a multi-step operation
  • As of now, PyPItoken can introspect the restrictions of a macaroon, which would be ideal but we don't have the macaroon itself. We can also dump & load individual restrictions and that's what I'm going to go with. The idea it that we're going to store the json blob, but it's pypitoken that's going to analyse it. Might not be ideal, but it should work for now.

@di
Copy link
Member

di commented Feb 25, 2022

I took a pass through #9264 and while I think the PR is solid, and that pypitoken is a really helpful library, I'm feeling a bit wary about moving some of our core/critical logic out of our codebase and into a dependency, which will make it harder for us (core maintainers) to keep track of changes and ensure testing coverage remains the same as it is now.

I'm curious if there might be a compromise where the pypitoken codebase lives in this repo and is used directly (rather than installed as a dependency) but can also be released for external use. There isn't really a precedent for this and I'm not sure how exactly it would work, but if we found a good solution I'd be interested in exploring it.

Overall, I'd be interested in seeing the short term fixes for #9018 split out, if possible, so that they aren't blocked on us figuring this out.

@ewjoachim
Copy link
Contributor Author

Thanks for your input :)

I think this is not exactly the first time that a part of the logic is taken out of the repository, there's:

  • camo project & conveyor (and I believe a few others) as "side" projects,
  • trove classifiers on the same model as pypitoken would be,
  • (and of course, I believe there are some pyramid plugins we use that aren't used by many other projects except us, which puts us in a kind of similar situation)
  • And on the auth part specifically, we're delegating a part of our critical work to pyramid-multiauth and pymacaroons.

The idea behind splitting this project out is that this lets us have a more scoped approach, and lessen the pressure on the warehouse maintainers, but I've always considered that if we do it, the warehouse maintainers will also be maintainers for pypitoken. Just, it will be easier to have other maintainers onboard that will just need to be maintainers of pypitoken, not of all of warehouse.

Regarding the testing coverage, you'll note that pypitoken has a 100% coverage, and it's supposed to stay this way (I'm not sure it's enforced, but it can be with a single additional option). I think I understand your point in the sense of lessening the opportunities for integration tests that would give us more guarantees, but I think this could apply to all of warehouse, whether this part is integrated or not (see #8616). But if we write and run functional tests in warehouse in the CI and those tests include API auth, we'll still see if they start failing when integrating new pypitoken versions through Dependabot PRs.

Also, keeping strong boundaries within the code itself is, I think, a good way to enforce those boundaries in the future, and avoid having them breached in the future just because people are not enough aware that those boundaries are not to be breached. For example in this specific case, this makes it very clear where the code for generating tokens and implementing their logic is, versus the code for integrating this into warehouse.

With all that being said, I'm not opposed at all to your solution of "vendoring" the project, this could work, it's just that it would only partially solve the existing problem that I guess kept us from going further in the development of caveats so far, the very limited bandwidth for developments in the warehouse codebase from the core maintainers. But if you think there's still value in having a solution where we integrate pypitoken's code within warehouse and adapt this PR, we can do it.

Note that it will probably lead to a few PRs and... Well I'm a bit uncomfortable making the point out loud, but... I don't want to seem demanding but I see that this issue/PR has sit almost 1 year without your point being raised/discussed. I fully understand that there is limited bandwidth and other priorities. What is unclear to me at this point is "what would change if I do it" ? If maintainers didn't have the bandwidth for working on merging a PR that offloads some work to another lib, will you have time to work on merging PR(s) that does the same work AND adds a "vendored" lib AND adds a mechanism for publishing that lib, etc ? To be a bit more blunt, and I hope this will not come out in a bad way, I've spent a non-trivial amount of my 2021 open-source time for this with, so far, very limited (if even existant) impact. I'm slightly hesitant spending quite some more in 2022 with no idea if it will actually help driving the impact level up :)

@di
Copy link
Member

di commented Feb 25, 2022

I think this is not exactly the first time that a part of the logic is taken out of the repository

This would be the first time we've extracted something directly responsible for authentication/authorization. The trove classifiers aren't exactly in the critical path 🙂 (The rest of these were never actually part of this codebase).

Also note that as the current maintainer of pypitoken, merging this would sort of give you maintainer-level abilities for a part of our codebase that you don't already have. I think I'm OK with that given your excellent past contributions but it's not something we usually take lightly. To be blunt: if you wanted to compromise PyPI, this would be a pretty good way to go about it. 🙂

lessen the pressure on the warehouse maintainers, but I've always considered that if we do it, the warehouse maintainers will also be maintainers for pypitoken

I appreciate that and also assumed that would be the case. However: while I can't speak for other maintainers, personally having another separate project to keep track of is generally not going to make things easier for me, or give me more bandwidth, which is the main reason why I'm curious to explore if pypitoken could live in this codebase.

I don't want to seem demanding but I see that this issue/PR has sit almost 1 year without your point being raised/discussed. I fully understand that there is limited bandwidth and other priorities.

Yep, and I'm sorry for that. As a frequent & valued contributor we should have started this discussion with you earlier (in reality, probably when this issue was created and before you drafted a PR). Please note that it's not because the PR is bad in any way, it's just a) not particularly high priority and b) somewhat security-sensitive, so it requires a bit more thought.

To be a bit more blunt, and I hope this will not come out in a bad way, I've spent a non-trivial amount of my 2021 open-source time for this with, so far, very limited (if even existant) impact. I'm slightly hesitant spending quite some more in 2022 with no idea if it will actually help driving the impact level up :)

I think you definitely shouldn't do any more work here until we have some consensus, so I'll ask @ewdurbin to chime in as well.

@ewdurbin
Copy link
Member

I don't have an opinion against migrating to an pypitokens as an abstraction, but do agree with @di regarding concern with breaking it out as a completely separate codebase with a separate release cycle. If embedding the project as a releasable artifact is doable that seems best.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs discussion a product management/policy issue maintainers and users should discuss tokens Issues relating to API tokens
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants