-
Notifications
You must be signed in to change notification settings - Fork 492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Registrations RBAC #1975
Comments
Thanks for sending this, @prasadborole1 ! I think it would also be useful to examine the usefulness of this RBAC feature for more than just the Registration APIs so that we come up with a solution that can potentially be extended beyond the set of use cases presented here. Bundle management, for example, seems like another set of operations that would benefit from a finer grained authorization solution. |
Thank you @prasadborole1 for putting this together! |
I agree with both @rturner3 and @amartinezfayo. My vote is for option 2 In the end, we'll want this work to solve RPC authorization across the board, entry CRUD being just one use case... so we should be thinking about it in more generalized terms if possible. Here are a few questions rolling around in my head:
|
Wanted to chime in and mention that this is a great feature to have and am looking forward to it. We have a use case around this where we want to be able to create organizational policy around identities, so having something a call to a policy engine would be ideal. Another more specific use case we have is around the kubernetes workload registrar. The specific issue we want to address with the policies is that the admin of the kubernetes cluster is able to create any identity they want. This is something that we are looking to lock down without necessarily having to create a single trust domain for every separate cluster. Being able to define a policy to enforce that identities created by the workload registrar cannot have the admin flag, or being able to restrict the identities to not be able to impersonate other clusters would be very useful. |
This work may also enable #1207 as well, which I believe is a pretty critical requirement for traceability of registrations and bundles. |
Would like to find out more about the work around this issue, really keen on using this :). is there anything I can do to help out? |
Hi @lumjjb! I think someone @rturner3 works with was going to pick this up, but it has been sitting for a long time. If we provide a detailed design, would you be willing to help implement it? I think we know roughly what it should look like... of course, you'd also have an opportunity to validate that :) |
Yea! We should be able to help out - current planning our future work, and we should be able to fit this into our plan. |
We should have some cycles to start looking into this. @prasadborole1 @rturner3 is this something that you'd be free to chat about? |
Thank you for following up on this and apologies for the delay. I believe the next step here is to come up with the design using OPA strategy. @lumjjb I can sync with you later in this week. |
Awesome - i'm out later this week, would you be okay to sync up early next week? My email is on my github profile if you'd like to send an invite! Thanks! |
@prasadborole1 and I had a chat to sync up on this. I've rebased their work on the main branch https://github.com/lumjjb/spire/tree/tj/policy-dirty and will be working on it from there. I will be working on writing up some documentation around it, and maybe adding the ability for policy to apply to UDS calls as well with caller specified as "UDS". |
Great news @lumjjb! Thank you for picking this up, it is much needed. It'd be great if you could join the SPIRE contributor sync at some point. Most folks undertaking non-trivial works on the codebase join, and it is a useful place to get opinions before going down a long path. They take place every Tuesday at 11:30 Pacific, the details are shared on the SIG-SPIRE shared calendar: https://calendar.google.com/calendar/ical/c_q2don6m2b33gljqftauib3hnuk%40group.calendar.google.com/public/basic.ics |
Awesome - yea i have an appointment already this week but ill probably drop by next week! Thanks Evan. |
Here's a quick update on where I am with this issue... Quick recap:I rebased the branch that folks from Uber worked on. The integration for that was done with the old registration API. So I've integrated the similar functionality with the new SPIRE server APIs. Part of this effort was modifying the middleware API to provide requests to the Preprocess interface for unary interceptors. During discussions during community meetings, we talked about perhaps using the OPA policy to encapsulate the entire policy. Where we are nowBased on this idea, I explored several options that we could take with this. This is tricky mainly because of the interaction between OPA and SPIRE with relation with figuring out of a caller has certain properties - like The policy databindings would mimic the following go code through this JSON Possible SolutionsThere are several approaches to solving the problem stated above:
I will provide more details about these options below, but for as a summary, my Option 1 is infeasible due to costs, Option 2 on paper sounds really enticing but either will result in over-engineering / creating artificial restrictions (i.e. can't use a remote OPA server). But the main drawback for this is that there is no notion of cost from OPA about these external calls, which means that writing an OPA policy that uses this calls will be much more complicated, as there is no way explicit way to Option 3 is close to the initial proposal of the issue, but make it difficult to create finer grained policy controls. Option 4 allows creation of finer grained policies without any additional cost but results in slightly more involved policy definition. This is my personal favorite choice. Big thanks to @ashutosh-narkar, that helped consult on the different options. OptionsOption 1. Incur cost for all calls and pass it to OPA policyThe idea behind this is to obtain all the necessary booleans of whether a caller is admin, etc. and pass all these as inputs to the OPA policy. For example, the policy input would look like:
However, this would cost in terms of implementation, and look like main...lumjjb:policy_opa#diff-95d2951732a69d7a4f84f0982eed9968c2aeedca645cb2bafaad3db6547a510eR12-R79 This way the OPA policy would have all the information needed to make a final decision, however, coming at a cost of preparing all these inputs. Option 2: OPA policy interfaces with SPIRE through HTTP / golang addon to get more informationThis option revolves around either two of these "plugin" features of OPA: These both provide the ability for OPA to call into a HTTP endpoint or golang code for retrieving certain data. There are 2 major implications of using these methods, 1st is engineering overhead, and the 2nd, implicit policy complexity. Engineering ImplicationsThe use of either of them has some engineering implications. For the earlier, a HTTP endpoint needs to be created and serve OPA query of whether a certain SPIFFE ID has the admin flag, or is an agent (which may need certain correlation with golang contexts), which is a fairly involved engineering effort for what we want to achieve. For the later, golang code can be written to replace certain functions within the OPA engine. This is great for golang integration, but the implication of this is that we would not be able to do remote OPA policy evaluation. Remote OPA policy evaluation seems like it may be useful in the future as Policy Complexity ImplicationsOPA is not aware of the cost of these functions. Therefore, it treats these functions like any other OPA functions. This means that the user has very little control on what ends up being executed to fulfill the policy. This could lead to incurring a cost equivalent to getting all the peroperties before hand (like in option 1). This may also make writing of policies "right" fairly complicated since there is an implicit cost based on the execution semantics of the OPA policy. This means that one may have to write a policy a certain way to be more optimal, or is at the mercy of the OPA engine in determining it. Option 3. Use OPA policy and provide an option to "pass" onto default SPIRE policyThis option is similar to the original draft branch proposed. With the difference that on top of the OPA policy evaluation returning just a boolean So instead of a query of a result with:
We will have an additional field:
Thus, the evaluation would look something like the following:
Option 3 is close to the initial proposal of the issue, but make it difficult to create finer grained policy controls that involve additional flags like checking if the caller is admin or an agent. Option 4. OPA policy returns additional variablesOption 4 allows creation of finer grained policies without any additional cost but results in slightly more involved policy definition. This is my personal favorite choice. This option is similar to option 3 but with additional fields defined by the OPA policy, So instead of the OPA policy returning
We have additional fields indicating that the policy would allow it if certain flags are met:
This would then be evaluated like: Admit IF
where it would call the necessary go functions in order of cost. Code that does this is as follows, below are some examples to understand the evaluation main...lumjjb:policy_opa#diff-95d2951732a69d7a4f84f0982eed9968c2aeedca645cb2bafaad3db6547a510eR81-R149 Example 1: all false
In this case, there is no way the it would be allowed so it would return false Example 2: multiple trues
In this case, allow is false, so in order of ease to evaluate, we check if it is a local call, and if it is then we shortcircuit and return true.
|
@lumjjb thank you so much for this, let's definitely discuss tomorrow |
Option 4 looks good to me. It avoids unnecessary performance penalties during authorization policy evaluation while retaining some flexibility as to where the OPA engine is running (in-memory vs. server). The drawback of complicating the authorization decision to be OPA + SPIRE vs. just OPA itself seems acceptable in this case given the available alternatives. |
I am going to NOT include the old registration API to use this since the scope has now increased to handle the other API calls as well, so it doesn't quite match up with the old APIs. Unless there's an exception, I will assume this is the case. Cleaning up and writing tests and stuff, PR should be ready soon. |
Background:
SPIRE implements the concept of “admin” workloads. A workload presenting an admin SVID is able to act broadly across the entire registration API. In an ecosystem where multiple registrants are creating/deleting registrations, it results in registrants being able to affect registrations created by other registrants. Therefore, we need a way to limit the access of registration APIs to subsets of SPIFFE IDs by the registrant. More background discussion can be found in the github issue #716.
Possible solutions:
Plug-in-based approach (Registration authorizer):
The registration authorizer plug-in would be invoked in the path of create, update and delete registration RPCs. This will allow plug-in to get details of requestor identity, action to be performed and subject of the action. It will return a boolean response denoting whether requestor identity is authorized to perform action on registration with specific SPIFFE ID present in the RPC request.
We currently have a plug-in “Notifier” which gets invoked after each bundle update call. The registration authorizer plug-in would behave in a similar way. Since both of these plug-ins are intended to be used for very different purposes, I’m proposing to create a new one instead of overloading existing “Notifier” plug-in.
This approach offers more flexibility as users can potentially implement their own authorization model depending on different use cases or can integrate with any other external authorization solution. By default we will have a Noop plug-in which will always return true to ensure backward compatibility.
OPA-based config:
In this approach, we make changes to SPIRE core to include OPA-based config which will define admin registration authorization. The basic implementation can be found here.
This approach would make authorization a first class citizen of SPIRE which brings both advantages and challenges. All the SPIRE users will get authorization by default but it means arriving at consensus on the basic permission model so that most customer use cases are addressed. Also, there will be challenges around compatibility when modifying the permission model while the feature is settling in the production environment.
Sample Implementation of Registration Authorizer plug-in:
A sample permission model for registration RPCs in the plug-in could look like:
Example:
In the above example, the “Actor” is spiffe://example.ca/foo/workload admin workload which is mapped to the “Role” of the registrant.
The role foobarRegistrant would allow “Actions” like create, update and delete on specific namespaces.
The “Namespaces” holds prefixes of SPIFFE IDs on which the role allows the above Actions.
During each lookup, plug-in will look for a role and action and then verify the namespace of requested identity. The role could be shared with different actors.
Namespaces are prefixes of the SPIFFE IDs.
Resources are a set of RPCs which are authorizing a given role. (Initially we will just have registration RPCs. Adding this field so that in future we can possibly extend this to other RPCs)
List of RPCs which will invoke the plug-in:
The latency impact of plug-in to registration RPCs would be proportional to the number of actors(registrants) and roles. But assuming this can be performed in memory, overall RPC latency would not observe a sizable increase.
As a first step we can get Noop plug-in as a default. The details of implementations of actual registration authorization plug-in can be discussed separately.
Request For Comments:
This proposal is intended to get feedback on the overall approach for authorizing admin workloads in the near future as this is becoming an increasingly essential feature in hardening the SPIRE ecosystem in the production.
The text was updated successfully, but these errors were encountered: