-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Row-level filtering and masking #1480
Comments
@martint Could we just leverage create |
@L0YU, that's something that Presto supports today, but it requires someone to create those views and for users to know which views they are supposed to query. The feature we're discussing here is more powerful, as it would allow plugins and connectors to dynamically enforce filtering/masking policies based on the identity of the user without having to pollute the object namespace. |
OK Cool ! |
It seems like the most user-friendly way to provide filters and masks is to allow connectors and access control plugins to describe them in terms of textual SQL expressions. This can be rationalized as an analog to dynamic views: the table and the filter/mask expressions form a dynamic view that's then parsed, analyzed and inlined into the query plan. Clearly, the expressions must fit within some constraints:
Expressions can contain arbitrary subqueries (as part of an What authorization identifier is used to access tables and functions referenced by the filter? Evaluating the filter and masks as the user who submitted the query is simple, but severely cripples the capabilities of this feature. Just like with views, being able to execute the filter with the permissions of a higher privilege user allows for more advanced filtering scenarios involving lookups into tables that might have restricted access. If we conceptualize row filters as just another form of dynamic views, where the view is returned "in parts" (the FROM clause and the WHERE clause separately), it follows that having a similar authorization model as views makes sense. In this case, the row filter would be returned by the plugin or connector together with an authorization identifier to be used to perform access checks on the tables and functions referenced by the filter. This may seem pointless for a connector access control: if the connector is the one in charge of returning the filter for a table given a user identity, wouldn't that suffice to "trust" the filter and so any further permission checks on the filter should be skipped? Once we consider multiple connectors or global access control, it's clear that we need to perform those checks. Otherwise, a poorly written connector could result in a leak of information from another connector. Should filters and masks be applied to tables/columns referenced by a filter or mask? Assuming filters and masks are associated with an authorization identifier that will be for access control checks against any table referenced by them, should filters and masks be applied recursively to those tables? That would make sense if we want to have consistency, otherwise we'd next to explain why some access checks involve filters and masks and others don't. However, applying filters and masks recursively introduces another potential problem: how far down the rabbit hole do we go? The solution to this may be to continue to resolve and apply row filters and masks as long as the given table and identity haven't been considered. If both a connector and global access control provide masks for a column, what should the behavior be? Some obvious choices are:
|
I think this decision should be left to a plugin, so connectors and access control plugins might return that identity along with the filter or mask. I guess that in most cases it will be the same identity which is running the actual query. This is something that external service that stores authorization rules has to support in a first place. The ones that I most familiar with do not support that anyway. Because of that, we could also defer the support for using different authorization identifier to evaluate filter and mask until the moment we find the use case, otherwise it is going to be a "dead" code anyway.
I don't think so. We should still verify access. Even if plugin return a "trusted" identity that is going to evaluate filter and mask we still would need to make sure what access given "trusted" identity has.
I think so. I would expect to apply filter and then mask.
I think we should go to the bottom with the caveat that planner would need to make sure that we are not within an endless loop.
I vote for |
Since the masks have different "masking strengths" (eg. asterisk for everything vs asterisk for everything except last 4 characters), in principle it is not safe to apply some mask. Now consider a case of connector configured to apply a mask on The masks should be applied both, in order: connector mask, then system-level mask. |
Presto should support row-level filtering and data masking for more granular access controls to data.
The general approach would be for
SystemAccessControl
/AccessControl
to provide filters/projections on the base table given an authorization identifier (user/role) that the query planner injects into the plan.Some questions that need to be considered:
There are some potential benefits of modeling the filter/masks as references to (polymorphic) table functions:
Relates to #18
The text was updated successfully, but these errors were encountered: