Add support for row filtering #21913

JaneHang9305 · 2024-02-13T00:06:11Z

Cherry-pick of trinodb/trino@7efb49c
Co-authored-by: Martin Traverso

Description

SystemAccessControl/AccessControl to provide filters/projections on the base table given an authorization identifier (user/role) that the query planner injects into the plan.

Motivation and Context

Presto should support row-level filtering and data masking for more granular access controls to data.
#20572
trinodb/trino#1480

Impact

There is no performance impact on users unless onboarding the rules.

Test Plan

Test on local.

Contributor checklist

Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
Documented new properties (with its default value), SQL syntax, functions, or other functionality.
If release notes are required, they follow the release notes guidelines.
Adequate tests were added if applicable.
CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* ...
* ...

Hive Changes
* ...
* ...

If release note is NOT required, use:

== NO RELEASE NOTE ==

Cherry-pick of trinodb/trino@7efb49c

jaystarshot · 2024-02-14T19:52:10Z

@JaneHang9305 You need to add a coauthor in the commit ref- https://github.com/prestodb/presto/wiki/Review-and-Commit-guidelines#attribution.
@tdcmeehan @imjalpreet @agrawalreetika This has been a stale issue since a while now. Should we just discuss this trino approach in a doc/wg meeting and proceed with merging/path forward?

tdcmeehan · 2024-04-11T18:29:32Z

It seems like this filter is specified on a per-connector level. If that's the case, why not just use a connector rewrite to provide the filter?

JaneHang9305 · 2024-04-11T21:18:10Z

It seems like this filter is specified on a per-connector level. If that's the case, why not just use a connector rewrite to provide the filter?

This is a unified solution for all connectors, and each connector can have its own rules for filtering.

tdcmeehan · 2024-04-11T23:40:17Z

Could we achieve a unified solution through a library, which provides a connector rewrite rule that the connector can add in to add support for row level filtering?

Originally I thought an approach like this might be a good idea, however I think while there are costs to the connector rewrite framework, it also has enormous benefits in the flexibility it allows and the ability to make changes that are just cumbersome to do purely in the engine. So I'm wondering, if the primary motivation is to use one unified approach, can this be accomplished through composition--connectors compose a library which adds a rewrite rule which provides a rule level filter? That way they can be shared among Hive, Hudi, Iceberg and other connectors easily.

tdcmeehan · 2024-04-11T23:56:57Z

The same argument from a different angle: in what way is our existing SPI footprint insufficient to allow for connector-specified row level filtering? Row level filtering is listed as one of the motivations for connector rewrites--in what way does this improve on that? If it's a narrower SPI footprint, can this be achieved through composition as mentioned above? And does it really make sense to add additional SPI hooks if we already have this capability?

jaystarshot · 2024-04-12T02:24:49Z

I do agree that it would be better if we have a smaller SPI footprint. However, I don't think the process of adding the filter
itself is connector specific (we could also have added a common plan rule directly in the planning layers) but rather the rule level filters that connectors need to provide which are connector specific. It's like ConnectorIdentity which is connector specific.
Furthermore, we already have a package for security com.facebook.presto.spi.security and it might be better to centralize all security-connector updates from one location.

tdcmeehan · 2024-04-12T19:57:38Z

Let me know if I'm understanding this correctly.

For a table scan, during analysis, we we check the table we want to filter, and call an SPI hook on the Connector to ask for its row filters. Then we'll process and validate these filters, and put them back into the analysis. Then, during planning, we'll check if any row level filters were specified, and add them to as filters to the table scan node. Filtering accomplished.

It seems like one argument is this is the place where it ought to go, because it's security related, and security related things ought to be centralized in one place. But I would argue that connector plan rewrites are a conscious choice about splitting where things ought to go: certain things are global, and ought to be centralized in the engine, and certain things can only be inferred by the context of the connector, and can be pushed down entirely to the connector. An example is advanced filter pushdown: while this is common in Hive-like systems, other systems treat this all very differently, and there's not a whole lot of benefit for centralizing this in Presto. Likewise, because the rules which specify what rows to filter out are connector-specific, I just don't see the benefit to centralizing this and adding a larger SPI footprint when we already have the ability to do this entirely in the connector.

jaystarshot · 2024-04-12T20:51:31Z

Yes that's correct, however this is not an advanced filter pushdown where we need any connector-specific info regarding the tableScans. I think this pushdown could be handled in presto-main itself. Replicating the pushdown across all connectors might simply lead to code duplication. Additionally, it might make it challenging for administrators to implement central access policies across all connectors/catalogs

tdcmeehan · 2024-04-12T21:00:35Z

I don't understand how this PR solves the problem of making central access policies less challenging. The policies must come from the connector, correct?

jaystarshot · 2024-04-12T21:07:44Z

Yes but they follow the same interface I guess? Maybe it doesn't make central access policies that easier, not sure. But since the filter logic is in presto-main now (lets say to avoid duplication) we would need something in the spi for the connector to communicate with the engine?

jaystarshot · 2024-04-12T21:19:16Z

So I guess the question is whether this filter pushdown deserves its own connector specific optimizers or is this approach perfectly fine to avoid duplication

tdcmeehan · 2024-04-15T13:04:55Z

Let's consider the approaches by analogy. By analogy, the approach in this PR is like object-oriented inheritance. There is base functionality, which is applying the filters, but there are abstract methods, which are the SPI hooks. Similar to object oriented inheritance, you must know how the parent class uses the abstract methods in order to implement them effectively. It seems similar by analogy to a parent class with base functionality, and abstract methods that child classes may implement to use this functionality. There is indeed code reuse here.

Let us consider the connector rewrite rule approach to be more like composition. In this approach, you need not know the parent interface rules. Instead, you compose a new behavior, which defines privacy filters based on arbitrary criteria, and pushes them to table scan nodes. This seems like composition--a set or list of objects that provide behaviors that can be mixed-in to the class. Code reuse can be achieved by making it so that behaviors that are mixed-in can themselves be subclassed or provide composition of behaviors of their own.

The outcome seems the same between both. Because we've embarked on the composition path, I am wary of doing this through the inheritance path, since we already have the ability to do this with composition. The code reuse argument seems to be able to be solved by just making the rewrite a library that connectors mix-in/compose to existing rewrites (which seems no less onerous than connectors needing to self-specify the filters that the engine should add to provide the row-level filtering).

jaystarshot · 2024-05-23T18:39:58Z

Sounds good, yeah connector optimizers will be the better approach for presto. If they need to be "improved" like for eg: connector optimizers for relational plan, that functionality can be added i guess

Add support for row filtering

0d53ff5

Cherry-pick of trinodb/trino@7efb49c

JaneHang9305 marked this pull request as ready for review February 13, 2024 03:23

JaneHang9305 requested a review from a team as a code owner February 13, 2024 03:24

JaneHang9305 requested a review from presto-oss February 13, 2024 03:24

tdcmeehan self-assigned this Apr 9, 2024

Merge branch 'master' into add_rlac_support_test

86e565c

JaneHang9305 requested a review from jaystarshot as a code owner April 10, 2024 18:26

Merge branch 'prestodb:master' into add_rlac_support_test

fe4ec15

JaneHang9305 marked this pull request as draft April 10, 2024 20:37

Merge branch 'prestodb:master' into add_rlac_support_test

cb31b12

tdcmeehan removed their assignment Jun 5, 2024

BryanCutler mentioned this pull request Dec 18, 2024

Add support for row filtering and column masking access control #24278

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for row filtering #21913

Add support for row filtering #21913

JaneHang9305 commented Feb 13, 2024 •

edited

Loading

jaystarshot commented Feb 14, 2024

tdcmeehan commented Apr 11, 2024

JaneHang9305 commented Apr 11, 2024

tdcmeehan commented Apr 11, 2024

tdcmeehan commented Apr 11, 2024 •

edited

Loading

jaystarshot commented Apr 12, 2024 •

edited

Loading

tdcmeehan commented Apr 12, 2024

jaystarshot commented Apr 12, 2024

tdcmeehan commented Apr 12, 2024

jaystarshot commented Apr 12, 2024 •

edited

Loading

jaystarshot commented Apr 12, 2024 •

edited

Loading

tdcmeehan commented Apr 15, 2024 •

edited

Loading

jaystarshot commented May 23, 2024

Add support for row filtering #21913

Are you sure you want to change the base?

Add support for row filtering #21913

Conversation

JaneHang9305 commented Feb 13, 2024 • edited Loading

Description

Motivation and Context

Impact

Test Plan

Contributor checklist

Release Notes

jaystarshot commented Feb 14, 2024

tdcmeehan commented Apr 11, 2024

JaneHang9305 commented Apr 11, 2024

tdcmeehan commented Apr 11, 2024

tdcmeehan commented Apr 11, 2024 • edited Loading

jaystarshot commented Apr 12, 2024 • edited Loading

tdcmeehan commented Apr 12, 2024

jaystarshot commented Apr 12, 2024

tdcmeehan commented Apr 12, 2024

jaystarshot commented Apr 12, 2024 • edited Loading

jaystarshot commented Apr 12, 2024 • edited Loading

tdcmeehan commented Apr 15, 2024 • edited Loading

jaystarshot commented May 23, 2024

JaneHang9305 commented Feb 13, 2024 •

edited

Loading

tdcmeehan commented Apr 11, 2024 •

edited

Loading

jaystarshot commented Apr 12, 2024 •

edited

Loading

jaystarshot commented Apr 12, 2024 •

edited

Loading

jaystarshot commented Apr 12, 2024 •

edited

Loading

tdcmeehan commented Apr 15, 2024 •

edited

Loading