-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for row filtering #21913
base: master
Are you sure you want to change the base?
Conversation
Cherry-pick of trinodb/trino@7efb49c
@JaneHang9305 You need to add a coauthor in the commit ref- https://github.com/prestodb/presto/wiki/Review-and-Commit-guidelines#attribution. |
It seems like this filter is specified on a per-connector level. If that's the case, why not just use a connector rewrite to provide the filter? |
This is a unified solution for all connectors, and each connector can have its own rules for filtering. |
Could we achieve a unified solution through a library, which provides a connector rewrite rule that the connector can add in to add support for row level filtering? Originally I thought an approach like this might be a good idea, however I think while there are costs to the connector rewrite framework, it also has enormous benefits in the flexibility it allows and the ability to make changes that are just cumbersome to do purely in the engine. So I'm wondering, if the primary motivation is to use one unified approach, can this be accomplished through composition--connectors compose a library which adds a rewrite rule which provides a rule level filter? That way they can be shared among Hive, Hudi, Iceberg and other connectors easily. |
The same argument from a different angle: in what way is our existing SPI footprint insufficient to allow for connector-specified row level filtering? Row level filtering is listed as one of the motivations for connector rewrites--in what way does this improve on that? If it's a narrower SPI footprint, can this be achieved through composition as mentioned above? And does it really make sense to add additional SPI hooks if we already have this capability? |
I do agree that it would be better if we have a smaller SPI footprint. However, I don't think the process of adding the filter |
Let me know if I'm understanding this correctly. For a table scan, during analysis, we we check the table we want to filter, and call an SPI hook on the Connector to ask for its row filters. Then we'll process and validate these filters, and put them back into the analysis. Then, during planning, we'll check if any row level filters were specified, and add them to as filters to the table scan node. Filtering accomplished. It seems like one argument is this is the place where it ought to go, because it's security related, and security related things ought to be centralized in one place. But I would argue that connector plan rewrites are a conscious choice about splitting where things ought to go: certain things are global, and ought to be centralized in the engine, and certain things can only be inferred by the context of the connector, and can be pushed down entirely to the connector. An example is advanced filter pushdown: while this is common in Hive-like systems, other systems treat this all very differently, and there's not a whole lot of benefit for centralizing this in Presto. Likewise, because the rules which specify what rows to filter out are connector-specific, I just don't see the benefit to centralizing this and adding a larger SPI footprint when we already have the ability to do this entirely in the connector. |
Yes that's correct, however this is not an advanced filter pushdown where we need any connector-specific info regarding the tableScans. I think this pushdown could be handled in presto-main itself. Replicating the pushdown across all connectors might simply lead to code duplication. Additionally, it might make it challenging for administrators to implement central access policies across all connectors/catalogs |
I don't understand how this PR solves the problem of making central access policies less challenging. The policies must come from the connector, correct? |
Yes but they follow the same interface I guess? Maybe it doesn't make central access policies that easier, not sure. But since the filter logic is in presto-main now (lets say to avoid duplication) we would need something in the spi for the connector to communicate with the engine? |
So I guess the question is whether this filter pushdown deserves its own connector specific optimizers or is this approach perfectly fine to avoid duplication |
Let's consider the approaches by analogy. By analogy, the approach in this PR is like object-oriented inheritance. There is base functionality, which is applying the filters, but there are abstract methods, which are the SPI hooks. Similar to object oriented inheritance, you must know how the parent class uses the abstract methods in order to implement them effectively. It seems similar by analogy to a parent class with base functionality, and abstract methods that child classes may implement to use this functionality. There is indeed code reuse here. Let us consider the connector rewrite rule approach to be more like composition. In this approach, you need not know the parent interface rules. Instead, you compose a new behavior, which defines privacy filters based on arbitrary criteria, and pushes them to table scan nodes. This seems like composition--a set or list of objects that provide behaviors that can be mixed-in to the class. Code reuse can be achieved by making it so that behaviors that are mixed-in can themselves be subclassed or provide composition of behaviors of their own. The outcome seems the same between both. Because we've embarked on the composition path, I am wary of doing this through the inheritance path, since we already have the ability to do this with composition. The code reuse argument seems to be able to be solved by just making the rewrite a library that connectors mix-in/compose to existing rewrites (which seems no less onerous than connectors needing to self-specify the filters that the engine should add to provide the row-level filtering). |
Sounds good, yeah connector optimizers will be the better approach for presto. If they need to be "improved" like for eg: connector optimizers for relational plan, that functionality can be added i guess |
Cherry-pick of trinodb/trino@7efb49c
Co-authored-by: Martin Traverso
Description
SystemAccessControl/AccessControl to provide filters/projections on the base table given an authorization identifier (user/role) that the query planner injects into the plan.
Motivation and Context
Presto should support row-level filtering and data masking for more granular access controls to data.
#20572
trinodb/trino#1480
Impact
There is no performance impact on users unless onboarding the rules.
Test Plan
Test on local.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.
If release note is NOT required, use: