Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Identity API to QueryCtx and ConnectorQueryCtx #10107

Open
majetideepak opened this issue Jun 7, 2024 · 6 comments
Open

Add Identity API to QueryCtx and ConnectorQueryCtx #10107

majetideepak opened this issue Jun 7, 2024 · 6 comments
Assignees
Labels
enhancement New feature or request

Comments

@majetideepak
Copy link
Collaborator

majetideepak commented Jun 7, 2024

Description

Presto clients can provide extra credentials to a connector. See https://prestodb.io/docs/current/develop/client-protocol.html#client-request-headers.
A related Prestissimo issue that requires these is prestodb/presto#22849

The Presto protocol sends this information as part of the TaskUpdateRequest.
https://github.com/prestodb/presto/blob/49f6566012bc67d9c06e48e74e1d2d6b4029b6e1/presto-native-execution/presto_cpp/presto_protocol/presto_protocol.h#L3086

In Presto Java, these details are persisted in an Identity object under the security SPI. https://github.com/prestodb/presto/blob/master/presto-spi/src/main/java/com/facebook/presto/spi/security/Identity.java

The proposal is to add a similar API in Velox with only the minimal user and credentials fields.
These fields are generic (user is std::string, credentials is std::unordered_map) and other clients can also use them for similar needs.
The Presto Java Identity class has other fields (principal, roles, etc.) that are used for other use cases such as Kerberos support.
These will be added later when needed.

@majetideepak majetideepak added the enhancement New feature or request label Jun 7, 2024
@majetideepak majetideepak self-assigned this Jun 7, 2024
@majetideepak majetideepak changed the title Add Identity API Add Identity API to QueryCtx and ConnectorQueryCtx Jun 7, 2024
@mbasmanova
Copy link
Contributor

@majetideepak Deepak, would you describe some use cases that need this information? Adding Identity struct is not enough I assume. What are the next steps to provide meaningful functionality?

@majetideepak
Copy link
Collaborator Author

@mbasmanova Masha, the immediate need for this is in the Arrow Flight Connector implementation. The authors requested this in the Presto issue here prestodb/presto#22849
From the issue description, the Identity struct should be sufficient for now since the requirement is for the users to be able to specify the user credentials via extraCredentials.

@ashkrisk, @tdcmeehan, can you confirm? Thanks.

@ashkrisk
Copy link
Contributor

@mbasmanova without an Identity API, the only way for connectors to read authentication credentials is through the catalog configuration file. This means that all presto users have the same level of access to the data source, which is not desirable in our case.

Specifying extraCredentials allows multiple users connecting to the same presto cluster to connect to the same data source, but access only the subset of data they are authorized to.

I think adding an "identity" member to the QueryCtx with a "user" and "credentials" field is good enough to address this.

@mbasmanova
Copy link
Contributor

@majetideepak @ashkrisk Are you saying that this new piece of information will be used only in Flight Arrow connector that's being developed outside of Velox codebase? If that's the case, it might be difficult to maintain this piece of code in the library since noone understands how it is being used. Would it be possible to add some comments in the code to explain why this piece is needed to prevent folks from accidentally deleting this as "dead code"?

Also, there are usually some security rules about handling sensitive information like user name and credentials. In particular for multi-tenant systems. Wondering how are you thinking about following best practices here.

@majetideepak
Copy link
Collaborator Author

@mbasmanova I will add some details on the usage. I will include some notes on the security rules/implications, especially in a multi-tenant system as well. I need to look at how Presto implements this.

@hitarth
Copy link
Collaborator

hitarth commented Sep 17, 2024

This feature would be needed in Parquet Reader column level decryption. Parquet files supports columns to encrypted with a key. We at Uber would be using the user identity to verify if the user has access to encrypted column via a KMS service and use that information decrypt the column.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants