-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[native] Add Apache Arrow Flight Connector #24504
base: master
Are you sure you want to change the base?
[native] Add Apache Arrow Flight Connector #24504
Conversation
Thanks for the release note! New release note guidelines. Please remove the manual PR link in the following format from the release note entries for this PR.
I have updated the Release Notes Guidelines to remove the examples of manually adding the PR link. |
c9b9df2
to
2418e65
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the doc! One nit of formatting.
The `tls_certs` directory contains placeholder TLS certificates generated for unit testing the Arrow Flight Connector with TLS enabled. These certificates are not intended for production use and should only be used in the context of unit tests. | ||
|
||
### Generating TLS Certificates | ||
To create the TLS certificates and keys inside the `tls_certs` folder, run the following command: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To create the TLS certificates and keys inside the `tls_certs` folder, run the following command: | |
To create the TLS certificates and keys inside the `tls_certs` folder, run the following command: | |
Add a line after this blank line to separate this text and the command. Without the line, it looks like this:
![Screenshot 2025-02-05 at 4 51 27 PM](https://private-user-images.githubusercontent.com/7013443/410198895-b086ec88-7b14-4375-9c97-9c439bdcf12b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwMjk5MTIsIm5iZiI6MTczOTAyOTYxMiwicGF0aCI6Ii83MDEzNDQzLzQxMDE5ODg5NS1iMDg2ZWM4OC03YjE0LTQzNzUtOWM5Ny05YzQzOWJkY2YxMmIucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIwOCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMDhUMTU0NjUyWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZjA5NGRhZjE0ZjM5OGUzMjA0YjU2NmNjOTBmMDM2OGFhM2FjYzMwZjNmY2UzZmIxOGVmYmI0YmExODZjN2E4NiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.4zVFBbJiODzSmW_j-7XlyKXWpNGYiK_dCJ30Hi31RO8)
} | ||
} | ||
|
||
void FlightDataSource::addSplit(std::shared_ptr<ConnectorSplit> split) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did some minor cleanup in this method to simply use of the serialized FlightEndpoint
# limitations under the License. | ||
find_package(Arrow REQUIRED) | ||
find_package(PkgConfig REQUIRED) | ||
pkg_check_modules(ARROW_FLIGHT REQUIRED IMPORTED_TARGET GLOBAL arrow-flight) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed to add GLOBAL
for using PkgConfig to prevent an -larrow-flight
linker error. This seems like a better general solution to reference the libraries, see https://stackoverflow.com/questions/29191855/what-is-the-proper-way-to-use-pkg-config-from-cmake for more details
return config_->get<bool>(kServerVerify, true); | ||
} | ||
|
||
std::optional<std::string> FlightConfig::serverSslCertificate() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was folly::Optional
, changed to be consistent
@@ -0,0 +1,251 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added these tests to verify adding a custom authenticator works
|
||
target_link_libraries( | ||
presto_flight_connector_infra_test presto_protocol | ||
presto_flight_connector_test_lib GTest::gtest GTest::gtest_main ${GLOG}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added G stuff due to linker error
virtual arrow::flight::Location getServerLocation(); | ||
|
||
virtual void setFlightServerOptions( | ||
arrow::flight::FlightServerOptions* serverOptions) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Server options now specified by subclasses
f6eefb6
to
4b83c44
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
#include "velox/common/base/Exceptions.h" | ||
|
||
namespace facebook::presto::connector::arrow_flight::auth { | ||
namespace { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed this to be in an anonymous namespace, instead of static
@@ -351,7 +351,8 @@ std::unique_ptr<velox::connector::ConnectorSplit> | |||
SystemPrestoToVeloxConnector::toVeloxSplit( | |||
const protocol::ConnectorId& catalogId, | |||
const protocol::ConnectorSplit* const connectorSplit, | |||
const protocol::SplitContext* splitContext) const { | |||
const protocol::SplitContext* splitContext, | |||
const std::map<std::string, std::string>& extraCredentials) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Rijin-N @elbinpallimalilibm is there a specific requirement to use these extraCredentials
to authenticate a Flight client? I would think most users would prefer to authenticate the client with a token in the header, rather than use this. The Java Arrow Connector does not have this either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BryanCutler the base Presto (Java) Flight Connector is a template and not a full connector, and elects not to do authentication at all. The idea is that the authentication mechanism is largely dependent on the specific Flight server, so the whole business of authentication should be dealt with by the specialized connectors.
The problem we need to solve is that different Presto users may have different sets of access permissions on the connected Arrow Flight server. In this case we need to "pass through" the Presto user's authentication to the connected Flight Server, which means we somehow need to send the credentials from the Presto coordinator to the Presto workers.
In Java, there is an Identity API which we can use through connectorSession.getIdentity().getExtraCredentials()
.
IBM's specialization of the base Flight connector uses this API to pass the user's token to the workers. This isn't possible in Prestissimo because Velox connectors have no concept of Identity at all, so we need to pass the token in the split.
This issue provides more context:
#22849
There's also a corresponding Velox issue:
facebookincubator/velox#10107
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the background @ashkrisk . This related issues you linked are proposing adding an Identity data structure that all prestissimo connectors can access, but it is so far unresolved. I think this is a better solution that simply passing through a map, so we should hold off of adding these extraCredentials
until we can get the right API with an Identity
. WDYT?
cc @majetideepak @aditi-pandit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BryanCutler we need this to get the multi-user setup working (IBM's flavor of the Flight connector relies on it). You can leave this off for now as long as you're able to maintain the diff in IBM code until there's a better solution in OSS.
The native Arrow Flight connector can be used to connect to any Arrow Flight enabled Data Source. The metadata layer is handled by the Presto coordinator and does not need to be re-implemented in C++. Any Java connector that inherits from `presto-base-arrow-flight` can use this connector as it's counterpart for the Prestissimo layer. Different Arrow-Flight enabled data sources can differ in authentication styles. A plugin-style interface is provided to handle such cases with custom authentication code by extending `arrow_flight::auth::Authenticator`. RFC: https://github.com/prestodb/rfcs/blob/main/RFC-0004-arrow-flight-connector.md#prestissimo-implementation Co-authored-by: Ashwin Kumar <[email protected]> Co-authored-by: Rijin-N <[email protected]> Co-authored-by: Nischay Yadav <[email protected]>
4b83c44
to
49a3234
Compare
} | ||
|
||
bool FlightConfig::serverVerify() { | ||
return config_->get<bool>(kServerVerify, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed default to "false" to be consistent with Java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably want the default to be true
for both Java and C++, otherwise by default there's no protection against server impersonation. Having this off is good for testing, but in production it's much better to leave it on unless there's a VERY good reason not to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, the default behavior for the Flight client is to raise an error if using TLS and a certificate is provided but verify is disabled, and you need to explicitly disable it when using TLS.
I didn't catch that the Java connector defaults the config to NULL, which translates to true. I think that's confusing and we should not make it nullable. I'll change Java and here to default to true.
Description
Add Prestissimo support for Apache Arrow Flight connectors.
The native Arrow Flight connector can be used to connect to any Arrow Flight enabled Data Source. The metadata layer is handled by the Presto coordinator and does not need to be re-implemented in C++. Any Java connector that inherits from
presto-base-arrow-flight
can use this connector as it's counterpart for the Prestissimo layer.Different Arrow-Flight enabled data sources can differ in authentication styles. A plugin-style interface is provided to handle such cases with custom authentication code by extending
arrow_flight::auth::Authenticator
.Motivation and Context
RFC: https://github.com/prestodb/rfcs/blob/main/RFC-0004-arrow-flight-connector.md#prestissimo-implementation
Continues from #24082
Impact
Arrow Flight based connector will be supported in Prestissimo.
Test Plan
Unit Tests set up a testing Arrow Flight server and exchange data using the new connector.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.