Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[native] Add Apache Arrow Flight Connector #24504

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

BryanCutler
Copy link
Contributor

@BryanCutler BryanCutler commented Feb 5, 2025

Description

Add Prestissimo support for Apache Arrow Flight connectors.

The native Arrow Flight connector can be used to connect to any Arrow Flight enabled Data Source. The metadata layer is handled by the Presto coordinator and does not need to be re-implemented in C++. Any Java connector that inherits from presto-base-arrow-flight can use this connector as it's counterpart for the Prestissimo layer.

Different Arrow-Flight enabled data sources can differ in authentication styles. A plugin-style interface is provided to handle such cases with custom authentication code by extending arrow_flight::auth::Authenticator.

Motivation and Context

RFC: https://github.com/prestodb/rfcs/blob/main/RFC-0004-arrow-flight-connector.md#prestissimo-implementation

Continues from #24082

Impact

Arrow Flight based connector will be supported in Prestissimo.

Test Plan

Unit Tests set up a testing Arrow Flight server and exchange data using the new connector.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

Prestissimo (Native Execution)
* Add support for Apache Arrow Flight connectors :pr:`24504`

@BryanCutler BryanCutler requested review from steveburnett and a team as code owners February 5, 2025 21:47
@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Feb 5, 2025
@prestodb-ci prestodb-ci requested review from a team, pdabre12 and psnv03 and removed request for a team February 5, 2025 21:47
@steveburnett
Copy link
Contributor

Thanks for the release note!

New release note guidelines. Please remove the manual PR link in the following format from the release note entries for this PR.


:pr:`**`

I have updated the Release Notes Guidelines to remove the examples of manually adding the PR link.

@BryanCutler BryanCutler force-pushed the arrow-connector-native-24082 branch from c9b9df2 to 2418e65 Compare February 5, 2025 21:50
@BryanCutler
Copy link
Contributor Author

Great work from @Rijin-N in #24082. I didn't make any major changes, just updated the protocol to what was merged in Java from #24427, fixed some build issues, added tests for custom authentication and some minor tweaks. I'll try to highlight my changes below.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the doc! One nit of formatting.

The `tls_certs` directory contains placeholder TLS certificates generated for unit testing the Arrow Flight Connector with TLS enabled. These certificates are not intended for production use and should only be used in the context of unit tests.

### Generating TLS Certificates
To create the TLS certificates and keys inside the `tls_certs` folder, run the following command:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To create the TLS certificates and keys inside the `tls_certs` folder, run the following command:
To create the TLS certificates and keys inside the `tls_certs` folder, run the following command:

Add a line after this blank line to separate this text and the command. Without the line, it looks like this:

Screenshot 2025-02-05 at 4 51 27 PM

}
}

void FlightDataSource::addSplit(std::shared_ptr<ConnectorSplit> split) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some minor cleanup in this method to simply use of the serialized FlightEndpoint

# limitations under the License.
find_package(Arrow REQUIRED)
find_package(PkgConfig REQUIRED)
pkg_check_modules(ARROW_FLIGHT REQUIRED IMPORTED_TARGET GLOBAL arrow-flight)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to add GLOBAL for using PkgConfig to prevent an -larrow-flight linker error. This seems like a better general solution to reference the libraries, see https://stackoverflow.com/questions/29191855/what-is-the-proper-way-to-use-pkg-config-from-cmake for more details

return config_->get<bool>(kServerVerify, true);
}

std::optional<std::string> FlightConfig::serverSslCertificate() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was folly::Optional, changed to be consistent

@@ -0,0 +1,251 @@
/*
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added these tests to verify adding a custom authenticator works


target_link_libraries(
presto_flight_connector_infra_test presto_protocol
presto_flight_connector_test_lib GTest::gtest GTest::gtest_main ${GLOG})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added G stuff due to linker error

virtual arrow::flight::Location getServerLocation();

virtual void setFlightServerOptions(
arrow::flight::FlightServerOptions* serverOptions) {}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Server options now specified by subclasses

@BryanCutler BryanCutler force-pushed the arrow-connector-native-24082 branch 2 times, most recently from f6eefb6 to 4b83c44 Compare February 5, 2025 22:12
steveburnett
steveburnett previously approved these changes Feb 5, 2025
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

#include "velox/common/base/Exceptions.h"

namespace facebook::presto::connector::arrow_flight::auth {
namespace {
Copy link
Contributor Author

@BryanCutler BryanCutler Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed this to be in an anonymous namespace, instead of static

@@ -351,7 +351,8 @@ std::unique_ptr<velox::connector::ConnectorSplit>
SystemPrestoToVeloxConnector::toVeloxSplit(
const protocol::ConnectorId& catalogId,
const protocol::ConnectorSplit* const connectorSplit,
const protocol::SplitContext* splitContext) const {
const protocol::SplitContext* splitContext,
const std::map<std::string, std::string>& extraCredentials) const {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rijin-N @elbinpallimalilibm is there a specific requirement to use these extraCredentials to authenticate a Flight client? I would think most users would prefer to authenticate the client with a token in the header, rather than use this. The Java Arrow Connector does not have this either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BryanCutler the base Presto (Java) Flight Connector is a template and not a full connector, and elects not to do authentication at all. The idea is that the authentication mechanism is largely dependent on the specific Flight server, so the whole business of authentication should be dealt with by the specialized connectors.

The problem we need to solve is that different Presto users may have different sets of access permissions on the connected Arrow Flight server. In this case we need to "pass through" the Presto user's authentication to the connected Flight Server, which means we somehow need to send the credentials from the Presto coordinator to the Presto workers.

In Java, there is an Identity API which we can use through connectorSession.getIdentity().getExtraCredentials().
IBM's specialization of the base Flight connector uses this API to pass the user's token to the workers. This isn't possible in Prestissimo because Velox connectors have no concept of Identity at all, so we need to pass the token in the split.

This issue provides more context:
#22849

There's also a corresponding Velox issue:
facebookincubator/velox#10107

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the background @ashkrisk . This related issues you linked are proposing adding an Identity data structure that all prestissimo connectors can access, but it is so far unresolved. I think this is a better solution that simply passing through a map, so we should hold off of adding these extraCredentials until we can get the right API with an Identity. WDYT?
cc @majetideepak @aditi-pandit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BryanCutler we need this to get the multi-user setup working (IBM's flavor of the Flight connector relies on it). You can leave this off for now as long as you're able to maintain the diff in IBM code until there's a better solution in OSS.

The native Arrow Flight connector can be used to connect to any Arrow Flight
enabled Data Source. The metadata layer is handled by the Presto coordinator
and does not need to be re-implemented in C++. Any Java connector that inherits
from `presto-base-arrow-flight` can use this connector as it's counterpart for
the Prestissimo layer.

Different Arrow-Flight enabled data sources can differ in authentication styles.
A plugin-style interface is provided to handle such cases with custom
authentication code by extending `arrow_flight::auth::Authenticator`.

RFC: https://github.com/prestodb/rfcs/blob/main/RFC-0004-arrow-flight-connector.md#prestissimo-implementation

Co-authored-by: Ashwin Kumar <[email protected]>
Co-authored-by: Rijin-N <[email protected]>
Co-authored-by: Nischay Yadav <[email protected]>
}

bool FlightConfig::serverVerify() {
return config_->get<bool>(kServerVerify, false);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed default to "false" to be consistent with Java

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want the default to be true for both Java and C++, otherwise by default there's no protection against server impersonation. Having this off is good for testing, but in production it's much better to leave it on unless there's a VERY good reason not to.

Copy link
Contributor

@elbinpallimalilibm elbinpallimalilibm Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

server verify is by default true in Java
image

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, the default behavior for the Flight client is to raise an error if using TLS and a certificate is provided but verify is disabled, and you need to explicitly disable it when using TLS.

I didn't catch that the Java connector defaults the config to NULL, which translates to true. I think that's confusing and we should not make it nullable. I'll change Java and here to default to true.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
from:IBM PR from IBM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants