Add REST API for remote functions RFC #25

Open. Wants to merge 1 commit into base: main
219 changes: 219 additions & 0 deletions RFC-0007-remote-functions.md
@@ -0,0 +1,219 @@
# **RFC-0007 for Presto**

## [Title] REST API for remote functions

Proposers

* Tim Meehan
* Abe Varghese
* Joe Abraham
* Jakob Khaliqi

## [Related Issues]

* https://github.com/prestodb/presto/issues/14053

## Summary

### Dynamic functions in remote function servers

This RFC defines a new REST API, along with a REST plugin implementation, that provides consistent and unified metadata and execution
of remote functions under a single API definition. This API is designed to be extensible, allowing for the definition of new functions
at runtime.

Review comment: Do you mean dynamic here instead of extensible? Can we presume that new functions can be defined and used without a server restart?


## Background

[prestodb/presto#14053](https://github.com/prestodb/presto/issues/14053) introduced the ability to define functions as being executed
in a remote server, and made changes to the planner to accommodate remote function execution. A limitation of the current implementation is that it
presumes the functions returned from the function server are purely static. However, it is common in many cloud data warehousing
systems to allow for defining remote functions at runtime through `CREATE FUNCTION` statements. This RFC proposes to extend the
remote function design to allow for dynamic function registration and execution.

Additionally, the current remote function server plugin implementation is agnostic to the API of the function
server. The lack of a reference API, together with the lack of documentation around function namespace managers, makes it
challenging to create a remote function server: in addition to creating a new function server API, you also need to define a new
namespace manager plugin. To allow for a more consistent and unified experience, this RFC proposes a new REST API for remote function
servers, which will allow for the definition of functions at runtime. The hope is to reduce the work required to integrate a custom
function server with Presto, firstly by reducing or eliminating the work required to write a new plugin, and secondly, by providing
a reference implementation which itself will be extensible and hopefully cover most needs.

### [Optional] Goals

* Standardize on a single preferred API for remote function servers
* Unify the metadata and execution of remote scalar functions under a single API
* Allow for the definition of scalar functions at runtime

### [Optional] Non-goals

* Deprecate existing function server APIs
* Provide support for aggregate functions or table-valued functions

## Proposed Implementation

### Design

Fundamentally, the design of the REST API for remote functions will be based on the existing `FunctionNamespaceManager` SPI.
Presto C++ currently has no corresponding SPI for function namespace managers. This RFC proposes to
create a new REST-based implementation of the function execution framework. With a REST API for remote functions that
unifies the metadata and execution of remote functions, C++ deployments can customize their function servers
by implementing the REST API, in the same way that current Presto Java users can implement the `FunctionNamespaceManager` SPI.

Review comment: Did you evaluate REST vs gRPC?

The REST API will power all `FunctionNamespaceManager` method implementations, including listing functions, retrieving function
metadata, executing functions, and providing DDL support for functions (`ADD` and `DROP` support).

#### Presto C++ special considerations

The current Presto C++ implementation does not have a `FunctionNamespaceManager` SPI. This RFC proposes to extend the Velox
function server to support the REST API for remote functions. This will allow for the definition of functions at runtime in
Presto C++ deployments, and will unify function server implementations under one API.

Review comment: Are the function servers configured in a static list, or is there dynamic discovery of function servers? Will the function server need to announce itself to a discovery service?

Review comment: The function server and Presto server should have independent lifetimes. If the user wanted to upgrade their function servers, how would that proceed?

In the future, a plugin concept for remote functions should be added to Presto C++ to allow for custom function server implementations;
however, this is out of scope for this RFC.

![REST API for remote functions](RFC-0007/Diagram.png)

### Functions API

An OpenAPI specification will be created which defines a REST API for remote function servers. This API will feature the following
endpoints:

#### Functions version

> Endpoint: /v1/functions
>
> HTTP verb: HEAD
>
> Request body: empty
>
> Response body: empty

Returns the headers of the GET response. This is useful for checking the version of the API, which will be returned as a header.
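
As a rough illustration of the version check, the sketch below builds the `HEAD` request and reads a version value from the response headers. The RFC does not name the header, so `X-Functions-Version` is purely a placeholder, and the helper names are hypothetical.

```python
from typing import Mapping, Optional
from urllib.request import Request

# Placeholder header name: the RFC says the version is returned as a header
# but does not say which one.
VERSION_HEADER = "X-Functions-Version"


def build_version_probe(base_url: str) -> Request:
    """Build (but do not send) a HEAD request for /v1/functions."""
    return Request(base_url.rstrip("/") + "/v1/functions", method="HEAD")


def read_version(headers: Mapping[str, str]) -> Optional[str]:
    """Return the version header value, matching the name case-insensitively."""
    for name, value in headers.items():
        if name.lower() == VERSION_HEADER.lower():
            return value
    return None
```

To use it against a running server, the request could be passed to `urllib.request.urlopen` and the response's `headers` handed to `read_version`.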

#### List all functions

> Endpoint: /v1/functions
>
> HTTP verb: GET
>
> Request body: empty
>
> Response body: JSON array of function metadata objects

Returns the complete listing of functions across all schemas.

#### List functions at schema

> Endpoint: /v1/functions/{schema}
>
> HTTP verb: GET
>
> Request body: empty
>
> Response body: JSON array of function metadata objects

Returns the complete listing of all functions in the specified schema.

#### List functions at schema with name

> Endpoint: /v1/functions/{schema}/{functionName}
>
> HTTP verb: GET
>
> Request body: empty
>
> Response body: JSON array of function metadata objects

Returns the complete listing of functions in the specified schema with the specified function name.

Review comment: This should return the function IDs as well so that they can be used in subsequent calls?
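
The three listing endpoints differ only in how many path segments follow `/v1/functions`. A minimal sketch of a path builder is shown below; the helper name is hypothetical, and percent-encoding the segments is an assumption rather than something the RFC specifies.

```python
from typing import Optional
from urllib.parse import quote


def listing_path(schema: Optional[str] = None, function_name: Optional[str] = None) -> str:
    """Return the GET path for listing all functions, one schema, or one name."""
    if function_name is not None and schema is None:
        raise ValueError("a function name requires a schema")
    parts = ["v1", "functions"]
    for segment in (schema, function_name):
        if segment is not None:
            # Assumption: segments are percent-encoded so names cannot inject "/".
            parts.append(quote(segment, safe=""))
    return "/" + "/".join(parts)
```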


#### Add a function

Review comment: How do you envision this being used? PrestoSQL could expose `CREATE FUNCTION` that uses this API. But then how do you give access to the function body?

How do you make this work across function implementations in different languages?


> Endpoint: /v1/functions/{schema}/{functionName}
>
> HTTP verb: POST
>
> Request body: JSON object representing the function to be added
>
> Response body: the function ID of the newly created function

Review comment: It would be great to give more details about functionId. functionId is local to the function server. Will Presto use it at all?


Creates a new function in the specified schema with the specified name. The function object will contain the metadata of the
function, including its arguments, return type, and other metadata. It will return an identifier representing this
specific function, which is useful to differentiate multiple functions which share the same name but have different
arguments.
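
A sketch of what an add request might look like follows. The RFC does not yet define the JSON schema of the function object, so the field names (`argumentTypes`, `returnType`, `deterministic`) and the helper name are illustrative assumptions only.

```python
import json
from urllib.request import Request


def build_add_function_request(base_url: str, schema: str, function_name: str, metadata: dict) -> Request:
    """Build (but do not send) the POST that registers a function."""
    return Request(
        f"{base_url.rstrip('/')}/v1/functions/{schema}/{function_name}",
        data=json.dumps(metadata).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical metadata shape; the real schema would come from the OpenAPI spec.
square_bigint = {
    "argumentTypes": ["bigint"],
    "returnType": "bigint",
    "deterministic": True,
}

request = build_add_function_request("http://localhost:8080", "math", "square", square_bigint)
```

Registering a second `square` that takes `double` would go to the same URL with a different body, which is why the response carries a function ID to tell the overloads apart.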

#### Update a function

Review comment: What is the end-to-end flow of interactions in which this API is used?


> Endpoint: /v1/functions/{schema}/{functionName}/{functionId}
>
> HTTP verb: PUT
>
> Request body: JSON object representing the function to be updated
>
> Response body: the function ID of the updated function

Updates the function in the specified schema with the specified name and function ID.

#### Delete a function

> Endpoint: /v1/functions/{schema}/{functionName}/{functionId}
>
> HTTP verb: DELETE
>
> Request body: empty
>
> Response body: empty

Deletes the function in the specified schema with the specified name and function ID.
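
To make the role of the function ID concrete, here is a small in-memory model of the add, update, and delete semantics described above. It is only a sketch of one possible server-side store: the class name, the use of random UUIDs as IDs, and the inclusion of `functionId` in listings are all assumptions.

```python
import uuid
from typing import Dict, List, Optional, Tuple


class InMemoryFunctionStore:
    """Toy store keyed by (schema, name), then by function ID for each overload."""

    def __init__(self) -> None:
        self._functions: Dict[Tuple[str, str], Dict[str, dict]] = {}

    def add(self, schema: str, name: str, metadata: dict) -> str:
        function_id = uuid.uuid4().hex  # assumed ID scheme
        self._functions.setdefault((schema, name), {})[function_id] = dict(metadata)
        return function_id

    def update(self, schema: str, name: str, function_id: str, metadata: dict) -> str:
        overloads = self._functions.get((schema, name), {})
        if function_id not in overloads:
            raise KeyError(function_id)
        overloads[function_id] = dict(metadata)
        return function_id

    def delete(self, schema: str, name: str, function_id: str) -> None:
        overloads = self._functions.get((schema, name), {})
        if function_id not in overloads:
            raise KeyError(function_id)
        del overloads[function_id]
        if not overloads:
            del self._functions[(schema, name)]

    def list_functions(self, schema: Optional[str] = None, name: Optional[str] = None) -> List[dict]:
        return [
            {"schema": s, "functionName": n, "functionId": fid, **meta}
            for (s, n), overloads in self._functions.items()
            for fid, meta in overloads.items()
            if (schema is None or s == schema) and (name is None or n == name)
        ]
```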

#### Execute a function

> Endpoint: /v1/functions/{schema}/{functionName}/{functionId}/{version}
>
> HTTP verb: POST
>
> Request body: Presto Page of input data
>
> Response body: Presto Page of output data

Executes the function in the specified schema with the specified name and function ID. The version parameter is used to
specify the version of the function to execute, and is required to ensure a consistent version of the function is used during
query execution. The input data is passed as a Presto Page, and the output data is returned as a Presto Page.
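
One way to picture the version pinning: the caller resolves schema, name, function ID, and version once, then reuses that exact tuple for every page it sends. The sketch below assumes this flow; the class and helper names are hypothetical, and the page is treated as opaque bytes because Presto Page serialization is outside the scope of the example.

```python
from dataclasses import dataclass
from urllib.request import Request


@dataclass(frozen=True)
class PinnedFunction:
    """A function resolved once and reused unchanged for every page of a query."""

    schema: str
    name: str
    function_id: str
    version: str

    def execute_path(self) -> str:
        return f"/v1/functions/{self.schema}/{self.name}/{self.function_id}/{self.version}"


def build_execute_request(base_url: str, function: PinnedFunction, serialized_page: bytes) -> Request:
    """Build (but do not send) the POST that executes the function on one page."""
    # serialized_page is opaque here; producing a real Presto Page is not shown.
    return Request(base_url.rstrip("/") + function.execute_path(), data=serialized_page, method="POST")
```

Because `PinnedFunction` is frozen, a function updated on the server mid-query cannot silently change which version later pages are sent to.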


### Function Server Plugin

A new implementation of a function namespace manager will be created which will use the Functions API to list functions,
retrieve function metadata, execute functions, and provide DDL support for functions (`ADD` and `DROP` support). It will
use the REST API defined in this RFC to delegate implementations of these capabilities to a REST server.
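
The delegation idea can be sketched as a thin layer that maps each namespace-manager capability to a verb and path, then hands the call to a transport. The actual plugin would be a Java `FunctionNamespaceManager` implementation; the Python below is only a language-neutral illustration, and the operation names, `ROUTES` table, and `Transport` signature are assumptions.

```python
from typing import Callable, Dict, Optional, Tuple

# (verb, path, body) -> response body. A real plugin would wrap an HTTP client here.
Transport = Callable[[str, str, Optional[bytes]], bytes]

# Hypothetical operation names mapped to the endpoints defined in this RFC.
ROUTES: Dict[str, Tuple[str, str]] = {
    "list": ("GET", "/v1/functions/{schema}/{name}"),
    "add": ("POST", "/v1/functions/{schema}/{name}"),
    "drop": ("DELETE", "/v1/functions/{schema}/{name}/{function_id}"),
    "execute": ("POST", "/v1/functions/{schema}/{name}/{function_id}/{version}"),
}


class RestNamespaceManagerSketch:
    """Delegates every capability to the REST server through the given transport."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def call(self, operation: str, body: Optional[bytes] = None, **params: str) -> bytes:
        verb, template = ROUTES[operation]
        return self._transport(verb, template.format(**params), body)
```

Keeping the transport injectable means the mapping can be unit tested with a recording stub, without a running function server.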

## [Optional] Metrics

The performance of this approach should be measured in terms of the latency of function execution. This should be comparable
to the latency of executing a function in a Thrift server.

## [Optional] Other Approaches Considered

N/A

## Adoption Plan

- What impact (if any) will there be on existing users? Are there any new session parameters, configurations, SPI updates, client API updates, or SQL grammar?
  - This is a new plugin and API so there is no impact to existing users.
- If we are changing behaviour how will we phase out the older behaviour?
  - There is no change in behavior.
- If we need special migration tools, describe them here.
  - N/A
- When will we remove the existing behaviour, if applicable.
  - N/A
- How should this feature be taught to new and existing users? Basically mention if documentation changes/new blog are needed?
  - Documentation will need to be updated to include the new API and plugin. Documentation will also be added for the new
    REST function namespace manager.
- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?
  - N/A

## Test Plan

There will be unit tests for all components in the function namespace manager, as well as integration tests for the REST API reference
implementation. Additionally, there will be tests for the function server plugin to ensure that it can correctly list functions, retrieve
function metadata, execute functions, and provide DDL support for functions. Finally, there will be infrastructure tests that show
the correctness of the function server plugin in a Presto cluster.
Binary file added RFC-0007/Diagram.png