Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Dynamically registered functions #9613

Closed
7 of 9 tasks
martint opened this issue Dec 22, 2017 · 6 comments
Closed
7 of 9 tasks

Enable Dynamically registered functions #9613

martint opened this issue Dec 22, 2017 · 6 comments
Labels
enhancement Roadmap A top level roadmap item

Comments

@martint
Copy link
Contributor

martint commented Dec 22, 2017

Goals

  • Allow for coexistence of functions and types with similar names but different semantics (e.g., compatibility layer for other databases like Teradata, Oracle, MySQL, Vertica, PostgreSQL, etc)
  • Allow for tighter control of what functions are available in a given deployment
  • Make it possible to implement user-defined functions and types in the future

Features

  • Configurable/mountable function plugins

Considerations and open questions

  • Function should be resolved to a handle during analysis just like tables.
  • Function resolution should be transaction aware.
  • Function metadata can be fetch using a handle on the coordinator.
  • Function implementation can be retrieved using the handle on coordinator and workers.
  • Optimizers can lookup functions using fully a specified signature.
  • Function handles should be transported to worker without requiring a full rewrite of the plan IR.
  • SPI interface should be easy to implement.
  • Need to define migration strategy for re-packaging existing function library

TODOs

References

4.34 SQL-paths

An SQL-path is a list of one or more <schema name>s that determines the search order for one of the following:
— The subject routine of a <routine invocation> whose <routine name> does not contain a .
— The user-defined type when the <path-resolved user-defined type name> does not contain a <schema name>.

4.43.3 SQL-session properties

An SQL-session has an SQL-path that is used to effectively qualify unqualified <routine name>s [...]
The SQL-path is initially set to an implementation-de ned value, but can subsequently be changed by the successful execution of a <set path statement>.
The text defining the SQL-path can be referenced by using the <general value specification> CURRENT_PATH.

6.4 <value specification> and <target specification>

  1. The declared type of CURRENT_USER, CURRENT_ROLE, SESSION_USER, SYSTEM_USER, CURRENT_CATALOG, CURRENT_SCHEMA, and CURRENT_PATH is character string. Whether the character string is fixed-length or variable-length, and its length if it is fixed-length or maximum length if it is variable-length, are implementation-de ned. The character set of the character string is SQL_IDENTIFIER. The declared type collation is the character set collation of SQL_IDENTIFIER, and the collation derivation is implicit.
  2. The value specified by CURRENT_PATH is a <schema name list> where <catalog name>s are <delimited identifier>s and the <unqualified schema name>s are <delimited identifier>s. Each <schema name> is separated from the preceding <schema name> by a <comma> with no intervening <space>s. The schemas referenced in this <schema name list> are those referenced in the SQL-path of the current SQL-session context, in the order in which they appear in that SQL-path.

10.3 <path specification>

Function

Specify an order for searching for an SQL-invoked routine.

Format

<path specification> ::= PATH <schema name list>
<schema name list> ::= <schema name> [ { <comma> <schema name> }... ]
<schema name> ::= [ <catalog name> <period> ] <unqualified schema name>
<unqualified schema name> ::= <identifier>
<catalog name> ::= <identifier>

Syntax Rules

  1. No two <schema name>s contained in <schema name list> shall be equivalent.

19.8 <set path statement>

Function

Set the SQL-path used to determine the subject routine of <routine invocation>s with unqualified <routine name>s in <preparable statement>s that are prepared in the current SQL-session by an <execute immediate statement> or a <prepare statement> and in <direct SQL statement>s that are invoked directly. The SQL-path remains the current SQL-path of the SQL-session until another SQL-path is successfully set.

Format

<set path statement> ::= SET <SQL-path characteristic>
<SQL-path characteristic> ::= PATH <value specification>

Syntax Rules

  1. The declared type of the shall be a character string type.

Access Rules

None.

General Rules

  1. Let S be <value specification> and let V be the character string that is the value of TRIM ( BOTH ' ' FROM S )
    a) If V does not conform to the Format and Syntax Rules of a <schema name list>, then an exception condition is raised: invalid schema name list specification.
    b) The SQL-path of the current SQL-session is set to V.
    NOTE 724 — A <set path statement> that is executed between a <prepare statement> and an <execute statement> has no effect on the prepared statement.

Conformance Rules

  1. Without Feature S071, “SQL paths in function and type name resolution”, Conforming SQL language shall not contain a <set path statement>.
@sopel39
Copy link
Contributor

sopel39 commented Jan 2, 2018

Make it possible to implement user-defined functions and types in the future

This is possible correctly using JARs. Could you elaborate a bit? Do you mean a more holistic approach where:

  1. User would be able to define function (which language?) in CLI/SQL
  2. Such function would be serialized and managed by Presto.
  3. It would be possible to remove such user defined function

How does planner/optimizer refer to function names (for a rewrite), given that the fully-qualified name depends on deployment configuration?

Could you give an example?

@martint
Copy link
Contributor Author

martint commented Jan 2, 2018

This is possible correctly using JARs. Could you elaborate a bit? Do you mean a more holistic approach where:

Yes, functions in languages such as Javascript, Ruby, Python or even the SQL-specific language --- basically, anything that can be properly sandboxed. Efforts such as Truffle/Graal can make this easier (and performant). See CREATE FUNCTION and CREATE TYPE in the spec.

How does planner/optimizer refer to function names (for a rewrite), given that the fully->> qualified name depends on deployment configuration?

Could you give an example?

For example, the upcoming spatial join optimizations. Or any of the desugaring optimizers if we decide to place the built-in functions into a namespace (although, those might go into a fixed namespace that's guaranteed to exist, similar to "sys" or "information_schema").

@sopel39
Copy link
Contributor

sopel39 commented Jan 3, 2018

Efforts such as Truffle/Graal can make this easier (and performant). See CREATE FUNCTION and CREATE TYPE in the spec.

I didn't expect that, but it seems that there is serious effort to support R and Python: http://gigasquidsoftware.com/blog/2017/10/22/embedded-interop-between-clojure-r-and-python-with-graalvm/.

Do you plan to embrace Truffle/Graal in the next year or so?

@findepi
Copy link
Contributor

findepi commented Jan 3, 2018

functions in languages such as Javascript, Ruby, Python or even the SQL-specific language

I like that!

@martint
Copy link
Contributor Author

martint commented Jan 3, 2018

Do you plan to embrace Truffle/Graal in the next year or so?

I'd like to start experimenting with it, but it's unlikely we'll support it for production workloads, yet. The first step is to get on Java 9.

@rongrong
Copy link
Contributor

Updated based on trinodb/trino#8

@rongrong rongrong mentioned this issue Apr 16, 2019
@aweisberg aweisberg added the Roadmap A top level roadmap item label Jul 22, 2019
@rongrong rongrong changed the title Function and type namespaces Enable Dynamically registered functions Aug 15, 2019
@martint martint closed this as completed Feb 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Roadmap A top level roadmap item
Projects
None yet
Development

No branches or pull requests

5 participants