Add Presto functions #2262

Open
mbasmanova opened this issue Aug 11, 2022 · 48 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@mbasmanova
Contributor

mbasmanova commented Aug 11, 2022

Velox includes many of the PrestoSQL functions, but a few are still missing. It would be great to add these.

Function coverage map: https://facebookincubator.github.io/velox/functions/coverage.html

A subset of missing functions that would be most helpful to add:

Array functions

all_match (lambda function) #3356
any_match (lambda function)
array_average #2434
array_frequency #3807
array_has_duplicates #3320
array_normalize
array_remove
array_union
flatten
repeat
sequence
shuffle #3404
zip_with (lambda function) #2685

Map functions

map_from_entries #3417
map_normalize #9086
map_zip_with (lambda function) #2711

JSON functions

is_json_scalar #2291
json_array_contains #2299
json_array_get
json_array_length #2294
json_extract #5269
json_format #3525
json_parse #3663
json_size #3413

String functions

split_to_map
split_to_multimap
strrpos

Regular expression functions

regexp_split

Date and Time functions

timezone_hour
timezone_minute
week #2287
week_of_year #2287

Mathematical functions

truncate
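
For readers new to the list, here is a minimal Python sketch of what three of the listed functions compute, based on the general behavior described in the Presto documentation. It is an illustration only, not Velox code: the Python function names simply mirror the SQL names, and NULL handling, Unicode edge cases, and the two-argument form of truncate are deliberately left out.

```python
import math


def strrpos(string: str, substring: str) -> int:
    """1-based position of the last occurrence of substring, or 0 if absent."""
    # str.rfind returns -1 when nothing is found, so adding 1 yields 0.
    return string.rfind(substring) + 1


def truncate(x: float) -> float:
    """Drop the digits after the decimal point (round toward zero)."""
    return float(math.trunc(x))


def array_has_duplicates(values: list) -> bool:
    """True if any element occurs more than once (elements must be hashable)."""
    return len(set(values)) != len(values)
```

For example, strrpos("abcabc", "bc") gives 5, truncate(-2.7) gives -2.0, and array_has_duplicates([1, 2, 1]) is true.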

@mbasmanova mbasmanova added enhancement New feature or request good first issue Good for newcomers labels Aug 11, 2022
@mbasmanova
Contributor Author

CC: @majetideepak @aditi-pandit

@pramodsatya
Collaborator

pramodsatya commented Aug 15, 2022

Thanks for sharing the list of missing functions. The following functions have been added:
Date and Time functions
week, week_of_year: #2287

JSON functions
is_json_scalar: #2291
json_array_length: #2294
json_array_contains: #2299

Working on the following functions:
Mathematical functions
truncate (shelved till decimal to double cast is supported)

JSON functions
json_array_get (Not implementing because the usage of this function is not recommended)

@jwyles-ahana
Contributor

I am going to start working on array_union.

@aditi-pandit
Collaborator

aditi-pandit commented Aug 23, 2022

I am going to start working on array_union.

There was a prior PR for array_union #867 that was abandoned. Maybe you can check with @kagamiori about it.

@kagamiori
Contributor

I am going to start working on array_union.

There was a prior PR for array_union #867 that was abandoned. Maybe you can check with @kagamiori about it.

@aditi-pandit Thank you for bringing this up! It was a PR I didn't finish. I originally planned to rewrite it as a simple function once that is supported. (I just confirmed that it's not supported yet.) Do you need this function soon? Please feel free to take it over, or let me know if you want me to finish #867.

@aditi-pandit
Collaborator

There isn't any urgency for the array_union function right away. We were just picking stuff from Masha's list above.

@jwyles-ahana
Contributor

I will leave array_union to @kagamiori and start on array_average instead.

@gosharz
Contributor

gosharz commented Oct 17, 2022

Thanks for sharing the list of missing functions. The following functions have been added: Date and Time functions week, week_of_year: #2287

JSON functions is_json_scalar: #2291 json_array_length: #2294 json_array_contains: #2299

Working on the following functions: Mathematical functions truncate (shelved till decimal to double cast is supported)

JSON functions json_array_get (Not implementing because the usage of this function is not recommended)

Hi @pramodsatya!

Looking for functions to pick up. Wondering if you are still working on truncate?

Cheers,
Gosh

@pramodsatya
Collaborator

Hi @gosharz, I am not working on the truncate function. Thanks for checking.

@gosharz
Contributor

gosharz commented Oct 17, 2022

@pramodsatya mind if I pick it up?

@pramodsatya
Collaborator

No, please go for it. Thank you!

@pramodsatya mind if I pick it up?

@gosharz
Contributor

gosharz commented Oct 17, 2022

Adding truncate: #2862

@gosharz
Contributor

gosharz commented Oct 20, 2022

Will also give a try to strrpos if nobody minds :)

@gosharz
Contributor

gosharz commented Oct 20, 2022

Here we go: #2903

@darrenfu
Contributor

darrenfu commented Dec 1, 2022

Hi @mbasmanova,

I'd like to claim this array function first:
array_has_duplicates: #3397

@darrenfu
Contributor

darrenfu commented Dec 2, 2022

Hi @mbasmanova,

I'd like to claim this array function first: array_has_duplicates: #3397

Looks like there is a duplicate WIP PR for the same UDF, array_has_duplicates: #3320

I switched to shuffle: #3404 (ready for review)

@czentgr
Collaborator

czentgr commented Mar 17, 2023

Hello @mbasmanova,

I'm claiming functions:
timezone_hour
timezone_minute
current_date
current_time

@duanmeng
Collaborator

Hi @mbasmanova
I'm claiming any_match (lambda function). I am working on all_match (lambda function) #3356, and will continue to work on any_match #4327 once #3356 is merged.

@mbasmanova
Contributor Author

@duanmeng

I'm claiming any_match (lambda function).

Sounds great.

@svm1
Collaborator

svm1 commented Apr 18, 2023

Hi @mbasmanova, I would like to claim the json_parse function.

@mbasmanova
Contributor Author

@svm1 Looks like json_parse was added in #3663

@svm1
Collaborator

svm1 commented Apr 18, 2023

Thanks @mbasmanova, must've missed that. Then may I claim the flatten array function? Doesn't look like it's been added yet.

@svm1
Collaborator

svm1 commented Apr 19, 2023

Hi @mbasmanova, I would actually like to take the split_to_map string function first instead if that's alright.

@mbasmanova
Contributor Author

Hi @mbasmanova, I would actually like to take the split_to_map string function first instead if that's alright.

That's fine. Thanks.

@svm1
Collaborator

svm1 commented Apr 19, 2023

Hi @mbasmanova, I would also like to claim the following functions:

from_iso8601_date
from_iso8601_timestamp
current_timezone

@SANTHOSH-MAMIDISETTI

Hello all, it seems like a lot has been done! Is there anything that I could work on? I am a newbie to open source, but I believe I have good skills in C++, C, Python and such. I hope @mbasmanova or someone else will be able to help me soon. Cheers!

@dusx1981

dusx1981 commented Jul 25, 2023

I want to join this interesting work.
My code analysis of Velox compilation and execution: Velox--compile
@mbasmanova
I'm claiming array_remove

@mbasmanova
Contributor Author

@dusx1981 Welcome. FYI, someone might already be working on array_remove: #5538

@mbasmanova
Contributor Author

@SANTHOSH-MAMIDISETTI @dusx1981 Welcome, folks. Would you provide some context re: your interest in Velox? Are you part of teams that use Velox? If so, what do these teams do?

@dusx1981

@SANTHOSH-MAMIDISETTI @dusx1981 Welcome, folks. Would you provide some context re: your interest in Velox? Are you part of teams that use Velox? If so, what do these teams do?

We are working on a distributed database, and we need to use things like ICompiledCall.

@dusx1981

Which array-related functions can I claim?

@mbasmanova
Contributor Author

@dusx1981 Curious, which database do you work on? BTW, development of codegen in Velox was paused a long time ago.

Consider adding a family of map_top_n_xxx functions: https://prestodb.io/docs/current/functions/map.html#map_top_n_keys
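
As a rough illustration of the simplest member of that family, the sketch below mimics map_top_n_keys in Python, assuming the behavior described in the linked Presto documentation: return up to n keys, sorted in descending order. The Python function is hypothetical, the non-negative check on n is an assumption about how invalid input is treated, and NULL handling and any other overloads are not covered.

```python
def map_top_n_keys(mapping: dict, n: int) -> list:
    """Return up to n keys of the mapping, largest key first."""
    if n < 0:
        # Assumption: a negative n is treated as invalid input.
        raise ValueError("n must be non-negative")
    # Sort keys in descending order and keep the first n of them.
    return sorted(mapping.keys(), reverse=True)[:n]
```

With the map {'a': 1, 'b': 2, 'c': 3} and n = 2, this returns ['c', 'b'].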

@dusx1981

dusx1981 commented Jul 26, 2023

@dusx1981 Curious, which database do you work on? BTW, development of codegen in Velox was paused a long time ago.

Consider adding a family of map_top_n_xxx functions: https://prestodb.io/docs/current/functions/map.html#map_top_n_keys

We are a project being worked on by a studio, and we are currently in the development stage, and we are also soliciting the name of the database.

Do you mean adding functions to this framework? presto

@mbasmanova
Contributor Author

We are a project being worked on by a studio,

Curious which studio is this?

Do you mean adding functions to this framework? presto

This link is in Chinese, which I unfortunately cannot read. You asked "Which array related functions can I claim?" and I suggested picking up the map_top_n_xxx Presto functions.

@mbasmanova
Contributor Author

We are choosing a solution now, and we don’t know which implementation approach to choose for a language like Go. Can you give some guidance?

Sure. I would need to understand a bit more about the system you are building to advise. What kind of problems are you looking to solve and what is your tentative solution?

@dusx1981

We are choosing a solution now, and we don’t know which implementation approach to choose for a language like Go. Can you give some guidance?

Sure. I would need to understand a bit more about the system you are building to advise. What kind of problems are you looking to solve and what is your tentative solution?

I just have a question. We want to use Go to implement compilation and execution logic like the one Presto has on the JVM. Java can generate bytecode at runtime and load it through a ClassLoader, but Go and C++ code can only be compiled statically before execution. So we want to use the Velox implementation as a reference and implement a Go version.

Velox stopped the development of codegen. Was that for performance reasons? Is my idea above feasible?

@mbasmanova
Contributor Author

@dusx1981 What is the problem you are trying to solve? Are you looking to build a more efficient / faster query engine? Where do you think the speedup / efficiency will come from?

We stopped codegen development primarily because codegen is harder to debug and develop. We do not believe it will be faster for analytical workloads.

@dusx1981

dusx1981 commented Jul 26, 2023

@dusx1981 What is the problem you are trying to solve? Are you looking to build a more efficient / faster query engine? Where do you think the speedup / efficiency will come from?

We stopped codegen development primarily because codegen is harder to debug and develop. We do not believe it will be faster for analytical workloads.

Another question, why not perform compiling and linking operations in memory instead of writing to files, which adds IO operations? Will this have a significant impact on performance?

@mbasmanova
Contributor Author

Another question, why not perform compiling and linking operations in memory instead of writing to files, which adds IO operations? Will this have a significant impact on performance?

@dusx1981 I suggest opening a separate GitHub issue and continuing the discussion there. CC: @laithsakka

@dusx1981

Another question, why not perform compiling and linking operations in memory instead of writing to files, which adds IO operations? Will this have a significant impact on performance?

@dusx1981 I suggest opening a separate GitHub issue and continuing the discussion there. CC: @laithsakka

#5840

marin-ma pushed a commit to marin-ma/velox-oap that referenced this issue Dec 15, 2023
@Real-Chen-Happy
Contributor

Hi all, I am new to OLAP databases and I am extremely interested in Velox. Is there anything that I could work on? I am thinking of adding array_frequency if nobody is currently working on it. Thanks!

@mbasmanova
Contributor Author

@Real-Chen-Happy Welcome! I suggest starting with #3728 or one of the fuzzer-found issues: https://github.com/facebookincubator/velox/issues?q=is%3Aopen+is%3Aissue+label%3Afuzzer-found

@mbasmanova
Contributor Author

@Real-Chen-Happy The array_frequency function was added in #3807

@Real-Chen-Happy
Contributor

@Real-Chen-Happy Welcome! I suggest starting with #3728 or one of the fuzzer-found issues: https://github.com/facebookincubator/velox/issues?q=is%3Aopen+is%3Aissue+label%3Afuzzer-found

Thank you for your reply! I will start to take a look at #3728

@mbasmanova
Contributor Author

I will start to take a look at #3728

@Real-Chen-Happy Thanks. BTW, would you introduce yourself and share a bit about where / how you use Velox? Do you know anyone in the Velox community who can help you onboard to the codebase and provide guidance on your first PRs? If not, please create a GitHub issue to ask if anyone is willing to help with that.

@Real-Chen-Happy
Contributor

Real-Chen-Happy commented Mar 26, 2024

Yeah sure! I am Real and I have some experience with OLTP systems. My current work is not directly related to Velox. Contributing to Velox is a personal interest because I believe the future of DBMS will be composable, and Velox will definitely play a key role. I would love to contribute my efforts in this area. I am new to this community, so let me know if anybody here is interested in providing some mentorship! #9262

@Sutter099

Sutter099 commented Mar 28, 2024

Where can I find the functions that still need to be added? What I see in this link seems a bit outdated.

@mbasmanova
Contributor Author

@Sutter099 A coverage map might be a good place to find functions still missing in Velox:

https://facebookincubator.github.io/velox/functions/presto/coverage.html
