
Optimize TypedSet Usage #12033

Open
wenleix opened this issue Dec 7, 2018 · 3 comments

wenleix (Contributor) commented Dec 7, 2018

TypedSet is widely used in Presto in map functions and in functions with set semantics. However, the current use of TypedSet is sub-optimal because it has to copy the data into its own internal block.

There are three proposals:

  1. Add TypedSet.getBlock to allow extracting the data (proposed in Optimize ARRAY_UNION #12029).
  2. Give TypedSet a constructor that says "build a set from this existing block" (proposed in Change ARRAY_INTERSECT to use TypedSet #11984 (review)).
  3. Similar to Proposal 2, but more flexible, in the sense that TypedSet doesn't have to hold every element of the provided block. This can be done by generalizing JsonUtil#HashTable.

Proposal 1 doesn't seem to work in every case because:

  1. Presto scalar functions can use a cached PageBuilder to avoid creating a new BlockBuilder in each invocation.
  2. For the use case in map functions, we often need to insert into a provided block (e.g. a SingleMapBlockWriter created via beginBlockEntry()).

Proposal 2 also doesn't work for the use case in map functions, as we only want to insert entries 0, 2, 4, ... (the keys) into the set.
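Proposal 3 appears to fit this case, because the set need not hold every element of the provided block. The following is a minimal, hypothetical Java sketch of that idea, not Presto code: the block is modeled as a plain `long[]`, and `PositionHashSet` and its methods are invented names, not the `TypedSet` or `JsonUtil#HashTable` API. The hash table stores positions into the existing block (open addressing with linear probing), so the caller chooses which positions to insert, and growing the table rehashes positions without copying any values.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of Proposal 3 (not the Presto API): a hash set that
 * stores positions into an existing block instead of copies of its values.
 * The "block" is modeled as a long[] for illustration only.
 */
public class PositionHashSet {
    private static final int EMPTY = -1;

    private final long[] block; // existing data; never copied
    private int[] slots;        // open-addressing table of positions into block
    private int mask;
    private int size;

    public PositionHashSet(long[] block, int expectedSize) {
        this.block = block;
        // Power-of-two capacity with load factor <= 0.5.
        int capacity = 4;
        while (capacity < expectedSize * 2) {
            capacity <<= 1;
        }
        slots = new int[capacity];
        Arrays.fill(slots, EMPTY);
        mask = capacity - 1;
    }

    /** Inserts the value at the given position; returns false if an equal value is already present. */
    public boolean add(int position) {
        int slot = findSlot(block[position]);
        if (slots[slot] != EMPTY) {
            return false;
        }
        slots[slot] = position;
        size++;
        if (size * 2 > slots.length) {
            rehash();
        }
        return true;
    }

    public boolean contains(long value) {
        return slots[findSlot(value)] != EMPTY;
    }

    public int size() {
        return size;
    }

    // Linear probing: returns the slot holding an equal value, or the first empty slot.
    private int findSlot(long value) {
        int slot = hash(value) & mask;
        while (slots[slot] != EMPTY && block[slots[slot]] != value) {
            slot = (slot + 1) & mask;
        }
        return slot;
    }

    private static int hash(long value) {
        long h = value * 0x9E3779B97F4A7C15L;
        return (int) (h ^ (h >>> 32));
    }

    // Doubles the table; only positions move, the block's values stay in place.
    private void rehash() {
        int[] old = slots;
        slots = new int[old.length * 2];
        Arrays.fill(slots, EMPTY);
        mask = slots.length - 1;
        for (int position : old) {
            if (position != EMPTY) {
                slots[findSlot(block[position])] = position;
            }
        }
    }

    public static void main(String[] args) {
        // Interleaved layout: keys at positions 0, 2, 4, ...; values at odd positions.
        long[] mapBlock = {1, 100, 2, 1, 1, 300, 3, 400};
        PositionHashSet keys = new PositionHashSet(mapBlock, mapBlock.length / 2);
        for (int position = 0; position < mapBlock.length; position += 2) {
            if (!keys.add(position)) {
                System.out.println("duplicate key at position " + position);
            }
        }
        System.out.println("distinct keys: " + keys.size());
    }
}
```

Running `main` inserts only the even positions, prints `duplicate key at position 4`, then `distinct keys: 3`. The value `1` stored at odd position 3 is never read by the set, which is the flexibility a constructor that indexes the whole block (Proposal 2) would not provide.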

sopel39 (Contributor) commented Dec 10, 2018

Just curious: why not use both proposals 1 and 2?


stale bot commented Dec 12, 2020

This issue has been automatically marked as stale because it has not had any activity in the last 2 years. If you feel that this issue is important, just comment and the stale tag will be removed; otherwise it will be closed in 7 days. This is an attempt to ensure that our open issues remain valuable and relevant so that we can keep track of what needs to be done and prioritize the right things.

stale bot added the stale label Dec 12, 2020
wenleix (Contributor, Author) commented Dec 21, 2020

@yingsu00 : Can you link your latest PR to this issue and close it? :)

stale bot removed the stale label Dec 21, 2020

3 participants