Document set digest functions #8269
850c7e1
to
4bd3226
Compare
The following example showcases how the Set Digest functions are
employed to estimate the similarity between texts::

    WITH text_input(id, text) AS (VALUES
This query depends on the changes from PR #8295.
Once that PR is accepted, the documentation of the make_set_digest
function also needs to be adjusted accordingly.
@findinpath Just wanted to confirm if the changes to documentation you are talking about are done already?
4bd3226
to
3afb353
Compare
In general we need to ensure we link to and potentially move related functions into this doc as well (e.g. murmur3)
    jaccard_index(digest1, digest2) AS jaccard_index
    FROM setdigest_side_by_side
    ORDER BY id1, id2;
Add a sentence here
What would be an appropriate sentence in this context?
Are you implying that the reader may think that the two code blocks are not related?
It should be a full sentence. E.g.
The query produces the following results:
534699f
to
6caba17
Compare
6caba17
to
d4bea30
Compare
Can you also rebase on master and then add the doc to the new list by topic?
d4bea30
to
32a1ed6
Compare
Looks great now! Thank you @findinpath
3e39273
to
26bdb38
Compare
Could you look and merge, @hashhar?
LGTM % a question and a nit, @findinpath.
The following example showcases how the Set Digest functions are
employed to estimate the similarity between texts::

    WITH text_input(id, text) AS (VALUES
@findinpath Just wanted to confirm if the changes to documentation you are talking about are done already?
Yes, the showcased functionality is already fully implemented in Trino.
26bdb38
to
b9fc92d
Compare
Thank you! This is brilliant documentation.
This PR addresses the documentation of the hash_counts function (#7659) and of the setdigest functions (#8269).
It also documents how to work with the Trino Set Digest functions, which rely on the MinHash technique (used to estimate the similarity between two sets).
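For readers unfamiliar with MinHash, the idea behind the similarity estimate can be sketched outside of Trino. The Python below is only an illustration of the general technique and makes no claim about how Trino builds or compares its set digests. The function names (`word_ngrams`, `minhash_signature`, `estimate_jaccard`, `exact_jaccard`), the seeded SHA-256 hashing, and the sample sentences are all assumptions made for this sketch.

```python
import hashlib


def word_ngrams(text, n=4):
    """Split text on whitespace and return the set of n-word shingles."""
    words = text.split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def _seeded_hash(value, seed):
    """Hypothetical helper: derive a 64-bit hash of value for a given seed."""
    data = f"{seed}:{value}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def minhash_signature(items, num_hashes=128):
    """Keep, for each seeded hash function, the minimum hash over the set."""
    items = set(items)
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    return [
        min(_seeded_hash(item, seed) for item in items)
        for seed in range(num_hashes)
    ]


def estimate_jaccard(sig_a, sig_b):
    """Estimate the Jaccard index as the fraction of matching positions."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard index |A ∩ B| / |A ∪ B|, for comparison."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Illustrative inputs only; they are not taken from the documentation.
    first = word_ngrams("the set digest functions estimate similarity between two texts")
    second = word_ngrams("the set digest functions estimate similarity between many texts")
    sig_first = minhash_signature(first)
    sig_second = minhash_signature(second)
    print("estimated:", estimate_jaccard(sig_first, sig_second))
    print("exact:    ", exact_jaccard(first, second))
```

The estimate converges on the exact Jaccard index as `num_hashes` grows; the signature length trades memory for accuracy.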