Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document set digest functions #8269

Merged
merged 1 commit into from
Aug 26, 2021

Conversation

findinpath
Copy link
Contributor

@findinpath findinpath commented Jun 11, 2021

This PR adresses the documentation of hash_counts function #7659 and the documentation of setdigest functions #8269

It also provides documentation on how to deal with Set Digest Trino functions which deal with the MinHash technique (used to estimate similarity between two sets).

@mosabua mosabua added the docs label Jun 11, 2021
@findinpath findinpath force-pushed the docs/setdigest-functions branch 3 times, most recently from 850c7e1 to 4bd3226 Compare June 16, 2021 23:11
@findinpath findinpath requested a review from nineinchnick June 16, 2021 23:12
The following example showcases how the Set Digest functions are
employed to estimate the similarity between texts::

WITH text_input(id, text) AS (VALUES
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query depends on the changes from the PR #8295

Once the PR is accepted, also the documentation of make_set_digest function needs to be correspondingly adjusted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findinpath Just wanted to confirm if the changes to documentation you are talking about are done already?

Copy link
Member

@mosabua mosabua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general we need to ensure we link to and potentially move related functions into this doc as well (e.g. murmur3)

jaccard_index(digest1, digest2) AS jaccard_index
FROM setdigest_side_by_side
ORDER BY id1, id2;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a sentence here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be an appropriate sentence in this context?
Are you implying that the reader may think that the two code blocks are not related?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be a full sentence. E.g.

The query produces the following results:

@findinpath findinpath force-pushed the docs/setdigest-functions branch 6 times, most recently from 534699f to 6caba17 Compare June 21, 2021 20:41
@findinpath findinpath force-pushed the docs/setdigest-functions branch from 6caba17 to d4bea30 Compare June 26, 2021 23:03
@martint martint requested a review from mosabua July 1, 2021 16:14
Copy link
Member

@mosabua mosabua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also rebase master and then add doc to the new list by topic

jaccard_index(digest1, digest2) AS jaccard_index
FROM setdigest_side_by_side
ORDER BY id1, id2;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be a full sentence. E.g.

The query produces the following results:

@findinpath findinpath force-pushed the docs/setdigest-functions branch from d4bea30 to 32a1ed6 Compare August 8, 2021 20:29
Copy link
Member

@mosabua mosabua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great now! Thank you @findinpath

@findinpath findinpath force-pushed the docs/setdigest-functions branch 2 times, most recently from 3e39273 to 26bdb38 Compare August 19, 2021 20:11
@mosabua
Copy link
Member

mosabua commented Aug 20, 2021

Could you look and merge @hashhar ?

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % a question and a nit @findinpath .

The following example showcases how the Set Digest functions are
employed to estimate the similarity between texts::

WITH text_input(id, text) AS (VALUES
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findinpath Just wanted to confirm if the changes to documentation you are talking about are done already?

@findinpath
Copy link
Contributor Author

@hashhar
#8269 (comment)

yes, the showcased functionality is already fully implemented in Trino.

@findinpath findinpath force-pushed the docs/setdigest-functions branch from 26bdb38 to b9fc92d Compare August 24, 2021 20:08
@hashhar hashhar merged commit a0ff0f3 into trinodb:master Aug 26, 2021
@hashhar hashhar added this to the 361 milestone Aug 26, 2021
@hashhar
Copy link
Member

hashhar commented Aug 26, 2021

Thank you! This is brilliant documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants