Improve Document types.html setdigest Function
Cherry-pick of trinodb/trino@c501d7e (trinodb/trino#9064)

Co-authored-by: Marius Grama <[email protected]>
2 people authored and tdcmeehan committed Sep 18, 2023
1 parent 923b973 commit 1e7bb75
Showing 1 changed file with 25 additions and 0 deletions.
25 changes: 25 additions & 0 deletions presto-docs/src/main/sphinx/language/types.rst
@@ -303,6 +303,31 @@ KHyperLogLog
A KHyperLogLog is a data sketch that can be used to compactly represent the association of two
columns. See :doc:`/functions/khyperloglog`.

SetDigest
---------

.. _setdigest_type:

``SetDigest``
^^^^^^^^^^^^^

A SetDigest (``setdigest``) is a data sketch used to estimate the
`Jaccard similarity coefficient <https://wikipedia.org/wiki/Jaccard_index>`_
between two sets.

SetDigest encapsulates the following components:

- `HyperLogLog <https://wikipedia.org/wiki/HyperLogLog>`_
- `MinHash with a single hash function <http://wikipedia.org/wiki/MinHash#Variant_with_a_single_hash_function>`_

The HyperLogLog structure is used to approximate the number of distinct elements
in the original set.

The MinHash structure stores a compact, low-memory signature of the original set.
The similarity of any two sets is estimated by comparing their signatures.
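
As a rough illustration of how comparing signatures can estimate similarity, the
following Python sketch implements the single-hash-function (bottom-k) MinHash
variant. It is not Presto's implementation: the helper names ``hash64``,
``signature`` and ``estimate_jaccard``, the SHA-256 based hash, and the signature
size ``k=64`` are all assumptions made for this example.

```python
import hashlib


def hash64(value: str) -> int:
    """Map a value to a 64-bit integer (SHA-256 is an arbitrary stand-in hash)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def signature(values, k=64):
    """Bottom-k signature: the k smallest distinct hash values of the set."""
    return sorted({hash64(v) for v in values})[:k]


def estimate_jaccard(sig_a, sig_b, k=64):
    """Estimate |A n B| / |A u B| from two bottom-k signatures."""
    a, b = set(sig_a), set(sig_b)
    # The k smallest hashes across both signatures form a sample of the union.
    union_sample = sorted(a | b)[:k]
    if not union_sample:
        return 0.0
    # A sampled hash present in both signatures belongs to the intersection.
    shared = sum(1 for h in union_sample if h in a and h in b)
    return shared / len(union_sample)


# Hypothetical example data: the sets share 10 of 50 distinct values.
a = {str(i) for i in range(30)}
b = {str(i) for i in range(20, 50)}
print(estimate_jaccard(signature(a), signature(b)))
```

When both sets hold fewer than ``k`` elements, the signatures contain every hash,
so the estimate matches the exact Jaccard index (0.2 for the data above). For
larger sets the result is an approximation whose error shrinks as ``k`` grows.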

SetDigests are additive: the digests of two sets can be merged into a digest of their union.
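
To see why such a structure can be merged, consider a minimal Python sketch of
the MinHash part only, assuming a bottom-k signature (the ``k`` smallest hash
values of a set). The names ``hash64``, ``signature`` and ``merge_signatures``,
the SHA-256 based hash, and ``k=64`` are hypothetical choices for this
illustration, not Presto's code, and the HyperLogLog component is left out.

```python
import hashlib


def hash64(value: str) -> int:
    """Map a value to a 64-bit integer (SHA-256 is an arbitrary stand-in hash)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def signature(values, k=64):
    """Bottom-k signature: the k smallest distinct hash values of the set."""
    return sorted({hash64(v) for v in values})[:k]


def merge_signatures(sig_a, sig_b, k=64):
    """Combine two bottom-k signatures into the signature of the union."""
    # Any hash among the k smallest of the union is also among the k smallest
    # of the set it came from, so the two signatures hold all the candidates.
    return sorted(set(sig_a) | set(sig_b))[:k]


# Hypothetical example data: two overlapping sets of 1000 values each.
first = {str(i) for i in range(1000)}
second = {str(i) for i in range(500, 1500)}

merged = merge_signatures(signature(first), signature(second))
print(merged == signature(first | second))
```

Merging the two signatures gives the same result as building a signature from
the combined data, so partial digests computed on separate chunks of a column
can be combined later without revisiting the original values.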

Quantile Digest
---------------
