[Iceberg] Add native NDV read and write support
This PR introduces the changes required to read and write the distinct value count (NDV) statistics described by Iceberg's Puffin file specification [1]. The change can be broken down into three main parts:

- Updates to the SPI to allow connectors to define the function used to calculate a specific statistic.
- The addition of three new functions: sketch_theta, sketch_theta_estimate, and sketch_theta_summary.
- Plumbing and implementation in the Iceberg connector to support reading and writing of the NDVs.

[1]: https://iceberg.apache.org/puffin-spec/
Showing 37 changed files with 1,302 additions and 140 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -35,3 +35,4 @@ Functions and Operators | |
functions/teradata | ||
functions/internationalization | ||
functions/setdigest | ||
functions/sketch |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
=========================== | ||
Sketch Functions | ||
=========================== | ||
|
||
Sketches are data structures that approximately answer particular questions | ||
about a dataset when full accuracy is not required. Approximate answers are | ||
often faster and more efficient to compute than exact ones. | ||
|
||
Presto provides support for computing some sketches available in the `Apache | ||
DataSketches`_ library. | ||
|
||
.. function:: sketch_theta(data) -> varbinary | ||
|
||
Computes a `theta sketch`_ from an input dataset. The output of this | ||
function can be used as input to any of the other ``sketch_theta_*`` | ||
functions. | ||
|
||
.. function:: sketch_theta_estimate(sketch) -> double | ||
|
||
Returns the estimated number of distinct values from the input sketch. | ||
|
||
.. function:: sketch_theta_summary(sketch) -> row(estimate double, theta double, upper_bound_std double, lower_bound_std double, retained_entries int) | ||
|
||
Returns a summary of the input sketch. The summary includes the distinct | ||
values estimate, the sketch's theta parameter, the upper and lower error | ||
bounds corresponding to one standard deviation, and the number of entries | ||
retained in the sketch. | ||
|
||
|
||
.. _Apache DataSketches: https://datasketches.apache.org/ | ||
.. _theta sketch: https://datasketches.apache.org/docs/Theta/ThetaSketchFramework.html |
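
To make the three functions documented above more concrete, here is a minimal Python sketch of the idea behind a theta sketch. It is only an illustration and is not the Apache DataSketches or Presto implementation. The helper names (`hash64`, `theta_sketch`, `theta_estimate`, `theta_summary`), the default `k=1024`, the BLAKE2b hash, and the dictionary layout are all assumptions made for this example. The error bounds use a simple binomial-sampling approximation rather than the bounds computed by the real library.

```python
import hashlib
import heapq
import math

HASH_SPACE = 2 ** 64  # size of the 64-bit hash range (assumed for this sketch)


def hash64(value):
    """Map a value to a roughly uniform 64-bit integer (hypothetical helper)."""
    digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def theta_sketch(values, k=1024):
    """Retain the k smallest distinct hashes; theta is the sampling threshold."""
    heap = []        # negated hashes, so -heap[0] is the largest hash kept
    members = set()  # hashes currently in the heap, used to skip duplicates
    for value in values:
        h = hash64(value)
        if h in members:
            continue
        if len(heap) <= k:            # keep up to k + 1 candidates
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:            # smaller than the largest kept hash
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    hashes = sorted(-x for x in heap)
    if len(hashes) <= k:
        # Exact mode: every distinct value was retained.
        return {"theta": 1.0, "entries": hashes}
    # Estimation mode: the (k + 1)-th smallest hash becomes the threshold.
    return {"theta": hashes[k] / HASH_SPACE, "entries": hashes[:k]}


def theta_estimate(sketch):
    """Estimated distinct count: retained entries scaled up by 1 / theta."""
    return len(sketch["entries"]) / sketch["theta"]


def theta_summary(sketch):
    """Estimate plus approximate one-standard-deviation bounds (assumption)."""
    n, theta = len(sketch["entries"]), sketch["theta"]
    estimate = n / theta
    # Treat retention as Bernoulli sampling with probability theta.
    std = math.sqrt(n * (1.0 - theta)) / theta
    return {
        "estimate": estimate,
        "theta": theta,
        "upper_bound_std": estimate + std,
        "lower_bound_std": max(float(n), estimate - std),
        "retained_entries": n,
    }
```

With a small input the sketch stays in exact mode (`theta` is 1.0) and the estimate equals the true distinct count. Once more than `k` distinct values are seen, `theta` drops below 1.0 and the estimate becomes approximate, with the summary bounds indicating roughly how far off it may be.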