`count_by` and `endpoints` grouping modes #1312
Merged
Fixes #384
This adds two grouping modes: `count_by` for Number-typed columns, and `endpoints` for any set of columns which can be ordered.

Technical details
endpoints
This mode is meant to be internal for now, although it can be called from the API if desired. The reason for keeping it internal is that some validation of the parameters (specifically, the ordering of the endpoint tuples described below) isn't performant without multiple queries per transaction, and our `execute_query` function doesn't currently support that. To avoid duplicating work, I'm deferring that validation until we have that functionality.

The way this mode works is that you give an ascending-order array of arrays, where each inner array represents a tuple of values from the columns chosen for the grouping. The values do not need to exist in the columns, but they do need to be of the appropriate type. That is, the bounds can be chosen between existing values, as long as the type allows a value between them. Order for tuples is defined in the same way that PostgreSQL orders rows by a set of columns. This means that if you give the columns in a different order, the order of the tuples in the given array needs to change correspondingly.
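To make the tuple ordering concrete, here is a minimal Python sketch, not the code in this PR. It relies on the fact that Python compares tuples lexicographically, which matches PostgreSQL's multi-column ordering for non-null values. The helper names `is_ascending` and `assign_range` are hypothetical, and returning `None` for rows outside every range is an assumption.

```python
from bisect import bisect_right


def is_ascending(bound_tuples):
    """Return True if every bound tuple is strictly greater than the previous one."""
    return all(a < b for a, b in zip(bound_tuples, bound_tuples[1:]))


def assign_range(row, bound_tuples):
    """Return the index i such that bound_tuples[i] <= row < bound_tuples[i + 1].

    Returns None when the row falls outside every range (an assumption of
    this sketch, not something the PR description states for this mode).
    """
    i = bisect_right(bound_tuples, row) - 1
    if i < 0 or i >= len(bound_tuples) - 1:
        return None
    return i


# Bounds for two columns, e.g. (number_col, text_col), in ascending order.
bounds = [(1, "a"), (1, "m"), (2, "c")]
print(is_ascending(bounds))           # True
print(assign_range((1, "b"), bounds))  # 0: (1, "a") <= (1, "b") < (1, "m")

# Swapping the column order without reordering the tuples breaks the ordering.
swapped = [(t, n) for n, t in bounds]
print(is_ascending(swapped))          # False: ("m", 1) is not less than ("c", 2)
```

The last check shows why the bound tuples must be re-sorted whenever the grouping columns are given in a different order.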
The value defining the mode is `"endpoints"`, and the extra parameter `"bound_tuples"` is required (it is an array of arrays where each inner array has the same number of elements as the number of columns).

count_by
This mode lets a user specify `global_min`, `global_max`, and `count_by` parameters, each of which should be a number (ideally with `count_by < global_max - global_min`). This will return groups satisfying those parameters in the following way:

Given `[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]`, if the user chooses `global_min = 2`, `global_max = 17`, and `count_by = 3`, then the resulting groups will be: `[2, 3, 4], [5, 6, 7], [8, 9, 10], [11, 12, 13], [14, 15]`. The values 0 and 1 fall below `global_min` and do not appear in any group. The parameters do not need to be integers.

Internally, this sets up bounds by choosing the `global_min` as the lowest tuple, then adding `count_by` iteratively until the `global_max` is reached, then using the `endpoints` mode. Note that for continuous data, this means the values in each interval are greater than or equal to its lower bound, but strictly less than its upper bound.
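The bound generation described above can be sketched in plain Python as follows. This illustrates the idea only and is not the implementation in this PR. The function names `count_by_bounds` and `group_values` are hypothetical. Letting the last bound overshoot when `global_max` is not hit exactly is an assumption, since the description does not cover that case.

```python
from bisect import bisect_right


def count_by_bounds(global_min, global_max, count_by):
    """Build ascending bounds: global_min, global_min + count_by, ... up to global_max.

    Assumption: if global_max is not reached exactly, the final bound
    overshoots it rather than being clamped.
    """
    if count_by <= 0:
        raise ValueError("count_by must be positive")
    bounds = [global_min]
    while bounds[-1] < global_max:
        bounds.append(bounds[-1] + count_by)
    return bounds


def group_values(values, bounds):
    """Place each value in the half-open interval [bounds[i], bounds[i + 1])."""
    groups = [[] for _ in bounds[:-1]]
    for v in values:
        i = bisect_right(bounds, v) - 1
        if 0 <= i < len(groups):  # values outside all intervals are skipped
            groups[i].append(v)
    return groups


bounds = count_by_bounds(2, 17, 3)
print(bounds)  # [2, 5, 8, 11, 14, 17]
print(group_values(range(16), bounds))
# [[2, 3, 4], [5, 6, 7], [8, 9, 10], [11, 12, 13], [14, 15]]
```

This reproduces the groups from the example above. Because the intervals are half-open, a value equal to a bound, such as 5, always lands in the interval that starts at that bound.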
Developer Certificate of Origin