Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new query property for aligning group by intervals to start times #1110

Merged
merged 1 commit into from
Jan 11, 2017
Merged

Add new query property for aligning group by intervals to start times #1110

merged 1 commit into from
Jan 11, 2017

Conversation

goakley
Copy link

@goakley goakley commented Jan 5, 2017

Allow QueryNodes to automatically align GROUP BY time intervals to the start time for each query sent to influx. This leverages the offset argument to GROUP BY time() to offset directly to the start time.

Required for all non-trivial PRs
  • Rebased/mergable
  • Tests pass
  • CHANGELOG.md updated
  • Sign CLA (if not already signed)

@nathanielc
Copy link
Contributor

@goakley On a quick read through this looks great! I'll give it a full review soon. In the meantime could you sign the CLA?

@goakley
Copy link
Author

goakley commented Jan 6, 2017

@nathanielc Signed 👍

Copy link
Contributor

@nathanielc nathanielc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is great! Thanks for the PR. I have made a few small suggestions below.

@@ -192,6 +214,10 @@ func (q *Query) Dimensions(dims []interface{}) error {
}
// Add time dimension
hasTime = true
q.groupByOffsetDL.Val = dim.Offset
if q.alignGroup {
q.groupByOffsetDL.Val = q.StartTime().Sub(time.Unix(0, 0)) - dim.Offset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also document this behavior? Where if both .alignGroup and a specific offset is spefified the offset becomes relative to the aligned group by time offset?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

@@ -96,6 +98,9 @@ func (q *Query) StopTime() time.Time {
// Set the start time of the query
func (q *Query) SetStartTime(s time.Time) {
q.startTL.Val = s
if q.alignGroup && q.groupByOffsetDL != nil {
q.groupByOffsetDL.Val = s.Sub(time.Unix(0, 0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This produces queries like ... GROUP BY time(1m, 1484002977413326u). Can you perform the modulo so that the offset is always in the range [-1m, 1m] when the interval is 1m?

That should make debugging easier for an end user.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

Thanks for doing the arithmetic on that, I wasn't confident it was safe to do even though it seemed reasonable. [-x,0] is covered by [0,x], so I implemented it as such.

@nathanielc
Copy link
Contributor

@goakley Thanks for the latest changes, they look good. Everything seems to be working well, but I am having a hard time finding good use cases for this feature that the normal .align() doesn't already cover.

Can you share some examples that lead you to creating this PR? Thanks

@goakley
Copy link
Author

goakley commented Jan 10, 2017

@nathanielc The trouble is that InfluxDB ignores the query dates when it determines the start times of the group by time intervals. If you execute a query with a minimum time set anywhere between 2017-01-10T18:06:00 and 2017-01-10T18:06:59 with a GROUP BY time(1m), the interval will start at 2017-01-10T18:06:00, which means a query time > 2017-01-10T18:06:30 AND time < 2017-01-10T18:08:30 will not only return 3 buckets instead of two, but 2 of those 3 buckets will be "incomplete" since InfluxDB's boundaries force the 1st and 3rd bucket to contain only 30 second of data instead of the full minute.

Here's part of a TICKscript we have for NGINX data processing:

var batchquery = batch
    |query('''
    SELECT DIFFERENCE(LAST(handled))-DIFFERENCE(LAST(accepts)) AS value
    FROM "telegraf"."default"."nginx"
    WHERE environment = 'production' GROUP BY time(1m), "host"
  ''')
        .groupBy(time(1m), 'host')
        .period(2m)
        .every(30s)
        .align()

We want to calculate the number of unhandled requests for the past minute, and without being able to change the group by time bucket alignment, every other query (the ones at 30s instead of 0s due to the every(30s)) gives partially incorrect data.

@@ -178,6 +182,21 @@ func (n *QueryNode) ChainMethods() map[string]reflect.Value {
//
// It is recommended to use QueryNode.Align and QueryNode.Offset in conjunction with
// group by time dimensions so that the time bounds match up with the group by intervals.
// To automatically align the group by intervals to the start of the query time,
// use QueryNode.AlignGroup. This is useful in more complex situations, such as when
// the groupBy period is larger than the query interval.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per your example can you change this to read such as when the groupBy time period is longer than the query frequency.

And maybe add something below the TICKscript example about how for every other query the groupBy time bounds would not be aligned (the ones ending on 30s instead of 0s) without the .alignGroup call.

Thanks

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

@nathanielc
Copy link
Contributor

You example makes sense to me. I added a few notes where that can be clarified in the doc comments. After that I think this will be ready to be merged. Thanks again!

@goakley
Copy link
Author

goakley commented Jan 10, 2017

@nathanielc should I resolve this conflict and squash the fixups?

@nathanielc
Copy link
Contributor

@goakley Yes please.

// (at :30 past the minute) will align to :00 seconds instead of the desired :30 seconds,
// which would create 6 group by intervals instead of 5, the first and last of which
// would only have 30 seconds of data instead of a full minute.
// If QueryNode.Offset is used in conjunction with QueryNode.AlignGroup,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The QueryNode.Offset is different from the group by time offset. This should read

If the group by time offset (i.e. time(t, offset)) is used in conjunction with QueryNode.AlignGroup,

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

@nathanielc nathanielc merged commit f2ba187 into influxdata:master Jan 11, 2017
nathanielc added a commit that referenced this pull request Jan 11, 2017
Add new query property for aligning group by intervals to start times
@nathanielc
Copy link
Contributor

@goakley Thanks!

@goakley goakley deleted the align-group branch January 12, 2017 20:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants