-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix string first/last aggregator comparator #12773
Conversation
5ed7797
to
071660d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems reasonable, though I think it also changes ingest time behavior so that columns ingested after this change might produce different segments with the same input data than prior to the changes here?
I tagged this with release notes and incompatible labels so we are sure to document these behavior changes in the next release. I don't personally think we need to retain backwards compatible behavior, but it might be worth getting some additional approval.
if (o1 == null && o2 == null) { | ||
comparation = 0; | ||
} else if (o1 == null) { | ||
comparation = -1; | ||
} else if (o2 == null) { | ||
comparation = 1; | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, should we just wrap this comparator in Comparator.nullsFirst
so that this part only has to deal with non-null pairs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the pointer, re-wrote the comparators in a consice manner 👍
regarding the ingestion part, I've kept the ingestion comparator to be the same as before. Further, I'm not sure if we provide ability to order rows by metric values. In any case, I think the effect should only be at query time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. And, we should document the change in behavior. I think that the change won't actually be significant in terms of impacting anyone, but we should call it out in case it is and, indicate that anyone who cares about the old behavior should be doing an ORDER+LIMIT on the MAX(__time)
.
I do worry a little bit about the virtual function call overhead of the composed-chained Comparator versus a method that just does the work, but we can let profiling tell us that it's a bottleneck before we go on the warpath.
Currently, the first/last string aggregator's
getComparator
method (used in topN ranking, grouping by ordering, post agg ordering) compares two aggregator values on__time
first and then on values. This change makes that comparator to compare only on values which is the correct behavior. The numerical first/last aggregators already compare on the values.The current users who are ordering their results on the string first/last aggregator would need to augment their queries a
max(timeCol)
aggreagtor to achieve the same results.For instance, a query like
SELECT stringCol2, LATEST(stringCol, 1024) as lat FROM table GROUP BY stringCol2 ORDER BY lat
would need to be rewritten asSELECT stringCol2, LATEST(stringCol, 1024) as lat FROM table GROUP BY stringCol2 ORDER BY MAX(timeCol)
The ingestion ordering is kept as is it since it doesn't impact the query results.
This PR has: