-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Union bySegment queries can get confused about which descriptors to use #1727
Comments
Ah, the reason behind introducing UnionTimelineLookup at the historical was to distribute the merging load on both historical and broker. It seems the historical does not have enough information to work on multiple DataSources and figure out the correct one here. I think we could revert back the breaking changes and let the broker send the query for individual DS down to the historical nodes. @fjy @gianm I will make a PR for reverting them if it sounds good to you ? |
@nishantmonu51 that approach sounds good to me- I think the other alternative would be to add dataSources to SegmentDescriptors, which seems like a big change for a feature that is not very heavily used. |
Fixes apache#1727. revert to doing merging for results for union queries on broker. revert unrelated changes Add test for union query runner Add test remove unused imports fix imports fix renamed file fix test update docs.
fix #1727 - Union bySegment queries fix
When historicals get a union query with multiple datasources, they can get confused and scan segments other than the ones the broker intended. This is because they use
UnionTimeLineLookup.findEntry(interval, version)
to resolve the descriptors into Segments, but it's possible for segments in different datasources to have the same interval and version. This is actually always going to happen if the datasources were ingested with standalone realtime, since the version is fixed to the start of the hour.This shows a couple of union queries where one segment is scanned twice and one is not scanned at all (which one is scanned depends on the order of the datasources; see union.json vs union2.json): https://gist.github.com/gianm/fc3a1eba0dbafbf87720
The text was updated successfully, but these errors were encountered: