-
Notifications
You must be signed in to change notification settings - Fork 308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for MongoDB time-series collections #173
base: master
Are you sure you want to change the base?
Conversation
1) HighCPUForHosts panics when trying to generate a query for zero hostnames 2) Option parsing code used mongo-use-native rather than mongo-use-naive for "naive" schema 3) Helper scripts didn't have a way to choose between bucketized and document-per schema
1) GroupByTimeAndPrimaryTag used 2 sorts (on b and then a) when a sort on (a,b) was needed. The double sort is probably inefficient and only correct when a stable sort is guaranteed. 2) LastPointPerHost had a predicate on the "measurement" field which did not exist in the pipeline at that stage, so the query had an empty result
Hello @gregorynoma ! thanks for the PR and all improvements related to mongodb! would you mind fixing the mongo test errors for the broken build?
|
* Use the official MongoDB Go Driver rather than mgo * Add timeseries-collection, retryable-writes, and ordered-inserts options
b3f59f5
to
ec9f163
Compare
Hi @jonatas sorry about that, should be fixed now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the fixes! Just a few things that can make the project easy to maintain:
- I was trying to test locally and, I see the project has the full_cycle_minitest folder with one script for each database. Maybe that would be great to have it for mongo too. It can help me to review and understand some basic scenarios from a short benchmark example.
- I also see an opportunity to leave some sample-configs with the same purpose but YAML files.
// GroupByOrderByLimit populates a query.Query that has a time WHERE clause, that groups by a | ||
// truncated date, orders by that date, and takes a limit, e.g. in pseudo-SQL: | ||
// | ||
// SELECT minute, MAX(cpu) FROM cpu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the implementation, the column name usage_user
. Should we fix the docs too?
// SELECT minute, MAX(cpu) FROM cpu | |
// SELECT minute, MAX(usage_user) FROM cpu |
Hey @jonatas, I added those files you suggested as well as incorporated some additional changes we made since the original PR! |
Thank you, @gregorynoma! I'll review it again! |
Hi All, Any updates on this? Looking forward to the benchmark results! |
This PR is needed. Currently your code doesn't actually support MongoDB at all, but your documentation does, which is misleading. |
@y123456yz please, use the branch while it's not merged. |
got it, thanks. |
This adds support for MongoDB time-series collections, which are available starting in MongoDB 5.0. The changes include: