Support for MongoDB time-series collections #173

gregorynoma · 2021-07-14T19:09:35Z

This adds support for MongoDB time-series collections, which are available starting in MongoDB 5.0. The changes include:

- Switch from mgo to the official MongoDB Go driver
- Add options to the MongoDB loader
  - --url
  - --document-per-event
  - --timeseries-collection
  - --retryable-writes
  - --ordered-inserts
  - --random-field-order
- Add option to query generator
  - --mongo-use-naive
- Move measurement data to the top level of inserted objects
- Support queries using MongoDB naive data format
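
As a rough illustration of how the loader options listed above fit together, here is a minimal sketch using only the Go standard library `flag` package. The flag names come from the description; the `loaderOptions` struct, the `parseLoaderOptions` helper, the flag types, and every default value are assumptions made for this example and are not taken from the PR, which may wire its options differently.

```go
package main

import (
	"flag"
	"fmt"
)

// loaderOptions groups the loader flags named in the PR description.
// The field types and defaults are assumptions for illustration only.
type loaderOptions struct {
	URL                  string
	DocumentPerEvent     bool
	TimeseriesCollection bool
	RetryableWrites      bool
	OrderedInserts       bool
	RandomFieldOrder     bool
}

// parseLoaderOptions is a hypothetical helper that parses args with a
// private FlagSet so it does not touch the process-wide flag state.
func parseLoaderOptions(args []string) (loaderOptions, error) {
	var o loaderOptions
	fs := flag.NewFlagSet("loader", flag.ContinueOnError)
	fs.StringVar(&o.URL, "url", "mongodb://localhost:27017", "MongoDB connection URL (assumed default)")
	fs.BoolVar(&o.DocumentPerEvent, "document-per-event", false, "store one document per event")
	fs.BoolVar(&o.TimeseriesCollection, "timeseries-collection", false, "load into a time-series collection")
	fs.BoolVar(&o.RetryableWrites, "retryable-writes", false, "enable retryable writes")
	fs.BoolVar(&o.OrderedInserts, "ordered-inserts", true, "use ordered inserts (assumed default)")
	fs.BoolVar(&o.RandomFieldOrder, "random-field-order", false, "randomize field order")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseLoaderOptions([]string{
		"--url", "mongodb://db:27017",
		"--timeseries-collection",
		"--ordered-inserts=false",
	})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", o)
}
```

Using a dedicated `FlagSet` keeps the sketch easy to test in isolation; the real loader may rely on a different configuration library.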

1) GroupByTimeAndPrimaryTag used 2 sorts (on b and then a) when a sort on (a,b) was needed. The double sort is probably inefficient and only correct when a stable sort is guaranteed.
2) LastPointPerHost had a predicate on the "measurement" field which did not exist in the pipeline at that stage, so the query had an empty result.
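
The point about sorting can be illustrated outside MongoDB with a small standard-library sketch. It compares a single sort on the compound key (a,b) with the two-step approach, which only gives the same order when the second sort is stable. The `row` type and both helper functions are hypothetical stand-ins and do not reproduce the aggregation pipeline from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// row is a hypothetical record with two sort keys, standing in for
// the (a, b) pair mentioned in the commit message.
type row struct {
	A string
	B int
}

// sortCompound orders rows by A, breaking ties on B, in one sort.
func sortCompound(rows []row) {
	sort.Slice(rows, func(i, j int) bool {
		if rows[i].A != rows[j].A {
			return rows[i].A < rows[j].A
		}
		return rows[i].B < rows[j].B
	})
}

// sortTwice sorts on B and then on A. The result matches sortCompound
// only because the second pass uses a stable sort, which preserves the
// B ordering among rows with equal A.
func sortTwice(rows []row) {
	sort.Slice(rows, func(i, j int) bool { return rows[i].B < rows[j].B })
	sort.SliceStable(rows, func(i, j int) bool { return rows[i].A < rows[j].A })
}

func main() {
	rows := []row{{"host_1", 2}, {"host_0", 2}, {"host_1", 1}, {"host_0", 1}}
	sortCompound(rows)
	fmt.Println(rows)
}
```

If the second pass in `sortTwice` used an unstable sort, rows with equal `A` could come out in any `B` order, which is the correctness risk the commit message describes.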

CLAassistant · 2021-07-14T19:09:43Z

All committers have signed the CLA.

jonatas · 2021-08-04T18:45:21Z

Hello @gregorynoma! Thanks for the PR and all the improvements related to MongoDB! Would you mind fixing the mongo test errors in the broken build?

# github.com/timescale/tsbs/pkg/query [github.com/timescale/tsbs/pkg/query.test]
pkg/query/mongo_test.go:23:20: cannot use "github.com/globalsign/mgo/bson".M literal (type "github.com/globalsign/mgo/bson".M) as type primitive.M in append
FAIL	github.com/timescale/tsbs/pkg/query [build failed]
?   	github.com/timescale/tsbs/pkg/query/config	[no test files]
?   	github.com/timescale/tsbs/pkg/query/factories	[no test files]
?   	github.com/timescale/tsbs/pkg/targets	[no test files]
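
The compile error above comes from Go treating two named map types as distinct even when their underlying types are identical. The sketch below reproduces that situation with local stand-in types so it runs without either driver: `legacyM` plays the role of the mgo `bson.M` and `driverM` the role of `primitive.M`. Both names and the `toDriverM` helper are hypothetical, and the actual fix applied in the PR is not shown in this thread.

```go
package main

import "fmt"

// legacyM and driverM are local stand-ins for two map types that share
// an underlying type but are declared in different packages.
type legacyM map[string]interface{}
type driverM map[string]interface{}

// toDriverM converts explicitly. The conversion is allowed because the
// underlying types match; it does not copy the map.
func toDriverM(m legacyM) driverM {
	return driverM(m)
}

func main() {
	var stages []driverM

	// The next line would fail to compile, much like the build log:
	// stages = append(stages, legacyM{"$limit": 5})

	// Either convert explicitly or build the literal with the target type.
	stages = append(stages, toDriverM(legacyM{"$limit": 5}))
	stages = append(stages, driverM{"$limit": 5})

	fmt.Println(len(stages))
}
```

In a test file, the simpler route is usually to build the literal with the type the code under test expects, rather than converting at each call site.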

gregorynoma · 2021-08-06T14:51:48Z

Hi @jonatas, sorry about that. It should be fixed now.

jonatas

Thanks for all the fixes! Just a few things that would make the project easier to maintain:

While testing locally, I noticed that the project has a full_cycle_minitest folder with one script for each database. It would be great to have one for mongo too. It would help me review and understand some basic scenarios from a short benchmark example.
I also see an opportunity to add some sample-configs with the same purpose, but as YAML files.

jonatas · 2022-01-19T18:42:42Z

cmd/tsbs_generate_queries/databases/mongo/devops-naive.go

+// GroupByOrderByLimit populates a query.Query that has a time WHERE clause, that groups by a
+// truncated date, orders by that date, and takes a limit, e.g. in pseudo-SQL:
+//
+// SELECT minute, MAX(cpu) FROM cpu


In the implementation, the column name is usage_user. Should we fix the docs too?

Suggested change

// SELECT minute, MAX(cpu) FROM cpu

// SELECT minute, MAX(usage_user) FROM cpu

gregorynoma · 2022-06-21T16:12:48Z

Hey @jonatas, I added those files you suggested as well as incorporated some additional changes we made since the original PR!

jonatas · 2022-06-23T13:10:34Z

Thank you, @gregorynoma! I'll review it again!

lcasassa · 2023-12-03T10:18:28Z

Hi all, any updates on this? Looking forward to the benchmark results!

y123456yz · 2024-07-11T09:51:19Z

This PR is needed. Currently your code doesn't actually support MongoDB at all, but your documentation does, which is misleading.

jonatas · 2024-07-12T16:21:39Z

@y123456yz please, use the branch while it's not merged.

y123456yz · 2024-07-16T14:12:04Z

> @y123456yz please, use the branch while it's not merged.

got it, thanks.

mdcallag added 2 commits July 12, 2021 18:43

Fix a few problems

7ed5c5a

1) HighCPUForHosts panics when trying to generate a query for zero hostnames
2) Option parsing code used mongo-use-native rather than mongo-use-naive for "naive" schema
3) Helper scripts didn't have a way to choose between bucketized and document-per schema
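
The first fix concerns an empty hostname list. The thread does not show how the panic arose or how it was resolved, so the following is only a sketch of one defensive pattern: return "no predicate" for zero hostnames instead of indexing into an empty slice. The `hostnameFilter` helper and the `tags.hostname` field name are assumptions made for illustration.

```go
package main

import "fmt"

// hostnameFilter is a hypothetical helper that builds a filter document
// matching any of the given hostnames. With zero hostnames it reports
// ok=false so the caller can omit the predicate and query all hosts.
func hostnameFilter(hostnames []string) (filter map[string]interface{}, ok bool) {
	if len(hostnames) == 0 {
		return nil, false
	}
	return map[string]interface{}{
		// "tags.hostname" is an assumed field name, not taken from the PR.
		"tags.hostname": map[string]interface{}{"$in": hostnames},
	}, true
}

func main() {
	if _, ok := hostnameFilter(nil); !ok {
		fmt.Println("no hostname predicate: query covers all hosts")
	}
	if f, ok := hostnameFilter([]string{"host_0", "host_1"}); ok {
		fmt.Println(f)
	}
}
```

Returning an explicit `ok` flag keeps the zero-host case visible at the call site rather than hiding it behind an empty filter document.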

gregorynoma and others added 3 commits August 6, 2021 10:34

Add initial support for MongoDB time-series collections

18b800b

* Use the official MongoDB Go Driver rather than mgo * Add timeseries-collection, retryable-writes, and ordered-inserts options

Add --random-field-order flag to indicate if field order should be randomized or not

b9e4a07

Add functionality for generated cpu-only queries on time-series collections

ec9f163

gregorynoma force-pushed the pull-request branch from b3f59f5 to ec9f163 Compare August 6, 2021 14:40

jonatas requested changes Jan 19, 2022

seybi87 mentioned this pull request May 9, 2022

MongoDB load_queries: incomplete read of message header -> i/o timeout #204

Open

ruchen and others added 7 commits June 3, 2022 14:09

Allow sharding time-series collection

6888797

Add loader script for uncompressed data

1073b63

Rewrite lastpoint query to use DISTINCT_SCAN

035ac6c

Add full_cycle_minitest for MongoDB

4d7576a

Add mongo-cpu-only.yaml

be1bdf1

Change MAX(cpu) to MAX(usage_user)

971c5b6

Merge remote-tracking branch 'timescale/master' into pull-request

ed5fe4d

gregorynoma requested a review from jonatas June 21, 2022 16:12
