Make a database_routing_tag argument for InfluxDB output #397

Closed
schen59 opened this issue Nov 26, 2015 · 24 comments
Labels: area/influxdb, feature request (Requests for new plugin and for new features to existing plugins)


@schen59

schen59 commented Nov 26, 2015

Right now Telegraf only supports writing metrics to a single pre-defined InfluxDB database. Can we add support for dispatching different metrics to different InfluxDB databases dynamically? For example, based on some tags, I would like metrics that have tag 'A' to be written to InfluxDB database 'A' and metrics that have tag 'B' to be written to InfluxDB database 'B'.

@oldmantaiter
Contributor

Was just thinking about this as well for allowing overrides of certain plugins to change the database they go into. @sparrc would that be a good idea to pursue? Or have you had thoughts on this in the past that you'd like to see done instead?

EDIT: Just remembered that InfluxDB is not the only output. Based on the initial request above, and that I would like to send specific series to specific databases, would it be easier to implement custom tags in plugins and/or add "routing" to the influxdb output module to make database decisions based on series and/or tag?

@oldmantaiter
Contributor

Update: I took a look into this, and what I have so far (I can submit a PR if this is OK) is a config option like the following:

[[outputs.influxdb]]
...
..
.
    [outputs.influxdb.overrides.<TAG/MEASUREMENT>]
        database = <OTHERDB>

@sparrc
Contributor

sparrc commented Nov 28, 2015

@oldmantaiter @schen59 I think this is a good idea, but I'd like to have a generic way of solving this that wouldn't be specific to InfluxDB.

I was thinking that we could solve this by making the pass/drop/tagpass/tagdrop plugin parameters available for outputs as well. That way, you could have a configuration like this:

[[outputs.influxdb]]
database = "A"
[outputs.influxdb.tagpass]
mytag = ["A"]

[[outputs.influxdb]]
database = "B"
[outputs.influxdb.tagpass]
mytag = ["B"]

The reason I like this is because it's generic and can be used by all outputs, and it also makes output and plugin configurations have consistent arguments.

Let me know what you guys think of that, would it solve the metric routing problem for you?
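
As a rough illustration of this proposal (a Python sketch, not Telegraf's actual Go implementation), per-output tagpass filtering could route a metric as shown below. The helper names `tagpass_matches` and `route` are hypothetical, and glob matching via `fnmatchcase` is an assumption about how tag values are compared.

```python
from fnmatch import fnmatchcase


def tagpass_matches(tags, tagpass):
    """Return True if any tag named in tagpass has a value matching one of its patterns."""
    for key, patterns in tagpass.items():
        value = tags.get(key)
        if value is not None and any(fnmatchcase(value, p) for p in patterns):
            return True
    return False


def route(metric_tags, outputs):
    """Return the database of every output whose tagpass accepts the metric."""
    return [o["database"] for o in outputs if tagpass_matches(metric_tags, o["tagpass"])]


# Mirrors the two [[outputs.influxdb]] sections in the config above.
outputs = [
    {"database": "A", "tagpass": {"mytag": ["A"]}},
    {"database": "B", "tagpass": {"mytag": ["B"]}},
]
```

With this setup, a metric tagged `mytag = "A"` is accepted only by the first output, and a metric without `mytag` is accepted by neither.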

@oldmantaiter
Contributor

@sparrc Sounds good, I forgot about multiplexing outputs of the same output plugin type. Did you want one of us to tackle adding the params?

@schen59
Author

schen59 commented Nov 28, 2015

This looks good. Thanks for taking the time to look into it. However, instead of hard-coding the database in the config file, can we determine it based on tags/metrics at run time? For example, I could just use the tag/metric as the name of the database to output to at run time.


@sparrc
Contributor

sparrc commented Nov 28, 2015

@schen59 I see what you mean. The Kafka output already has something similar called routing_tag, which uses that tag's value as the Kafka routing key.

We could add an argument to the influxdb output called something like database_routing_tag that could route the metric to the database given by the tag specified.

@schen59 But what should the output do if the database doesn't exist? This might be tricky because we don't want to issue a CREATE DATABASE for every single write.

I'll make a separate issue for adding drop/pass/tagdrop/tagpass to outputs in general. @oldmantaiter if you had time to submit a PR for either of those it would be much appreciated 👍

@sparrc changed the title from "influxdb output dynamically dispatch metrics to different database" to "Make a database_routing_tag argument for InfluxDB output" on Nov 28, 2015
@schen59
Author

schen59 commented Nov 29, 2015

@sparrc In my case, it is OK to discard metrics if the database they route to doesn't exist. For as long as that database does not exist, the output keeps discarding them. To keep those metrics, I can manually create the database beforehand.
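
The behavior discussed here can be sketched roughly as follows, in Python rather than Telegraf's Go. The function name `group_by_database`, the metric dictionary layout, and the idea of passing in a set of known databases are all assumptions made for illustration; falling back to the configured database when the tag is absent is also an assumption.

```python
from collections import defaultdict


def group_by_database(metrics, routing_tag, default_db, existing_dbs):
    """Split metrics into per-database batches using the value of routing_tag.

    Metrics without the tag go to default_db (an assumption). Metrics whose
    target database is not in existing_dbs are discarded, not created.
    """
    batches = defaultdict(list)
    dropped = []
    for metric in metrics:
        database = metric["tags"].get(routing_tag, default_db)
        if database in existing_dbs:
            batches[database].append(metric)
        else:
            dropped.append(metric)
    return dict(batches), dropped


metrics = [
    {"name": "cpu", "tags": {"org": "A"}},
    {"name": "mem", "tags": {"org": "B"}},
    {"name": "disk", "tags": {"org": "C"}},  # database "C" has not been created
    {"name": "net", "tags": {}},             # no routing tag present
]
batches, dropped = group_by_database(metrics, "org", "telegraf", {"A", "B", "telegraf"})
```

Here the `disk` metric lands in `dropped` until someone creates database "C", which matches the "discard until the database exists" behavior described above.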

oldmantaiter added a commit to oldmantaiter/telegraf that referenced this issue Nov 29, 2015
Allows user to configure a tag that, if it exists, will be used as
the database name for the metric.

X-Github-Closes influxdata#397
@sparrc
Contributor

sparrc commented Nov 30, 2015

Now that I see this in practice (PR #400) I'm not sure I understand the utility of it, especially since using this feature would add a significant amount of processing.

Tags exist to separate metrics within a database, why would you want to use tags to separate metrics into separate DBs? Is this a feature that many in the community would use?

IMHO it would encourage bad practices using InfluxDB, and seems like a bit of a niche use-case

cc @oldmantaiter @schen59

@schen59
Author

schen59 commented Nov 30, 2015

In my case, I have several organizations, each of which has applications running in different Docker containers. I use the docker plugin to collect all the container metrics and use one of the container labels (which indicates the name of the organization) to route the metrics to different databases. This way, each organization has an InfluxDB database for its own metrics, and I can set up authorization in InfluxDB to control access. All the members of an org can then access the metrics in their own database.


@sparrc
Contributor

sparrc commented Nov 30, 2015

@schen59 That makes sense, but it still doesn't explain why they need to be dynamic tags. The dynamic nature is what creates the extra processing within the influxdb output.

@schen59
Author

schen59 commented Nov 30, 2015

@sparrc In my case, the number of organizations keeps growing, so I cannot maintain a static list of orgs, and I cannot rely on a static config that lists the settings for each org.

@sparrc
Contributor

sparrc commented Nov 30, 2015

But you need to create the databases anyways, don't you? If you don't keep track of the orgs how do you know which databases you need to create?

@schen59
Author

schen59 commented Nov 30, 2015

Ya. That's right. I will have another service to create the influxdb databases beforehand.

@oldmantaiter
Contributor

@sparrc @schen59 This seems like it could be used with #398 and "dynamically" setting up options in the config with tagpass (and a drop on the general telegraf DB for dock_ or whatever the leading metric is for docker). I guess it wouldn't hurt to get #69 in the mix so a reload of the config presents new options without dropping any metrics.

I could also throw in a bunch of disclaimers and warning messages into #400 when database_routing_tag is set, letting people know that it is not going to result in predictable performance especially on high-cardinality tags used to select the database.

@sparrc
Contributor

sparrc commented Feb 3, 2016

I believe this functionality is now taken care of with tagpass/tagdrop parameters available to all outputs. Feel free to re-open if not.

@sparrc sparrc closed this as completed Feb 3, 2016
@schen59
Author

schen59 commented Feb 3, 2016

Thanks for the update


@ghost

ghost commented Sep 26, 2016

We have more than one InfluxDB instance running and would like to route data points based on database name instead of using filters. The filter approach requires more manual updates over time than the database name approach. Can we reopen this issue? I believe this is a useful feature.

@sparrc sparrc reopened this Sep 27, 2016
@danielnelson danielnelson added the feature request Requests for new plugin and for new features to existing plugins label Aug 19, 2017
@azmeuk

azmeuk commented Apr 24, 2018

Hi. It has been almost two years since the last news on this thread.
Has something been planned?

@russorat
Contributor

@azmeuk this has not been scheduled. Could you describe your use case for this so we can better understand how this would be used?

@sfitts

sfitts commented Oct 20, 2018

@russorat We have exactly the use case that @schen59 described in some detail. Currently our servers send their data directly to Influx so we can control which database the data goes to (so when we send data for organization "A" it goes to database "A"). We'd like to switch to using telegraf as an intermediary (for a variety of reasons), but can't due to this limitation.

There was an assertion that the tagpass/tagdrop on outputs solved the problem. However, using this mechanism means that every time a new organization is added to our platform (aka every time we add a new customer) the configuration of all telegraf instances would have to change (since you have to name the tag value and database, in this case the organization name, in the configuration). Unless I'm missing something.

Re-configuring the system's supporting software across a clustered environment isn't really something we'd want to do when on-boarding a new customer. If we could instead provide telegraf with some way to know, from the data, which database to use, that would allow us to provide a single configuration that would not have to change with every addition or removal of an organization.

Anyway, just figured I'd chime in that I'm not sure the use case has actually been addressed.

@russorat
Contributor

@sfitts thank you for the background! I will chat with the team to see what the best way to accomplish what you are trying to do is.

@russorat russorat added this to the 1.10.0 milestone Nov 6, 2018
@danielnelson
Contributor

We can add new options, similar to those discussed before, to look up tag values:

[[outputs.influxdb]]
  database_tag = "foo"
  retention_policy_tag = "bar"

[[outputs.influxdb_v2]]
  bucket_tag = "foo"

Keep in mind that sparrc is correct in saying that these options will perform worse than static tagpass/tagdrop with multiple outputs, because mixed data arriving at the output, combined with dynamic values, prevents the creation of appropriately sized and balanced batches.
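
To illustrate the batching concern with a toy Python example (not Telegraf code): when one flushed batch contains points for several databases, it has to be split into one write per database, and those writes can be small and uneven. The helper name `writes_per_flush` and the `org` tag are hypothetical.

```python
from collections import Counter


def writes_per_flush(batch, database_tag):
    """Count how many points a single flushed batch would send to each database."""
    return Counter(point["tags"][database_tag] for point in batch)


# One 10-point batch whose points carry three different tag values.
batch = [{"tags": {"org": org}} for org in ["a"] * 6 + ["b"] * 3 + ["c"]]
sizes = writes_per_flush(batch, "org")
```

In this example the single 10-point batch turns into three separate writes of 6, 3, and 1 points, whereas a static output with tagpass would accumulate a full batch for its one database.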

@sfitts

sfitts commented Dec 4, 2018

Sorry for the delayed response, been heads down on a release. Anyway, I understand the concerns wrt performance, but as it stands now this omission makes it impossible for us to use telegraf at all as an intermediary.

Perhaps it would be possible to do some batching by flushing the batch for a given database based on a combination of size and time. If incoming data is evenly distributed across the currently active databases (which for us it is), then it shouldn't be too bad. It doesn't seem like you'd have to disable batching entirely, though of course I don't know the code base.
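
A minimal Python sketch of that idea, assuming one buffer per database that is flushed when it reaches a size limit or when its oldest point has waited too long. This is not how Telegraf is implemented; the class name `PerDatabaseBuffer`, its parameters, and the `write` callback are hypothetical.

```python
import time


class PerDatabaseBuffer:
    """Buffers points per database and flushes on size or age (illustrative only)."""

    def __init__(self, max_points, max_age_seconds, write, clock=time.monotonic):
        self.max_points = max_points
        self.max_age = max_age_seconds
        self.write = write       # callable(database, points), supplied by the caller
        self.clock = clock       # injectable so the timing can be tested
        self.points = {}         # database -> buffered points
        self.first_seen = {}     # database -> arrival time of the oldest buffered point

    def add(self, database, point):
        if database not in self.points:
            self.points[database] = []
            self.first_seen[database] = self.clock()
        self.points[database].append(point)
        if len(self.points[database]) >= self.max_points:
            self._flush(database)

    def tick(self):
        """Flush every database whose oldest point has waited at least max_age."""
        now = self.clock()
        expired = [d for d, t in self.first_seen.items() if now - t >= self.max_age]
        for database in expired:
            self._flush(database)

    def _flush(self, database):
        self.write(database, self.points.pop(database))
        self.first_seen.pop(database)
```

A caller would invoke `add` for each incoming point and `tick` periodically, so busy databases flush full batches while quiet ones are still written within `max_age_seconds`.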

Either way I appreciate you taking a look at this again.

@danielnelson danielnelson self-assigned this Feb 25, 2019
@danielnelson
Contributor

Closed in #5490.


7 participants