Make a database_routing_tag argument for InfluxDB output #397
Was just thinking about this as well, as a way to let certain plugins override the database they write to. @sparrc, would that be a good idea to pursue? Or have you had thoughts on this in the past that you'd like to see done instead?

EDIT: Just remembered that InfluxDB is not the only output. Based on the initial request above, and since I would like to send specific series to specific databases, would it be easier to implement custom tags in plugins and/or add "routing" to the influxdb output module so it makes database decisions based on series and/or tags?
Update: I took a look into this, and what I have so far (I can submit a PR if this is OK) is a config option like the following:
@oldmantaiter @schen59 I think this is a good idea, but I'd like to have a generic way of solving this that wouldn't be specific to InfluxDB. I was thinking that we could solve this by making the pass/drop/tagpass/tagdrop plugin parameters available for outputs as well. That way, you could have a configuration like this:
I like this because it's generic and can be used by all outputs, and it also gives outputs and plugins consistent configuration arguments. Let me know what you think: would it solve the metric routing problem for you?
@sparrc Sounds good, I forgot about multiplexing outputs of the same output plugin type. Do you want one of us to tackle adding the params?
This looks good. Thanks for taking the time to look into it. However, instead of hard-coding the database in the config file, can we determine it based on tags/metrics at run time? For example, I could use a tag value as the name of the database to write to at run time.
@schen59 I see what you mean; the Kafka output already has something similar called … We could add an argument to the influxdb output called something like …

@schen59, but what should the output do if the database doesn't exist? This might be tricky because we don't want to issue a …

I'll make a separate issue for adding drop/pass/tagdrop/tagpass to outputs in general. @oldmantaiter, if you have time to submit a PR for either of those, it would be much appreciated 👍
@sparrc In my case, it is OK to discard the metrics if the database they route to doesn't exist. As long as that database does not exist, its metrics keep being discarded; if I want to keep them, I can create the database manually beforehand.
Allows user to configure a tag that, if it exists, will be used as the database name for the metric. X-Github-Closes influxdata#397
Now that I see this in practice (PR #400), I'm not sure I understand the utility of it, especially since using this feature would add a significant amount of processing. Tags exist to separate metrics within a database, so why would you want to use tags to separate metrics into separate DBs? Is this a feature that many in the community would use? IMHO it would encourage bad practices with InfluxDB, and it seems like a bit of a niche use case.
In my case, I have several organizations, each of which has applications running in different Docker containers. I use the docker plugin to collect all the container metrics and use one of the container labels (which indicates the name of the organization) to route the metrics to different databases. This way, each organization has an InfluxDB database for its own metrics, and I can set up authorization on each database to control access, so all the members of an org can access the metrics in their own database.
@schen59 that makes sense, but it still doesn't explain why they need to be dynamic tags; the dynamic nature is what creates the extra processing within the influxdb output.
@sparrc In my case, the number of organizations keeps growing, so I cannot keep a static list of orgs or rely on the static config to list a configuration for every org.
But you need to create the databases anyway, don't you? If you don't keep track of the orgs, how do you know which databases you need to create?
Yes, that's right. I will have another service create the InfluxDB databases beforehand.
@sparrc @schen59 This seems like it could be combined with #398, "dynamically" setting up options in the config with tagpass (and a drop on the general telegraf DB for …).

I could also add disclaimers and warning messages to #400 for when database_routing_tag is set, letting people know that it will not result in predictable performance, especially with high-cardinality tags used to select the database.
I believe this functionality is now taken care of with tagpass/tagdrop parameters available to all outputs. Feel free to re-open if not.
Thanks for the update.
We have more than one InfluxDB instance running and would like to route data points based on the database name instead of using filters. The filter approach requires constant manual updates compared with the database-name approach. Can we reopen this issue? I believe this is a useful feature.
Hi. It has been almost two years since the last news in this thread.
@azmeuk this has not been scheduled. Could you describe your use case for this so we can better understand how this would be used?
@russorat We have exactly the use case that @schen59 described in some detail. Currently our servers send their data directly to InfluxDB, so we can control which database the data goes to (when we send data for organization "A", it goes to database "A"). We'd like to switch to using Telegraf as an intermediary (for a variety of reasons), but can't due to this limitation.

There was an assertion that tagpass/tagdrop on outputs solved the problem. However, with that mechanism, every time a new organization is added to our platform (that is, every time we add a new customer), the configuration of every Telegraf instance would have to change, since the configuration must name both the tag value and the database (in this case, the organization name). Unless I'm missing something. Reconfiguring the system's supporting software across a clustered environment isn't something we'd want to do when onboarding a new customer.

If we could instead give Telegraf some way to know, from the data, which database to use, we could provide a single configuration that would not have to change with every addition or removal of an organization. Anyway, I figured I'd chime in that I'm not sure the use case has actually been addressed.
@sfitts thank you for the background! I will chat with the team to see what the best way to accomplish what you are trying to do is.
We can add new options, similar to what was discussed before, to look up tag values:

    [[outputs.influxdb]]
      database_tag = "foo"
      retention_policy_tag = "bar"

    [[outputs.influxdb_v2]]
      bucket_tag = "foo"

Keep in mind that sparrc is correct in saying that these options will perform worse than static tagpass/tagdrop with multiple outputs, because sending mixed data to one output with dynamic values prevents the creation of appropriately sized and balanced batches.
Sorry for the delayed response; I've been heads down on a release. I understand the concerns with respect to performance, but as it stands this omission makes it impossible for us to use Telegraf as an intermediary at all. Perhaps it would be possible to keep some batching by flushing the batch for a given database based on a combination of size and time. If incoming data is evenly distributed across the currently active databases (which for us it is), it shouldn't be too bad; it doesn't seem like you'd have to disable batching entirely, though of course I don't know the code base. Either way, I appreciate you taking another look at this.
Closed in #5490. |
Right now Telegraf only supports writing metrics to a single, pre-defined InfluxDB database. Can we add support for dispatching different metrics to different InfluxDB databases dynamically? For example, based on tags, I would like metrics with tag 'A' to be written to InfluxDB database 'A' and metrics with tag 'B' to be written to InfluxDB database 'B'.