
Allow mutations on data #36

Closed
C-Duv opened this issue Jun 7, 2018 · 12 comments

Comments

@C-Duv

C-Duv commented Jun 7, 2018

TL;DR: Create new fields from other fields (or replace the values of existing ones) via regex or other built-in functions, just like logstash's mutate plugin.

Context: I've just discovered this tool (and mtail) after trying to do tail+parse+count data processing in an existing PHP application (disclaimer: it failed).

grok_exporter seems great but there is one feature I would miss: data mutations.
The ability to alter fields before exporting to Prometheus (just like logstash's mutate plugin) would be awesome.

In my use case I am reading Apache access.log file and I want to export HTTP requests count with the following dimensions/labels:

  • Straightforward (simple grok field to Prometheus label):
    • Status code
    • HTTP verb
    • Response size
  • Add fixed labels: already possible since "How do I add custom labels?" #8
  • Field computation required:
    • Only keep the base URL (scheme+FQDN) from the referrer (e.g. http://example.com/foo.asp?id=42 => http://example.com): some regex would do (or I could adapt the line-matching regex)
    • Extract query string part from the referrer and add some labels from it. For example, from http://example.com/foo.asp?id=42&source=github&foo=bar I want the following fields (thus labels): id and foo. I get that dropping labels is already something grok_exporter can do, so having a mutation that creates a label for every found query parameter is fine.
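A minimal Go sketch of the two referrer computations above, done with the standard net/url package rather than a regex, so that any number of query parameters is handled. The helper name refererLabels and the base_url key are hypothetical; this only illustrates the transformation and is not an existing grok_exporter function.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
)

// refererLabels is a hypothetical helper (not part of grok_exporter) that
// splits a referrer into its base URL (scheme + host) plus one entry per
// query parameter.
func refererLabels(referer string) (map[string]string, error) {
	u, err := url.Parse(referer)
	if err != nil {
		return nil, err
	}
	labels := map[string]string{
		"base_url": u.Scheme + "://" + u.Host,
	}
	// u.Query() parses the raw query string and decodes each value.
	for key, values := range u.Query() {
		// Keep only the first value when a parameter is repeated.
		labels[key] = values[0]
	}
	return labels, nil
}

func main() {
	labels, err := refererLabels("http://example.com/foo.asp?id=42&source=github&foo=bar")
	if err != nil {
		panic(err)
	}
	// Print in a stable order, since Go map iteration order is random.
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, labels[k])
	}
}
```

Unwanted parameters (such as source) could then be dropped when choosing which labels to export.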

Other use cases (not mine):

  • Compute hash of string.
  • Anonymize values (e.g. for the auth HTTP field, replace any value other than - with connected_user and the - value with guest).
  • Replace raw values with meaningful values (e.g. 404 => Not Found).
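A minimal Go text/template sketch of the first two use cases above: anonymizing the auth field needs only a built-in conditional, while hashing needs a custom template function. The sha256 function name is a hypothetical registration for illustration, not a grok_exporter built-in.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
	"text/template"
)

// funcs registers a hypothetical "sha256" template function (not a
// grok_exporter built-in) returning the hex-encoded SHA-256 of a string.
var funcs = template.FuncMap{
	"sha256": func(s string) string {
		sum := sha256.Sum256([]byte(s))
		return hex.EncodeToString(sum[:])
	},
}

// render executes a label template against one parsed log line.
func render(text string, data map[string]string) (string, error) {
	t, err := template.New("label").Funcs(funcs).Parse(text)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, data); err != nil {
		return "", err
	}
	return b.String(), nil
}

// userType anonymizes the Apache auth field using only a built-in conditional.
const userType = `{{if eq .auth "-"}}guest{{else}}connected_user{{end}}`

// userHash hashes the same field with the custom function.
const userHash = `{{sha256 .auth}}`

func main() {
	for _, auth := range []string{"-", "alice"} {
		data := map[string]string{"auth": auth}
		kind, err := render(userType, data)
		if err != nil {
			panic(err)
		}
		hash, err := render(userHash, data)
		if err != nil {
			panic(err)
		}
		fmt.Println(auth, kind, hash)
	}
}
```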
@C-Duv
Author

C-Duv commented Jun 7, 2018

(I broke some Markdown in my issue and my mobile web client does not support the "Edit" button; I'll fix this once I have access to a full-featured client.) Done.

@fstab
Owner

fstab commented Jun 8, 2018

Thanks for the feature request. I think it's a good idea. grok_exporter uses Go templates to define label values. The Go template language is extensible: Go provides some predefined functions, but grok_exporter could also provide some custom functions. It should be straightforward to implement some functions from the logstash mutate filter (like gsub) as custom functions for label templates.
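A minimal sketch of the extensibility described above, using Go's text/template: a custom function is registered with Funcs before the template is parsed and can then be called from a label template. The statusText name is hypothetical (not a grok_exporter function); it wraps the standard net/http.StatusText to cover the "404 => Not Found" use case from the issue.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"text/template"
)

// statusText is a hypothetical custom template function (not part of
// grok_exporter): it turns a numeric HTTP status such as "404" into its
// standard reason phrase using net/http.StatusText.
func statusText(code string) (string, error) {
	n, err := strconv.Atoi(code)
	if err != nil {
		return "", err
	}
	return http.StatusText(n), nil
}

// renderLabel parses a label template with the custom function registered.
// Funcs must be called before Parse so the parser knows the name.
func renderLabel(text string, data map[string]string) (string, error) {
	t, err := template.New("label").
		Funcs(template.FuncMap{"statusText": statusText}).
		Parse(text)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, data); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := renderLabel(`{{statusText .status}}`, map[string]string{"status": "404"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // Not Found
}
```

Returning (string, error) lets a bad input (e.g. a non-numeric status) surface as a template execution error instead of an empty label.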

I will look into it.

Apart from that: The cases you describe (only keep base url, extract query string labels) can also be achieved with a grok pattern.

@C-Duv
Copy link
Author

C-Duv commented Jun 8, 2018

Thanks for the positive reply, happy to hear that you find the idea relevant :)

I knew adapting the line-matching regex could help for some mutations, but for the query string problem I cannot see how it would work for matching all query parameters (and naming them accordingly).
The best I can think of is matching a preset number of query parameters.

Bonus: decoding URL-encoded query string values cannot be done with regex matching ;)
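A small Go sketch of the point above: a regex (like a grok pattern) can capture the raw, still-encoded value, but turning it into readable text needs a decoding step such as net/url.QueryUnescape. The sourceParam regex and the decodedSource helper are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// sourceParam captures the raw value of a "source" query parameter, the way
// a line-matching regex would. The name is illustrative only.
var sourceParam = regexp.MustCompile(`[?&]source=([^&]*)`)

// decodedSource returns both the raw capture and its URL-decoded form.
func decodedSource(referer string) (raw, decoded string, err error) {
	m := sourceParam.FindStringSubmatch(referer)
	if m == nil {
		return "", "", fmt.Errorf("no source parameter in %q", referer)
	}
	raw = m[1]
	// QueryUnescape turns %XX escapes into bytes and '+' into a space.
	decoded, err = url.QueryUnescape(raw)
	return raw, decoded, err
}

func main() {
	raw, decoded, err := decodedSource("http://example.com/foo.asp?id=42&source=caf%C3%A9+au+lait")
	if err != nil {
		panic(err)
	}
	fmt.Println(raw)     // caf%C3%A9+au+lait
	fmt.Println(decoded) // café au lait
}
```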

Disclosure: I'll pair logstash with grok_exporter for now (logstash will ingest the file, mutate it and feed it to grok_exporter), but as soon as I can drop logstash and perform my mutations directly in grok_exporter I will :)

Other disclosure: I opened a similar feature request on mtail's bugtracker ;)

@fstab
Owner

fstab commented Jun 13, 2018

I implemented an experimental gsub function for label templates. It's not in a release yet and it's not documented, but if you compile grok_exporter from source you can try it.

Example: If .url contains your example URL from above, http://example.com/foo.asp?id=42&source=github&foo=bar, and you want to use just the value of id (42) as a label, it should work as follows:

labels:
    id: '{{gsub .url ".*id=([^&]*).*" "\\1"}}'

The syntax is {{gsub input pattern replacement}} (see golang's text/template for general info on templates).

The pattern and replacement syntax is similar to Elastic's mutate filter's gsub (derived from Ruby's String.gsub()), except that you need to double-escape backslashes (\\ instead of \).

In the example, \\1 refers to the first capture group, which is ([^&]*) in the pattern. It captures 42, and since the pattern matches the entire URL, the whole value is replaced with 42.
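The following is an illustrative Go re-implementation of a gsub-like template function, not grok_exporter's actual source. It also shows why backslashes are doubled: a YAML single-quoted string passes "\\1" through unchanged, and Go's template parser then reads that as a Go string literal, so the function receives the two characters \1. Since Go's regexp uses ${1} for references, the sketch translates \N into ${N} (an assumption about how such a function could bridge the two syntaxes).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"text/template"
)

// backref finds Ruby/Logstash-style references such as \1 in a replacement.
var backref = regexp.MustCompile(`\\(\d+)`)

// gsub replaces every match of pattern in input, after translating \N
// references into Go's ${N} syntax. Illustrative sketch only.
func gsub(input, pattern, replacement string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	// Escape literal '$' first, because Go's regexp treats it as special.
	repl := strings.ReplaceAll(replacement, "$", "$$")
	repl = backref.ReplaceAllStringFunc(repl, func(ref string) string {
		return "${" + ref[1:] + "}" // "\1" -> "${1}"
	})
	return re.ReplaceAllString(input, repl), nil
}

// render parses a label template with gsub registered and executes it.
func render(text string, data map[string]string) (string, error) {
	t, err := template.New("label").Funcs(template.FuncMap{"gsub": gsub}).Parse(text)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, data); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	// Same template text as inside the YAML quotes above.
	const idLabel = `{{gsub .url ".*id=([^&]*).*" "\\1"}}`
	out, err := render(idLabel, map[string]string{
		"url": "http://example.com/foo.asp?id=42&source=github&foo=bar",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // 42
}
```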

Let me know if this is helpful.

fstab added a commit that referenced this issue Jun 19, 2018
@andreyev

andreyev commented Oct 5, 2018

Great! This is very helpful to us. Do you plan to release this?

Thanks in advance!

@fstab
Owner

fstab commented Oct 8, 2018

Done. Released in v0.2.6. The documentation still needs to be updated, though.

fstab closed this as completed Oct 8, 2018
@fstab
Owner

fstab commented Oct 10, 2018

Documentation updated.

@eskornev
Copy link

Can you please explain how can I transform text strings than contains "True" and "False" to boolean metrics with "gsub"?

@fstab
Copy link
Owner

fstab commented Oct 20, 2018

You should use conditionals for that instead of gsub. Example:

grok:
    additional_patterns:
    - 'BOOLEAN True|False'
metrics:
    - type: gauge
      name: boolean_test
      help: boolean test
      match: '%{BOOLEAN:bool}'
      value: '{{if eq .bool "True"}}1{{else}}0{{end}}'
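A small Go sketch that runs the same value template through text/template to show what each captured string renders to. The boolValue helper is hypothetical; note that eq is case-sensitive, so anything other than exactly "True" (including "true") renders as 0.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// valueTemplate is the metric value template from the config above.
var valueTemplate = template.Must(template.New("value").
	Parse(`{{if eq .bool "True"}}1{{else}}0{{end}}`))

// boolValue renders the value for one captured BOOLEAN field.
func boolValue(captured string) (string, error) {
	var b strings.Builder
	err := valueTemplate.Execute(&b, map[string]string{"bool": captured})
	return b.String(), err
}

func main() {
	for _, s := range []string{"True", "False"} {
		v, err := boolValue(s)
		if err != nil {
			panic(err)
		}
		fmt.Println(s, "->", v)
	}
}
```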

@eskornev
Copy link

Thanks a lot!

@mad-ady
Copy link

mad-ady commented Oct 4, 2021

@fstab please add the following as an example to the documentation where you feel it's appropriate. I'm using it to convert units of measurement to base units (e.g. GB to bytes):
{{ if eq .unit "G" }}1000000000{{ else if eq .unit "M" }}1000000{{else if eq .unit "K"}}1000{{else}}{{.heap}}{{end}}
The data looks like "heap.memory.used=8.9G," and I'm breaking it up with this regex:

    -   help: Process heap memory used
        labels:
            host: myhost
            logfile: '{{.logfile}}'
        match: 'heap.memory.used=(?<heap>[0-9\.]+)(?<unit>[GMK]),'
        name: heap_memory_used
        type: gauge
        value: '{{ if eq .unit "G" }}{{ multiply .heap 1000000000 }}{{ else if eq .unit "M" }}{{ multiply .heap 1000000 }}{{else if eq .unit "K"}}{{ multiply .heap 1000}}{{else}}{{.heap}}{{end}}'
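A Go sketch of the unit conversion above, end to end: match the line, collect the named groups, and render the value template. Two assumptions: the named groups are rewritten with Go's (?P<name>...) syntax, and multiply is a local stand-in (it parses the captured text as a float and scales it) rather than grok_exporter's own implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
	"text/template"
)

// heapRe mirrors the match pattern above using Go's (?P<name>...) syntax.
var heapRe = regexp.MustCompile(`heap\.memory\.used=(?P<heap>[0-9.]+)(?P<unit>[GMK]),`)

// multiply is a stand-in for the template function used in the config.
func multiply(value string, factor float64) (string, error) {
	f, err := strconv.ParseFloat(value, 64)
	if err != nil {
		return "", err
	}
	return strconv.FormatFloat(f*factor, 'f', -1, 64), nil
}

// valueTmpl is the value template from the config, without Ansible escaping.
var valueTmpl = template.Must(template.New("value").
	Funcs(template.FuncMap{"multiply": multiply}).
	Parse(`{{ if eq .unit "G" }}{{ multiply .heap 1000000000 }}{{ else if eq .unit "M" }}{{ multiply .heap 1000000 }}{{else if eq .unit "K"}}{{ multiply .heap 1000}}{{else}}{{.heap}}{{end}}`))

// heapBytes extracts the named groups from one log line and renders the value.
func heapBytes(line string) (string, error) {
	m := heapRe.FindStringSubmatch(line)
	if m == nil {
		return "", fmt.Errorf("no match in %q", line)
	}
	data := map[string]string{}
	for i, name := range heapRe.SubexpNames() {
		if name != "" {
			data[name] = m[i]
		}
	}
	var b strings.Builder
	if err := valueTmpl.Execute(&b, data); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	v, err := heapBytes("heap.memory.used=8.9G,")
	if err != nil {
		panic(err)
	}
	fmt.Println(v)
}
```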

The tricky part is that I'm maintaining the config files from Ansible, and I need to escape every '{{' and '}}', which looks awful... Any suggestions for using a different delimiter? :)
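At the Go level, text/template can switch its action delimiters with Template.Delims, after which a literal "{{" is plain text. The thread does not say whether grok_exporter exposes such an option in its config, so this sketch only shows the underlying Go mechanism; the [[ ]] pair and the renderWithDelims helper are arbitrary, hypothetical choices.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderWithDelims parses text with custom action delimiters. With "[[" and
// "]]", "{{" and "}}" are no longer special to the template engine.
func renderWithDelims(text, left, right string, data map[string]string) (string, error) {
	t, err := template.New("value").Delims(left, right).Parse(text)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, data); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := renderWithDelims(`[[ if eq .unit "K" ]]1000[[ else ]][[ .heap ]][[ end ]]`, "[[", "]]",
		map[string]string{"unit": "K", "heap": "8.9"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // 1000
}
```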

@HarinathReddyA
Copy link

@fstab I was able to get a constant value from my logfile and set it to gauge metric , but I want it to be updated every time with latest value.
