Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multiline_grok doesn't capture unmatched multi-line events in grok_failure output #25

Closed
jitran opened this issue Jan 11, 2017 · 9 comments

Comments

@jitran
Copy link

jitran commented Jan 11, 2017

Hi okimoto,

We've been comparing fluentd with logstash for some time now, and both work well. We like logstash's grok parsing and error handling abilities, hence we're trying your plugin. We have a requirement to capture all logs even if it fails parsing.

Apache Test:

<source>
  @type tail
  <parse>
    @type grok
    grok_failure_key grokfailure
    grok_pattern %{COMBINEDAPACHELOG}
    time_format "%d/%b/%Y:%H:%M:%S %z"
  </parse>
  ...
</source>

Input 1 (expected to work):
127.0.0.1 - - [19/Dec/2016:11:30:43 +1100] "GET / HTTP/1.0" 403 3985 "-" "ApacheBench/2.3"

Produces:
{"clientip":"127.0.0.1","ident":"-","auth":"-","timestamp":"19/Dec/2016:11:52:29 +1100","verb":"GET","request":"/","httpversion":"1.0","response":"403","bytes":"3985","referrer":""-"","agent":""ApacheBench/2.3"","server":"tranj-fluentd-s3b","stack":"my-app-dev-01","application":"my-app","log_type":"filter.test","time":"2017-01-10T20:56:49Z"}

Input 2 (expected to fail):
127.0.0.1 - - NOTIME "GET / HTTP/1.0" 403 3985 "-" "ApacheBench/2.3"

Produces:
{"message":"127.0.0.1 - - NOTIME "GET / HTTP/1.0" 403 3985 "-" "ApacheBench/2.3"","grokfailure":"No grok pattern matched","server":"tranj-fluentd-s3b","stack":"my-app-dev-01","application":"my-app","log_type":"filter.test","time":"2017-01-10T21:05:42Z"}

Exactly what we wanted.

Tomcat Test:

patterns file (/tmp/patterns):

MY_CATALINA_LOG (?<timestamp>%{MONTH} %{MONTHDAY}, 20%{YEAR} %{HOUR}:?%{MINUTE}(?::?%{SECOND}) (?:AM|PM)) %{JAVACLASS:class} %{WORD:method}%{GREEDYDATA:tomcatmsg}
MY_TOMCAT_LOG (?<timestamp>20%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:?%{MINUTE}(?::?%{SECOND})) %{LOGLEVEL:level}\s+%{USERNAME:class}\s+%{GREEDYDATA:tomcatmsg}
MY_TOMCAT %{MY_CATALINA_LOG}|%{MY_TOMCAT_LOG}
<source>
  @type tail
  <parse>
    @type multiline_grok
    grok_failure_key grokfailure
    custom_pattern_path /tmp/patterns
    grok_pattern %{MY_TOMCAT}
    multiline_start_regexp /^(\w+\s\d+,\s\d+)|(\d+-\d+-\d+\s)/
  </parse>
  ...
</source>

Input 1 (expected to work):

Dec 16, 2016 9:29:56 AM org.apache.jasper.servlet.TldScanner scanJars
INFO: At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time.
09:29:56.861 [localhost-startStop-1] WARN com.something.common.ServiceContext - Found application_config, profile is: test

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/

Produces:
{"timestamp":"Dec 16, 2016 9:29:56 AM","class":"org.apache.jasper.servlet.TldScanner","method":"scanJars","tomcatmsg":"\nINFO: At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time.\n09:29:56.861 [localhost-startStop-1] WARN com.something.common.ServiceContext - Found application_config, profile is: test\n\n . ____ _ __ _ \n /\\ / ' __ _ () __ __ _ \ \ \ \\n( ( )\_ | '_ | '| | ' \/ ` | \ \ \ \\n \\/ )| |)| | | | | || (| | ) ) ) )\n ' || .__|| ||| |\, | / / / /\n =========||==============|/=///_/","server":"tranj-fluentd-s3b","stack":"my-app-pdev-01","application":"my-app","log_type":"filter.test","time":"2017-01-11T01:14:20Z"}

Input 2:
127.0.0.1 - - NOTIME "GET / HTTP/1.0" 403 3985 "-" "ApacheBench/2.3"

Produces no output, but in the td-agent.log:
2017-01-11 12:14:51 +1100 [warn]: plugin/in_tail.rb:390:block in parse_multilines: got incomplete line before first line from /var/log/filter_test.log: "127.0.0.1 - - NOTIME "GET / HTTP/1.0" 403 3985 "-" "ApacheBench/2.3"\n"

With Logstash's grok filter, it would capture the unmatched lines and store it in the output with grokparsefailure tag, but this plugin drops the invalid log lines.

Is it possible to make the plugin capture unmatched multiline log events?

jitran pushed a commit to jitran/fluentd that referenced this issue Jan 11, 2017
…pattern, we should still store the unmatched lines in the line buffer so that plugins such as fluent-plugin-grok-parser can report the log event as a grok parse error, see: fluent/fluent-plugin-grok-parser#25
@jitran
Copy link
Author

jitran commented Jan 11, 2017

Here's an attempt to fix the potential issue in the core code:
jitran/fluentd@019fcb1

input:

abcd efgh
asdfadsasda

output:

{"message":"abcd efgh\nasdfadsasda","grokfailure":"No grok pattern matched","server":"tranj-fluentd-s3b","stack":"my-app-pdev-01","application":"my-app","log_type":"filter.test","time":"2017-01-11T04:19:56Z"}

@okkez
Copy link
Collaborator

okkez commented Jan 11, 2017

127.0.0.1 - - NOTIME "GET / HTTP/1.0" 403 3985 "-" "ApacheBench/2.3"

does not match multiline_start_regexp /^(\w+\s\d+,\s\d+)|(\d+-\d+-\d+\s)/.
The plugin does not handle multiline input2 as multiline log.

@jitran
Copy link
Author

jitran commented Jan 13, 2017

hi @okkez, that line wasn't meant to match, and I was hoping the plugin will capture that as a grok parse failure event. This is currently how logstash's implementation behaves. Can this plugin reflect the original behaviour?

jitran added a commit to jitran/fluentd that referenced this issue Jan 15, 2017
…pattern, we should still store the unmatched lines in the line buffer so that plugins such as fluent-plugin-grok-parser can report the log event as a grok parse error, see: fluent/fluent-plugin-grok-parser#25. To enable this, set enable_multiline_catch_all to true in the of source section
@okkez
Copy link
Collaborator

okkez commented Jan 16, 2017

This plugin cannot process lines that don't match multiline_start_regexp. (Lines processed as multiline)
Because in_tail does not pass those lines to this plugin.

@jitran
Copy link
Author

jitran commented Jan 16, 2017

I have a fix for in_tail that passes those unmatched lines to your plugin, or anything that uses multiline: fluent/fluentd#1416
There's a few odd test cases that fail to run on ruby 23 x64..while the other ruby version work fine.

jitran added a commit to jitran/fluentd that referenced this issue Jan 17, 2017
…pattern, we should still store the unmatched lines in the line buffer so that plugins such as fluent-plugin-grok-parser can report the log event as a grok parse error, see: fluent/fluent-plugin-grok-parser#25. To enable this, set enable_multiline_catch_all to true in the of source section
@okkez
Copy link
Collaborator

okkez commented Jul 3, 2017

fluent/fluentd#1421

@okkez
Copy link
Collaborator

okkez commented Jul 3, 2017

@jitran Can I close this issue?

@jitran
Copy link
Author

jitran commented Jul 3, 2017

Sure, closed now.

@jitran jitran closed this as completed Jul 3, 2017
@okkez
Copy link
Collaborator

okkez commented Jul 3, 2017

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants