Improve additional field type discovery #121

mp911de · 2017-10-27T14:46:26Z

Currently, type discovery for additional fields in discovery mode is expensive for string- and double-typed values, because the outcome of each parse attempt is communicated through exceptions.

A benchmark shows this cost (average time per operation, lower is better):

Benchmark                                   Mode  Cnt     Score    Error  Units
GelfMessageBenchmark.configuredDoubleField  avgt    5    52.522 ±  1.848  ns/op
GelfMessageBenchmark.configuredLongField    avgt    5    36.982 ±  0.614  ns/op
GelfMessageBenchmark.configuredStringField  avgt    5    18.341 ±  0.324  ns/op
GelfMessageBenchmark.discoverDoubleField    avgt    5  1520.149 ± 88.000  ns/op
GelfMessageBenchmark.discoverLongField      avgt    5    25.537 ±  0.131  ns/op
GelfMessageBenchmark.discoverStringField    avgt    5  2714.702 ± 49.806  ns/op
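To illustrate where this cost can come from, here is a minimal sketch of exception-driven discovery. It assumes that discovery tries `Long.parseLong`, then `Double.parseDouble`, and otherwise keeps the raw string; the class and method names are hypothetical and this is not the logstash-gelf implementation. Under that assumption a double value costs one thrown `NumberFormatException` and a string value costs two, which would be consistent with the ordering of the `discover*` rows above.

```java
// Hypothetical sketch of exception-driven type discovery (not the library's actual code).
public final class ExceptionDrivenDiscovery {

    /** Returns the value as a Long, a Double, or the original String. */
    static Object discover(String value) {
        try {
            return Long.parseLong(value);        // throws for "1.5" and "hello"
        } catch (NumberFormatException ignored) {
            // not a long: the exception is the only signal, fall through
        }
        try {
            return Double.parseDouble(value);    // throws for "hello"
        } catch (NumberFormatException ignored) {
            // not a double either
        }
        return value;                            // a string value got here via two exceptions
    }

    public static void main(String[] args) {
        System.out.println(discover("42").getClass().getSimpleName());    // Long
        System.out.println(discover("1.5").getClass().getSimpleName());   // Double
        System.out.println(discover("hello").getClass().getSimpleName()); // String
    }
}
```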


mp911de · 2017-10-27T14:49:32Z

Discovering the most appropriate type with our own character scan, before any parsing is attempted, improves parsing durations considerably. Benchmark results with this approach:

Benchmark                                   Mode  Cnt   Score   Error  Units
GelfMessageBenchmark.configuredDoubleField  avgt    5  53.310 ± 3.263  ns/op
GelfMessageBenchmark.configuredLongField    avgt    5  39.175 ± 1.899  ns/op
GelfMessageBenchmark.configuredStringField  avgt    5  19.136 ± 0.771  ns/op
GelfMessageBenchmark.discoverDoubleField    avgt    5  56.483 ± 4.365  ns/op
GelfMessageBenchmark.discoverLongField      avgt    5  29.863 ± 3.405  ns/op
GelfMessageBenchmark.discoverStringField    avgt    5   6.131 ± 0.481  ns/op

Additional field values in discovery mode are now inspected before the actual parsing to determine whether a value qualifies for long or double parsing. Empty values, values longer than 32 characters, and values containing characters that can only occur in strings fall back directly to string. Values containing only +, - and 0-9 are parsed directly as long, and double values no longer go through a long parsing attempt first. Parsing still falls back through the layers if discovery yields a different result than the parser accepts; this applies especially to double values in hex or scientific notation.
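The scan-first discovery described above can be sketched as follows. This is a hypothetical illustration, not the actual logstash-gelf code: the `ScanningDiscovery` class, the `Candidate` enum, and the exact set of characters treated as numeric (digits, signs, `.`, hex digits `a-f`/`A-F`, which also cover the exponent marker `e`/`E`, plus `x`/`X` and the binary exponent `p`/`P`) are assumptions. The 32-character limit, the long/double split, and the fallback order follow the comment.

```java
// Hypothetical sketch of scan-first type discovery; names are illustrative, not logstash-gelf API.
public final class ScanningDiscovery {

    enum Candidate { LONG, DOUBLE, STRING }

    // From the issue: longer values are treated as strings without parsing.
    static final int MAX_NUMERIC_LENGTH = 32;

    /** Cheap character scan that picks which parser to try first. */
    static Candidate scan(String value) {
        if (value.isEmpty() || value.length() > MAX_NUMERIC_LENGTH) {
            return Candidate.STRING;
        }
        boolean onlyLongChars = true;
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if ((c >= '0' && c <= '9') || c == '+' || c == '-') {
                continue; // valid in both long and double values
            }
            onlyLongChars = false;
            if (!isDoubleChar(c)) {
                return Candidate.STRING; // string-only character: skip parsing entirely
            }
        }
        return onlyLongChars ? Candidate.LONG : Candidate.DOUBLE;
    }

    // Assumption: characters that may appear in decimal, scientific or hex floating-point values.
    static boolean isDoubleChar(char c) {
        return c == '.' || c == 'x' || c == 'X' || c == 'p' || c == 'P'
                || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    /** Parses according to the scan result, falling back long -> double -> string. */
    static Object discover(String value) {
        Candidate candidate = scan(value);
        if (candidate == Candidate.LONG) {
            try {
                return Long.parseLong(value);
            } catch (NumberFormatException e) {
                // e.g. "+-1" or a value beyond the long range: fall back to double
            }
        }
        if (candidate != Candidate.STRING) {
            try {
                return Double.parseDouble(value);
            } catch (NumberFormatException e) {
                // e.g. "cafe" passes the scan but is not a number: fall back to string
            }
        }
        return value;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"42", "1.5", "0x1p3", "hello", "cafe"}) {
            Object result = discover(v);
            System.out.println(v + " -> " + result.getClass().getSimpleName() + " " + result);
        }
    }
}
```

With this ordering, typical string values such as `hello` never reach a parser, and only values that pass the scan but are rejected by the parser (for example `cafe`, or a digit-only value beyond the `long` range) still pay for an exception.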

mp911de added the "type: enhancement" (A general enhancement) label Oct 27, 2017

mp911de added this to the logstash-gelf 1.11.2 milestone Oct 27, 2017

mp911de closed this as completed Oct 27, 2017
