
logging: Add ability to deduplicate log messages #48

Merged: 1 commit into master on Jul 26, 2017

Conversation

ekimekim
Contributor

@ekimekim ekimekim commented Jul 21, 2017

Motivating example is to not excessively log the same http 4xx code over and over from the same ip,
as this doesn't add info and bloats our log volume.

This requires some abuse of logrus to work, but we do this via a formatter which can either emit the original message,
or the empty string to log no message (this is harmless). Later, a separate goroutine will emit a new log
that logs the original log and how many times it occurred.

We always log the first time we see a given message, then start a timer. If we see the message again at least once in that
time frame, we log at the end of the time frame how many times.
Note this means we can say "repeated once", which is a little confusing but preferable to displaying the original message
unchanged but time delayed, which can be very misleading as there's no indication of the time delay.

Fixes https://github.com/weaveworks/service-conf/issues/968

@ekimekim ekimekim requested a review from jml July 21, 2017 04:02
Contributor

@jml jml left a comment


Most of my comments are just about linter errors. One bit I'm unsure about.

Rest looks great, thanks.


"golang.org/x/net/context"

log "github.com/Sirupsen/logrus"
"github.com/weaveworks/common/user"
)

const (
DEFAULT_DEDUPE_INTERVAL = time.Minute


// already seen, increment count and do not log
ec.count++
f.seen[key] = ec
return false


// 00:01:00 INFO Repeated 3 times: User 123 did xyz
// The interval argument controls how many seconds to wait for additional messages to arrive before reporting.
// Increase it to deduplicate more aggressively, decrease it to lower latency from a log occurring to it appearing.
// Set it to 0 to pick a sensible default value (recommended).


// you would get:
// 00:00:00 INFO User 123 did xyz
// 00:01:00 INFO Repeated 3 times: User 123 did xyz
// The interval argument controls how many seconds to wait for additional messages to arrive before reporting.


// We use a special Formatter to either format the log using the original formatter, or to return ""
// so nothing will be written for that event. The repeated entries are later logged along with a field flag
// that tells the formatter to ignore the message.
stdLogger := log.StandardLogger()


@leth
Contributor

leth commented Jul 26, 2017

IIRC the motivator for this was the 429 logs coming from cortex.
We've since added the series name to the log.
Would each different series' log message be throttled separately?
Would that mean that this doesn't help with the 429s?

@ekimekim
Contributor Author

Yes, this would mean each series is still logged, though if the same series were logged more than once per minute it would still help. We could:

  • Increase the dedup interval in authfe so that the same series won't be logged again for, say, 10 minutes instead of 1. That's probably a larger latency than I'm comfortable with, however.
  • Add some functionality to specify 'fields to ignore' in the dedup. That massively complicates things and could result in surprising behaviour, and we'd still need to refactor the logs in question to include the series only in the field data and NOT in the message, since we really do need to discriminate on the message text.

@ekimekim
Contributor Author

@jml PTAL, and what are your thoughts on #48 (comment)?

@jml
Contributor

jml commented Jul 26, 2017

what are your thoughts on #48 (comment)?

First, that merging this PR as-is is an incremental improvement so we should just do it.

I think @ekimekim's analysis is spot on.

The recent change adds both the series name (e.g. http_request_duration_seconds) and the full metric label set (e.g. http_request_duration_seconds{path="/foo/bar",method="get"}) in the log. The latter was extremely useful in diagnosing the client-side issue and communicating that to the customer, but I only really needed a couple of examples.

I'd be happy to refactor the logging code in Cortex to work with whatever we decide.

I wonder if we can get away with something simpler than a 'fields to ignore' that massively complicates things, e.g.

  • a whitelist of fields to respect, rather than a blacklist of things to ignore. Sure, this will cause us to miss some things, but we can iterate as we discover them.
  • a convention such that fields beginning with _ get ignored (I like this less, because it's harder to explain for open source projects like Cortex, but maybe it could be a thing)
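The whitelist idea could be folded into the dedup key roughly as follows; `dedupKey` and its signature are hypothetical, not part of this PR:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// dedupKey builds the dedup key from the message plus only whitelisted
// fields, so a high-cardinality field like the series name no longer
// splits otherwise-identical log entries. Hypothetical sketch.
func dedupKey(msg string, fields map[string]string, whitelist []string) string {
	allowed := make(map[string]bool, len(whitelist))
	for _, k := range whitelist {
		allowed[k] = true
	}
	var parts []string
	for k, v := range fields {
		if allowed[k] {
			parts = append(parts, k+"="+v)
		}
	}
	sort.Strings(parts) // map iteration order is random; keep keys stable
	return msg + "|" + strings.Join(parts, ",")
}

func main() {
	a := dedupKey("rate limited", map[string]string{"status": "429", "series": "s1"}, []string{"status"})
	b := dedupKey("rate limited", map[string]string{"status": "429", "series": "s2"}, []string{"status"})
	fmt.Println(a == b) // true: differing series names collapse into one entry
}
```

The underscore-prefix convention would be the same function with `allowed[k]` replaced by `!strings.HasPrefix(k, "_")`.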

@jml
Contributor

jml commented Jul 26, 2017

Does that help at all?

@ekimekim
Contributor Author

The latter was extremely useful in diagnosing the client-side issue and communicating that to the customer, but I only really needed a couple of examples.

Sounds like a good use case for a DEBUG level log instead of an INFO, surely?

As an aside, do we have the capability to, say, log at DEBUG at the service but only forward INFO and above to loggly? Then we could still get debug logs with kubectl if needed.

Anyway, merging this.

@ekimekim ekimekim merged commit a23f449 into master Jul 26, 2017
@ekimekim ekimekim deleted the mike/log-dedup branch July 26, 2017 15:59
bboreham added a commit that referenced this pull request Mar 22, 2018
This reverts commit a23f449.

It has never de-duplicated a single line for us in production.
bboreham added a commit that referenced this pull request Oct 19, 2021
This reverts commit a23f449.

It has never de-duplicated a single line for us in production.
bboreham added a commit that referenced this pull request Oct 19, 2021
Revert "logging: Add ability to deduplicate log messages (#48)"
yeya24 pushed a commit to yeya24/common that referenced this pull request Jun 12, 2024
Use strings.Replacer instead of hand-rolled escapeString