[JUJU-3776] Log forwarding implementation #218
Conversation
- move test helper functions to end of manager_test.go file
- remove unused mock logger from managerSuite
- fix up some comments
- rearrange logGatherer fields
- remove LogTarget.Type in test
Looks like good progress here.
```go
}
}
if err := parser.Err(); err != nil {
	logger.Noticef("Cannot read logs from service %q: %v", f.serviceName, err)
```
So from what I can tell, if we get `parser.Err() != nil` here, that means we had a failure with the RingBuffer iterator, so it is probably safe to either:
- exit after printing an error, or
- reset the parser + get the `buffer.TailIterator()` and start again (see the sketch below).
- Alternatively, this code is already fine as-is: `iterator.Next()` should return false if the RingBuffer has reached the end, and we will never actually have an error with the parser.
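For reference, a rough sketch of the reset option. The surrounding loop, the `f.buffer` field, `servicelog.NewParser`, and `parseBufSize` are assumptions from context, not verified against this branch:

```go
// Hypothetical sketch: on a parser error, log it, then restart from a
// fresh tail iterator so the read loop can keep going.
if err := parser.Err(); err != nil {
	logger.Noticef("Cannot read logs from service %q: %v", f.serviceName, err)
	_ = iterator.Close()               // discard the failed iterator
	iterator = f.buffer.TailIterator() // start over from the tail
	parser = servicelog.NewParser(iterator, parseBufSize) // fresh parser state
	continue
}
```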
All I'm asking for here is clarity on error handling, either by adding a comment, or by handling the error in the way it needs to be handled.
Per voice discussion, this approach is still problematic due to the buffer locking in `addLog`, so `addLog` is essentially blocking if the sends are slow anyway. We do want to protect against a slow target meaning we drop logs for other targets on the floor, so the updated plan is:
- Get rid of the forwarder.
- Have the gatherer create a list of ring buffer iterators, and use `Iterator.Notify()` to watch multiple iterators. (Alternative if that doesn't work out for some reason: start a goroutine with a `for iterator.Next(cancel)` loop for each.)
- This will avoid additional buffering, as well as preventing one slow target from causing logs for other services to be lost.
Yes we can, we can share one Notify channel on all the iterators. Then you just call …
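A minimal sketch of that shared-channel approach. The `iterator` interface here is assumed from the discussion (`Notify` registers a channel that is signalled when new data arrives; `Next` with a nil cancel returns immediately once no more data is buffered); it is not the verified `servicelog` API:

```go
// iterator captures just the two methods this sketch needs.
type iterator interface {
	Notify(ch chan bool)
	Next(cancel <-chan struct{}) bool
}

// watchIterators multiplexes many ring-buffer iterators onto a single
// goroutine via one shared notify channel.
func watchIterators(iters []iterator, done <-chan struct{}, handle func(iterator)) {
	notify := make(chan bool, 1) // capacity 1 lets bursts of signals coalesce
	for _, it := range iters {
		it.Notify(notify) // every iterator signals the same channel
	}
	for {
		select {
		case <-notify:
			// Drain whatever each iterator has buffered right now.
			for _, it := range iters {
				for it.Next(nil) {
					handle(it) // e.g. parse and write out the new entries
				}
			}
		case <-done:
			return
		}
	}
}
```

Since each gatherer would own its own channel and drain loop, a slow target only delays its own writes rather than losing logs for other services.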
Closing this as we've identified problems with this approach, and will be taking a slightly different tack, as described in Ben's comment here.
This PR contains the second part of #165, including the actual mechanics of log forwarding in Pebble. It builds upon #209 and #252. This includes an abstract `logClient` interface but doesn't include the specific Loki/syslog implementations - this will come in later PRs.

*This is a modification of #218, designed to fix fundamental issues identified with that PR.*

### Current design

For each log target, there is a single "gatherer", which receives logs from a bunch of services and writes them to a "client". The gatherer runs a control loop in a separate goroutine, which waits to receive logs on a channel, and writes these to the client.

For each service, the gatherer spawns a "log puller". Each puller runs in a separate goroutine, pulling logs from an iterator on the service's ringbuffer. Pulled logs are sent to the gatherer's control loop on the shared channel.

The `logClient` interface represents a client to a specific type of backend. In future PRs, we will add `lokiClient` and `syslogClient` types which implement `logClient`. `logClient` includes two methods:

```go
type logClient interface {
	Write(context.Context, servicelog.Entry) error
	Flush(context.Context) error
}
```

Client implementations have some freedom about the semantics of these methods.

- For a buffering client (e.g. HTTP), `Write` could add the log to the client's internal buffer, while `Flush` would prepare and send an HTTP request with the buffered logs.
- For a non-buffering client (TCP?), `Write` could serialise the log directly to the open connection, while `Flush` would be a no-op.

### Teardown

Gracefully stopping a log gatherer is a complex process with multiple steps:

- At the instant we attempt to stop the gatherer, it may be in the middle of flushing its client. So, wait a small amount of time for the client to finish flushing, but cancel the flush if this takes too long.
- The service may have emitted some final logs on shutdown. Give each puller some time to pull the final logs from its iterator - but again, force kill it if this is taking too long.
- Once the pullers are all finished, we must have received all the logs we're gonna get, so we can safely shut down the main loop.
- Do a final flush of the client to send off any remaining buffered logs.

All this logic is encapsulated in the `gatherer.stop()` method.
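As a rough illustration of that sequence, here is a sketch of the shape of the teardown. All identifiers (`flushGrace`, `pullerGrace`, the gatherer fields) are invented for the example rather than taken from the actual code:

```go
package logstate

import (
	"context"
	"sync"
	"time"
)

const (
	flushGrace  = time.Second     // how long an in-flight flush may take to finish
	pullerGrace = 2 * time.Second // how long pullers get to drain final logs
)

type gatherer struct {
	flushing    <-chan struct{}    // closed when the current flush finishes
	cancelFlush context.CancelFunc // aborts an in-flight flush
	pullers     sync.WaitGroup     // one count per puller goroutine
	killPullers context.CancelFunc // force-stops all pullers
	stopLoop    chan struct{}      // tells the control loop to exit
	loopDone    <-chan struct{}    // closed once the control loop has exited
	flush       func(context.Context) error
}

func (g *gatherer) stop() {
	// 1. The gatherer may be mid-flush: wait briefly, then cancel.
	select {
	case <-g.flushing:
	case <-time.After(flushGrace):
		g.cancelFlush()
	}

	// 2. Give each puller time to pull the service's final logs from its
	// iterator, force-killing the pullers if this takes too long.
	drained := make(chan struct{})
	go func() {
		g.pullers.Wait()
		close(drained)
	}()
	select {
	case <-drained:
	case <-time.After(pullerGrace):
		g.killPullers()
	}

	// 3. All pullers are finished, so no more logs will arrive: the
	// control loop can shut down safely.
	close(g.stopLoop)
	<-g.loopDone

	// 4. Final flush to send off any remaining buffered logs.
	ctx, cancel := context.WithTimeout(context.Background(), flushGrace)
	defer cancel()
	_ = g.flush(ctx)
}
```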
## QA

I've included some sample implementations of `logClient` [here](https://github.com/barrettj12/pebble/blob/logfwd-fake/internals/overlord/logstate/fake.go). They just print the logs to stdout. These can be used to verify that the log forwarding mechanics work properly.

Create a simple logging service, e.g.

```bash
#!/bin/bash
while true; do
  echo "Hello"
  sleep 1
done
```

and a simple plan using this service:

```yaml
services:
  svc1: &logsvc
    command: /home/jb/git/canonical/pebble/logfwd-impl2/pebble/logsvc
    startup: enabled
    override: merge
  svc2: *logsvc

log-targets:
  tgt1:
    override: merge
    services: [all]
    type: loki
    location: unnecessary
```

Add the [`fake.go`](https://github.com/barrettj12/pebble/blob/logfwd-fake/internals/overlord/logstate/fake.go) file to the repo. Comment out the following line

https://github.com/canonical/pebble/blob/3e904f9d22f297b68cba2dc33c9cf8e1bbbadd90/internals/overlord/logstate/gatherer.go#L356

and replace it with e.g.

```go
return &nonBufferingClient{}, nil          // unbuffered
return &bufferingClient{}, nil             // unlimited-size buffer, will flush on timeout only
return &bufferingClient{threshold: 3}, nil // buffer with max size: 3 logs
```

You might also want to change the gatherer's tick period:

https://github.com/canonical/pebble/blob/3e904f9d22f297b68cba2dc33c9cf8e1bbbadd90/internals/overlord/logstate/gatherer.go#L32

Run Pebble with

```
go run ./cmd/pebble run
```

and verify the logs are printing to stdout.

---

JUJU-3776
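For reference, the fakes described above might look roughly like this (a sketch reconstructed from the description; the real `fake.go` on the linked branch may differ, and the `servicelog.Entry` field names are assumptions):

```go
package logstate

import (
	"context"
	"fmt"

	"github.com/canonical/pebble/internals/servicelog"
)

// nonBufferingClient prints each log to stdout as soon as it is written.
type nonBufferingClient struct{}

func (c *nonBufferingClient) Write(_ context.Context, entry servicelog.Entry) error {
	fmt.Printf("%v [%s] %s", entry.Time, entry.Service, entry.Message)
	return nil
}

// Flush is a no-op: nothing is buffered.
func (c *nonBufferingClient) Flush(context.Context) error { return nil }

// bufferingClient accumulates logs and prints them on Flush, or earlier
// once the buffer reaches threshold (0 means flush on timeout only).
type bufferingClient struct {
	entries   []servicelog.Entry
	threshold int
}

func (c *bufferingClient) Write(ctx context.Context, entry servicelog.Entry) error {
	c.entries = append(c.entries, entry)
	if c.threshold > 0 && len(c.entries) >= c.threshold {
		return c.Flush(ctx)
	}
	return nil
}

func (c *bufferingClient) Flush(context.Context) error {
	for _, entry := range c.entries {
		fmt.Printf("%v [%s] %s", entry.Time, entry.Service, entry.Message)
	}
	c.entries = c.entries[:0]
	return nil
}
```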
This PR contains the second part of #165, including the actual mechanics of log forwarding in Pebble. It builds upon #209 and #252. This includes an abstract `logClient` interface but doesn't include the specific Loki/syslog implementations - this will come in later PRs.

### Current design

For each service, there is a single "forwarder", which continually pulls logs from the service's ringbuffer and writes them to the relevant "gatherers". Each forwarder runs in a separate goroutine.

For each log target, there is a single "gatherer", which receives logs from forwarders, serialises them, prepares a request to the remote, and flushes periodically. Each gatherer runs in a separate goroutine - although the gatherer's `addLog` method is called inside a forwarder's goroutine.

The gatherer uses two abstract interfaces:

- `logBuffer`, which is responsible for serialising/storing logs in a backend-specific format
- `logClient`, which is responsible for sending logs to a remote target via a backend-specific protocol.
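To make that split concrete, here is a speculative sketch of how the two interfaces might have been shaped (method names are illustrative, not the actual code from this PR; assumes the `context` and `servicelog` imports):

```go
// logBuffer serialises and stores logs in a backend-specific wire
// format until the gatherer is ready to flush them.
type logBuffer interface {
	Write(entry servicelog.Entry) error
	// Contents returns everything serialised so far and resets the buffer.
	Contents() []byte
}

// logClient sends a serialised batch to the remote target over the
// backend-specific protocol (e.g. Loki's HTTP push API, or syslog).
type logClient interface {
	Send(ctx context.Context, serialised []byte) error
}
```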