Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JUJU-3776] Log forwarding implementation #218

Closed
wants to merge 5 commits into from

Conversation

barrettj12
Copy link
Contributor

@barrettj12 barrettj12 commented May 9, 2023

This PR contains the second part of #165, including the actual mechanics of log forwarding in Pebble. It builds upon #209 and #252. This includes an abstract logClient interface but doesn't include the specific Loki/syslog implementations - this will come in later PRs.

Current design

For each service, there is a single "forwarder" which continually pulls logs from the service ringbuffer, and writes them to the relevant "gatherers". Each forwarder runs in a separate goroutine.

For each log target, there is a single "gatherer" which receives logs from forwarders, serialises them, prepares a request to the remote, and flushes periodically. Each gatherer runs in a separate goroutine - although the gatherer's addLog method is called inside a forwarder's goroutine.

The gatherer uses the abstract interfaces

  • logBuffer, which is responsible for serialising/storing logs in a backend-specific format
  • logClient, which is responsible for sending logs to a remote target via a backend-specific protocol.

@barrettj12 barrettj12 requested a review from nvinuesa May 9, 2023 02:56
@barrettj12 barrettj12 changed the title Log forwarding implementation [JUJU-3776] Log forwarding implementation Jun 16, 2023
- move test helper functions to end of manager_test.go file
- remove unused mock logger from managerSuite
- fix up some comments
- rearrange logGatherer fields
- remove LogTarget.Type in test
@barrettj12 barrettj12 marked this pull request as ready for review July 5, 2023 02:27
@barrettj12 barrettj12 requested review from benhoyt and hpidcock July 5, 2023 02:29
Copy link
Member

@hpidcock hpidcock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like good progress here.

}
}
if err := parser.Err(); err != nil {
logger.Noticef("Cannot read logs from service %q: %v", f.serviceName, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So from what I can tell, if we get a parser.Err() != nil here, that means we had a failure with the RingBuffer iterator, so it is probably safe to either:

  1. exit after printing an error
  2. reset the parser + get the buffer.TailIterator() and start again.
  3. this code is already fine as iterator.Next() should return false if the RingBuffer has reached the end, and we will never actually have an error with the parser.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All I'm asking for here is clarity on error handling, either by adding a comment, or handling the error in the way it needs to.

internals/overlord/logstate/gatherer.go Show resolved Hide resolved
internals/overlord/logstate/manager.go Show resolved Hide resolved
internals/overlord/logstate/manager.go Show resolved Hide resolved
Copy link
Contributor

@benhoyt benhoyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per voice discussion, this approach is still problematic due to the buffer locking in addLog, so addLog is essentially blocking if the sends are slow anyway. We do want to protect against a slow target meaning we drop logs for other targets on the floor, so the updated plan is:

  • Get rid of the forwarder.
  • Have the gatherer create a list of ring buffer iterators, and use Iterator.Notify() to watch multiple iterators. (Alternative if that doesn't work out for some reason: start a goroutine with a for iterator.Next(cancel) loop for each).
  • This will avoid additional buffering, as well as preventing one slow target meaning that logs for other services are lost.

internals/overlord/logstate/forwarder.go Show resolved Hide resolved
internals/overlord/logstate/forwarder.go Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Show resolved Hide resolved
@hpidcock
Copy link
Member

hpidcock commented Jul 6, 2023

it's currently not really possible to watch multiple iterators in a single goroutine, and goroutines are cheap

Yes we can, we can share one Notify channel on all the iterators. Then you just call Next(nil) on each.

@barrettj12
Copy link
Contributor Author

Closing this as we've identified problems with this approach, and will be taking a slightly different tack, as described in Ben's comment here.

@barrettj12 barrettj12 closed this Jul 6, 2023
@barrettj12 barrettj12 deleted the logfwd-impl branch August 2, 2023 00:57
benhoyt pushed a commit that referenced this pull request Aug 22, 2023
This PR contains the second part of #165, including the actual mechanics of
log forwarding in Pebble. It builds upon #209 and #252. This includes an
abstract `logClient` interface but doesn't include the specific Loki/syslog
implementations - this will come in later PRs.

*This is a modification of #218, designed to fix fundamental issues identified
 with that PR.*

### Current design

For each log target, there is a single "gatherer", which receives logs from a
bunch of services and writes them to a "client". The gatherer runs a control
loop in a separate goroutine, which waits to receive logs on a channel, and
writes these to the client.

For each service, the gatherer spawns a "log puller". Each puller runs in a
separate goroutine, pulling logs from an iterator on the service's
ringbuffer. Pulled logs are sent to the gatherer's control loop on the shared
channel.

The `logClient` interface represents a client to a specific type of backend.
In future PRs, we will add `lokiClient` and `syslogClient` types which
implement `logClient`.

`logClient` includes two methods

```go
type logClient interface {
	Write(context.Context, servicelog.Entry) error
	Flush(context.Context) error
}
```

Client implementations have some freedom about the semantics of these
methods.

- For a buffering client (e.g. HTTP), `Write` could add the log to the
  client's internal buffer, while `Flush` would prepare and send an HTTP
  request with the buffered logs.
- For a non-buffering client (TCP?), `Write` could serialise the log directly
  to the open connection, while `Flush` would be a no-op.

### Teardown

Gracefully stopping a log gatherer is a complex process with multiple steps.

- At the instant we attempt to stop the gatherer, it may be in the middle of
  flushing its client. So, wait a small amount of time for the client to
  finish flushing, but cancel the flush if this takes too long.
- The service may have emitted some final logs on shutdown. Give each puller
  some time to pull the final logs from its iterator - but again, force kill
  it if this is taking too long.
- Once the pullers are all finished, we must have received all the logs we're
  gonna get, so we can safely shut down the main loop.
- Do a final flush of the client to send off any remaining buffered logs.

All this logic is encapsulated in the `gatherer.stop()` method.

## QA

I've included some sample implementations of `logClient` [here]
(https://github.com/barrettj12/pebble/blob/logfwd-fake/internals/overlord/logstate/fake.go).
They just print the logs to stdout. These can be used to verify that the log
forwarding mechanics work properly.

Create a simple logging service, e.g.

```bash
#!/bin/bash
while true; do
  echo "Hello"
  sleep 1
done
```

and a simple plan using this service

```yaml
services:
  svc1: &logsvc
    command: /home/jb/git/canonical/pebble/logfwd-impl2/pebble/logsvc
    startup: enabled
    override: merge
  svc2: *logsvc

log-targets:
  tgt1:
    override: merge
    services: [all]
    type: loki
    location: unnecessary
```

Add the [`fake.go`]
(https://github.com/barrettj12/pebble/blob/logfwd-fake/internals/overlord/logstate/fake.go)
file to the repo.

Comment out the following line
https://github.com/canonical/pebble/blob/3e904f9d22f297b68cba2dc33c9cf8e1bbbadd90/internals/overlord/logstate/gatherer.go#L356
and replace it with e.g.

```go
return &nonBufferingClient{}, nil          // unbuffered
return &bufferingClient{}, nil             // unlimited-size buffer, will flush on timeout only
return &bufferingClient{threshold: 3}, nil // buffer with max size: 3 logs
```

You might also want to change the gatherer's tick period:
https://github.com/canonical/pebble/blob/3e904f9d22f297b68cba2dc33c9cf8e1bbbadd90/internals/overlord/logstate/gatherer.go#L32

Run Pebble with

```
go run ./cmd/pebble run
```

and verify the logs are printing to stdout.

---

JUJU-3776
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants