Docker log plugin #4773
Conversation
This PR is an extension of #4360!
Thanks!
@@ -228,12 +231,13 @@ func (d *DockerLogs) getContainerLogs(
	}
	return err
}
-	if len(data) > 0 {
+	if num > 0 && len(data) > 0 {
Just check `num > 0`, as it is equal to `len(data)`.
Taken care of this!
How can I use this plugin for now? Do the nightly builds include this?
You can build this branch, or wait until it gets merged into master; then you can use the nightly. I expect this will be released in the coming weeks.
Has this been pushed to 1.10?
I am wondering how it will show up in Chronograf. Will it be available via the log viewer?
I merged the release-1.9 branch over this PR.
Hi @nedix,
plugins/inputs/docker_log/client.go
Outdated
@@ -0,0 +1,63 @@
package docker_logs
Name this package docker_log
}

func init() {
	inputs.Add("docker_logs", func() telegraf.Input {
Set this to docker_log
I'm definitely getting logs from Docker but they didn't show up in Telegraf. Disabling the Docker metrics plugin so as to eliminate possible interference between the two plugins didn't make a difference either.
@glinton I've made the changes you've requested. Here are some logs from telegraf:
docker_log,containerId=bd9f52f9e407befa4180327234c280c6dbe5602e437397bd8ec39677b64671d1,containerName=sharp_curie,host=prash-laptop log=" / # who\r\n /bin/sh: who: not found\r\n" 1543395928000000000
Docker logs:
Alright! I'm getting docker_log on stdout. Could someone using InfluxDB verify the logs show up there, as I'm not seeing anything in Prometheus yet. For anyone looking to test with stdout, add a tagdrop filter so Telegraf doesn't loop its own output.
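For anyone testing with a stdout output, a tagdrop along these lines might work; this is a sketch only, and the `container_name` pattern (as well as using the `file` output) is an assumption you would adapt to your own setup:

```toml
[[outputs.file]]
  files = ["stdout"]
  # Drop points tagged with Telegraf's own container, so its stdout
  # isn't re-read by the docker_log input and looped forever.
  [outputs.file.tagdrop]
    container_name = ["telegraf*"]
```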
Should we switch to snake_case like the other docker plugin?
I tested this with influx. "docker_log" gets created and logs come in.
Must be Prometheus related then. All good in that case!
Thanks for your contribution @prashanthjbabu
Unexpected truncation in some lines; every line of my logs begins with
That's been pulled back from 1.10? :(
There are apparently still errors that need to be worked out that we don't have time to address before the 1.10 RC.
@suconghou
fields := map[string]interface{}{}
data := make([]byte, LOG_BYTES_MAX)
for {
	num, err := logReader.Read(data)
Let's get rid of docker log's 8-byte message header when storing this in the field. Ensure it's long enough, then assign data[9:num].
Just realized that lines longer than LOG_BYTES_MAX will not have this prefix. I guess we should be reading that log header and consuming the length sent.
They use github.com/docker/docker/stdcopy to do so. Perhaps you could look at using that or implementing a similar solution?
Thanks for pointing this out! I took most of the code from stdcopy and ensured that we're writing to the telegraf Accumulator instead of an io.Writer. Let me know if this approach works!
Also, I'm not sure why CI/CD is failing. I'm getting errors at some other location. Would you be able to help me out with this?
I've been testing this for a while and still get a space before the log message (different behavior from the docker cli). I wonder if it needs to be:
fields["log"] = buf[stdWriterPrefixLen+1 : frameSize+stdWriterPrefixLen]
Another issue I found is when running a docker container with -t
, the stdcopy isn't valid and produces:
E! [inputs.docker_log] : Unrecognized input header: xx
The cli selectively uses stdcopy to strip the header as seen here
Regarding the tests, I believe if you merge master into your branch, they should succeed. Also, please refrain from force pushing as there is an ongoing review 😉
Fixed the -t issue. I added a pushTtyLogs function based on what containerInspect returns. I noticed that the extra space in the front is also present in Tty mode. I think in both cases, after we read, we should probably skip the first character of the string read from the logReader, although I'm not sure if that's the right approach.
I believe the first byte in the logreader is used to determine the log type, so dropping it wouldn't be ideal. It seems like it would be ok to skip one more byte on buf before assigning it to fields["log"], though we should double check it works well with extra long logs.
I fixed this in the latest commit. I tested this as well and it seems to be working fine. Let me know if this is okay!
Force-pushed 84aed43 to cff8300: …re first character of string to avoid whitespace in the beginning
Thank you for your patience, here is my review.
We should also think about what tests are needed. The functions creating the filters and whatnot don't need tests, I think, but it would be nice to have some tests for the stream parsing.
plugins/inputs/docker_log/README.md
Outdated
- docker_log
  - tags:
    - containerId
    - containerName
Make all fields and tags snake_case
Taken care of this!
plugins/inputs/docker_log/README.md
Outdated
    - containerId
    - containerName
  - fields:
    - log
Let's call it message to create more consistency with the syslog input.
Done!
}
containers, err := d.client.ContainerList(ctx, d.opts)
if err != nil {
	log.Printf("%s : %s ", ERR_PREFIX, err.Error())
Since we are returning the error, there's no need to log here. Make this change throughout the pull request.
Done
defer d.wg.Done()
err := d.getContainerLogs(c, acc)
if err != nil {
	log.Printf("%s : %s ", ERR_PREFIX, err.Error())
Use acc.AddError here to log the error.
Done
func (d *DockerLogs) Start(acc telegraf.Accumulator) error {
	d.mu.Lock()
	defer d.mu.Unlock()
No need to lock here; Start will not be called concurrently with any other functions.
I see a few issues with locking. I'll add comments, but here is how you can expect your plugin to be called; these functions will never be called concurrently:
- Start(): called once
- Gather(): called one or more times if Start does not return an error
- Stop(): called once if Start does not return an error.
Removed lock for Start()
plugins/inputs/docker_log/README.md
Outdated
The docker plugin uses the [Official Docker Client](https://github.com/moby/moby/tree/master/client)
to gather logs from the [Engine API](https://docs.docker.com/engine/api/v1.24/).
Note: This plugin works only for containers with the `json-file` or `journald` logging driver.
Maybe this also works with local? I also wonder if Docker Enterprise would support this using "dual logging"? @fntlnz Do you know or have a guess?
https://docs.docker.com/config/containers/logging/configure/#limitations-of-logging-drivers
Added local to the readme. Not sure about the dual logging aspects of it.
	return nil
}

func (d *DockerLogs) getContainerLogs(
I would call this something like readContainerLogs or tailContainerLogs to indicate that it is a long-running function and not an immediately returning call.
Renamed this to tailContainerLogs
	Tail: "0",
}
logReader, err := d.client.ContainerLogs(context.Background(), container.ID, logOptions)
d.addToContainerList(container.ID, logReader)
Suggest moving this into containerListUpdate, where the rest of the related functions are used. Then we will have add/remove/in functions all in one place.
Must lock d.containerList for this action, since it is called within a goroutine that is not waited for within Gather.
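The add/remove/in trio sharing one lock can be sketched like this; the types are simplified for illustration (the real plugin maps container IDs to log readers, not strings):

```go
package main

import (
	"fmt"
	"sync"
)

// containerList guards the ID -> reader map with one mutex, since
// per-container goroutines call remove while Gather calls add and in.
type containerList struct {
	mu      sync.Mutex
	readers map[string]string // simplified: the real code stores an io.ReadCloser
}

func newContainerList() *containerList {
	return &containerList{readers: map[string]string{}}
}

func (l *containerList) add(id, reader string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.readers[id] = reader
}

func (l *containerList) remove(id string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.readers, id)
}

func (l *containerList) in(id string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	_, ok := l.readers[id]
	return ok
}

func main() {
	l := newContainerList()
	l.add("abc123", "reader")
	fmt.Println(l.in("abc123"))
	l.remove("abc123")
	fmt.Println(l.in("abc123"))
}
```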
- Moved addToContainerList to containerListUpdate and passed logReader into tailContainerLogs.
- addToContainerList already has a mutex lock for d.containerList. Is there anything else I should be adding?
if err != nil {
	log.Printf("%s : %s ", ERR_PREFIX, err.Error())
}
d.removeFromContainerList(c.ID)
Must lock d.containerList because this is called inside a goroutine that is not waited for.
Added d.mu.Lock and Unlock inside removeFromContainerList. Is there anything else I should be adding?
I think this is sufficient; the code that can run concurrently is the container goroutines and the Stop() function.
d.mu.Lock()
defer d.mu.Unlock()
d.stopAllReaders()
d.wg.Wait()
I believe we can deadlock here because we don't release the lock until all the goroutines complete. Once we add the required locking as suggested in my other comments, the goroutines will need to wait for this lock so they can remove themselves from the containerList.
Fix is easy though, just release the lock after calling stopAllReaders.
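That ordering can be sketched end to end. This is a simplified model, not the plugin's actual code: a channel stands in for each log reader, and closing it plays the role of stopAllReaders unblocking a tailing goroutine.

```go
package main

import (
	"fmt"
	"sync"
)

// plugin mimics the relevant fields: a mutex guarding the reader list,
// and a WaitGroup tracking one goroutine per "container".
type plugin struct {
	mu      sync.Mutex
	readers map[string]chan struct{}
	wg      sync.WaitGroup
}

func (p *plugin) addReader(id string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	stop := make(chan struct{})
	p.readers[id] = stop
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		<-stop // "tail" until the reader is closed
		p.removeReader(id)
	}()
}

func (p *plugin) removeReader(id string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.readers, id)
}

// stopAllReaders must be called with p.mu held.
func (p *plugin) stopAllReaders() {
	for _, stop := range p.readers {
		close(stop)
	}
}

// Stop releases the lock *before* Wait: the goroutines must acquire it
// to remove themselves, so holding it across Wait would deadlock.
func (p *plugin) Stop() {
	p.mu.Lock()
	p.stopAllReaders()
	p.mu.Unlock()
	p.wg.Wait()
}

func main() {
	p := &plugin{readers: map[string]chan struct{}{}}
	p.addReader("a")
	p.addReader("b")
	p.Stop()
	fmt.Println(len(p.readers)) // all goroutines removed themselves
}
```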
Moved d.mu.Unlock to after stopAllReaders()
fields := map[string]interface{}{}
data := make([]byte, LOG_BYTES_MAX)
for {
	num, err := src.Read(data)
I think we need to wrap it with bufio.Scanner then.
	}
}

/* Inspired from https://github.com/moby/moby/blob/master/pkg/stdcopy/stdcopy.go */
Don't worry about it for now.
d.addToContainerList(c.ID, logReader)
err = d.tailContainerLogs(c, logReader, acc)
if err != nil {
	acc.AddError(fmt.Errorf("%s : %s ", ERR_PREFIX, err.Error()))
Just do acc.AddError(err); the accumulator will add the prefix for you.
Done!
Found a few more issues while looking into line splitting. I was hoping to get this into 1.11 but I think we need a bit more time to iron out the issues so I have moved it to the 1.12 milestone.
	return nil
}

func pushTtyLogs(acc telegraf.Accumulator, tags map[string]string, fields map[string]interface{}, src io.Reader) (err error) {
	tags["logType"] = "unknown" // in tty mode we won't be able to differentiate between stdout and stderr, hence unknown
Rename this to log_type.
for {
	num, err := src.Read(data)
	if num > 0 {
		fields["message"] = data[1:num]
Why do we start at 1 instead of data[0:num]? In my testing of long lines the first character is cut off:
docker_log,container_image=quay.io/influxdb/telegraf-ci,container_name=cocky_shirley,container_version=1.12.5,host=loaner,stream=stdout container_id="4a09148e7188863b4d4274f8049e1c21ebcef582f94fe670d131930de6c6fc12",message="ONTAINER_PARTIAL_MESSAGE=true ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................" 1559630720000000000
		acc.AddFields("docker_log", fields, tags)
	}
	if err == io.EOF {
		fields["message"] = data[1:num]
During testing I ran into a crash here; the line numbers are a bit off because I had added some print statements, so the panic actually occurred on line 273.
panic: runtime error: slice bounds out of range
goroutine 24 [running]:
github.com/influxdata/telegraf/plugins/inputs/docker_log.pushTtyLogs(0x2a03680, 0xc000140640, 0xc0005956b0, 0xc0005956e0, 0x29a5ae0, 0xc0000a6c80, 0x45248c0, 0x0)
/home/dbn/src/influx/src/github.com/influxdata/telegraf/plugins/inputs/docker_log/docker_logs.go:276 +0x394
github.com/influxdata/telegraf/plugins/inputs/docker_log.(*DockerLogs).tailContainerLogs(0xc00048a380, 0xc000309400, 0x40, 0xc0007160c0, 0x1, 0x4, 0xc0003a02d0, 0x23, 0xc00030a5f0, 0x47, ...)
/home/dbn/src/influx/src/github.com/influxdata/telegraf/plugins/inputs/docker_log/docker_logs.go:257 +0x5a1
github.com/influxdata/telegraf/plugins/inputs/docker_log.(*DockerLogs).containerListUpdate.func1(0xc00048a380, 0x2a03680, 0xc000140640, 0xc000309400, 0x40, 0xc0007160c0, 0x1, 0x4, 0xc0003a02d0, 0x23, ...)
/home/dbn/src/influx/src/github.com/influxdata/telegraf/plugins/inputs/docker_log/docker_logs.go:209 +0x220
created by github.com/influxdata/telegraf/plugins/inputs/docker_log.(*DockerLogs).containerListUpdate
/home/dbn/src/influx/src/github.com/influxdata/telegraf/plugins/inputs/docker_log/docker_logs.go:192 +0x2eb
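The panic comes from slicing data[1:num] when Read returns num == 0 at EOF: the low bound then exceeds the high bound. A guarded sketch of the fix (safeSlice is a hypothetical helper for illustration, not the plugin's code):

```go
package main

import "fmt"

// safeSlice drops the leading byte the way the plugin does, but only
// when the read actually produced something beyond it. At EOF, Read can
// return num == 0, and data[1:0] panics with "slice bounds out of range".
func safeSlice(data []byte, num int) []byte {
	if num <= 1 {
		return nil // nothing beyond the first byte to emit
	}
	return data[1:num]
}

func main() {
	data := []byte(" hello")
	fmt.Printf("%q\n", safeSlice(data, len(data)))
	fmt.Printf("%q\n", safeSlice(data, 0)) // the EOF case that panicked
}
```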
Any possibility to include log data with Kubernetes metadata as well?
@sureshsubramaniam Not sure what you are referring to; can you add a link to the documentation? Kubernetes does add its annotations to the container as labels, so they would be collectible just like in the docker input.
Merging the current pull request; I will follow up with a PR that addresses the remaining comments.
Resolves #1483
This PR introduces a telegraf plugin to get docker logs. Sample output:
docker_log,containerId=168c940a98b4317de15e336140bf6caae009c1ea948226d7fac84c839ccf6e6d,containerName=loving_leavitt,host=prash-laptop log=" root@168c940a98b4:/# ls\r\n" 1538210547000000000 docker_log,containerId=168c940a98b4317de15e336140bf6caae009c1ea948226d7fac84c839ccf6e6d,containerName=loving_leavitt,host=prash-laptop log=" bin boot dev etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var\r\n" 1538210547000000000 docker_log,containerId=168c940a98b4317de15e336140bf6caae009c1ea948226d7fac84c839ccf6e6d,containerName=loving_leavitt,host=prash-laptop log=" root@168c940a98b4:/# pwd\r\n /\r\n" 1538210552000000000
The plugin is a Telegraf service input plugin which creates a new goroutine for every running container and listens to the log stream using the Docker Engine APIs. Whenever it gets data from the stream, it pushes it out with tags.