
Docker log plugin #4773

Merged · 16 commits · Jun 18, 2019

Conversation

prashanthjbabu
Contributor

@prashanthjbabu prashanthjbabu commented Sep 29, 2018

Resolves #1483

This PR introduces a Telegraf plugin to collect Docker container logs. Sample output:
docker_log,containerId=168c940a98b4317de15e336140bf6caae009c1ea948226d7fac84c839ccf6e6d,containerName=loving_leavitt,host=prash-laptop log=" root@168c940a98b4:/# ls\r\n" 1538210547000000000
docker_log,containerId=168c940a98b4317de15e336140bf6caae009c1ea948226d7fac84c839ccf6e6d,containerName=loving_leavitt,host=prash-laptop log=" bin boot dev etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var\r\n" 1538210547000000000
docker_log,containerId=168c940a98b4317de15e336140bf6caae009c1ea948226d7fac84c839ccf6e6d,containerName=loving_leavitt,host=prash-laptop log=" root@168c940a98b4:/# pwd\r\n /\r\n" 1538210552000000000
The plugin is a Telegraf service input plugin that creates a new goroutine for every running container and listens to its log stream via the Docker Engine API. Whenever data arrives on the stream, it is pushed out with the appropriate tags.
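
For illustration only (not the PR's actual code), here is a minimal sketch of that approach using the Docker Go client: one goroutine per running container, each following the container's log stream. `consume` is a hypothetical placeholder for handing the data to the Telegraf accumulator.

```go
package main

import (
	"context"
	"io"
	"sync"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

// tailAll starts one goroutine per running container and follows its log stream.
// consume is a placeholder for pushing the data into the Telegraf accumulator.
func tailAll(ctx context.Context, cli *client.Client, consume func(id string, r io.Reader)) error {
	containers, err := cli.ContainerList(ctx, types.ContainerListOptions{})
	if err != nil {
		return err
	}
	var wg sync.WaitGroup
	for _, c := range containers {
		wg.Add(1)
		go func(c types.Container) {
			defer wg.Done()
			rc, err := cli.ContainerLogs(ctx, c.ID, types.ContainerLogsOptions{
				ShowStdout: true,
				ShowStderr: true,
				Follow:     true, // keep the stream open and deliver new output as it appears
			})
			if err != nil {
				return
			}
			defer rc.Close()
			// Note: without a TTY the stream is multiplexed and each frame carries
			// an 8-byte header (see the stdcopy discussion later in this thread).
			consume(c.ID, rc)
		}(c)
	}
	wg.Wait()
	return nil
}
```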

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

@prashanthjbabu prashanthjbabu mentioned this pull request Sep 29, 2018
@prashanthjbabu
Contributor Author

This PR is an extension of #4360!

Contributor

@glinton glinton left a comment

Thanks!

@@ -228,12 +231,13 @@ func (d *DockerLogs) getContainerLogs(
}
return err
}
if len(data) > 0 {
if num > 0 && len(data) > 0 {
Contributor

Just check num > 0 as it is equal to len(data)

Contributor Author

Taken care of this!

@glinton glinton added this to the 1.9.0 milestone Oct 1, 2018
@suconghou

How can I use this plugin right now? Do the nightly builds include it?

@glinton
Contributor

glinton commented Oct 11, 2018

You can build this branch, or wait until it gets merged into master and then use the nightly. I expect this will be released in the coming weeks.

@russorat russorat modified the milestones: 1.9.0, 1.10 Oct 29, 2018
@ytzelf

ytzelf commented Nov 15, 2018

This has been pushed to 1.10?

@danielnelson danielnelson self-requested a review November 15, 2018 21:20
@Puneeth-n

I am wondering how it will show up in chronograf. Will it be available via the log viewer?

@nedix

nedix commented Nov 25, 2018

I merged the release-1.9 branch over this PR.
Had to rename [[inputs.docker_log]] to [[inputs.docker_logs]] for telegraf to fire up.
I expected logs to show up under the "docker_log" key but didn't get anything except an EOF notice. How to proceed?

@prashanthjbabu
Contributor Author

Hi @nedix,
Once you've configured the docker_log plugin, you'll need containers running that write output to stdout/stderr. A simple way to test this is to launch an Alpine container and run commands like "ls" and "pwd" in its shell; you should then see those outputs from the docker log plugin.

@@ -0,0 +1,63 @@
package docker_logs
Contributor

Name this package docker_log

}

func init() {
inputs.Add("docker_logs", func() telegraf.Input {
Contributor

Set this to docker_log

@nedix

nedix commented Nov 27, 2018

I'm definitely getting logs from Docker, but they don't show up in Telegraf. Disabling the Docker metrics plugin, to rule out possible interference between the two plugins, didn't make a difference either.

@prashanthjbabu
Contributor Author

@glinton I've made the changes you've requested.
@nedix I tried this again and it worked. I've attached my telegraf config file.
telegraf.txt

Here are some logs from telegraf:
prash@prash-laptop:~/gocode/src/github.com/influxdata/telegraf (docker_log_plugin)$ ./telegraf --config ./telegraf.conf
2018-11-28T09:04:14Z I! Starting Telegraf
2018-11-28T09:04:14Z I! Loaded inputs: inputs.docker_log
2018-11-28T09:04:14Z I! Loaded aggregators:
2018-11-28T09:04:14Z I! Loaded processors:
2018-11-28T09:04:14Z I! Loaded outputs: file
2018-11-28T09:04:14Z I! Tags enabled: host=prash-laptop
2018-11-28T09:04:14Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"prash-laptop", Flush Interval:10s
docker_log,containerId=bd9f52f9e407befa4180327234c280c6dbe5602e437397bd8ec39677b64671d1,containerName=sharp_curie,host=prash-laptop log=" / # ls\r\n" 1543395884000000000
docker_log,containerId=bd9f52f9e407befa4180327234c280c6dbe5602e437397bd8ec39677b64671d1,containerName=sharp_curie,host=prash-laptop log=" bin dev etc home lib media mnt proc root run sbin srv sys tmp usr var\r\n" 1543395884000000000
docker_log,containerId=bd9f52f9e407befa4180327234c280c6dbe5602e437397bd8ec39677b64671d1,containerName=sharp_curie,host=prash-laptop log=" / # uptime\r\n" 1543395893000000000
docker_log,containerId=bd9f52f9e407befa4180327234c280c6dbe5602e437397bd8ec39677b64671d1,containerName=sharp_curie,host=prash-laptop log=" 09:04:52 up 18 days, 3:49, load average: 0.22, 0.34, 0.64\r\n" 1543395893000000000

docker_log,containerId=bd9f52f9e407befa4180327234c280c6dbe5602e437397bd8ec39677b64671d1,containerName=sharp_curie,host=prash-laptop log=" / # who\r\n /bin/sh: who: not found\r\n" 1543395928000000000

Docker logs:
prash@prash-laptop:~$ docker run -ti alpine
/ # ls
bin dev etc home lib media mnt proc root run sbin srv sys tmp usr var
/ # pwd
/
/ # ls
bin dev etc home lib media mnt proc root run sbin srv sys tmp usr var
/ # uptime
08:55:24 up 18 days, 3:40, load average: 0.42, 0.81, 0.97
/ # ls
bin dev etc home lib media mnt proc root run sbin srv sys tmp usr var
/ # ls
bin dev etc home lib media mnt proc root run sbin srv sys tmp usr var
/ # uptime
09:04:52 up 18 days, 3:49, load average: 0.22, 0.34, 0.64
/ # who
/bin/sh: who: not found
/ #

@nedix

nedix commented Dec 2, 2018

Alright! I'm getting docker_log on stdout. Could someone using InfluxDB verify that the logs show up there? I'm not seeing anything in Prometheus yet.

For anyone looking to test with stdout, add a tagdrop filter so Telegraf doesn't loop its own output:

[[inputs.docker_log]]
  [inputs.docker_log.tagdrop]
    "com.docker.compose.service" = [ "telegraf-service-name" ]

[[outputs.file]]
  files = ["stdout"]

Should we switch to snake_case like the other docker plugin?

@prashanthjbabu
Contributor Author

I tested this with influx. "docker_log" gets created and logs come in.

> select log from docker_log order by time desc limit 1;
name: docker_log
time                log
----                ---
1543814531000000000 ��T ts=2018-12-03T05:22:11.200Z lvl=info msg="http request" service=http host=172.18.0.3 username=- start=2018-12-03T05:22:11.195868599Z method=POST uri=/write?consistency=&db=_internal&precision=ns&rp=monitor protocol=HTTP/1.1 status=204 referer=- user-agent=InfluxDBClient request-id=62d0a6ab-f6bb-11e8-9ee8-000000000000 duration=4.203413ms\n
> 

@nedix

nedix commented Dec 4, 2018

Must be Prometheus related then. All good in that case!

@fntlnz fntlnz left a comment

Thanks for your contribution @prashanthjbabu

plugins/inputs/docker_log/README.md
@suconghou

unexpected truncation in some lines

> select log from docker_log limit 10;
name: docker_log
time                log
----                ---
1547441987000000000 t 2019-01-14 12:59:47.047 DEBUG 1 --- [nio-8080-exec-7] c.o.w.mapper.UserMapper.getUserByUUID    : <==      Total: 1\n
1547441987000000000 � 2019-01-14 12:59:47.209 DEBUG 1 --- [nio-8080-exec-4] c.o.t.mapper.ActivityMapper.getAllType   : ==>  Preparing: SELECT id,type,type_name FROM activity_type \ns 2019-01-14 12:59:47.209 DEBUG 1 --- [nio-8080-exec-4] c.o.t.mapper.ActivityMapper.getAllType   : ==> Parameters: \nu 2019-01-14 12:59:47.210 DEBUG 1 --- [io-8080-exec-10] c.o.t.m.ActivityMapper.selectByParam     : <==      Total: 10\nu 2019-01-14 12:59:47.210 DEBUG 1 --- [nio-8080-exec-4] c.o.t.mapper.ActivityMapper.getAllType   : <==      Total: 19\n
1547442000000000000 � 2019-01-14 13:00:00.028  INFO 1 --- [pool-3-thread-1] c.o.t.s.i.ActivityVideoLiveServiceImpl   : ===========================视频计费统计任务:end=================================\n
1547442007000000000 t 2019-01-14 13:00:07.298 DEBUG 1 --- [io-8080-exec-22] c.o.t.m.UserMapper.selectByWechatNum     : <==      Total: 2\n� 2019-01-14 13:00:07.298  INFO 1 --- [io-8080-exec-22] c.o.tuwenzb.controller.UserController    : 已有绑定微信用户:oItvJju9-lQVwLIeXUBfJ0hDWfYA,存入session\n
1547442008000000000 t 2019-01-14 13:00:07.511 DEBUG 1 --- [nio-8080-exec-9] c.o.w.mapper.UserMapper.getUserByUUID    : <==      Total: 1\n
1547442008000000000 dPacketStatus":0,"redPacketViewStatus":0,"type":0,"userId":0,"verifyCode":"","ctime":1547442008000}}\nA 2019-01-14 13:00:08.392  INFO 1 --- [skExecutor-2042] c.o.t.service.impl.WebSocketServiceImpl  : ***************sendSuccessful:{"event":"beta-activity:5326","businessType":"STATISTICS","operateType":"UPDATE","data":{"activityId":5326,"commentNum":241,"contentNum":0,"id":3243,"inviteNum":0,"likeNum":0,"newContentNum":0,"onlineNum":7,"participantsNum":2996,"participantsType":1,"perParticipantsNum":"1,1","photoNum":45,"publishPhotoNum":1,"realCommentNum":2,"realLikeNum":0,"realParticipantsNum":216,"realShareNum":0,"regionStatist":{"北京":62},"rewardNum":0,"shareNum":0}}\n
1547442009000000000 t 2019-01-14 13:00:08.684 DEBUG 1 --- [nio-8080-exec-1] c.o.t.m.A.getByActivityId                : <==      Total: 1\n
1547442025000000000 t 2019-01-14 13:00:24.814 DEBUG 1 --- [nio-8080-exec-2] c.o.w.mapper.UserMapper.getUserByUUID    : <==      Total: 1\n
1547442025000000000 t 2019-01-14 13:00:24.825 DEBUG 1 --- [nio-8080-exec-5] c.o.t.mapper.UserMapper.selectAgencyPin  : <==      Total: 0\n
1547442026000000000 t 2019-01-14 13:00:26.117 DEBUG 1 --- [nio-8080-exec-3] c.o.t.m.A.getByActivityId                : <==      Total: 1\nt 2019-01-14 13:00:26.117 DEBUG 1 --- [nio-8080-exec-1] c.o.t.mapper.InteractMapper.findAll      : <==      Total: 2\n
> 




Every line of my logs begins with 2019-01-14, but not every stored line in InfluxDB does; what's worse, some logs are lost. Why?

@danielnelson danielnelson removed this from the 1.10.0 milestone Feb 4, 2019
@ytzelf

ytzelf commented Feb 13, 2019

Has this been pulled back from 1.10? :(

@glinton
Contributor

glinton commented Feb 13, 2019

There are apparently still errors that need to be worked out, which we don't have time to address before the 1.10 RC.

unexpected truncation in some lines

Every line of my logs begins with 2019-01-14, but not every stored line in InfluxDB does; what's worse, some logs are lost. Why?

@prashanthjbabu
Contributor Author

unexpected truncation in some lines

Every line of my logs begins with 2019-01-14, but not every stored line in InfluxDB does; what's worse, some logs are lost. Why?

@suconghou
Thanks for trying this out! Unfortunately, by design, every new line of your logs will not map to one InfluxDB entry. The Docker API for getting container logs returns a ReadCloser (https://godoc.org/github.com/docker/docker/client#Client.ContainerLogs). Because this is a stream, the plugin reads from it up to a certain MAX buffer and then flushes out an entry. Since it's a stream, I don't think there's any way to ensure that each line you write to stdout/stderr maps to exactly one docker_log entry (suggestions are welcome!).
Regarding your second issue, the missing logs: it seems from your output that your Telegraf "precision" parameter is set to seconds (there are overlapping times, e.g. 1547441987000000000). This can cause log entries for a container to overwrite each other, since the tags are the same per container. Please use nanosecond precision in the Telegraf config file to ensure that you get all logs. This is one possible reason, although I may have missed something else.
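
A minimal sketch of the read loop described above (illustrative names, not the PR's exact code): the stream is read into a fixed-size buffer and each chunk becomes one metric, which is why line boundaries are not preserved.

```go
package docker_log

import (
	"io"

	"github.com/influxdata/telegraf"
)

// tailStream reads a container's log stream in fixed-size chunks and emits
// one docker_log metric per chunk. Chunk boundaries are independent of line
// boundaries, so a log line may be split across entries or combined with others.
func tailStream(acc telegraf.Accumulator, tags map[string]string, src io.Reader) error {
	const logBytesMax = 1024 // assumed buffer size, stands in for LOG_BYTES_MAX
	buf := make([]byte, logBytesMax)
	for {
		n, err := src.Read(buf)
		if n > 0 {
			acc.AddFields("docker_log",
				map[string]interface{}{"message": string(buf[:n])}, tags)
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}
```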

plugins/inputs/docker_log/README.md
plugins/inputs/docker_log/client.go
plugins/inputs/docker_log/docker_logs.go
fields := map[string]interface{}{}
data := make([]byte, LOG_BYTES_MAX)
for {
num, err := logReader.Read(data)
Contributor

Let's get rid of docker log's 8 byte message header when storing this in the field. Ensure it's long enough, then assign data[9:num]

Contributor

Just realized that lines longer than LOG_BYTES_MAX will not have this prefix. I guess we should be reading that log header and consuming the length it specifies.

They use github.com/docker/docker/stdcopy to do so. Perhaps you could look at using that or implementing a similar solution?
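
For reference, when the container has no TTY the Engine API stream is multiplexed: every frame starts with an 8-byte header whose first byte is the stream type (1 = stdout, 2 = stderr) and whose last four bytes hold the payload length as a big-endian uint32. A rough sketch of reading one frame directly (this is essentially what stdcopy does):

```go
package docker_log

import (
	"encoding/binary"
	"io"
)

// readFrame reads one frame of Docker's multiplexed log stream: an 8-byte
// header (stream type in byte 0, payload length in bytes 4-7, big-endian),
// followed by the payload itself.
func readFrame(r io.Reader) (byte, []byte, error) {
	var header [8]byte
	if _, err := io.ReadFull(r, header[:]); err != nil {
		return 0, nil, err
	}
	size := binary.BigEndian.Uint32(header[4:8])
	payload := make([]byte, size)
	if _, err := io.ReadFull(r, payload); err != nil {
		return 0, nil, err
	}
	return header[0], payload, nil
}
```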

Contributor Author

Thanks for pointing this out! I took most of the code from stdcopy and ensured that we're writing to the Telegraf Accumulator instead of an io.Writer. Let me know if this approach works!

Contributor Author

Also, I'm not sure why CI/CD is failing; I'm getting errors at some other location. Would you be able to help me out with this?

Contributor

I've been testing this for a while and still get a space before the log message (different behavior than the docker cli). I wonder if it needs to be:

fields["log"] = buf[stdWriterPrefixLen+1 : frameSize+stdWriterPrefixLen]

Another issue I found is when running a docker container with -t, the stdcopy isn't valid and produces:

E! [inputs.docker_log] : Unrecognized input header: xx

The cli selectively uses stdcopy to strip the header as seen here

Contributor

Regarding the tests, I believe if you merge master into your branch, they should succeed. Also, please refrain from force pushing as there is an ongoing review 😉

Contributor Author

I've been testing this for a while and still get a space before the log message (different behavior than the docker cli). I wonder if it needs to be:

fields["log"] = buf[stdWriterPrefixLen+1 : frameSize+stdWriterPrefixLen]

Another issue I found is when running a docker container with -t, the stdcopy isn't valid and produces:

E! [inputs.docker_log] : Unrecognized input header: xx

The cli selectively uses stdcopy to strip the header as seen here

Fixed the -t issue. I added a pushTtyLogs function based on what containerInspect returns. I noticed that the extra space in front is also present in TTY mode. I think in both cases, after we read, we should probably skip the first character of the string read from the logReader, although I'm not sure if that's the right approach.
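
A sketch of that branching logic (illustrative only, assuming the container is inspected up front): containers started with a TTY return a raw stream with no frame headers, so stdcopy should only be used for non-TTY containers. In a real plugin the two writers would feed the accumulator rather than stdout/stderr.

```go
package main

import (
	"context"
	"io"
	"os"

	"github.com/docker/docker/client"
	"github.com/docker/docker/pkg/stdcopy"
)

// copyLogs demultiplexes the log stream only when the container has no TTY;
// containers started with -t produce a raw stream without frame headers.
func copyLogs(ctx context.Context, cli *client.Client, id string, logs io.Reader) error {
	info, err := cli.ContainerInspect(ctx, id)
	if err != nil {
		return err
	}
	if info.Config.Tty {
		// Raw stream: copy it straight through.
		_, err = io.Copy(os.Stdout, logs)
		return err
	}
	// Multiplexed stream: strip the 8-byte frame headers and split stdout/stderr.
	_, err = stdcopy.StdCopy(os.Stdout, os.Stderr, logs)
	return err
}
```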

Contributor

I believe the first byte in the logReader is used to determine the log type, so dropping it wouldn't be ideal. It seems like it would be OK to skip one more byte on buf before assigning it to fields["log"], though we should double-check it works well with extra-long logs.

Contributor Author

I fixed this in the latest commit. I tested this as well and it seems to be working fine. Let me know if this is okay!

plugins/inputs/docker_log/docker_logs.go
…re first character of string to avoid whitespace in the beginning
plugins/inputs/docker_log/client.go
plugins/inputs/docker_log/docker_logs.go
Contributor

@danielnelson danielnelson left a comment

Thank you for your patience, here is my review.

We should also think about what tests are needed. The functions creating the filters and whatnot don't need tests, I think, but it would be nice to have some tests for the stream parsing.

- docker_log
- tags:
- containerId
- containerName
Contributor

Make all fields and tags snake_case

Contributor Author

Taken care of this!

- containerId
- containerName
- fields:
- log
Contributor

Let's call this message, to be more consistent with the syslog input.

Contributor Author

Done!

}
containers, err := d.client.ContainerList(ctx, d.opts)
if err != nil {
log.Printf("%s : %s ", ERR_PREFIX, err.Error())
Contributor

Since we are returning the error, there is no need to log here. Make this change throughout the pull request.

Contributor Author

Done

defer d.wg.Done()
err := d.getContainerLogs(c, acc)
if err != nil {
log.Printf("%s : %s ", ERR_PREFIX, err.Error())
Contributor

Use acc.AddError here to log the error.

Contributor Author

Done


func (d *DockerLogs) Start(acc telegraf.Accumulator) error {
d.mu.Lock()
defer d.mu.Unlock()
Contributor

No need to lock here, Start will not be called concurrently with any other functions.

I see a few issues with locking; I'll add comments, but here is how you can expect your plugin to be called (these functions will never be called concurrently; see the sketch after this list):

  • Start(): called once
  • Gather(): called one or more times if Start does not return an error
  • Stop(): called once if Start does not return an error.
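
For context, a bare-bones service input skeleton following that lifecycle (a sketch against the telegraf.ServiceInput interface of the time; the bodies are stubs, not the plugin's real code):

```go
package docker_log

import (
	"github.com/influxdata/telegraf"
	"github.com/influxdata/telegraf/plugins/inputs"
)

type DockerLogs struct {
	// configuration fields would go here
}

func (d *DockerLogs) Description() string  { return "Read logs from docker containers" }
func (d *DockerLogs) SampleConfig() string { return "" }

// Start is called once; set up the client and launch background tailers here.
func (d *DockerLogs) Start(acc telegraf.Accumulator) error { return nil }

// Gather is called on every interval after a successful Start; for a streaming
// plugin it mostly refreshes the set of containers being tailed.
func (d *DockerLogs) Gather(acc telegraf.Accumulator) error { return nil }

// Stop is called once on shutdown; close readers and wait for goroutines.
func (d *DockerLogs) Stop() {}

func init() {
	inputs.Add("docker_log", func() telegraf.Input {
		return &DockerLogs{}
	})
}
```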

Contributor Author

Removed lock for Start()


The docker plugin uses the [Official Docker Client](https://github.com/moby/moby/tree/master/client)
to gather logs from the [Engine API](https://docs.docker.com/engine/api/v1.24/).
Note: This plugin works only for containers with the `json-file` or `journald` logging driver.
Contributor

Maybe this also works with local? I also wonder if Docker Enterprise would support this using "dual logging"? @fntlnz Do you know or have a guess?

https://docs.docker.com/config/containers/logging/configure/#limitations-of-logging-drivers

Contributor Author

@prashanthjbabu prashanthjbabu May 27, 2019

Added local to the readme. Not sure about the dual-logging aspects of it.

return nil
}

func (d *DockerLogs) getContainerLogs(
Contributor

I would call this something like readContainerLogs or tailContainerLogs to indicate that it is a long running function and not an immediately returning call.

Contributor Author

Renamed this to tailContainerLogs

Tail: "0",
}
logReader, err := d.client.ContainerLogs(context.Background(), container.ID, logOptions)
d.addToContainerList(container.ID, logReader)
Contributor

Suggest moving this into the containerListUpdate where the rest of the related functions are used. Then we will have add/remove/in functions all in one place.

Must lock d.containerList for this action, since it is called within a goroutine that is not waited for within Gather.

Contributor Author

  1. Moved addToContainerList to containerListUpdate and passed logReader into tailContainerLogs.
  2. addToContainerList already has a mutex lock for d.containerList. Is there anything else I should be adding?

if err != nil {
log.Printf("%s : %s ", ERR_PREFIX, err.Error())
}
d.removeFromContainerList(c.ID)
Contributor

Must lock d.containerList because this is called inside a goroutine that is not waited for.

Contributor Author

Added d.mu.Lock and unlock inside removeFromContainerList. Is there anything else I should be adding?

Contributor

I think this is sufficient, the code that can run concurrently is the container goroutines and the Stop() function.

d.mu.Lock()
defer d.mu.Unlock()
d.stopAllReaders()
d.wg.Wait()
Contributor

I believe we can deadlock here because we don't release the lock until all the goroutines complete. Once we add the required locking as suggested in my other comments, the goroutines will need to wait for this lock so they can remove themselves from the containerList.

Fix is easy though, just release the lock after calling stopAllReaders.

Contributor Author

Moved d.mu.Unlock to after stopAllReaders()
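
A sketch of the resulting shape (illustrative names, trimmed to the locking-relevant fields): the tailing goroutines take the mutex to remove themselves from containerList, so Stop releases the lock before waiting on the WaitGroup.

```go
package docker_log

import (
	"io"
	"sync"
)

type DockerLogs struct {
	mu            sync.Mutex
	wg            sync.WaitGroup
	containerList map[string]io.ReadCloser
}

func (d *DockerLogs) addToContainerList(id string, rc io.ReadCloser) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.containerList[id] = rc
}

func (d *DockerLogs) removeFromContainerList(id string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	delete(d.containerList, id)
}

// stopAllReaders closes every log reader; callers must hold d.mu.
func (d *DockerLogs) stopAllReaders() {
	for _, rc := range d.containerList {
		rc.Close()
	}
}

// Stop closes all readers, releases the lock, and only then waits for the
// tailing goroutines; holding the lock across Wait would deadlock because
// each goroutine needs the lock to remove itself from containerList.
func (d *DockerLogs) Stop() {
	d.mu.Lock()
	d.stopAllReaders()
	d.mu.Unlock()
	d.wg.Wait()
}
```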

fields := map[string]interface{}{}
data := make([]byte, LOG_BYTES_MAX)
for {
num, err := src.Read(data)
Contributor

I think we need to wrap it with bufio.Scanner then.
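
For illustration, wrapping the reader in a bufio.Scanner would emit one metric per log line instead of per read chunk (a sketch with hypothetical names):

```go
package docker_log

import (
	"bufio"
	"io"

	"github.com/influxdata/telegraf"
)

// tailLines emits one docker_log metric per line read from src.
func tailLines(acc telegraf.Accumulator, tags map[string]string, src io.Reader) error {
	scanner := bufio.NewScanner(src)
	for scanner.Scan() {
		acc.AddFields("docker_log",
			map[string]interface{}{"message": scanner.Text()}, tags)
	}
	return scanner.Err()
}
```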

}
}

/* Inspired from https://github.com/moby/moby/blob/master/pkg/stdcopy/stdcopy.go */
Contributor

Don't worry about it for now.

if err != nil {
log.Printf("%s : %s ", ERR_PREFIX, err.Error())
}
d.removeFromContainerList(c.ID)
Contributor

I think this is sufficient, the code that can run concurrently is the container goroutines and the Stop() function.

d.addToContainerList(c.ID, logReader)
err = d.tailContainerLogs(c, logReader, acc)
if err != nil {
acc.AddError(fmt.Errorf("%s : %s ", ERR_PREFIX, err.Error()))
Contributor

Just do acc.AddError(err), the accumulator will add the prefix for you.

Contributor Author

Done!

@danielnelson danielnelson modified the milestones: 1.11.0, 1.12.0 Jun 5, 2019
Contributor

@danielnelson danielnelson left a comment

Found a few more issues while looking into line splitting. I was hoping to get this into 1.11 but I think we need a bit more time to iron out the issues so I have moved it to the 1.12 milestone.

return nil
}
func pushTtyLogs(acc telegraf.Accumulator, tags map[string]string, fields map[string]interface{}, src io.Reader) (err error) {
tags["logType"] = "unknown" //in tty mode we wont be able to differentiate b/w stdout and stderr hence unknown
Contributor

Rename this to log_type.

for {
num, err := src.Read(data)
if num > 0 {
fields["message"] = data[1:num]
Contributor

Why do we start at 1 instead of data[0:num]? In my testing of long lines the first character is cut off:

docker_log,container_image=quay.io/influxdb/telegraf-ci,container_name=cocky_shirley,container_version=1.12.5,host=loaner,stream=stdout container_id="4a09148e7188863b4d4274f8049e1c21ebcef582f94fe670d131930de6c6fc12",message="ONTAINER_PARTIAL_MESSAGE=true ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................" 1559630720000000000

acc.AddFields("docker_log", fields, tags)
}
if err == io.EOF {
fields["message"] = data[1:num]
Contributor

During testing I ran into a crash here, line numbers are a bit off because I had added some print statements, the panic actually occurred on line 273.

panic: runtime error: slice bounds out of range

goroutine 24 [running]:
github.com/influxdata/telegraf/plugins/inputs/docker_log.pushTtyLogs(0x2a03680, 0xc000140640, 0xc0005956b0, 0xc0005956e0, 0x29a5ae0, 0xc0000a6c80, 0x45248c0, 0x0)
        /home/dbn/src/influx/src/github.com/influxdata/telegraf/plugins/inputs/docker_log/docker_logs.go:276 +0x394
github.com/influxdata/telegraf/plugins/inputs/docker_log.(*DockerLogs).tailContainerLogs(0xc00048a380, 0xc000309400, 0x40, 0xc0007160c0, 0x1, 0x4, 0xc0003a02d0, 0x23, 0xc00030a5f0, 0x47, ...)
        /home/dbn/src/influx/src/github.com/influxdata/telegraf/plugins/inputs/docker_log/docker_logs.go:257 +0x5a1
github.com/influxdata/telegraf/plugins/inputs/docker_log.(*DockerLogs).containerListUpdate.func1(0xc00048a380, 0x2a03680, 0xc000140640, 0xc000309400, 0x40, 0xc0007160c0, 0x1, 0x4, 0xc0003a02d0, 0x23, ...)
        /home/dbn/src/influx/src/github.com/influxdata/telegraf/plugins/inputs/docker_log/docker_logs.go:209 +0x220
created by github.com/influxdata/telegraf/plugins/inputs/docker_log.(*DockerLogs).containerListUpdate
        /home/dbn/src/influx/src/github.com/influxdata/telegraf/plugins/inputs/docker_log/docker_logs.go:192 +0x2eb

@sureshsubramaniam

Any possibility to include log data with Kubernetes metadata as well?

@danielnelson
Contributor

@sureshsubramaniam Not sure what you are referring to, can you add a link to the documentation? Kubernetes does add its annotations to the container as labels, so they would be collectible just like in the docker input.

@danielnelson
Contributor

Merging the current pull request, and will follow up with a PR that addresses the remaining comments.

@danielnelson danielnelson merged commit f8bef14 into influxdata:master Jun 18, 2019
bitcharmer pushed a commit to bitcharmer/telegraf that referenced this pull request Oct 18, 2019
athoune pushed a commit to bearstech/telegraf that referenced this pull request Apr 17, 2020
idohalevi pushed a commit to idohalevi/telegraf that referenced this pull request Sep 29, 2020

Successfully merging this pull request may close these issues.

docker log input plugin
10 participants