out_cloudwatch seems to have an integer overflow problem with timestamps on ARMv7 #3640

Closed
nbertram opened this issue Jun 15, 2021 · 4 comments · Fixed by #3648
@nbertram
Contributor

Bug Report

Describe the bug
I'm using a custom build (off master) of fluent-bit on armhf (ARMv7, 32-bit) and am unable to ship logs to CloudWatch. I checked the parsing with the stdout output and the timestamp is parsed correctly, but when outputting to CloudWatch every event is rejected as "too old".

After putting mitmproxy in the middle, I can see that the request sent to CloudWatch looks something like this:

{"logGroupName":"test","logStreamName":"test","logEvents":[{"timestamp":252251535,"message":"{\"message\":\"dummy\"}"},{"timestamp":252252535,"message":"{\"message\":\"dummy\"}"},{"timestamp":252253535,"message":"{\"message\":\"dummy\"}"},{"timestamp":252254535,"message":"{\"message\":\"dummy\"}"},{"timestamp":252255535,"message":"{\"message\":\"dummy\"}"},{"timestamp":252256535,"message":"{\"message\":\"dummy\"}"},{"timestamp":252257535,"message":"{\"message\":\"dummy\"}"},{"timestamp":252258535,"message":"{\"message\":\"dummy\"}"},{"timestamp":252259535,"message":"{\"message\":\"dummy\"}"},{"timestamp":252260535,"message":"{\"message\":\"dummy\"}"}]}

which generates this error:

[2021/06/15 09:39:28] [error] [output:cloudwatch_logs:cloudwatch_logs.0] Could not find sequence token in response: {"rejectedLogEventsInfo":{"tooOldLogEventEndIndex":10}}

The timestamp in the JSON, 252251535, falls in early January 1970 when taken as milliseconds (a little under three days after the epoch), so AWS' complaint is valid.
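As a rough sanity check of that arithmetic, here is a minimal sketch. The helper names are hypothetical, and the 14-day limit reflects my understanding of the documented PutLogEvents constraint (a log group's retention period can make the window shorter). The "now" value used in the note below assumes the log line's wall-clock time, 2021/06/15 09:39:28, is UTC.

```c
#include <stdint.h>

#define MS_PER_DAY   86400000ULL
/* Assumption: PutLogEvents rejects events older than 14 days
 * (or older than the log group's retention period, if shorter). */
#define MAX_AGE_DAYS 14ULL

/* Whole days between the Unix epoch and a millisecond timestamp. */
static uint64_t days_since_epoch(uint64_t ts_ms)
{
    return ts_ms / MS_PER_DAY;
}

/* Returns 1 if the event is older than the assumed 14-day window. */
static int event_too_old(uint64_t ts_ms, uint64_t now_ms)
{
    return now_ms > ts_ms && (now_ms - ts_ms) > MAX_AGE_DAYS * MS_PER_DAY;
}
```

`days_since_epoch(252251535)` gives 2 whole days, and `event_too_old(252251535, 1623749968000)` is true, which is consistent with every event in the batch being rejected.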

As a wild guess, and after a bit of experimentation, timestamps around 252251535 look like exactly what you get when the current time in seconds is multiplied by 1000 in a 32-bit integer and the result overflows, probably here:

event->timestamp = (unsigned long long) (tms->tm.tv_sec * 1000 +
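To make the suspected wraparound concrete, here is a minimal sketch (not the plugin's code). It assumes `tv_sec` is 32 bits wide on this build. The inputs mentioned in the note below (1623749889 seconds, 423000000 nanoseconds) are hypothetical, picked because they land on the first timestamp in the captured payload. Both function names are made up for illustration, and unsigned types are used so that the wraparound is well defined in C.

```c
#include <stdint.h>

/* Simulates the suspected 32-bit arithmetic: the result is reduced
 * modulo 2^32 before it is ever widened. */
static uint32_t ms_wrapped_32(uint32_t sec, uint32_t nsec)
{
    return (uint32_t) (sec * 1000u + nsec / 1000000u);
}

/* The same conversion carried out in 64 bits from the start. */
static uint64_t ms_full_64(uint32_t sec, uint32_t nsec)
{
    return (uint64_t) sec * 1000u + nsec / 1000000u;
}
```

With those inputs, `ms_wrapped_32` returns 252251535 while `ms_full_64` returns 1623749889423, which is a mid-June 2021 time in milliseconds.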

To Reproduce

  • Example log message if applicable:
[2021/06/15 09:39:28] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] PutLogEvents http status=200
[2021/06/15 09:39:28] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sent events to test
[2021/06/15 09:39:28] [error] [output:cloudwatch_logs:cloudwatch_logs.0] Could not find sequence token in response: {"rejectedLogEventsInfo":{"tooOldLogEventEndIndex":10}}
[2021/06/15 09:39:28] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sent 10 events to CloudWatch
[2021/06/15 09:39:28] [debug] [out coro] cb_destroy coro_id=7
[2021/06/15 09:39:28] [debug] [task] destroy task=0xf6c34c60 (task_id=0)
[2021/06/15 09:39:38] [debug] [task] created task=0xf6c34c60 id=0 OK
  • Steps to reproduce the problem:

On a 32-bit build of fluent-bit, try a config like this:

[INPUT]
    name dummy
    tag  dummy

[OUTPUT]
    name  cloudwatch_logs
    match *
    region ap-southeast-2
    log_group_name test
    log_stream_name test
    auto_create_group Off

Expected behavior

Log entries successfully delivered to CloudWatch.

Screenshots
N/A

Your Environment

  • Version used: v1.8.0 (built off master yesterday)
  • Configuration: as above
  • Environment name and version (e.g. Kubernetes? What version?): Docker container on Balena
  • Server type and version: Debian Buster, armhf arch
  • Operating System and version: balenaOS 2.73.1+rev1
  • Filters and plugins: None

Additional context
Tried to use the stock debs for Raspbian Buster armhf, but they caused a "Bus Error" bail on non-Raspbian, so I've built it myself. If I've built it missing flags for 32 bit compatibility, then I profusely apologise!

I'm not sure if fluent-bit actually aims for 32 bit compatibility, though because there are stock builds for ARMv7 I presumed it should work.

@PettitWesley
Contributor

Thanks for this detailed report!

@PettitWesley self-assigned this Jun 15, 2021
@PettitWesley added the AWS (Issues with AWS plugins or experienced by users running on AWS) label Jun 15, 2021
@nbertram
Contributor Author

No problem! We're all developers here...

With a bit of experimentation, it seems this fixes it:

event->timestamp = (unsigned long long) (tms->tm.tv_sec * 1000ull + 

though I'm not sure how precious you are about relying on C99 syntax. I guess the same kind of compiler hint could be had by casting the literal (or tv_sec), though that may depend on the compiler. Certainly I suspect GCC is your target for all ARMv7 builds at the moment.
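For comparison, here is a sketch of the two spellings side by side: the `ull` literal and a cast on the seconds operand. The function names are hypothetical and `int32_t` stands in for a 32-bit `tv_sec`, which is an assumption about this platform. As far as I know, both the `ull` suffix and the `unsigned long long` type come from C99, and the original line already casts to `unsigned long long`, so neither form should add a new language requirement. The widening in both cases follows from the usual arithmetic conversions rather than from compiler-specific behaviour.

```c
#include <stdint.h>

/* Variant A: the literal is unsigned long long, so sec is converted
 * to unsigned long long before the multiplication happens. */
static unsigned long long ms_with_literal(int32_t sec, int32_t nsec)
{
    return sec * 1000ull + (unsigned long long) (nsec / 1000000);
}

/* Variant B: cast the seconds operand; the int literal 1000 is then
 * converted to unsigned long long as well. */
static unsigned long long ms_with_cast(int32_t sec, int32_t nsec)
{
    return (unsigned long long) sec * 1000 + (unsigned long long) (nsec / 1000000);
}
```

Both variants return 1623749889423 for 1623749889 seconds and 423000000 nanoseconds. Note that casting the finished product, as the original line does, comes too late: the truncation has already happened by then.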

Of course there may be other places in fluent-bit where timestamps are mishandled with the timespec values being 32 bits. A casual grep didn't find any similar multiplications in other drivers. Ultimately I guess running the test suite on ARMv7 may help discover these. I wasn't sure whether the test suite would catch this bug as it stands. I haven't tried yet.
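One architecture-independent way a test might flag this class of bug is a round-trip check: converting seconds to milliseconds and dividing by 1000 should give the seconds back. The sketch below is only an illustration of that idea. The two conversions are stand-ins I wrote to simulate the truncated and widened arithmetic, not functions from fluent-bit, and all names are hypothetical.

```c
#include <stdint.h>

typedef uint64_t (*to_ms_fn)(int32_t sec, int32_t nsec);

/* Returns 1 when converting to milliseconds and back recovers sec. */
static int ms_roundtrip_ok(to_ms_fn to_ms, int32_t sec, int32_t nsec)
{
    return to_ms(sec, nsec) / 1000u == (uint64_t) sec;
}

/* Stand-in for a conversion whose product is truncated to 32 bits. */
static uint64_t to_ms_truncated(int32_t sec, int32_t nsec)
{
    return (uint32_t) ((uint32_t) sec * 1000u + (uint32_t) nsec / 1000000u);
}

/* Stand-in for a conversion that widens before multiplying. */
static uint64_t to_ms_widened(int32_t sec, int32_t nsec)
{
    return (uint64_t) sec * 1000u + (uint64_t) (nsec / 1000000);
}
```

For a 2021 value such as 1623749889 seconds the check fails for `to_ms_truncated` and passes for `to_ms_widened`. On a host where `tv_sec` is 64 bits wide the real code would presumably pass either way, which is why running the suite on ARMv7 itself still matters.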

I'm happy to work on a PR for this with a bit of guidance if you like.

@PettitWesley
Contributor

PettitWesley commented Jun 15, 2021

@nbertram Please do open a PR. Eduardo can help with/comment on the right way to do this in the PR. Unfortunately cross platform/compiler support is a little bit outside of my area of expertise.

There is CI set up on pull requests against the master branch, which tests with a bunch of different compilers and architectures IIRC. If your suggested change passes that, and Eduardo's review, then we're good to go. I will make sure he sees this PR.

The contributing guide says that in general we follow these recommendations, not sure if any of that applies here: https://httpd.apache.org/dev/styleguide.html

@github-actions
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
