[Filebeat] aws-s3 drops data when files do not end with EOL #30436
Comments
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
@elastic/obs-cloud-monitoring it rather seems to belong to your area. May I move it?
@jlind23 I think @andrewkroh is suggesting this might be something to fix in libbeat 🤔
I had a look at this to see what is going on. The root cause is that the aws-s3 input uses readfile.LineReader from libbeat, which does not handle line endings in a way that suits S3 objects. The fix would involve either using a different package or function to read lines from S3, or modifying the existing line reader. Whoever owns the S3 input should fix this.
Thank you @cmacknz! We will take a look to triage this one.
The Filebeat aws-s3 input should return the last line of a file even if it does not end in an EOL. When it reaches EOF, it should flush any remaining bytes even if they are not followed by an EOL terminator. Currently, if the final line in a file does not end in an EOL, that data is dropped and lost. This does not affect the aws-s3 input when reading JSON, because that path uses its own streaming JSON reader.
To read log files, the input uses readfile.LineReader. It was designed for log files that can be appended to, so it waits for the EOL before flushing a log line. With S3, however, the data should be considered immutable, and the reader should flush any buffered data after io.EOF is returned.
Failing Test Case
(Apply this with git apply test-case.patch.)