[Filebeat] aws-s3 drops data when files do not end with EOL #30436

Closed
andrewkroh opened this issue Feb 16, 2022 · 5 comments · Fixed by #33568
Labels
bug Filebeat Filebeat libbeat Team:Cloud-Monitoring Label for the Cloud Monitoring team v8.3.0

Comments

@andrewkroh
Member

The Filebeat aws-s3 input should return the final line of a file even if that line does not end in an EOL. When it reaches EOF, it should flush any remaining bytes, even if they are not terminated by an EOL. Currently, if the final line in a file does not end in an EOL, that data is dropped. This does not affect the aws-s3 input when reading JSON, because that path uses its own streaming JSON reader.

To read log files, the input uses readfile.LineReader. It was designed for log files that can be appended to, so it waits for the EOL before flushing a log line. With S3, however, the data should be considered immutable, and the reader should flush any buffered data after io.EOF is returned.

Failing Test Case

(Apply this with git apply test-case.patch.)

diff --git a/x-pack/filebeat/input/awss3/s3_objects_test.go b/x-pack/filebeat/input/awss3/s3_objects_test.go
index 4ab3edfaa4..375ed35c84 100644
--- a/x-pack/filebeat/input/awss3/s3_objects_test.go
+++ b/x-pack/filebeat/input/awss3/s3_objects_test.go
@@ -216,6 +216,10 @@ func TestS3ObjectProcessor(t *testing.T) {
                err := s3ObjProc.Create(ctx, logp.NewLogger(inputName), ack, s3Event).ProcessS3Object()
                require.NoError(t, err)
        })
+
+       t.Run("text file without end of line marker", func(t *testing.T) {
+               testProcessS3Object(t, "testdata/no-eol.txt", "text/plain", 1)
+       })
 }
 
 func testProcessS3Object(t testing.TB, file, contentType string, numEvents int, selectors ...fileSelectorConfig) []beat.Event {
diff --git a/x-pack/filebeat/input/awss3/testdata/no-eol.txt b/x-pack/filebeat/input/awss3/testdata/no-eol.txt
new file mode 100644
index 0000000000..0b7757db86
--- /dev/null
+++ b/x-pack/filebeat/input/awss3/testdata/no-eol.txt
@@ -0,0 +1 @@
+This file does contain a final EOL.
\ No newline at end of file
@andrewkroh andrewkroh added bug Filebeat Filebeat libbeat Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team labels Feb 16, 2022
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@jlind23
Collaborator

jlind23 commented Mar 23, 2022

@elastic/obs-cloud-monitoring this seems to belong to your area instead. May I move it?

@kaiyan-sheng
Contributor

@jlind23 I think @andrewkroh is suggesting this might be something to fix in libbeat 🤔

@cmacknz
Member

cmacknz commented Mar 29, 2022

I had a look at this to see what is going on.

The root cause is that the aws-s3 input uses the readfile.LineReader function defined in libbeat which does not handle line endings in a way that is compatible with S3.

The fix would involve using a different package/function to read lines from S3, or modifying readfile.LineReader in a way to make it compatible with S3. This isn't a bug in the libbeat readfile package.

Whoever owns the S3 input should fix this.

@kaiyan-sheng
Contributor

Thank you @cmacknz ! We will take a look to triage this one.

@kaiyan-sheng kaiyan-sheng added the Team:Cloud-Monitoring Label for the Cloud Monitoring team label Mar 30, 2022
@jlind23 jlind23 removed the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Apr 29, 2022
5 participants