[filebeat] aws-s3 input falsely detects gzip file as a font #29968

Closed
andrewkroh opened this issue Jan 24, 2022 · 1 comment · Fixed by #29969
Labels
bug, Filebeat, needs_team (indicates that the issue/PR needs a Team:* label)

Comments

@andrewkroh
Member

The aws-s3 input can fail to detect gzip files. We have a file that is falsely matched as application/vnd.ms-fontobject by https://pkg.go.dev/net/http#DetectContentType. I think we can replace the usage of that stdlib function with a direct check for the gzip magic number. Because DetectContentType returns the first signature that matches, this would avoid falsely matching other signatures that are checked before gzip.
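To illustrate the false match, here is a minimal sketch with synthetic data, not the actual file from this report. It assumes the stdlib sniffing table treats the bytes "LP" at offsets 34 and 35 as the vnd.ms-fontobject signature and checks it before gzip. The helper names `sniffedType`, `hasGzipMagic`, and `falsePositiveSample` are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// gzipMagic is the gzip magic number (1f 8b) plus the DEFLATE method byte (08).
var gzipMagic = []byte{0x1F, 0x8B, 0x08}

// sniffedType is a thin wrapper around the stdlib content sniffer.
func sniffedType(buf []byte) string {
	return http.DetectContentType(buf)
}

// hasGzipMagic is the direct check proposed in this issue.
func hasGzipMagic(buf []byte) bool {
	return bytes.HasPrefix(buf, gzipMagic)
}

// plainSample returns 512 bytes that begin with the gzip header bytes and are
// otherwise zero. The payload is synthetic.
func plainSample() []byte {
	buf := make([]byte, 512)
	copy(buf, gzipMagic)
	return buf
}

// falsePositiveSample is plainSample with "LP" placed at offsets 34-35.
// Assumption: that is where the sniffer looks for the font signature.
func falsePositiveSample() []byte {
	buf := plainSample()
	copy(buf[34:], "LP")
	return buf
}

func main() {
	for _, buf := range [][]byte{plainSample(), falsePositiveSample()} {
		fmt.Printf("sniffed=%s magic=%v\n", sniffedType(buf), hasGzipMagic(buf))
	}
}
```

Both samples start with the same three bytes, so the direct prefix check treats them identically, while the sniffer's answer depends on unrelated bytes further into the stream.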

$ git diff s3_objects.go 
diff --git a/x-pack/filebeat/input/awss3/s3_objects.go b/x-pack/filebeat/input/awss3/s3_objects.go
index 7fe6b193fa..ebe1a5f082 100644
--- a/x-pack/filebeat/input/awss3/s3_objects.go
+++ b/x-pack/filebeat/input/awss3/s3_objects.go
@@ -15,7 +15,6 @@ import (
        "fmt"
        "io"
        "io/ioutil"
-       "net/http"
        "reflect"
        "strings"
        "time"
@@ -375,18 +374,13 @@ func s3ObjectHash(obj s3EventV2) string {
 // stream without consuming it. This makes it convenient for code executed after this function call
 // to consume the stream if it wants.
 func isStreamGzipped(r *bufio.Reader) (bool, error) {
-       // Why 512? See https://godoc.org/net/http#DetectContentType
-       buf, err := r.Peek(512)
+       buf, err := r.Peek(3)
        if err != nil && err != io.EOF {
                return false, err
        }
 
-       switch http.DetectContentType(buf) {
-       case "application/x-gzip", "application/zip":
-               return true, nil
-       default:
-               return false, nil
-       }
+       // gzip magic number (1f 8b) and the compression method (08 for DEFLATE).
+       return bytes.HasPrefix(buf, []byte{0x1F, 0x8B, 0x08}), nil
 }
 
 // s3Metadata returns a map containing the selected S3 object metadata keys.
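
For anyone who wants to try the patched check outside of Filebeat, the following is a self-contained sketch. `isStreamGzipped` is copied from the diff above; `readObject`, `readObjectBytes`, and `gzipBytes` are hypothetical helpers added only for this demonstration and are not part of the awss3 input.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// isStreamGzipped peeks at the first three bytes without consuming them, so
// the caller can still read the stream from the start afterwards.
func isStreamGzipped(r *bufio.Reader) (bool, error) {
	buf, err := r.Peek(3)
	if err != nil && err != io.EOF {
		return false, err
	}
	// gzip magic number (1f 8b) and the compression method (08 for DEFLATE).
	return bytes.HasPrefix(buf, []byte{0x1F, 0x8B, 0x08}), nil
}

// readObject (hypothetical) returns the decompressed content when the stream
// is gzipped and the raw content otherwise.
func readObject(src io.Reader) ([]byte, error) {
	br := bufio.NewReader(src)
	gzipped, err := isStreamGzipped(br)
	if err != nil {
		return nil, err
	}
	if !gzipped {
		return io.ReadAll(br)
	}
	zr, err := gzip.NewReader(br)
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// readObjectBytes (hypothetical) is a convenience wrapper for in-memory input.
func readObjectBytes(p []byte) ([]byte, error) {
	return readObject(bytes.NewReader(p))
}

// gzipBytes (hypothetical) compresses p so the demo has gzip input to read.
func gzipBytes(p []byte) []byte {
	var b bytes.Buffer
	zw := gzip.NewWriter(&b)
	zw.Write(p) // writes to a bytes.Buffer do not fail
	zw.Close()
	return b.Bytes()
}

func main() {
	inputs := [][]byte{gzipBytes([]byte("hello")), []byte("hello"), {}}
	for _, in := range inputs {
		out, err := readObjectBytes(in)
		fmt.Printf("%q %v\n", out, err)
	}
}
```

Streams shorter than three bytes, including empty ones, make Peek return io.EOF, which the function deliberately ignores, so such input is reported as not gzipped.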
andrewkroh added a commit to andrewkroh/beats that referenced this issue Jan 24, 2022
Directly check the byte stream for the gzip magic number and deflate
compression type. Avoid using http.DetectContentType because it returns
the first match it finds while checking many signatures.

Closes elastic#29968
mergify bot pushed a commit that referenced this issue Jan 24, 2022
Directly check the byte stream for the gzip magic number and deflate
compression type. Avoid using http.DetectContentType because it returns
the first match it finds while checking many signatures.

Closes #29968

(cherry picked from commit 61a7d36)