vector 0.18.1 s3 sink does not use batch.max_bytes, creates small files on S3 #10535

Closed
hvnsweeting opened this issue Dec 21, 2021 · 3 comments
Labels
sink: aws_s3 (Anything `aws_s3` sink related), type: bug (A code related bug.)

Comments

@hvnsweeting

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Vector Version

0.18.1

Vector Configuration File

[api]
enabled = true
[sources.generate_syslog]
type = "demo_logs"
## for 0.16
#type = "generator"
format = "syslog"
count = 100000000
interval = 0.000001

[transforms.remap_syslog]
inputs = [ "generate_syslog"]
type = "remap"
source = '''
  structured = parse_syslog!(.message)
  . = merge(., structured)
'''

[sinks.my_sink_id]
type = "aws_s3"
inputs = [ "remap_syslog" ]
bucket = "bucket-dev-logs"
key_prefix = "zzzz-service/dt=%Y-%m-%d-%H/"
compression = "gzip"
region = "us-east-1"
encoding.codec = "ndjson"

batch.max_bytes = 292132131

Debug Output

Expected Behavior

2021-12-21 08:12:25 33340416 1640074344-914149b8-8475-49d6-a406-1c63a82e8f2d.log.gz

Actual Behavior

2021-12-21 08:10:43 4310429 1640074242-bcd94281-e354-4008-921a-ad0d5ec45661.log.gz

Example Data

Additional Context

With the same config file (apart from the small edit noted above), Vector 0.16.1 creates files about 10x larger than 0.18.1 does, and after gunzipping a file, the extracted size is close to the batch.max_bytes value. Batching works as expected in 0.16.1 but seems broken from 0.17 onward.

References

@hvnsweeting hvnsweeting added the "type: bug" label Dec 21, 2021
@hvnsweeting hvnsweeting changed the title from "vector 0.18.1 s3 sink does not use batch,max_bytes, creates small files on S3" to "vector 0.18.1 s3 sink does not use batch.max_bytes, creates small files on S3" Dec 21, 2021
@jszwedko jszwedko added the "sink: aws_s3" label Dec 21, 2021
@jszwedko jszwedko added this to the Vector 0.20.0 milestone Dec 27, 2021
@jszwedko
Member

Thanks @hvnsweeting! We were aware that batching would be less optimal in the 0.17 release, but did not expect the difference to be of the magnitude you are seeing. We will investigate this and improve it by 0.20.

@jszwedko
Member

As a workaround, you could increase the batch size here, knowing that it will overcompensate.
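
For illustration, the workaround might look like the sink section below. This is only a sketch: the 8x multiplier is an assumption taken from the two file listings in this report (33340416 versus 4310429 bytes compressed, a ratio of roughly 8), not a figure suggested by the maintainers, and the right value would need to be found by testing. A larger batch may also increase Vector's memory use.

```toml
[sinks.my_sink_id]
type = "aws_s3"
inputs = [ "remap_syslog" ]
bucket = "bucket-dev-logs"
key_prefix = "zzzz-service/dt=%Y-%m-%d-%H/"
compression = "gzip"
region = "us-east-1"
encoding.codec = "ndjson"

# Assumed value for illustration: 8 x the original 292132131.
# Tune this against the object sizes actually observed on S3.
batch.max_bytes = 2337057048
```

Once the batching behavior is fixed in a later release, a value inflated this way would likely need to be lowered again to avoid oversized objects.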

@jszwedko
Member

jszwedko commented Aug 1, 2023

Rolling this into #10020

@jszwedko jszwedko closed this as completed Aug 1, 2023