-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enhancement(sources): Allow line aggregation to never timeout #4951
Conversation
I broke this off from the AWS S3 source PR as it seemed like it could be reviewed separately. In the context of AWS S3, and probably other sources, I think it is useful to have the line aggregation logic never timeout and aggregate what it has; instead just blocking until it has a full line. This change makes `timeout_ms` on the multiline config optional, and defaults to never timing out. Signed-off-by: Jesse Szwedko <[email protected]>
Signed-off-by: Jesse Szwedko <[email protected]>
Signed-off-by: Jesse Szwedko <[email protected]>
Signed-off-by: Jesse Szwedko <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks good but it makes me a little nervous to change the default behavior to no timeout. That means that with misconfigured patterns vector would accumulate lines in memory indefinitely, which could be confusing to diagnose.
I have the same concerns. As I understand, in the config we can's do |
Signed-off-by: Jesse Szwedko <[email protected]>
Yeah, unfortunately TOML doesn't have a concept of "null": toml-lang/toml#30. It's a little verbose, but I pushed a commit with a custom (de)serializer that will handle integers or the string This allows configs like: data_dir = "/tmp/vector"
[sources.my_source]
type = "file"
include = ["/var/log/*.log"] # required
multiline.condition_pattern = "^[\\s]+" # required
multiline.mode = "continue_through" # required
multiline.start_pattern = "^[^\\s]" # required
multiline.timeout_ms = "none"
[sinks.my_sink]
type = "console"
inputs = ["my_source"]
encoding.codec = "json" I originally tried to implement this generically to let it be used with We could maybe macro-ize it if we find us reaching for this pattern a lot. |
After some more thought and discussion with @binarylogic , I've come around to the idea that not having a timeout for multiline is probably going to cause more user issues than it solves. We can always resurrect this if we find a need. Thanks for reviewing both! |
I broke this off from the AWS S3 source PR as it seemed like it could be
reviewed separately. In the context of AWS S3, and probably other
sources, I think it is useful to have the line aggregation logic never
timeout and aggregate what it has; instead just blocking until it has
a full "line".
This change makes
timeout_ms
on the multiline config optional, anddefaults to never timing out.
Signed-off-by: Jesse Szwedko [email protected]