Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhancement(sources): Allow line aggregation to never timeout #4951

Closed
wants to merge 7 commits into from

Conversation

jszwedko
Copy link
Member

I broke this off from the AWS S3 source PR as it seemed like it could be
reviewed separately. In the context of AWS S3, and probably other
sources, I think it is useful to have the line aggregation logic never
timeout and aggregate what it has; instead just blocking until it has
a full "line".

This change makes timeout_ms on the multiline config optional, and
defaults to never timing out.

Signed-off-by: Jesse Szwedko [email protected]

I broke this off from the AWS S3 source PR as it seemed like it could be
reviewed separately. In the context of AWS S3, and probably other
sources, I think it is useful to have the line aggregation logic never
timeout and aggregate what it has; instead just blocking until it has
a full line.

This change makes `timeout_ms` on the multiline config optional, and
defaults to never timing out.

Signed-off-by: Jesse Szwedko <[email protected]>
Signed-off-by: Jesse Szwedko <[email protected]>
Signed-off-by: Jesse Szwedko <[email protected]>
@jszwedko jszwedko requested a review from fanatid as a code owner November 10, 2020 19:35
Signed-off-by: Jesse Szwedko <[email protected]>
Copy link
Member

@lukesteensen lukesteensen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks good but it makes me a little nervous to change the default behavior to no timeout. That means that with misconfigured patterns vector would accumulate lines in memory indefinitely, which could be confusing to diagnose.

@MOZGIII
Copy link
Contributor

MOZGIII commented Nov 11, 2020

The implementation looks good but it makes me a little nervous to change the default behavior to no timeout. That means that with misconfigured patterns vector would accumulate lines in memory indefinitely, which could be confusing to diagnose.

I have the same concerns. As I understand, in the config we can's do option = null, so we can't allow users to set Option::None is we have an Option::Some default, is this right?

@jszwedko
Copy link
Member Author

I have the same concerns. As I understand, in the config we can's do option = null, so we can't allow users to set Option::None is we have an Option::Some default, is this right?

Yeah, unfortunately TOML doesn't have a concept of "null": toml-lang/toml#30.

It's a little verbose, but I pushed a commit with a custom (de)serializer that will handle integers or the string "none" to disable the timeout. I also added a default of 1000ms.

This allows configs like:

data_dir = "/tmp/vector"

[sources.my_source]
  type = "file"
  include = ["/var/log/*.log"] # required

  multiline.condition_pattern = "^[\\s]+" # required
  multiline.mode = "continue_through" # required
  multiline.start_pattern = "^[^\\s]" # required
  multiline.timeout_ms = "none"

[sinks.my_sink]
  type = "console"
  inputs = ["my_source"]
  encoding.codec = "json"

I originally tried to implement this generically to let it be used with Option<T: Deserialize + Serialize>, but I couldn't get that working (see conversation starting here: https://discord.com/channels/742820443487993987/751107681217019944/776138635262558249).

We could maybe macro-ize it if we find us reaching for this pattern a lot.

@jszwedko
Copy link
Member Author

After some more thought and discussion with @binarylogic , I've come around to the idea that not having a timeout for multiline is probably going to cause more user issues than it solves. We can always resurrect this if we find a need.

Thanks for reviewing both!

@jszwedko jszwedko closed this Nov 13, 2020
@binarylogic binarylogic deleted the optional-line-agg-timeout branch August 3, 2021 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants