utils: parse quotes when splitting strings #6387

Merged: 5 commits merged on May 3, 2023

@tsaarni tsaarni commented Nov 11, 2022

This change makes it possible to add configuration parameters that contain whitespace by adding support for quotes. For example, a user might want to use the add operation in the modify filter to add a value that contains whitespace:

[FILTER]
    name modify
    match *
    condition key_value_matches message .*FOO.*
    add facility "audit log"

When processing the string, the parser peeks at the first non-separator character. If it is a quote, the parser expects a quoted string and reads until the end-quote. Otherwise it processes the token as before. Within a quoted string, a quote can be escaped with \" or \'.
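
To make the parsing rule above concrete, here is a minimal sketch in C. It is not the code from this change: `next_token()` and its fixed-size output buffer are hypothetical names chosen only for illustration. It also assumes that only an escaped quote matching the opening quote is unescaped.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the code from this PR: copy the next token of
 * `str` into `out` and return the number of input characters consumed.
 * Returns -1 if there are no more tokens, the quoted string is not
 * terminated, or `out` is too small.
 */
static int next_token(const char *str, char sep, char *out, size_t out_size)
{
    const char *p = str;
    char quote = '\0';
    size_t n = 0;

    assert(str != NULL && out != NULL && out_size > 0);

    /* Skip leading separators */
    while (*p == sep) {
        p++;
    }
    if (*p == '\0') {
        return -1;
    }

    /* Peek at the first non-separator character */
    if (*p == '"' || *p == '\'') {
        quote = *p++;
    }

    while (*p != '\0') {
        if (quote != '\0') {
            if (p[0] == '\\' && p[1] == quote) {
                p++;        /* escaped quote: drop the backslash */
            }
            else if (*p == quote) {
                break;      /* end-quote reached */
            }
        }
        else if (*p == sep) {
            break;          /* unquoted token ends at the separator */
        }
        if (n + 1 >= out_size) {
            return -1;
        }
        out[n++] = *p++;
    }

    if (quote != '\0') {
        if (*p != quote) {
            return -1;      /* unterminated quoted string */
        }
        p++;                /* consume the end-quote */
    }
    out[n] = '\0';
    return (int)(p - str);
}
```

Calling it repeatedly on `aa "bb cc" dd` with a space separator, advancing by the return value each time, yields `aa`, `bb cc`, and `dd`.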

Note 1: An alternative approach would be to create a new version of flb_utils_split() to make absolutely sure that quotes are honored only for config files and not when splitting other input. This change does not currently do that, and all test cases pass.

Note 2: As far as I can tell, the current lack of quote support is not documented. I'm unsure whether this change requires documentation.

Addresses #1225, #4286, #2415


Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

tsaarni commented Nov 11, 2022

$ cat fluent-bit.conf
[INPUT]
    name tcp

[OUTPUT]
    name stdout

[FILTER]
    name modify
    match *
    condition key_value_matches message .*FOO.*
    add facility "audit log"

$ valgrind fluent-bit -v --config=fluent-bit.conf
==262966== Memcheck, a memory error detector
==262966== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==262966== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==262966== Command: fluent-bit -v --config=fluent-bit.conf
==262966== 
Fluent Bit v2.0.5
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/11/11 11:25:53] [ info] Configuration:
[2022/11/11 11:25:53] [ info]  flush time     | 1.000000 seconds
[2022/11/11 11:25:53] [ info]  grace          | 5 seconds
[2022/11/11 11:25:53] [ info]  daemon         | 0
[2022/11/11 11:25:53] [ info] ___________
[2022/11/11 11:25:53] [ info]  inputs:
[2022/11/11 11:25:53] [ info]      tcp
[2022/11/11 11:25:53] [ info] ___________
[2022/11/11 11:25:53] [ info]  filters:
[2022/11/11 11:25:53] [ info]      modify.0
[2022/11/11 11:25:53] [ info] ___________
[2022/11/11 11:25:53] [ info]  outputs:
[2022/11/11 11:25:53] [ info]      stdout.0
[2022/11/11 11:25:53] [ info] ___________
[2022/11/11 11:25:53] [ info]  collectors:
[2022/11/11 11:25:53] [ info] [fluent bit] version=2.0.5, commit=b4db5b6c67, pid=262966
[2022/11/11 11:25:53] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2022/11/11 11:25:53] [ info] [storage] ver=1.3.0, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2022/11/11 11:25:53] [ info] [cmetrics] version=0.5.7
[2022/11/11 11:25:53] [ info] [output:stdout:stdout.0] worker #0 started
[2022/11/11 11:25:53] [ info] [ctraces ] version=0.2.5
[2022/11/11 11:25:53] [ info] [input:tcp:tcp.0] initializing
[2022/11/11 11:25:53] [ info] [input:tcp:tcp.0] storage_strategy='memory' (memory only)
[2022/11/11 11:25:53] [debug] [tcp:tcp.0] created event channels: read=21 write=22
[2022/11/11 11:25:53] [debug] [downstream] listening on 0.0.0.0:5170
[2022/11/11 11:25:53] [debug] [filter:modify:modify.0] Creating regex for condition B : condition key_value_matches message .*FOO.* : .*FOO.*
[2022/11/11 11:25:53] [debug] [filter:modify:modify.0] Initialized modify filter with 1 conditions and 1 rules
[2022/11/11 11:25:53] [debug] [stdout:stdout.0] created event channels: read=24 write=25
[2022/11/11 11:25:53] [ info] [sp] stream processor started
[2022/11/11 11:25:56] [debug] [filter:modify:modify.0] Match for condition KEY_VALUE_MATCHES .*FOO.*
[2022/11/11 11:25:56] [debug] [input chunk] update output instances with new chunk size diff=56
[0] tcp.0: [1668158756.143888053, {"message"=>"Hello FOO world!", "facility"=>"audit log"}]
[2022/11/11 11:25:56] [debug] [task] created task=0x504a270 id=0 OK
[2022/11/11 11:25:56] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[2022/11/11 11:25:56] [debug] [out flush] cb_destroy coro_id=0
[2022/11/11 11:25:56] [debug] [task] destroy task=0x504a270 (task_id=0)
^C[2022/11/11 11:26:01] [engine] caught signal (SIGINT)
[2022/11/11 11:26:01] [ warn] [engine] service will shutdown in max 5 seconds
[2022/11/11 11:26:01] [ info] [engine] service has stopped (0 pending tasks)
[2022/11/11 11:26:01] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2022/11/11 11:26:01] [ info] [output:stdout:stdout.0] thread worker #0 stopped
==262966== 
==262966== HEAP SUMMARY:
==262966==     in use at exit: 0 bytes in 0 blocks
==262966==   total heap usage: 1,639 allocs, 1,639 frees, 917,490 bytes allocated
==262966== 
==262966== All heap blocks were freed -- no leaks are possible
==262966== 
==262966== For lists of detected and suppressed errors, rerun with: -s
==262966== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

# run the following in another terminal while fluent-bit is running to feed the tcp input
$ echo '{ "message": "Hello FOO world!" }' > /dev/tcp/localhost/5170

tsaarni commented Nov 11, 2022

The fuzzer seems to have produced a SIGSEGV. I'm not sure how to interpret the output or how to reproduce the crash. The logs contain a base64-encoded fuzzed config file, so I tried the following:

$ echo W1BdU10KTG5hbWUgagpMdHlwZXMgJyVaCkxmb3JtYXQgbHRzdgo= | base64 -d > fuzzer.conf
$ cmake -DSANITIZE_ADDRESS=On ..
$ make
$ bin/fluent-bit --config=fuzzer.conf

but it did not reproduce the crash.

tsaarni commented Nov 23, 2022

Just a friendly ping, any feedback for this PR would be greatly appreciated :)

The change makes it possible to have parameters with whitespace in them.
For example, user might want to use Add operation in Modify filter to add
JSON value with whitespace in it.

Signed-off-by: Tero Saarni <[email protected]>
@shobanakupuraj

We have the same issue: we could not use the modify filter with values that contain spaces. This change works for our scenario too.
Would it be possible to have this fix merged?

tsaarni commented Jan 3, 2023

Merged master to pick up the test from @nokute78.

nokute78 previously approved these changes Jan 4, 2023
Collaborator

@nokute78 nokute78 left a comment

@tsaarni Thank you for the contribution.

I approved since the unit tests that I added for flb_utils_split passed.
#6623

@edsiper Could you merge this?

tsaarni commented Feb 8, 2023

Hi @edsiper, when you have time, it would be greatly appreciated if you could have a look at this PR. Thank you!

tsaarni commented Apr 18, 2023

Please hold, I will work on this some more: I will change the approach so that the existing behavior of flb_utils_split() is kept.

Cc @leonardo-albertovich

tsaarni commented Apr 18, 2023

I've now updated the PR by adding a separate function flb_utils_split_quoted() which parses quoted strings into tokens, while the existing flb_utils_split() keeps the old behavior.

For now, only the modify plugin is calling the new flb_utils_split_quoted() to solve the issues addressed by this PR, but it may be useful for other purposes in the future as well.
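
As a rough illustration of why a quote-aware splitter and a plain one can disagree on the same input, the sketch below counts tokens with and without quote handling. `count_tokens()` and its `quoted` flag are hypothetical and only mimic the idea. They are not the signatures or implementations of flb_utils_split() or flb_utils_split_quoted().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: count the tokens in `s` separated by `sep`.
 * When `quoted` is non-zero, a token that starts with ' or " runs
 * until the matching end-quote, so it may contain separators.
 */
static int count_tokens(const char *s, char sep, int quoted)
{
    int count = 0;
    char quote;

    assert(s != NULL);

    while (*s != '\0') {
        /* Skip separators between tokens */
        while (*s == sep) {
            s++;
        }
        if (*s == '\0') {
            break;
        }
        count++;

        quote = '\0';
        if (quoted && (*s == '"' || *s == '\'')) {
            quote = *s++;
        }

        /* Advance to the end of the token */
        while (*s != '\0' && *s != (quote ? quote : sep)) {
            if (quote && s[0] == '\\' && s[1] == quote) {
                s++;    /* skip the backslash of an escaped quote */
            }
            s++;
        }
        if (quote && *s == quote) {
            s++;        /* consume the end-quote */
        }
    }
    return count;
}
```

For `facility "audit log"` split on spaces, this gives 3 tokens without quote handling and 2 with it, which is the difference the modify filter example depends on.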

@tsaarni tsaarni force-pushed the quotation-support branch from 498e62b to ac7ff03 Compare April 18, 2023 09:47
@leonardo-albertovich
Collaborator

I'm saving this PR now so I can properly review it and merge it on the 25th. Thank you for your patience.

    /* Copy the token */
    for (i = 0; i < len;) {
        /* Handle escapes */
        if (*token_in == '\\' && (token_in[1] == quote || token_in[1] == '\\')) {

Collaborator

I think you have a bug here. If I understood the code correctly, you expect quote to hold the opening quote character, but I can't spot any assignment or call that sets it.

Contributor Author

True, thank you for spotting this! I've now fixed it and added test that would have caught it.
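
For readers following the thread, here is a minimal sketch of the escape handling under discussion, assuming the opening quote is passed in explicitly so it cannot be left unset. `unescape_token()` is a hypothetical name, not the function in src/flb_utils.c, and the caller is assumed to provide an output buffer of at least len + 1 bytes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: copy `len` bytes of a quoted token body from `in`
 * to `out`, collapsing \<quote> and \\ into the escaped character.
 * Returns the length of the unescaped result.
 */
static size_t unescape_token(const char *in, size_t len, char quote, char *out)
{
    size_t i;
    size_t n = 0;

    assert(in != NULL && out != NULL);
    /* Fail fast if the caller forgot to record the opening quote */
    assert(quote == '"' || quote == '\'');

    for (i = 0; i < len; i++) {
        if (in[i] == '\\' && i + 1 < len &&
            (in[i + 1] == quote || in[i + 1] == '\\')) {
            i++;    /* drop the backslash, keep the escaped character */
        }
        out[n++] = in[i];
    }
    out[n] = '\0';
    return n;
}
```

With the opening quote set to `"`, the body `say \"hi\"` becomes `say "hi"`. With the opening quote set to `'`, the same body is copied unchanged, because `\"` is not an escape for that quote in this sketch.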

src/flb_utils.c Outdated
    }

    /* Copy the token */
    for (i = 0; i < len;) {

Collaborator

Considering that you increment i only once at the end of this loop, could you please move that increment to this line?

Contributor Author

Fixed.

    *out_len = len;
    *out = mk_string_copy_substr(token_in, 0, len);

    return (int)(token_in - str) + len;

Collaborator

Since the caller expects this function to return -1 on failure, we should check the result of mk_string_copy_substr and return -1 if it is NULL.

Contributor Author

Fixed.
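
The requested check can be sketched as follows. This is an illustration only: `copy_substr()` is a stand-in written for this example and `emit_token()` is a hypothetical name, so neither should be read as the actual code in src/flb_utils.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a substring copy helper; returns NULL on allocation failure. */
static char *copy_substr(const char *s, size_t pos, size_t len)
{
    char *buf = malloc(len + 1);

    if (buf == NULL) {
        return NULL;
    }
    memcpy(buf, s + pos, len);
    buf[len] = '\0';
    return buf;
}

/*
 * Hypothetical tail of a token parser: hand the token to the caller and
 * return the number of characters consumed from `str`, or -1 if the copy
 * failed. The caller owns `*out` and must free() it.
 */
static int emit_token(const char *str, const char *token_in, size_t len,
                      char **out, size_t *out_len)
{
    assert(str != NULL && token_in >= str);
    assert(out != NULL && out_len != NULL);

    *out = copy_substr(token_in, 0, len);
    if (*out == NULL) {
        return -1;      /* propagate the failure instead of a bogus offset */
    }
    *out_len = len;
    return (int)(token_in - str) + (int)len;
}
```

Setting the output length only after the copy succeeds keeps the out-parameters consistent when -1 is returned.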

void test_flb_utils_split_quoted()
{
    compare_split_entry("aa \"bb cc\" dd", ' ', 256, FLB_TRUE, "aa", "bb cc", "dd");
    compare_split_entry("aa bb 'cc dd'", ' ', 256, FLB_TRUE, "aa", "bb", "cc dd");

Collaborator

Could you add a test case for the case where there are multiple separators before a word or quoted component?

Could you add a test case for the unescape functionality?

Contributor Author

I've added tests.

@leonardo-albertovich
Collaborator

Great, I just approved the PR so CI can run, then I think you'll have to fix that DCO issue and we'll be able to merge it.

Thank you very much both for the contribution and the patience @tsaarni!

Signed-off-by: Tero Saarni <[email protected]>
tsaarni commented Apr 26, 2023

> Great, I just approved the PR so CI can run, then I think you'll have to fix that DCO issue and we'll be able to merge it.
>
> Thank you very much both for the contribution and the patience @tsaarni!

Thank you @leonardo-albertovich for great review comments! Much appreciated!

I force-pushed the last commit again to address the DCO sign-off issue. Could you accept the CI run again? Thank you!

tsaarni commented Apr 26, 2023

@leonardo-albertovich Are the macOS test failures caused by my change, or could they be an unrelated flake?

I cannot see the reason from the logs other than

        Start   3: flb-rt-in_event_test
  1/120 Test   #3: flb-rt-in_event_test .......................Subprocess aborted***Exception:   4.86 sec
Test event_test...                              [2023/04/26 10:19:55] [ info] [fluent bit] version=2.1.2, commit=cb47b930b6, pid=20329

Some of the failures seem to have disappeared when you triggered a rerun.

@leonardo-albertovich
Collaborator

Don't worry about those macOS tests; the failures are not related to your PR and are something we will fix in the near future.

I'm tagging @edsiper here so he knows this is ready to be merged from my point of view.

@leonardo-albertovich leonardo-albertovich merged commit 72c1376 into fluent:master May 3, 2023
tsaarni commented May 3, 2023

Thank you @leonardo-albertovich for working with me on this PR, much appreciated! Hope to work together again sometime 😃

4 participants