Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use full yaml parsing for more correct comment stripping #585

Merged
merged 3 commits into from
Sep 9, 2024

Conversation

Elliot-Roberts
Copy link
Contributor

Hi, I was encountering a bug where an inlineManifest added via patch in my talconfig.yaml was ending up malformed in the node config generated by talhelper.

For example, my patch had this line:

- '--log-format [%Y-%m-%d %T.%e][%t][%l][%n] [%g:%#] %v'

Which was being modified to:

- '--log-format [%Y-%m-%d %T.%e][%t][%l][%n] [%g:

I tracked the issue down to the yaml comment stripping phase: the current regex is a bit overzealous and thought there was a comment in that string. Seeing the // FIXME use better logic than regex comment there, I thought I'd take a shot at improving it.

The approach I went with is a little heavyweight. The input is fully parsed into a yaml AST, operated on, and then re-rendered to bytes. This is probably best for correctness, but I would also be up for writing something lighter if preferred. I can think of a few ways of just modifying the regex to fix my case, for instance.


There are also behavior changes beyond unambiguous bug-fixing, which should be considered before merging.

  • "Comments" that are within multiline strings that happen to be embedded yaml are no longer stripped (because they are actually part of the content of the string). It could cause problems if anyone was relying on "commented out" code in strings being excluded from environment variable substitution. For example, this will error when MYENV2 is not defined:

    patches:
     - |-
      machine:
        env:
          MYENV: value
          # MYENV2: ${MYENV2}
  • While semantics should be preserved, sometimes style choices are not.
    The output of stripYamlComment will always be indented with 4 spaces, regardless of input style.
    Multiline strings are sometimes converted to double quoted ones. For instance, when they contain lines with trailing whitespace:

    foo: | # note the trailing spaces on the following line
      trailing:    
      bar

    will be turned into:

    foo: "trailing:    \nbar\n"

    (seems intentional, but I am not entirely sure why)

  • Most likely some other things I haven't thought of.

This might constitute a version change?


Thanks for your consideration and your work on this tool!

@budimanjojo
Copy link
Owner

budimanjojo commented Sep 9, 2024

Thank you for your contribution @Elliot-Roberts!

May you share the chunk of your talconfig.yaml where the regex doesn't work? Not sure why it's not working because the # is inside ' block and the regex should ignore it. Or maybe you know the reason why? It's not that I want the old regex back but I'm curious about the reason why it didn't work.

  • The reason behind the stripYamlComment() function was because of an issue (reported in Discord and not here, refinding the issue is hard but I got it now, lesson learned to always recreate the issue in GitHub instead in the future for easier referencing) where a commented out code with ${} block where the environment is no longer available causing the program to error out. So as you stated with your first bullet point, this will be the case with your solution too. But your solution catches the others which are not "comments inside multiline string" which I think that the user shouldn't do that anyway so I'm willing to accept this change in behavior.
  • This project is using 2 spaces indentation and it's my personal preference. So please make it 2 spaces instead of 4. Although I believe the output of this function is not being outputted in the final rendered manifests but it's good to be consistent. As for the multiline string getting rendered differently, I don't think this will cause any issue?

Your solution seems better than the regex that I'm using so I'm going to merge this PR, please check my review comments, thank you!

pkg/substitute/envsubst_test.go Outdated Show resolved Hide resolved
pkg/substitute/envsubst.go Show resolved Hide resolved
pkg/substitute/envsubst_test.go Outdated Show resolved Hide resolved
pkg/substitute/envsubst.go Outdated Show resolved Hide resolved
@Elliot-Roberts
Copy link
Contributor Author

About where the regex fails: it looks like when there is a # and the character immediately before it is not alphanumeric or a quote, then the rest of the line is always stripped.

Using part of the test case I wrote as an example:

a1: '123!@# not a comment'

It first finds all matches for /.?#.*\n/, i.e. optionally one character, then a #, then the rest of the characters on the line. So this matches @# not a comment'\n.
And then for each match, if /^['\"].+['\"]|^[a-zA-Z0-9]/ matches, then it is kept, otherwise it is stripped.
The text @# not a comment'\n does not start with a quote (/^['\"]/), nor does it start with an alphanumeric character (/^[a-zA-Z0-9]/). It starts with @. So the second regex does not match and the text is stripped, yielding an end result of:

a1: '123!

@budimanjojo budimanjojo merged commit 5e71313 into budimanjojo:master Sep 9, 2024
2 checks passed
@budimanjojo
Copy link
Owner

Thank you @Elliot-Roberts!

@Elliot-Roberts
Copy link
Contributor Author

Sure thing, and thanks for the tool! It's been very nice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants