Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace regex with winnow based parser #255

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

schneems
Copy link
Contributor

@schneems schneems commented Jan 7, 2025

Replace the existing Regex based "parser" with an actual honest-to-goodness parser/combinator based parser.

How to review

Some suggestions on how to review this:

  • Proofread/comment on SPEC.md in the fist commit. This is the basis for the rest of the parser work so if you disagree with something there, it's not worth continuing on into the implementation
  • Read the changes to the integration tests on the final result: Warnings have been added, but otherwise assert that it's a refactor.
  • Read the added parsing error message in error.rs
  • Read the unit tests in procfile.rs. Some of them were useful for development, the majority are asserting overall behavior and edge cases
  • If you want to review the actual winnow and parser code, please book a pairing session with me. It's a lot and is better to talk through syncronously. Feel free to suggest any documentation typos etc.

This file is intended to define valid `Procfile` inputs and parser behavior separate from implementation. It may be updated in the future based on upstream requirements (CNB spec or kubernetes spec) or through future discussion.
@schneems schneems marked this pull request as ready for review January 7, 2025 18:33
@schneems schneems requested a review from a team as a code owner January 7, 2025 18:33
@schneems schneems enabled auto-merge (squash) January 7, 2025 19:24
SPEC.md Outdated Show resolved Hide resolved
SPEC.md Show resolved Hide resolved
@@ -0,0 +1,42 @@
# CNB Procfile format specification
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that this exists, but I wonder if this spec should be shared/enforced more widely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shared

You mean you're wanting more feedback from a diverse group of people (such as other Heroku teams) or you're wanting to promote the format to a larger group outside of Heroku such as language communities (or both)?

I'm open to both. I'm also open to suggestions. Perhaps we link to the SPEC from the readme and make it more prominent.

enforced

We could share the logic by providing a stand-alone parser utility that converts it to some other format like json or toml. We could start by calling both parsers in cedar and instrumenting failures. While people are using # like comments in today's procfile, technically any leading character would work //.

I also had the idea that I could expose the parser or ship it via crates.io so that other buildpacks could use it. Some buildpacks have logic like "If app has a Procfile..." but it would be more robust to share the same parsing logic.

Getting a little ahead of myself. We could also consider upstreaming RFC 1123 to the CNB spec to make it more clear so then it's not "the procfile buildpack supports the intersection of kubernetes and CNB" and then becomes simply "the procfile buildpack supports the CNB spec".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I was just thinking share/enforce this across Heroku more broadly. This will be the third Procfile parser at Heroku, and I doubt they are consistent. This isn't a blocker for this PR though.

I'd support adjusting the CNB spec to have process names that match RFC 1123 if you wanted to do that. I expect many / most CNB users expect to use apps built with CNBs in kubernetes in some way.

* Use "separated" instead of terminated/preceded

* Add a missing word

* Specify that spaces are not part of they key

* Document duplicate key behavior
@schneems schneems requested a review from a team January 9, 2025 21:55
- An empty line MUST be zero or more spaces followed by a line ending or end of file (EOF).
- Key/Value pairs
- A line MAY contain a key/value pair where the key represents the name of a process and the value represents a command
- A key MUST be separated from its value by a colon (`:`) followed by zero or more spaces.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this spec covers a space before the :? Like my-process : bin/blah. I think that should be legal, but I could be wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that should be legal, but I could be wrong.

Looking at the prior regex

^[[:space:]]*([a-zA-Z0-9_-]+):?\\s+(.*)[[:space:]]*
  • starts with one or more whitespace characters
  • alphanumeric or _ or -
  • funny enough the : is optional, but it doesn't support spaces before :
  • A space after the (optional) : is enforced (which is a requirement we've loosened up). It cannot be a tab, it must be a space (which we've loosened up).
  • Then it grabs the value
  • Then the regex makes it look like it strips off any trailing spaces but it doesn't. Rust won't strip it out:
image

Nor will ruby:

irb(main):005:0> match = "web: rails s         ".match(/^[[:space:]]*([a-zA-Z0-9_-]+):?\s+(.*)[[:space:]]*/);
=> #<MatchData "web: rails s         " 1:"web" 2:"rails s         ">
irb(main):006:0> match[2]
=> "rails s         "

So a space before the : wouldn't work today, but instead of an error or warning you'll quietly get an ignored line. I don't think we want to support optional :.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also in favour of making the spec (and the Procfile CNB implementation) more strict than the current implementation in cases where we think it makes sense. So long as the Procfile CNB error message is clear, users can tweak their Procfile if needed as part of migrating to CNB.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on @edmorley's comment. I imagine most of the allowed weird Procfiles on cedar are accidentally valid.

Copy link
Member

@joshwlewis joshwlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably warrants a major version bump. Looks good to me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants