Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement the legalizer #438

Open
wants to merge 47 commits into
base: main
Choose a base branch
from
Open

Implement the legalizer #438

wants to merge 47 commits into from

Conversation

mcy
Copy link
Member

@mcy mcy commented Jan 29, 2025

This PR adds the legalizer: a collection of diagnostics that follow parsing to eliminate and diagnose invalid constructs that the parser accepted in order the make progress.

This PR also adds a new syntax package for e.g. syntax.Proto2 and other editions knowledge. It adds functions for classifying predeclared.Names and some new taxa.Nouns for naming concepts in diagnostics relevant to the legalizer. Parsing is also tweaked in a few places to generate better diagnostics.

@mcy mcy requested a review from jhump January 29, 2025 23:20
@mcy mcy requested review from emcfarlane and doriable February 3, 2025 18:17
Copy link
Member

@jhump jhump left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made it through 16 out of 26 commits. I know I've got some big commits still ahead of me (legalize message and friends; legalize fields), which I am still working through.

But I figured I should go ahead and publish the remarks I have so far since there has been no other activity on the PR since we talked a couple of days ago. I promise I've been working on it and not ignoring it! :)

experimental/ast/predeclared/methods.go Outdated Show resolved Hide resolved
experimental/ast/syntax/syntax.yaml Show resolved Hide resolved
experimental/parser/testdata/parser/field/group.proto.yaml Outdated Show resolved Hide resolved
experimental/parser/parse_decl.go Show resolved Hide resolved
experimental/parser/testdata/parser/package/42.proto Outdated Show resolved Hide resolved
experimental/parser/legalize_file.go Outdated Show resolved Hide resolved
Comment on lines 4 to 7
19 | package 42;
| ^^^^^^^
= help: to place a file in the empty package, remove the `package` declaration
= help: however, using the empty package is discouraged
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem like a good diagnostic. Perhaps the AST node should allow an expression instead of just a path here, so that you diagnose 42 is an invalid type for the package name?

(Totally fine to be a TODO: I know changing the AST at this point might be enough churn to warrant a separate PR.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I explained elsewhere why this doesn't work properly. There's already a TODO in the .proto file.

experimental/parser/legalize_file.go Outdated Show resolved Hide resolved
... |
22 | | import weak "foo.proto";
| | ^^^^^^^^^^^^^^^^^^^^^^^^ this weak import...
23 | |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this diagnostic have an extra line of context after the snippet? The two above do not, so perhaps it's a bug?

If it's a quirk/bug in the reporting, totally fine to fix that in a separate PR. But, if it's an issue in the way the diagnostic is constructed in this commit, then let's try to fix in this branch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a quirk with the renderer, but it was easy enough to find and fix.

|
19 | / message M {
... |
23 | |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: why is there an extra line of context before the snippet?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also fixed :)

Copy link
Member

@jhump jhump left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished reviewing the rest of the commits from the original push.

I saw you already replied to some of my first round of comments and pushed changes, so I'll take a look at those next.

Comment on lines 106 to 108
if in != taxa.Reserved {
return
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need to legalize the values for extension ranges -- like make sure they are all numeric ranges. The only non-numeric value allowed in extension ranges is builtin max.

Despite this early return, I see several (currently unnecessary) if in == taxa.Reserved checks below, that perhaps should be expanded to perform other checks for extension ranges. For example, both cases below are checking usage of identifiers and names in reserved ranges, which should be outright disallowed in extension ranges.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a comment explaining why I haven't done that (although it appears in a later commit maybe?).

}

seq.Values(decl.Ranges())(func(expr ast.ExprAny) bool {
switch expr.Kind() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is missing a case for ast.ExprKindRange (for both reserved and extension ranges): I imagine we'd want to legalize the bounds -- start must be an integer and end must either be integer or max builtin.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, both of the diagnostics below warrant suggestions: where an identifier is found but unexpected, it should be quoted. Where a string literal is found but unexpected, it should be unquoted. Though if the string literal does not contain an identifier, we can't make that suggestion, but might also pair it with a warning that string literals should contain valid identifiers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added suggestions.

Comment on lines 178 to 182
51 | | / oneof O {
... | |
57 | | | }
| \___- ...cannot be declared within this file scope
| \_^ this oneof definition...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interleaving of oneof and file appear in correct. Shouldn't the file always be rendered as "wider range" than anything else inside it?

Also, I don't find the file scope annotation to be helpful. In reality, the oneof is absoluately allowed inside this file scope -- it just must be nested inside a message. So I think it would be better to just leave out the file-scope annotation and to revise the wording about it not being allowed at the top-level. Maybe even use the map of allowed parents to construct a message about where it is allowed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Fixed, and added the help text.

experimental/parser/legalize_def.go Outdated Show resolved Hide resolved
def.MarkCorrupt()
kw := taxa.Keyword(def.Keyword().Text())
p.Error(errUnexpected{
what: def.Name(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be nice to just highlight the first separator instead of the entire path?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it but idk if I like the way it looks.

error: unexpected `.` in identifier
  --> testdata/parser/def/bad_path.proto:20:14
   |
20 |     oneof foo.Bar {}
   |              ^ expected identifier
   = note: the name of a oneof definition must be a single identifier

experimental/parser/legalize_type.go Outdated Show resolved Hide resolved
experimental/parser/legalize_type.go Show resolved Hide resolved
experimental/parser/legalize_type.go Outdated Show resolved Hide resolved
experimental/parser/legalize_type.go Outdated Show resolved Hide resolved
@@ -222,6 +220,8 @@ func legalizePackage(p *parser, parent classified, idx int, first *ast.DeclPacka
legalizePath(p, taxa.Package.In(), decl.Path(), pathOptions{Relative: true})
}

var isOrdinaryFilePath = regexp.MustCompile("[0-9a-zA-Z./_-]*")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why move this down? The convention in most of Buf's codebases is to keep consts and vars at the top of the file (consistency makes them easier to find when you're not in an IDE).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Closer to where it's used. I know that's the style, but for some reason I thought you'd said something about how you preferred to have stuff closer together? I can move it back.

(I wish go just had function-local globals lmao, this is such a caveman misfeature)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants