Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude parenthese determination from comments and string literals #4964

Merged
merged 11 commits into from
Jan 30, 2025

Conversation

jevanlingen
Copy link
Contributor

@jevanlingen jevanlingen commented Jan 28, 2025

What's changed?

Exclude parentheses handling for comments and string literals.

What's your motivation?

Improvement to

Checklist

  • I've added unit tests to cover both positive and negative cases
  • I've read and applied the recipe conventions and best practices
  • I've used the IntelliJ IDEA auto-formatter on affected files

github-actions[bot]

This comment was marked as outdated.

@jevanlingen jevanlingen changed the title Exclude parenthese determination from comments Exclude parenthese determination from comments and string literals Jan 28, 2025
@jevanlingen jevanlingen self-assigned this Jan 28, 2025
@jevanlingen jevanlingen marked this pull request as draft January 28, 2025 11:16
github-actions[bot]

This comment was marked as outdated.

github-actions[bot]

This comment was marked as outdated.

github-actions[bot]

This comment was marked as outdated.

@jevanlingen jevanlingen added bug Something isn't working parser-groovy labels Jan 28, 2025
@jevanlingen jevanlingen marked this pull request as ready for review January 28, 2025 13:41
github-actions[bot]

This comment was marked as outdated.

github-actions[bot]

This comment was marked as outdated.

Copy link
Contributor

@knutwannheden knutwannheden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a shame that we need to handcode a lexer here. Maybe we should at least explore the possibility of using org.codehaus.groovy.antlr.parser.GroovyLexer instead?

github-actions[bot]

This comment was marked as outdated.

github-actions[bot]

This comment was marked as outdated.

@jevanlingen
Copy link
Contributor Author

jevanlingen commented Jan 29, 2025

It is a shame that we need to handcode a lexer here. Maybe we should at least explore the possibility of using org.codehaus.groovy.antlr.parser.GroovyLexer instead?

I fear figuring it all out and using it is neither worth the time nor the effort. The Groovy Compiler uses ANTLR to do its lexing, so for example the dollar slashy string $/ and /$ chars can be found there: GroovyLexer.g4, but not directly in the Java code.

The GroovyLexer java class does not contain the delimiter fields I defined. Internally the Groovy compiler does indeed have that knowledge somewhere (see for example the internal StringUtil class, where they use an int to define whether a string is of a slashy type).


Edit: The GroovyLexer does of course contain:

public static final int DQ_GSTRING_MODE = 1;
public static final int TDQ_GSTRING_MODE = 2;
public static final int SLASHY_GSTRING_MODE = 3;
public static final int DOLLAR_SLASHY_GSTRING_MODE = 4;
public static final int GSTRING_TYPE_SELECTOR_MODE = 5;

But that doesn't really do any good. You still need to translate those integers to ", $/, ', etc. I don't think our own code would get any clearer if we start using these numbers/constants.


Edit 2: Ah, now I understand it better, after reading a little bit more about lexers (never before have I read those kind of files). All the "lexer rules" are defined in this ANTLR lexer file. This file uses grammar rules to define a language. The java code is generated, most keywords are listed as integer constants in GroovyLexer java class. Other useful stuff is generated as private members, for instance:

public class AstBuilder extends GroovyParserBaseVisitor<Object> {
  //..
  private static final String SLASH_STR = "/";
  private static final String SLASH_DOLLAR_STR = "/$";
  //..
}

As we can't use al the private stuff, unless you want to use reflection, I wouldn't go down this road. It's just not worth to invest anymore time.

@knutwannheden
Copy link
Contributor

The idea with the lexer would have been to let the ANTLR lexer tokenize some given part of the source code. It would then return a stream of tokens, where each token is stuff like keywords, literals, identifiers, punctuation characters, comments, and of course also tokens representing open and close parentheses. So the idea would have been to just zip through this token stream and count the parentheses. The lexer would worry about literals, comments, etc.

Whether it is any easier or better in the end is hard to tell without trying it out. But an ANTLR lexer is definitely very well optimized for tokenization.

@jevanlingen
Copy link
Contributor Author

jevanlingen commented Jan 29, 2025

Ok, now I understand what you were getting at. My knowledge of lexers is zero to none, as you could see in the above post. I guess if we want to try that, you should take over. Still, I think it's an unnecessary action now that we have everything in place. Let's wrap up this PR; if we ever face the parenthesis question again, let's reconsider whether using the ANTLR lexer is a good idea.

github-actions[bot]

This comment was marked as outdated.

…elimiter.java

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions could not be made:

  • rewrite-maven/src/test/java/org/openrewrite/maven/internal/MavenPomDownloaderTest.java
    • lines 1278-1278

@jevanlingen jevanlingen merged commit 63c9526 into main Jan 30, 2025
2 checks passed
@jevanlingen jevanlingen deleted the groovy-more-robust-parentheses-with-comments branch January 30, 2025 07:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working parser-groovy
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants