-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enh(php) Left and right-side of double colon #3422
enh(php) Left and right-side of double colon #3422
Conversation
8354aa3
to
d55e79e
Compare
d55e79e
to
d7f714b
Compare
src/languages/php.js
Outdated
}, | ||
{ | ||
begin: [ | ||
/[A-Z]\w*/, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets wrap this in a constant since it's re-used multiple times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because now I have an error, I found the root of my regression. The match is case-insensitive because it uses i
by default. It totally ignores the uppercase rule. Can I disable it or is it some kind of bug?
Error: php/[9]/begin: Polynomial backtracking. By repeating any character that matches /\xa0/i, an attack string can be created.
\s+)([a-zA-Z_-ÿ][a-zA-Z0-9_-ÿ]*
^~~ ^~~~~~~~~~~~~~~~
Full pattern:
/(const)(\s+)([a-zA-Z_-ÿ][a-zA-Z0-9_-ÿ]*(?![A-Za-z0-9])(?![$]))(\s*=)/im
- I also added a new const for name and in MR description added info about names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joshgoebel Do you know anything about this case-insensitive matching?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PHP is case_insensitive: true,
so EVERY regex is /i
, there i no need to specify and it's impossible to have ANY regex that is case-dependent. I forgot this or I'd have mentioned it earlier.
So forget about using CamelCase
rules... I've read not ALL of PHP is case-sensitive though... perhaps we could switch it to "case-sensitive" and try and deal with the exceptions by hand? How familiar are you with PHP's case sensitivity overall?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PHP is
case_insensitive: true,
so EVERY regex is/i
, there i no need to specify and it's impossible to have ANY regex that is case-dependent. I forgot this or I'd have mentioned it earlier.
Now that's news. How could I miss this case_insensitive: true,
:)
I checked the current implementation of the PHP highlighter and part of camel case keywords are affected by this. I would separate keywords into case-insensitive (false, if) and case-sensitive (ArithmeticError ArrayIterator). Case-insensitive function and class names are not affected by this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would separate keywords into case-insensitive (false, if) and case-sensitive
Except we have no current way to do that [case insensitive in a case sensitive grammar] - the setting is at the grammar level (because we concat every regex into a single huge one)... I could imagine a possible enhancement to JUST keywords
to allow "as given" or "all lowercase" (would allow the two most common variants, which might in many cases be sufficient)... but that doesn't necessarily help with any custom rules, which would need every letter individually split out... of course we could potentially write a helper for that...
What do you think of the "as given" + all lowercase... ie, supporting "false" and "FALSE" but not "fAlSe"?
I'm trying to figure out what could be done here to wrap this PR up and then circle back to the larger discussion of casing as a sep concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking pretty good. I dropped a few notes.
d7f714b
to
e6b4237
Compare
I fixed issues and added some questions. |
If you could please open a new PR from It's costing more time to explain some of the tiny things than to just fix them myself. |
Alternatively you could perhaps add me with repo write permission against your fork |
e6b4237
to
e929ec6
Compare
// This will not detect camelCase classes | ||
const PASCAL_CASE_CLASS_NAME = regex.concat("[A-Z]", IDENT_RE); | ||
|
||
const CONSTANT_REFERENCE = regex.concat(IDENT_RE, "\\b(?!\\()"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this a constant reference? In other languages our constant rule is ALL_CAPS_CASE. Is anything following a ::
just a constant in php?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if it's truly a constant we'd use variable.constant
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this a constant reference? In other languages our constant rule is ALL_CAPS_CASE. Is anything following a
::
just a constant in PHP?
Language allows also lower case or any other, but probably almost all use ALL_CAPS_CASE.
It is just IDENT_RE that can't be followed by (. This distinguishes it from the static method call. I left examples in the PR description. In this context, it's just a helper that is part of a few matching rules.
And if it's truly a constant we'd use
variable.constant
...
Ok, I guess accessing Enum value is also variable.constant
, because we can't distinguish those.
Constant declaration and Enum values also should be marked with variable.constant
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is just IDENT_RE that can't be followed by (.
That's that's a "function dispatch" or something, not a "constant"... constant to me means const
or read only or UPPERCASE
by convention in may languages...
Constant declaration and Enum values also should be marked with variable.constant?
Perhaps if you could detect they are enum values... I dunno what the PHP syntax is... for most languages it's impossible to do this 100% correctly so we usually have a UPPERCASE constant rule for languages where it's a string pattern and then only catch others in very explicit cases like class name
, etc... where it's clean name is constant despite the weird casing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I just looked above, yes the enum declarations could be marked that way I suppose...
e929ec6
to
2e93647
Compare
2e93647
to
4f02ee1
Compare
@joshgoebel Pls review. |
4f02ee1
to
1d9cdf2
Compare
/const/, | ||
/\s/, | ||
IDENT_RE, | ||
/\s*=/, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be a look-ahead, I'm not sure why we need to consume it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I don't know enough about the highlight js to know what's the difference.
- Why can't we consume it?
- Why is look-ahead, is a better solution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we don't just consume things arbitrarily... we aren't highlighting it here, we only need to guarantee it's presence, not consume it. In general we don't consume things we don't need to.
Later someone may want to add operator matching where we highlight all =
and consuming it here would make that impossible... it should be left int he input stream so another rule might match it later.
src/languages/php.js
Outdated
/(?![a-zA-Z0-9_\x7f-\xff])/ | ||
/(::|->)+/, | ||
IDENT_RE, | ||
regex.concat("(?!", WHITESPACE, "*\\()") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was there an issue with line 387?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We define again the same regex pattern instead of using an already defined one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You changed the regex pattern... did you not? It was previously ~:
-> ident [ NOT "\s(" AND NOT IDENT ]
You removed the "AND NOT IDENT", portion did you not? Was there a reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, lines 385 and 387 are the same as the regex rule of IDENT_RE. IDENT_RE only has the suffix (?![$]))
, which was missing here (in my opinion). Documentation states that all labels follow the same pattern:
Variable names follow the same rules as other labels in PHP. A valid variable name starts with a letter or underscore, followed by any number of letters, numbers, or underscores. As a regular expression, it would be expressed thus: ^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$
I also changed the position of the negative look-ahead from the middle to the end. It didn't break any test and according to my knowledge about regex, it should work the same. If you could provide me with a sample that passes the old code and fail in the new one, then I would fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It didn't break any test...
Sadly our tests are far from comprehensive, hence the careful eye I have with reviewing PRs.
This was added with: d49e9fb
The test code added was , 3 => \Location\Web\URI::class));
so I imagine this was to protect the ::class
... but now you're explicitly handling that... so if we remove this guard entirely, does it break anything?
If not, lets throw it out completely. And if it does, then we can look at what broken to help clarify it's purpose of existing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Reverting back to, didn't break anything.
Co-authored-by: Josh Goebel <[email protected]>
@wkania I removed the rule, seems fine. I pretty sure it's redundant now. Thanks for all work on this! |
* master: (816 commits) (chore) add sideEffects: false to enable tree-shaking in esbuild & others fix(markdown) Handle `***Hello world***` without breaking (highlightjs#3457) (chore) DRY up php grammar just a little enh(php) support CSSCase attribute naming refactor, security issues enh(php) Add support for Attributes fix(java) prevent false variable init on `else` (highlightjs#3455) (ci) min change threshold for size report (highlightjs#3401) (themes) Add `tokyo-night-dark` (highlightjs#3467) enh(llvm) Improve number support, add `char.escape` (highlightjs#3471) (chore) simplify brainfuck grammar fix(brainfuck) fix highlighting of initial ++/-- Minor change to TypeScript types and TypeScript-specific keywords (highlightjs#3466) fix(angelscript) Fix highlighting of int8, int16, int32, int64 (highlightjs#3464) enh(php) named arguments and fix php constants (highlightjs#3459) themes: add new felipec theme (highlightjs#3441) (chore) release 10.4.0 enh(arcade) Add missing keywords for Arcade v1.16 chore(arcade) eslint --fix, explode keywords enh(php) Left and right-side of double colon (highlightjs#3422) ...
Double colon aka Scope Resolution Operator is used in a few contexts:
Example::$uuid; self::$a; Example::test(); $a::test();
Foo::Test;
parent::$a; parent::__construct();
Foo::Test;
self::class; static::class; parent::class; Foo::class; $a::class
Changes
IDENT_RE
for valid labels instead ofw+
variable.language
.class is a special case of the constant, it's also a reserved keyword.
Info about rules for valid labels (variables, constants, and other names).
Checklist
CHANGES.md