-
Notifications
You must be signed in to change notification settings - Fork 931
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup regex compiler operators and operands source #10879
Cleanup regex compiler operators and operands source #10879
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-22.08 #10879 +/- ##
===============================================
Coverage ? 86.32%
===============================================
Files ? 144
Lines ? 22696
Branches ? 0
===============================================
Hits ? 19593
Misses ? 3103
Partials ? 0 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice cleanups. The PR description should probably be updated now that #10843 is merged.
cpp/src/strings/regex/regcomp.cpp
Outdated
|
||
static std::vector<int> tokens{STAR, STAR_LAZY, QUEST, QUEST_LAZY, PLUS, PLUS_LAZY, RBRA}; | ||
_last_was_and = | ||
std::any_of(tokens.begin(), tokens.end(), [token](auto t) { return t == token; }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess tokens are cheap (and this is host code) so this doesn't really matter, but you could capture the token by reference to avoid copying right? Also, you could iterator using cbegin
and cend
instead and make the function parameter auto const t
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy by value for int types is preferred over by reference if the value is not changed. I think the cbegin/cend
is a good idea. I don't think declaring t
as const
is that helpful here. The compiler will generate the same code and a human only has one line of code to parse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm dumb, when I wrote "I guess tokens are cheap" I didn't check the vector above to see exactly what tokens are. Yes, definitely agree that copy by value is preferred here. I also agree that the const is not necessary, was just a minor suggestion.
Should we make the tokens
vector constexpr, then? Also, just a note that if we ever switch OperatorType
to be a C++-style scoped enum we should probably add a base type specifier that aligns with the type of the tokens vector just to be explicit and avoid type conversions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think constexpr
would be more appropriate. Unfortunately you can only declare constexpr
for literal or reference types.
So I prefer using static
here.
It looks like c++20
will support constexpr
with std::vector
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, that's a good point. This would probably be a good place to use std::array
(with CTAD so that you don't even need to specify the length).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of small follow-ups to my previous review questions, but nothing blocking.
@gpucibot merge |
Cleans up the
regcomp.cpp
source to fix class names, comments, and simplify logic around processing operators and operands returned by the parser. Several class member variables used for state are moved or eliminated. Some member functions and variables are renamed. Cleanup of the parser logic will be in a follow-on PR.Reference #3582
Follow on to #10843