-
Notifications
You must be signed in to change notification settings - Fork 931
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup cudf::strings::detail::regex_parser class source #10975
Cleanup cudf::strings::detail::regex_parser class source #10975
Conversation
cpp/src/strings/regex/regcomp.cpp
Outdated
} d; | ||
}; | ||
std::vector<Item> m_items; | ||
std::vector<regex_parser::Item> expand_counted_items() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a new function but was simply moved from the regex_compiler
class to here since it only needs the regex_parser::_items
variable.
Codecov Report
@@ Coverage Diff @@
## branch-22.08 #10975 +/- ##
===============================================
Coverage ? 86.34%
===============================================
Files ? 144
Lines ? 22738
Branches ? 0
===============================================
Hits ? 19632
Misses ? 3106
Partials ? 0 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is dense, but appears to mostly be renames of members. Looks better than it did for sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. minor nits.
cpp/src/strings/regex/regcomp.cpp
Outdated
int quoted = nextc(yy); | ||
_chr = 0; | ||
char32_t chr = 0; | ||
auto quoted = next_char(chr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't know about performance but if the signature was cuda::std::pair<bool> next_char()
then this would look slightly neater auto [is_quoted, chr] = next_char()
cpp/src/strings/regex/regcomp.cpp
Outdated
int32_t _id_ccls_w{-1}; // alphanumeric | ||
int32_t _id_ccls_W{-1}; // not alphanumeric | ||
int32_t _id_ccls_s{-1}; // space | ||
int32_t _id_ccls_d{-1}; // digit | ||
int32_t _id_ccls_D{-1}; // not digit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the risk of making the code too verbose, can these get vowels and more elaborate docs?
@gpucibot merge |
This cleans up the awkward range literals for supporting the `CCLASS` and `NCCLASS` regex instructions. The range values were always paired (first,last) but arranged consecutively in a flat vector so `[idx] and [idx+1]` were range pairs `idx` was even. This PR introduces a `reclass_range` class that holds the pairs so we can use normal algorithms to manipulate them. There is some overlap with code changes in PR #10975 Reference #3582 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Mike Wilson (https://github.com/hyperbolic2346) - MithunR (https://github.com/mythrocks) URL: #11045
Cleans up the
regex_parser
class source to fix member names, comments, and simplify some logic creating parse-items.Minimal changes were made to the
regex_compiler
to accommodate some public interface changes.Also, the
regex_compiler::expand_counted()
function was moved intoregex_parser
since it only need the parser class'_items
data. It seemed more apt for the member function to part ofregex_parser
thanregex_compiler
.Reference #3582