Fix escaped like wildcards in like_utf8 / nlike_utf8 kernels #2258
Codecov Report
@@            Coverage Diff             @@
##           master    #2258      +/-   ##
==========================================
- Coverage   82.29%   82.15%   -0.15%
==========================================
  Files         243      248       +5
  Lines       62443    63537    +1094
==========================================
+ Hits        51387    52196     +809
- Misses      11056    11341     +285
Thank you @daniel-martinez-maqueda-sap -- this looks like an improvement to me for sure. I left some suggestions, but I think this PR is already better than master and could be merged as is.
@@ -342,7 +377,7 @@ pub fn nlike_utf8_scalar<OffsetSize: OffsetSizeTrait>(
                 result.append(!left.value(i).ends_with(&right[1..]));
             }
         } else {
-            let re_pattern = escape(right).replace('%', ".*").replace('_', ".");
+            let re_pattern = replace_like_wildcards(right)?;
I really like how you have refactored this code 👍
@@ -298,10 +298,15 @@ pub fn like_utf8_scalar<OffsetSize: OffsetSizeTrait>(
    Ok(BooleanArray::from(data))
}

fn replace_like_wildcards(text: &str) -> Result<String> {
/// Transforms a like `pattern` to a regex compatible pattern. To achieve that, it does:
❤️
Benchmark runs are scheduled for baseline = 22185fd and contender = f78d2e6. f78d2e6 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
…che#2258)
* Fix escaped like wildcards
  Added a new function that replaces the like wildcards '%' and '_' for the regex counterparts before executing them. It also takes into account that the wildcards can be escaped, in that case, it does remove the escape characters and leaves the wildcards so that they are matched against the raw character. This is implemented iterating over all the characters of the pattern to figure out when it needs to be transformed or not.
* Rewrite logic with peek after PR feedback
* Simplify logic
* Add documentation and refactor string creation in tests
* Add small fix and cargo fmt
…che#2258)
Can drop this after rebase on commit f78d2e6 "Fix escaped like wildcards in like_utf8 / nlike_utf8 kernels (apache#2258)", first released in 20.0.0
Which issue does this PR close?
Closes #415
Rationale for this change
It is explained in the issue linked above.
What changes are included in this PR?
Added a new function, replace_like_wildcards, that replaces the like wildcards '%' and '_' with their regex counterparts ('.*' and '.') before the pattern is executed. It also handles escaped wildcards: in that case it removes the escape character and keeps the wildcard, so that the wildcard is matched as a literal character.
This is implemented by iterating over every character of the pattern to decide whether it needs to be transformed.
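A minimal sketch of the character-by-character translation described above, for illustration only. It is not the code merged in this PR: the name `like_to_regex`, the helper `is_regex_meta`, and the choice of backslash as the escape character are assumptions made here, and the metacharacter list is a simplified stand-in for whatever escaping the regex library provides.

```rust
/// Hypothetical helper: a simplified list of regex metacharacters that must
/// be escaped so they match literally.
fn is_regex_meta(c: char) -> bool {
    matches!(
        c,
        '.' | '+' | '*' | '?' | '(' | ')' | '|' | '[' | ']' | '{' | '}' | '^' | '$'
    )
}

/// Sketch: translate a LIKE pattern into a regex pattern string.
/// Assumes backslash is the escape character.
fn like_to_regex(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut chars = pattern.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\\' => match chars.peek().copied() {
                Some(next) if next == '%' || next == '_' => {
                    // Escaped wildcard: drop the escape and keep the raw character.
                    out.push(next);
                    // Consume the wildcard, it has already been appended.
                    chars.next();
                }
                // A backslash that escapes nothing is matched literally.
                _ => out.push_str("\\\\"),
            },
            // Unescaped wildcards become their regex counterparts.
            '%' => out.push_str(".*"),
            '_' => out.push('.'),
            // Other regex metacharacters are escaped so they match literally.
            c if is_regex_meta(c) => {
                out.push('\\');
                out.push(c);
            }
            c => out.push(c),
        }
    }
    out
}
```

Peeking at the next character lets the escape and the wildcard be handled in a single pass, without a separate "previous character was an escape" flag. Under these assumptions, `like_to_regex("a%b_c")` yields `a.*b.c`, while `like_to_regex(r"100\%")` yields `100%`, which a regex engine matches as a literal percent sign.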
Are there any user-facing changes?
N/A