-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve char::is_ascii_*
codegen
#67585
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
Some benchmark numbers:
Please do not take these measurement as absolute as they suffer from 2 limitations:
|
Nice improvement. Do you have a plan to improve the second commit? |
I would love some guidance regarding the second commit. What is the preferred direction? |
Ping from triage: |
@joshtriplett short term I guess the safest option is to drop the second commit and accept the duplication (the code is trivial, the duplication in the doc comments is actually more heavy-weight than the implementation code). Would that be ok? |
Yes, I think the duplication is fine for now. |
7e6e0be
to
84435dd
Compare
This is still marked as |
I am guessing it is being ignored because of the tags. |
This comment has been minimized.
This comment has been minimized.
r? @KodrAus |
☔ The latest upstream changes (presumably #69030) made this pull request unmergeable. Please resolve the merge conflicts. |
These methods explicitly check if a char is in a specific ASCII range, therefore the `is_ascii()` check is not needed, but LLVM seems to be unable to remove it. WARNING: this change improves the performance on ASCII `char`s, but complex checks such as `is_ascii_punctuation` become slower on non-ASCII `char`s.
84435dd
to
4e7aeaf
Compare
rebased to resolve merge conflicts |
r? @Amanieu |
There is a way to improve performance even further, at the cost of uglier code. If you need to check both uppercase and lowercase, you can clear bit 6 (0x20) of the character code to convert to uppercase and then do a single range check on the uppercase value. So for example, |
Yup, with further tweaking it might be possible to improve the performance. |
Well, let's leave that for a future PR then. @bors r+ |
📌 Commit 4e7aeaf has been approved by |
…r=Amanieu Improve `char::is_ascii_*` codegen This PR is an attempt to fix rust-lang#65127 A couple of warnings: 1. the generated code might be further improved (in LLVM and/or MIR) by emitting better comparison sequences; in particular, this would improve the performance of "complex" checks such as those in `is_ascii_punctuation` 2. the second commit is currently marked "DO NOT MERGE", because it regresses SIMD on `u8` slices; this could likely be fixed by improving the computation/usage of demanded bits in LLVM An alternative approach to remove the code duplication might be the use of macros, but currently most of the duplication is actually in the doc comments, so maybe just keeping the redundancy could be ok
Rollup of 8 pull requests Successful merges: - #67585 (Improve `char::is_ascii_*` codegen) - #68914 (Speed up `SipHasher128`.) - #68994 (rustbuild: include channel in sanitizers installed name) - #69032 (ICE in nightly-2020-02-08: handle TerminatorKind::Yield in librustc_mir::transform::promote_consts::Validator method) - #69034 (parser: Remove `Parser::prev_token_kind`) - #69042 (Remove backtrace header text) - #69059 (Remove a few unused objects) - #69089 (Properly use the darwin archive format on Apple targets) Failed merges: r? @ghost
This PR is an attempt to fix #65127
A couple of warnings:
is_ascii_punctuation
u8
slices; this could likely be fixed by improving the computation/usage of demanded bits in LLVMAn alternative approach to remove the code duplication might be the use of macros, but currently most of the duplication is actually in the doc comments, so maybe just keeping the redundancy could be ok