Is there SIMD support? #411
Comments
Not yet. I had some ideas about it, but it's not in the works yet. |
Thanks, that's good to know. |
A note about alignment: since we cannot guarantee input data alignment, it would be impossible to use multi-byte reads (not without an explicit guarantee from the user that the input is aligned). However, it should still be possible to combine multiple bytes into a 2/4/8-byte value and do the switch on this combined value rather than do 2/4/8 consecutive switch statements. This optimization is not straightforward, as the underlying DFA may not have many linear segments that can be combined this way (due to the grammar, or due to the possible end-of-input after each byte). This needs some study and experiments. |
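To make the idea above concrete, here is a minimal hand-written sketch of switching on a combined 4-byte value. It is not re2c output: the keywords `else` and `enum`, the `KW_*` constants, and the helpers `pack4`, `PACK4`, and `match_keyword4` are all hypothetical names chosen for illustration. The bytes are combined with shifts, so no aligned (or even multi-byte) load is required, and the explicit length check stands in for the end-of-input handling mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical token codes, for illustration only. */
enum { KW_NONE = 0, KW_ELSE = 1, KW_ENUM = 2 };

/* Combine four bytes into one 32-bit value using byte reads and shifts.
 * This needs no alignment guarantee and gives the same value on any
 * endianness, because the byte order is fixed by the shifts. */
static uint32_t pack4(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Compile-time counterpart of pack4, usable in case labels. */
#define PACK4(a, b, c, d)                         \
    ((uint32_t)(unsigned char)(a)                 \
     | ((uint32_t)(unsigned char)(b) << 8)        \
     | ((uint32_t)(unsigned char)(c) << 16)       \
     | ((uint32_t)(unsigned char)(d) << 24))

/* One switch on the combined value replaces four consecutive
 * per-byte switches. The length check must come first: this only
 * works on a linear DFA segment where end-of-input cannot occur
 * in the middle. */
static int match_keyword4(const unsigned char *s, size_t len)
{
    if (len < 4) {
        return KW_NONE;
    }
    switch (pack4(s)) {
    case PACK4('e', 'l', 's', 'e'): return KW_ELSE;
    case PACK4('e', 'n', 'u', 'm'): return KW_ENUM;
    default:                        return KW_NONE;
    }
}
```

Calling `match_keyword4((const unsigned char *)"enum", 4)` returns `KW_ENUM`, while any input shorter than four bytes falls through to `KW_NONE`. Whether a compiler turns the four byte reads into a single load depends on the target and optimization level, so any speedup would have to be measured.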
I wonder if explicitly open coding the SIMD compiler intrinsics beats trying to emit constructs that the compiler can SIMD itself. |
I don't think a compiler can perform such an optimization, as it requires a bit of high-level insight. Consider this simple regular grammar:
It has just two rules: either a string Currently re2c generates the following "branchy" code (
Which is compiled to very similar "branchy" assembly (
Both GCC and Clang with -O2 generate almost identical code. And they cannot reorder the branches with the reads: this kind of optimization is too unsafe for a compiler to perform (at least in my limited understanding of C++ compilers). |
Precisely. This is why I suggest that using the compiler intrinsics is probably the correct path. clang, gcc, etc. support mostly the same set. |
@pmetzger What intrinsics specifically do you mean? I don't see how an intrinsic can restructure the program and squash the four check-and-branch pieces in the example into one. |
You can play games like the one you're proposing with the use of intrinsics. They're gross and have limited portability, but the end user could specify whether they wanted the use of intrinsics or not. Intrinsics also let you call SIMD instructions directly from generated C code. gcc and clang both support a wide variety of intrinsics. Here, for example, is some explanation of the vector extensions both support. https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Vector-Extensions.html https://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vectors |
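As a rough illustration of the vector extensions linked above, the sketch below checks 16 input bytes against a character class (`[0-9]`) in one vector comparison. It assumes a GCC- or Clang-compatible compiler, since `__attribute__((vector_size))` is not standard C. The names `u8x16`, `i8x16`, and `all_digits16` are hypothetical, and this is not something re2c generates.

```c
#include <stdint.h>
#include <string.h>

/* GCC/Clang vector extension types: 16 lanes of 8 bits each.
 * Assumption: the compiler supports __attribute__((vector_size)). */
typedef uint8_t u8x16 __attribute__((vector_size(16)));
typedef int8_t  i8x16 __attribute__((vector_size(16)));

/* Return 1 if all 16 bytes at p are ASCII digits, 0 otherwise.
 * The caller must guarantee that 16 bytes are readable. */
static int all_digits16(const unsigned char *p)
{
    u8x16 v, base, limit;
    i8x16 ok;
    int i;

    memcpy(&v, p, sizeof v);          /* safe for unaligned input */
    memset(&base, '0', sizeof base);  /* every lane holds '0' */
    memset(&limit, 9, sizeof limit);  /* every lane holds 9 */

    /* Unsigned wrap-around maps bytes below '0' to large values, so a
     * single lane-wise comparison covers both ends of the range.
     * Each result lane is -1 (true) or 0 (false). */
    ok = (i8x16)((v - base) <= limit);

    /* Portable reduction by lane subscripting; a real implementation
     * would more likely use a target-specific mask intrinsic. */
    for (i = 0; i < 16; i++) {
        if (!ok[i]) {
            return 0;
        }
    }
    return 1;
}
```

The compiler decides how to lower these operations for the target, so the code stays free of architecture-specific intrinsics, at the cost of less control over the instructions actually emitted.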
For example, use SIMD intrinsics explicitly, or use long long to process 8 bytes together?