Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang] Provide an SSE4.2 implementation of identifier token lexer #68962

Merged
merged 3 commits into from
Oct 19, 2023

Conversation

serge-sans-paille
Copy link
Collaborator

The _mm_cmpistri instruction can be used to quickly parse identifiers.

With this patch activated, clang pre-processes 1.8% faster, and sqlite3.c amalgametion 1.5% faster, based on time measurements and number of executed instructions as measured by valgrind.

The introduction of an extra helper function in the regular case has no impact on performance, see

https://llvm-compile-time-tracker.com/compare.php?from=30240e428f0ec7d4a6d1b84f9f807ce12b46cfd1&to=12bcb016cde4579ca7b75397762098c03eb4f264&stat=instructions:u

The _mm_cmpistri instruction can be used to quickly parse identifiers.

With this patch activated, clang pre-processes <iostream> 1.8% faster, and
sqlite3.c amalgametion 1.5% faster, based on time measurements and
number of executed instructions as measured by valgrind.

The introduction of an extra helper function in the regular case has no
impact on performance, see

    https://llvm-compile-time-tracker.com/compare.php?from=30240e428f0ec7d4a6d1b84f9f807ce12b46cfd1&to=12bcb016cde4579ca7b75397762098c03eb4f264&stat=instructions:u
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Oct 13, 2023
@llvmbot
Copy link
Member

llvmbot commented Oct 13, 2023

@llvm/pr-subscribers-clang

Author: None (serge-sans-paille)

Changes

The _mm_cmpistri instruction can be used to quickly parse identifiers.

With this patch activated, clang pre-processes <iostream> 1.8% faster, and sqlite3.c amalgametion 1.5% faster, based on time measurements and number of executed instructions as measured by valgrind.

The introduction of an extra helper function in the regular case has no impact on performance, see

https://llvm-compile-time-tracker.com/compare.php?from=30240e428f0ec7d4a6d1b84f9f807ce12b46cfd1&amp;to=12bcb016cde4579ca7b75397762098c03eb4f264&amp;stat=instructions:u

Full diff: https://github.com/llvm/llvm-project/pull/68962.diff

1 Files Affected:

  • (modified) clang/lib/Lex/Lexer.cpp (+38-7)
diff --git a/clang/lib/Lex/Lexer.cpp b/clang/lib/Lex/Lexer.cpp
index feed1b9ecd71a8d..f2d9eb3a8af4e3d 100644
--- a/clang/lib/Lex/Lexer.cpp
+++ b/clang/lib/Lex/Lexer.cpp
@@ -47,6 +47,10 @@
 #include <tuple>
 #include <utility>
 
+#ifdef __SSE4_2__
+#include <nmmintrin.h>
+#endif
+
 using namespace clang;
 
 //===----------------------------------------------------------------------===//
@@ -1847,19 +1851,46 @@ bool Lexer::LexUnicodeIdentifierStart(Token &Result, uint32_t C,
   return true;
 }
 
+static const char *fastParseASCIIIdentifier(const char *CurPtr, const char* BufferEnd) {
+#ifdef __SSE4_2__
+  static constexpr char AsciiIdentifierRange[16] = {
+      '_', '_', 'A', 'Z', 'a', 'z', '0', '9',
+  };
+  constexpr ssize_t BytesPerRegister = 16;
+
+  while (LLVM_LIKELY(BufferEnd - CurPtr >= BytesPerRegister)) {
+    __m128i AsciiIdentifierRangeV = _mm_loadu_si128((const __m128i *)AsciiIdentifierRange);
+
+      __m128i Cv = _mm_loadu_si128((const __m128i *)(CurPtr));
+      int Consumed =
+          _mm_cmpistri(AsciiIdentifierRangeV, Cv,
+                       _SIDD_LEAST_SIGNIFICANT | _SIDD_CMP_RANGES |
+                           _SIDD_UBYTE_OPS | _SIDD_NEGATIVE_POLARITY);
+      CurPtr += Consumed;
+      if (Consumed == BytesPerRegister)
+        continue;
+      return CurPtr;
+  }
+#else
+  (void)BufferEnd;
+#endif
+
+  unsigned char C = *CurPtr;
+  while (isAsciiIdentifierContinue(C))
+    C = *++CurPtr;
+  return CurPtr;
+}
+
 bool Lexer::LexIdentifierContinue(Token &Result, const char *CurPtr) {
   // Match [_A-Za-z0-9]*, we have already matched an identifier start.
+
   while (true) {
-    unsigned char C = *CurPtr;
-    // Fast path.
-    if (isAsciiIdentifierContinue(C)) {
-      ++CurPtr;
-      continue;
-    }
+
+    CurPtr = fastParseASCIIIdentifier(CurPtr, BufferEnd);
 
     unsigned Size;
     // Slow path: handle trigraph, unicode codepoints, UCNs.
-    C = getCharAndSize(CurPtr, Size);
+    unsigned char C = getCharAndSize(CurPtr, Size);
     if (isAsciiIdentifierContinue(C)) {
       CurPtr = ConsumeChar(CurPtr, Size, Result);
       continue;

Copy link
Contributor

@cor3ntin cor3ntin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty neat.
I'm wondering if this change would be tested by any of our bots?

clang/lib/Lex/Lexer.cpp Outdated Show resolved Hide resolved
@github-actions
Copy link

github-actions bot commented Oct 13, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

clang/lib/Lex/Lexer.cpp Outdated Show resolved Hide resolved
@serge-sans-paille
Copy link
Collaborator Author

@tbaederr / @cor3ntin Do I have your formal approval to merge this?

@cor3ntin
Copy link
Contributor

@serge-sans-paille I'm still trying to understand whether this change will be tested by CI - or at all.
SSE4 is not enabled by default and i don't think our cmake does that either - I did ping @AaronBallman and we are still thinking about it

@serge-sans-paille
Copy link
Collaborator Author

ok. I obviously tested on my setup, but having consistent testing is important. Thanks for the quick answer!

@AaronBallman
Copy link
Collaborator

@serge-sans-paille I'm still trying to understand whether this change will be tested by CI - or at all. SSE4 is not enabled by default and i don't think our cmake does that either - I did ping @AaronBallman and we are still thinking about it

I think we're in good shape for testing because we have at least one bot that's set up to compile for cascade lake, which does support SSE 4.2: https://github.com/llvm/llvm-zorg/blob/4bc450165fa9fcef26adca5a80f4d97b9d6babc3/buildbot/osuosl/master/config/builders.py#L827 so I believe that gets us the test coverage we need. Any disagreement?

Copy link
Contributor

@tbaederr tbaederr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM then

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants