tree-sitter 0.19.3 build-wasm fails on Windows for 8c23e0ec #34

Closed
sogaiu opened this issue Dec 30, 2022 · 11 comments

@sogaiu
Owner

sogaiu commented Dec 30, 2022

At the time of this writing, tree-sitter-clojure (8c23e0e) uses tree-sitter version 0.19.3, and with this version emcc does not appear to work on Windows. At least, it did not work for me in a couple of recent attempts.

This appears to have been addressed in tree-sitter/tree-sitter#1044.

I checked for a version of tree-sitter that was released after that PR was merged. The PR was merged in 2021-04. The first version of tree-sitter released after that date appears to be 0.19.5, in 2021-05, at least according to the info I currently see here.

With 0.19.5, I had successful results with both the tree-sitter build-wasm and tree-sitter web-ui invocations on Windows.

It might make sense to upgrade to at least tree-sitter 0.19.5. I don't think that will adversely affect others (specifically, things should still work with elisp-tree-sitter).


Some peripheral remarks follow.

I tested with both emsdk 2.0.11 and 2.0.24 successfully so that might mean that the emsdk version info mentioned here is a bit off. That is, it may be that with tree-sitter 0.19.5, one can use emsdk 2.0.11, and that emsdk 2.0.24 is not required. Though I suppose it might depend on the operating system being used...

That's of some interest because at some point emscripten changed the way their emsdk script worked. It used to be that one could quickly switch between emscripten versions using invocations such as:

emsdk activate 2.0.11

or:

emsdk activate 2.0.24

Once the respective versions had been installed, switching was quick because the fetched files were retained and no additional downloads were necessary. Unfortunately, they seem to have changed things so that once one activates a different version, at least some of the files are removed, so switching between versions now involves additional waiting and downloading.

Now one gets an error after an activate invocation. To work around this, one would typically invoke emsdk install <version>, wait for the download to finish, and then invoke emsdk activate <version>.

@dannyfreeman
Collaborator

I think updating to the latest 0.19.x is pretty low-hanging fruit at this point, and we could pretty safely do that soon.

@sogaiu
Owner Author

sogaiu commented Jan 28, 2023

If we go ahead with just stating which version of the tree-sitter cli we use (and the cli invocation) to produce the generated output (e.g. src/parser.c and friends) and not place such information in package.json (as mentioned elsewhere), that may address this issue in some sense.

@sogaiu sogaiu changed the title tree-sitter build-wasm fails on Windows for 8c23e0ec tree-sitter 0.19.3 build-wasm fails on Windows for 8c23e0ec Feb 5, 2023
@sogaiu
Owner Author

sogaiu commented Feb 5, 2023

I compared the generated sources in src we have now to what I get via tree-sitter 0.19.5.

src/tree_sitter/parser.h and src/parser.c differ.

IIUC, tree-sitter/tree-sitter@1badd13 and tree-sitter/tree-sitter@cc519b3 are responsible for some of the changes.

There is also some factoring out of some bits from ts_lex, e.g. in the newer generated source there are functions like sym_comment_character_set_2 which I don't think existed before.
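
As a rough illustration of that factoring-out, here is a small sketch that puts one inline condition from the older generated ts_lex next to the helper function that replaces it in the 0.19.5 output. Both bodies are copied from the diff in this comment. The wrapper names advance_flat, advance_factored and count_mismatches are made up for this example and do not appear in the generated source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Older style: the character test is spelled out inline in ts_lex as one
   flat chain of comparisons (taken from the removed lines of the diff).
   advance_flat is a hypothetical wrapper around that condition. */
bool advance_flat(int32_t lookahead) {
  return lookahead != 0 &&
         (lookahead < '\t' || '\r' < lookahead) &&
         (lookahead < 28 || ' ' < lookahead) &&
         lookahead != '"' &&
         lookahead != ')' &&
         lookahead != ',' &&
         (lookahead < '0' || ';' < lookahead) &&
         lookahead != '\\' &&
         lookahead != ']' &&
         lookahead != '}' &&
         lookahead != 5760 &&
         (lookahead < 8192 || 8198 < lookahead) &&
         (lookahead < 8200 || 8202 < lookahead) &&
         lookahead != 8232 &&
         lookahead != 8233 &&
         lookahead != 8287 &&
         lookahead != 12288;
}

/* Newer style: the same set of characters expressed as a nested-ternary
   search in a separate helper (taken from the added lines of the diff). */
static inline bool aux_sym__sym_qualified_token1_character_set_1(int32_t c) {
  return (c < '\\'
    ? (c < '"'
      ? (c < '\t'
        ? c == 0
        : (c <= '\r' || (c >= 28 && c <= ' ')))
      : (c <= '"' || (c < ','
        ? c == ')'
        : (c <= ',' || (c >= '0' && c <= ';')))))
    : (c <= ']' || (c < 8200
      ? (c < 5760
        ? c == '}'
        : (c <= 5760 || (c >= 8192 && c <= 8198)))
      : (c <= 8202 || (c < 8287
        ? (c >= 8232 && c <= 8233)
        : (c <= 8287 || c == 12288))))));
}

/* In the newer ts_lex the call site shrinks to a negated helper call.
   advance_factored is a hypothetical wrapper around that call. */
bool advance_factored(int32_t lookahead) {
  return !aux_sym__sym_qualified_token1_character_set_1(lookahead);
}

/* Hypothetical helper: counts the values in [0, limit) on which the two
   styles disagree. */
int count_mismatches(int32_t limit) {
  int mismatches = 0;
  for (int32_t c = 0; c < limit; c++) {
    if (advance_flat(c) != advance_factored(c)) {
      mismatches++;
    }
  }
  return mismatches;
}
```

Calling count_mismatches with a limit above 12288 (the largest value either version mentions) is one way to check that, for this state at least, the regenerated lexer accepts the same characters as before.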

$ diff -u src.checkout/tree_sitter/parser.h src.0.19.5/tree_sitter/parser.h 
--- src.checkout/tree_sitter/parser.h	2023-02-06 07:30:09.960106999 +0900
+++ src.0.19.5/tree_sitter/parser.h	2023-02-06 07:44:30.077924918 +0900
@@ -102,8 +102,8 @@
   const uint16_t *small_parse_table;
   const uint32_t *small_parse_table_map;
   const TSParseActionEntry *parse_actions;
-  const char **symbol_names;
-  const char **field_names;
+  const char * const *symbol_names;
+  const char * const *field_names;
   const TSFieldMapSlice *field_map_slices;
   const TSFieldMapEntry *field_map_entries;
   const TSSymbolMetadata *symbol_metadata;
$ diff -u src.checkout/parser.c src.0.19.5/parser.c
--- src.checkout/parser.c	2023-02-06 07:30:09.956107060 +0900
+++ src.0.19.5/parser.c	2023-02-06 07:44:30.077924918 +0900
@@ -91,7 +91,7 @@
   aux_sym_read_cond_lit_repeat1 = 72,
 };
 
-static const char *ts_symbol_names[] = {
+static const char * const ts_symbol_names[] = {
   [ts_builtin_sym_end] = "end",
   [sym__ws] = "_ws",
   [sym_comment] = "comment",
@@ -167,7 +167,7 @@
   [aux_sym_read_cond_lit_repeat1] = "read_cond_lit_repeat1",
 };
 
-static TSSymbol ts_symbol_map[] = {
+static const TSSymbol ts_symbol_map[] = {
   [ts_builtin_sym_end] = ts_builtin_sym_end,
   [sym__ws] = sym__ws,
   [sym_comment] = sym_comment,
@@ -552,7 +552,7 @@
   field_value = 11,
 };
 
-static const char *ts_field_names[] = {
+static const char * const ts_field_names[] = {
   [0] = NULL,
   [field_close] = "close",
   [field_delimiter] = "delimiter",
@@ -808,7 +808,7 @@
     {field_value, 5},
 };
 
-static TSSymbol ts_alias_sequences[PRODUCTION_ID_COUNT][MAX_ALIAS_SEQUENCE_LENGTH] = {
+static const TSSymbol ts_alias_sequences[PRODUCTION_ID_COUNT][MAX_ALIAS_SEQUENCE_LENGTH] = {
   [0] = {0},
   [1] = {
     [0] = aux_sym__sym_qualified_token2,
@@ -821,7 +821,7 @@
   },
 };
 
-static uint16_t ts_non_terminal_alias_map[] = {
+static const uint16_t ts_non_terminal_alias_map[] = {
   0,
 };
 
@@ -865,6 +865,52 @@
         : (c <= 8287 || c == 12288))))));
 }
 
+static inline bool sym_comment_character_set_2(int32_t c) {
+  return (c < '`'
+    ? (c < ','
+      ? (c < '"'
+        ? (c < 28
+          ? (c >= '\t' && c <= '\r')
+          : c <= ' ')
+        : (c <= '"' || (c >= '(' && c <= ')')))
+      : (c <= ',' || (c < '@'
+        ? (c < ';'
+          ? c == '/'
+          : c <= ';')
+        : (c <= '@' || (c >= '[' && c <= '^')))))
+    : (c <= '`' || (c < 8200
+      ? (c < 5760
+        ? (c < '}'
+          ? c == '{'
+          : c <= '~')
+        : (c <= 5760 || (c >= 8192 && c <= 8198)))
+      : (c <= 8202 || (c < 8287
+        ? (c >= 8232 && c <= 8233)
+        : (c <= 8287 || c == 12288))))));
+}
+
+static inline bool sym_comment_character_set_3(int32_t c) {
+  return (c < '`'
+    ? (c < ','
+      ? (c < '"'
+        ? (c < 28
+          ? (c >= '\t' && c <= '\r')
+          : c <= ' ')
+        : (c <= '"' || (c >= '(' && c <= ')')))
+      : (c <= ',' || (c < '@'
+        ? c == ';'
+        : (c <= '@' || (c >= '[' && c <= '^')))))
+    : (c <= '`' || (c < 8200
+      ? (c < 5760
+        ? (c < '}'
+          ? c == '{'
+          : c <= '~')
+        : (c <= 5760 || (c >= 8192 && c <= 8198)))
+      : (c <= 8202 || (c < 8287
+        ? (c >= 8232 && c <= 8233)
+        : (c <= 8287 || c == 12288))))));
+}
+
 static inline bool aux_sym__kwd_leading_slash_token1_character_set_1(int32_t c) {
   return (c < '`'
     ? (c < '('
@@ -889,6 +935,26 @@
         : (c <= 8287 || c == 12288))))));
 }
 
+static inline bool aux_sym__kwd_leading_slash_token1_character_set_2(int32_t c) {
+  return (c < '{'
+    ? (c < ','
+      ? (c < 28
+        ? (c < '\t'
+          ? c == 0
+          : c <= '\r')
+        : (c <= '"' || (c >= '(' && c <= ')')))
+      : (c <= ',' || (c < '?'
+        ? c == ';'
+        : (c <= '@' || (c >= '[' && c <= '`')))))
+    : (c <= '{' || (c < 8200
+      ? (c < 5760
+        ? (c >= '}' && c <= '~')
+        : (c <= 5760 || (c >= 8192 && c <= 8198)))
+      : (c <= 8202 || (c < 8287
+        ? (c >= 8232 && c <= 8233)
+        : (c <= 8287 || c == 12288))))));
+}
+
 static inline bool aux_sym__kwd_leading_slash_token1_character_set_3(int32_t c) {
   return (c < '{'
     ? (c < ','
@@ -931,6 +997,28 @@
         : (c <= 8287 || c == 12288))))));
 }
 
+static inline bool aux_sym__kwd_leading_slash_token1_character_set_5(int32_t c) {
+  return (c < '@'
+    ? (c < '('
+      ? (c < 28
+        ? (c < '\t'
+          ? c == 0
+          : c <= '\r')
+        : (c <= ' ' || c == '"'))
+      : (c <= ')' || (c < '0'
+        ? c == ','
+        : (c <= '9' || c == ';'))))
+    : (c <= '^' || (c < 8200
+      ? (c < 5760
+        ? (c < '}'
+          ? (c >= '`' && c <= '{')
+          : c <= '~')
+        : (c <= 5760 || (c >= 8192 && c <= 8198)))
+      : (c <= 8202 || (c < 8287
+        ? (c >= 8232 && c <= 8233)
+        : (c <= 8287 || c == 12288))))));
+}
+
 static inline bool aux_sym__kwd_leading_slash_token1_character_set_6(int32_t c) {
   return (c < '['
     ? (c < '('
@@ -957,6 +1045,84 @@
         : (c <= 8287 || c == 12288))))));
 }
 
+static inline bool aux_sym__kwd_leading_slash_token1_character_set_7(int32_t c) {
+  return (c < '['
+    ? (c < '('
+      ? (c < 28
+        ? (c < '\t'
+          ? c == 0
+          : c <= '\r')
+        : (c <= ' ' || c == '"'))
+      : (c <= ')' || (c < ';'
+        ? (c < '0'
+          ? c == ','
+          : c <= '9')
+        : (c <= ';' || (c >= '@' && c <= 'F')))))
+    : (c <= '^' || (c < 8192
+      ? (c < '}'
+        ? (c < '{'
+          ? (c >= '`' && c <= 'f')
+          : c <= '{')
+        : (c <= '~' || c == 5760))
+      : (c <= 8198 || (c < 8287
+        ? (c < 8232
+          ? (c >= 8200 && c <= 8202)
+          : c <= 8233)
+        : (c <= 8287 || c == 12288))))));
+}
+
+static inline bool aux_sym__kwd_leading_slash_token1_character_set_8(int32_t c) {
+  return (c < '['
+    ? (c < '('
+      ? (c < 28
+        ? (c < '\t'
+          ? c == 0
+          : c <= '\r')
+        : (c <= ' ' || c == '"'))
+      : (c <= ')' || (c < ';'
+        ? (c < '/'
+          ? (c >= '+' && c <= '-')
+          : c <= '9')
+        : (c <= ';' || c == '@'))))
+    : (c <= '^' || (c < 8192
+      ? (c < '}'
+        ? (c < '{'
+          ? c == '`'
+          : c <= '{')
+        : (c <= '~' || c == 5760))
+      : (c <= 8198 || (c < 8287
+        ? (c < 8232
+          ? (c >= 8200 && c <= 8202)
+          : c <= 8233)
+        : (c <= 8287 || c == 12288))))));
+}
+
+static inline bool aux_sym__kwd_leading_slash_token1_character_set_9(int32_t c) {
+  return (c < '['
+    ? (c < '('
+      ? (c < 28
+        ? (c < '\t'
+          ? c == 0
+          : c <= '\r')
+        : (c <= ' ' || c == '"'))
+      : (c <= ')' || (c < ';'
+        ? (c < '0'
+          ? (c >= '+' && c <= '-')
+          : c <= '9')
+        : (c <= ';' || c == '@'))))
+    : (c <= '^' || (c < 8192
+      ? (c < '}'
+        ? (c < '{'
+          ? c == '`'
+          : c <= '{')
+        : (c <= '~' || c == 5760))
+      : (c <= 8198 || (c < 8287
+        ? (c < 8232
+          ? (c >= 8200 && c <= 8202)
+          : c <= 8233)
+        : (c <= 8287 || c == 12288))))));
+}
+
 static inline bool aux_sym__kwd_leading_slash_token1_character_set_10(int32_t c) {
   return (c < '['
     ? (c < '('
@@ -1009,6 +1175,62 @@
         : (c <= 8287 || c == 12288))))));
 }
 
+static inline bool aux_sym__kwd_leading_slash_token1_character_set_12(int32_t c) {
+  return (c < '`'
+    ? (c < ','
+      ? (c < 28
+        ? (c < '\t'
+          ? c == 0
+          : c <= '\r')
+        : (c <= ' ' || (c < '('
+          ? c == '"'
+          : c <= ')')))
+      : (c <= ',' || (c < '@'
+        ? (c < ';'
+          ? c == '/'
+          : c <= ';')
+        : (c <= '@' || (c >= '[' && c <= '^')))))
+    : (c <= '`' || (c < 8192
+      ? (c < '}'
+        ? (c < '{'
+          ? c == 'e'
+          : c <= '{')
+        : (c <= '~' || c == 5760))
+      : (c <= 8198 || (c < 8287
+        ? (c < 8232
+          ? (c >= 8200 && c <= 8202)
+          : c <= 8233)
+        : (c <= 8287 || c == 12288))))));
+}
+
+static inline bool aux_sym__kwd_leading_slash_token1_character_set_13(int32_t c) {
+  return (c < '`'
+    ? (c < ','
+      ? (c < 28
+        ? (c < '\t'
+          ? c == 0
+          : c <= '\r')
+        : (c <= ' ' || (c < '('
+          ? c == '"'
+          : c <= ')')))
+      : (c <= ',' || (c < '@'
+        ? (c < ';'
+          ? c == '/'
+          : c <= ';')
+        : (c <= '@' || (c >= '[' && c <= '^')))))
+    : (c <= '`' || (c < 8192
+      ? (c < '}'
+        ? (c < '{'
+          ? c == 'i'
+          : c <= '{')
+        : (c <= '~' || c == 5760))
+      : (c <= 8198 || (c < 8287
+        ? (c < 8232
+          ? (c >= 8200 && c <= 8202)
+          : c <= 8233)
+        : (c <= 8287 || c == 12288))))));
+}
+
 static inline bool aux_sym__kwd_leading_slash_token1_character_set_14(int32_t c) {
   return (c < '`'
     ? (c < ','
@@ -1037,6 +1259,116 @@
         : (c <= 8287 || c == 12288))))));
 }
 
+static inline bool aux_sym__kwd_leading_slash_token1_character_set_15(int32_t c) {
+  return (c < '`'
+    ? (c < ','
+      ? (c < 28
+        ? (c < '\t'
+          ? c == 0
+          : c <= '\r')
+        : (c <= ' ' || (c < '('
+          ? c == '"'
+          : c <= ')')))
+      : (c <= ',' || (c < '@'
+        ? (c < ';'
+          ? c == '/'
+          : c <= ';')
+        : (c <= '@' || (c >= '[' && c <= '^')))))
+    : (c <= '`' || (c < 8192
+      ? (c < '}'
+        ? (c < '{'
+          ? c == 'r'
+          : c <= '{')
+        : (c <= '~' || c == 5760))
+      : (c <= 8198 || (c < 8287
+        ? (c < 8232
+          ? (c >= 8200 && c <= 8202)
+          : c <= 8233)
+        : (c <= 8287 || c == 12288))))));
+}
+
+static inline bool aux_sym__kwd_leading_slash_token1_character_set_16(int32_t c) {
+  return (c < '`'
+    ? (c < ','
+      ? (c < 28
+        ? (c < '\t'
+          ? c == 0
+          : c <= '\r')
+        : (c <= ' ' || (c < '('
+          ? c == '"'
+          : c <= ')')))
+      : (c <= ',' || (c < '@'
+        ? (c < ';'
+          ? c == '/'
+          : c <= ';')
+        : (c <= '@' || (c >= '[' && c <= '^')))))
+    : (c <= '`' || (c < 8192
+      ? (c < '}'
+        ? (c < '{'
+          ? c == 's'
+          : c <= '{')
+        : (c <= '~' || c == 5760))
+      : (c <= 8198 || (c < 8287
+        ? (c < 8232
+          ? (c >= 8200 && c <= 8202)
+          : c <= 8233)
+        : (c <= 8287 || c == 12288))))));
+}
+
+static inline bool aux_sym__kwd_leading_slash_token1_character_set_17(int32_t c) {
+  return (c < '`'
+    ? (c < ','
+      ? (c < 28
+        ? (c < '\t'
+          ? c == 0
+          : c <= '\r')
+        : (c <= ' ' || (c < '('
+          ? c == '"'
+          : c <= ')')))
+      : (c <= ',' || (c < '@'
+        ? (c < ';'
+          ? c == '/'
+          : c <= ';')
+        : (c <= '@' || (c >= '[' && c <= '^')))))
+    : (c <= '`' || (c < 8192
+      ? (c < '}'
+        ? (c < '{'
+          ? c == 'u'
+          : c <= '{')
+        : (c <= '~' || c == 5760))
+      : (c <= 8198 || (c < 8287
+        ? (c < 8232
+          ? (c >= 8200 && c <= 8202)
+          : c <= 8233)
+        : (c <= 8287 || c == 12288))))));
+}
+
+static inline bool aux_sym__kwd_leading_slash_token1_character_set_18(int32_t c) {
+  return (c < '['
+    ? (c < '('
+      ? (c < 28
+        ? (c < '\t'
+          ? c == 0
+          : c <= '\r')
+        : (c <= ' ' || c == '"'))
+      : (c <= ')' || (c < ';'
+        ? (c < '/'
+          ? c == ','
+          : c <= '/')
+        : (c <= ';' || c == '@'))))
+    : (c <= '^' || (c < 8192
+      ? (c < '}'
+        ? (c < '{'
+          ? (c >= '`' && c <= 'a')
+          : c <= '{')
+        : (c <= '~' || c == 5760))
+      : (c <= 8198 || (c < 8287
+        ? (c < 8232
+          ? (c >= 8200 && c <= 8202)
+          : c <= 8233)
+        : (c <= 8287 || c == 12288))))));
+}
+
 static inline bool aux_sym__kwd_qualified_token1_character_set_1(int32_t c) {
   return (c < '`'
     ? (c < '('
@@ -1061,6 +1393,46 @@
         : (c <= 8287 || c == 12288))))));
 }
 
+static inline bool aux_sym__kwd_qualified_token1_character_set_2(int32_t c) {
+  return (c < '['
+    ? (c < ','
+      ? (c < 28
+        ? (c < '\t'
+          ? c == 0
+          : c <= '\r')
+        : (c <= '"' || (c >= '(' && c <= ')')))
+      : (c <= ',' || (c < ';'
+        ? c == '/'
+        : (c <= ';' || c == '@'))))
+    : (c <= '`' || (c < 8200
+      ? (c < 5760
+        ? (c < '}'
+          ? c == '{'
+          : c <= '~')
+        : (c <= 5760 || (c >= 8192 && c <= 8198)))
+      : (c <= 8202 || (c < 8287
+        ? (c >= 8232 && c <= 8233)
+        : (c <= 8287 || c == 12288))))));
+}
+
+static inline bool aux_sym__sym_qualified_token1_character_set_1(int32_t c) {
+  return (c < '\\'
+    ? (c < '"'
+      ? (c < '\t'
+        ? c == 0
+        : (c <= '\r' || (c >= 28 && c <= ' ')))
+      : (c <= '"' || (c < ','
+        ? c == ')'
+        : (c <= ',' || (c >= '0' && c <= ';')))))
+    : (c <= ']' || (c < 8200
+      ? (c < 5760
+        ? c == '}'
+        : (c <= 5760 || (c >= 8192 && c <= 8198)))
+      : (c <= 8202 || (c < 8287
+        ? (c >= 8232 && c <= 8233)
+        : (c <= 8287 || c == 12288))))));
+}
+
 static bool ts_lex(TSLexer *lexer, TSStateId state) {
   START_LEXER();
   eof = lexer->eof(lexer);
@@ -1152,23 +1524,7 @@
       if (lookahead == '`') ADVANCE(163);
       if (lookahead == '{') ADVANCE(141);
       if (lookahead == '~') ADVANCE(165);
-      if (lookahead != 0 &&
-          (lookahead < '\t' || '\r' < lookahead) &&
-          (lookahead < 28 || ' ' < lookahead) &&
-          lookahead != '"' &&
-          lookahead != ')' &&
-          lookahead != ',' &&
-          (lookahead < '0' || ';' < lookahead) &&
-          lookahead != '\\' &&
-          lookahead != ']' &&
-          lookahead != '}' &&
-          lookahead != 5760 &&
-          (lookahead < 8192 || 8198 < lookahead) &&
-          (lookahead < 8200 || 8202 < lookahead) &&
-          lookahead != 8232 &&
-          lookahead != 8233 &&
-          lookahead != 8287 &&
-          lookahead != 12288) ADVANCE(135);
+      if (!aux_sym__sym_qualified_token1_character_set_1(lookahead)) ADVANCE(135);
       END_STATE();
     case 5:
       if (lookahead == '#') ADVANCE(148);
@@ -1428,52 +1784,13 @@
       ACCEPT_TOKEN(sym_comment);
       if (lookahead == '\n') ADVANCE(48);
       if (!sym_comment_character_set_1(lookahead)) ADVANCE(49);
-      if (('\t' <= lookahead && lookahead <= '\r') ||
-          (28 <= lookahead && lookahead <= ' ') ||
-          lookahead == '"' ||
-          lookahead == '(' ||
-          lookahead == ')' ||
-          lookahead == ',' ||
-          lookahead == '/' ||
-          lookahead == ';' ||
-          lookahead == '@' ||
-          ('[' <= lookahead && lookahead <= '^') ||
-          lookahead == '`' ||
-          lookahead == '{' ||
-          lookahead == '}' ||
-          lookahead == '~' ||
-          lookahead == 5760 ||
-          (8192 <= lookahead && lookahead <= 8198) ||
-          (8200 <= lookahead && lookahead <= 8202) ||
-          lookahead == 8232 ||
-          lookahead == 8233 ||
-          lookahead == 8287 ||
-          lookahead == 12288) ADVANCE(51);
+      if (sym_comment_character_set_2(lookahead)) ADVANCE(51);
       END_STATE();
     case 50:
       ACCEPT_TOKEN(sym_comment);
       if (lookahead == '\n') ADVANCE(48);
       if (!aux_sym__kwd_leading_slash_token1_character_set_1(lookahead)) ADVANCE(50);
-      if (('\t' <= lookahead && lookahead <= '\r') ||
-          (28 <= lookahead && lookahead <= ' ') ||
-          lookahead == '"' ||
-          lookahead == '(' ||
-          lookahead == ')' ||
-          lookahead == ',' ||
-          lookahead == ';' ||
-          lookahead == '@' ||
-          ('[' <= lookahead && lookahead <= '^') ||
-          lookahead == '`' ||
-          lookahead == '{' ||
-          lookahead == '}' ||
-          lookahead == '~' ||
-          lookahead == 5760 ||
-          (8192 <= lookahead && lookahead <= 8198) ||
-          (8200 <= lookahead && lookahead <= 8202) ||
-          lookahead == 8232 ||
-          lookahead == 8233 ||
-          lookahead == 8287 ||
-          lookahead == 12288) ADVANCE(51);
+      if (sym_comment_character_set_3(lookahead)) ADVANCE(51);
       END_STATE();
     case 51:
       ACCEPT_TOKEN(sym_comment);
@@ -1722,26 +2039,7 @@
     case 78:
       ACCEPT_TOKEN(aux_sym__kwd_leading_slash_token1);
       if (lookahead == '!') ADVANCE(50);
-      if (lookahead != 0 &&
-          (lookahead < '\t' || '\r' < lookahead) &&
-          (lookahead < 28 || '"' < lookahead) &&
-          lookahead != '(' &&
-          lookahead != ')' &&
-          lookahead != ',' &&
-          lookahead != ';' &&
-          lookahead != '?' &&
-          lookahead != '@' &&
-          (lookahead < '[' || '`' < lookahead) &&
-          lookahead != '{' &&
-          lookahead != '}' &&
-          lookahead != '~' &&
-          lookahead != 5760 &&
-          (lookahead < 8192 || 8198 < lookahead) &&
-          (lookahead < 8200 || 8202 < lookahead) &&
-          lookahead != 8232 &&
-          lookahead != 8233 &&
-          lookahead != 8287 &&
-          lookahead != 12288) ADVANCE(100);
+      if (!aux_sym__kwd_leading_slash_token1_character_set_2(lookahead)) ADVANCE(100);
       if (lookahead == '?') ADVANCE(151);
       if (lookahead == '^') ADVANCE(138);
       if (lookahead == '_') ADVANCE(54);
@@ -1769,26 +2067,7 @@
       END_STATE();
     case 82:
       ACCEPT_TOKEN(aux_sym__kwd_leading_slash_token1);
-      if (lookahead != 0 &&
-          (lookahead < '\t' || '\r' < lookahead) &&
-          (lookahead < 28 || ' ' < lookahead) &&
-          lookahead != '"' &&
-          lookahead != '(' &&
-          lookahead != ')' &&
-          lookahead != ',' &&
-          (lookahead < '0' || '9' < lookahead) &&
-          lookahead != ';' &&
-          (lookahead < '@' || '^' < lookahead) &&
-          (lookahead < '`' || '{' < lookahead) &&
-          lookahead != '}' &&
-          lookahead != '~' &&
-          lookahead != 5760 &&
-          (lookahead < 8192 || 8198 < lookahead) &&
-          (lookahead < 8200 || 8202 < lookahead) &&
-          lookahead != 8232 &&
-          lookahead != 8233 &&
-          lookahead != 8287 &&
-          lookahead != 12288) ADVANCE(100);
+      if (!aux_sym__kwd_leading_slash_token1_character_set_5(lookahead)) ADVANCE(100);
       if (('0' <= lookahead && lookahead <= '9') ||
           ('A' <= lookahead && lookahead <= 'Z') ||
           ('a' <= lookahead && lookahead <= 'z')) ADVANCE(75);
@@ -1803,56 +2082,14 @@
       END_STATE();
     case 84:
       ACCEPT_TOKEN(aux_sym__kwd_leading_slash_token1);
-      if (lookahead != 0 &&
-          (lookahead < '\t' || '\r' < lookahead) &&
-          (lookahead < 28 || ' ' < lookahead) &&
-          lookahead != '"' &&
-          lookahead != '(' &&
-          lookahead != ')' &&
-          lookahead != ',' &&
-          (lookahead < '0' || '9' < lookahead) &&
-          lookahead != ';' &&
-          (lookahead < '@' || 'F' < lookahead) &&
-          (lookahead < '[' || '^' < lookahead) &&
-          (lookahead < '`' || 'f' < lookahead) &&
-          lookahead != '{' &&
-          lookahead != '}' &&
-          lookahead != '~' &&
-          lookahead != 5760 &&
-          (lookahead < 8192 || 8198 < lookahead) &&
-          (lookahead < 8200 || 8202 < lookahead) &&
-          lookahead != 8232 &&
-          lookahead != 8233 &&
-          lookahead != 8287 &&
-          lookahead != 12288) ADVANCE(100);
+      if (!aux_sym__kwd_leading_slash_token1_character_set_7(lookahead)) ADVANCE(100);
       if (('0' <= lookahead && lookahead <= '9') ||
           ('A' <= lookahead && lookahead <= 'F') ||
           ('a' <= lookahead && lookahead <= 'f')) ADVANCE(73);
       END_STATE();
     case 85:
       ACCEPT_TOKEN(aux_sym__kwd_leading_slash_token1);
-      if (lookahead != 0 &&
-          (lookahead < '\t' || '\r' < lookahead) &&
-          (lookahead < 28 || ' ' < lookahead) &&
-          lookahead != '"' &&
-          lookahead != '(' &&
-          lookahead != ')' &&
-          (lookahead < '+' || '-' < lookahead) &&
-          (lookahead < '/' || '9' < lookahead) &&
-          lookahead != ';' &&
-          lookahead != '@' &&
-          (lookahead < '[' || '^' < lookahead) &&
-          lookahead != '`' &&
-          lookahead != '{' &&
-          lookahead != '}' &&
-          lookahead != '~' &&
-          lookahead != 5760 &&
-          (lookahead < 8192 || 8198 < lookahead) &&
-          (lookahead < 8200 || 8202 < lookahead) &&
-          lookahead != 8232 &&
-          lookahead != 8233 &&
-          lookahead != 8287 &&
-          lookahead != 12288) ADVANCE(99);
+      if (!aux_sym__kwd_leading_slash_token1_character_set_8(lookahead)) ADVANCE(99);
       if (lookahead == '/') ADVANCE(100);
       if (lookahead == '+' ||
           lookahead == '-') ADVANCE(88);
@@ -1860,28 +2097,7 @@
       END_STATE();
     case 86:
       ACCEPT_TOKEN(aux_sym__kwd_leading_slash_token1);
-      if (lookahead != 0 &&
-          (lookahead < '\t' || '\r' < lookahead) &&
-          (lookahead < 28 || ' ' < lookahead) &&
-          lookahead != '"' &&
-          lookahead != '(' &&
-          lookahead != ')' &&
-          (lookahead < '+' || '-' < lookahead) &&
-          (lookahead < '0' || '9' < lookahead) &&
-          lookahead != ';' &&
-          lookahead != '@' &&
-          (lookahead < '[' || '^' < lookahead) &&
-          lookahead != '`' &&
-          lookahead != '{' &&
-          lookahead != '}' &&
-          lookahead != '~' &&
-          lookahead != 5760 &&
-          (lookahead < 8192 || 8198 < lookahead) &&
-          (lookahead < 8200 || 8202 < lookahead) &&
-          lookahead != 8232 &&
-          lookahead != 8233 &&
-          lookahead != 8287 &&
-          lookahead != 12288) ADVANCE(100);
+      if (!aux_sym__kwd_leading_slash_token1_character_set_9(lookahead)) ADVANCE(100);
       if (lookahead == '+' ||
           lookahead == '-') ADVANCE(90);
       if (('0' <= lookahead && lookahead <= '9')) ADVANCE(72);
@@ -1911,57 +2127,13 @@
       END_STATE();
     case 91:
       ACCEPT_TOKEN(aux_sym__kwd_leading_slash_token1);
-      if (lookahead != 0 &&
-          (lookahead < '\t' || '\r' < lookahead) &&
-          (lookahead < 28 || ' ' < lookahead) &&
-          lookahead != '"' &&
-          lookahead != '(' &&
-          lookahead != ')' &&
-          lookahead != ',' &&
-          lookahead != '/' &&
-          lookahead != ';' &&
-          lookahead != '@' &&
-          (lookahead < '[' || '^' < lookahead) &&
-          lookahead != '`' &&
-          lookahead != 'e' &&
-          lookahead != '{' &&
-          lookahead != '}' &&
-          lookahead != '~' &&
-          lookahead != 5760 &&
-          (lookahead < 8192 || 8198 < lookahead) &&
-          (lookahead < 8200 || 8202 < lookahead) &&
-          lookahead != 8232 &&
-          lookahead != 8233 &&
-          lookahead != 8287 &&
-          lookahead != 12288) ADVANCE(99);
+      if (!aux_sym__kwd_leading_slash_token1_character_set_12(lookahead)) ADVANCE(99);
       if (lookahead == '/') ADVANCE(100);
       if (lookahead == 'e') ADVANCE(99);
       END_STATE();
     case 92:
       ACCEPT_TOKEN(aux_sym__kwd_leading_slash_token1);
-      if (lookahead != 0 &&
-          (lookahead < '\t' || '\r' < lookahead) &&
-          (lookahead < 28 || ' ' < lookahead) &&
-          lookahead != '"' &&
-          lookahead != '(' &&
-          lookahead != ')' &&
-          lookahead != ',' &&
-          lookahead != '/' &&
-          lookahead != ';' &&
-          lookahead != '@' &&
-          (lookahead < '[' || '^' < lookahead) &&
-          lookahead != '`' &&
-          lookahead != 'i' &&
-          lookahead != '{' &&
-          lookahead != '}' &&
-          lookahead != '~' &&
-          lookahead != 5760 &&
-          (lookahead < 8192 || 8198 < lookahead) &&
-          (lookahead < 8200 || 8202 < lookahead) &&
-          lookahead != 8232 &&
-          lookahead != 8233 &&
-          lookahead != 8287 &&
-          lookahead != 12288) ADVANCE(99);
+      if (!aux_sym__kwd_leading_slash_token1_character_set_13(lookahead)) ADVANCE(99);
       if (lookahead == '/') ADVANCE(100);
       if (lookahead == 'i') ADVANCE(94);
       END_STATE();
@@ -1979,113 +2151,25 @@
       END_STATE();
     case 95:
       ACCEPT_TOKEN(aux_sym__kwd_leading_slash_token1);
-      if (lookahead != 0 &&
-          (lookahead < '\t' || '\r' < lookahead) &&
-          (lookahead < 28 || ' ' < lookahead) &&
-          lookahead != '"' &&
-          lookahead != '(' &&
-          lookahead != ')' &&
-          lookahead != ',' &&
-          lookahead != '/' &&
-          lookahead != ';' &&
-          lookahead != '@' &&
-          (lookahead < '[' || '^' < lookahead) &&
-          lookahead != '`' &&
-          lookahead != 'r' &&
-          lookahead != '{' &&
-          lookahead != '}' &&
-          lookahead != '~' &&
-          lookahead != 5760 &&
-          (lookahead < 8192 || 8198 < lookahead) &&
-          (lookahead < 8200 || 8202 < lookahead) &&
-          lookahead != 8232 &&
-          lookahead != 8233 &&
-          lookahead != 8287 &&
-          lookahead != 12288) ADVANCE(99);
+      if (!aux_sym__kwd_leading_slash_token1_character_set_15(lookahead)) ADVANCE(99);
       if (lookahead == '/') ADVANCE(100);
       if (lookahead == 'r') ADVANCE(97);
       END_STATE();
     case 96:
       ACCEPT_TOKEN(aux_sym__kwd_leading_slash_token1);
-      if (lookahead != 0 &&
-          (lookahead < '\t' || '\r' < lookahead) &&
-          (lookahead < 28 || ' ' < lookahead) &&
-          lookahead != '"' &&
-          lookahead != '(' &&
-          lookahead != ')' &&
-          lookahead != ',' &&
-          lookahead != '/' &&
-          lookahead != ';' &&
-          lookahead != '@' &&
-          (lookahead < '[' || '^' < lookahead) &&
-          lookahead != '`' &&
-          lookahead != 's' &&
-          lookahead != '{' &&
-          lookahead != '}' &&
-          lookahead != '~' &&
-          lookahead != 5760 &&
-          (lookahead < 8192 || 8198 < lookahead) &&
-          (lookahead < 8200 || 8202 < lookahead) &&
-          lookahead != 8232 &&
-          lookahead != 8233 &&
-          lookahead != 8287 &&
-          lookahead != 12288) ADVANCE(99);
+      if (!aux_sym__kwd_leading_slash_token1_character_set_16(lookahead)) ADVANCE(99);
       if (lookahead == '/') ADVANCE(100);
       if (lookahead == 's') ADVANCE(91);
       END_STATE();
     case 97:
       ACCEPT_TOKEN(aux_sym__kwd_leading_slash_token1);
-      if (lookahead != 0 &&
-          (lookahead < '\t' || '\r' < lookahead) &&
-          (lookahead < 28 || ' ' < lookahead) &&
-          lookahead != '"' &&
-          lookahead != '(' &&
-          lookahead != ')' &&
-          lookahead != ',' &&
-          lookahead != '/' &&
-          lookahead != ';' &&
-          lookahead != '@' &&
-          (lookahead < '[' || '^' < lookahead) &&
-          lookahead != '`' &&
-          lookahead != 'u' &&
-          lookahead != '{' &&
-          lookahead != '}' &&
-          lookahead != '~' &&
-          lookahead != 5760 &&
-          (lookahead < 8192 || 8198 < lookahead) &&
-          (lookahead < 8200 || 8202 < lookahead) &&
-          lookahead != 8232 &&
-          lookahead != 8233 &&
-          lookahead != 8287 &&
-          lookahead != 12288) ADVANCE(99);
+      if (!aux_sym__kwd_leading_slash_token1_character_set_17(lookahead)) ADVANCE(99);
       if (lookahead == '/') ADVANCE(100);
       if (lookahead == 'u') ADVANCE(91);
       END_STATE();
     case 98:
       ACCEPT_TOKEN(aux_sym__kwd_leading_slash_token1);
-      if (lookahead != 0 &&
-          (lookahead < '\t' || '\r' < lookahead) &&
-          (lookahead < 28 || ' ' < lookahead) &&
-          lookahead != '"' &&
-          lookahead != '(' &&
-          lookahead != ')' &&
-          lookahead != ',' &&
-          lookahead != '/' &&
-          lookahead != ';' &&
-          lookahead != '@' &&
-          (lookahead < '[' || '^' < lookahead) &&
-          lookahead != '`' &&
-          lookahead != 'a' &&
-          lookahead != '{' &&
-          lookahead != '}' &&
-          lookahead != '~' &&
-          lookahead != 5760 &&
-          (lookahead < 8192 || 8198 < lookahead) &&
-          (lookahead < 8200 || 8202 < lookahead) &&
-          lookahead != 8232 &&
-          lookahead != 8233 &&
-          lookahead != 8287 &&
-          lookahead != 12288) ADVANCE(99);
+      if (!aux_sym__kwd_leading_slash_token1_character_set_18(lookahead)) ADVANCE(99);
       if (lookahead == '/') ADVANCE(100);
       if (lookahead == 'a') ADVANCE(93);
       END_STATE();
@@ -2101,26 +2185,7 @@
     case 101:
       ACCEPT_TOKEN(aux_sym__kwd_qualified_token1);
       if (lookahead == '!') ADVANCE(49);
-      if (lookahead != 0 &&
-          (lookahead < '\t' || '\r' < lookahead) &&
-          (lookahead < 28 || '"' < lookahead) &&
-          lookahead != '(' &&
-          lookahead != ')' &&
-          lookahead != ',' &&
-          lookahead != '/' &&
-          lookahead != ';' &&
-          lookahead != '@' &&
-          (lookahead < '[' || '`' < lookahead) &&
-          lookahead != '{' &&
-          lookahead != '}' &&
-          lookahead != '~' &&
-          lookahead != 5760 &&
-          (lookahead < 8192 || 8198 < lookahead) &&
-          (lookahead < 8200 || 8202 < lookahead) &&
-          lookahead != 8232 &&
-          lookahead != 8233 &&
-          lookahead != 8287 &&
-          lookahead != 12288) ADVANCE(102);
+      if (!aux_sym__kwd_qualified_token1_character_set_2(lookahead)) ADVANCE(102);
       if (lookahead == '_') ADVANCE(53);
       END_STATE();
     case 102:
@@ -2417,7 +2482,7 @@
   }
 }
 
-static TSLexMode ts_lex_modes[STATE_COUNT] = {
+static const TSLexMode ts_lex_modes[STATE_COUNT] = {
   [0] = {.lex_state = 0},
   [1] = {.lex_state = 45},
   [2] = {.lex_state = 45},
@@ -2954,7 +3019,7 @@
   [533] = {.lex_state = 42},
 };
 
-static uint16_t ts_parse_table[LARGE_STATE_COUNT][SYMBOL_COUNT] = {
+static const uint16_t ts_parse_table[LARGE_STATE_COUNT][SYMBOL_COUNT] = {
   [0] = {
     [ts_builtin_sym_end] = ACTIONS(1),
     [sym__ws] = ACTIONS(1),
@@ -17127,7 +17192,7 @@
   },
 };
 
-static uint16_t ts_small_parse_table[] = {
+static const uint16_t ts_small_parse_table[] = {
   [0] = 3,
     ACTIONS(856), 1,
       aux_sym__kwd_leading_slash_token1,
@@ -25721,7 +25786,7 @@
       aux_sym__sym_qualified_token2,
 };
 
-static uint32_t ts_small_parse_table_map[] = {
+static const uint32_t ts_small_parse_table_map[] = {
   [SMALL_STATE(225)] = 0,
   [SMALL_STATE(226)] = 40,
   [SMALL_STATE(227)] = 77,
@@ -26033,7 +26098,7 @@
   [SMALL_STATE(533)] = 10185,
 };
 
-static TSParseActionEntry ts_parse_actions[] = {
+static const TSParseActionEntry ts_parse_actions[] = {
   [0] = {.entry = {.count = 0, .reusable = false}},
   [1] = {.entry = {.count = 1, .reusable = false}}, RECOVER(),
   [3] = {.entry = {.count = 1, .reusable = true}}, REDUCE(sym_source, 0),
@@ -26796,7 +26861,7 @@
 #endif
 
 extern const TSLanguage *tree_sitter_clojure(void) {
-  static TSLanguage language = {
+  static const TSLanguage language = {
     .version = LANGUAGE_VERSION,
     .symbol_count = SYMBOL_COUNT,
     .alias_count = ALIAS_COUNT,
@@ -26807,18 +26872,18 @@
     .production_id_count = PRODUCTION_ID_COUNT,
     .field_count = FIELD_COUNT,
     .max_alias_sequence_length = MAX_ALIAS_SEQUENCE_LENGTH,
-    .parse_table = (const uint16_t *)ts_parse_table,
-    .small_parse_table = (const uint16_t *)ts_small_parse_table,
-    .small_parse_table_map = (const uint32_t *)ts_small_parse_table_map,
+    .parse_table = &ts_parse_table[0][0],
+    .small_parse_table = ts_small_parse_table,
+    .small_parse_table_map = ts_small_parse_table_map,
     .parse_actions = ts_parse_actions,
     .symbol_names = ts_symbol_names,
     .field_names = ts_field_names,
-    .field_map_slices = (const TSFieldMapSlice *)ts_field_map_slices,
-    .field_map_entries = (const TSFieldMapEntry *)ts_field_map_entries,
+    .field_map_slices = ts_field_map_slices,
+    .field_map_entries = ts_field_map_entries,
     .symbol_metadata = ts_symbol_metadata,
     .public_symbol_map = ts_symbol_map,
     .alias_map = ts_non_terminal_alias_map,
-    .alias_sequences = (const TSSymbol *)ts_alias_sequences,
+    .alias_sequences = &ts_alias_sequences[0][0],
     .lex_modes = ts_lex_modes,
     .lex_fn = ts_lex,
   };

@sogaiu
Owner Author

sogaiu commented Feb 5, 2023

It appears that some projects that use tree-sitter-clojure may end up generating the files under src themselves.

The default ABI version used for those files changed from 13 to 14 in tree-sitter 0.20.7.

I suppose I could check on what other repositories have in their src/parser.c files...

@sogaiu
Owner Author

sogaiu commented Feb 5, 2023

Of the 66 different repositories I have locally that have src/parser.c, the counts per ABI version (ABI version - number of repositories) are:

  • 9 - 2
  • 10 - 1
  • 11 - 1
  • 12 - 0
  • 13 - 30
  • 14 - 32

So roughly half are at 13 and half are at 14.

(I did update the local copies before checking.)

Update: I have a page elsewhere with a larger sample size (currently 184). For that data, a bit over 50% use 13 and a bit under 40% use 14.


9 tree-sitter-email
9 tree-sitter-todo
10 tree-sitter-eno
11 tree-sitter-agda
13 tree-sitter-clojure
13 tree-sitter-commonlisp
13 tree-sitter-css
13 tree-sitter-dot
13 tree-sitter-elisp
13 tree-sitter-elm
13 tree-sitter-fennel
13 tree-sitter-go-mod
13 tree-sitter-hack
13 tree-sitter-html
13 tree-sitter-janet-simple
13 tree-sitter-json
13 tree-sitter-kotlin
13 tree-sitter-lua
13 tree-sitter-make
13 tree-sitter-markdown
13 tree-sitter-nix
13 tree-sitter-objc
13 tree-sitter-perl
13 tree-sitter-proto
13 tree-sitter-r
13 tree-sitter-sourcepawn
13 tree-sitter-sparql
13 tree-sitter-svelte
13 tree-sitter-toml
13 tree-sitter-turtle
13 tree-sitter-vhdl
13 tree-sitter-vue
13 tree-sitter-wgsl
13 tree-sitter-yaml
14 tree-sitter-bash
14 tree-sitter-c
14 tree-sitter-capnp
14 tree-sitter-cpp
14 tree-sitter-c-sharp
14 tree-sitter-cuda
14 tree-sitter-d
14 tree-sitter-dockerfile
14 tree-sitter-elixir
14 tree-sitter-embedded-template
14 tree-sitter-erlang
14 tree-sitter-glsl
14 tree-sitter-go
14 tree-sitter-haskell
14 tree-sitter-hcl
14 tree-sitter-java
14 tree-sitter-javascript
14 tree-sitter-julia
14 tree-sitter-kdl
14 tree-sitter-meson
14 tree-sitter-nu
14 tree-sitter-nu.LhKipp
14 tree-sitter-org
14 tree-sitter-php
14 tree-sitter-python
14 tree-sitter-racket
14 tree-sitter-ruby
14 tree-sitter-rust
14 tree-sitter-scala
14 tree-sitter-smali
14 tree-sitter-twig
14 tree-sitter-verilog
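
A tally like the one above could be produced with a short script. What follows is only a minimal sketch, not necessarily how the numbers above were obtained. It assumes each repository is a directory directly under one parent directory, and that the ABI version is the value of the `#define LANGUAGE_VERSION` line that tree-sitter writes near the top of src/parser.c. The function names `abi_version` and `tally_abi_versions` are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Generated parsers contain a line such as: #define LANGUAGE_VERSION 13
_VERSION_RE = re.compile(r"^#define\s+LANGUAGE_VERSION\s+(\d+)", re.MULTILINE)


def abi_version(parser_c_text):
    """Return the ABI version declared in parser.c text, or None if absent."""
    match = _VERSION_RE.search(parser_c_text)
    return int(match.group(1)) if match else None


def tally_abi_versions(root):
    """Count ABI versions across <root>/*/src/parser.c (assumed layout)."""
    counts = Counter()
    for parser_c in sorted(Path(root).glob("*/src/parser.c")):
        # The #define sits near the top of the file, so reading the
        # first few KB avoids loading multi-megabyte generated parsers.
        with open(parser_c, encoding="utf-8", errors="replace") as f:
            head = f.read(8192)
        version = abi_version(head)
        if version is not None:
            counts[version] += 1
    return counts
```

Calling `tally_abi_versions("repos")`, where `repos` is a placeholder for the parent directory, returns a `Counter` keyed by ABI version. Repositories whose src/parser.c has no `LANGUAGE_VERSION` line in the first 8 KB are skipped rather than counted.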

@sogaiu
Owner Author

sogaiu commented Feb 27, 2023

In 12fcfb9 (on the dev branch) I started using tree-sitter CLI version 0.20.7. That should address this issue.

However, as before, the precise Emscripten version matters and I'm not sure what version works for 0.20.7. There are commits post-0.20.7 that make it possible to use Emscripten version 3.1.29, but these commits are not in any released version of tree-sitter.

For reference, any commit at or later than this should work.

@dannyfreeman
Collaborator

This change works fine for me so far. I'm assuming ABI version 13 was used for commit 3daa97f?

@sogaiu
Copy link
Owner Author

sogaiu commented Mar 1, 2023

Yes.

On a side note, I have a branch that tries to record this in code.

Still in flux, but here's a bit for a taste: https://github.com/sogaiu/tree-sitter-clojure/blob/build-local-ts-cli/conf/conf.clj#L14-L23

@sogaiu added the candidate-on-dev label ("The dev branch contains code to address") Mar 1, 2023
@sogaiu
Owner Author

sogaiu commented Mar 6, 2023

> I'm not sure what version works for 0.20.7.

I looked into this a bit and I believe Emscripten 2.0.24 should work.

I've written up a summary on this topic here.

@sogaiu
Copy link
Owner Author

sogaiu commented Mar 12, 2023

The commit where I started using tree-sitter generate --abi 13 --no-bindings (with tree-sitter 0.20.7) can now be seen at 12fcfb9.

@sogaiu
Copy link
Owner Author

sogaiu commented May 8, 2023

As mentioned above, addressed in 12fcfb9.

@sogaiu closed this as completed May 8, 2023
@sogaiu removed the candidate-on-dev label ("The dev branch contains code to address") May 8, 2023