Builtin treesitter #152

Merged
merged 6 commits into from
Feb 20, 2025

countvajhula
Collaborator

@countvajhula countvajhula commented Feb 19, 2025

Summary of Changes

Incorporate @polaris64's updates to migrate to Emacs's built-in tree-sitter.

Current testing results

Legend:

[x] = OK
[-] = N/A
[o] = needs design
[ ] = needs fixing

Results:

[x] ("1" digit-argument)
[x] ("2" digit-argument)
[x] ("3" digit-argument)
[x] ("4" digit-argument)
[x] ("5" digit-argument)
[x] ("6" digit-argument)
[x] ("7" digit-argument)
[x] ("8" digit-argument)
[x] ("9" digit-argument)
[x] ("h" symex-go-backward)
[x] ("j" symex-go-down)
[x] ("k" symex-go-up)
[x] ("l" symex-go-forward)
[x] ("gh" backward-char)
[x] ("gj" symex-next-visual-line)
[x] ("gk" symex-previous-visual-line)
[x] ("gl" forward-char)
[x] ("(" symex-create-round)
[x] ("[" symex-create-square)
[ ] (")" symex-wrap-round)
[ ] ("]" symex-wrap-square)
[-] ("C-'" symex-cycle-quote)
[-] ("C-," symex-cycle-unquote)
[-] ("`" symex-add-quoting-level)
[-] ("C-`" symex-remove-quoting-level)
[x] ("f" symex-traverse-forward)
[x] ("b" symex-traverse-backward)
[x] ("C-f" symex-traverse-forward-more)
[x] ("C-b" symex-traverse-backward-more)
[x] ("F" symex-traverse-forward-skip)
[x] ("B" symex-traverse-backward-skip)
[x] ("{" symex-leap-backward)
[x] ("}" symex-leap-forward)
[x] ("M-{" symex-soar-backward)
[x] ("M-}" symex-soar-forward)
[o] ("C-k" symex-climb-branch)
[o] ("C-j" symex-descend-branch)
[x] ("y" symex-yank)
[x] ("Y" symex-yank-remaining)
[ ] ("p" symex-paste-after)
[ ] ("P" symex-paste-before)
[ ] ("x" symex-delete)
[ ] ("X" symex-delete-backwards)
[x] ("c" symex-change)
[ ] ("D" symex-delete-remaining)
[x] ("C" symex-change-remaining)
[ ] ("C--" symex-clear)
[ ] ("s" symex-replace)
[ ] ("-" symex-splice)
[ ] ("S" symex-change-delimiter)
[ ] ("H" symex-shift-backward)
[ ] ("L" symex-shift-forward)
[ ] ("M-H" symex-shift-backward-most)
[ ] ("M-L" symex-shift-forward-most)
[ ] ("K" symex-raise)
[o] ("C-S-j" symex-emit-backward)
[o] ("C-(" symex-capture-backward)
[o] ("C-S-h" symex-capture-backward)
[o] ("C-{" symex-emit-backward)
[o] ("C-S-l" symex-capture-forward)
[o] ("C-}" symex-emit-forward)
[o] ("C-S-k" symex-emit-forward)
[o] ("C-)" symex-capture-forward)
[o] ("z" symex-swallow)
[o] ("Z" symex-swallow-tail)
[o] ("e" symex-evaluate)
[o] ("s-;" symex-evaluate)
[o] ("E" symex-evaluate-remaining)
[-] ("C-M-e" symex-evaluate-pretty)
[o] ("d" symex-evaluate-definition)
[-] ("M-e" symex-eval-recursive)
[-] ("T" symex-evaluate-thunk)
[o] ("t" symex-switch-to-scratch-buffer)
[x] ("M" symex-switch-to-messages-buffer)
[o] ("r" symex-repl)
[o] ("R" symex-run)
[-] ("|" symex-split)
[-] ("&" symex-join)
[ ] ("o" symex-open-line-after)
[ ] ("O" symex-open-line-before)
[x] (">" symex-insert-newline)
[x] ("<" symex-join-lines-backwards)
[x] ("C->" symex-append-newline)
[x] ("C-<" symex-join-lines)
[x] ("C-S-o" symex-append-newline)
[x] ("J" symex-join-lines)
[x] ("M-J" symex-collapse)
[x] ("M-<" symex-collapse)
[x] ("M->" symex-unfurl)
[x] ("C-M-<" symex-collapse-remaining)
[x] ("C-M->" symex-unfurl-remaining)
[x] ("0" symex-goto-first)
[x] ("M-h" symex-goto-first)
[x] ("$" symex-goto-last)
[x] ("M-l" symex-goto-last)
[x] ("M-j" symex-goto-lowest)
[x] ("M-k" symex-goto-highest)
[ ] ("=" symex-tidy)
[ ] ("<tab>" symex-tidy)
[ ] ("C-=" symex-tidy-remaining)
[ ] ("C-<tab>" symex-tidy-remaining)
[ ] ("M-=" symex-tidy-proper)
[ ] ("M-<tab>" symex-tidy-proper)
[o] ("A" symex-append-after)
[o] ("a" symex-insert-at-end)
[o] ("i" symex-insert-at-beginning)
[o] ("I" symex-insert-before)
[o] ("w" symex-wrap)
[o] ("W" symex-wrap-and-append)
[x] (";" symex-comment)
[x] ("M-;" symex-comment-remaining)
[-] ("C-;" symex-eval-print)
[o] ("C-?" symex-describe)
[x] ("<return>" symex-enter-lower)
[x] ("<escape>" symex-escape-higher)

Other todos

  • change usage of tsc-changed-ranges, use treesit-parser-add-notifier?
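For reference, here is a minimal sketch of the `treesit-parser-add-notifier` approach, assuming Emacs 29+ built with tree-sitter support. The `my-symex-ts-...` names are hypothetical stand-ins, not Symex's actual code. Built-in parsers reparse lazily, so a notifier like this only runs the next time something queries the parse tree after an edit.

```elisp
;;; Hypothetical sketch, not Symex's actual implementation.
;; Assumes Emacs 29+ compiled with tree-sitter support.

(defvar-local my-symex-ts--current-node nil
  "Hypothetical stand-in for the node Symex currently has selected.")

(defun my-symex-ts--after-reparse (_ranges parser)
  "Reselect the node at point after PARSER reparses its buffer.
Notifiers are called with the list of changed ranges and the parser."
  (with-current-buffer (treesit-parser-buffer parser)
    (setq my-symex-ts--current-node (treesit-node-at (point) parser))))

(defun my-symex-ts-install-notifiers ()
  "Attach the reselection notifier to every parser in this buffer."
  (dolist (parser (treesit-parser-list))
    ;; The notifier must be a function symbol, not a lambda.
    (treesit-parser-add-notifier parser #'my-symex-ts--after-reparse)))
```

Calling `my-symex-ts-install-notifiers` from a tree-sitter mode hook (e.g. `python-ts-mode-hook`) would be one way to wire this up.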

Public Domain Dedication

  • In contributing, I relinquish any copyright claims on my contribution and freely release it into the public domain in the simple hope that it will provide value.

(Why: The freely released, copyright-free work in this repository represents an investment in a better way of doing things called attribution-based economics. Attribution-based economics is based on the simple idea that we gain more by giving more, not by holding on to things that, truly, we could only create because we, in our turn, received from others. As it turns out, an economic system based on attribution -- where those who give more are more empowered -- is significantly more efficient than capitalism while also being stable and fair (unlike capitalism, on both counts), giving it transformative power to elevate the human condition and address the problems that face us today along with a host of others that have been intractable since the beginning. You can help make this a reality by releasing your work in the same way -- freely into the public domain in the simple hope of providing value. Learn more about attribution-based economics at drym.org, tell your friends, do your part.)

@countvajhula countvajhula mentioned this pull request Feb 19, 2025
@countvajhula
Collaborator Author

Hey @polaris64, I'm just taking a look at this now. First of all, thank you so much, and perfect timing! This is definitely a huge head start, and it looks like we're quite close. I think we're ready to start testing with this new branch. Btw, from a quick look, I see that you're abstracting the tree-sitter library to support both tsc and the built-in library. Some users have complained that depending on tsc / elisp-tree-sitter is a non-starter for them, so I'm wondering if we should just drop tsc and support only Emacs 29+.

Otherwise, if we manage to get to #26 as part of this release, then tree-sitter support could be decoupled into a separate package, and people could install symex-ts, or not, at their discretion. That should allow the rest of Symex to keep supporting older versions of Emacs (e.g. Emacs 27) while symex-ts specifically depends on Emacs 29.

As I'm on Emacs 29 now, I'm not sure it would be easy for me to test the tsc variant so I'm leaning towards dropping it regardless. I'm not too familiar with the tree-sitter ecosystem though, so wdyt?
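If tree-sitter support were split into its own package as part of #26, the Emacs 29 requirement could live only in that package's header. A hypothetical sketch of such a `symex-ts.el` header follows; the file layout and the `(symex "2.0")` dependency are assumptions, not an existing package.

```elisp
;;; symex-ts.el --- Tree-sitter support for Symex  -*- lexical-binding: t; -*-

;; Hypothetical header for a split-out symex-ts package.
;; The "2.0" Symex version below is an assumption, not a real release.
;; Package-Requires: ((emacs "29.1") (symex "2.0"))

;;; Commentary:

;; Only this package would need Emacs 29+; core Symex could keep
;; supporting older Emacs versions such as Emacs 27.

;;; Code:

(require 'treesit)

(provide 'symex-ts)
;;; symex-ts.el ends here
```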

@polaris64
Collaborator

polaris64 commented Feb 19, 2025 via email

@countvajhula countvajhula force-pushed the builtin-treesitter branch 2 times, most recently from 11e6442 to defe54b February 19, 2025 19:54
@countvajhula
Collaborator Author

Turns out I had unwittingly rebased/cherry-picked your commits in reverse order! So your initial commit was applied last, and it included all of the logic for switching between the built-in and tree-sitter packages. I redid the rebase and it looks good now 🙂

> Yes, I fully agree. The way it works now with my latest changes is that
> if you use Symex on a version of Emacs prior to 29 then you'll get all
> of Symex's functionality except for Tree Sitter. If you use it on 29+
> (with Tree Sitter support enabled) then you'll get that too. I think
> that's fine and, as you say, it's not worth supporting an old library
> and adding extra dependencies.

Perfect.

I'll continue testing 👍

@countvajhula
Collaborator Author

OK, I did one full pass and tested every feature that's bound to a key! In short, the motions generally work well, including complex ones like traversals and leap branch, but other features (deletion, etc.) don't always work.

The things that don't work fall into three broad classes:

  1. Their behavior is undefined for non-Lisp languages

Many features that make sense for Lisp languages just don't make sense for non-Lisp languages, like e (evaluate), | (split) and & (join). I think it would make sense to just have these be no-ops. But some of them might benefit from a fresh design, i.e., the next class:

  2. Features requiring fresh design for non-Lisp

Some features aren't quite undefined, but may have a different intuitive meaning in non-Lisp languages, and it would take some design effort to understand what the meaning should be.

Examples:

i (insert), a (append), I (insert before), A (insert after)

For Python, if we've selected a def defining a function, we would like i to insert at the first position inside the function body, and a at the last position inside the body. I should insert before the def, and A should insert after the body of the function as a peer of the def, i.e., at the same indentation level as the def.

There are other features like z (swallow), K (raise), and more, that are like this, requiring a fresh design for non-Lisp languages.

For this class of issue, it would probably be quite useful to design the analogous behavior for Treesitter. But I'm also happy for us to work on this class of feature on the main branch after the release, unless there are any low-hanging fruit here.

  3. Problems with mutating operations (transformations)

This is the main class of issue to fix for the release. Some transformations (i.e., features that mutate the buffer) don't work, and many of them raise the same treesit-node-outdated error. My feeling is that a single problem underlies these failures, and fixing it would fix all of these transformations.

This may be related to the "changed ranges" or "parser notifier" issue you mentioned @polaris64. What's curious, though, is that some transformations do work, so it will be worth seeing what's different about them.

I'll share the detailed results in a separate comment.

@countvajhula
Collaborator Author

Here are the feature-by-feature testing results!

I just went through the lithium mode definition (which has the mode keybindings) and tested each feature one by one.

Legend:

  • "OK" means it works fine.
  • "ERROR" means it raises an error, and I include the error message.
  • "N/A" means the feature is undefined for non-Lisp.
  • "Requires design" means the feature requires a fresh design for non-Lisp.

There are also some features we should probably remove from Symex, like "go to messages buffer", leaving them to users' .emacs.d config (and optionally mentioning them in the README).

("1" digit-argument)
("2" digit-argument)
("3" digit-argument)
("4" digit-argument)
("5" digit-argument)
("6" digit-argument)
("7" digit-argument)
("8" digit-argument)
("9" digit-argument)

(i.e., count arguments)

OK for motions
ERROR for some transformations like x:

Debugger entered--Lisp error: (treesit-node-outdated #<treesit-node-outdated>)
  signal(treesit-node-outdated (#<treesit-node-outdated>))

Whatever this issue is, it comes up a lot in the other features that mutate the buffer, and seems to be a core technical issue to resolve.

("h" symex-go-backward)
("j" symex-go-down)
("k" symex-go-up)
("l" symex-go-forward)

OK

("gh" backward-char)
("gj" symex-next-visual-line)
("gk" symex-previous-visual-line)
("gl" forward-char)

OK

They may not update the selection overlay after the motion, but otherwise they work fine.

("(" symex-create-round)
("[" symex-create-square)

OK

(")" symex-wrap-round)
("]" symex-wrap-square)

ERROR:

Debugger entered--Lisp error: (treesit-node-outdated #<treesit-node-outdated>)
  signal(treesit-node-outdated (#<treesit-node-outdated>))

("C-'" symex-cycle-quote)
("C-," symex-cycle-unquote)
("`" symex-add-quoting-level)
("C-`" symex-remove-quoting-level)

N/A

("f" symex-traverse-forward)
("b" symex-traverse-backward)
("C-f" symex-traverse-forward-more)
("C-b" symex-traverse-backward-more)

OK

("F" symex-traverse-forward-skip)
("B" symex-traverse-backward-skip)

OK

("{" symex-leap-backward)
("}" symex-leap-forward)
("M-{" symex-soar-backward)
("M-}" symex-soar-forward)

OK

("C-k" symex-climb-branch)
("C-j" symex-descend-branch)

Requires design.

Kinda works but doesn't update selection. Not very useful for non-Lisp, at the moment.

("y" symex-yank)
("Y" symex-yank-remaining)

OK

("p" symex-paste-after)
("P" symex-paste-before)

ERROR:

Debugger entered--Lisp error: (error "Unable to perform edit: the Emacs internal tree sitter library is not yet supported")
  signal(error ("Unable to perform edit: the Emacs internal tree sitter library is not yet supported"))

("x" symex-delete)
("X" symex-delete-backwards)

OK

But note that they don't work with counts (e.g. 2x); these raise the treesit-node-outdated ERROR.

("c" symex-change)

OK

Interestingly, although this just uses delete under the hood, this works with counts, too (i.e., 2c works)! Probably a clue we can follow to fixing that issue.

("D" symex-delete-remaining)

ERROR:

Debugger entered--Lisp error: (treesit-node-outdated #<treesit-node-outdated>)
  signal(treesit-node-outdated (#<treesit-node-outdated>))

("C" symex-change-remaining)

OK

("C--" symex-clear)
("s" symex-replace)

OK if we are actually on a parenthesized expression
If not (where it would be invalid anyway), then ERROR:

Debugger entered--Lisp error: (error "Unable to perform edit: the Emacs internal tree sitter library is not yet supported")
  signal(error ("Unable to perform edit: the Emacs internal tree sitter library is not yet supported"))

("-" symex-splice)

ERROR:

Debugger entered--Lisp error: (treesit-node-outdated #<treesit-node-outdated>)
  signal(treesit-node-outdated (#<treesit-node-outdated>))

("S" symex-change-delimiter)

OK, but it joins lines if the delimiter is followed by a newline (it should just change the delimiter)

("H" symex-shift-backward)

ERROR:

Debugger entered--Lisp error: (treesit-node-outdated #<treesit-node-outdated>)
  signal(treesit-node-outdated (#<treesit-node-outdated>))

("L" symex-shift-forward)

No error, but doesn't work correctly.

("M-H" symex-shift-backward-most)
("M-L" symex-shift-forward-most)

ERROR:

Debugger entered--Lisp error: (treesit-node-outdated #<treesit-node-outdated>)
  signal(treesit-node-outdated (#<treesit-node-outdated>))

("K" symex-raise)

No error, but doesn't work consistently.

("C-S-j" symex-emit-backward)
("C-(" symex-capture-backward)
("C-S-h" symex-capture-backward)
("C-{" symex-emit-backward)
("C-S-l" symex-capture-forward)
("C-}" symex-emit-forward)
("C-S-k" symex-emit-forward)
("C-)" symex-capture-forward)

Requires design. Currently not supported (noop).

("z" symex-swallow)
("Z" symex-swallow-tail)

These usually give an ERROR:

Debugger entered--Lisp error: (error "Can’t splice top level.")
  signal(error ("Can’t splice top level."))

("e" symex-evaluate)
("s-;" symex-evaluate)
("E" symex-evaluate-remaining)
("C-M-e" symex-evaluate-pretty)
("d" symex-evaluate-definition)
("M-e" symex-eval-recursive)
("T" symex-evaluate-thunk)

N/A

These currently raise an ERROR:

Debugger entered--Lisp error: (error "Symex mode: no method :eval for major-mode python-ts-mode")
  signal(error ("Symex mode: no method :eval for major-mode python-ts-mode"))

("t" symex-switch-to-scratch-buffer)

Requires design. Currently ERROR:

Debugger entered--Lisp error: (error "Symex mode: no method :switch-to-scratch-buffer for major-mode python-ts-mode")
  signal(error ("Symex mode: no method :switch-to-scratch-buffer for major-mode python-ts-mode"))

("M" symex-switch-to-messages-buffer)

OK

("r" symex-repl)

ERROR:

Debugger entered--Lisp error: (error "Symex mode: no method :repl for major-mode python-ts-mode")
  signal(error ("Symex mode: no method :repl for major-mode python-ts-mode"))

("R" symex-run)

ERROR:

Debugger entered--Lisp error: (error "Symex mode: no method :run for major-mode python-ts-mode")
  signal(error ("Symex mode: no method :run for major-mode python-ts-mode"))

("|" symex-split)
("&" symex-join)

N/A (or requires design)

("o" symex-open-line-after)
("O" symex-open-line-before)

OK, but o adds a newline before any terminating delimiters on the current line (e.g. commas); it should ideally add it after the delimiter.

(">" symex-insert-newline)
("<" symex-join-lines-backwards)
("C->" symex-append-newline)
("C-<" symex-join-lines)
("C-S-o" symex-append-newline)

OK, but messes up indentation (probably just an issue with symex-tidy)

("J" symex-join-lines)

OK

("M-J" symex-collapse)
("M-<" symex-collapse)
("M->" symex-unfurl)
("C-M-<" symex-collapse-remaining)
("C-M->" symex-unfurl-remaining)

OK, but they only work in limited cases and mess up commas.

("0" symex-goto-first)
("M-h" symex-goto-first)
("$" symex-goto-last)
("M-l" symex-goto-last)
("M-j" symex-goto-lowest)
("M-k" symex-goto-highest)

OK

("=" symex-tidy)
("<tab>" symex-tidy)
("C-=" symex-tidy-remaining)
("C-<tab>" symex-tidy-remaining)
("M-=" symex-tidy-proper)
("M-<tab>" symex-tidy-proper)

Mostly OK, but they can mess up commas by adding spaces around them.

("A" symex-append-after)
("a" symex-insert-at-end)
("i" symex-insert-at-beginning)
("I" symex-insert-before)

Requires design. This is another place we could consider using tree-edit, e.g., we could potentially map I (or a new binding like M-i) to a tree-edit insertion command, allowing you to insert an if or for form, etc. (FYI @ethan-leba)

Currently they kinda work in a Lisp style.
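To make the proposed Python design for i, a, I and A (from the earlier comment on feature classes) a bit more concrete, here is a minimal sketch of how the target positions could be computed from a def. It assumes `python-ts-mode` with the tree-sitter-python grammar, where a `function_definition` node exposes its body through a `body` field. The function name is hypothetical, and decorated definitions and other node types are not handled.

```elisp
;;; Hypothetical sketch of the proposed i/a/I/A targets for a Python def.
;; Assumes Emacs 29+ and the tree-sitter-python grammar.

(defun my-symex-ts-python-def-insertion-points (node)
  "Return a plist of proposed insertion positions for def NODE, or nil.
:i is the start of the body, :a the end of the body, :I the start of
the def, :A the end of the def, and :indent the def's column (where a
peer inserted by A should be indented)."
  (when (equal (treesit-node-type node) "function_definition")
    (let ((body (treesit-node-child-by-field-name node "body")))
      (list :i (treesit-node-start body)
            :a (treesit-node-end body)
            :I (treesit-node-start node)
            :A (treesit-node-end node)
            :indent (save-excursion
                      (goto-char (treesit-node-start node))
                      (current-column))))))
```

Returning positions rather than moving point keeps the sketch easy to reuse for both the insert commands and any tree-edit based variants.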

("w" symex-wrap)
("W" symex-wrap-and-append)

N/A (maybe requires design; e.g., we could wrap with standard forms like if and for, and may be able to use tree-edit for this purpose as well. FYI @ethan-leba 🙂). These currently raise an ERROR:

Debugger entered--Lisp error: (treesit-node-outdated #<treesit-node-outdated>)
  signal(treesit-node-outdated (#<treesit-node-outdated>))

(";" symex-comment)
("M-;" symex-comment-remaining)

OK

("C-;" symex-eval-print)

N/A

("C-?" symex-describe)

Requires design. Currently ERROR:

Debugger entered--Lisp error: (error "Symex mode: no method :describe-symbol for major-mode python-ts-mode")
  signal(error ("Symex mode: no method :describe-symbol for major-mode python-ts-mode"))

;; escapes
("<return>" symex-enter-lower)
("<escape>" symex-escape-higher)

OK

@countvajhula
Collaborator Author

countvajhula commented Feb 20, 2025

I added the current test results to the PR description in an easy-to-read (I think) format, and I'll keep it up to date as features get fixed. Any assistance or pointers would be appreciated. I could especially use help understanding the treesit-node-outdated issue, which seems to be at the heart of many of the errors 🙂

@countvajhula countvajhula marked this pull request as ready for review February 20, 2025 03:25
@countvajhula
Collaborator Author

I think it would be good to merge this and continue things on the integration branch. Please let me know if this PR looks good to you @polaris64

polaris64 and others added 6 commits February 19, 2025 20:45
* Define new functions or aliases for tree sitter functionality depending on the available support from Emacs.

* Replace all occurrences of library function calls with calls to the new aliases/functions.

TODO: change usage of `tsc-changed-ranges', use `treesit-parser-add-notifier'. Support for handling tree changes needs to be implemented differently for the internal tree sitter library.

The old package was used before Emacs had built-in support for Tree Sitter (starting in Emacs 29). Since then, the elisp-tree-sitter package has not been updated, and therefore it seems sensible for Symex to support Tree Sitter modes only when Emacs has built-in support available to it.
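A minimal sketch of the kind of guard this last commit describes (tree-sitter features only when Emacs has built-in support). The function names are hypothetical and nothing here is taken from Symex's actual code; `treesit-available-p` may not be defined on older Emacs versions, hence the `fboundp` check.

```elisp
;;; Hypothetical sketch: enable tree-sitter features only when built in.

(defun my-symex-ts-supported-p ()
  "Return non-nil if this Emacs has usable built-in tree-sitter.
`treesit-available-p' may be undefined before Emacs 29."
  (and (fboundp 'treesit-available-p)
       (treesit-available-p)))

(defun my-symex-ts-buffer-supported-p ()
  "Return t if the current buffer has at least one tree-sitter parser."
  (and (my-symex-ts-supported-p)
       (treesit-parser-list)
       t))
```

On Emacs 27 or 28 both functions simply return nil, so the rest of Symex keeps working without tree-sitter.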
@polaris64
Collaborator

Hi @countvajhula,

Thanks for performing exhaustive tests on this, that's very valuable information to have and I know how long it takes to do the testing properly!

Just some quick notes on the errors you're seeing:

  • signal(treesit-node-outdated (#<treesit-node-outdated>)): this is most likely because buffer mutations are no longer being handled, and it relates to your "change usage of tsc-changed-ranges, use treesit-parser-add-notifier?" task. Symex keeps track of the current node, and when the tree is mutated, that node may or may not become outdated (e.g. if the node was removed, or if the structure changed enough for it to become a new node). Trying to use an outdated node produces this error. Handling the mutation properly by reselecting a new node at point should prevent all of these errors; that's what symex-ts--handle-tree-modification (which is no longer implemented) used to do.
  • "Unable to perform edit: the Emacs internal tree sitter library is not yet supported": this error is explicitly raised by the symex-ts--handle-tree-modification macro. Rather than letting it fail silently, I've added this error message for now so that you can see the cases where the macro needs to be used but can't be, as it isn't implemented for the built-in Tree Sitter library.

So the operations in the first case are probably ones that should be using the symex-ts--handle-tree-modification macro. Once this macro has been implemented (or replaced with an alternative), using it in these operations should prevent these errors.

I hope that makes sense!
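A minimal sketch of the "reselect a new node at point" idea described above, assuming Emacs 29+ with tree-sitter. The macro, predicate and variable names are hypothetical, and this is not the actual `symex-ts--handle-tree-modification`: it runs a mutation, then replaces the tracked node with a fresh one from `treesit-node-at`. `treesit-node-check` with the `outdated` property can detect a stale node before it is used.

```elisp
;;; Hypothetical sketch, not the real `symex-ts--handle-tree-modification'.
;; Assumes Emacs 29+ compiled with tree-sitter support.

(defvar-local my-symex-ts--current-node nil
  "Hypothetical stand-in for the node Symex currently has selected.")

(defun my-symex-ts--node-usable-p (node)
  "Return non-nil if NODE exists and is not outdated."
  (and node (not (treesit-node-check node 'outdated))))

(defmacro my-symex-ts-with-node-refresh (&rest body)
  "Run BODY, which may mutate the buffer, then reselect the node at point.
Returns the value of BODY.  The old node is never reused afterwards."
  (declare (indent 0) (debug t))
  `(prog1 (progn ,@body)
     (setq my-symex-ts--current-node (treesit-node-at (point)))))

;; Example use (hypothetical): delete the selected node, then reselect.
;; (my-symex-ts-with-node-refresh
;;   (delete-region (treesit-node-start my-symex-ts--current-node)
;;                  (treesit-node-end my-symex-ts--current-node)))
```

Wrapping each repetition of a counted command (e.g. 2x) this way would make every iteration start from a fresh node; whether that is what resolves the count errors is untested.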

Collaborator

@polaris64 polaris64 left a comment

This all looks good to me

@countvajhula
Copy link
Collaborator Author

@polaris64 Yes, that makes perfect sense and is exactly what I needed to know. I'll take a look today and see what progress we can make!

@countvajhula countvajhula merged commit 30dd9c6 into 2.0-integration Feb 20, 2025
0 of 8 checks passed