Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with find-definition in Allegro 11 beta #613

Open
rpgoldman opened this issue Jul 21, 2023 · 12 comments
Open

Problem with find-definition in Allegro 11 beta #613

rpgoldman opened this issue Jul 21, 2023 · 12 comments

Comments

@rpgoldman
Copy link
Contributor

When I try to find a definition after the first time with SLY and Allegro 11, I get errors that look like this:

sly-error

I wasn't able to find that error string looking around in the sly code, so I am a little bit lost. Is there some way to track this down? It looks like some transaction between Emacs and CL is stuck in a weird state. If anyone could point me towards a location in the sly/slynk code, I would have a whack at fixing it.

Another thing that would help would be if I could gather more information -- like maybe dump a backtrace to a file.

@joaotavora
Copy link
Owner

Hello Robert. Sorry for not getting back to you earlier. I haven't got Allegro 11 beta to test this, and I've been away from CL from a while now. But I can give you some tips on debugging this.

Definition finding has many ways to start, but the main way is M-. of course, which binds to sly-edit-definition. Of course, you probably already know this. Anyway, this calls CL's SLYNK:FIND-DEFINITIONS-FOR-EMACS. This calls into SLYNK-BACKENDS interface SLYNK-BACKEND:FIND-DEFINITIONS, whose protocol is moderately well described in the docstring.

For Allegro, this calls into the implementation of that interface, which lives in slynk/backend/allegro.lisp. And here is probably where it gets interesting for Allegro 11 and you'll have to drill down the function to know what interesting thing happens in Allegro 11 that doesn't in Allegro 10 -- or vice versa.

Using SLY to debug/develop SLY is generally possible and my recommendation is to have two connections: one for Allegro 10 and one for Allegro 11 and switch between them with M-x sly-next-connection.

I've solved some of these problems in the past (well in the past to be honest). If I recall correctly Allegro CL, the definition finding business is more convoluted than in other languages, and relies on some Allegro internals. For example, when compiling a definition with C-c C-c, the little snippet of code (a string basically) that's sent from the Emacs side for compilation on the CL side is adorned with some let-bindings of special variables so that the calculation to have the eventual evaluation of that compiled snippet record whence the snippet came. See this in allegro.lisp:

(defimplementation slynk-compile-string (string &key buffer position filename
                                                line column policy)
  (declare (ignore line column policy))
  (handler-case
      (with-compilation-hooks ()
        (let ((*buffer-name* buffer)
              (*buffer-start-position* position)
              (*buffer-string* string))
          (compile-from-temp-file string buffer position filename)))
    (reader-error () nil)))

And remember to also expand that with-compilation-hooks!. Anyway this means that there are at least two cases to deal with here: (1) finding definitions after LOAD ing a .lisp or .fasl file and (2) finding definition after compiling a bit of code with C-c C-c.

Hope this helps, let me know if I can answer something more.

if you can find me some Allegro 11 that I can use to test I might be able to find some time.

It would also be worth it to check with the SLIME guys because usually these problems pop up there as well. Maybe @luismbo has some clues (though I think he doesn't use Allegro as much as he used to....).

@rpgoldman
Copy link
Contributor Author

@joaotavora Thank you very much for the helpful write-up. It seems like buried somewhere down in there is an invocation of Allegro functions on the slynk side that is not terminating so that a new attempt to search fails.

One possibility, it seems to me, is this function, which is used in the defimplementation for both find-definitions and find-source-location. This is just a conjecture, but it does have a loop in it which is the only place I've found a likely way that execution could get stuck on the slynk side:

(defun fspec-definition-locations (fspec)
  (cond
    ((and (listp fspec) (eq (car fspec) :internal))
     (destructuring-bind (_internal next _n) fspec
       (declare (ignore _internal _n))
       (fspec-definition-locations next)))
    (t
     (let ((defs (excl::find-source-file fspec)))
       (when (and (null defs)
                  (listp fspec)
                  (string= (car fspec) '#:method))
         ;; If methods are defined in a defgeneric form, the source location is
         ;; recorded for the gf but not for the methods. Therefore fall back to
         ;; the gf as the likely place of definition.
         (setq defs (excl::find-source-file (second fspec))))
       (if (null defs)
           (list
            (list fspec
                  (make-error-location "Unknown source location for ~A" 
                                       (fspec->string fspec))))
           (loop for (fspec type file top-level) in defs collect 
                 (list (list type fspec)
                       (find-fspec-location fspec type file top-level))))))))

There's something going on in the interaction between emacs and CL that I really don't understand at all...

@joaotavora
Copy link
Owner

there is an invocation of Allegro functions on the slynk side that is not terminating so that a new attempt to search fails.

I find this a bit odd. At least, I understand by "not terminating" something that times out on the Slynk side. For asynchronous Emacs -> Slynk requests (sly-eval-async) this can indeed happen, but in this case the top level operation is synchronous sly-eval, meaning Emacs waits (using a mechanism that isn't trivial and not worth explaining here, I think) for the Slynk side.

So from what you've shown it seems that the underlying operation is simply returning an error. Maybe you can confirm this by showing a fragment of the *sly-events for allegro* buffer. Here's what it looks like for successfully finding the deifnition of SLYNK:XREF>ELISP, for example:

...
(:emacs-rex (slynk:find-definitions-for-emacs "xref>elisp") ":slynk" t
            6)
(:return
 (:ok
  (("(:OPERATOR XREF>ELISP)"
    (:location
     (:file "/home/capitaomorte/Source/Emacs/sly/slynk/slynk.lisp")
     (:offset 1 128614) nil))))
 6)
...

But in your case, it seems that fspec-definition-locations is returning an "error location" (different from signalling an error, mind you). Your events probably contains something like:

(:emacs-rex (slynk:find-definitions-for-emacs "xref>elisp") ":slynk" t
            46)
(:return (:ok ((":ERROR" "oh no"))) 46)

Most likely the excl::find-source-file call (notice one of the :: calls to an internal function I've previously mentioned) is returning something that

It is probably helpful to trace (either with CL's built-in trace or with SLY's trace dialog) the fpsec-definition-location (singular) function reports as the error. Why is that? I don't know. Maybe excl::find-source-file it doesn't like the arguments passed to it, or maybe there's some detail in Allegro 11 Beta that allegro.lisp doesn't comprehend. In any case I think something is calling this error handler at the bottom.

(defun find-fspec-location (fspec type file top-level)
  (handler-case
      (etypecase file
        (pathname
         (let ((probe (gethash file *temp-file-map*)))
           (cond (probe
                  (destructuring-bind (buffer offset) probe
                    (make-location `(:buffer ,buffer)
                                   `(:offset ,offset 0))))
                 (t
                  (find-definition-in-file fspec type file top-level)))))
        ((member :top-level)
         (make-error-location "Defined at toplevel: ~A" 
                              (fspec->string fspec))))
    (error (e)
      (make-error-location "Error: ~A" e))))

Anyway, you should trace a lot of functions, like this one and find-definition-in-file as well. Maybe its callees, too.

@rpgoldman
Copy link
Contributor Author

Recording my initial tracing results here. I will continue to look into this. Looks like things are going wrong in scm:section-file.

 0[336]: (SLYNK-ALLEGRO::FIND-FSPEC-LOCATION REPLAN-DOMAIN :OPERATOR
           #P"/Users/rpg/projects/hideho/plan-repair/plan-repair-by-rewrite/plan-repair-rewrite.lisp"
           (REPLAN-DOMAIN :OPERATOR))
   1[336]: (SLYNK-ALLEGRO::FIND-DEFINITION-IN-FILE REPLAN-DOMAIN
             :OPERATOR
             #P"/Users/rpg/projects/hideho/plan-repair/plan-repair-by-rewrite/plan-repair-rewrite.lisp"
             (REPLAN-DOMAIN :OPERATOR))
     2[336]: (EXCL.SCM:SECTION-FILE :FILE #P"/Users/rpg/projects/hideho/plan-repair/plan-repair-by-rewrite/plan-repair-rewrite.lisp")
       3[336]: (EXCL.SCM::SOURCE-PART-START
                 #<EXCL.SCM::SOURCE-PART EXECUTION-RECORD :TYPE 127
                   802>)
       3* #<ACLMOP:STANDARD-READER-METHOD EXCL.SCM::SOURCE-PART-START
            (EXCL.SCM::SOURCE-PART)>
       3[336]: returned 127
       3[336]: (EXCL.SCM::SOURCE-PART-START
                 #<EXCL.SCM::SOURCE-PART REPLAN-DOMAIN-AND-PROBLEM
                   :OPERATOR 1039 1324>)
       3* #<ACLMOP:STANDARD-READER-METHOD EXCL.SCM::SOURCE-PART-START
            (EXCL.SCM::SOURCE-PART)>
       3[336]: returned 1039
       3[336]: (EXCL.SCM::SOURCE-PART-START
                 #<EXCL.SCM::SOURCE-PART ORDERING-LITERALS DEFGENERIC
                   1629 2087>)
       3* #<ACLMOP:STANDARD-READER-METHOD EXCL.SCM::SOURCE-PART-START
            (EXCL.SCM::SOURCE-PART)>
       3[336]: returned 1629
       3[336]: (EXCL.SCM::SOURCE-PART-START
                 #<EXCL.SCM::SOURCE-PART NEW-ACTIONS :OPERATOR 2476
                   4919>)
       3* #<ACLMOP:STANDARD-READER-METHOD EXCL.SCM::SOURCE-PART-START
            (EXCL.SCM::SOURCE-PART)>
       3[336]: returned 2476
       3[336]: (EXCL.SCM::SOURCE-PART-START
                 #<EXCL.SCM::SOURCE-PART NEW-ACTION-TASKS-AND-METHODS
                   :OPERATOR 5108 5493>)
       3* #<ACLMOP:STANDARD-READER-METHOD EXCL.SCM::SOURCE-PART-START
            (EXCL.SCM::SOURCE-PART)>
       3[336]: returned 5108
       3[336]: (EXCL.SCM::SOURCE-PART-START
                 #<EXCL.SCM::SOURCE-PART NEW-ACTION-TASK-AND-METHODS
                   :OPERATOR 5495 7917>)
       3* #<ACLMOP:STANDARD-READER-METHOD EXCL.SCM::SOURCE-PART-START
            (EXCL.SCM::SOURCE-PART)>
       3[336]: returned 5495
       3[336]: (EXCL.SCM::SOURCE-PART-START
                 #<EXCL.SCM::SOURCE-PART MAKE-NEW-ACTION :OPERATOR 9218
                   10066>)
       3* #<ACLMOP:STANDARD-READER-METHOD EXCL.SCM::SOURCE-PART-START
            (EXCL.SCM::SOURCE-PART)>
       3[336]: returned 9218
       3[336]: (EXCL.SCM::SOURCE-PART-START
                 #<EXCL.SCM::SOURCE-PART NEW-ACT-REVISED-EFFECTS
                   :OPERATOR 10101 11330>)
       3* #<ACLMOP:STANDARD-READER-METHOD EXCL.SCM::SOURCE-PART-START
            (EXCL.SCM::SOURCE-PART)>
       3[336]: returned 10101
       3[336]: (EXCL.SCM::SOURCE-PART-START
                 #<EXCL.SCM::SOURCE-PART REPLAN-PROBLEM :OPERATOR 11589
                   12250>)
       3* #<ACLMOP:STANDARD-READER-METHOD EXCL.SCM::SOURCE-PART-START
            (EXCL.SCM::SOURCE-PART)>
       3[336]: returned 11589
       3[336]: (EXCL.SCM::SOURCE-PART-START
                 #<EXCL.SCM::SOURCE-PART REPLAN-DOMAIN :OPERATOR 12252
                   13053>)
       3* #<ACLMOP:STANDARD-READER-METHOD EXCL.SCM::SOURCE-PART-START
            (EXCL.SCM::SOURCE-PART)>
       3[336]: returned 12252
       3[336]: (EXCL.SCM::SOURCE-PART-START
                 #<EXCL.SCM::SOURCE-PART REVISE-ORIGINAL-METHODS
                   :OPERATOR 13055 13791>)
       3* #<ACLMOP:STANDARD-READER-METHOD EXCL.SCM::SOURCE-PART-START
            (EXCL.SCM::SOURCE-PART)>
       3[336]: returned 13055
     2[336]: returned-by-throwing to tag 770239548: 0
   1[336]: returned-by-throwing to tag 770239548: 0
   1[336]: (EXCL.SCM::SOURCE-PART-START
             #<EXCL.SCM::SOURCE-PART REVISE-ORIGINAL-METHODS :OPERATOR
               13055 13791>)
   1* #<ACLMOP:STANDARD-READER-METHOD EXCL.SCM::SOURCE-PART-START
        (EXCL.SCM::SOURCE-PART)>
   1[336]: returned 13055
 0[336]: returned
           (:ERROR "Error: not finished with
       #<EXCL.SCM::SOURCE-PART REVISE-ORIGINAL-METHODS :OPERATOR 13055
         13791>")

@rpgoldman
Copy link
Contributor Author

rpgoldman commented Jul 28, 2023

Further notes:

  1. Not sure if this is relevant, but there was junk at the bottom of the file that section-file was invoked on. Removing the junk did not fix the problem, though.
  2. I have further localized the error to EXCL.SCM::CONVERT-TO-BUFFER-POSITIONS. It looks like possibly there's something going wrong where Allegro is trying to find the start and end of the last function in the file, and the endpoint is off somehow.
  3. This code is exactly the same as in slime/swank/allegro.lisp. Alas, in neither place are there any comments, despite the fact that heavy use is made of non-exported functions.

@joaotavora
Copy link
Owner

  1. OK. I take it by "removing the junk", you mean you not only deleted it in the file and saved but also re-compiled the loaded the fasl, so Allegro doesn't see the "junk" anymore.
  2. Yes, internal function here. Could be encoding/line endings stuff. Check that: it has created problems in the past.
  3. I guess that means SLIME has this problem too which is good to know. You can check the history of the file. At least when I and @luismbo touched it (can't remember if in SLIME or in SLY) we left good commit messages explaining the changes. M-x vc-region-history is your friend.

Thanks and keep posting your results here freely.

@joaotavora
Copy link
Owner

Some more notes:

  • Just noticed that scm:section-file is external, could be worth posting a link to any docstring/documentation you can find here.
  • How does Allegro 10 deal with finding definitions in this particular file? What is the trace of the same functions?
  • Can you come up with a trivial example hello-world.lisp that triggers this instead of that file?

@rpgoldman
Copy link
Contributor Author

  1. OK. I take it by "removing the junk", you mean you not only deleted it in the file and saved but also re-compiled the loaded the fasl, so Allegro doesn't see the "junk" anymore.

Yes, that is correct -- I pulled the junk out and then redid the asdf:load-system.

2. Yes, internal function here.  Could be encoding/line endings stuff.  Check that: it has created problems in the past.

I'm not at all sure what Franz's file positions are. They don't seem to correspond to indices into characters of the file. From the fact that there's convert-to-buffer-positions, it's clear that something must be decoded.

For what it's worth, source-part-start returns 13702 for the start position of my revise-original-methods function --- and Emacs' (point) returns 13565 when the cursor is positioned at the beginning of the defun. So Allegro positions seem to ignore some file characters, instead of counting additional ones. I suppose there could be some sort of ignoring of whitespace going on. There's a huge block of code in this file preceded by #+nil, but that block of code is too large to account for this difference of only 137 characters (there are 1297 in that block).

I'm on the Mac, and wrote this code on a modern Mac, so the line endings should simply be newlines (no old-school Mac \r or Windows \n\r).

3. I guess that means SLIME has this problem too which is good to know. You _can_ check the history of the file.  At least when I and @luismbo touched it (can't remember if in SLIME or in SLY) we left good commit messages explaining the changes.  `M-x vc-region-history` is your friend.

Using revision annotate, I found two commits -- both pretty old -- that touch the enclosing find-definition-in-file function:

commit 7544734a10b4f3e525ec69a6e92751ebd217b9ce
Author: Matthias Koeppe <[email protected]>
Date:   Wed Dec 7 17:47:12 2005 +0000

    (find-definition-in-file)
    (find-fspec-location, fspec-definition-locations): Allegro CL
    properly records all definitions made by arbitrary macros whose
    names start with "def".  Use excl::find-source-file and
    scm:find-definition-in-definition-group (rather than
    scm:find-definition-in-file) to find them.

and

commit 57e0354d364fdb26aa458c0783bca4e48a167177
Author: Luís Oliveira <[email protected]>
Date:   Tue Oct 30 18:23:07 2018 +0000

    slynk/backend/allegro.lisp: disable eol fixup in some cases
    
    When dealing with CRLF files, Allegro CL sometimes returns positions
    that count CRLF as two positions, sometimes one position, depending on
    context and version. Use :offset instead of :position for cases where
    eol fixup shouldn't be performed.
    
    Bug report and pair programming/testing courtesy of João Távora.
    
    (cherry picked from SLIME commit cc8bd14e6e139162baa8be30e78fba3245abf535)
    
    * slynk/backend/allegro.lisp: do things described above.

@rpgoldman
Copy link
Contributor Author

Some more notes:

* Just noticed that `scm:section-file`  is external, could be worth posting a link to any docstring/documentation you can find here.

Alas, although it's exported, it doesn't have docstrings, and there's nothing in the Franz docs about the scm functions.

However I have a source license for Allegro (but unfortunately only for 10.1, not the 11 beta), and I can see the error in their SCM code. It happens when we reach the end of file without having reached the end of the source definition we are looking at.

I'm looking and this almost looks like a simple fencepost error. But I really can't tell, since I have the source only for the version that used to work, not the current version that doesn't.

* How does Allegro 10 deal with finding definitions in this particular file?  What is the trace of the same functions?

I'll have to work on this. The problem is that I have an M1 Mac and Allegro 10.1 doesn't run on my machine. I'll have to get set up on a linux box where I can test Allegro 10.1

* Can you come up with a trivial example `hello-world.lisp` that triggers this instead of that file?

I will have to see.

More soon!

@joaotavora
Copy link
Owner

I will have to see [about the hello-world.lisp]

It should be relatively easy to create a hello-world.lisp that looks like:

(in-package :cl-user)
(defun foo () (bar))

(defun bar () )

then C-c C-k to compile and load it, then go the bar on the second line and see if point jumps to the last line or if some this or other problem happened.

@rpgoldman
Copy link
Contributor Author

Sorry about the long delay. It appears there is something about this particular file of code that is interfering with Allegro's source finding, so an MWE is not easy to make. Trying to shim in enough time to check what has changed in the source tracking (Franz very kindly supplied me source under NDA).

@rpgoldman
Copy link
Contributor Author

I have little more to report, except to say that Allegro 11 express edition should now be available at the Franz download site. I continue to work on this as other responsibilities permit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants