sigreturn on macOS 10.14 Beta on functions including (quit) #146

Closed
currymj opened this issue Aug 31, 2018 · 16 comments

Comments

@currymj

currymj commented Aug 31, 2018

On a beta machine, if I try:

Michaels-MacBook-Pro-6% ccl64
Clozure Common Lisp Version 1.11.5  (DarwinX8664)

For more information about CCL, please see http://ccl.clozure.com.

CCL is free software.  It is distributed under the terms of the Apache
Licence, Version 2.0.
? (quit)
sigreturn returned
? for help
[5280] Clozure CL kernel debugger: b
current thread: tcr = 0x103010, native thread ID = 0x307, interrupts enabled


(#x0000000000647EE0) #x00003000006326A4 : #<Function %NANOSLEEP #x00003000006323AF> + 757
(#x0000000000647F68) #x000030000064BBBC : #<Function HOUSEKEEPING-LOOP #x000030000064B9DF> + 477
(#x0000000000647FB8) #x000030000064C274 : #<Function (:INTERNAL (TOPLEVEL-FUNCTION (LISP-DEVELOPMENT-SYSTEM T))) #x000030000064C16F> + 261
[5280] Clozure CL kernel debugger: 

I get a similar result trying to load quicklisp, although loading my own test file that just defines a simple function works fine.



@xrme
Member

xrme commented Aug 31, 2018

I think I have a simple fix for the 1.12 development branch. If it seems stable there, I'll back-port it to 1.11.5 shortly.

@xrme
Member

xrme commented Aug 31, 2018

212c254 seems to fix 1.12-dev.

I did some light testing on an old 10.6 system. It seems to work for the most part (i.e., (rebuild-ccl :clean t) succeeds), but when I evaluate (quit), I see errors like this:

Clozure Common Lisp Version 1.12-dev (v1.12-dev.3-18-g212c2544) DarwinX8664
? (quit)
> Error: Fault during read of memory address #x0
> While executing: 0, in process listener(1).
> Type :POP to abort, :R for a list of available restarts.
> Type :? for other options.
$ 

And for 32-bit:

Clozure Common Lisp Version 1.12-dev (v1.12-dev.3-18-g212c2544) DarwinX8664
? (quit)
> Error: Fault during read of memory address #x0
> While executing: 0, in process listener(1).
> Type :POP to abort, :R for a list of available restarts.
> Type :? for other options.

Frankly, I am not much inclined to worry about this, even if I port 212c254 to 1.11.5.

@edoneel

edoneel commented Sep 1, 2018 via email

@xrme
Member

xrme commented Oct 31, 2018

It is beginning to look like it isn't safe to get rid of DarwinSigReturn on pre-Mojave systems.

Things seem to work most of the time, but there are definitely issues.

I'm going to quote some mail from openmcl-devel that shows some (fairly heavyweight) steps to reproduce:

Clone the following into quicklisp/local-projects:

https://gitlab.common-lisp.net/dcooper/zacl.git
https://github.com/gendl/aserve.git
https://gitlab.common-lisp.net/gendl/gendl.git

Then:

(ql:quickload :gendl)
(gendl:start-gendl!)

That should print a banner and let you know which port the webserver is running on.

Now go to the following URL in your browser:

http://localhost:9000/tasty (or whatever the port is)

Accept the default robot:assembly.

Hover over the root node in the tree at upper-left and see the "Pencil" icon show up. Click the Pencil icon.

This should result in the reported crash.

The crash is happening some time during the call to the gdlAjax function, which is invoked through an Ajax call when clicking that "pencil" hover-over icon. The gdlAjax function is defined in the file gendl/gwl/ajax/source/ajax.lisp.

My reply:

I think 212c254 is not as problem-free as I originally thought it might be.

If I run a CCL IDE that includes that change on a High Sierra system (like the test ccl.pkg), then I see a crash. But when I used a command-line lisp, I didn't see the crash, oddly enough.

If I take that change out (i.e., revert 212c254), it works fine.

On the other hand, if I run the Lisp installed from the test ccl.pkg on a macOS Mojave system, your test case appears to work fine.

I wish I remembered why we needed the DarwinSigReturn workaround in the first place. It looks like it is going to be necessary to detect at runtime whether we are on a pre-Mojave macOS, and leave the DarwinSigReturn thing in place if so.
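For illustration only, here is a minimal sketch of what such a runtime check could look like. It is not taken from the CCL sources: the function names are hypothetical, and it assumes the decision can be made from the Darwin kernel release string (Darwin 18 corresponds to macOS 10.14 Mojave, Darwin 17 to High Sierra).

```c
#include <stdlib.h>
#include <sys/utsname.h>

/* Parse the major number from a Darwin release string such as "18.2.0".
   Returns -1 if the string does not start with a digit. */
int darwin_major(const char *release)
{
    if (release == NULL || *release < '0' || *release > '9')
        return -1;
    return (int)strtol(release, NULL, 10);
}

/* Hypothetical policy: keep the DarwinSigReturn workaround on anything
   older than Darwin 18 (macOS 10.14 Mojave). Unparseable strings are
   treated as "do not keep", which is an arbitrary choice in this sketch. */
int keep_sigreturn_workaround(const char *release)
{
    int major = darwin_major(release);
    return major >= 0 && major < 18;
}

/* Ask the running kernel for its release string via uname(2).
   The answer is only meaningful when running on Darwin. */
int running_kernel_keeps_workaround(void)
{
    struct utsname u;
    if (uname(&u) != 0)
        return 0;
    return keep_sigreturn_workaround(u.release);
}
```

The parsing is kept separate from the uname call so the version policy can be exercised with fixed strings such as "17.7.0" and "18.2.0".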

@xrme
Member

xrme commented Nov 14, 2018

I'm also seeing crashes into the lisp kernel debugger when running make certify-books-short from the acl2-8.1 sources on Mojave.

Example:

   | Unhandled exception 4 at 0x7fff77485b53, context->regs at #x7000110c5590
   | ? for help
   | [21182] Clozure CL kernel debugger: Exit code from ACL2 is 137
   | -rw-r--r--  1 rme  staff  1844 Nov 13 19:28 world-theorems.cert

So I'm pretty sure that just getting rid of DarwinSigReturn is not the complete solution for Mojave, which is unfortunate.

@xrme
Member

xrme commented Nov 17, 2018

It seems that there's a third argument to the sigreturn system call on Mojave. From the _sigtramp disassembly, we see:

    0x7fff77485b43 <+35>: movq   %rbx, %rdi
    0x7fff77485b46 <+38>: movl   $0x1e, %esi
    0x7fff77485b4b <+43>: movq   %r12, %rdx
    0x7fff77485b4e <+46>: callq  0x7fff77488594            ; symbol stub for: __sigreturn

The High Sierra sigtramp doesn't put anything in %rdx.

I have no idea what this extra argument is (and I don't see the Mojave sources on opensource.apple.com yet).

https://trac.clozure.com/ccl/changeset/11565 is a breadcrumb. Other archaeology leads me to believe that we're in this situation because sigaltstack isn't (or at one point at least wasn't) thread-local on Darwin.

@xrme
Member

xrme commented Nov 24, 2018

Another test case:

On macOS Mojave (earlier macOS versions work as expected):

  1. build acl2-8.1
  2. start acl2 and then evaluate
(thm (equal (append (append x y) x y x y x y x y)
           (append x y x y x y x y x y)))
  3. hit C-c and observe that CCL enters the lisp kernel debugger
Unhandled exception 4 at 0x7fff77485b53, context->regs at #x70000b261590
? for help
[31147] Clozure CL kernel debugger:

@xrme
Member

xrme commented Nov 25, 2018

This exception (4 is SIGILL) is because the call to sigreturn in the sigtramp routine returned unexpectedly, and there's helpfully an illegal instruction there to catch that unexpected case.
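As a small aid for reading the numbers in these reports, the sketch below maps them back to signals. The helper names are hypothetical. It also assumes that the "Exit code from ACL2 is 137" seen earlier follows the common shell convention of 128 plus the signal number, which is a guess about how that exit code was produced.

```c
#include <signal.h>

/* Many shells (bash, for example) report "terminated by signal N" as
   exit status 128 + N. Returns N, or 0 if the status looks like an
   ordinary exit code. */
int signal_from_exit_status(int status)
{
    return (status > 128 && status < 256) ? status - 128 : 0;
}

/* Nonzero when signo is the signal the kernel debugger printed as
   "exception 4". */
int is_illegal_instruction(int signo)
{
    return signo == SIGILL;
}
```

Under that convention, 137 decodes to signal 9 (SIGKILL), and the 4 in "Unhandled exception 4" is SIGILL.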

@xrme
Member

xrme commented Dec 10, 2018

https://opensource.apple.com/release/macos-1014.html is now available, but it says "coming soon" for the xnu-4903.201.2 sources, which are probably what I really need in order to figure out what the third arg to sigreturn is.

@rprimus
Contributor

rprimus commented Dec 11, 2018

Tue Dec 11 10:11:28 GMT 2018

@xrme

bsd/dev/i386/unix_signal.c:

688:sigreturn(struct proc *p, struct sigreturn_args *uap, __unused int *retval)

for cross references:
http://newosxbook.com/xxr/index.jl?q=sigreturn&ver=xnu-4903.221.2&case=false&def=false

tarball:
https://opensource.apple.com/tarballs/xnu/xnu-4903.221.2.tar.gz

@almsanac

Given the resources available and the complexity of the issue, I'd prioritize getting it to work on Mojave and deprecate earlier versions of macOS. If there's time to get it running on earlier versions, that's great, but the most important thing is getting it working on Mojave.

@xrme
Member

xrme commented Jan 1, 2019

I ran a 1.11.5 binary under a Mojave debug kernel. I got, as I feared I would, the following message:

process dx86cl64[405] sigreturn token mismatch: received 0x7ffeefbff120 expected 0xb993f3f80d520774

After this debug message is printed, the sigreturn system call returns with an error code.

The Mojave sources contain code to mitigate a class of attacks ("sigreturn oriented programming") described in, for example, https://dl.acm.org/citation.cfm?id=2650802. It seems that this mitigation breaks a technique that CCL has been using.

update: link to PDF of paper in question: https://www.cs.vu.nl/~herbertb/papers/srop_sp14.pdf
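To make the "token mismatch" message easier to picture, here is a toy user-space model of this kind of check. It is not the xnu code. The structure, the function names, and the choice of XOR to combine the context address with a per-process secret are all assumptions made for this sketch; the real logic lives in bsd/dev/i386/unix_signal.c in the xnu sources.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for per-process state; not an xnu structure. */
struct fake_proc {
    uint64_t sigreturn_secret;  /* random value chosen per process */
};

/* Token the kernel would hand to the signal trampoline along with the
   saved context. XOR is an assumption made for this illustration. */
uint64_t make_token(const struct fake_proc *p, uint64_t uctx_addr)
{
    return uctx_addr ^ p->sigreturn_secret;
}

/* Model of the validation step: a caller that cannot reproduce the
   expected token gets an error instead of having its context restored. */
int model_sigreturn(const struct fake_proc *p, uint64_t uctx_addr,
                    uint64_t token)
{
    if (token != make_token(p, uctx_addr))
        return EINVAL;
    return 0;
}
```

In this model, code that calls sigreturn itself and leaves whatever happens to be in the third argument register (a stack address, say) fails the comparison. That is consistent with the "received 0x7ffeefbff120" value in the debug message above.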

@xrme
Member

xrme commented Jan 11, 2019

Thanks to some help from Apple DTS, I committed dd5622e, and this really does seem to make CCL compatible with macOS Mojave.

@svspire
Contributor

svspire commented Jan 13, 2019

dd5622e still seems to work fine on OS X 10.9.5, both in terminal mode and as a GUI app.

@stoney

stoney commented Jan 21, 2019

Are we going to see the Mac App Store version of Clozure CL updated soon? I'm waiting for that, since my attempts to build it from this repo plus the comments here haven't worked.

@xrme
Member

xrme commented Jan 27, 2019

I just submitted an updated Mac App Store version of Clozure CL. It now has to get through app review. This seems to take about a week. It might take a little longer if the app review team identifies any issues that need to be corrected.

I'll post a note when it is approved (as I hope it will be).
