Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add catch_unwind! macro to prevent panics crossing ffi boundaries #797

Merged
merged 1 commit into from
May 5, 2020

Conversation

davidhewitt
Copy link
Member

@davidhewitt davidhewitt commented Mar 8, 2020

Fixes #492

This is a first attempt at updating all the wrapping code to use catch_unwind to prevent unsoundness of panics inside Rust callback functions.

I implemented this using a pyo3::catch_unwind macro, which takes a Python token and a body of code which should evaluate to a PyResult.

I ran into complications around lifetimes and the callback converters so I ended up simplifying a lot of the callback conversion code. I think a lot of those macros are now in slightly better shape but there's probably more refactoring that could be done.

Please give this a thorough review, I'm prepared to rework this as many times as needed to get it right.

TODO:

  • Custom Exception type from BaseException
    • Mechanism for users to import this Exception (will do later after Can we export exceptions? #805)
    • Resume unwind when pyO3 gets this Exception in a PyErr
  • Panic section for the book

@kngwyu
Copy link
Member

kngwyu commented Mar 9, 2020

Awesome! 💯
I'll leave some comments but generally, the design concept looks good.

src/gil.rs Outdated Show resolved Hide resolved
pyo3-derive-backend/src/module.rs Outdated Show resolved Hide resolved
pyo3-derive-backend/src/pymethod.rs Outdated Show resolved Hide resolved
src/callback.rs Outdated Show resolved Hide resolved
src/callback.rs Outdated Show resolved Hide resolved
src/class/iter.rs Outdated Show resolved Hide resolved
src/class/macros.rs Outdated Show resolved Hide resolved
src/class/macros.rs Outdated Show resolved Hide resolved
src/class/macros.rs Outdated Show resolved Hide resolved
@kngwyu
Copy link
Member

kngwyu commented Mar 9, 2020

@davidhewitt
Now I'm not sure we should include this in the 0.9.0 release. What's your opnion?

@davidhewitt
Copy link
Member Author

Thanks for all the reviews! I will answer and make a round of changes maybe tomorrow.

Now I'm not sure we should include this in the 0.9.0 release. What's your opnion?

If you're ready for 0.9 to be released then please proceed without waiting for this. We can include this in 0.10. (There's technically breaking changes here to a couple of pub functions and traits, so I think this will have to go in a breaking semver release.)

src/callback.rs Outdated Show resolved Hide resolved
@konstin
Copy link
Member

konstin commented Mar 9, 2020

I haven't looked thoroughly at the implementation, I rather wanted to ask if we should forcefully exit (e.g. by abort) instead of raising an exception. A panic breaks all normal assumption about control flow in rust and might leave the rust part in a broken state, into which the python can than call later.

See https://doc.rust-lang.org/nomicon/unwinding.html and https://doc.rust-lang.org/nomicon/poisoning.html for some related information.

@davidhewitt
Copy link
Member Author

I can see why aborting is an easier choice. However, I don't think we should do this.

I think Python users writing Rust for the first time are likely to write a lot of things which panic (e.g. unwrap). Aborting the process is a fairly hard to debug error for such users who aren't familiar with low level code.

From the edition guide, I understand that the use of catch_unwind is exactly intended for the kind of scenario we have here: https://doc.rust-lang.org/edition-guide/rust-2018/error-handling-and-panics/controlling-panics-with-std-panic.html

@konstin
Copy link
Member

konstin commented Mar 9, 2020

I like the way normal rust binaries currently handle this:

fn crashy() {
    None.unwrap()
}

fn main() {
    println!("Hello, world!");
    crashy();
    println!("Goodbye, world!");
}

This prints

Hello, world!
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/main.rs:2:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

when running it with cargo run, and a long backtrace if the environment variable is set:

Long backtrace
Hello, world!
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/main.rs:2:5
stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/backtrace-0.3.44/src/backtrace/libunwind.rs:86
   1: backtrace::backtrace::trace_unsynchronized
             at /cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/backtrace-0.3.44/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:78
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:59
   4: core::fmt::write
             at src/libcore/fmt/mod.rs:1053
   5: std::io::Write::write_fmt
             at src/libstd/io/mod.rs:1428
   6: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:62
   7: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:49
   8: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:204
   9: std::panicking::default_hook
             at src/libstd/panicking.rs:224
  10: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:470
  11: rust_begin_unwind
             at src/libstd/panicking.rs:378
  12: core::panicking::panic_fmt
             at src/libcore/panicking.rs:85
  13: core::panicking::panic
             at src/libcore/panicking.rs:52
  14: core::option::Option<T>::unwrap
             at /rustc/823ff8cf1397a5772b1f6954b60576202bf91836/src/libcore/macros/mod.rs:10
  15: foo_crash::crashy
             at src/main.rs:2
  16: foo_crash::main
             at src/main.rs:7
  17: std::rt::lang_start::{{closure}}
             at /rustc/823ff8cf1397a5772b1f6954b60576202bf91836/src/libstd/rt.rs:67
  18: std::rt::lang_start_internal::{{closure}}
             at src/libstd/rt.rs:52
  19: std::panicking::try::do_call
             at src/libstd/panicking.rs:303
  20: __rust_maybe_catch_panic
             at src/libpanic_unwind/lib.rs:86
  21: std::panicking::try
             at src/libstd/panicking.rs:281
  22: std::panic::catch_unwind
             at src/libstd/panic.rs:394
  23: std::rt::lang_start_internal
             at src/libstd/rt.rs:51
  24: std::rt::lang_start
             at /rustc/823ff8cf1397a5772b1f6954b60576202bf91836/src/libstd/rt.rs:67
  25: main
  26: __libc_start_main
  27: _start
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

pyo3 could have some message like "Your rust code panicked. pyo3 catched this for you. Do <...> to get a backtrace. Note: If you return a PyResult, it will be converted to a python error."

@kngwyu
Copy link
Member

kngwyu commented Mar 9, 2020

@davidhewitt

We can include this in 0.10

Then let's include this in 0.9.
Releasing 0.10 just a few days 0.9 is not very user friendly.

@konstin

I rather wanted to ask if we should forcefully exit (e.g. by abort) instead of raising an exception.

It seems too difficult for me. I opened a question on the user forum.

@konstin
Copy link
Member

konstin commented Mar 9, 2020

@konstin

I rather wanted to ask if we should forcefully exit (e.g. by abort) instead of raising an exception.

It seems too difficult for me. I opened a question on the user forum.

Oh that's a great idea, thank you! Hopefully we'll get some clearer answers there

@kngwyu
Copy link
Member

kngwyu commented Mar 10, 2020

Based on the discussion in the forum, now I think we should raise exceptions for panics.
It can be problematic if the thread panics while we have PyRefMut, but it's not UB.

@davidhewitt
Copy link
Member Author

pyo3 could have some message like "Your rust code panicked. pyo3 catched this for you. Do <...> to get a backtrace. Note: If you return a PyResult, it will be converted to a python error."

We could potentially add this as an optional feature panic_abort, or maybe even piggy-back off the the cfg setting if it's possible. The only problem is I'm not sure where to call panic::set_hook (it seems like it's probably a lot of overhead to call it in every wrapper?)

@davidhewitt
Copy link
Member Author

Based on the discussion in the forum, now I think we should raise exceptions for panics.
It can be problematic if the thread panics while we have PyRefMut, but it's not UB.

Cool. I am not sure I see why panic with PyRefMut specifically is a problem. AFAIK drop code still runs during stack unwinding. We should definitely add a test for the PyRefMut case.

@kngwyu
Copy link
Member

kngwyu commented Mar 10, 2020

@davidhewitt

I am not sure I see why panic with PyRefMut specifically is a problem.

I said PyRefMut is problematic because it can cause logical error when panicking.
UnwindSafe documents says that &mut T or RefCell<T> is not UnwindSafe since it can break logical invariant.

Here's an example of such a logical invariant.
We sometimes hold logical assumptions for mutable data. In the above example, I assume that the elements of array are monotonically increasing.
However, this kind of logical assumption can be broken because of panic, like this.
If we are certain that such kind of logical error can not happen, we can use AssertUnwindSafe.

The problem we have is we assume PyRefMut<T> is AssertUnwindSafe even if users don't intend it. Maybe we should note this implicit AssertUnwindSafe in the guide.

@davidhewitt
Copy link
Member Author

The problem we have is we assume PyRefMut is AssertUnwindSafe even if users don't intend it. Maybe we should note this implicit AssertUnwindSafe in the guide.

I'm not hugely keen on "implicit AssertUnwindSafe" in the guide, because many pyo3 users may not read it and make mistakes.

I have been thinking that the best solution is probably to "poison" the PyCell if there is a panic during a PyRefMut borrow, in just the same way a Mutex can be poisoned. What do you think of this idea?

}
Err(e) => PyErr::from(e).restore_and_null(py),
}
let py = crate::Python::assume_gil_acquired();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have missed this in the thread, but is assume_gil_acquired sound without calling GILPool::new beforehand? If so, could we document the reason somewhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all current code, GILPool::new is called immediately after, like this:

let py = Python::assume_gil_acquired();
let pool = GILPool::new(py);
...

What I'm changing it to in this PR is now

let py = Python::assume_gil_acquired();
catch_unwind!(py, {
    let pool = GILPool::new(py);
    ...
})
...

However you've rightly noticed I have accidentally lost the GILPool::new here. I will fix this and also carefully check the rest of the changes I made.

I can't comment on whether this usage of assume_gil_acquired is sound, but I have other soundness concerns with it. See #800

@kngwyu
Copy link
Member

kngwyu commented Mar 12, 2020

I have been thinking that the best solution is probably to "poison" the PyCell if there is a panic during a PyRefMut borrow, in just the same way a Mutex can be poisoned. What do you think of this idea?

I'm not for that.

  • Simply we rarely need it. It can be only problematic when
    • Panics happen when a user is doing some delicate operation with mutable data
    • He or she ignores the error like
      try
         pyclass.call()
      except RuntimeError:
          pass
    So I don't think we should take care of such cases seriously.
  • To handle poisoning, std::sync::Mutex has a complex try_lock API, but now parking_lot's Mutex with simpler try_lock, which does not handle poisoning, is also widely used. It seems like this fact implies most of users don't actually need poisoning.

@davidhewitt
Copy link
Member Author

Ok. Fwiw lock_api is also not UnwindSafe: Amanieu/parking_lot#32

I guess we can always change to have poisoning later if it turns out to be a problem.

Sometime in the next few days I'll push some tests and documentation to this branch

@programmerjake
Copy link
Contributor

Personally, I only panic! when my program should be aborted due to logic errors, so Python code ignoring any resulting exceptions is likely to observe my code behaving in unexpected/unintended ways. That said, I'm fine with PyCell not implementing poisoning as long as the specific Python exception thrown is extremely uncommon to catch and ignore.

@davidhewitt
Copy link
Member Author

What if we made the exception type derive from BaseException, like SystemExit ?

https://docs.python.org/3/library/exceptions.html#SystemExit

This would mean that it would be really rare for Python programs to catch this exception.

To do this, I'd just like there to be some way to refer to this exception from Python code, so that in cases where users really do want to catch it, they can. I haven't thought of a good way to do this, so any ideas are welcome.

@programmerjake
Copy link
Contributor

What if we made the exception type derive from BaseException, like SystemExit ?

Sounds like a good idea.

https://docs.python.org/3/library/exceptions.html#SystemExit

This would mean that it would be really rare for Python programs to catch this exception.

To do this, I'd just like there to be some way to refer to this exception from Python code, so that in cases where users really do want to catch it, they can. I haven't thought of a good way to do this, so any ideas are welcome.

What about injecting a module containing the PanicException definition into sys.modules if it's not already there so any python code can just import pyo3_panics.PanicException to access it? If the module is already there then we use the existing PanicException class and abort on panic! if it's not found in the existing module.

We could just publish a python package to PyPI to reserve the name with some simple code like:

class PanicException(BaseException):
    pass

and it would be constructed by calling PanicException(panic_message_as_python_string)

@kngwyu
Copy link
Member

kngwyu commented Mar 13, 2020

To do this, I'd just like there to be some way to refer to this exception from Python code, so that in cases where users really do want to catch it, they can. I

How about recommending users to register our custom exception module to their module?
But maybe we need another PR to do so.

What about injecting a module containing the PanicException definition into sys.modules

Can it be confusing?

@programmerjake
Copy link
Contributor

What about injecting a module containing the PanicException definition into sys.modules

Can it be confusing?

Probably not from the perspective of PyO3 users, but it may be confusing for developers of PyO3 itself. The user would see pyo3_panics as a module that can be imported in order to access PanicException, which would get raised on panic! whether or not the user imported the pyo3_panics module, looking just like standard Python where they imported some module that throws an exception imported from some other module that __main__ didn't import.

The only edge cases are:

  • when they have something else named pyo3_panics -- PyO3 might be forced to abort on panic if the pyo3_panics module is messed up.
  • when they try to import pyo3_panics before the Rust library is imported and they didn't install the pyo3_panics package using pip or similar.

@programmerjake
Copy link
Contributor

Additionally, when PyO3 is translating an exception to PyErr, it might be good to panic again if the exception is a PanicException in order to propagate panics.

@davidhewitt
Copy link
Member Author

How about recommending users to register our custom exception module to their module?

Couldn't we do this automatically? E.g. add a _pyo3 submodule to the top-level module exposed by the extension.

If that can work, I think I prefer it to a globally installed module. Though the globally installed module is a nice idea too; it's just more work for us :D

Additionally, when PyO3 is translating an exception to PyErr, it might be good to panic again if the exception is a PanicException in order to propagate panics.

Yes I think this is a good idea.

@programmerjake
Copy link
Contributor

One other benefit of the global module is it allows multiple independent PyO3 libraries to use the exact same type.

@davidhewitt
Copy link
Member Author

One other benefit of the global module is it allows multiple independent PyO3 libraries to use the exact same type.

True, but it also complicates compatibility if we ever needed to change this module for any reason.

I'm very torn on which is the best approach 😄

@kngwyu
Copy link
Member

kngwyu commented Mar 13, 2020

Couldn't we do this automatically? E.g. add a _pyo3 submodule to the top-level module exposed by the extension.

My 1st attempt is placed on #805

If that can work, I think I prefer it to a globally installed module.

I don't know how we can do it. Maybe we need one more .so file?

@kngwyu
Copy link
Member

kngwyu commented Apr 11, 2020

I'm suprised by that -

Yeah, I'm not sure what happens too 🙄. Please run the test code yourself.

@davidhewitt
Copy link
Member Author

Please run the test code yourself.

👍 Give me a few days to play with the printing also and I'll report back

@kngwyu
Copy link
Member

kngwyu commented May 1, 2020

@davidhewitt
I'm not still convinced that we should panic when fetching a panic error, but leaving this PR unmerged is worse.
If you still think we need to retrieve panic, then let's merge this PR as is. We can fix this behavior if a user complains about it.

@davidhewitt
Copy link
Member Author

davidhewitt commented May 1, 2020

@kngwyu I think retrieving panics is the way to go, so if you want to see this PR merged, that's good with me. I can re-open #853 after to change the re-pancking to use resume_unwind! instead of panic!.

@davidhewitt
Copy link
Member Author

First need to fix travis. I can do that this weekend.

@davidhewitt
Copy link
Member Author

FYI haven't forgotten this, will do tomorrow.

@davidhewitt davidhewitt force-pushed the catch-unwind branch 2 times, most recently from e4185be to 2904fec Compare May 3, 2020 12:14
@davidhewitt
Copy link
Member Author

@kngwyu This is now ready for review and/or merge if you want. I'd still like to improve PanicException to carry the payload so that I can use resume_unwind! instead of panic!, but first I think I might have to finish off #686 to make that possible...

@davidhewitt davidhewitt marked this pull request as ready for review May 3, 2020 12:15
src/err.rs Show resolved Hide resolved
@davidhewitt
Copy link
Member Author

I've added a note to that doc.

I've also improved the error message that occurs. Now, with this sample code:

use pyo3::prelude::*;
use pyo3::exceptions;
use pyo3::types::{PyDict, IntoPyDict};
use pyo3::{ffi, AsPyPointer, AsPyRef};

#[pyclass]
struct MyClass { }

#[pymethods]
impl MyClass {
  fn panic(&self) {
    panic!("Hello!");
  }
}

fn main() {
  let gil = Python::acquire_gil();
  let py = gil.python();

  let c = PyCell::new(py, MyClass { }).unwrap();

  pyo3::py_run!(py, c, "def foo(c): c.panic()\n  \nfoo(c)");
}

This is the output:

thread 'main' panicked at 'Hello!', src\main.rs:13:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
--- PyO3 is resuming a panic after an uncaught PanicException ---
Python stack trace below:

Traceback (most recent call last):
  File "<string>", line 3, in <module>
  File "<string>", line 1, in foo
pyo3_runtime.PanicException: Hello!
error: process didn't exit successfully: `target\debug\pyo3-scratch.exe` (exit code: 101)

src/err.rs Outdated
pub fn fetch(_: Python) -> PyErr {
///
/// If the current error is a `PanicException` (which would have originated in other pyo3 code)
/// then this function will resume the uncaught panic.
Copy link
Member

@kngwyu kngwyu May 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does uncaught mean?
We catch the panic and resume it here, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I meant the Python exception specifically was not caught in Python code. Perhaps this can be phrased better?

@davidhewitt
Copy link
Member Author

FYI while finishing this branch off I had an idea for a nice refactoring #901. But I wanted to open that as a separate PR, so please review that PR first and then I will rebase this one and finish up the documentation suggestion requested.

@davidhewitt davidhewitt force-pushed the catch-unwind branch 2 times, most recently from a0eff02 to 85e043d Compare May 5, 2020 07:17
@davidhewitt
Copy link
Member Author

Rebased on #901 and updated the PyErr::fetch documentation. Let me know if you want anything else changed. 👍

@kngwyu
Copy link
Member

kngwyu commented May 5, 2020

Thank you for lots of work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Panics should be caught at the FFI boundary and propagated into python some other way
5 participants