ValueError in a finalizer on PyPy #98

Open
whitequark opened this issue Apr 15, 2024 · 15 comments

@whitequark
Contributor

When running code under PyPy that works fine on CPython, I'm getting a ValueError raised here:

    def __registerFinalizer(self, handle, finalizer):
        if handle in self.__finalizer_dict:
            finalizer.detach()
            raise ValueError
        self.__finalizer_dict[handle] = finalizer

Is this just a remnant of some debug code that happens to work by chance on CPython, or is there an actual reason to raise it? I'm very confused.

I'm using python-libusb1 3.1.0.

@vpelletier
Owner

vpelletier commented Apr 15, 2024

This is not a remnant, but a way to fail cleanly instead of having the finalizer randomly break (ex: on interpreter shutdown). __finalizer_dict exists both to enforce the finalization order between instances (basically: the USB context object must be kept alive until every object obtained from it has been finalized) and to allow the "owner" instance to cascade closure requests to its dependents (closing a device handle closes all transfers related to that device).

So if two finalizers fall into the same entry in __finalizer_dict, it is a bug, and this raise is here to flag it before it becomes the usual nightmare of garbage-collection-related heisenbugs, which could themselves turn into segfaults when libusb calls happen in the wrong order.

Could you post more of the traceback which leads to this failure? In particular, I wonder which class's __registerFinalizer this is about, and what the caller is, so I can figure out what the handle is computed from and hence guess why it may be reused.

Also, do you have a suggestion for clarifying the intent of those raise statements, so they do not look like leftover debugging code?

EDIT: finalization order happens because references to the required objects are held by the finalizer itself, not by __finalizer_dict.
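
To illustrate the pattern, here is a minimal sketch of its shape (illustrative only, assuming a simplified Owner/dependent split; the actual python-libusb1 code differs):

import weakref

class Owner:
    def __init__(self):
        self._finalizer_dict = {}
        self._next_key = 0

    def register(self, dependent):
        key = self._next_key
        self._next_key += 1
        # The callback is a bound method, so the finalizer itself holds a
        # reference to the owner: the owner cannot be collected while any
        # registered dependent is still alive. A real callback would also
        # release the dependent's underlying resource.
        self._finalizer_dict[key] = weakref.finalize(
            dependent, self._unregister, key,
        )
        return key

    def _unregister(self, key):
        self._finalizer_dict.pop(key, None)

    def close(self):
        # Cascade: run every dependent's finalizer before tearing the
        # owner itself down.
        for finalizer in list(self._finalizer_dict.values()):
            finalizer()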

@whitequark
Contributor Author

In particular, I wonder which class's __registerFinalizer this is about, and what the caller is, so I can figure out what the handle is computed from and hence guess why it may be reused.

That was USBDeviceHandle. Probably I got some other exception, and it crashed in the finalizer instead of failing in whichever other way it was going to...

@vpelletier
Owner

vpelletier commented Apr 16, 2024

Is there a way for me to try to replicate this issue? I unfortunately have very little time these days (which I would otherwise spend trying to get the new python-libusb1 version out; master has several improvements which were long overdue), but maybe I can figure something out.

At least, I gave a quick try to the python-functionfs usbcat example (running device.py with CPython 3 and host.py with PyPy3 7.3.14), using the kernel's dummy_hcd, sent a few packets back and forth, and did not get any obvious failure.

In case you would like to reproduce it, it goes like this:

Terminal 1, in python-functionfs:

$ sudo modprobe dummy_hcd
$ sudo modprobe libcomposite
$ virtualenv vpy3
$ vpy3/bin/pip install .
$ sudo vpy3/bin/python examples/usbcat/device.py

Terminal 2, in python-libusb1:

$ virtualenv -p pypy3 vpypy3
$ vpypy3/bin/pip install .
$ sudo vpypy3/bin/python .../examples/usbcat/host.py

But this may be too simple an example to have a chance of triggering the issue.

@whitequark
Contributor Author

Is there a way for me to try to replicate this issue?

Unfortunately, the way I discovered it was by running a userspace RGMII PHY driver with Glasgow on PyPy3.9.

@vpelletier
Owner

on PyPy3.9

This may be the reason: I use object ids as keys, and this is where you triggered a duplicate value conflict. IIRC, object lifecycles in PyPy are subtly different from CPython's, and I would not be too surprised if this affected finalizer call order.

Here are some things I can think of:

  • Adding randomness to the value used to keep track of the finalizers. This feels ugly, though.
  • Triggering a GC when a conflict happens, and re-checking for the conflict? It should trigger infrequently, but may get in the way of higher-level code (ex: maybe the caller is trying to debug a memory leak and disabled the GC on purpose).

No better idea so far.
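
For what it's worth, the id-reuse hazard is easy to demonstrate (a sketch; whether it prints True on a given run is interpreter- and allocation-dependent):

class Handle:
    pass

a = Handle()
stale_id = id(a)
del a
b = Handle()
# CPython often hands the freed slot to the next allocation, so the ids
# can collide; PyPy computes ids differently but guarantees uniqueness
# only among simultaneously-alive objects.
print(stale_id == id(b))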

@whitequark
Contributor Author

Can you assign a property to an object containing a sequential numeric id, or will that get destroyed before the finalizer is called?

@vpelletier
Owner

Excellent idea. This is not only much simpler, but it also showed me a way to simplify the entire thing.

I ended up using a single itertools.count, stored as a class attribute: dab4906

This should resolve the issue, not only because it removes the reliance on object id, but also because I removed the raise statements (if itertools.count produced duplicate values, I'm not sure anything meaningful could be done).
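
For reference, the shape this takes is roughly the following (an assumed sketch, not the literal code from dab4906):

import itertools

class USBDeviceHandle:
    # Shared across all instances: each registration gets a key that is
    # unique for the lifetime of the process, independent of object ids.
    __finalizer_id_generator = itertools.count()

    def __init__(self):
        self.__finalizer_dict = {}

    def __registerFinalizer(self, finalizer):
        handle = next(self.__finalizer_id_generator)
        self.__finalizer_dict[handle] = finalizer
        return handle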

How does that look to you?

@whitequark
Contributor Author

whitequark commented Dec 30, 2024

As I understand it, you are using itertools.count() to handle concurrency, i.e. you assume that calls to next() on a shared itertools.count() are atomic. I don't believe this is guaranteed by anything in the language, and it would be perfectly fine for someone to create a Python runtime where it is not atomic. In general you should be using a mutex.

However, I have examined the current implementations in CPython and PyPy and found them to provide atomicity. The PyPy one is written in "interpreter-level" code; as far as I understand, interpreter-level Python code in PyPy is atomic in the same way C code is atomic in CPython, aside from explicit yield points.

Personally I would use a mutex but I would also not die on this hill.
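
For concreteness, the mutex version I have in mind would look something like this (illustrative names, not python-libusb1's API):

import itertools
import threading

_finalizer_key_counter = itertools.count()
_finalizer_key_lock = threading.Lock()

def next_finalizer_key():
    # The lock makes the read-and-increment explicitly atomic, instead of
    # relying on next() happening to be atomic in a given interpreter.
    with _finalizer_key_lock:
        return next(_finalizer_key_counter)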

@vpelletier
Owner

vpelletier commented Dec 30, 2024

Excellent point, thank you very much for your careful review.

My general position on concurrency in python-libusb1 is something like:

  • support it...
  • ...for reasonable uses (ex: the code will not fight against a caller setting pollFD notifiers multiple times in parallel)
  • ...but even for unreasonable uses it must not crash (ex: an object handed out to C code must not be up for garbage collection until it is handed back)
  • ...as much as possible without imposing performance costs on well-behaved code (like taking locks when the caller may be single-threaded)

I think count atomicity belongs to the 3rd point: if it goes wrong, some object will not be finalized in the proper order and may cause a segfault. So I must think about this more - hopefully I will come up with an idea that does not involve a lock.

@whitequark
Contributor Author

whitequark commented Dec 30, 2024

import threading

class Finalizer:
    # One counter per thread, paired with the thread id: uniqueness without
    # a lock, as long as a thread id is not reused while keys from the dead
    # thread are still registered.
    per_thread = threading.local()

    def next_sequence(self):
        self.per_thread.sequence = getattr(self.per_thread, "sequence", 0) + 1
        return (self.per_thread.sequence, threading.current_thread().ident)

@whitequark
Contributor Author

(Sorry, I unintentionally omitted part of the code before; I've edited it to reflect that.)

This solution ambiently assumes that you aren't sharing USBContext objects between threads, i.e. that by the time a thread is dead, all of the objects referenced by it are also dead. I think that is a reasonable assumption, but it should perhaps be made explicit.

@vpelletier
Owner

by the time a thread is dead, all of the objects referenced by it are also dead

Indeed, if a subsequent thread is spawned and gets the same thread id, then duplicate keys may happen again.

I suspect there are setups where this may not hold, though, like a worker-thread design where the number of threads changes over time and objects are not assigned to any given thread.

...The song of the lock-mermaids sure is tempting.

@whitequark
Contributor Author

An uncontended lock should be quite cheap to take, and contention seems like it wouldn't be very common (how many people even have more than one USB context in the first place?)
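
A rough way to check this (a sketch; absolute numbers vary by machine and interpreter, and only the uncontended single-thread case is measured):

import itertools
import threading
import timeit

counter = itertools.count()
lock = threading.Lock()

def bare():
    next(counter)

def locked():
    with lock:
        next(counter)

print("bare:  ", timeit.timeit(bare, number=1_000_000))
print("locked:", timeit.timeit(locked, number=1_000_000))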

@vpelletier
Owner

It seems to be cheap indeed. I added a lock in 11e700b.

@whitequark
Contributor Author

Nice! I've been a bit exhausted lately, so I can't test this immediately.
