Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fatal Python error: none_dealloc: deallocating None: bug likely caused by a refcount error in a C extension #137

Closed
Bltzmn opened this issue Jan 20, 2024 · 3 comments

Comments

@Bltzmn
Copy link

Bltzmn commented Jan 20, 2024

Dear Bjodah, Hello! Thanks a lot for usefull software!

I would like to share my problem that I encountered while using the cvode module. If You have the opportunity to give any advice, I will be very grateful.

The problem I'm facing is this: when I do a large number of iterations of integration using, for example, integrate_predefined or integrate_adaptive methods - the code throws an error:

Fatal Python error: none_dealloc: deallocating None: bug likely caused by a refcount error in a C extension
 Python runtime state: initialized
 Current thread 0x00007fc8bac2e440 (most recent call first):
  File "<name>.py", line <number> in <module>
  Aborted (core dumped)

For example, I prepared a simple non-physical example based on Your example with the classic van der Pol oscillator, where we perform the integration operation many times. (For example, when we need to calculate this task locally at different points in space, at different time steps).

from pycvodes import integrate_adaptive
mu = 1.0
def f(t, y, dydt):
    dydt[0] = y[1]
    dydt[1] = -y[0] + mu*y[1]*(1 - y[0]**2)

def j(t, y, Jmat, dfdt=None, fy=None):
    Jmat[0, 0] = 0
    Jmat[0, 1] = 1
    Jmat[1, 0] = -1 - mu*2*y[1]*y[0]
    Jmat[1, 1] = mu*(1 - y[0]**2)
    if dfdt is not None:
        dfdt[:] = 0

atol=1e-4; rtol=1e-20
for iter in range(1000):
    k = 0
    for jiter in range(1000):
        t0 = k
        tend = k + 1 # so let the step to be 1 second
        k+=1
        if t0==0:
            y0 = [1, 0]
        else:
            y0 = [y_prev1, y_prev2]

        tout, yout, info = integrate_adaptive(f, j, y0, t0, tend, atol, rtol,
                                           method='bdf')

        y_prev1 = yout.T[0][-1]
        y_prev2 = yout.T[1][-1]

        print(iter, jiter)

Output:

Fatal Python error: none_dealloc: deallocating None: bug likely caused by a refcount error in a C extension
 Python runtime state: initialized
 Current thread 0x00007fc8bac2e440 (most recent call first):
  File "<name>.py", line 27 in <module>
  Aborted (core dumped)

My Ubuntu 22.04 x64 machine with Python 3.9, conda 23.1.0, pycvodes '0.14.1', pyodesys '0.14.2', sundials '6.5.1' returns this deallocating None error after 6824 iterations of this code.

And my another Ubuntu 22.04 x64 machine with Python 3.11, conda 23.7.4, pycvodes '0.14.2', pyodesys '0.14.2', sundials '6.5.1' returns this deallocating None error after 5075 iterations of this code.

But the iteration number in the week itself depends on the task. So, for a chemical kinetics problem, 114800 iterations of integration pass before an error occurs.

I also looked at similar issues that people have linked to, for example: ajakubek/python-llist#11. And I looked at Your cython and C code, but couldn’t find anything resembling these errors in Your code.

If You have the opportunity to give any advice on how to correct the situation, I would be very grateful.

Have a nice time!

@bjodah
Copy link
Owner

bjodah commented Jan 21, 2024

@Bltzmn thank you for reporting this. pycvodes uses the "anyode" submodule which has some manual refcounting (which I probably messed up somewhere). You can find the relevant file here:

https://github.com/bjodah/anyode/blob/c06c0dd1f50b6e7bca8db4d0e9d017d026b05936/include/anyode/anyode_numpy.hpp

If I search for Py_None in that file, I see that I have tried to construct it with refcounting in mind. Perhaps the destructor ~PyOdeSys is out of sync. (I can't determine by quickly glancing over it).

Given this issue, I probably shouldn't have done this manually (or have been more careful), and left this to e.g. cython instead, but at the time I wrote it, the cython version was rather slow compared to the hand-rolled code you see there.

I don't know if a debug build of CPython perhaps contains asserts to help with deubbing this. It might be worth looking into. Hopefully, I will eventually find the time to hunt this down. But it might be a while (months), but if you or someone else want to try to fix this in the mean time, I'll find the time to review and publish a bugfixed version.

Some links to the relevant parts of the CPython API-docs for anyone looking into this:

I notice that the lifetime handling of None has changed in CPython 3.12, so bonus points (for me or anyone else looking into this) for making sure it works on CPython 3.12+ also.

@bjodah
Copy link
Owner

bjodah commented Apr 12, 2024

So I found some time to look into this. I can reproduce your error, and I this is the culprit:
https://github.com/bjodah/anyode/blob/c06c0dd1f50b6e7bca8db4d0e9d017d026b05936/include/anyode/anyode_numpy.hpp#L42

I should not be decreasing the reference count there. Adding print(sys.getrefcount(None)) to the example you provided confirms this analysis. I will be releasing an updated version with that line removed shortly. Thank you for your patience.

@bjodah
Copy link
Owner

bjodah commented Apr 14, 2024

This should be fixed in 0.14.3:
https://pypi.org/project/pycvodes/0.14.3/

Thank you for reporting this!

@bjodah bjodah closed this as completed Apr 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants