Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

32 bit ctypes stdcall callback fails to restore stack pointer #82929

Closed
DavidHeffernan mannequin opened this issue Nov 8, 2019 · 17 comments
Closed

32 bit ctypes stdcall callback fails to restore stack pointer #82929

DavidHeffernan mannequin opened this issue Nov 8, 2019 · 17 comments
Labels
3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes OS-windows topic-ctypes type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@DavidHeffernan
Copy link
Mannequin

DavidHeffernan mannequin commented Nov 8, 2019

BPO 38748
Nosy @pfmoore, @tiran, @tjguk, @zware, @zooba, @michaelDCurran
PRs
  • bpo-38748: gh-82929: Add test for stack corruption. #26204
  • Files
  • py3.8crash.zip: Testcase
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-11-08.16:57:34.359>
    labels = ['3.8', 'type-crash', '3.10', 'ctypes', 'OS-windows', '3.9']
    title = '32 bit ctypes stdcall callback fails to restore stack pointer'
    updated_at = <Date 2021-11-15.23:13:25.310>
    user = 'https://bugs.python.org/DavidHeffernan'

    bugs.python.org fields:

    activity = <Date 2021-11-15.23:13:25.310>
    actor = 'michaeldcurran'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Windows', 'ctypes']
    creation = <Date 2019-11-08.16:57:34.359>
    creator = 'David Heffernan'
    dependencies = []
    files = ['49959']
    hgrepos = []
    issue_num = 38748
    keywords = ['patch']
    message_count = 10.0
    messages = ['356249', '357345', '391095', '391096', '391195', '391196', '391197', '391388', '406369', '406371']
    nosy_count = 8.0
    nosy_names = ['paul.moore', 'christian.heimes', 'tim.golden', 'python-dev', 'zach.ware', 'steve.dower', 'David Heffernan', 'michaeldcurran']
    pr_nums = ['26204']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue38748'
    versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

    @DavidHeffernan
    Copy link
    Mannequin Author

    DavidHeffernan mannequin commented Nov 8, 2019

    Starting with Python 3.8 certain ctypes callbacks fail to restore the stack pointer.

    In the repo below, when the DLL is compiled with MSVC under default debug settings, running the Python script leads to a debug error dialog which says:

    Run-Time Check Failure #0 - The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention.

    It appears that when the C code calls the callback function, the value of ESP is 4 greater than it should be.

    This problem does not occur with older versions of Python.

    **DLL code**

    #include <Windows.h>
    
    BOOL APIENTRY DllMain(HMODULE hModule, DWORD ul_reason_for_call, LPVOID lpReserved)
    {
        switch (ul_reason_for_call)
        {
        case DLL_PROCESS_ATTACH:
        case DLL_THREAD_ATTACH:
        case DLL_THREAD_DETACH:
        case DLL_PROCESS_DETACH:
            break;
        }
        return TRUE;
    }
    
    typedef void (__stdcall *MYCALLBACK)(int, double);

    extern "C"
    {
    __declspec(dllexport) void __stdcall foo(MYCALLBACK callback)
    {
    callback(1, 11);
    callback(2, 21);
    callback(3, 31);
    }
    }

    **Python code**

    import ctypes
    import ctypes.wintypes
    
    def CallbackType(restype, *argtypes):
    
        def from_param(cls, obj):
            if obj is None:
                return obj
            return ctypes._CFuncPtr.from_param(obj)
    
        result = ctypes.WINFUNCTYPE(restype, *argtypes)
        result.from_param = classmethod(from_param)
        return result
    
    MYCALLBACK = CallbackType(
        None,
        ctypes.c_int,
        ctypes.c_double
    )
    
    def callback(handle, time):
        print(handle, time)
    mycallback = MYCALLBACK(callback)
    
    lib = ctypes.WinDLL(r'path\to\dll\foo.dll')
    func = getattr(lib, '_foo@4')
    func.restype = None
    func.argtypes = MYCALLBACK,
    func(mycallback)

    @DavidHeffernan DavidHeffernan mannequin added 3.8 (EOL) end of life topic-ctypes type-crash A hard crash of the interpreter, possibly with a core dump labels Nov 8, 2019
    @zooba
    Copy link
    Member

    zooba commented Nov 22, 2019

    I would guess this is due to the updated libffi, and probably it's a case that is not sufficiently tested.

    Adding tests is the first step. It'll probably have to enumerate many parameter types (in case there's something weird here like misaligning the double after the int and not clearing it up, as opposed to a constant 4-byte offset).

    Hopefully then the issue is on our side and not part of libffi, but it could be anywhere.

    @zooba zooba added the 3.9 only security fixes label Nov 22, 2019
    @michaeldcurran
    Copy link
    Mannequin

    michaeldcurran mannequin commented Apr 14, 2021

    I can also reproduce this. I will attach my own testcase below.
    So far I see it when the callback is __stdcall (WINFUNCTYPE) and it takes an larger than 4 bytes (E.g. a long long or a VARIANT), with one or more arguments preceeding it such that this argument is not aligned on a multiple of 8 bytes.
    For example arguments can be:

    • long, long long
    • long, long, long, long long
      But the corruption does not occur with something like:
    • long, long, long long
      My testcase uses long, long long to show the crash.

    @michaeldcurran
    Copy link
    Mannequin

    michaeldcurran mannequin commented Apr 14, 2021

    This bug is reproduceable on both Python 3.8 and 3.9. But not 3.7.
    Ths bug is seen in the real world, in the context of providing Python callbacks to Win32 API functions, or when implementing comtypes COM objects in Python.
    For example, we see this crash in the NVDA screen reader project, in our implementation of ITTSBufNotifySink from the Microsoft SAPI4 speech API.
    ITTSBufNotifySink::TextDataStarted takes a pointer (this), and a long long (qTimestamp).
    nvaccess/nvda#12152

    @zooba
    Copy link
    Member

    zooba commented Apr 16, 2021

    My previous post still stands. This requires a test and then (hopefully) a fix in ctypes or (potentially) a fix upstream in libffi.

    @DavidHeffernan
    Copy link
    Mannequin Author

    DavidHeffernan mannequin commented Apr 16, 2021

    @Steve as a user of Python rather than a Python developer, I don't know what the process here. I understand the need for a test, and then a fix. And I would not be surprised if the act of fixing the issue led to a broadening of the test if the scope of the defect turns out to be wider than noted in the comments so far.

    What is the process for a resolution to the issue being found. You've outlined the steps required, but how will they happen?

    @tiran
    Copy link
    Member

    tiran commented Apr 16, 2021

    It would help us if you or Michael could provide a minimal reproducer of the crash in form of a unit test and submit it as pull request.

    @tiran tiran added the 3.10 only security fixes label Apr 16, 2021
    @zooba
    Copy link
    Member

    zooba commented Apr 19, 2021

    Even better, one that doesn't crash but safely returns a value that can be checked. (IIRC, we have a test that does this to ensure that structs smaller than 9 bytes are passed on the stack.)

    Half C/half Python is fine - the C bits would go into _ctypes_test.c

    Given how well you reported the issue in the first place, I'm confident the quick start in the devguide will get you up and building quickly: https://devguide.python.org/ Adding the test might be a bit more of an adventure (though not as much as fixing it...)

    @michaeldcurran
    Copy link
    Mannequin

    michaeldcurran mannequin commented Nov 15, 2021

    As requested, I created a pr which adds a test to show the crash. It is marked as expected failure on 32 bit (x86) and runs successfully as expected on x64.

    What would be the next steps in moving this issue forward?

    I don't think I have the knowledge to be able to start on an actual fix for the bug myself, but I and my organization are happy to assist where we can.

    A little bit of background:
    The NvDA screen reading software allows blind and vision impaired people across the globe to access the Windows Operating System independently, improving socialization, education and employment outcomes. It is used by around 200,000 people in over 150 countries.

    The NVDA screen reader is written primarily in Python, and is currently using Python 3.7. We would like to upgrade to a newer Python release, but currently cannot due to this bug. We currently use the x86 build of Python as a small (but significant) number of our users are still on a pure 32-bit build of Windows. Taking into account the fact that are main demographic are people from developing countries, it is not simple for many of our users to upgrade their existing hardware to 64-bit.

    There will come a time where we will drop x86 support, and just release an x64 build of NVDA, but right now our pure x86 user numbers are still too high.

    We would love to be able to get our 200,000 users onto a more recent and secure Python version as soon as possible, but we can only do this once this bug is addressed.

    This bug has now been open for 2 years. If the bug is impossible to fix, or no one is able, then perhaps Python needs to consider dropping support for x86, as currently this build can cause stack corruption, as demonstrated by this issue and the pr tests.

    @michaeldcurran
    Copy link
    Mannequin

    michaeldcurran mannequin commented Nov 15, 2021

    PR: #26204
    Looks like a maintainer needs to allow a workflow to run for the remaining checks.

    @junkmd
    Copy link
    Contributor

    junkmd commented Nov 27, 2022

    Perhaps this is the cause of the long unresolved problem with comtypes.

    Please see enthought/comtypes#212.
    The problem also occurs in Python 3.11.

    Does anyone have knowledge/opinion on this?
    Any opinions would be appreciated.

    @davidheff
    Copy link

    I'm not very familiar with how bug resolution works in this project. What is the process by which this bug gets fixed?

    @zooba
    Copy link
    Member

    zooba commented Sep 1, 2023

    It's basically described at https://devguide.python.org/getting-started/pull-request-lifecycle/ along with all the other information about contributing. You'll also find it linked from the "contributing guidelines" in this repo, which GitHub links prominently all over the place (e.g., immediately below this textbox).

    @davidheff
    Copy link

    A pull request would be something I would do if I was a developer and had a solution. But I'm not a Python developer. I was more wondering if anybody was likely to attempt to solve this.

    @pfmoore
    Copy link
    Member

    pfmoore commented Sep 1, 2023

    I was more wondering if anybody was likely to attempt to solve this.

    Basically, what you're asking is whether anyone is likely to be motivated to put their time into looking into this problem and coming up with a fix. It's honestly impossible to answer that - most (if not all) contributions to Python are by volunteers who find a problem sufficiently interesting that they are motivated to dig into it, or by people directly affected by the issue. In this case it seems that the people impacted are not able to contribute a fix, so you're relying on "interested bystanders". Unfortunately, it's likely that people interested in low-level 32-bit Windows issues like this are few and far between, so you'll just need to be patient (or, I guess, find someone you can pay to do the investigation and develop a solution, but that's unlikely to be a realistic option in most cases).

    @seanbudd
    Copy link

    seanbudd commented Sep 2, 2023

    This can probably closed as this has been fixed.
    More discussion in #26204 #96965

    @davidheff
    Copy link

    OK, this is a very happy surprise. I've just checked with my code that tripped over the original defect. It fails in 3.8-3.10, but works correctly in 3.11. So the issue does indeed seem to be fixed. Thanks for all involved in doing this.

    @zooba zooba closed this as completed Sep 4, 2023
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes OS-windows topic-ctypes type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants