Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slowdown of pytest collection in sqlalchemy in python 3.9+ #9343

Closed
4 tasks
CaselIT opened this issue Nov 26, 2021 · 31 comments
Closed
4 tasks

Slowdown of pytest collection in sqlalchemy in python 3.9+ #9343

CaselIT opened this issue Nov 26, 2021 · 31 comments

Comments

@CaselIT
Copy link

CaselIT commented Nov 26, 2021

  • a detailed description of the bug or problem you are having
  • output of pip list from the virtual environment you are using
  • pytest and operating system versions
  • minimal example if possible

We noticed that from python 3.9 pytest takes a significantly more time (~4x) to collect the tests for sqlalchemy.
You can run the following docker to compare. python 3.7 and 3.8 are ok (not fast not slow), python 3.9 and 3.10 are noticeably slower

docker run -ti --name py --rm python:3.7 bash -c 'git clone https://github.com/sqlalchemy/sqlalchemy.git --depth 100; pip install pytest; cd sqlalchemy; time pytest --collect-only'
docker run -ti --name py --rm python:3.8 bash -c 'git clone https://github.com/sqlalchemy/sqlalchemy.git --depth 100; pip install pytest; cd sqlalchemy; time pytest --collect-only'
docker run -ti --name py --rm python:3.9 bash -c 'git clone https://github.com/sqlalchemy/sqlalchemy.git --depth 100; pip install pytest; cd sqlalchemy; time pytest --collect-only'
docker run -ti --name py --rm python:3.10 bash -c 'git clone https://github.com/sqlalchemy/sqlalchemy.git --depth 100; pip install pytest; cd sqlalchemy; time pytest --collect-only'

the times on my pc are as floows:

#py37
real    0m27.949s
user    0m26.717s
sys     0m1.220s
#py38
real    0m27.952s
user    0m26.893s
sys     0m1.050s
# py39
real    1m45.581s
user    1m42.574s
sys     0m2.990s
# py310
real    1m46.255s
user    1m43.648s
sys     0m2.590s

cc @zzzeek

@CaselIT CaselIT changed the title Slowdown of pytest collection in sqlalchemy since python 3.9 Slowdown of pytest collection in sqlalchemy in python 3.9+ Nov 26, 2021
@bluetech
Copy link
Member

Sounds like #9144. Could you try replacing pip install pytest with pip install git+https://github.com/pytest-dev/pytest.git and check the timing?

@CaselIT
Copy link
Author

CaselIT commented Nov 26, 2021

No tests are collected when installing from git, I guess we have some incompatibility with v7 on sqlalchemy side. (no error though)

@CaselIT
Copy link
Author

CaselIT commented Nov 26, 2021

Is there a changelog for v7 so I can try seeing what needs changing? The fact that no error is raised makes it hard to pinpoint a particular incompatibility

@CaselIT
Copy link
Author

CaselIT commented Nov 26, 2021

Installing from the 6.2.x branch makes no difference in the timing

@The-Compiler
Copy link
Member

Tentatively adding this to the 7.0 project board as a reminder to myself to take a quick look at what might be going on.

@bluetech
Copy link
Member

Looked into it, no tests are collected due to #9277. sqlalchemy does some collection magic in its pytestplugin.py which relies on isinstance(pytest.Instance) checks, something I didn't find when looking investigating for #9277.

We should either revert #9277, or, if we consider sqlalchemy to be a unique case of this, fix it there.


Back to the original issue, @CaselIT you can try the commit before #9277 was merged, i.e.

pip install git+https://github.com/pytest-dev/pytest.git@17b38259fde89caaf44a4e578d9f62d001a894c9

@bluetech
Copy link
Member

Link to the sqlalchemy pytest plugin: https://github.com/sqlalchemy/sqlalchemy/blob/main/lib/sqlalchemy/testing/plugin/pytestplugin.py (search for pytest.Instance -- two hits).

@zzzeek
Copy link

zzzeek commented Nov 27, 2021

we can fix the isinstance() call, whats something we can change it towards that will work in pytest 6 and 7 ? (could just be isinstance(x, (Instance, SomethingElse)) but a duck-typing type of check would be preferred.)

@zzzeek
Copy link

zzzeek commented Nov 27, 2021

btw thanks for the fast help @bluetech ! I was initially a little nervous we wouldn't be able to get quick help with a less obvious issue like this.

@bluetech
Copy link
Member

I was only able to take a quick glance at sqlalchemy's plugin code. I can definitely look at it more later, though I can maybe give you a quick answer if you can explain why the plugin checks for pytest.Instance (what's the purpose of the checks).

(Just to be clear, we will definitely not release something that will break sqlalchemy's plugin -- it only uses public API and our policy is never to break that without at least a deprecation period. The pytest.Instance change was made after analysis of actual usage but didn't catch sqlalchemy's plugin since it only looked at standalone pytest plugins).

@bluetech
Copy link
Member

BTW, is the plugin only used by sqlalchemy's own tests, or is it meant for public use as well?

@zzzeek
Copy link

zzzeek commented Nov 27, 2021

the plugin is used in a public way by other dialects that want to integrate with SQLAlchemy's test suite. So yes, it is public for third party dialects.

we are using isinstance() because we iterate through the list of pytest items in pytest_collection_modifyitems and we expand out some of the testing objects into many more tests. As we iterate through this collection, we observed that objects of type "Instance" and these were the objects we actually needed, that is, they represent actual tests. so in this hook we need a way to iterate through items and identify test methods, like "SomeTestSuite().test_foo()" (all of our tests are methods on classes).

the other place we use it is in pytest_pycollect_makeitem where we are checking for "Instance" as a means to identify test objects that we want to include in the suite, there seems to be some kind of filtering going on here.

as far as what API we need here, it just needs to be that we are able to continue seeing individual tests as objects in some way, and identify them as such. if that's going away altogether then we have bigger issues.

@zzzeek
Copy link

zzzeek commented Nov 27, 2021

correction, it looks like "Instance" is how we identify the parent object (the collector?) that owns a series of tests that we want to include or exclude.

it's just part of our traversal so we just need to know how to correctly traverse.

@nicoddemus
Copy link
Member

nicoddemus commented Nov 27, 2021

it's just part of our traversal so we just need to know how to correctly traverse.

What happened is that we got rid of pytest.Instance, #9277 has a detailed description of the rationale, but in summary the collection changed from this:

<Module test_file.py>
  <Class TestIt>
    <Instance ()>
      <Function test_it>

To this:

<Module test_file.py>
  <Class TestIt>
    <Function test_it>

Looking at the plugin I see usages like this:

https://github.com/sqlalchemy/sqlalchemy/blob/9c5dfddfbd405a72d33b72ed097690a6c2c88b3a/lib/sqlalchemy/testing/plugin/pytestplugin.py#L197-L202

So I think you just need to change similar checks from:

if isinstance(item.parent, pytest.Instance)

To:

if isinstance(item.parent, pytest.Class)

You want to keep pytest<7 compatibility, so something like this would be needed at the top of the file:

if pytest.version_tuple[0] >= 7:
    CollectParent = pytest.Class
else:
    CollectParent = pytest.Instance

And then:

if isinstance(item.parent, CollectParent)

Sorry I can't test this myself now.

@CaselIT
Copy link
Author

CaselIT commented Nov 27, 2021

Thanks for all the help and suggestions!

Back to the original issue, @CaselIT you can try the commit before #9277 was merged, i.e.

pip install git+https://github.com/pytest-dev/pytest.git@17b38259fde89caaf44a4e578d9f62d001a894c9

I confirm that this commit does collect and also the slowdown in gone:

py 3.10 with 17b38259fde89caaf44a4e578d9f62d001a894c9
real    0m25.652s
user    0m24.515s
sys     0m1.131s

would it be possible to backport that improvement to the v6 series? @bluetech
alternatively is there an approximate date you are aiming towards for v7?


Regarding your suggestion @nicoddemus that fixes the collection but it will require other changes. The issue is that we use parent.parent in many places and we now should just use parent.
This change seems one that may trip many places even if they are not using Instance directly.

We also tripping an assert while adding a finalizer:

  File "test\..\lib\sqlalchemy\testing\plugin\pytestplugin.py", line 422, in pytest_runtest_setup
    get_test_class_from_fn(item).addfinalizer(finalize)
  File "lib\site-packages\_pytest\nodes.py", line 405, in addfinalizer
    self.session._setupstate.addfinalizer(fin, self)
  File "\lib\site-packages\_pytest\runner.py", line 510, in addfinalizer
    assert node in self.stack, (node, self.stack)
AssertionError: (<Class TestAsyncAdaptedQueue>, {})

pdbing there in v6 shows that the_class.session._setupstate.statck is also empty (and stack is a list not a dict):

# pdb on v6.2.5
-> get_test_class_from_fn(item).addfinalizer(finalize)
(Pdb) get_test_class_from_fn(item).session._setupstate.stack
[]

are are trying to add it in pytest_runtest_setup here:
https://github.com/sqlalchemy/sqlalchemy/blob/9c5dfddfbd405a72d33b72ed097690a6c2c88b3a/lib/sqlalchemy/testing/plugin/pytestplugin.py#L400

what's the suggested way of adding a class finalizer?

@CaselIT
Copy link
Author

CaselIT commented Nov 27, 2021

btw I've started supporting pytest v7 here https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/3332

edit: apart from the addfinalizer issue above, everything seems to be working with v7. I confirm that also in windows the speedup is very significant, 1:33s down to 18s

edit of edit: no still some work to be done. we seem to be loosing the class variables from the test function that handle the skips. probably related to the parent change

@zzzeek
Copy link

zzzeek commented Nov 27, 2021

cant wait for the speedup! ive been pretty depressed about the slow collection times for some time now, there have been different "slow collection" issues over the years (like in the 4.x series too with fixtures).

@nicoddemus
Copy link
Member

nicoddemus commented Nov 27, 2021

alternatively is there an approximate date you are aiming towards for v7?

We are planning a release candidate pretty soon, likely within the week, so shouldn't be far.

We also tripping an assert while adding a finalizer

The traceback shows get_test_class_from_fn, does that function return pytest.Class both in v6 and v7? If so things should just work, you should be able to attach finalizers to classes.

are are trying to add it in pytest_runtest_setup here

I don't see get_test_class_from_fn in the link, was that removed afterwards?

Ahh I see it in Gerrit. You actually don't need that function, item.getparent(pytest.Class) will return the class for that item, or None in case it doesn't have a class (it is a free function). And this will work in v6 and v7.

Same is valid for other places like this:

    items[:] = sorted(
        newitems,
        key=lambda item: (
            get_test_class_from_fn(item).parent.name,
            get_test_class_from_fn(item).name,
            item.name,
        ),
    )

Which can be written as:

    items[:] = sorted(
        newitems,
        key=lambda item: (
            item.getparent(pytest.Module).name,
            item.getparent(pytest.Class).name,
            item.name,
        ),
    )

item.getparent(pytest.Module) will always return a pytest.Module of course.

@CaselIT
Copy link
Author

CaselIT commented Nov 27, 2021

The traceback shows get_test_class_from_fn, does that function return pytest.Class both in v6 and v7? If so things should just work, you should be able to attach finalizers to classes.

yes, that function returns a class in all cases, but an assert was added in v7 that's not present in v6.

This is v7

def addfinalizer(self, finalizer: Callable[[], object], node: Node) -> None:
"""Attach a finalizer to the given node.
The node must be currently active in the stack.
"""
assert node and not isinstance(node, tuple)
assert callable(finalizer)
assert node in self.stack, (node, self.stack)
self.stack[node][0].append(finalizer)

This is v6
def addfinalizer(self, finalizer: Callable[[], object], colitem) -> None:
"""Attach a finalizer to the given colitem."""
assert colitem and not isinstance(colitem, tuple)
assert callable(finalizer)
# assert colitem in self.stack # some unit tests don't setup stack :/
self._finalizers.setdefault(colitem, []).append(finalizer)

there is not check that callitem/node is in the stack, and in both version it's not.

Granted sqlalchemy may be doing something that should not do. Basically it's trying to attach a finalizer before the first method of a class, and it's trying to do that in pytest_runtest_setup

btw, thanks for that item.getparent(cls). I was not aware of it!

@CaselIT
Copy link
Author

CaselIT commented Nov 27, 2021

edit of edit: no still some work to be done. we seem to be loosing the class variables from the test function that handle the skips. probably related to the parent change

this is caused by the addfinalizer error, the global class set by sqlalchemy plugin never changes so sqlalchemy continues to checks skips of the class that failed to add finalizers

If so things should just work, you should be able to attach finalizers to classes.

it no longer seem to be possible to add finalizers before the class and the test has been setup. What I'm observing is that pytest_runtest_setup in a plugin runs before pytest implementation that setups the class, so it's impossible to add a finalizer there. Adding trylast=True makes pytest implementation run before the plugin one, setting up an item and it's class if not already setup, at the same time it will also call the class and function level fixtures.

In that sense it's no longer possible adding a finalizer to a class before the class lever fixtures and the first method fixtures are run. I've tries using hookwrapper=True on the function and add the finalizer after the setup has been done and it seems to be working. I'm not sure if it's a breaking change that may be is worth documenting?

@nicoddemus
Copy link
Member

yes, that function returns a class in all cases, but an assert was added in v7 that's not present in v6.

Oh didn't realize that. That has been added by c30feee (cc @bluetech).

Hmm yeah that assert assumes setup will be called first, which is done by the standard pytest_runtest_setup(item). That's why that comment was there in v6.

However I think we can try something to solve this, today you have this:

def pytest_runtest_setup(item):
    from sqlalchemy.testing import asyncio

    if not isinstance(item, pytest.Function):
        return

    # pytest_runtest_setup runs *before* pytest fixtures with scope="class".
    # plugin_base.start_test_class_outside_fixtures may opt to raise SkipTest
    # for the whole class and has to run things that are across all current
    # databases, so we run this outside of the pytest fixture system altogether
    # and ensure asyncio greenlet if any engines are async

    global _current_class

    if _current_class is None:
        ...
        
        def finalize():
            global _current_class, _current_report
            _current_class = None

        ...
        item.parent.parent.addfinalizer(finalize)

The "setup state" machinery is setup during pytest_runtest_setup(), but this hook (intentionally and correctly) wishes to run before that machinery is setup. This hook calls addfinalizer, which is part of the "setup state", but the added assert is veryfing that the machinery was initialized, but it was not at this point. Nothing wrong with that, it used to work fine, is public API and there's no documentation saying that you shouldn't use it.

We can debate if we should just go ahead and remove the assert from pytest, but perhaps it is possible to move the finalize definition to pytest_runtest_teardown(item). This would allow you to remove the global and bypass the problem. Something like this:

def pytest_runtest_setup(item):
    from sqlalchemy.testing import asyncio

    if not isinstance(item, pytest.Function):
        return

    # pytest_runtest_setup runs *before* pytest fixtures with scope="class".
    # plugin_base.start_test_class_outside_fixtures may opt to raise SkipTest
    # for the whole class and has to run things that are across all current
    # databases, so we run this outside of the pytest fixture system altogether
    # and ensure asyncio greenlet if any engines are async

    global _current_class

    if _current_class is None:
        asyncio._maybe_async_provisioning(
            plugin_base.start_test_class_outside_fixtures,
            item.parent.parent.cls,
        )
        _current_class = item.parent.parent
        


def pytest_runtest_teardown(item, nextitem):
    # runs inside of pytest function fixture scope
    # after test function runs

    from sqlalchemy.testing import asyncio

    asyncio._maybe_async(plugin_base.after_test, item)

    # The code below is the old finalize() in pytest_runtest_setup()
    global _current_class, _current_report
    
    if _current_class is None:
        return

    next_item_is_of_different_class = nextitem is not None and nextitem.getparent(pytest.Class).obj is not _current_class
    if next_item_is_of_different_class or nextitem is None:
        _current_class = None

        try:
            asyncio._maybe_async_provisioning(
                plugin_base.stop_test_class_outside_fixtures,
                item.parent.parent.cls,
            )
        except Exception as e:
            # in case of an exception during teardown attach the original
            # error to the exception message, otherwise it will get lost
            if _current_report.failed:
                if not e.args:
                    e.args = (
                        "__Original test failure__:\n"
                        + _current_report.longreprtext,
                    )
                elif e.args[-1] and isinstance(e.args[-1], str):
                    args = list(e.args)
                    args[-1] += (
                        "\n__Original test failure__:\n"
                        + _current_report.longreprtext
                    )
                    e.args = tuple(args)
                else:
                    e.args += (
                        "__Original test failure__",
                        _current_report.longreprtext,
                    )
            raise
        finally:
            _current_report = None

It needs some logic to detect that the last test from a class is being torn down (which you get automatically from addfinalizer)

However I do think we should reconsider just removing the assert from pytest (while adding a test to ensure this behavior is officially supported).

@nicoddemus
Copy link
Member

nicoddemus commented Nov 27, 2021

Well we posted nearly at the same time hehehe.

Let's see, seems convenient to be able to call addfinalizers even before SetupState has been setup, exactly in situations like this. By not allowing that, we force the user to write their own "teardown when test from a different class is next" logic, which is cumbersome (of course we didn't realize the implications when we added that assert).

Let's hear from @bluetech.

Meanwhile, I will open a PR removing that assert, then you can install pytest from there to get this problem out of the way.

@CaselIT
Copy link
Author

CaselIT commented Nov 27, 2021

many thanks @nicoddemus !

I agree that it may be useful being able to add a finalizer in the before the setup-state has be setup, but I think the solution of using hookwrapper or splitting the logic between pytest_runtest_setup or pytest_runtest_teardown is enough for sqlalchemy.

@nicoddemus
Copy link
Member

Meanwhile, I will open a PR removing that assert, then you can install pytest from there to get this problem out of the way.

Looking at the code it is not so simple, we have refactored that code significantly since then (it is more robust and faster because of that).

but I think the solution of using hookwrapper or splitting the logic between pytest_runtest_setup or pytest_runtest_teardown is enough for sqlalchemy.

That would be great. Could you try it and let us know?

@CaselIT
Copy link
Author

CaselIT commented Nov 27, 2021

Meanwhile, I will open a PR removing that assert, then you can install pytest from there to get this problem out of the way.

Looking at the code it is not so simple, we have refactored that code significantly since then (it is more robust and faster because of that).

yes, I noticed that's very different the code from v6

I've tried using hookwrapper is not the same since the finalizer is run before the class fixture yield tear-down. I guess it makes sense since the finalizers are run in reverse order and with yield the finalizer is effectiverly the last added.

will try using your suggestion of splitting the logic

@CaselIT
Copy link
Author

CaselIT commented Nov 27, 2021

Ok, I had to use hookwrapper in pytest_runtest_teardown since that finalizer needs to run after all the class fixture ones, but it seems to be working,

The last version is on gerrit https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/3332

edit, yes the test all pass.

Having the previous behaviour of addfinalizer would be nice to have, but I think this can be closed. Thanks a lot for all the support and help!

@nicoddemus
Copy link
Member

Awesome, that's good news!

Having the previous behaviour of addfinalizer would be nice to have, but I think this can be closed.

Let's see what other maintainers say, but if we keep the assert I think we should add a message to it at least ("addfinalizer called for {item} but setup() not called").

Thanks a lot for all the support and help!

Sure, thanks a lot for the patience!

@bluetech
Copy link
Member

I believe you should be able to do away with CollectParent:

  1. For the if isinstance(item.parent, CollectParent) check in pytest_collection_modifyitems, you can change it to item.getparent(pytest.Class) is not None.
  2. In pytest_pycollect_makeitem you can change it to collector.cls is not None.

I think these are good ideas because it's less reliant on the exact collection hierarchy.

The pattern item.getparent(pytest.Class).cls and item.parent.cls etc. can be just item.cls because that already does the getparent bit. Same for module. Ideally you would just not use x.parent directly when it can be avoided.

With the above, that leaves one pytest7 check, in the per_cls_dict[inst_or_fn.name].append(inst_or_fn) code. In this case there is indeed a difference. But again I think it is better not to make assumptions about the hierarchy. The more robust thing would be to recursively expand the collectors to items, with the stop condition being if isinstance(node, pytest.Item) or conversely keep expanding while isinstance(node, pytest.Collector).


Regarding the SetupState assert: IIUC the use case is to register a finalizer to a node before the node is set up. That's interesting but I think the SetupState is the wrong place for that. Previously it wasn't really the intention to support it - the assert just used to be commented), and it didn't respect the stack invariants. I think that moving the teardown to pytest_runtest_teardown makes sense in terms of symmetry.

I was trying to understand the reason why the hooks are used instead of a class fixture, which are much easier to work with. I see the comment pytest_runtest_setup runs *before* pytest fixtures with scope="class". [...] but I don't follow what it says.


Anyway, thanks for working on a fix on your side, and sorry for the breakage. I hope you can at least see why we're getting rid of Instance.

@CaselIT
Copy link
Author

CaselIT commented Nov 28, 2021

I was trying to understand the reason why the hooks are used instead of a class fixture, which are much easier to work with. I see the comment pytest_runtest_setup runs *before* pytest fixtures with scope="class". [...] but I don't follow what it says.

What the SQLAlchemy plugin is trying to do before with in pytest_runtest_setup + finalizer and now in pytest_runtest_setup + hookwrapper after the yield it have a place where it can run some code before class level fixtures are executed and after they are cleaned up when fist starting execution of a class and when the running of the class is done. The comment wanted to indicate that. Currently it seems that pytest_runtest_setup does run before any fixture is executed at the beginning of a class, but pytest_runtest_setup is run inside the fixture scope (at least the class level ones) so hookwrapper needs to be used.

As mentioned above if there is a way of adding a finalizer before the class is setup it may make the code a bit easier to follow in the SQLAlchemy plugin, but the current solution is also ok!

Anyway, thanks for working on a fix on your side, and sorry for the breakage. I hope you can at least see why we're getting rid of Instance.

No problem, SQLAlchemy does some unusual stuff for sure in the tests, so we initially though it would be a lot more complex to make v7 work

I'll close this, since v7 is fixing the issue! Thanks again for the support

@The-Compiler
Copy link
Member

I opened #9347 to track the remaining issue about you not getting a warning when accessing pytest.Instance.

@CaselIT
Copy link
Author

CaselIT commented Nov 28, 2021

I believe you should be able to do away with CollectParent:

1. For the `if isinstance(item.parent, CollectParent)` check in `pytest_collection_modifyitems`, you can change it to `item.getparent(pytest.Class) is not None`.

2. In `pytest_pycollect_makeitem` you can change it to `collector.cls is not None`.

I think these are good ideas because it's less reliant on the exact collection hierarchy.

The pattern item.getparent(pytest.Class).cls and item.parent.cls etc. can be just item.cls because that already does the getparent bit. Same for module. Ideally you would just not use x.parent directly when it can be avoided.

With the above, that leaves one pytest7 check, in the per_cls_dict[inst_or_fn.name].append(inst_or_fn) code. In this case there is indeed a difference. But again I think it is better not to make assumptions about the hierarchy. The more robust thing would be to recursively expand the collectors to items, with the stop condition being if isinstance(node, pytest.Item) or conversely keep expanding while isinstance(node, pytest.Collector).

thanks, I've applied your suggestions. There is now no need of using any pytest7 check

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants