Adding spans for setup, teardown, and individual fixtures #26

chrisguidry · 2023-09-23T15:20:05Z

In test suites with complex setups, teardowns, and fixtures, it's common to see
most of the test runtime happening in those stages rather than in the individual
tests themselves.

In this change, session- and module-scoped fixtures are attributed to the
overall test session, while function-scoped fixtures are attributed to the
setup for an individual test. This will help with artificial skew from tests
that happen to be the first ones that request a higher-scoped fixture.

I'd welcome feedback on the layout of the spans after some folks have had time
to try it out. pytest doesn't have a clearly defined session or module
setup stage where these fixtures are run, because they are invoked lazily
the first time a test requests them. This is ideal for performance, but these
spans will end up kind of "hanging in mid-air" between other test suites.
Depending on the visualization tool (OpenObserve, Jaeger, etc), you may see
those higher-scoped fixture spans showing up in different spots.

Thanks to @drcraig for the idea and @sihil for the cheers!

Closes #25

@drcraig

In test suites with complex setups, teardowns, and fixtures, it's common to see most of the test runtime happening in those stages rather than in the individual tests themselves. In this change, session- and module-scoped fixtures are attributed to the overall test session, while function-scoped fixtures are attributed to the setup for an individual test. This will help with artificial skew from tests that happen to be the first ones that request a higher-scoped fixture. I'd welcome feedback on the layout of the spans after some folks have had time to try it out. `pytest` doesn't have a clearly defined session or module `setup` stage where these fixtures are run, because they are invoked lazily the first time a test requests them. This is ideal for performance, but these spans will end up kind of "hanging in mid-air" between other test suites. Depending on the visualization tool (OpenObserve, Jaeger, etc), you may see those higher-scoped fixture spans showing up in different spots. Thanks to @drcraig for the idea and @sihil for the cheers! Closes #25

drcraig · 2023-09-23T22:44:21Z

I like it, this is very similar to how I had implemented it on my branch at https://github.com/drcraig/pytest-opentelemetry/tree/fixture-setup-teardown.

Regarding the higher scoped fixtures being under the session scope; I think it's actually okay for these to be under the scope of the individual test function that first called them.

I was envisioning the spans being organized like so, for a a simple case like

import pytest

@pytest.fixture
def fixt_1():
    yield

@pytest.fixture
def fixt_2():
    yield

def test_foo(fixt_1, fixt_2):
    assert True

[-----------session----------------------------------------------------------------------------------]
[-----------test_func-------------------------------------------------------------------------------]
[-----------test_func setup---------]
 [--fixt_1 setup--]
                   [--fixt_2 setup--]
                                     [--test_func call--]
                                                         [--------------test_func teardown---------]
                                                          [--fixt_2 teardown--]
                                                                               [--fixt_1 teardown--]

Where the "test_func" span is created by pytest_runtest_protocol, "test_func setup" is created by pytest_runtest_setup, "test_func call" is created by pytest_runtest_call, and "test_func teardown" is created by pytest_runtest_teardown. The fixture setup spans come from pytest_fixture_setup, and then the fixture teardown spans call from a combination of pytest_runtest_teardown and pytest_fixture_post_finalizer, which I'll describe in more detail later.

For session scoped fixtures, the spans would look something like the following for these tests. The setup for a shared fixture ends up underneath the span of the first test function that depended on it, and the teardown ends up under the last test function that depended on it.

import pytest

@pytest.fixture(scope="session")
def fixt_1():
    yield

@pytest.fixture
def fixt_2():
    yield

def test_func_1(fixt_1, fixt_2):
    assert True

def test_func_2(fixt_1, fixt_2):
    assert True

[-----------session--------------------------------------------------------------------------------------------------------------------------------------------------------------]
[-----------test_func_1------------------------------------------------------------------]
[---------test_func_1 setup---------]
 [--fixt_1 setup--]
                   [--fixt_2 setup--]
                                     [--test_func_1 call --]
                                                            [--test_func_1 teardown-]
                                                             [--fixt_2 teardown-----]
                                                                                     [---------------------------------test_func_2---------------------------------------------]
                                                                                      [--test_func_2 setup--]
                                                                                                             [--test_func_2 call--]
                                                                                                                                   [----------test_func_2 teardown-- ---------]
                                                                                                                                    [--fixt_2 teardown--]
                                                                                                                                                         [--fixt_1 teardown--]

The way to capture the teardown of an individual fixture is to create a presumptive span in pytest_runtest_teardown, just before yielding. You don't know which fixture it will be that will get torn down, so we give it a temporary name, and then update the name and attributes to the actual fixture in pytest_fixture_post_finalizer. After the post_finalizer yield, we close the span, and create a new presumptive span for the next fixture in line (which may not exist if this is the last fixture, but we don't know that yet). Finally, after the yield in pytest_runtest_teardown, when we know all the fixtures are torn down, we delete the last presumptive span, since it wasn't used. When a fixture is scoped higher than a function, then we can rely on FixtureDef.finish() having cleared the cached result so that we don't end up with spans for the fixture for every function that depended on it.

It ends up looking something like

    @pytest.hookimpl(hookwrapper=True)
    def pytest_runtest_teardown(self, item: Item) -> Generator[None, None, None]:
        with tracer.start_as_current_span(
            'teardown',
            attributes=self._attributes_from_item(item),
        ):
            self._fixture_teardown_span = tracer.start_span("presumptive fixture teardown")
            yield
        del self._fixture_teardown_span

    @pytest.hookimpl(hookwrapper=True)
    def pytest_fixture_post_finalizer(
        self, fixturedef: FixtureDef, request: SubRequest
    ) -> Generator[None, None, None]:
        # If the fixture has already been torn down (such as a module or session scoped fixture),
        # or we otherwise don't have a span to work with, create a new span for the next fixture in line.
        if fixturedef.cached_result is None or not hasattr(self, "_fixture_teardown_span"):
            yield
            self._fixture_teardown_span = tracer.start_span("presumptive fixture teardown")
            return

        name = ...
        attributes = ...
        self._fixture_teardown_span.update_name(name)
        self._fixture_teardown_span.set_attributes(attributes)
        yield
        self._fixture_teardown_span.end()
        self._fixture_teardown_span = tracer.start_span("presumptive fixture teardown")

Here's what it would look like for multi-file set of tests with module and session level fixtures.

One final question: what would you think of using item.nodeid as the name of the span instead of item.name? I know it's longer, but it's also unique, and handles things like parametrization. That's how test results are presented in the reports generated by pytest-html, and it's easy to read in my experience.

drcraig · 2023-09-23T22:49:57Z

Here's that exact same test report, but with nodeid as the name of the test, setup, call, and teardown spans. Makes it very clear what everything is.

chrisguidry · 2023-09-25T13:24:21Z

Okay there's a lot to unpack here, but the short answer is: YES! First off, sorry to jump the gun this weekend but I just got excited by the uptick in interest :D

I was trying to get a better handle on teardowns as well, and I didn't think of making that presumptive span, I love it, mind getting a PR in with that? I also came really close this weekend to switching name for nodeid, let's do that in your PR too!

About the "floating" session- and module-level spans, I'm kinda torn. I know that those are technically happening during the setup for the first test that happens to use them, but I didn't want to attribute them to that unit test for pragmatic reasons. My thinking here is that folks interested in seeing traces from pytest are probably interested in understanding test runtimes, and perhaps performance over time.

If we attribute a session-level fixture to the first test that uses that fixture, then we'll get different runtimes for different runs depending on the order that the tests ran (or which xdist worker they landed on, etc). By attaching them to the session span, we should still be able to see them in isolation, but they won't be artificially inflating the first unit test. What do you think? I'm also not exactly sure how to capture session-/module-level fixture teardowns and where to parent them in spans either.

Would love to hear your thoughts and would welcome PRs. I'll invite you as a contributor so you can do it with a little less friction :D

drcraig · 2023-09-26T23:01:54Z

Thanks for contributor status! I'll try to get a PR up by the end of the week (maybe sooner, if I can swing the time) for the presumptive span and nodeid changes.

The floating spans are tough, and totally agree there's not an obvious right answer. What nudges me to keeping it under the testcase, rather than floating, is that it's consistent with how pytest handles the accounting in other contexts. For example, pytest-html will include the logs of the module and session scoped fixtures in the teardown logs for the last tests that use them.

I think there is still a way to get some consistency out of test runtimes, regardless of the fixtures, through the "call" span provided by pytest_runtest_call. This would be only the duration of the test execution itself, and not include fixture setup or teardown.

chrisguidry added 2 commits September 23, 2023 11:11

Adding Python 3.11 as a test target.

b2af458

chrisguidry merged commit 3a5f5e2 into main Sep 23, 2023

chrisguidry deleted the test-phases branch September 23, 2023 15:25

chrisguidry mentioned this pull request Sep 25, 2023

More refined handling of fixture setups and teardowns #28

Closed

drcraig mentioned this pull request Sep 30, 2023

Refined fixture handling #29

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding spans for setup, teardown, and individual fixtures #26

Adding spans for setup, teardown, and individual fixtures #26

chrisguidry commented Sep 23, 2023

drcraig commented Sep 23, 2023

drcraig commented Sep 23, 2023

chrisguidry commented Sep 25, 2023

drcraig commented Sep 26, 2023

Adding spans for setup, teardown, and individual fixtures #26

Adding spans for setup, teardown, and individual fixtures #26

Conversation

chrisguidry commented Sep 23, 2023

drcraig commented Sep 23, 2023

drcraig commented Sep 23, 2023

chrisguidry commented Sep 25, 2023

drcraig commented Sep 26, 2023