Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore parent tests added edges for build selection #7431

Merged

Conversation

b-luu
Copy link
Contributor

@b-luu b-luu commented Apr 21, 2023

resolves #7289 #5193

Description

Removes the added tests edges (specific to build task) for children and/or parent selection.
This allows to avoid selecting any other nodes that aren't selected with the other dbt commands (list, run, ...).
The added tests edges are still respected for the GraphQueue to follow the execution.

Checklist

@b-luu b-luu requested a review from a team April 21, 2023 15:30
@b-luu b-luu requested a review from a team as a code owner April 21, 2023 15:30
@b-luu b-luu requested review from stu-k and QMalcolm April 21, 2023 15:30
@cla-bot cla-bot bot added the cla:yes label Apr 21, 2023
@b-luu
Copy link
Contributor Author

b-luu commented Apr 21, 2023

I'm not sure where to add the relevant tests though :-/

Please advise.

The tests I've ran manually (on a real dbt project) were:

  • dbt list --select <a_model_having_tests_and_children_models>+ and counted the number of elements listed
  • dbt build --select <a_model_having_tests_and_children_models>+ and verified same number of elements + order of execution respects the added tests edges (children models only get ran when parents tests have finished)
  • dbt list --select <a_test_of_a_parent_model>+ just lists the single test
  • dbt build --select <a_test_of_a_parent_model>+ only runs that single test (without this fix, it also runs all children of the model of the test!!)

@jtcohen6 jtcohen6 added Team:Execution ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering labels Apr 22, 2023
@jtcohen6
Copy link
Contributor

jtcohen6 commented Apr 22, 2023

@b-luu Thanks for the PR - nice work! :)

I think tests/functional/build/test_build would be a good place for this. Specifically, a test case that captures the scenario you described as being fixed in this PR:

dbt build --select <a_test_of_a_parent_model>+ only runs that single test (without this fix, it also runs all children of the model of the test!!)

Something like:

class TestDownstreamSelection:
    @pytest.fixture(scope="class")
    def models(self):
        return {
            "model_a.sql": models_simple_blocking__model_a_sql,
            "model_b.sql": models_simple_blocking__model_b_sql,
            "test.yml": models_simple_blocking__test_yml,
        }

    def test_downstream_selection(self, project):
        """Ensure that selecting test+ does not select model_a's other children"""
        results = run_dbt(["build", "--select", "model_a not_null_model_a_id+"], expect_pass=False)
        assert len(results) == 2

This fails on current main branch, with assert 3 == 2.

(While here, I noticed that we need to update our .gitignore to un-ignore the tests/functional/build directory: #7436)

@b-luu
Copy link
Contributor Author

b-luu commented Apr 22, 2023

Thanks @jtcohen6 🙏

I've added that, just changing expect_pass to True in the call to dbt.tests.utils.run_dbt as the build shouldn't fail in this current case.

So everything should be ready, please let me know if I can help in any other way.

core/dbt/graph/graph.py Outdated Show resolved Hide resolved
@b-luu b-luu requested a review from jtcohen6 April 24, 2023 05:49
@b-luu b-luu changed the title Ignore tests descendants for build selection Ignore parent tests added edges for build selection Apr 24, 2023
@jtcohen6 jtcohen6 linked an issue Apr 24, 2023 that may be closed by this pull request
1 task
@b-luu
Copy link
Contributor Author

b-luu commented Apr 24, 2023

Ok, the first test is clearly a false positive: now that we've fixed dbt build to match the other commands (by excluding the parents' tests added edges from selections), some tests that were initially set up to replicate that are now failing.

I've fixed the first one in TestBuildRunResultsState lines 218 & 220.

But I can't seem to wrap my head around the second one at TestConcurrentSelectionBuildRunResultsState line 485. I'm tempted to apply the exact same fix: len(results) -= 1 and remove table_model from the expected results set, but I'm not sure that's the actual culprit since we're not using dbt list for comparison in this particular test case...

Also, I'm having a hard time reproducing because my docker compose up is failing to build successfully... :-/

These are now fixed: we no longer run models downstream of tests if only the tests were selected, not the parent model...
@b-luu
Copy link
Contributor Author

b-luu commented Apr 24, 2023

Ok, the first test is clearly a false positive: now that we've fixed dbt build to match the other commands (by excluding the parents' tests added edges from selections), some tests that were initially set up to replicate that are now failing.

I've fixed the first one in TestBuildRunResultsState lines 218 & 220.

But I can't seem to wrap my head around the second one at TestConcurrentSelectionBuildRunResultsState line 485. I'm tempted to apply the exact same fix: len(results) -= 1 and remove table_model from the expected results set, but I'm not sure that's the actual culprit since we're not using dbt list for comparison in this particular test case...

Also, I'm having a hard time reproducing because my docker compose up is failing to build successfully... :-/

Actually it's the same for the second one: table_model should no longer be present in the selection => it was only present as an artificailly added dependency (parent test) to the unique_view_model_id test (whereas it's initial dependency is only on the view_model).

So I've also fixed this second test, it should be better now 👍

core/dbt/graph/graph.py Outdated Show resolved Hide resolved
@b-luu
Copy link
Contributor Author

b-luu commented Apr 25, 2023

Thanks for your reviews @jtcohen6 and @iknox-fa 🙏

I've applied your suggestions and also fixed the failing tests. ✔️

@iknox-fa
Copy link
Contributor

iknox-fa commented Apr 28, 2023

Hey @b-luu. Sorry for the delay in reviewing this one, since there was a non-trivial chance it would make an impact on performance it took me a little longer to try it out across a series of test projects and benchmark results.

From what I'm seeing the performance is within the bounds of standard deviation for the use cases I'd expect it to effect. I'm attaching the testing results for the largest and most complex project below for future reference.

It's probably worth noting that there were several times where the compile step took considerably longer than expected. Since I'm testing on a local laptop and I wasn't able to reliably duplicate it I've tossed those datapoints. They probably represent a background task on my local machine or some other non-dbt related factor. Still worth noting though :)

Test Results

Compile

dbt compile --full-refresh --project-dir ../dbt-projects/really_big_project --target-path ../dbt-projects/really_big_project/target

PR times:
1:34.64
1:30.29
1:32.03
1:32.76
1:35.46
----
avg: 1:33.03

Main times:
1:38.61
1:36.74
1:36.61
1:35.52
1:36.01
----
avg: 1:36.70

Result: PR = 3.80% shorter run time

List

(modify all tests)
time dbt list --project-dir ../dbt-projects/really_big_project --select state:modified+ --state ../dbt-projects/really_big_project/previous_target

PR times:
1:10.89
1:11.37
1:10.59
1:09.30
1:09.31
----
avg: 1:10.29

Main times:
1:15.18
1:13.14
1:13.54
1:15.23
1:15.20
----
avg: 1:14.45

Result: PR = 5.79% shorter run time

Run

(modify all tests)
time dbt run --project-dir ../dbt-projects/really_big_project --select state:modified+ --state ../dbt-projects/really_big_project/previous_target --target-path ../dbt-projects/really_big_project/target

PR times:
1:11.10
1:13.86
1:14.19
1:14.59
1:15.73
----
avg: 1:13.89

Main times:
1:16.28
1:15.55
1:10.44
1:11.97
1:11.37
----
avg: 1:13.12

Result: PR = 1.05% longer run time

Test

(modify all tests)
time dbt test --project-dir ../dbt-projects/really_big_project --select state:modified+ --state ../dbt-projects/really_big_project/previous_target --target-path ../dbt-projects/really_big_project/target

PR times:
1:12.19
1:14.01
1:15.97
1:16.12
1:15.57
----
avg: 1:14.77

Main times:
1:11.68
1:12.26
1:11:66
1:13.17
1:12.34
----
avg: 1:12.22

Result: PR = 3.53% longer run time

Copy link
Contributor

@iknox-fa iknox-fa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@iknox-fa iknox-fa merged commit f1dddaa into dbt-labs:main Apr 28, 2023
@b-luu
Copy link
Contributor Author

b-luu commented May 9, 2023

Awesome 🎉

Thanks for the thorough performance testing and taking the relay on this @iknox-fa 🙏

@aranke aranke mentioned this pull request May 9, 2023
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:yes ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering
Projects
None yet
3 participants