Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbt build performance fix #6322

Closed
wants to merge 1 commit into from

Conversation

china-cse
Copy link

@china-cse china-cse commented Nov 27, 2022

introduced trim graph to speed up dbt build execution

resolves #6073

Description

it resolves the performance issue of dbt build with more than 2K nodes in project. its been tested in local environment, and its working as expected.. all details provided in #6073

Have worked on below approach, this takes less time in finding nodes.. and tested with 10K nodes.. seems its giving faster results.

this solution requires additional function (lets say "trim_graph()") for scalability purpose which main purpose is to remove unnecessary nodes/paths except direct paths
all details explained in #6073

Checklist

introduced trim graph to speed up dbt build execution
@china-cse china-cse requested review from a team and stu-k November 27, 2022 05:36
@cla-bot
Copy link

cla-bot bot commented Nov 27, 2022

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above above. If you have questions about the CLA, or if you believe you've received this message in error, don't hesitate to ping @drewbanin.

CLA has not been signed by users: @china-cse

@jtcohen6 jtcohen6 added Team:Execution ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering labels Nov 28, 2022
@jtcohen6
Copy link
Contributor

@china-cse Thanks for opening, and for the ongoing work on this!

Are you able to sign our CLA? So long as you can do that, one of the engineers from our team can take a look at the proposed changes, and determine if this PR is merge-ready.

@stu-k stu-k requested a review from iknox-fa November 28, 2022 17:02
@china-cse
Copy link
Author

Hi @jtcohen6

I'm unable to sign CLA using details link.. it takes me to https://s3.amazonaws.com/cla-bot-logs/ page.. but nothing to fill in. Where I have to sign it?

@china-cse
Copy link
Author

Hi @jtcohen6

sorry, please ignore previous msg.. was able to find the link to google forms from cla-bot message above.. filled in all info

@jtcohen6
Copy link
Contributor

@cla-bot check

@cla-bot cla-bot bot added the cla:yes label Nov 29, 2022
@cla-bot
Copy link

cla-bot bot commented Nov 29, 2022

The cla-bot has been summoned, and re-checked this pull request!

@china-cse
Copy link
Author

Team, please let me know if anything required from my side..

Copy link
Contributor

@stu-k stu-k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unfamiliar with our methods of graph traversal, so I will leave this for other members on my team to approve or request changes. Just a small note about commented out code!

Comment on lines +113 to +124
# for node in self:
# if node not in include_nodes:
# source_nodes = [x for x, _ in new_graph.in_edges(node)]
# target_nodes = [x for _, x in new_graph.out_edges(node)]

new_edges = product(source_nodes, target_nodes)
non_cyclic_new_edges = [
(source, target) for source, target in new_edges if source != target
] # removes cyclic refs
# new_edges = product(source_nodes, target_nodes)
# non_cyclic_new_edges = [
# (source, target) for source, target in new_edges if source != target
# ] # removes cyclic refs

new_graph.add_edges_from(non_cyclic_new_edges)
new_graph.remove_node(node)
# new_graph.add_edges_from(non_cyclic_new_edges)
# new_graph.remove_node(node)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't a need to keep commented out code, this can be removed.

@stu-k stu-k requested a review from a team December 2, 2022 15:53
@china-cse
Copy link
Author

DBT Team,

Any update on review of this pull request. It’s been a while and no status updated. Please confirm. If all good, I can just remove comments as mentioned by @stu-k and push the changes.

thanks

@jtcohen6
Copy link
Contributor

@china-cse Sorry for the delay. We've been discussing this one internally, and we want to make sure we're doing a thorough job of benchmarking & testing. See this issue for some proposed next steps: #6562

@jtcohen6
Copy link
Contributor

jtcohen6 commented Feb 7, 2023

@china-cse After taking a closer look, and testing with some sample projects, we did not see a performance improvement associated with the changes in this PR. In some cases, we saw degraded performance.

Those investigations did yield some performance speedups, which we've since merged in:

I'm going to close this PR for now. That said, it sounds like you were able to verify locally that these changes led to a speedup as expected:

Have worked on below approach, this takes less time in finding nodes.. and tested with 10K nodes.. seems its giving faster results.

If you'd be able to share some detailed timing information with us, or even a version of your project (anonymized if need be) so that we could dig deeper in on our end, we might reopen the PR.

@jtcohen6 jtcohen6 closed this Feb 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:yes ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CT-1353] [Bug] Performance issues with DBT build when nodes more than 2K in a project
3 participants