-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dbt build performance fix #6322
Conversation
introduced trim graph to speed up dbt build execution
Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA. In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above above. If you have questions about the CLA, or if you believe you've received this message in error, don't hesitate to ping @drewbanin. CLA has not been signed by users: @china-cse |
@china-cse Thanks for opening, and for the ongoing work on this! Are you able to sign our CLA? So long as you can do that, one of the engineers from our team can take a look at the proposed changes, and determine if this PR is merge-ready. |
Hi @jtcohen6 I'm unable to sign CLA using details link.. it takes me to https://s3.amazonaws.com/cla-bot-logs/ page.. but nothing to fill in. Where I have to sign it? |
Hi @jtcohen6 sorry, please ignore previous msg.. was able to find the link to google forms from cla-bot message above.. filled in all info |
@cla-bot check |
The cla-bot has been summoned, and re-checked this pull request! |
Team, please let me know if anything required from my side.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unfamiliar with our methods of graph traversal, so I will leave this for other members on my team to approve or request changes. Just a small note about commented out code!
# for node in self: | ||
# if node not in include_nodes: | ||
# source_nodes = [x for x, _ in new_graph.in_edges(node)] | ||
# target_nodes = [x for _, x in new_graph.out_edges(node)] | ||
|
||
new_edges = product(source_nodes, target_nodes) | ||
non_cyclic_new_edges = [ | ||
(source, target) for source, target in new_edges if source != target | ||
] # removes cyclic refs | ||
# new_edges = product(source_nodes, target_nodes) | ||
# non_cyclic_new_edges = [ | ||
# (source, target) for source, target in new_edges if source != target | ||
# ] # removes cyclic refs | ||
|
||
new_graph.add_edges_from(non_cyclic_new_edges) | ||
new_graph.remove_node(node) | ||
# new_graph.add_edges_from(non_cyclic_new_edges) | ||
# new_graph.remove_node(node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't a need to keep commented out code, this can be removed.
DBT Team, Any update on review of this pull request. It’s been a while and no status updated. Please confirm. If all good, I can just remove comments as mentioned by @stu-k and push the changes. thanks |
@china-cse Sorry for the delay. We've been discussing this one internally, and we want to make sure we're doing a thorough job of benchmarking & testing. See this issue for some proposed next steps: #6562 |
@china-cse After taking a closer look, and testing with some sample projects, we did not see a performance improvement associated with the changes in this PR. In some cases, we saw degraded performance. Those investigations did yield some performance speedups, which we've since merged in: I'm going to close this PR for now. That said, it sounds like you were able to verify locally that these changes led to a speedup as expected:
If you'd be able to share some detailed timing information with us, or even a version of your project (anonymized if need be) so that we could dig deeper in on our end, we might reopen the PR. |
introduced trim graph to speed up dbt build execution
resolves #6073
Description
it resolves the performance issue of dbt build with more than 2K nodes in project. its been tested in local environment, and its working as expected.. all details provided in #6073
Have worked on below approach, this takes less time in finding nodes.. and tested with 10K nodes.. seems its giving faster results.
this solution requires additional function (lets say "trim_graph()") for scalability purpose which main purpose is to remove unnecessary nodes/paths except direct paths
all details explained in #6073
Checklist
changie new
to create a changelog entry