-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Manifest instead of ParseResult [#3163] #3219
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So cool! I left a few initial notes as I was trying to read through + understand the changes. Thank you for all the inline comments along the way :)
# finally, we should hash the actual profile used, not just root project + | ||
# profiles.yml + relevant args. While sufficient, it is definitely overkill. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, it's not clear to me why we need the full profile information here. On the one hand, it is common for users to include special Jinja variables like {{ target.name }}
in node configs, so we'd need to re-render those if the target
changes. Ideally, we'd be able to take an approach like the one you outlined above for vars
—so that changing target
doesn't invalidate everything, just the nodes that depend on it.
Aside from those Jinja variables, I don't believe the parsed representation / internal manifest should depend on any part of the specific connection, except for the adapter/credentials type
being used. Each adapter plugin contains macros, and it can add or override certain node configs, so parsing the same project with dbt-postgres
vs. dbt-redshift
would produce a genuinely different manifest. But, absent any target
-aware logic, I think parsing the same project for one Redshift database vs. a different Redshift database should return the exact same manifest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a case where we're being overzealous, and fully re-parsing the project when we could make partial parsing smarter. Therefore, this is in the "slow + correct" bucket, rather than the "fast + incorrect" bucket. We shouldn't lose track of this, but for now, we should be prioritizing edge cases in the latter category.
# TODO: this should be calculated per-file based on the vars() calls made in | ||
# parsing, so changing one var doesn't invalidate everything. also there should | ||
# be something like that for env_var - currently changing env_vars in way that | ||
# impact graph selection or configs will result in weird test failures. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1000. Thanks for calling it out here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment was already, I just moved it :) But it's a good comment.
process_sources(self.manifest, project_name) | ||
process_refs(self.manifest, project_name) | ||
process_docs(self.manifest, self.root_project) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand it right, our next goal (in #3217) is to move these methods to before the partial-parsing save point. Today, dbt fully re-processes every single source, ref, and description, every single time. Ideally, partial-parsing should only be re-processing nodes that have changed—whether the change occurred in a .sql
file, .yml
file, or .md
file associated with that node.
Do I have that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's correct.
211f593
to
e5edc57
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
def sanitized_update( | ||
self, | ||
source_file: SourceFile, | ||
old_manifest: Any, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note, you can do something like
old_manifest: Any, | |
old_manifest: Optional["Manifest"], |
here to refer to itself as a type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I've seen code using the string class name but didn't think of it for here.
I'm not 100% on what is causing the Windows tests to fail, can we re-run it to see if it is just a transient error?
I'm actually not 100% how to do this haha |
I've re-run it twice, and it's failing reliably (though inscrutably) on this line:
|
67c98ed
to
a132540
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good in local testing!
resolves #3163
Description
As part of the performance initiative we are working on saving more of the parsing state to be reused when partial parsing is turned on. This ticket switches to use a Manifest object instead of a ParseResult object to store parsing state. The eventual goal is to be able to reload a previously fully parsed Manifest.
Checklist
CHANGELOG.md
and added information about my change to the "dbt next" section.