Use Manifest instead of ParseResult [#3163] #3219

gshank · 2021-04-01T16:36:27Z

resolves #3163

Description

As part of the performance initiative we are working on saving more of the parsing state to be reused when partial parsing is turned on. This ticket switches to use a Manifest object instead of a ParseResult object to store parsing state. The eventual goal is to be able to reload a previously fully parsed Manifest.

Checklist

I have signed the CLA
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

jtcohen6

So cool! I left a few initial notes as I was trying to read through + understand the changes. Thank you for all the inline comments along the way :)

jtcohen6 · 2021-04-01T16:53:53Z

core/dbt/parser/manifest.py

+    # finally, we should hash the actual profile used, not just root project +
+    # profiles.yml + relevant args. While sufficient, it is definitely overkill.


I agree, it's not clear to me why we need the full profile information here. On the one hand, it is common for users to include special Jinja variables like {{ target.name }} in node configs, so we'd need to re-render those if the target changes. Ideally, we'd be able to take an approach like the one you outlined above for vars—so that changing target doesn't invalidate everything, just the nodes that depend on it.

Aside from those Jinja variables, I don't believe the parsed representation / internal manifest should depend on any part of the specific connection, except for the adapter/credentials type being used. Each adapter plugin contains macros, and it can add or override certain node configs, so parsing the same project with dbt-postgres vs. dbt-redshift would produce a genuinely different manifest. But, absent any target-aware logic, I think parsing the same project for one Redshift database vs. a different Redshift database should return the exact same manifest.

This is a case where we're being overzealous, and fully re-parsing the project when we could make partial parsing smarter. Therefore, this is in the "slow + correct" bucket, rather than the "fast + incorrect" bucket. We shouldn't lose track of this, but for now, we should be prioritizing edge cases in the latter category.
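
To make the idea above concrete, here is a minimal sketch of a narrower state check. It is not dbt's actual implementation: `StateCheck`, `build_state_check`, `needs_full_reparse`, and the `type` / `target_name` profile keys are all hypothetical names chosen for illustration. The assumption is that only the adapter type always matters, and that the target name matters only when the project contains target-aware Jinja.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Dict


def _digest(text: str) -> str:
    """Return a stable hex digest for a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class StateCheck:
    # Hypothetical container: only the parts of the profile assumed to affect parsing.
    adapter_type_hash: str
    target_name_hash: str


def build_state_check(profile: Dict[str, Any]) -> StateCheck:
    """Hash the adapter type and target name, ignoring host, database, credentials."""
    return StateCheck(
        adapter_type_hash=_digest(str(profile["type"])),
        target_name_hash=_digest(str(profile["target_name"])),
    )


def needs_full_reparse(old: StateCheck, new: StateCheck, uses_target_jinja: bool) -> bool:
    """Decide whether a saved manifest must be discarded (illustrative rule only)."""
    # A different adapter plugin brings different macros and configs.
    if old.adapter_type_hash != new.adapter_type_hash:
        return True
    # A different target only matters if something renders {{ target.* }}.
    return uses_target_jinja and old.target_name_hash != new.target_name_hash
```

Under these assumptions, two Redshift profiles that differ only in host produce equal `StateCheck` values, while switching from Redshift to Postgres always forces a full re-parse.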

jtcohen6 · 2021-04-01T16:56:27Z

core/dbt/parser/manifest.py

+    # TODO: this should be calculated per-file based on the vars() calls made in
+    # parsing, so changing one var doesn't invalidate everything. also there should
+    # be something like that for env_var - currently changing env_vars in way that
+    # impact graph selection or configs will result in weird test failures.


+1000. Thanks for calling it out here!

The comment was already there; I just moved it :) But it's a good comment.
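
As a rough illustration of what the TODO describes, the sketch below records which vars each file read during parsing and invalidates only the files that touched a changed var. It is not code from this PR; `changed_vars`, `files_to_reparse`, and the `vars_used_by_file` mapping are hypothetical, and it assumes the parser could capture every `var()` lookup per file.

```python
from typing import Any, Dict, Set

# Sentinel so a var that is missing differs from a var explicitly set to None.
_MISSING = object()


def changed_vars(old_vars: Dict[str, Any], new_vars: Dict[str, Any]) -> Set[str]:
    """Return the names of vars that were added, removed, or given a new value."""
    names = set(old_vars) | set(new_vars)
    return {
        name
        for name in names
        if old_vars.get(name, _MISSING) != new_vars.get(name, _MISSING)
    }


def files_to_reparse(
    vars_used_by_file: Dict[str, Set[str]],
    old_vars: Dict[str, Any],
    new_vars: Dict[str, Any],
) -> Set[str]:
    """Return only the files whose recorded var lookups include a changed var."""
    changed = changed_vars(old_vars, new_vars)
    return {path for path, used in vars_used_by_file.items() if used & changed}
```

The same bookkeeping could, in principle, be applied to `env_var` lookups, which is the second half of the TODO.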

jtcohen6 · 2021-04-01T17:06:21Z

core/dbt/parser/manifest.py

+        process_sources(self.manifest, project_name)
+        process_refs(self.manifest, project_name)
+        process_docs(self.manifest, self.root_project)


If I understand it right, our next goal (in #3217) is to move these methods to before the partial-parsing save point. Today, dbt fully re-processes every single source, ref, and description, every single time. Ideally, partial-parsing should only be re-processing nodes that have changed—whether the change occurred in a .sql file, .yml file, or .md file associated with that node.

Do I have that right?

Yes, that's correct.
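
For illustration only, one way to find the nodes that need re-processing is to compare saved file checksums with current ones. This is a sketch under assumptions, not the approach taken in #3217: `checksum`, `nodes_to_reprocess`, and the `nodes_by_file` mapping (file path to the node IDs associated with it) are hypothetical.

```python
import hashlib
from typing import Dict, List


def checksum(contents: str) -> str:
    """Return a hex digest of a file's contents."""
    return hashlib.sha256(contents.encode("utf-8")).hexdigest()


def nodes_to_reprocess(
    saved_checksums: Dict[str, str],
    current_files: Dict[str, str],
    nodes_by_file: Dict[str, List[str]],
) -> List[str]:
    """Return node IDs tied to any .sql, .yml, or .md file that is new or changed."""
    stale = set()
    for path, contents in current_files.items():
        # A missing saved checksum means the file is new, so treat it as changed.
        if saved_checksums.get(path) != checksum(contents):
            stale.update(nodes_by_file.get(path, []))
    return sorted(stale)
```

Only the returned nodes would then go through source, ref, and docs processing; everything else would be reused from the saved manifest.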

kwigley

Looks good!

kwigley · 2021-04-05T14:21:24Z

core/dbt/contracts/graph/manifest.py

+    def sanitized_update(
+        self,
+        source_file: SourceFile,
+        old_manifest: Any,


Just a note, you can do something like

Suggested change

-        old_manifest: Any,
+        old_manifest: Optional["Manifest"],

here, so the class can refer to itself as a type.

Thanks! I've seen code using the string class name but didn't think of it for here.
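
A minimal, self-contained sketch of the string forward reference suggested above. The `Manifest` class here is a simplified stand-in, not dbt's real class: the `files` attribute, the plain `str` file key, and the method body are assumptions made so the example runs on its own.

```python
from typing import Dict, Optional


class Manifest:
    """Simplified stand-in for illustration; not dbt's actual Manifest."""

    def __init__(self, files: Optional[Dict[str, str]] = None) -> None:
        self.files: Dict[str, str] = dict(files or {})

    def sanitized_update(
        self,
        source_file: str,
        # The class name is not bound yet while the class body is executing,
        # so the annotation is written as a string and resolved later.
        old_manifest: Optional["Manifest"],
    ) -> bool:
        """Copy one file's entry from a previous manifest, if there is one."""
        if old_manifest is None or source_file not in old_manifest.files:
            return False
        self.files[source_file] = old_manifest.files[source_file]
        return True
```

Compared with `Any`, the string annotation lets a type checker verify that callers pass either a `Manifest` or `None`.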

kwigley · 2021-04-05T14:50:23Z

I'm not 100% sure what is causing the Windows tests to fail; can we re-run it to see if it is just a transient error?

can we re-run it

I'm actually not 100% sure how to do this haha

jtcohen6 · 2021-04-05T15:14:48Z

I've re-run it twice, and it's failing reliably (though inscrutably) on this line:

test/unit/test_config.py::TestProject::test_cycle Windows fatal exception: stack overflow

jtcohen6

All good in local testing!

gshank force-pushed the partial_parsing branch from 39024b1 to 39c7dbd Compare April 1, 2021 16:42

cla-bot bot added the cla:yes label Apr 1, 2021

gshank requested review from jtcohen6 and kwigley April 1, 2021 16:42

gshank changed the title ~~Use Manifest instead of ParseResults [#3163]~~ Use Manifest instead of ParseResult [#3163] Apr 1, 2021

jtcohen6 reviewed Apr 1, 2021

gshank force-pushed the partial_parsing branch 2 times, most recently from 211f593 to e5edc57 Compare April 2, 2021 14:15

kwigley approved these changes Apr 5, 2021

gshank force-pushed the partial_parsing branch 3 times, most recently from 67c98ed to a132540 Compare April 6, 2021 16:54

Use Manifest instead of ParseResults [#3163]

307d47e

gshank force-pushed the partial_parsing branch from a132540 to 307d47e Compare April 6, 2021 17:51

jtcohen6 approved these changes Apr 6, 2021

gshank merged commit 749f873 into develop Apr 6, 2021

gshank deleted the partial_parsing branch April 6, 2021 18:05

kwigley mentioned this pull request May 11, 2021

Feature/schema tests are more unique #3335

Merged

4 tasks

jtcohen6 mentioned this pull request Jan 6, 2023

[CT-1761] Update code comment for build_manifest_state_check #6536

Closed
