-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update project load tracking to include experimental parser info #3495
Conversation
* Fix docs generating for cross-db sources * Code reorganization * Code adjustments according to flake8 * Error message adjusted to be more precise * CHANGELOG update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Event looks good, I confirmed the data flows all the way through as expected
- The updates to
path_count
+parsed_path_count
are right on - We're undercounting for
static_analysis_path_count
+static_analysis_parsed_path_count
because of how we're using thenode.meta
property right now. Let's find another way to do this
core/dbt/parser/manifest.py
Outdated
@@ -614,6 +629,9 @@ def track_project_load(self): | |||
"is_partial_parse_enabled": ( | |||
self._perf_info.is_partial_parse_enabled | |||
), | |||
"is_static_analysis_enabled": flags.USE_EXPERIMENTAL_PARSER, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add is_static_analysis_enabled
to the output of dbt parse
(target/perf_info.json
) as well?
core/dbt/parser/models.py
Outdated
) -> None: | ||
node.meta["is_statically_extractable"] = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can use meta
for this. That's a property that users can set in yaml files (docs). If the user sets any yaml properties for the node, even if they don't set any meta
properties, the node drops {'is_statically_extractable': True}
during yaml parsing here. The upshot is that we significantly undercount the number of statically extractable + extracted paths.
Is there a different preexisting, internal-use node property we could set here? Or another way to tag the node (without adding having to adjust the manifest schema)?
I'd also be open to adjusting the v2
manifest schema so that, instead of just adding a standalone created_at
property, we added a parsing
dictionary (which could include created_at
, statically_extractable
, statically_extracted
, total parse time, etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One option might be to store the information in an attribute/dictionary that is filtered out in serialization so that we don't need to update the manifest schema. I couldn't do that with 'created_at' because it had to be in the partial_parse.msgpack file, but I think the information we're discussion here doesn't need to be stored.
I suspect that we're going to keep running into this "new attribute vs updating schema" issue, so we might want to think about something like the 'parsing' dictionary Jeremy proposes, maybe 'internal' or something, where we can put data that probably is not relevant except internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see Kyle already is filtering out that info. Nevermind :)
008b238
to
a195e28
Compare
… and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being
a195e28
to
ebe6f3a
Compare
Did we decide not to track the path counts on a project/parser basis? I see the total path_count is saved from the number of files. Also we had discussed storing whether partial parsing is actually running ('partially_parsing'). Is that something we want to save? |
@gshank I think we can infer this from whether |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Numbers are looking right on.
* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408) * Fix docs generating for cross-db sources * Code reorganization * Code adjustments according to flake8 * Error message adjusted to be more precise * CHANGELOG update * add static analysis info to parsing data * update changelog * don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being * fix partial parsing case Co-authored-by: kostek-pl <[email protected]>
* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408) * Fix docs generating for cross-db sources * Code reorganization * Code adjustments according to flake8 * Error message adjusted to be more precise * CHANGELOG update * add static analysis info to parsing data * update changelog * don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being * fix partial parsing case Co-authored-by: kostek-pl <[email protected]>
…) (#3500) * Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408) * Fix docs generating for cross-db sources * Code reorganization * Code adjustments according to flake8 * Error message adjusted to be more precise * CHANGELOG update * add static analysis info to parsing data * update changelog * don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being * fix partial parsing case Co-authored-by: kostek-pl <[email protected]> Co-authored-by: kostek-pl <[email protected]>
* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408) * Fix docs generating for cross-db sources * Code reorganization * Code adjustments according to flake8 * Error message adjusted to be more precise * CHANGELOG update * add static analysis info to parsing data * update changelog * don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being * fix partial parsing case Co-authored-by: kostek-pl <[email protected]> automatic commit by git-black, original commits: 4d24656
* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408) * Fix docs generating for cross-db sources * Code reorganization * Code adjustments according to flake8 * Error message adjusted to be more precise * CHANGELOG update * add static analysis info to parsing data * update changelog * don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being * fix partial parsing case Co-authored-by: kostek-pl <[email protected]> automatic commit by git-black, original commits: 4d24656 a76ec42
* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408) * Fix docs generating for cross-db sources * Code reorganization * Code adjustments according to flake8 * Error message adjusted to be more precise * CHANGELOG update * add static analysis info to parsing data * update changelog * don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being * fix partial parsing case Co-authored-by: kostek-pl <[email protected]> automatic commit by git-black, original commits: 2cc0579 4d24656
* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408) * Fix docs generating for cross-db sources * Code reorganization * Code adjustments according to flake8 * Error message adjusted to be more precise * CHANGELOG update * add static analysis info to parsing data * update changelog * don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being * fix partial parsing case Co-authored-by: kostek-pl <[email protected]> automatic commit by git-black, original commits: 4d24656 7563b99 aadf3c7
resolves #3438
Description
I reset this branch several times while trying to add this data, this is what I landed on that doesn't involve refactoring pieces of partial parsing. Will pull logic in to
0.20.latest
branch after approval.parsed_path_count
-> path_count can represent the size of the project and amount of files eligible to be parsed while parsed_path_count will be the actual amount of files parsedis_static_analysis_enabled
-> set if flag for experimental parser is turned onstatic_analysis_path_count
-> number of files eligible to be parsed by experimental parserstatic_analysis_parsed_path_count
-> number of files actually parsed by experimental parserChecklist
CHANGELOG.md
and added information about my change to the "dbt next" section.