Update project load tracking to include experimental parser info #3495

kwigley · 2021-06-24T20:11:00Z

resolves #3438

Description

I reset this branch several times while trying to add this data, this is what I landed on that doesn't involve refactoring pieces of partial parsing. Will pull logic in to 0.20.latest branch after approval.

parsed_path_count -> path_count can represent the size of the project and amount of files eligible to be parsed while parsed_path_count will be the actual amount of files parsed
is_static_analysis_enabled -> set if flag for experimental parser is turned on
static_analysis_path_count -> number of files eligible to be parsed by experimental parser
static_analysis_parsed_path_count -> number of files actually parsed by experimental parser

Checklist

I have signed the CLA
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

* Fix docs generating for cross-db sources * Code reorganization * Code adjustments according to flake8 * Error message adjusted to be more precise * CHANGELOG update

jtcohen6

Event looks good, I confirmed the data flows all the way through as expected
The updates to path_count + parsed_path_count are right on
We're undercounting for static_analysis_path_count + static_analysis_parsed_path_count because of how we're using the node.meta property right now. Let's find another way to do this

jtcohen6 · 2021-06-25T04:07:23Z

core/dbt/parser/manifest.py

@@ -614,6 +629,9 @@ def track_project_load(self):
            "is_partial_parse_enabled": (
                self._perf_info.is_partial_parse_enabled
            ),
+            "is_static_analysis_enabled": flags.USE_EXPERIMENTAL_PARSER,


Could we add is_static_analysis_enabled to the output of dbt parse (target/perf_info.json) as well?

jtcohen6 · 2021-06-25T04:14:44Z

core/dbt/parser/models.py

    ) -> None:
+        node.meta["is_statically_extractable"] = True


I don't think we can use meta for this. That's a property that users can set in yaml files (docs). If the user sets any yaml properties for the node, even if they don't set any meta properties, the node drops {'is_statically_extractable': True} during yaml parsing here. The upshot is that we significantly undercount the number of statically extractable + extracted paths.

Is there a different preexisting, internal-use node property we could set here? Or another way to tag the node (without adding having to adjust the manifest schema)?

I'd also be open to adjusting the v2 manifest schema so that, instead of just adding a standalone created_at property, we added a parsing dictionary (which could include created_at, statically_extractable, statically_extracted, total parse time, etc).

One option might be to store the information in an attribute/dictionary that is filtered out in serialization so that we don't need to update the manifest schema. I couldn't do that with 'created_at' because it had to be in the partial_parse.msgpack file, but I think the information we're discussion here doesn't need to be stored.

I suspect that we're going to keep running into this "new attribute vs updating schema" issue, so we might want to think about something like the 'parsing' dictionary Jeremy proposes, maybe 'internal' or something, where we can put data that probably is not relevant except internally.

Oh, I see Kyle already is filtering out that info. Nevermind :)

… and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being

gshank · 2021-06-25T14:14:19Z

Did we decide not to track the path counts on a project/parser basis? I see the total path_count is saved from the number of files.

Also we had discussed storing whether partial parsing is actually running ('partially_parsing'). Is that something we want to save?

jtcohen6 · 2021-06-25T14:24:54Z

Also we had discussed storing whether partial parsing is actually running ('partially_parsing'). Is that something we want to save?

@gshank I think we can infer this from whether path_count != parsed_path_count

jtcohen6

Nice work! Numbers are looking right on.

* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408) * Fix docs generating for cross-db sources * Code reorganization * Code adjustments according to flake8 * Error message adjusted to be more precise * CHANGELOG update * add static analysis info to parsing data * update changelog * don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being * fix partial parsing case Co-authored-by: kostek-pl <[email protected]>

…) (#3500) * Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408) * Fix docs generating for cross-db sources * Code reorganization * Code adjustments according to flake8 * Error message adjusted to be more precise * CHANGELOG update * add static analysis info to parsing data * update changelog * don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being * fix partial parsing case Co-authored-by: kostek-pl <[email protected]> Co-authored-by: kostek-pl <[email protected]>

* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408) * Fix docs generating for cross-db sources * Code reorganization * Code adjustments according to flake8 * Error message adjusted to be more precise * CHANGELOG update * add static analysis info to parsing data * update changelog * don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being * fix partial parsing case Co-authored-by: kostek-pl <[email protected]> automatic commit by git-black, original commits: 4d24656

* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408) * Fix docs generating for cross-db sources * Code reorganization * Code adjustments according to flake8 * Error message adjusted to be more precise * CHANGELOG update * add static analysis info to parsing data * update changelog * don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being * fix partial parsing case Co-authored-by: kostek-pl <[email protected]> automatic commit by git-black, original commits: 4d24656 a76ec42

* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408) * Fix docs generating for cross-db sources * Code reorganization * Code adjustments according to flake8 * Error message adjusted to be more precise * CHANGELOG update * add static analysis info to parsing data * update changelog * don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being * fix partial parsing case Co-authored-by: kostek-pl <[email protected]> automatic commit by git-black, original commits: 2cc0579 4d24656

* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408) * Fix docs generating for cross-db sources * Code reorganization * Code adjustments according to flake8 * Error message adjusted to be more precise * CHANGELOG update * add static analysis info to parsing data * update changelog * don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being * fix partial parsing case Co-authored-by: kostek-pl <[email protected]> automatic commit by git-black, original commits: 4d24656 7563b99 aadf3c7

kostek-pl and others added 2 commits June 24, 2021 16:06

Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408)

c153738

* Fix docs generating for cross-db sources * Code reorganization * Code adjustments according to flake8 * Error message adjusted to be more precise * CHANGELOG update

add static analysis info to parsing data

d9384c5

kwigley requested review from gshank and jtcohen6 June 24, 2021 20:11

cla-bot bot added the cla:yes label Jun 24, 2021

update changelog

3763656

kwigley temporarily deployed to Postgres June 24, 2021 20:18 Inactive

kwigley had a problem deploying to Bigquery June 24, 2021 20:18 Error

kwigley had a problem deploying to Bigquery June 24, 2021 20:18 Failure

kwigley temporarily deployed to Snowflake June 24, 2021 20:18 Inactive

kwigley temporarily deployed to Redshift June 24, 2021 20:18 Inactive

kwigley temporarily deployed to Postgres June 25, 2021 02:30 Inactive

kwigley temporarily deployed to Redshift June 25, 2021 02:30 Inactive

kwigley temporarily deployed to Bigquery June 25, 2021 02:30 Inactive

kwigley temporarily deployed to Snowflake June 25, 2021 02:30 Inactive

jtcohen6 requested changes Jun 25, 2021

View reviewed changes

kwigley force-pushed the add/update-project-load-tracking branch from 008b238 to a195e28 Compare June 25, 2021 12:24

don't use meta! need better separation between dbt internal objects…

ebe6f3a

… and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being

kwigley force-pushed the add/update-project-load-tracking branch from a195e28 to ebe6f3a Compare June 25, 2021 12:25

kwigley temporarily deployed to Postgres June 25, 2021 12:30 Inactive

kwigley temporarily deployed to Redshift June 25, 2021 12:30 Inactive

kwigley had a problem deploying to Bigquery June 25, 2021 12:30 Failure

kwigley had a problem deploying to Bigquery June 25, 2021 12:30 Error

kwigley temporarily deployed to Snowflake June 25, 2021 12:30 Inactive

jtcohen6 approved these changes Jun 25, 2021

View reviewed changes

fix partial parsing case

7178260

kwigley temporarily deployed to Postgres June 25, 2021 15:01 Inactive

kwigley temporarily deployed to Snowflake June 25, 2021 15:01 Inactive

kwigley temporarily deployed to Redshift June 25, 2021 15:01 Inactive

kwigley temporarily deployed to Bigquery June 25, 2021 15:01 Inactive

gshank approved these changes Jun 25, 2021

View reviewed changes

kwigley merged commit 4d24656 into develop Jun 28, 2021

kwigley deleted the add/update-project-load-tracking branch June 28, 2021 14:09

jtcohen6 mentioned this pull request Jun 28, 2021

Tracking error: partial parsing with no model changes #3506

Closed

jmriego mentioned this pull request Jul 1, 2021

'+' config prefix handling whitespace #3526

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update project load tracking to include experimental parser info #3495

Update project load tracking to include experimental parser info #3495

kwigley commented Jun 24, 2021 •

edited

Loading

jtcohen6 left a comment

jtcohen6 Jun 25, 2021

jtcohen6 Jun 25, 2021

gshank Jun 25, 2021

gshank Jun 25, 2021

gshank commented Jun 25, 2021

jtcohen6 commented Jun 25, 2021

jtcohen6 left a comment •

edited

Loading

Update project load tracking to include experimental parser info #3495

Update project load tracking to include experimental parser info #3495

Conversation

kwigley commented Jun 24, 2021 • edited Loading

Description

Checklist

jtcohen6 left a comment

Choose a reason for hiding this comment

jtcohen6 Jun 25, 2021

Choose a reason for hiding this comment

jtcohen6 Jun 25, 2021

Choose a reason for hiding this comment

gshank Jun 25, 2021

Choose a reason for hiding this comment

gshank Jun 25, 2021

Choose a reason for hiding this comment

gshank commented Jun 25, 2021

jtcohen6 commented Jun 25, 2021

jtcohen6 left a comment • edited Loading

Choose a reason for hiding this comment

kwigley commented Jun 24, 2021 •

edited

Loading

jtcohen6 left a comment •

edited

Loading