Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update project load tracking to include experimental parser info #3495

Merged
merged 5 commits into from
Jun 28, 2021

Conversation

kwigley
Copy link
Contributor

@kwigley kwigley commented Jun 24, 2021

resolves #3438

Description

I reset this branch several times while trying to add this data, this is what I landed on that doesn't involve refactoring pieces of partial parsing. Will pull logic in to 0.20.latest branch after approval.

  • parsed_path_count -> path_count can represent the size of the project and amount of files eligible to be parsed while parsed_path_count will be the actual amount of files parsed
  • is_static_analysis_enabled -> set if flag for experimental parser is turned on
  • static_analysis_path_count -> number of files eligible to be parsed by experimental parser
  • static_analysis_parsed_path_count -> number of files actually parsed by experimental parser

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

kostek-pl and others added 2 commits June 24, 2021 16:06
* Fix docs generating for cross-db sources

* Code reorganization

* Code adjustments according to flake8

* Error message adjusted to be more precise

* CHANGELOG update
@kwigley kwigley requested review from gshank and jtcohen6 June 24, 2021 20:11
@cla-bot cla-bot bot added the cla:yes label Jun 24, 2021
@kwigley kwigley temporarily deployed to Postgres June 24, 2021 20:18 Inactive
@kwigley kwigley temporarily deployed to Snowflake June 24, 2021 20:18 Inactive
@kwigley kwigley temporarily deployed to Snowflake June 24, 2021 20:18 Inactive
@kwigley kwigley temporarily deployed to Redshift June 24, 2021 20:18 Inactive
@kwigley kwigley temporarily deployed to Redshift June 24, 2021 20:18 Inactive
@kwigley kwigley temporarily deployed to Postgres June 25, 2021 02:30 Inactive
@kwigley kwigley temporarily deployed to Redshift June 25, 2021 02:30 Inactive
@kwigley kwigley temporarily deployed to Redshift June 25, 2021 02:30 Inactive
@kwigley kwigley temporarily deployed to Bigquery June 25, 2021 02:30 Inactive
@kwigley kwigley temporarily deployed to Bigquery June 25, 2021 02:30 Inactive
@kwigley kwigley temporarily deployed to Snowflake June 25, 2021 02:30 Inactive
@kwigley kwigley temporarily deployed to Snowflake June 25, 2021 02:30 Inactive
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Event looks good, I confirmed the data flows all the way through as expected
  • The updates to path_count + parsed_path_count are right on
  • We're undercounting for static_analysis_path_count + static_analysis_parsed_path_count because of how we're using the node.meta property right now. Let's find another way to do this

@@ -614,6 +629,9 @@ def track_project_load(self):
"is_partial_parse_enabled": (
self._perf_info.is_partial_parse_enabled
),
"is_static_analysis_enabled": flags.USE_EXPERIMENTAL_PARSER,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add is_static_analysis_enabled to the output of dbt parse (target/perf_info.json) as well?

) -> None:
node.meta["is_statically_extractable"] = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can use meta for this. That's a property that users can set in yaml files (docs). If the user sets any yaml properties for the node, even if they don't set any meta properties, the node drops {'is_statically_extractable': True} during yaml parsing here. The upshot is that we significantly undercount the number of statically extractable + extracted paths.

Is there a different preexisting, internal-use node property we could set here? Or another way to tag the node (without adding having to adjust the manifest schema)?

I'd also be open to adjusting the v2 manifest schema so that, instead of just adding a standalone created_at property, we added a parsing dictionary (which could include created_at, statically_extractable, statically_extracted, total parse time, etc).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option might be to store the information in an attribute/dictionary that is filtered out in serialization so that we don't need to update the manifest schema. I couldn't do that with 'created_at' because it had to be in the partial_parse.msgpack file, but I think the information we're discussion here doesn't need to be stored.

I suspect that we're going to keep running into this "new attribute vs updating schema" issue, so we might want to think about something like the 'parsing' dictionary Jeremy proposes, maybe 'internal' or something, where we can put data that probably is not relevant except internally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see Kyle already is filtering out that info. Nevermind :)

@kwigley kwigley force-pushed the add/update-project-load-tracking branch from 008b238 to a195e28 Compare June 25, 2021 12:24
… and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being
@kwigley kwigley force-pushed the add/update-project-load-tracking branch from a195e28 to ebe6f3a Compare June 25, 2021 12:25
@kwigley kwigley temporarily deployed to Postgres June 25, 2021 12:30 Inactive
@kwigley kwigley temporarily deployed to Redshift June 25, 2021 12:30 Inactive
@kwigley kwigley temporarily deployed to Redshift June 25, 2021 12:30 Inactive
@kwigley kwigley temporarily deployed to Snowflake June 25, 2021 12:30 Inactive
@kwigley kwigley temporarily deployed to Snowflake June 25, 2021 12:30 Inactive
@gshank
Copy link
Contributor

gshank commented Jun 25, 2021

Did we decide not to track the path counts on a project/parser basis? I see the total path_count is saved from the number of files.

Also we had discussed storing whether partial parsing is actually running ('partially_parsing'). Is that something we want to save?

@jtcohen6
Copy link
Contributor

Also we had discussed storing whether partial parsing is actually running ('partially_parsing'). Is that something we want to save?

@gshank I think we can infer this from whether path_count != parsed_path_count

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Numbers are looking right on.

@kwigley kwigley temporarily deployed to Postgres June 25, 2021 15:01 Inactive
@kwigley kwigley temporarily deployed to Snowflake June 25, 2021 15:01 Inactive
@kwigley kwigley temporarily deployed to Snowflake June 25, 2021 15:01 Inactive
@kwigley kwigley temporarily deployed to Redshift June 25, 2021 15:01 Inactive
@kwigley kwigley temporarily deployed to Redshift June 25, 2021 15:01 Inactive
@kwigley kwigley temporarily deployed to Bigquery June 25, 2021 15:01 Inactive
@kwigley kwigley temporarily deployed to Bigquery June 25, 2021 15:01 Inactive
@kwigley kwigley merged commit 4d24656 into develop Jun 28, 2021
@kwigley kwigley deleted the add/update-project-load-tracking branch June 28, 2021 14:09
kwigley pushed a commit that referenced this pull request Jun 28, 2021
* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408)

* Fix docs generating for cross-db sources

* Code reorganization

* Code adjustments according to flake8

* Error message adjusted to be more precise

* CHANGELOG update

* add static analysis info to parsing data

* update changelog

* don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being

* fix partial parsing case

Co-authored-by: kostek-pl <[email protected]>
kwigley pushed a commit that referenced this pull request Jun 29, 2021
* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408)

* Fix docs generating for cross-db sources

* Code reorganization

* Code adjustments according to flake8

* Error message adjusted to be more precise

* CHANGELOG update

* add static analysis info to parsing data

* update changelog

* don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being

* fix partial parsing case

Co-authored-by: kostek-pl <[email protected]>
kwigley pushed a commit that referenced this pull request Jun 29, 2021
…) (#3500)

* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408)

* Fix docs generating for cross-db sources

* Code reorganization

* Code adjustments according to flake8

* Error message adjusted to be more precise

* CHANGELOG update

* add static analysis info to parsing data

* update changelog

* don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being

* fix partial parsing case

Co-authored-by: kostek-pl <[email protected]>

Co-authored-by: kostek-pl <[email protected]>
iknox-fa pushed a commit that referenced this pull request Feb 8, 2022
* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408)

* Fix docs generating for cross-db sources

* Code reorganization

* Code adjustments according to flake8

* Error message adjusted to be more precise

* CHANGELOG update

* add static analysis info to parsing data

* update changelog

* don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being

* fix partial parsing case

Co-authored-by: kostek-pl <[email protected]>

automatic commit by git-black, original commits:
  4d24656
iknox-fa pushed a commit that referenced this pull request Feb 8, 2022
* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408)

* Fix docs generating for cross-db sources

* Code reorganization

* Code adjustments according to flake8

* Error message adjusted to be more precise

* CHANGELOG update

* add static analysis info to parsing data

* update changelog

* don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being

* fix partial parsing case

Co-authored-by: kostek-pl <[email protected]>

automatic commit by git-black, original commits:
  4d24656
  a76ec42
iknox-fa pushed a commit that referenced this pull request Feb 8, 2022
* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408)

* Fix docs generating for cross-db sources

* Code reorganization

* Code adjustments according to flake8

* Error message adjusted to be more precise

* CHANGELOG update

* add static analysis info to parsing data

* update changelog

* don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being

* fix partial parsing case

Co-authored-by: kostek-pl <[email protected]>

automatic commit by git-black, original commits:
  2cc0579
  4d24656
iknox-fa pushed a commit that referenced this pull request Feb 8, 2022
* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408)

* Fix docs generating for cross-db sources

* Code reorganization

* Code adjustments according to flake8

* Error message adjusted to be more precise

* CHANGELOG update

* add static analysis info to parsing data

* update changelog

* don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being

* fix partial parsing case

Co-authored-by: kostek-pl <[email protected]>

automatic commit by git-black, original commits:
  4d24656
  7563b99
  aadf3c7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update project load tracking for v0.20 parsing changes
4 participants