Parsing files, especially .yml files, is one of the most time-consuming operations at the start of an invocation. Partial parsing is a powerful feature that enables dbt to avoid re-parsing unchanged files in subsequent runs. While it's no substitute for improving overall parse time, the better partial parsing works, the better developers' quality of life will be, especially when working with very big projects.
Today, it has some significant limitations. From the docs:
If environment variables control the parsed representation of your project, then the logic executed by dbt may differ from the logic specified in your project. Partial parsing should only be used when all of the logic in your dbt project is encoded in the files inside of that project.
What would we need in order to return reliable, up-to-date results? I imagine dbt would need to:
capture which files/macros/configs/etc. depend on the values of environment variables
record which environment variables matter, and what their values are
compare the env var values in subsequent runs to determine if the files need to be re-parsed
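The three steps above could look roughly like this, assuming dbt records, for each parsed file, the names and values of the env vars it read. The snapshot layout and the names `record_env_reads` and `files_to_reparse` are hypothetical, not part of dbt. An unset variable is recorded as `None`, so setting or unsetting it also counts as a change.

```python
from typing import Dict, Mapping, Optional, Set

# Hypothetical record kept alongside the saved manifest: for each file, the
# env vars read while parsing it and the values they had (None = unset).
EnvSnapshot = Dict[str, Optional[str]]


def record_env_reads(names: Set[str], environ: Mapping[str, str]) -> EnvSnapshot:
    """Record which env vars a file depends on and their current values."""
    return {name: environ.get(name) for name in sorted(names)}


def files_to_reparse(
    saved: Dict[str, EnvSnapshot], environ: Mapping[str, str]
) -> Set[str]:
    """Return the files whose recorded env var values differ from the current run."""
    return {
        path
        for path, snapshot in saved.items()
        if any(environ.get(name) != value for name, value in snapshot.items())
    }


# Previous run: orders.sql read TARGET_SCHEMA=dev; customers.sql read no env vars.
saved = {
    "models/orders.sql": record_env_reads({"TARGET_SCHEMA"}, {"TARGET_SCHEMA": "dev"}),
    "models/customers.sql": record_env_reads(set(), {}),
}

print(files_to_reparse(saved, {"TARGET_SCHEMA": "prod"}))  # {'models/orders.sql'}
print(files_to_reparse(saved, {"TARGET_SCHEMA": "dev"}))   # set()
```

Files that read no env vars are never flagged, so a change to one variable only costs a re-parse of the files that actually depend on it.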
If partial parsing is enabled and --vars change between runs, dbt will always re-parse.
What would we need to avoid re-parsing all files when --vars change? I imagine it's similar to the above! In both cases, if dbt could statically analyze which files directly or indirectly depend on the {{ env_var() }} and {{ var() }} macros, and compare their values in subsequent runs, it could know whether to re-parse certain files.
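A rough sketch of that static analysis, assuming literal string arguments and treating any `name(` occurrence of a project macro as a call. This is a deliberately crude heuristic, and the function names and the `env:`/`var:` key format are hypothetical. It walks the macro call graph, so a model that calls a macro which in turn reads `env_var()` is still flagged. Calls whose argument is computed at runtime can't be resolved this way, so a real implementation would need a conservative fallback such as re-parsing those files unconditionally.

```python
import re
from typing import Dict, Set

# Literal-argument calls like env_var('DBT_SCHEMA') or var("start_date", ...).
ENV_VAR_RE = re.compile(r"""\benv_var\(\s*['"]([^'"]+)['"]""")
# The word boundary keeps this from matching the "var(" inside "env_var(".
VAR_RE = re.compile(r"""\bvar\(\s*['"]([^'"]+)['"]""")


def direct_deps(source: str) -> Set[str]:
    """Env vars and --vars referenced directly in one file or macro body."""
    deps = {f"env:{name}" for name in ENV_VAR_RE.findall(source)}
    deps |= {f"var:{name}" for name in VAR_RE.findall(source)}
    return deps


def called_macros(source: str, macros: Dict[str, str]) -> Set[str]:
    """Project macros that appear to be called in `source` (crude name match)."""
    return {name for name in macros if re.search(rf"\b{re.escape(name)}\(", source)}


def transitive_deps(files: Dict[str, str], macros: Dict[str, str]) -> Dict[str, Set[str]]:
    """For each file, collect env/var deps reachable through any chain of macro calls."""
    result: Dict[str, Set[str]] = {}
    for path, source in files.items():
        deps = direct_deps(source)
        seen: Set[str] = set()
        stack = list(called_macros(source, macros))
        while stack:
            name = stack.pop()
            if name in seen:  # also guards against recursive macros
                continue
            seen.add(name)
            deps |= direct_deps(macros[name])
            stack.extend(called_macros(macros[name], macros))
        result[path] = deps
    return result


macros = {
    "target_schema": "{% macro target_schema() %}{{ env_var('DBT_SCHEMA') }}{% endmacro %}",
    "wrapped_schema": "{% macro wrapped_schema() %}{{ target_schema() }}{% endmacro %}",
}
files = {
    "models/a.sql": "select * from {{ wrapped_schema() }}.orders",
    "models/b.sql": "select * from raw where d >= '{{ var(\"start_date\") }}'",
    "models/c.sql": "select 1",
}
deps = transitive_deps(files, macros)
print(deps["models/a.sql"])  # {'env:DBT_SCHEMA'}
print(deps["models/b.sql"])  # {'var:start_date'}
print(deps["models/c.sql"])  # set()
```

Pairing each file's keys with the values seen on the previous run would let dbt re-parse only the files whose env var or --vars inputs actually changed.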
By default, partial_parse is set to false.
What would need to change to turn this on for all projects by default? I imagine it's having good answers to the env var questions above. (Currently, dbt returns correct parsed results, without speed improvements, if --vars change, but it may return incorrect results if env vars change.) There may be other things I'm missing as well.
Alternatively, we could distinguish between the two: env vars are a non-dbt construct, whereas --vars are a dbt construct for which dbt must take responsibility. It would then be up to the developer or deployment orchestrator to ensure that, when partial parsing is used, all env vars are consistent between the runtime that produced the pickle and the current one.
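Under that alternative, the consistency check would live outside dbt. A minimal orchestrator-side sketch, assuming the orchestrator knows which env vars the project reads and where the saved parse state lives (the helper names, the fingerprint file, and any paths passed in are hypothetical): fingerprint the relevant env vars, and if the fingerprint differs from the one stored with the saved state, delete that state so the next invocation performs a full parse.

```python
import hashlib
import json
from pathlib import Path
from typing import Iterable, Mapping


def env_fingerprint(names: Iterable[str], environ: Mapping[str, str]) -> str:
    """Hash the values of the given env vars; unset vars hash differently from empty ones."""
    snapshot = {name: environ.get(name) for name in sorted(names)}
    return hashlib.sha256(json.dumps(snapshot, sort_keys=True).encode()).hexdigest()


def ensure_consistent_env(state_file: Path, fingerprint_file: Path, fingerprint: str) -> bool:
    """Keep the saved parse state only if it was produced under the same env.

    Returns True if the state was kept, False if it was discarded.
    """
    previous = fingerprint_file.read_text() if fingerprint_file.exists() else None
    if previous == fingerprint:
        return True
    # Env changed (or no fingerprint on record): drop the state to force a full parse.
    state_file.unlink(missing_ok=True)  # missing_ok requires Python 3.8+
    fingerprint_file.write_text(fingerprint)
    return False
```

An orchestrator would call `ensure_consistent_env(...)` right before invoking dbt. The check is all-or-nothing: any change to a listed variable discards the whole saved state, which is safe but gives up the per-file savings that dbt-side tracking could offer.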
Why don't we pickle the "final" version of the manifest? Today, the steps timed by patch_sources_elapsed and process_manifest_elapsed have to be performed in full, with or without partial parsing. (We opened this as a separate issue: Refactor partial parsing to cover more startup costs #3163)
Could we use msgpack (via Mashumaro) or cbor instead of a pickle file?
A number of these improvements are included in 0.20.0rc1. An additional slew of test cases and edge cases are detailed in #3371. Closing in favor of that issue.