Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add minimal validation of schema file yaml prior to partial parsing #3460

Merged
merged 1 commit into from
Jun 18, 2021

Conversation

gshank
Copy link
Contributor

@gshank gshank commented Jun 15, 2021

resolves #3246

Description

Partial parsing expects a few things from the dictionary contents of a schema yaml file, but validation hasn't been performed yet, so certain types of errors in the yaml would result in unhelpful errors, such as "key error". This adds validation that the value of the expected yaml keys is a list, that the list contains dictionaries, and that the dictionaries contain a 'name' key.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label Jun 15, 2021
@gshank gshank temporarily deployed to Redshift June 15, 2021 20:49 Inactive
@gshank gshank temporarily deployed to Redshift June 15, 2021 20:49 Inactive
@gshank gshank temporarily deployed to Bigquery June 15, 2021 20:49 Inactive
@gshank gshank temporarily deployed to Bigquery June 15, 2021 20:49 Inactive
@gshank gshank temporarily deployed to Postgres June 15, 2021 20:49 Inactive
@gshank gshank force-pushed the minimal_pp_validation branch from ee905b1 to e0fc1a7 Compare June 15, 2021 20:51
@gshank gshank temporarily deployed to Postgres June 15, 2021 20:59 Inactive
@gshank gshank temporarily deployed to Redshift June 15, 2021 20:59 Inactive
@gshank gshank temporarily deployed to Redshift June 15, 2021 20:59 Inactive
@gshank gshank temporarily deployed to Bigquery June 15, 2021 20:59 Inactive
@gshank gshank temporarily deployed to Bigquery June 15, 2021 20:59 Inactive
@gshank gshank requested review from jtcohen6 and kwigley June 15, 2021 21:22
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before:

Encountered an error:
string indices must be integers

After:

Encountered an error:
Compilation Error
  The schema file at models/schema.yml is invalid because the value of 'models' is not a list

Definitely an improvement!

@@ -3,7 +3,8 @@
FilePath, ParseFileType, SourceFile, FileHash, AnySourceFile, SchemaSourceFile
)

from dbt.parser.schemas import yaml_from_file
from dbt.parser.schemas import yaml_from_file, schema_file_keys, check_format_version
from dbt.exceptions import CompilationException
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For later, not now: We may want a cleaner distinction between compile-time exceptions (Jinja rendering, SQL templating) and validation/parse-time exceptions. This would be one of the latter.

return source_file


# Do some minimal validation of the yaml in a schema file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we doing this validation on all yaml files, or just the ones we've identified to be new/changed during partial parsing?

Copy link
Contributor Author

@gshank gshank Jun 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this for a while, and in theory we could get away with just doing it on changed and new files, but then the validation would have to be done in the partial parsing code, which felt wrong, and also yaml files with these errors wouldn't get any benefit from the better errors. Also we'd have different error messages depending on whether or not somebody was doing partial parsing.

@gshank gshank temporarily deployed to Postgres June 16, 2021 16:26 Inactive
@gshank gshank temporarily deployed to Bigquery June 16, 2021 16:27 Inactive
@gshank gshank temporarily deployed to Bigquery June 16, 2021 16:27 Inactive
@gshank gshank temporarily deployed to Redshift June 16, 2021 16:27 Inactive
@gshank gshank temporarily deployed to Redshift June 16, 2021 16:27 Inactive
@gshank gshank temporarily deployed to Snowflake June 16, 2021 16:27 Inactive
@gshank gshank temporarily deployed to Snowflake June 16, 2021 16:27 Inactive
@gshank gshank force-pushed the minimal_pp_validation branch from e0fc1a7 to eb47b85 Compare June 17, 2021 17:25
@gshank gshank temporarily deployed to Bigquery June 17, 2021 17:31 Inactive
@gshank gshank temporarily deployed to Bigquery June 17, 2021 17:31 Inactive
@gshank gshank temporarily deployed to Postgres June 17, 2021 17:31 Inactive
@gshank gshank temporarily deployed to Snowflake June 17, 2021 17:31 Inactive
@gshank gshank temporarily deployed to Snowflake June 17, 2021 17:31 Inactive
@gshank gshank temporarily deployed to Redshift June 17, 2021 17:31 Inactive
@gshank gshank temporarily deployed to Redshift June 17, 2021 17:31 Inactive
@gshank gshank merged commit eb46bfc into develop Jun 18, 2021
@gshank gshank deleted the minimal_pp_validation branch June 18, 2021 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support dots in model names
3 participants