Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Spike] Model versions parsing #7204

Closed
wants to merge 7 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
ParserRef.from_version
MichelleArk committed Mar 28, 2023

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
commit cc3b6afa28784b157e7cd9d8f9fb74cfc33dd47d
17 changes: 16 additions & 1 deletion core/dbt/contracts/graph/unparsed.py
Original file line number Diff line number Diff line change
@@ -148,7 +148,22 @@ class UnparsedVersion(HasConfig, HasColumnProps):
columns: Sequence[Union[dbt.helper_types.IncludeExclude, UnparsedColumn]] = field(
default_factory=list
)
# TODO: all other model configs? Not sure HasDocs is the right thing to use here
# TODO: all other model configs? Not sure HasColumnProps is the right thing to use here

@property
def include_exclude(self) -> dbt.helper_types.IncludeExclude:
return self._include_exclude

def __post_init__(self):
has_include_exclude = False
self._include_exclude = dbt.helper_types.IncludeExclude(include=[])
for column in self.columns:
if isinstance(column, dbt.helper_types.IncludeExclude):
if not has_include_exclude:
self._include_exclude = column
has_include_exclude = True
else:
raise ParsingError("version can have at most one include/exclude element")


@dataclass
30 changes: 26 additions & 4 deletions core/dbt/parser/schemas.py
Original file line number Diff line number Diff line change
@@ -3,7 +3,7 @@
import pathlib

from abc import ABCMeta, abstractmethod
from typing import Iterable, Dict, Any, Union, List, Optional, Generic, TypeVar, Type
from typing import Iterable, Dict, Any, Union, List, Optional, Generic, TypeVar, Type, Sequence

from dbt.dataclass_schema import ValidationError, dbtClassMixin

@@ -53,6 +53,7 @@
UnparsedMetric,
UnparsedSourceDefinition,
UnparsedGroup,
UnparsedVersion,
)
from dbt.exceptions import (
CompilationError,
@@ -151,6 +152,21 @@ def from_target(cls, target: Union[HasColumnDocs, HasColumnTests]) -> "ParserRef
refs._add(column)
return refs

@classmethod
def from_version(
cls, base_columns: Sequence[HasColumnProps], version: UnparsedVersion
) -> "ParserRef":
refs = cls()
for base_column in base_columns:
if version.include_exclude.includes(base_column.name):
refs._add(base_column)

for column in version.columns:
if isinstance(column, UnparsedColumn):
refs._add(column)

return refs


def _trimmed(inp: str) -> str:
if len(inp) < 50:
@@ -890,7 +906,7 @@ def parse_patch(self, block: TargetBlock[NodeTarget], refs: ParserRef) -> None:
f"file {source_file.path.original_file_path}"
)

# patch versioned nodes.
# patch versioned nodes
versions = block.target.versions
if versions:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the meat and potatoes of the change. Currently it's in NonSourceParser, which is subclassed by TestablePatchParser for parsing models, seeds, and snapshots. Given the increasing number of model-specific patching that's emerged as part of the 'Models as APIs' work (access, constraints, now versions) - I'm thinking this could be split out into a new subclass for parsing models patches independently of seeds and snapshots. Perhaps simply ModelPatchParser. What do you think @gshank, cc @peterallenwebb?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never been enthusiastic about having all that code in common. I understand why it was that way to start with, since originally there were no differences, but it provides friction when we need and want to handle differences. I'd be in favor of having them all subclasses, even if some of them were almost duplicates to start with.

latest_version = block.target.latest_version or max(
@@ -920,13 +936,19 @@ def parse_patch(self, block: TargetBlock[NodeTarget], refs: ParserRef) -> None:
versioned_model_node.fqn[-1] = patch.name # bleugh
versioned_model_node.fqn.append(f"v{unparsed_version.name}")
self.manifest.nodes[versioned_model_node.unique_id] = versioned_model_node

# flatten columns based on include/exclude
version_refs: ParserRef = ParserRef.from_version(
block.target.columns, unparsed_version
)

versioned_model_patch = ParsedNodePatch(
name=patch.name,
original_file_path=versioned_model_node.original_file_path,
yaml_key=patch.yaml_key,
package_name=versioned_model_node.package_name,
description=unparsed_version.description or versioned_model_node.description,
columns={}, # TODO: flatten columns based on include/exclude
columns=version_refs.column_info,
meta=unparsed_version.meta or versioned_model_node.meta,
docs=unparsed_version.docs or versioned_model_node.docs,
config=unparsed_version.config or versioned_model_node.config,
@@ -935,8 +957,8 @@ def parse_patch(self, block: TargetBlock[NodeTarget], refs: ParserRef) -> None:
is_latest_version=latest_version == unparsed_version.name,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opted to have an is_latest_version: bool attribute on the ModelNode instead of the raw latest_version: str attribute from the 'canonical' node patch as ref lookups need to know whether a given node is the latest version (logic here)

Could alternatively keep latest_version: str on the ModelNode and add an is_latest_version property if we'd like to keep the ModelNode mapping more closely to the patch.

)
Copy link
Contributor Author

@MichelleArk MichelleArk Mar 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some of these fields need a closer inspection - but the gist of it is that the versioned model node should inherit properties and configs from the base node patch (where the versions block is configured), extending or overriding them as appropriate.

versioned_model_node.patch(versioned_model_patch)
# TODO - update alias
self.manifest.rebuild_ref_lookup() # TODO: is this necessary at this point?
# TODO - update alias
return
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't handle parsing any tests defined on versions within the versions block yet


# handle disabled nodes
1 change: 0 additions & 1 deletion test/unit/utils.py
Original file line number Diff line number Diff line change
@@ -210,7 +210,6 @@ def assert_to_dict(obj, dct):
dct["created_at"] = 1
if obj_to_dict != dct:
compare_dicts(obj_to_dict, dct)
# breakpoint()
assert obj_to_dict == dct