Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add generic rule: models should implement uniqueness test for their PK #90

Merged
merged 14 commits into from
Jan 27, 2025

Conversation

matthieucan
Copy link
Contributor

For models materialized as table or incremental:

  • Loop over their columns to extract PK
  • Loop over their tests to find a uniqueness test matching the PK columns

Consider both table- and column-level constraints and tests.

Copy link
Contributor

@druzhinin-kirill druzhinin-kirill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💪

src/dbt_score/rules/generic.py Outdated Show resolved Hide resolved
src/dbt_score/rules/generic.py Outdated Show resolved Hide resolved
@matthieucan matthieucan force-pushed the matthieucan/rule-uniqueness-test branch from 16bca96 to 8461cf2 Compare January 6, 2025 17:27
Copy link
Contributor

@jochemvandooren jochemvandooren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Some minor comments

CHANGELOG.md Show resolved Hide resolved
src/dbt_score/models.py Outdated Show resolved Hide resolved
Comment on lines 56 to 96
@rule(rule_filters={is_table()})
def has_uniqueness_test(model: Model) -> RuleViolation | None:
"""Model has uniqueness test for primary key."""
# ruff: noqa: C901 [too-complex]
# ruff: noqa: PLR0912 [too-many-branches]

# Extract PK
pk_columns = None
# At column level?
for column in model.columns:
for column_constraint in column.constraints:
if column_constraint.type == "primary_key":
pk_columns = [column.name]
break
else:
continue
break
# Or at table level?
if pk_columns is None:
for model_constraint in model.constraints:
if model_constraint.type == "primary_key":
pk_columns = model_constraint.columns
break

if pk_columns is None: # No PK, no need for uniqueness test
return None

# Look for matching uniqueness test
if len(pk_columns) == 1:
for column in model.columns:
if column.name == pk_columns[0]:
for data_test in column.tests:
if data_test.type == "unique":
return None

for data_test in model.tests:
if data_test.type == "unique_combination_of_columns":
if set(data_test.kwargs.get("combination_of_columns")) == set(pk_columns): # type: ignore
return None

return RuleViolation("There is no uniqueness test defined and matching the PK.")
Copy link
Contributor

@druzhinin-kirill druzhinin-kirill Jan 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@rule(rule_filters={is_table()})
def has_uniqueness_test(model: Model) -> RuleViolation | None:
"""Model has uniqueness test for primary key."""
# ruff: noqa: C901 [too-complex]
# ruff: noqa: PLR0912 [too-many-branches]
# Extract PK
pk_columns = None
# At column level?
for column in model.columns:
for column_constraint in column.constraints:
if column_constraint.type == "primary_key":
pk_columns = [column.name]
break
else:
continue
break
# Or at table level?
if pk_columns is None:
for model_constraint in model.constraints:
if model_constraint.type == "primary_key":
pk_columns = model_constraint.columns
break
if pk_columns is None: # No PK, no need for uniqueness test
return None
# Look for matching uniqueness test
if len(pk_columns) == 1:
for column in model.columns:
if column.name == pk_columns[0]:
for data_test in column.tests:
if data_test.type == "unique":
return None
for data_test in model.tests:
if data_test.type == "unique_combination_of_columns":
if set(data_test.kwargs.get("combination_of_columns")) == set(pk_columns): # type: ignore
return None
return RuleViolation("There is no uniqueness test defined and matching the PK.")
pk_columns = next(
(
[column.name]
for column in model.columns
for constraint in column.constraints
if constraint.type == "primary_key"
),
[],
)
if not pk_columns:
pk_columns = next(
(
constraint.columns
for constraint in model.constraints
if constraint.columns and constraint.type == "primary_key"
),
[],
)
if len(pk_columns) == 1:
pk_column = next(
column for column in model.columns if column.name == pk_columns[0]
)
if not any(test.type == "unique" for test in pk_column.tests):
return RuleViolation("There is no uniqueness test defined.")
elif len(pk_columns) > 1:
constraint_test = next(
(
test
for test in model.tests
if test.type == "unique_combination_of_columns"
),
None,
)
if not constraint_test:
return RuleViolation("There is no uniqueness test defined.")
if set(constraint_test.kwargs.get("combination_of_columns", [])) == set(
pk_columns
):
return RuleViolation("Uniqueness test does not match the PK.")

I tried to get rid of lint exceptions and make it a bit more flat - feel free to use fully or partially

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for challenging! I adopted some of it, making the function flatter.
I didn't take the list comprehensions however, as keeping breaks improves performances a bit. I could get rid of one ruff-ignore, not the other unfortunately.
Overall, while simplifying the logic, I was led to add 2 related rules, making it clearer when a model doesn't conform to expectations. PTAL :)

Copy link
Contributor

@jochemvandooren jochemvandooren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, works well! 🚀

Copy link
Contributor

@npeshkov npeshkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposing some changes to text and approving 👍

Comment on lines 81 to 82
def has_uniqueness_test(model: Model) -> RuleViolation | None:
"""Model has uniqueness test for primary key."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def has_uniqueness_test(model: Model) -> RuleViolation | None:
"""Model has uniqueness test for primary key."""
def primary_key_uniqueness_is_tested(model: Model) -> RuleViolation | None:
"""Model tests uniqueness of primary keys."""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could go with has_uniqueness_test_for_pk, but find the suggestion too verbose


return RuleViolation("There is no uniqueness test defined and matching the PK.")
return RuleViolation(
f"No uniqueness test defined and matching PK {','.join(pk_columns)}."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"No uniqueness test defined and matching PK {','.join(pk_columns)}."
f"No uniqueness test defined for matching Primary Keys {','.join(pk_columns)}."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While acronyms are sometimes more confusing than anything else, in this context, I believe PK is well understood

for data_test in column.tests:
if data_test.type == "unique":
return None
return RuleViolation(
f"No unique constraint defined on PK column {column.name}."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"No unique constraint defined on PK column {column.name}."
f"No unique constraint defined on Primary Key column {column.name}."

@jochemvandooren jochemvandooren enabled auto-merge (squash) January 27, 2025 13:31
@jochemvandooren jochemvandooren merged commit 28b6441 into master Jan 27, 2025
4 checks passed
@jochemvandooren jochemvandooren deleted the matthieucan/rule-uniqueness-test branch January 27, 2025 13:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants