Add generic rule: models should implement uniqueness test for their PK #90

matthieucan · 2024-12-24T15:00:42Z

For models materialized as table or incremental:

Loop over their columns to extract PK
Loop over their tests to find a uniqueness test matching the PK columns

Consider both table- and column-level constraints and tests.
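The steps above can be sketched as follows. This is a minimal, self-contained illustration only: the `Constraint`, `DataTest`, `Column`, and `Model` dataclasses and both function names are hypothetical stand-ins rather than the dbt-score classes, and the check returns a plain message string instead of a `RuleViolation`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical stand-ins for illustration; not the dbt-score model classes.
@dataclass
class Constraint:
    type: str
    columns: list[str] = field(default_factory=list)


@dataclass
class DataTest:
    type: str
    kwargs: dict = field(default_factory=dict)


@dataclass
class Column:
    name: str
    constraints: list[Constraint] = field(default_factory=list)
    tests: list[DataTest] = field(default_factory=list)


@dataclass
class Model:
    materialization: str
    columns: list[Column] = field(default_factory=list)
    constraints: list[Constraint] = field(default_factory=list)
    tests: list[DataTest] = field(default_factory=list)


def find_pk_columns(model: Model) -> list[str] | None:
    """Return PK column names: column-level constraints first, then table-level."""
    for column in model.columns:
        if any(c.type == "primary_key" for c in column.constraints):
            return [column.name]
    for constraint in model.constraints:
        if constraint.type == "primary_key":
            return constraint.columns
    return None


def check_pk_uniqueness(model: Model) -> str | None:
    """Return a violation message, or None when the model conforms."""
    if model.materialization not in ("table", "incremental"):
        return None  # The rule only targets table and incremental models
    pk_columns = find_pk_columns(model)
    if pk_columns is None:
        return None  # No PK, so no uniqueness test is required
    if len(pk_columns) == 1:
        # A single-column PK can be covered by a column-level "unique" test
        for column in model.columns:
            if column.name == pk_columns[0] and any(
                t.type == "unique" for t in column.tests
            ):
                return None
    # Any PK can be covered by a model-level test on exactly the same columns
    for test in model.tests:
        if test.type == "unique_combination_of_columns" and set(
            test.kwargs.get("combination_of_columns", [])
        ) == set(pk_columns):
            return None
    return f"No uniqueness test matching PK {','.join(pk_columns)}."


orders = Model(
    "table",
    columns=[Column("id", constraints=[Constraint("primary_key")])],
)
print(check_pk_uniqueness(orders))  # a message: the PK has no "unique" test yet
orders.columns[0].tests.append(DataTest("unique"))
print(check_pk_uniqueness(orders))  # None: the PK is now covered
```

Comparing column sets rather than lists means the order in which a composite key is declared does not matter.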

druzhinin-kirill

💪

src/dbt_score/rules/generic.py

jochemvandooren

Looks good! Some minor comments

CHANGELOG.md

src/dbt_score/models.py

druzhinin-kirill · 2025-01-18T10:32:04Z

src/dbt_score/rules/generic.py

+@rule(rule_filters={is_table()})
+def has_uniqueness_test(model: Model) -> RuleViolation | None:
+    """Model has uniqueness test for primary key."""
+    # ruff: noqa: C901 [too-complex]
+    # ruff: noqa: PLR0912 [too-many-branches]
+
+    # Extract PK
+    pk_columns = None
+    # At column level?
+    for column in model.columns:
+        for column_constraint in column.constraints:
+            if column_constraint.type == "primary_key":
+                pk_columns = [column.name]
+                break
+        else:
+            continue
+        break
+    # Or at table level?
+    if pk_columns is None:
+        for model_constraint in model.constraints:
+            if model_constraint.type == "primary_key":
+                pk_columns = model_constraint.columns
+                break
+
+    if pk_columns is None: # No PK, no need for uniqueness test
+        return None
+
+    # Look for matching uniqueness test
+    if len(pk_columns) == 1:
+        for column in model.columns:
+            if column.name == pk_columns[0]:
+                for data_test in column.tests:
+                    if data_test.type == "unique":
+                        return None
+
+    for data_test in model.tests:
+        if data_test.type == "unique_combination_of_columns":
+            if set(data_test.kwargs.get("combination_of_columns")) == set(pk_columns): # type: ignore
+                return None
+
+    return RuleViolation("There is no uniqueness test defined and matching the PK.")


Suggested change

    # Extract PK: column-level constraints first
    pk_columns = next(
        (
            [column.name]
            for column in model.columns
            for constraint in column.constraints
            if constraint.type == "primary_key"
        ),
        [],
    )
    # Fall back to table-level constraints
    if not pk_columns:
        pk_columns = next(
            (
                constraint.columns
                for constraint in model.constraints
                if constraint.columns and constraint.type == "primary_key"
            ),
            [],
        )

    if len(pk_columns) == 1:
        pk_column = next(
            column for column in model.columns if column.name == pk_columns[0]
        )
        if not any(test.type == "unique" for test in pk_column.tests):
            return RuleViolation("There is no uniqueness test defined.")
    elif len(pk_columns) > 1:
        constraint_test = next(
            (
                test
                for test in model.tests
                if test.type == "unique_combination_of_columns"
            ),
            None,
        )
        if not constraint_test:
            return RuleViolation("There is no uniqueness test defined.")
        # Violation only when the tested columns differ from the PK columns
        if set(constraint_test.kwargs.get("combination_of_columns", [])) != set(
            pk_columns
        ):
            return RuleViolation("Uniqueness test does not match the PK.")

    return None

I tried to get rid of the lint exceptions and make it a bit flatter; feel free to use it fully or partially.

Thanks for challenging! I adopted some of it, making the function flatter.
I didn't take the list comprehensions, however, as keeping breaks improves performance a bit. I could get rid of one ruff-ignore, but unfortunately not the other.
Overall, while simplifying the logic, I was led to add 2 related rules, making it clearer when a model doesn't conform to expectations. PTAL :)
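For readers comparing the two idioms discussed in this thread, the sketch below places an explicit loop with `break` next to `next()` over a generator expression. The `probe` helper and the sample data are made up for illustration. It only records which elements each approach inspects and says nothing about actual timings in dbt-score.

```python
def probe(items, seen):
    """Yield items one by one, recording each inspected item in `seen`."""
    for item in items:
        seen.append(item)
        yield item


# Hypothetical sample data
constraint_types = ["not_null", "primary_key", "unique", "check"]

# Idiom 1: explicit loop with break
seen_loop = []
found_loop = None
for kind in probe(constraint_types, seen_loop):
    if kind == "primary_key":
        found_loop = kind
        break

# Idiom 2: next() over a generator expression, with a default value
seen_next = []
found_next = next(
    (kind for kind in probe(constraint_types, seen_next) if kind == "primary_key"),
    None,
)

print(found_loop, seen_loop)
print(found_next, seen_next)
```

In this sketch both idioms stop at the first match. A list comprehension (square brackets) passed to `next(iter(...))` would instead evaluate every element before returning.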

jochemvandooren

Nice, works well! 🚀

npeshkov

Proposing some changes to text and approving 👍

npeshkov · 2025-01-24T07:44:01Z

src/dbt_score/rules/generic.py

 def has_uniqueness_test(model: Model) -> RuleViolation | None:
    """Model has uniqueness test for primary key."""


Suggested change

-def has_uniqueness_test(model: Model) -> RuleViolation | None:
-    """Model has uniqueness test for primary key."""
+def primary_key_uniqueness_is_tested(model: Model) -> RuleViolation | None:
+    """Model tests uniqueness of primary keys."""

I could go with has_uniqueness_test_for_pk, but I find the suggested name too verbose.

npeshkov · 2025-01-24T08:04:36Z

src/dbt_score/rules/generic.py

-
-    return RuleViolation("There is no uniqueness test defined and matching the PK.")
+    return RuleViolation(
+        f"No uniqueness test defined and matching PK {','.join(pk_columns)}."


Suggested change

-        f"No uniqueness test defined and matching PK {','.join(pk_columns)}."
+        f"No uniqueness test defined for matching Primary Keys {','.join(pk_columns)}."

While acronyms are sometimes more confusing than anything else, in this context, I believe PK is well understood
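To make the wording under discussion concrete, here is how the original message renders for a composite key. The column names are invented for illustration.

```python
# Hypothetical composite primary key
pk_columns = ["order_id", "line_number"]

# The message format from the diff above, built outside of RuleViolation
message = f"No uniqueness test defined and matching PK {','.join(pk_columns)}."
print(message)
```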

npeshkov · 2025-01-24T08:06:32Z

src/dbt_score/rules/generic.py

                for data_test in column.tests:
                    if data_test.type == "unique":
                        return None
+                return RuleViolation(
+                    f"No unique constraint defined on PK column {column.name}."


Suggested change

-                    f"No unique constraint defined on PK column {column.name}."
+                    f"No unique constraint defined on Primary Key column {column.name}."

matthieucan requested review from michael-the1, jochemvandooren and druzhinin-kirill December 24, 2024 15:00

matthieucan self-assigned this Dec 24, 2024

druzhinin-kirill reviewed Dec 30, 2024

src/dbt_score/rules/generic.py (outdated, resolved)

src/dbt_score/rules/generic.py (outdated, resolved)

matthieucan added 2 commits January 6, 2025 18:02

Add generic rule: models should implement uniqueness test for their PK

37c96fc

Implement rule filter for tables

8461cf2

matthieucan force-pushed the matthieucan/rule-uniqueness-test branch from 16bca96 to 8461cf2 Compare January 6, 2025 17:27

matthieucan added 3 commits January 6, 2025 18:30

Add changelog entry

ee2c465

Fix mypy

a165455

mypy is weird

ce420dc

jochemvandooren reviewed Jan 7, 2025

CHANGELOG.md (resolved)

src/dbt_score/models.py (outdated, resolved)

matthieucan added 2 commits January 16, 2025 15:32

Use Constraints class for model-level constraints

1323a3d

Update changelog

8cbda66

matthieucan requested review from jochemvandooren and druzhinin-kirill January 16, 2025 14:34

druzhinin-kirill reviewed Jan 18, 2025

jochemvandooren approved these changes Jan 23, 2025

Improve logic of rule, add new related rules

071c0d2

matthieucan requested a review from druzhinin-kirill January 23, 2025 22:05

prettier

7455fd5

npeshkov approved these changes Jan 24, 2025

matthieucan added 3 commits January 27, 2025 14:28

Fix ruff

06b4aeb

Fix changelog

01ea269

prettier

d3f3168

jochemvandooren enabled auto-merge (squash) January 27, 2025 13:31

matthieucan added 2 commits January 27, 2025 14:37

mypy

ea63a56

Undo mypy import-not-found

9d71dc9

jochemvandooren merged commit 28b6441 into master Jan 27, 2025
4 checks passed

jochemvandooren deleted the matthieucan/rule-uniqueness-test branch January 27, 2025 13:41
