Fix wildcard aggregation by skipping #448

danielhuppmann · 2024-12-19T21:34:45Z

This PR is an alternative solution to #446, assuming that wildcard-variables should not be aggregated.

So this PR implements the following:

Require that wildcard-variables have an explicit attribute "skip-region-aggregation: True" to avoid confusion and be forward-compatible in case we ever decide to change this behavior
Skip common-region-aggregation for any variables that are not explicitly listed in the VariableCodeList (i.e., wildcard variables)
Deactivate aggregate-check for any variables that are not explicitly listed in the VariableCodeList (i.e., wildcard variables)
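The three rules above can be sketched with a plain dictionary. This is a minimal illustration under stated assumptions, not the library's implementation: `CODES`, `check_wildcards` and `split_variables` are hypothetical names, and the codelist is reduced to a mapping from code name to attributes.

```python
# Hypothetical, simplified codelist: code name -> attributes.
CODES = {
    "Primary Energy": {"skip-region-aggregation": False},
    "Variable*": {"skip-region-aggregation": True},
}


def check_wildcards(codes: dict) -> None:
    """Reject wildcard codes that do not explicitly skip region aggregation."""
    for name, attrs in codes.items():
        if "*" in name and not attrs.get("skip-region-aggregation", False):
            raise ValueError(
                f"Wildcard variable '{name}' must skip region aggregation"
            )


def split_variables(variables: list, codes: dict) -> tuple:
    """Return (variables to aggregate, variables skipped).

    A variable is aggregated only if it is explicitly listed and its code
    does not skip region aggregation; everything else is skipped.
    """
    aggregate, skipped = [], []
    for variable in variables:
        listed = variable in codes
        if listed and not codes[variable]["skip-region-aggregation"]:
            aggregate.append(variable)
        else:
            skipped.append(variable)
    return aggregate, skipped


check_wildcards(CODES)
aggregate, skipped = split_variables(["Primary Energy", "Variable|1"], CODES)
print(aggregate)  # ['Primary Energy']
print(skipped)  # ['Variable|1']
```

Here "Variable|1" is covered only by the wildcard code "Variable*", so it is not explicitly listed and ends up in the skipped group.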

In the process, I also noticed that the test test_region_processing_skip_aggregation passed even though the mapping file was misspelled (m_a instead of model_a). Because the test only looked at skipping aggregation, skip-aggregation and a missing mapping had the same effect. I fixed it anyway.

closes #444
closes #445

danielhuppmann · 2024-12-20T06:08:05Z

At least this issue made me realize that I hadn't set skip-region-aggregation where I should have done so...
IAMconsortium/common-definitions#249

phackstock

As it stands right now, I think this PR does not fix the problem.
See details below.

phackstock · 2024-12-20T09:44:44Z

nomenclature/definition.py

@@ -139,6 +139,9 @@ def check_aggregate(self, df: IamDataFrame, **kwargs) -> None:

        with adjust_log_level(level="WARNING"):
            for code in df.variable:


Not part of this PR, but I'd strongly suggest renaming the iterator to variable, since we're looking at a variable (from a data frame) and not a variable code.

Suggested change

-            for code in df.variable:
+            for variable in df.variable:

phackstock · 2024-12-20T09:55:37Z

nomenclature/code.py

@@ -208,6 +209,13 @@ def deserialize_json(cls, v):
    def convert_none_to_empty_string(cls, v):
        return v if v is not None else ""

+    @model_validator(mode="after")
+    def wildcard_must_skip_region_aggregation(cls, data):


According to the pydantic docs (https://docs.pydantic.dev/latest/concepts/validators/#model-validators), a model_validator that runs in "after" mode is an instance method and takes self instead of cls and data:

Suggested change

-    def wildcard_must_skip_region_aggregation(cls, data):
+    def wildcard_must_skip_region_aggregation(self) -> Self:

phackstock · 2024-12-20T09:56:30Z

nomenclature/code.py

@@ -208,6 +209,13 @@ def deserialize_json(cls, v):
    def convert_none_to_empty_string(cls, v):
        return v if v is not None else ""

+    @model_validator(mode="after")
+    def wildcard_must_skip_region_aggregation(cls, data):
+        if "*" in data.name and data.skip_region_aggregation is False:


I always like using properties to make code more readable, so I'd suggest the following:

Suggested change

-        if "*" in data.name and data.skip_region_aggregation is False:
+        if self.is_wildcard and self.skip_region_aggregation is False:

phackstock · 2024-12-20T09:56:52Z

nomenclature/code.py

+            raise ValueError(
+                f"Wildcard variable '{data.name}' must skip region aggregation"
+            )
+


A validator also needs to return the instance:

Suggested change

return self
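
Taken together, the three suggestions on this validator might look like the sketch below. It assumes pydantic v2; `VariableCodeSketch` is a hypothetical stand-in that models only two fields of the real code class, and the `is_wildcard` property is assumed to mean "the name contains an asterisk".

```python
from pydantic import BaseModel, ValidationError, model_validator
from typing_extensions import Self


class VariableCodeSketch(BaseModel):
    """Hypothetical stand-in for the real code class (two fields only)."""

    name: str
    skip_region_aggregation: bool = False

    @property
    def is_wildcard(self) -> bool:
        # Assumption: a wildcard code is any code whose name contains "*".
        return "*" in self.name

    @model_validator(mode="after")
    def wildcard_must_skip_region_aggregation(self) -> Self:
        # "after" validators run on the constructed instance.
        if self.is_wildcard and self.skip_region_aggregation is False:
            raise ValueError(
                f"Wildcard variable '{self.name}' must skip region aggregation"
            )
        # The validated instance has to be returned.
        return self


# A wildcard code with the explicit attribute is accepted.
ok = VariableCodeSketch(name="Variable*", skip_region_aggregation=True)

# Without the attribute, pydantic wraps the ValueError in a ValidationError.
try:
    VariableCodeSketch(name="Variable*")
except ValidationError as error:
    message = str(error)
```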

phackstock · 2024-12-20T10:07:22Z

tests/test_core.py

+    "model_name, region_names",
+    [("model_a", ("region_A", "region_B")), ("model_b", ("region_A", "region_b"))],
+)
+def test_region_processing_wildcard_skip_aggregation(model_name, region_names):


I think this test misses a crucial case where the code would currently still fail. The model mappings contain no aggregation instructions, and I think they have to, since most, if not all, real-world examples contain aggregation instructions.
Because the test skips all aggregation, it also skips over a point where the code would still fail (I think).
I've added comments at the places where I think that would be.

phackstock · 2024-12-20T10:08:57Z

nomenclature/codelist.py

@@ -610,7 +612,9 @@ def vars_kwargs(self, variables: list[str]) -> list[VariableCode]:
        return [
            self[var]
            for var in variables
-            if self[var].agg_kwargs and not self[var].skip_region_aggregation
+            if var in self.keys()
+            and self[var].agg_kwargs


If we attempted aggregation with wildcard variables (which is the real-world case, because we need to look at them to decide whether to skip or not), this code would throw an error: the lookup for a variable called Variable|1 would fail if there is only a code Variable*.

phackstock · 2024-12-20T10:09:21Z

nomenclature/codelist.py

            for var in variables
-            if not self[var].agg_kwargs and not self[var].skip_region_aggregation
+            if var in self.keys()
+            and not self[var].agg_kwargs


If we attempted aggregation with wildcard variables (which is the real-world case, because we need to look at them to decide whether to skip or not), this code would throw an error: the lookup for a variable called Variable|1 would fail if there is only a code Variable*.
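
The lookup concern in the two comments on nomenclature/codelist.py can be illustrated with a plain dictionary. This is an analogy under stated assumptions: `CodeSketch`, `codes` and `vars_kwargs_sketch` are hypothetical, `fnmatchcase` stands in for whatever wildcard matching the library uses, and the real codelist's `__getitem__` and `keys()` are not modeled here.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class CodeSketch:
    """Hypothetical stand-in for a variable code; not the library's class."""

    name: str
    agg_kwargs: dict = field(default_factory=dict)
    skip_region_aggregation: bool = False


codes = {
    "Primary Energy": CodeSketch("Primary Energy", agg_kwargs={"method": "sum"}),
    "Variable*": CodeSketch("Variable*", skip_region_aggregation=True),
}

# The data variable matches the wildcard pattern ...
matches = fnmatchcase("Variable|1", "Variable*")

# ... but it is not a key of the mapping, so an unguarded lookup fails.
try:
    codes["Variable|1"]
except KeyError:
    lookup_failed = True
else:
    lookup_failed = False


def vars_kwargs_sketch(codes: dict, variables: list) -> list:
    """Guarded filter in the shape of the diff: membership is tested first."""
    return [
        codes[var]
        for var in variables
        if var in codes
        and codes[var].agg_kwargs
        and not codes[var].skip_region_aggregation
    ]


selected = vars_kwargs_sketch(codes, ["Primary Energy", "Variable|1"])
print([code.name for code in selected])  # ['Primary Energy']
```

With a plain dictionary the membership test short-circuits before the lookup, so the wildcard-matched variable is dropped instead of raising. Whether the same holds in the library depends on how its codelist resolves wildcard names.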

phackstock and others added 7 commits December 19, 2024 20:31

Add external repo cleanup

a2bad1e

Return variable name string directly

101b9cc

Update usage of vars_default_args

11d0ac8

Add a simple test

cde25cc

Implement skipping region-aggregation for wildcard variables

7815fb7

Require that wildcard-variables have explicit skip-region-aggregation

aa83924

Make ruff

829f07a

danielhuppmann requested review from phackstock and dc-almeida December 19, 2024 21:34

danielhuppmann self-assigned this Dec 19, 2024

phackstock requested changes Dec 20, 2024


phackstock added 2 commits December 20, 2024 13:39

Add region aggregation for wildcard aggregation test

0540ed3

Fix pydantic validator

38868d5

phackstock merged commit bfa80ff into IAMconsortium:main Dec 20, 2024
11 checks passed
