Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix wildcard aggregation by skipping #448

Merged

Conversation

danielhuppmann
Copy link
Member

@danielhuppmann danielhuppmann commented Dec 19, 2024

This PR is an alternative solution to #446, assuming that wildcard-variables should not be aggregated.

So this PR implements the following:

  • Require that wildcard-variables have an explicit attribute "skip-region-aggregation: True" to avoid confusion and be forward-compatible in case we ever decide to change this behavior
  • Skip common-region-aggregation for any variables that are not explicitly listed in the VariableCodeList (i.e., wildcard variables)
  • Deactivate aggregate-check for any variables that are not explicitly listed in the VariableCodeList (i.e., wildcard variables)

In the process, I also noticed that the test test_region_processing_skip_aggregation passed even though the mapping-file was misspelled (m_a instead of model_a), but because the test looked at skipping aggregation, skip-aggregation and no mapping actually had the same effect. I fixed it anyway

closes #444
closes #445

@danielhuppmann
Copy link
Member Author

At least this issue made me realize that I hadn't set skip-region-aggregation where I should have done so...
IAMconsortium/common-definitions#249

Copy link
Contributor

@phackstock phackstock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it stands right now, I think this PR does not fix the problem.
See details below.

@@ -139,6 +139,9 @@ def check_aggregate(self, df: IamDataFrame, **kwargs) -> None:

with adjust_log_level(level="WARNING"):
for code in df.variable:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not part of this PR but, I'd strongly suggest renaming the iterator to variable since we're looking at a variable (from a data frame) and not a variable code.

Suggested change
for code in df.variable:
for variable in df.variable:

@@ -208,6 +209,13 @@ def deserialize_json(cls, v):
def convert_none_to_empty_string(cls, v):
return v if v is not None else ""

@model_validator(mode="after")
def wildcard_must_skip_region_aggregation(cls, data):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the pydantic docs (https://docs.pydantic.dev/latest/concepts/validators/#model-validators), a model_validator that's run "after" uses self:

Suggested change
def wildcard_must_skip_region_aggregation(cls, data):
def wildcard_must_skip_region_aggregation(self) -> Self:

@@ -208,6 +209,13 @@ def deserialize_json(cls, v):
def convert_none_to_empty_string(cls, v):
return v if v is not None else ""

@model_validator(mode="after")
def wildcard_must_skip_region_aggregation(cls, data):
if "*" in data.name and data.skip_region_aggregation is False:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always like using properties to make code more readable, so I'd suggest the following:

Suggested change
if "*" in data.name and data.skip_region_aggregation is False:
if self.is_wildcard and self.skip_region_aggregation is False:

raise ValueError(
f"Wildcard variable '{data.name}' must skip region aggregation"
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A validator also needs to return the instance:

Suggested change
return self

"model_name, region_names",
[("model_a", ("region_A", "region_B")), ("model_b", ("region_A", "region_b"))],
)
def test_region_processing_wildcard_skip_aggregation(model_name, region_names):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test misses a crucial case where the code would currently still fail. The model mappings contain no aggregation instructions. I think they have to since most, if not all, real world examples contain aggregation instructions.
Because it skips all aggregation it also skips over a point where the code would still fail (I think).
I've added comments above where I think that would be.

@@ -610,7 +612,9 @@ def vars_kwargs(self, variables: list[str]) -> list[VariableCode]:
return [
self[var]
for var in variables
if self[var].agg_kwargs and not self[var].skip_region_aggregation
if var in self.keys()
and self[var].agg_kwargs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we attempted aggregation with wildcard variables (which is the real world example because we need to look at them to decide if we skip or not) this code would throw and error because the lookup for: a variable called Variable|1 would fail if there's a code Variable*

for var in variables
if not self[var].agg_kwargs and not self[var].skip_region_aggregation
if var in self.keys()
and not self[var].agg_kwargs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we attempted aggregation with wildcard variables (which is the real world example because we need to look at them to decide if we skip or not) this code would throw and error because the lookup for: a variable called Variable|1 would fail if there's a code Variable*

@phackstock phackstock merged commit bfa80ff into IAMconsortium:main Dec 20, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants