-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix wildcard aggregation by skipping #448
Fix wildcard aggregation by skipping #448
Conversation
At least this issue made me realize that I hadn't set skip-region-aggregation where I should have done so... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it stands right now, I think this PR does not fix the problem.
See details below.
@@ -139,6 +139,9 @@ def check_aggregate(self, df: IamDataFrame, **kwargs) -> None: | |||
|
|||
with adjust_log_level(level="WARNING"): | |||
for code in df.variable: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not part of this PR but, I'd strongly suggest renaming the iterator to variable since we're looking at a variable (from a data frame) and not a variable code.
for code in df.variable: | |
for variable in df.variable: |
nomenclature/code.py
Outdated
@@ -208,6 +209,13 @@ def deserialize_json(cls, v): | |||
def convert_none_to_empty_string(cls, v): | |||
return v if v is not None else "" | |||
|
|||
@model_validator(mode="after") | |||
def wildcard_must_skip_region_aggregation(cls, data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the pydantic docs (https://docs.pydantic.dev/latest/concepts/validators/#model-validators), a model_validator that's run "after" uses self:
def wildcard_must_skip_region_aggregation(cls, data): | |
def wildcard_must_skip_region_aggregation(self) -> Self: |
nomenclature/code.py
Outdated
@@ -208,6 +209,13 @@ def deserialize_json(cls, v): | |||
def convert_none_to_empty_string(cls, v): | |||
return v if v is not None else "" | |||
|
|||
@model_validator(mode="after") | |||
def wildcard_must_skip_region_aggregation(cls, data): | |||
if "*" in data.name and data.skip_region_aggregation is False: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always like using properties to make code more readable, so I'd suggest the following:
if "*" in data.name and data.skip_region_aggregation is False: | |
if self.is_wildcard and self.skip_region_aggregation is False: |
raise ValueError( | ||
f"Wildcard variable '{data.name}' must skip region aggregation" | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A validator also needs to return the instance:
return self |
"model_name, region_names", | ||
[("model_a", ("region_A", "region_B")), ("model_b", ("region_A", "region_b"))], | ||
) | ||
def test_region_processing_wildcard_skip_aggregation(model_name, region_names): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test misses a crucial case where the code would currently still fail. The model mappings contain no aggregation instructions. I think they have to since most, if not all, real world examples contain aggregation instructions.
Because it skips all aggregation it also skips over a point where the code would still fail (I think).
I've added comments above where I think that would be.
@@ -610,7 +612,9 @@ def vars_kwargs(self, variables: list[str]) -> list[VariableCode]: | |||
return [ | |||
self[var] | |||
for var in variables | |||
if self[var].agg_kwargs and not self[var].skip_region_aggregation | |||
if var in self.keys() | |||
and self[var].agg_kwargs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we attempted aggregation with wildcard variables (which is the real world example because we need to look at them to decide if we skip or not) this code would throw and error because the lookup for: a variable called Variable|1
would fail if there's a code Variable*
for var in variables | ||
if not self[var].agg_kwargs and not self[var].skip_region_aggregation | ||
if var in self.keys() | ||
and not self[var].agg_kwargs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we attempted aggregation with wildcard variables (which is the real world example because we need to look at them to decide if we skip or not) this code would throw and error because the lookup for: a variable called Variable|1 would fail if there's a code Variable*
This PR is an alternative solution to #446, assuming that wildcard-variables should not be aggregated.
So this PR implements the following:
In the process, I also noticed that the test
test_region_processing_skip_aggregation
passed even though the mapping-file was misspelled (m_a instead of model_a), but because the test looked at skipping aggregation, skip-aggregation and no mapping actually had the same effect. I fixed it anywaycloses #444
closes #445