Update mean and sum functions #643
base: develop
…correctly handle NaN values in coefficients irreg
Updated mean and sum functions for FData, FDataGrid, FDataBasis and FDataIrregular to correctly handle NaN values in coefficients.
    A FDataBasis object with just one sample representing
    the mean of all the samples in the original object.
    """
    super().mean(axis=axis, dtype=dtype, out=out, keepdims=keepdims,
I am no longer sure that we want to do any validation in the abstract class. It is confusing. I would rather move the validation to the subclasses or, if we do not want to repeat code, to a function in _utils or to a (maybe private for now) function in misc.validation.
    if min_count > 0:
        valid = ~np.isnan(self.data_matrix)
        n_valid = np.sum(valid, axis=0)
        data[n_valid < min_count] = np.nan
Wouldn't a conditional be more clear?
I don't quite understand where and how you are suggesting to use a conditional; the code seems clear to me (though, as the author, I might be biased).
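One possible reading of the "conditional" suggestion is np.where, which selects between NaN and the aggregated value instead of mutating data in place. A minimal sketch, assuming samples lie along axis 0 of data_matrix; mask_low_counts is a hypothetical name, not part of skfda.

```python
import numpy as np


def mask_low_counts(data_matrix: np.ndarray, min_count: int) -> np.ndarray:
    """Sum over samples, returning NaN where fewer than min_count values are valid."""
    data = np.nansum(data_matrix, axis=0)
    # Number of non-NaN values at each grid point.
    n_valid = np.sum(~np.isnan(data_matrix), axis=0)
    # Conditional form: NaN where there are too few valid values,
    # the aggregated value elsewhere; `data` itself is not modified.
    return np.where(n_valid < min_count, np.nan, data)
```

A side benefit is that np.where broadcasts, so the mask and the aggregated array only need broadcast-compatible shapes (for example when one was computed with keepdims=True and the other was not), whereas boolean-index assignment needs the mask to match the leading dimensions of the indexed array exactly.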
skfda/representation/grid.py (outdated)
    return self._compute_aggregate(operation='sum', skipna=skipna,
                                   min_count=min_count)
For multiline expressions, our style guide is to start each parameter on its own line, and to put the closing delimiter on its own line as well (at the same indentation level as the line in which it is opened):
Current:
    return self._compute_aggregate(operation='sum', skipna=skipna,
                                   min_count=min_count)
Suggested change:
    return self._compute_aggregate(
        operation='sum',
        skipna=skipna,
        min_count=min_count,
    )
Please do the same in the other places you edited.
    if skipna:
        count_values = np.sum(~np.isnan(common_values), axis=0)
    else:
        count_values = np.full(sum_values.shape, self.n_samples)
Isn't this just self.n_samples?
To operate with sum_values it needs to be an array, so that it fits seamlessly into the same flow as the case where skipna is specified.
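A small sketch of the author's point, assuming common_values holds the values of all samples at the common points, with samples along axis 0 (the names mirror the excerpt; the function mean_with_counts itself is hypothetical). Building count_values with np.full in the skipna=False branch gives it the same shape as in the skipna=True branch, so the division and any later elementwise comparison such as count_values < min_count behave identically in both cases.

```python
import numpy as np


def mean_with_counts(common_values: np.ndarray, skipna: bool):
    """Return (mean, count_values) over samples for each common point."""
    n_samples = common_values.shape[0]
    if skipna:
        sum_values = np.nansum(common_values, axis=0)
        # Per-point number of non-NaN samples.
        count_values = np.sum(~np.isnan(common_values), axis=0)
    else:
        sum_values = np.sum(common_values, axis=0)
        # Same value everywhere, but with the same shape as the skipna branch.
        count_values = np.full(sum_values.shape, n_samples)
    return sum_values / count_values, count_values
```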
    out: None = None,
    keepdims: bool = False,
    skipna: bool = False,
    min_count: int = 0,
It seems to me that min_count is not being used here. Why is that?
It is kept for compatibility with the mean functions of FDataIrregular and FDataGrid, but it does not make sense to use it here: there are no individual measurements for each observation, only the observations approximated by functions.
@@ -882,6 +882,7 @@ def mean(
    out: None = None,
    keepdims: bool = False,
    skipna: bool = False,
    min_count: int = 0,
Why is min_count removed?
skfda/representation/grid.py (outdated)
    data = agg_func(self.data_matrix, axis=0, keepdims=True)

    if min_count > 0:
This should only be done if skipna == True.
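A sketch of the reviewer's suggestion for the grid case, assuming data_matrix has samples along axis 0 and holds floats. With skipna=False, np.sum already propagates any NaN, so the min_count masking is only applied after np.nansum. The function grid_sum is hypothetical; computing n_valid with keepdims=True keeps its shape equal to that of data, so the boolean assignment lines up.

```python
import numpy as np


def grid_sum(
    data_matrix: np.ndarray,
    skipna: bool = False,
    min_count: int = 0,
) -> np.ndarray:
    agg_func = np.nansum if skipna else np.sum
    data = agg_func(data_matrix, axis=0, keepdims=True)

    # min_count is only meaningful when NaNs have been skipped.
    if skipna and min_count > 0:
        n_valid = np.sum(~np.isnan(data_matrix), axis=0, keepdims=True)
        data[n_valid < min_count] = np.nan

    return data
```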
skfda/representation/irregular.py (outdated)
    else:
        count_values = np.full(sum_values.shape, self.n_samples)

    if min_count > 0:
This should only be done if skipna == True.
Update mean and sum functions for FData, FDataGrid, FDataIrregular and FDataBasis to correctly handle NaN values in coefficients.
Fixes #642
Describe the proposed changes
Edit the mean function of FData so that it only performs the parameter checks, leaving those checks as they are.
Add an auxiliary function to FDataGrid that works for mean, sum and var and simply calls np.sum/np.nansum, np.mean/np.nanmean or np.var/np.nanvar depending on the skipna parameter, and make the mean and sum functions use this auxiliary function.
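A minimal sketch of such a dispatcher, assuming data_matrix has samples along axis 0. The name compute_aggregate and the operation strings echo the excerpts above, but this is a standalone illustration, not the PR's actual _compute_aggregate; extra arguments such as a degrees-of-freedom correction for var are left out.

```python
import numpy as np

# Each operation maps to its (NaN-propagating, NaN-skipping) NumPy function.
_AGGREGATES = {
    "sum": (np.sum, np.nansum),
    "mean": (np.mean, np.nanmean),
    "var": (np.var, np.nanvar),
}


def compute_aggregate(
    data_matrix: np.ndarray,
    operation: str,
    skipna: bool = False,
) -> np.ndarray:
    """Aggregate over samples (axis 0), keeping a sample axis of size 1."""
    try:
        plain_func, nan_func = _AGGREGATES[operation]
    except KeyError:
        raise ValueError(f"Unknown operation: {operation!r}") from None
    agg_func = nan_func if skipna else plain_func
    return agg_func(data_matrix, axis=0, keepdims=True)
```

Keeping the function pairs in one table means adding another aggregate only requires a new entry, and the skipna choice is made in a single place.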
Add a mean function to FDataBasis that computes the mean of the coefficients using only the samples that have no NaN values in their coefficients; samples with NaN coefficients are not considered in the calculation.
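A sketch of that behaviour on a plain coefficient matrix, assuming one row per sample and one column per basis function. The function basis_coefficient_mean is hypothetical, and gating the exclusion on skipna is an assumption; in FDataBasis the result would be wrapped in a new object with the same basis. Validity is decided per sample rather than per evaluation point, which matches the author's remark that a per-point min_count has no natural meaning here.

```python
import numpy as np


def basis_coefficient_mean(
    coefficients: np.ndarray,
    skipna: bool = False,
) -> np.ndarray:
    """Mean of the coefficient rows, returned with shape (1, n_basis)."""
    if skipna:
        # Keep only the samples with no NaN among their coefficients.
        valid_rows = ~np.isnan(coefficients).any(axis=1)
        coefficients = coefficients[valid_rows]
    return np.mean(coefficients, axis=0, keepdims=True)
```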
Add a mean function to FDataIrregular that calculates the mean based on the min_count parameter and on whether skipna is set.