-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor monthly quickflow function #1109
Conversation
def qfi_sum_op(*qf_values): | ||
"""Sum the monthly qfis.""" | ||
qf_sum = numpy.zeros(qf_values[0].shape) | ||
valid_mask = ~utils.array_equals_nodata(qf_values[0], qf_nodata) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were assuming that all 12 monthly quickflow rasters would have nodata in the same areas, but that's not necessarily true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would that not be true? If a pixel does have no quickflow in a month, I'd expect it to be 0. If its outside of the model domain it would be always nan?
exp_result[nonzero_e1_mask] = numpy.exp( | ||
(0.8 * valid_si[nonzero_e1_mask]) / a_im[nonzero_e1_mask] + | ||
numpy.log(E1[nonzero_e1_mask])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old implementation took advantage of exponent math to avoid overflow: By rewriting exp(0.8 * s_i / a_im) * E1(s_i / a_im)
as the equivalent exp(0.8 * s_i / a_im + log(E1(s_i / a_im)))
, the result of exp
wouldn't get so large. But then, there's another edge case where E1 = 0 and log(E1) = infinity. That was handled with the nonzero_e1_mask
above.
I replaced that with handling the overflow warning, below. It's closer to the original equation, and IMO it's clearer what's going on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @emlys, thanks for looking into this and being thorough. I like your approach in breaking out the components and the documentation that accompanies it. I think the handling of the OverflowError is works well, but I do wonder if there's a way to constrain the problem further up somehow. I'll be interested to hear what Rafa says.
Oh, I didn't write this as a comment, but is there a reason you chose to "ignore" the error vs surround it in a try / except block?
Thanks!
HISTORY.rst
Outdated
@@ -45,6 +45,9 @@ Unreleased Changes | |||
now reprojected to the ``lulc_cur_path`` raster. This fixes a bug where | |||
rasters with a different SRS would appear to not intersect the | |||
``lulc_cur_path`` even if they did. (https://github.com/natcap/invest/issues/1093) | |||
* Seasonal Water Yield | |||
* Fixed a bug where monthy quickflow nodata pixels were not being passed | |||
on to the total quickflow raster (`#1105 <https://github.com/natcap/invest/issues/1105>`_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is user facing, maybe an additional note of: This could result in negative values on the edges
# Solution: Catch the overflow warning, and set the term to 0 | ||
# anywhere that overflow happens. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any numerical constraints that make sense for s_i
, a_im
, or s_i / a_im
that would allow us to mask this out proactively? I'd be curious from the science side of what these terms represent when these boundaries are hit.
We're setting to 0 because if the exp
has overflowed, then we know the E1
term is 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Values of s_i / a_im
that make exp
overflow aren't common, but are totally possible with reasonable input data, for example:
CN = 30
P = 8 (millimeters of rain per month)
n = 10 (number of rain events per month)
s_i = 1000/CN - 10 = 23.33
a_im = P/(n * 25.4) = 8/(10 * 25.4) = 0.031
s_i / a_im = 23.33 / 0.031 = 752
exp(752) = 3.8 * 10^326, which overflows float64.
as far as numerical constraints, exp(x)
overflows when x
is about 709.78. I'm not sure if this might vary on different systems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're setting to 0 because if the exp has overflowed, then we know the E1 term is 0?
yes, the E1 will be extremely close to 0 and "underflow" to 0. for example:
>>> scipy.special.exp1(200)
6.885226106307636e-90
>>> scipy.special.exp1(400)
4.776013586420972e-177
>>> scipy.special.exp1(700)
1.406518766234033e-307
>>> scipy.special.exp1(800)
0.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emlys thanks for making this example. It makes numerically sense, but let's look at it from a process perspective. basically we have no precip (0.8 mm per rain is basically no rain) meeting a very permeable soil (very low curve number). As I see from the above, if we just use the threshold value of exp(709) then QF would be 0, right? Probably we can check that somehow bottom up? Calculate for each pixel what values of x would result in an overflow and set quickflow to 0 there? There might e a more elegant way. Let's think about it after thanksgiving.
|
||
# case 1: there is no precipitation where both p_im and n_m are | ||
# defined and equal to zero. | ||
case_1_mask = ~precip_mask |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be mistaken or misinterpreting but case_1_mask
here could include nodata pixels, which would not be "defined".
precip_mask ~precip_mask
X V 0 0 1 0 1 0 1 X = NoData
V 0 0 1 0 0 0 1 1 V = Value > 0
V V X 1 1 0 0 0 1 0 = Value == 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because precip_mask is defined above as:
# precip mask: both p_im and n_m are defined and greater than 0
precip_mask = valid_p_mask & valid_n_mask & (p_im > 0) & (n_m > 0)
it could include pixels that have nodata in the s_i
or stream
arrays, but not in the precip or n_events arrays.
That's because you don't need to know s_i
or stream
to calculate QF = 0 when P = 0
. I think you could argue that it should be nodata anyway, but this is consistent with the past implementation.
term_result[subcase_a_mask] = 0 | ||
|
||
# edge case 3b: set the whole term to 0 when exp() overflows | ||
subcase_b_mask = numpy.isinf(exp_result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, so ignoring the OverflowError
will leave exp_result
with an infinity value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes:
>>> numpy.exp(800)
<stdin>:1: RuntimeWarning: overflow encountered in exp
inf
To your question above
is there a reason you chose to "ignore" the error vs surround it in a try / except block?
Because it's a warning not an error, and a RuntimeWarning could be caused by other things whereas the numpy context manager lets you specifically ignore overflows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting! I was testing math.exp(800)
and getting:
>>> math.exp(900)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OverflowError: math range error
That makes sense.
case_3_mask = valid_mask & precip_mask & ~stream_mask | ||
|
||
term_result = numpy.full( | ||
qf_im[case_3_mask].shape, TARGET_NODATA, dtype=numpy.float32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is qf_im[case_3_mask].shape
== qf_im.shape
? Is there a reason to mask here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Masking gets you the shape of where the mask is true, i.e. a 1-d array whose length is sum(mask)
.
>>> a = numpy.array([True, False, True, False])
>>> b = numpy.array([1, 1, 1, 1])
>>> b.shape
(4,)
>>> b[a].shape
(2,)
Co-authored-by: Doug <[email protected]>
@dcdenu4 I've updated this with the latest from our conversation with @lmandle and @schmittrjp last week! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @emlys,
I had a few comments / suggestions.
I was also wondering if we talked about updating the Users Guide at all to better reflect what the model was doing under some of these instances? I can't remember if during our discussion with Rafa or Lisa that came up.
valid_mask = ( | ||
valid_p_mask & | ||
valid_n_mask & | ||
~utils.array_equals_nodata(stream, stream_nodata) & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this valid_mask
rework is missing the stream_mask
. This might work itself out below, but just noting it as I work through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the stream mask is separate
) | ||
|
||
# case 3d: set any negative values to 0 | ||
qf_im[qf_im < 0] = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the nodata value we're using here positive or is it -1
? I'm seeing TARGET_NODATA
set as -1, so maybe we should have a nodata mask here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching that!
expected_quickflow_array, atol=1e-5) | ||
|
||
def test_monthly_quickflow_large_si_aim_ratio(self): | ||
"""Test `_calculate_monthly_quick_flow` with undefined nodata values""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update docstring here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @emlys, let's roll this in!
Description
Fixes #1318
This PR refactors the monthly quickflow function with the goal of making it clear how the different cases and edge cases are handled. I reorganized the function around 3 main cases, one of which has a couple edge cases. I tried to make the code line up as closely as possible with the quickflow equation in the user's guide. I hope that the big comment block clarifies what's going on.
The functional changes from #1318 are:
And a few trivial changes:
Checklist