Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor monthly quickflow function #1109

Merged
merged 19 commits into from
Jul 31, 2023
Merged

Refactor monthly quickflow function #1109

merged 19 commits into from
Jul 31, 2023

Conversation

emlys
Copy link
Member

@emlys emlys commented Nov 3, 2022

Description

Fixes #1318
This PR refactors the monthly quickflow function with the goal of making it clear how the different cases and edge cases are handled. I reorganized the function around 3 main cases, one of which has a couple edge cases. I tried to make the code line up as closely as possible with the quickflow equation in the user's guide. I hope that the big comment block clarifies what's going on.

The functional changes from #1318 are:

  • When the s_i / a_im ratio is greater than 100 (an arbitrary value I chose as per discussion with Lisa and Rafa), set QF to 0
  • When QF is negative, set it to 0

And a few trivial changes:

  • remove trailing zeros left over from python 2 era
  • use the same TARGET_NODATA variable for all output nodatas
  • remove 2 unused parameters of the monthly quickflow function

Checklist

  • Updated HISTORY.rst (if these changes are user-facing)
  • Updated the user's guide (if needed)
  • Tested the affected models' UIs (if relevant)

def qfi_sum_op(*qf_values):
"""Sum the monthly qfis."""
qf_sum = numpy.zeros(qf_values[0].shape)
valid_mask = ~utils.array_equals_nodata(qf_values[0], qf_nodata)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were assuming that all 12 monthly quickflow rasters would have nodata in the same areas, but that's not necessarily true.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would that not be true? If a pixel does have no quickflow in a month, I'd expect it to be 0. If its outside of the model domain it would be always nan?

Comment on lines -980 to -1124
exp_result[nonzero_e1_mask] = numpy.exp(
(0.8 * valid_si[nonzero_e1_mask]) / a_im[nonzero_e1_mask] +
numpy.log(E1[nonzero_e1_mask]))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old implementation took advantage of exponent math to avoid overflow: By rewriting exp(0.8 * s_i / a_im) * E1(s_i / a_im) as the equivalent exp(0.8 * s_i / a_im + log(E1(s_i / a_im))), the result of exp wouldn't get so large. But then, there's another edge case where E1 = 0 and log(E1) = infinity. That was handled with the nonzero_e1_mask above.

I replaced that with handling the overflow warning, below. It's closer to the original equation, and IMO it's clearer what's going on.

@emlys emlys requested review from dcdenu4 and schmittrjp November 16, 2022 00:02
@emlys emlys self-assigned this Nov 16, 2022
@emlys emlys changed the title SWY quick flow function Fix quickflow masking bug and refactor quickflow function Nov 16, 2022
@emlys emlys marked this pull request as ready for review November 16, 2022 00:09
Copy link
Member

@dcdenu4 dcdenu4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @emlys, thanks for looking into this and being thorough. I like your approach in breaking out the components and the documentation that accompanies it. I think the handling of the OverflowError is works well, but I do wonder if there's a way to constrain the problem further up somehow. I'll be interested to hear what Rafa says.

Oh, I didn't write this as a comment, but is there a reason you chose to "ignore" the error vs surround it in a try / except block?

Thanks!

HISTORY.rst Outdated
@@ -45,6 +45,9 @@ Unreleased Changes
now reprojected to the ``lulc_cur_path`` raster. This fixes a bug where
rasters with a different SRS would appear to not intersect the
``lulc_cur_path`` even if they did. (https://github.com/natcap/invest/issues/1093)
* Seasonal Water Yield
* Fixed a bug where monthy quickflow nodata pixels were not being passed
on to the total quickflow raster (`#1105 <https://github.com/natcap/invest/issues/1105>`_)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is user facing, maybe an additional note of: This could result in negative values on the edges

Comment on lines 979 to 980
# Solution: Catch the overflow warning, and set the term to 0
# anywhere that overflow happens.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any numerical constraints that make sense for s_i, a_im, or s_i / a_im that would allow us to mask this out proactively? I'd be curious from the science side of what these terms represent when these boundaries are hit.

We're setting to 0 because if the exp has overflowed, then we know the E1 term is 0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Values of s_i / a_im that make exp overflow aren't common, but are totally possible with reasonable input data, for example:

CN = 30
P = 8 (millimeters of rain per month)
n = 10 (number of rain events per month)

s_i = 1000/CN - 10 = 23.33
a_im = P/(n * 25.4) = 8/(10 * 25.4) = 0.031

s_i / a_im = 23.33 / 0.031 = 752

exp(752) = 3.8 * 10^326, which overflows float64.

as far as numerical constraints, exp(x) overflows when x is about 709.78. I'm not sure if this might vary on different systems.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're setting to 0 because if the exp has overflowed, then we know the E1 term is 0?

yes, the E1 will be extremely close to 0 and "underflow" to 0. for example:

>>> scipy.special.exp1(200)
6.885226106307636e-90
>>> scipy.special.exp1(400)
4.776013586420972e-177
>>> scipy.special.exp1(700)
1.406518766234033e-307
>>> scipy.special.exp1(800)
0.0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emlys thanks for making this example. It makes numerically sense, but let's look at it from a process perspective. basically we have no precip (0.8 mm per rain is basically no rain) meeting a very permeable soil (very low curve number). As I see from the above, if we just use the threshold value of exp(709) then QF would be 0, right? Probably we can check that somehow bottom up? Calculate for each pixel what values of x would result in an overflow and set quickflow to 0 there? There might e a more elegant way. Let's think about it after thanksgiving.


# case 1: there is no precipitation where both p_im and n_m are
# defined and equal to zero.
case_1_mask = ~precip_mask
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be mistaken or misinterpreting but case_1_mask here could include nodata pixels, which would not be "defined".

            precip_mask    ~precip_mask
X V 0         0 1 0               1 0 1                 X = NoData
V 0 0         1 0 0               0 1 1                 V = Value > 0
V V X         1 1 0               0 0 1                 0 = Value == 0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because precip_mask is defined above as:

# precip mask: both p_im and n_m are defined and greater than 0
precip_mask = valid_p_mask & valid_n_mask & (p_im > 0) & (n_m > 0)

it could include pixels that have nodata in the s_i or stream arrays, but not in the precip or n_events arrays.

That's because you don't need to know s_i or stream to calculate QF = 0 when P = 0. I think you could argue that it should be nodata anyway, but this is consistent with the past implementation.

term_result[subcase_a_mask] = 0

# edge case 3b: set the whole term to 0 when exp() overflows
subcase_b_mask = numpy.isinf(exp_result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, so ignoring the OverflowError will leave exp_result with an infinity value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes:

>>> numpy.exp(800)
<stdin>:1: RuntimeWarning: overflow encountered in exp
inf

To your question above

is there a reason you chose to "ignore" the error vs surround it in a try / except block?

Because it's a warning not an error, and a RuntimeWarning could be caused by other things whereas the numpy context manager lets you specifically ignore overflows

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting! I was testing math.exp(800) and getting:

>>> math.exp(900)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: math range error

That makes sense.

case_3_mask = valid_mask & precip_mask & ~stream_mask

term_result = numpy.full(
qf_im[case_3_mask].shape, TARGET_NODATA, dtype=numpy.float32)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is qf_im[case_3_mask].shape == qf_im.shape? Is there a reason to mask here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Masking gets you the shape of where the mask is true, i.e. a 1-d array whose length is sum(mask).

>>> a = numpy.array([True, False, True, False])
>>> b = numpy.array([1, 1, 1, 1])
>>> b.shape
(4,)
>>> b[a].shape
(2,)

@dcdenu4 dcdenu4 added the on hold There's a reason we're not working on this yet label May 8, 2023
@emlys emlys changed the title Fix quickflow masking bug and refactor quickflow function Refactor quickflow function Jun 2, 2023
@emlys emlys changed the base branch from main to feature/output-spec June 2, 2023 17:59
@emlys emlys changed the base branch from feature/output-spec to main June 2, 2023 17:59
@emlys emlys removed the on hold There's a reason we're not working on this yet label Jun 21, 2023
@emlys
Copy link
Member Author

emlys commented Jun 23, 2023

@dcdenu4 I've updated this with the latest from our conversation with @lmandle and @schmittrjp last week!

@emlys emlys requested a review from dcdenu4 June 23, 2023 00:08
@emlys emlys changed the title Refactor quickflow function Refactor monthly quickflow function Jun 23, 2023
Copy link
Member

@dcdenu4 dcdenu4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @emlys,

I had a few comments / suggestions.

I was also wondering if we talked about updating the Users Guide at all to better reflect what the model was doing under some of these instances? I can't remember if during our discussion with Rafa or Lisa that came up.

valid_mask = (
valid_p_mask &
valid_n_mask &
~utils.array_equals_nodata(stream, stream_nodata) &
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this valid_mask rework is missing the stream_mask. This might work itself out below, but just noting it as I work through.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the stream mask is separate

)

# case 3d: set any negative values to 0
qf_im[qf_im < 0] = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the nodata value we're using here positive or is it -1? I'm seeing TARGET_NODATA set as -1, so maybe we should have a nodata mask here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching that!

expected_quickflow_array, atol=1e-5)

def test_monthly_quickflow_large_si_aim_ratio(self):
"""Test `_calculate_monthly_quick_flow` with undefined nodata values"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update docstring here?

@emlys emlys requested a review from dcdenu4 July 19, 2023 18:54
Copy link
Member

@dcdenu4 dcdenu4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @emlys, let's roll this in!

@dcdenu4 dcdenu4 merged commit 8045806 into natcap:main Jul 31, 2023
@emlys emlys deleted the bugfix/1105 branch October 3, 2024 23:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SWY: Monthly quickflow implementation is unclear and unmaintainable
3 participants