Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basin and timestep information for error messages and warnings #723

Open
SnowHydrology opened this issue Feb 7, 2024 · 11 comments
Open
Labels
enhancement New feature or request

Comments

@SnowHydrology
Copy link
Contributor

The hydrologic models running in NextGen may fail for many reasons. Some of them have error messages that print to the console, which is helpful for diagnosing model-specific issues, but not for determining where and when the model failed.

Current behavior

Error messages and warning write to the console without location or time information.

Expected behavior

Concatenate basin and timestamp information to warning and error messages. Something like:

cat-<id>, timestep <model-timestep>: <error-message>

@SnowHydrology SnowHydrology added the enhancement New feature or request label Feb 7, 2024
@PhilMiller
Copy link
Contributor

I strongly agree having that information would improve the ability to isolate, diagnose, and fix issues within realization modules.

As a matter of implementation, prefixing the error message uniformly might be pretty hard without close cooperation with all the formulation implementers. For C++ and Python formulations, if they throw an exception, we can catch it and handle accordingly. For all languages, if they return BMI_FAILURE from the call to update() or update_until(), we can print the basin and timestep after whatever the module itself may have printed.

In theory, we could add verbose debug printing to emit basin, timestep, and module name before each call we make, but that would only be affordable for runs where the user knows they need it, because printing is expensive (relatively speaking).

@hellkite500
Copy link
Contributor

I've been looking at possible mechanisms for intercepting exit calls from the loaded library modules that may allow us to unwind the stack and provide additional context -- this would let us more gracefully shutdown when a loaded module calls exit or, in fortran stop or abort.

@PhilMiller
Copy link
Contributor

It would be pretty easy to at least set some variables indicating the relevant debugging variables, and add an atexit() handler to print them if exit() is called unexpectedly. I think that's a pretty good idea.

I would be really hesitant to intercept exit() or the like.

I did a slight bit of looking, and it may be workable to throw an exception in an atexit() handler, to skip directly exiting. That feels super sketchy to me, but ...

@SnowHydrology
Copy link
Contributor Author

@PhilMiller For the legacy models that we've updated and implemented with BMI (e.g. Noah-OWP-Modular), we have a lot of flexibility in modernizing code. If there's anything specific we should do to make the models more framework-friendly, please let us know.

@ajkhattak
Copy link
Contributor

@hellkite500 I was thinking of a simpler workflow that Keith and Phill somehow described -- verbosity at the ngen level with/without getting any information from the models.

For each timestep, before calling update or update_until, ngen should screen print metadata about that timestep such as: catchment_id, nexus_id, timestep, model, inputs_to_model, etc. Now when ngen calls a model to advance its timestep and the model crashes for some reason, we will already know the location (catchment, nexus) and the model that caused the crash. Once we know that, we can run the model in standalone mode for further debugging. Since printing is expensive, so this should be optional, controlled through some verbosity options in the realization file.

Other options could be based on BMI_FAILURE (as Phill mentioned) or each model should implement its own ValidTimeStep() function that could be used to validate if the model has taken a successful timestep or not. I think this will require a little bit of work, but doable...

@PhilMiller
Copy link
Contributor

I think the friendliest thing to do on the model side would be to make sure that the call returns BMI_FAILURE, possibly with a preceding print of the nature of the error.

For languages that can throw an exception, that's perfectly fine too, since those are cheap and easy to catch and can be handled the same way as a returned BMI_FAILURE.

I don't think we can do a ValidTimeStep() sort of thing, since that would be a function added to BMI, breaking all sorts of compatibility constraints.

@SnowHydrology
Copy link
Contributor Author

@PhilMiller, thanks for that. Below are my assumptions for implementing this in Noah-OWP-Modular for the error @yuqiong77 saw (linked here). Please correct anything that's wrong:

  • In this case, we would remove the STOP from the Noah-OWP-Modular code
  • Save the error message and make a flag
  • Add code to bmi::update that checks the error flag value and returns BMI_FAILURE and writes the message

Or something like that.

@ajkhattak
Copy link
Contributor

@SnowHydrology if I am understanding this correctly, won't removing STOP impact the functionality in standalone mode?

@PhilMiller sorry if I was not clear enough, the ValidTimeStep() will be part of the model and not the BMI, it will be called at the end of the update or update_until to set the flag BMI_FAILURE which will be returned to the ngen... does it make sense?

@SnowHydrology
Copy link
Contributor Author

@ajkhattak It shouldn't matter in the case of Noah-OM because we run the main program using BMI commands. So if we implement BMI_FAILURE correctly, it should exit the update routine.

But I haven't tested it, so the "shoulds" might not correspond to reality.

@PhilMiller
Copy link
Contributor

That all sounds good to me.

@SnowHydrology
Copy link
Contributor Author

@GreyEvenson-NOAA Can you open an issue on Noah-OWP-Modular to test this new implementation of error handling?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants