Use iris' regridder caching for faster regridding? #2341
Comments
I'd recommend using a profiler, e.g. py-spy, first to profile a run of a typical recipe using regrid and then see where most time is spent. If it turns out that it is indeed regridding, then we could consider this.
That's a very good suggestion. Here is a run with ~20 variables of 1 year of ICON data (regridding only, nothing else). No caching (current main): esmvaltool-no_caching.json The function that calculates the weights is For that specific use case (multiple variables from one model), this can save quite some time with minimal effort, so I think it's worth including.
Maybe you could also try whether it still saves time when running with #2316, as it looks like that's where we're headed. How much memory do the weights take up?
The weights are 2D arrays whose size scales with the number of source/target points. For example, for a 1000x1000 grid (which is much larger than commonly used for CMIP6 models), this sums up to ~8 MiB assuming float64. Of course, for very, very high-resolution grids you might end up with ~1 GiB, but I don't think that this matters on machines where you want to evaluate that kind of data. Also, the improvement shown here is only a lower bound for that use case, as it's data with a very low resolution. For higher resolutions, calculating the regridding weights will take even more time. For other use cases (like one variable, many models), this should not change anything regarding runtime. Why would you think that #2316 changes something here?
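As a rough check of the numbers above, here is a minimal sketch of the arithmetic. It assumes a dense array with one float64 value per grid point, which is the simplification used in the estimate; `weights_size_mib` is a hypothetical helper, not part of iris or ESMValCore, and real weight matrices may be stored differently.

```python
def weights_size_mib(n_lat, n_lon, itemsize=8):
    """Size in MiB of a dense n_lat x n_lon array.

    itemsize=8 corresponds to float64. This is only a back-of-the-envelope
    estimate; the actual storage layout of regridding weights may differ.
    """
    return n_lat * n_lon * itemsize / 2**20


# 1000x1000 grid with float64: a bit under 8 MiB
print(round(weights_size_mib(1000, 1000), 1))
```

A grid of about 16384x8192 points would reach the ~1 GiB mentioned above under the same assumption.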
Because there all preprocessor functions are run on dask workers (with limited memory), so the cache will likely not yet be available on the worker you're running on. Of course there are smarter ways to manage the cache that would also work in a dask cluster, but then you would need to copy it to the workers again.
Sorry, I can't get #2316 to run. With the same setup that I used for the other tests (4 workers with 7 GiB of memory each), the tool fails with a wild dask error (I guess because it runs out of memory), and after increasing the resources to 2 workers with 15 GiB, it has been running for an hour but is still at 0% (the other tests finished in ~3 min). It would also be nice to discuss #2316 at some point. This looks like a drastic change, and to me it's not 100% clear why "we're headed" that way and what the benefits of that are. On the other hand, I really think that we should implement this. Under the described circumstances, not reusing regridding weights is really bad practice (this is the most computationally intensive part). In all other cases, this won't change performance at all. I opened a PR about this (#2344) and it really doesn't require a lot of changes. We basically rely on the weights caching of iris (or other regridding libraries).
Interesting, could you share the recipe (and required data) so I can see if I can find out why it does not work with #2316? So far, I've tested it with examples/recipe_easy_ipcc.yml and gotten very good performance (was able to run the recipe in half an hour instead of the original 3-4 hours). Of course, we won't adopt that approach if it turns out that it does not work after all. |
The result of investigating the issue with #2316 was that the FileLock that prevents downloading the grid multiple times is blocking the computation (fairly easy to solve) and |
Thanks for investigating! I still think it's worth including this PR. What do you think?
surely having to do with HDF5's thread unsafety! Well, I reviewed and approved Manu's PR, which I think is a good idea: caching is always a nice alternative to on-the-fly computation, if it's not a bulky, mem-hogging cache, that is 🍺
Some regridding schemes of iris allow reusing the regridding weights (see here for an overview of them), which might significantly speed up our `regrid` preprocessor if multiple variables of the same dataset are regridded. This is described here and basically boils down to using the following pattern instead of

This should be fairly easy to implement with a `dict` cache that uses the source and target latitude and longitude dimensions as keys (we cannot use `lru_cache` since the `hash` of a coordinate is simply its `id`, therefore this would never use cached results for different cubes).

One drawback here is that we have an additional comparison of the coordinates for each `regrid` call, which makes it slightly slower if just a single variable for many data sets is analyzed.

@ESMValGroup/esmvaltool-coreteam is this something we want? I can open a draft PR so that the discussion is easier.