Triangulation speed #163
Impressive speedups! We should implement in-memory caching for the triangulation, but I'm not sure exactly where that would belong. emsarray can certainly cache it on the dataset (and probably already does), so maybe MoVE can keep track of this across files: when a new dataset is opened and a triangulation for it already exists, MoVE could bypass triangulating it again.
What needs caching, for how long, and at what level of persistence is extremely application dependent, so in this example MoVE is the correct place to implement a cache. For example, with cachetools:

```python
import xarray
from cachetools import LRUCache, cached
from emsarray.operations.cache import make_cache_key
from emsarray.operations.triangulate import triangulate_dataset


# Keep the ten most recently used triangulations in memory,
# keyed on the cache key emsarray derives from the dataset.
@cached(
    cache=LRUCache(maxsize=10),
    key=make_cache_key,
)
def triangulate_dataset_cached(dataset: xarray.Dataset):
    return triangulate_dataset(dataset)
```

Other approaches, e.g. using an external Redis cache or persisting data to disk, would be similar.
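As a rough illustration of the "persisting data to disk" variant, here is a minimal standard-library sketch. Everything in it is an assumption for illustration only: `disk_cached` and `fake_triangulate` are hypothetical names, the stand-in function returns made-up data rather than a real emsarray triangulation, and in an application the key string would presumably come from something like `make_cache_key(dataset)`.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def disk_cached(cache_dir: Path, key: str, compute: Callable[[], T]) -> T:
    """Return the pickled result stored under `key`, computing and saving it on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    path = cache_dir / f"{digest}.pickle"
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    # Write to a temporary file and rename it into place,
    # so a reader never sees a partially written cache entry.
    tmp = path.with_suffix(".tmp")
    with tmp.open("wb") as f:
        pickle.dump(result, f)
    tmp.replace(path)
    return result


calls = []


def fake_triangulate():
    # Hypothetical stand-in for an expensive triangulation call.
    calls.append(1)
    return ([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], [(0, 1, 2)])


with tempfile.TemporaryDirectory() as tmp_dir:
    first = disk_cached(Path(tmp_dir), "example-key", fake_triangulate)
    second = disk_cached(Path(tmp_dir), "example-key", fake_triangulate)
```

The second call reads the pickle back instead of calling the stand-in again. Eviction, invalidation when a dataset changes, and concurrent writers are left out of this sketch and would still be application decisions.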
Yep, I was thinking a Redis server (with disk backup) would be a nice solution for a global cache but, yes, all at the application level. Thanks for the cachetools tip.
Thanks for these improvements @mx-moth, the polygon batching looks good to me and will be a big help with large models.
Further to #151, this PR improves both polygon generation and triangulation speeds. The improvements compared to 0.7.0 are huge, with many datasets triangulated in under a second. For some conventions, constructing the dataset polygons became the slowest step rather than the triangulation itself, so that has been improved as well.
The following triangulation speeds were observed on my laptop for various datasets and versions of emsarray. I only did one run per version, so these numbers are not a thorough benchmark.
Draft PR for now. The new private triangulation function could do with a unit test or two, although the unit tests of the public API implicitly test the private functions anyway.
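Since the timings above come from a single run per version, a more thorough comparison could repeat each measurement and report the minimum and median. The following is a minimal sketch using only the standard library; `benchmark` and `stand_in_workload` are hypothetical names, and the workload is a placeholder that would be swapped for a call that triangulates a real dataset.

```python
import statistics
import timeit


def benchmark(func, repeat=5):
    """Time `func` once per run, `repeat` times, and summarise the runs."""
    times = timeit.repeat(func, number=1, repeat=repeat)
    return {
        "min": min(times),
        "median": statistics.median(times),
        "runs": len(times),
    }


def stand_in_workload():
    # Placeholder for the real work, e.g. triangulating a dataset.
    return sum(i * i for i in range(10_000))


summary = benchmark(stand_in_workload)
print(f"min {summary['min']:.6f}s, median {summary['median']:.6f}s over {summary['runs']} runs")
```

The minimum is usually the least noisy figure on a laptop, while the median gives a sense of typical behaviour. Any cache in front of the triangulation would need to be disabled or cleared between runs for the numbers to be comparable.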