Triangulation speed #163

Merged: mx-moth merged 4 commits into main from triangulation-speed on Nov 6, 2024

mx-moth commented Nov 4, 2024

Further to #151, this PR improves both polygon generation and triangulation speeds. Compared to 0.7.0 the improvements are large, with many datasets now triangulated in under a second. For some conventions the slowest step had become constructing the dataset polygons rather than the triangulation itself, so that has been improved as well.

The following triangulation speeds were observed on my laptop for various datasets and versions of emsarray. I only did one run per version so these numbers are not a thorough benchmark.

| Dataset | Convention | Polygons | Triangles | 0.7.0 (s) | 0.8.0 (s) | #163 (s) |
| --- | --- | ---: | ---: | ---: | ---: | ---: |
| gbr4 | ShocSimple | 92565 | 185130 | 18.3000 | 7.4340 | 0.5143 |
| setas | UGrid | 10072 | 38496 | 5.9360 | 1.1820 | 0.3006 |
| access_vt | CFGrid1D | 857960 | 1715920 | 218.0150 | 84.4070 | 4.1140 |
| macq_extract | ShocStandard | 8083 | 16166 | 1.8460 | 1.0690 | 0.0813 |
| austen | UGrid | 183810 | 727701 | 126.7560 | 27.7250 | 1.8460 |
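
For a rough sense of scale, the speedup factors can be worked out directly from the timings above. This is only a sketch: the numbers are the single-run figures from the table, and the `timings` dictionary and `speedup` helper are names introduced here for illustration, not part of emsarray.

```python
# Timings in seconds copied from the table above: (0.7.0, 0.8.0, this PR).
# These are single runs, so the derived factors are indicative only.
timings = {
    "gbr4": (18.3000, 7.4340, 0.5143),
    "setas": (5.9360, 1.1820, 0.3006),
    "access_vt": (218.0150, 84.4070, 4.1140),
    "macq_extract": (1.8460, 1.0690, 0.0813),
    "austen": (126.7560, 27.7250, 1.8460),
}


def speedup(old_seconds, new_seconds):
    """Return how many times faster the new timing is than the old one."""
    return old_seconds / new_seconds


# Speedup of this PR relative to 0.7.0 and to 0.8.0 for each dataset.
factors = {
    name: (speedup(v070, pr), speedup(v080, pr))
    for name, (v070, v080, pr) in timings.items()
}

for name, (vs_070, vs_080) in factors.items():
    print(f"{name:13s} {vs_070:5.1f}x vs 0.7.0   {vs_080:5.1f}x vs 0.8.0")
```

On these figures the speedup relative to 0.7.0 ranges from roughly 20x (setas) to nearly 70x (austen).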

Draft PR for now. The new private triangulation function could do with a unit test or two, although the unit tests of the public API implicitly test the private functions anyway.

mx-moth self-assigned this Nov 4, 2024
frizwi commented Nov 4, 2024

Impressive speedups! We should implement in-memory caching for the triangulation but not sure exactly where that would belong. emsarray can certainly cache it (probably already does) on the dataset so maybe MoVE can keep track of this for any new files that are accessed and bypass triangulation for any new datasets that are opened and one already exists.

mx-moth commented Nov 4, 2024

> Impressive speedups! We should implement in-memory caching for the triangulation but not sure exactly where that would belong. emsarray can certainly cache it (probably already does) on the dataset so maybe MoVE can keep track of this for any new files that are accessed and bypass triangulation for any new datasets that are opened and one already exists.

What needs caching, for how long, and at what level of persistence is extremely application-dependent, so in this case MoVE is the correct place to implement a cache. The emsarray.operations.cache module is intended to help application developers put together a cache that suits their needs. For example, an in-memory LRU cache with 10 entries can be implemented using the cachetools package:

```python
import xarray
from cachetools import LRUCache, cached

from emsarray.operations.cache import make_cache_key
from emsarray.operations.triangulate import triangulate_dataset


# Keep the ten most recently used triangulations in memory.
# make_cache_key derives the cache key from the dataset.
@cached(
    cache=LRUCache(maxsize=10),
    key=make_cache_key,
)
def triangulate_dataset_cached(dataset: xarray.Dataset):
    return triangulate_dataset(dataset)
```

Other approaches, e.g. using an external Redis cache or persisting data to disk, would be similar.
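
As a rough illustration of the disk-persisted variant, something along these lines could work. This is a minimal sketch rather than anything emsarray provides: `disk_cached` is a hypothetical decorator, and `fake_cache_key` and `slow_triangulate` are stand-ins for `make_cache_key` and `triangulate_dataset` so that the example runs with only the standard library. It also assumes the cache directory is trusted, since loading pickle files from an untrusted source is unsafe.

```python
import hashlib
import pickle
import tempfile
from functools import wraps
from pathlib import Path


def disk_cached(cache_dir, key):
    """Hypothetical decorator: persist results of a one-argument function as pickle files."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    def decorator(func):
        @wraps(func)
        def wrapper(arg):
            path = cache_dir / f"{key(arg)}.pickle"
            if path.exists():
                # Cache hit: load the stored result instead of recomputing.
                with path.open("rb") as f:
                    return pickle.load(f)
            result = func(arg)
            # Write to a temporary file and rename it into place, so an
            # interrupted write never leaves a truncated cache entry behind.
            tmp_path = path.with_suffix(".tmp")
            with tmp_path.open("wb") as f:
                pickle.dump(result, f)
            tmp_path.replace(path)
            return result

        return wrapper

    return decorator


def fake_cache_key(grid):
    """Stand-in for make_cache_key: hash the textual form of the input."""
    return hashlib.sha1(repr(grid).encode("utf-8")).hexdigest()


calls = []  # Records each time the expensive function actually runs.


@disk_cached(tempfile.mkdtemp(), key=fake_cache_key)
def slow_triangulate(grid):
    """Stand-in for triangulate_dataset: split a quadrilateral into two triangles."""
    calls.append(grid)
    return [(0, 1, 2), (0, 2, 3)]


square = ((0, 0), (1, 0), (1, 1), (0, 1))
first = slow_triangulate(square)   # Computed, then written to disk.
second = slow_triangulate(square)  # Read back from disk.
```

In an application, the real key function and triangulation function would presumably be swapped in for the stand-ins, and the cache directory would be a persistent location rather than a temporary one.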

frizwi commented Nov 4, 2024

Yep, I was thinking a Redis server (with disk backup) would be a nice solution for a global cache but, yes, all at the application level. Thanks for the cachetools tip.

dengwirda commented

Thanks for these improvements @mx-moth, the polygon batching looks good to me and will be a big help with large models.

mx-moth force-pushed the triangulation-speed branch 2 times, most recently from 48d045b to d6d6317, November 5, 2024 04:03
mx-moth force-pushed the triangulation-speed branch from d6d6317 to 7ce5ed2, November 5, 2024 06:19
mx-moth marked this pull request as ready for review November 5, 2024 07:17
mx-moth merged commit 0bbb470 into main Nov 6, 2024
15 checks passed
mx-moth deleted the triangulation-speed branch November 6, 2024 03:43
mx-moth added a commit that referenced this pull request Jan 23, 2025
mx-moth added a commit that referenced this pull request Jan 23, 2025