It's easy to have high memory usage without noticing, or to add regressions that increase memory usage. As a developer of a library, I would like to ensure my code doesn't use too much memory; making assertions about memory usage in tests is one way to do this.
The key measure is peak or high water mark memory usage. The memory usage pattern in scientific computing involves spiky large allocations, and the peak is the bottleneck that will drive hardware requirements.
Unlike performance benchmarking, memory measurements can be integrated into existing tests as an additional assertion; no special setup is required. So this is actually pretty easy technically; the main constraint is education and cultural norms around what counts as best practice.
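As a rough illustration of what such an assertion can look like, a test can record the tracemalloc high-water mark around the operation under test and check it stays below a budget. This is a minimal sketch rather than any library's actual test code: the workload function and the 100 MiB budget are made up, and tracemalloc only sees allocations made through Python's allocator or explicitly registered with it.

import tracemalloc

def test_peak_memory_stays_under_budget():
    # Hypothetical workload; replace with the library call you want to bound.
    def build_large_intermediate():
        return [list(range(1_000)) for _ in range(1_000)]

    tracemalloc.start()
    try:
        build_large_intermediate()
        # get_traced_memory() returns (current, peak); peak is the high-water mark in bytes.
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()

    # Assert the peak stays under a budget (an arbitrary 100 MiB for this sketch).
    assert peak < 100 * 1024 * 1024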
High memory usage is a lot harder to notice than slow code, because until you hit a certain threshold and start swapping it doesn't have visible symptoms. But it has a significant financial cost at scale. For an example of a regression, see e.g. pola-rs/polars#15098: someone introduced a bug and never noticed, because the main symptom was higher memory usage.
With no code changes needed, you can use pytest-memray. For libraries that use other memory allocators, options include wrapping the existing allocation APIs with tracemalloc registration APIs, e.g. https://github.com/pola-rs/polars/blob/main/py-polars/src/memory.rs. Given performance sensitivity, Polars only does this in the debug compilation profile, which is what is used to run unit tests during development.
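For the pytest-memray route, the plugin provides a limit_memory marker that fails a test when allocations during it exceed the given threshold, provided pytest is invoked with the --memray flag. The workload and the 24 MB limit below are placeholders, not values from any real test suite.

import pytest

# Run with: pytest --memray
@pytest.mark.limit_memory("24 MB")  # fail the test if it allocates more than 24 MB
def test_allocations_stay_under_limit():
    # Hypothetical workload; replace with the library call you want to bound.
    data = [b"x" * 1024 for _ in range(1_000)]
    assert len(data) == 1_000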