
Reduce allocations during encoding #11

Open · wants to merge 7 commits into master

Conversation

Spaceface16518
Contributor

I didn't know whether to expect #10 to be accepted, but since it was, here are some similar optimizations (with more detailed explanations).

Summary

Using the iterator extend pattern can drastically reduce the number of allocations (as well as the total memory allocated!) when generating the hash for an image in the encode function.

Status Quo

Previously, encode_base83_string allocated a new String every time it was called. This incurred at least three heap allocations per encode call, even though each of those strings is only between 1 and 4 characters long. Additionally, another heap allocation was incurred for every encoded AC component.

Optimizations

  1. Preallocating the hash string so that it never has to be resized. Visually counting up the number of characters to be placed in the string yields this equation for the hash string size: 1 + 1 + 4 + 2(factors). Specifically, 1 for the size flag, 1 for quantized maximum value (or a zero), 4 for the dc component, and 2 times the number of factors (i.e. 2 * ac.len()).
  2. Extending the string with a char iterator rather than appending a String. Changing the return type of the internal function encode_base83_string to the opaque type impl Iterator<Item = char> means the characters of the hash can be generated lazily, with no intermediate heap-allocated String per call. The iterator is then consumed by String::extend, which takes the iterator's size hint into account; since the size hint is exact here (see Range::size_hint and Map::size_hint), extend always knows precisely how much to reserve. But because we already preallocated, there is never anything to reserve, so extend reduces to a capacity check (a single integer comparison) followed by a handful of push calls (usually just 1 or 2 anyway).
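The two optimizations can be sketched together like this. This is an illustrative simplification, not the PR's code: `encode_base83` stands in for the crate's `encode_base83_string`, and `CHARSET` is the standard base-83 alphabet.

```rust
// Standard base-83 alphabet used by blurhash.
const CHARSET: &[u8; 83] = b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz#$%*+,-.:;=?@[]^_{|}~";

/// Lazily yields the base-83 digits of `value` (most significant first)
/// as chars, with no intermediate String allocation.
fn encode_base83(value: usize, length: u32) -> impl Iterator<Item = char> {
    (1..=length).map(move |i| {
        let digit = (value / 83usize.pow(length - i)) % 83;
        CHARSET[digit] as char
    })
}

fn main() {
    let ac_len = 5; // hypothetical number of AC components
    // 1 (size flag) + 1 (quantized max) + 4 (DC) + 2 per AC component:
    let mut hash = String::with_capacity(1 + 1 + 4 + 2 * ac_len);
    hash.extend(encode_base83(12, 1)); // a single base-83 digit, as an example
    // ...further `extend` calls for the max value, DC, and AC components...
    assert_eq!(hash.capacity(), 16); // the preallocation was never resized
}
```

Because the capacity is computed up front, every `extend` lands in already-reserved space.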

Implications and Benefits

All in all, we reduce the number of allocations from O(n) to O(1) (where n is the number of ac components). This is beneficial because the WASM memory model punishes frequent, small allocations. Additionally, using less memory when using wee_alloc is beneficial because wee_alloc's allocation is an O(n) operation. Moreover, wee_alloc has a two-word overhead for each allocation, meaning we have a constant four words of overhead rather than an unbounded amount.
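The O(n) vs O(1) claim can be checked empirically with a counting allocator. This is a self-contained sketch for the demonstration only: the `count_allocations` harness and its two-character placeholder components are made up, not code from the PR.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

static ALLOCS: AtomicUsize = AtomicUsize::new(0);

// Wraps the system allocator and counts every allocation request.
struct Counting;

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::SeqCst);
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static A: Counting = Counting;

/// Returns (allocations with one String per component, allocations when
/// preallocating and extending), for `n` fake two-char AC components.
fn count_allocations(n: usize) -> (usize, usize) {
    let before = ALLOCS.load(Ordering::SeqCst);
    let mut hash = String::new();
    for _ in 0..n {
        hash += &String::from("AB"); // old pattern: a fresh String per component
    }
    let per_component = ALLOCS.load(Ordering::SeqCst) - before;

    let before = ALLOCS.load(Ordering::SeqCst);
    let mut hash = String::with_capacity(2 * n); // preallocate once
    for _ in 0..n {
        hash.extend("AB".chars()); // lazily extend, no intermediate String
    }
    let preallocated = ALLOCS.load(Ordering::SeqCst) - before;

    (per_component, preallocated)
}

fn main() {
    let (naive, preallocated) = count_allocations(100);
    println!("naive: {naive} allocations, preallocated: {preallocated}");
    assert!(naive > 100); // O(n): at least one allocation per component
    assert_eq!(preallocated, 1); // O(1): just the initial with_capacity
}
```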

Comment on lines -206 to +208
```diff
-(normalisation
+normalisation
     * f64::cos((PI * x as f64 * a) / width as f64)
-    * f64::cos((PI * y as f64 * b) / height as f64))
+    * f64::cos((PI * y as f64 * b) / height as f64)
```
Contributor Author

This just gets rid of a rustc warning (unused_parens, from the redundant parentheses).

Comment on lines -303 to +316
```diff
-fn encode_dc(value: [f64; 3]) -> usize {
-    let rounded_r = linear_to_srgb(value[0]);
-    let rounded_g = linear_to_srgb(value[1]);
-    let rounded_b = linear_to_srgb(value[2]);
-    ((rounded_r << 16) + (rounded_g << 8) + rounded_b) as usize
+fn encode_dc([r, g, b]: [f64; 3]) -> usize {
+    let rounded = |v| linear_to_srgb(v);
+    ((rounded(r) << 16) + (rounded(g) << 8) + rounded(b)) as usize
 }
 
-fn encode_ac(value: [f64; 3], maximum_value: f64) -> usize {
-    let quant_r = f64::floor(f64::max(
-        0f64,
-        f64::min(
-            18f64,
-            f64::floor(sign_pow(value[0] / maximum_value, 0.5) * 9f64 + 9.5),
-        ),
-    ));
-    let quant_g = f64::floor(f64::max(
-        0f64,
-        f64::min(
-            18f64,
-            f64::floor(sign_pow(value[1] / maximum_value, 0.5) * 9f64 + 9.5),
-        ),
-    ));
-    let quant_b = f64::floor(f64::max(
-        0f64,
-        f64::min(
-            18f64,
-            f64::floor(sign_pow(value[2] / maximum_value, 0.5) * 9f64 + 9.5),
-        ),
-    ));
-
-    (quant_r * 19f64 * 19f64 + quant_g * 19f64 + quant_b) as usize
+fn encode_ac([r, g, b]: [f64; 3], maximum_value: f64) -> usize {
+    let quant = |v| {
+        (sign_pow(v / maximum_value, 0.5) * 9. + 9.5)
+            .floor()
+            .min(18.)
+            .max(0.)
+    };
+
+    (quant(r) * 19f64 * 19f64 + quant(g) * 19f64 + quant(b)) as usize
 }
```
Contributor Author

Using a closure here drastically reduces the code size while keeping the exact same behavior.

The closure adds no runtime overhead: since it never escapes the parent function, it is statically dispatched (no trait object or virtual call is involved) and can simply be inlined.

```diff
-        .map(|[a, b, c]| f64::max(f64::max(f64::abs(a), f64::abs(b)), f64::abs(c)))
-        .max_by(|a, b| a.partial_cmp(b).unwrap_or(Ordering::Equal))
+        .iter()
+        .map(|channels| channels.iter().copied().map(f64::abs).max_by(maxf).unwrap())
```
Contributor Author

Since we are iterating over a fixed-size array, this will not be any less efficient than the hand-rolled version; the loop will likely be unrolled or vectorized.
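The `maxf` comparator referenced in the hunk above is not shown in this excerpt; it presumably looks something like the following sketch. The `actual_maximum` wrapper and the sample factor data are made up here for illustration.

```rust
use std::cmp::Ordering;

// Assumed shape of the `maxf` helper: a total order for f64 that
// treats incomparable values (NaN) as equal.
fn maxf(a: &f64, b: &f64) -> Ordering {
    a.partial_cmp(b).unwrap_or(Ordering::Equal)
}

/// Largest absolute channel value across all factors, written in the
/// iterator style from the diff above.
fn actual_maximum(factors: &[[f64; 3]]) -> f64 {
    factors
        .iter()
        .map(|channels| channels.iter().copied().map(f64::abs).max_by(maxf).unwrap())
        .max_by(maxf)
        .unwrap()
}

fn main() {
    let factors = [[0.5, -2.0, 1.0], [0.25, 0.75, -0.5]];
    assert_eq!(actual_maximum(&factors), 2.0);
}
```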
