-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce allocations during encoding #11
base: master
Are you sure you want to change the base?
Conversation
Using associated methods by member access is more idiomatic.
(normalisation | ||
normalisation | ||
* f64::cos((PI * x as f64 * a) / width as f64) | ||
* f64::cos((PI * y as f64 * b) / height as f64)) | ||
* f64::cos((PI * y as f64 * b) / height as f64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just gets rid of a rustc warning
fn encode_dc(value: [f64; 3]) -> usize { | ||
let rounded_r = linear_to_srgb(value[0]); | ||
let rounded_g = linear_to_srgb(value[1]); | ||
let rounded_b = linear_to_srgb(value[2]); | ||
((rounded_r << 16) + (rounded_g << 8) + rounded_b) as usize | ||
fn encode_dc([r, g, b]: [f64; 3]) -> usize { | ||
let rounded = |v| linear_to_srgb(v); | ||
((rounded(r) << 16) + (rounded(g) << 8) + rounded(b)) as usize | ||
} | ||
|
||
fn encode_ac(value: [f64; 3], maximum_value: f64) -> usize { | ||
let quant_r = f64::floor(f64::max( | ||
0f64, | ||
f64::min( | ||
18f64, | ||
f64::floor(sign_pow(value[0] / maximum_value, 0.5) * 9f64 + 9.5), | ||
), | ||
)); | ||
let quant_g = f64::floor(f64::max( | ||
0f64, | ||
f64::min( | ||
18f64, | ||
f64::floor(sign_pow(value[1] / maximum_value, 0.5) * 9f64 + 9.5), | ||
), | ||
)); | ||
let quant_b = f64::floor(f64::max( | ||
0f64, | ||
f64::min( | ||
18f64, | ||
f64::floor(sign_pow(value[2] / maximum_value, 0.5) * 9f64 + 9.5), | ||
), | ||
)); | ||
|
||
(quant_r * 19f64 * 19f64 + quant_g * 19f64 + quant_b) as usize | ||
fn encode_ac([r, g, b]: [f64; 3], maximum_value: f64) -> usize { | ||
let quant = |v| { | ||
(sign_pow(v / maximum_value, 0.5) * 9. + 9.5) | ||
.floor() | ||
.min(18.) | ||
.max(0.) | ||
}; | ||
|
||
(quant(r) * 19f64 * 19f64 + quant(g) * 19f64 + quant(b)) as usize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a closure here drastically reduces the code size. It has the exact same behavior.
Using a closure will not have any overhead, since the virtual function does not escape the parent function.
.map(|[a, b, c]| f64::max(f64::max(f64::abs(a), f64::abs(b)), f64::abs(c))) | ||
.max_by(|a, b| a.partial_cmp(b).unwrap_or(Ordering::Equal)) | ||
.iter() | ||
.map(|channels| channels.iter().copied().map(f64::abs).max_by(maxf).unwrap()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are iterating over an array, this will not be any less efficient than the hand-rolled version because it will likely be unrolled or vectorized.
I didn't know whether to expect #10 to be accepted, but since it was, here are some similar optimizations (with more detailed explanations).
Summary
Using the iterator extend pattern can drastically reduce the number of allocations (as well as total memory allocated!) when generating the hash for image (in the
encode
function).Status Quo
Previously,
encode_base83_string
allocated a newString
every time it was called. This incurred a heap allocation at least 3 times, despite the string only being between 1 and 4 characters long. Additionally, another heap allocation would be incurred for every encoded ac component.Optimizations
hash
string so that it never has to be resized. Visually counting up the number of characters to be placed in the string yields this equation for the hash string size:1 + 1 + 4 + 2(factors)
. Specifically, 1 for the size flag, 1 for quantized maximum value (or a zero), 4 for the dc component, and 2 times the number of factors (i.e.2 * ac.len()
).char
iterator rather than appending aString
. Changing the return of the internal functionencode_base83_string
to the opaque typeimpl Iterator<Item=char>
means that the characters of the hash string can be lazily generated without intermediate heap allocations for eachString
. The iterator is then consumed by theString::extend
method, which takes into account the size hint of the iterator; since the size hint will be exact (checkRangeFull::size_hint
andMap::size_hint
), this means that the method will always know the exact size to reserve during the extension. However, since we already preallocated, we will never have the reserve anything. This means thatextend
will simply be a safety size check (simply an integer comparison) and then a bunch ofpush
calls (usually just 1 or 2 anyway).Implications and Benefits
All in all, we reduce the number of allocations from
O(n)
toO(1)
(wheren
is the number of ac components). This is beneficial because the WASM memory model punishes frequent, small allocations. Additionally, using less memory when usingwee_alloc
is beneficial becausewee_alloc
's allocation is anO(n)
operation. Moreover,wee_alloc
has a two-word overhead for each allocation, meaning we have a constant four words of overhead rather than an unbounded amount.