-
Notifications
You must be signed in to change notification settings - Fork 528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IntoParam<PCWSTR> for &str
unnecessarily allocates twice
#1712
Comments
I don't think you could know the length of the UTF-16 array until you iterate the result of |
You can't know, but you can have a good estimate. Most strings are ASCII, so when considering a string of N chars (UTF-8), you could allocate N wchars (UTF-16). My idea here is not to iterate twice, but to set a bound. And again, just one copy and one allocation would be made here as opposed to the two ones that exist right now.
Yes the collect call does reallocate too, which is a problem on its own. But GB is out of question here, the most MSVC allows is 64K strings. Additionally these strings are used ephemerally, they're just made to be compatible with the windows API and freed afterwards either way. |
The challenge is that you need to construct a |
Can't we build a |
Here is an example of how this could work: pub fn allocate_string(slice: &str) -> Result<*const u16> {
unsafe {
let len_with_nul = slice.len() + 1;
let ptr = HeapAlloc(
GetProcessHeap()?,
HEAP_ZERO_MEMORY,
len_with_nul * mem::size_of::<u16>(),
);
if ptr.is_null() {
return Err(E_OUTOFMEMORY.into());
}
{
let input = slice.as_bytes();
let output = slice::from_raw_parts_mut(ptr as *mut u16, len_with_nul);
let written = MultiByteToWideChar(
CP_UTF8,
MB_ERR_INVALID_CHARS | MB_PRECOMPOSED,
input,
output,
);
// TODO: MultiByteToWideChar error handling
output[written as usize] = 0;
}
Ok(ptr as *const u16)
}
}
pub fn free_string(ptr: *const u16) {
unsafe {
if let Ok(heap) = GetProcessHeap() {
HeapFree(heap, HEAP_ZERO_MEMORY, ptr as *const c_void);
}
}
} |
You can ask Not currently supported by the Luckily, you can also ask Rust to calculate the resulting UTF-16 string length, using its internal UTF-16 encoder implementation ( While I cannot see an implementation using either of the above to precalculate the required final buffer size that wouldn't add some complexity, saving a (transient) heap allocation is a pretty substantial justification for doing so (if added complexity is indeed a consequence, and I'm not just blind in seeing the solution). |
That's true of course! I meant it in the way of "you can't know unless you try" while In light of these strings primarily used ephemerally, I do believe it is acceptable to "over-allocate" to the upper bound and just do the UTF-8 to UTF-16 transcode process once and be done with it, no additional copies, encodings and allocations. Additionally, when allocating the memory from a zero initialized heap we get two advantages:
|
There's also the approach that https://github.com/microsoft/windows-rs/blob/master/crates/libs/windows/src/core/hstring.rs Ideally, we can just have a single approach. We could do something similar for |
That's a great idea! I'd heavily favour this one as it is most elegant and allows us to stay inside rust code. I've attempted to implement it: #1713 |
When converting from a
&str
to aPCWSTR
,encode_utf16().collect()
is first called to translate UTF-8 to UTF-16:windows-rs/crates/libs/windows/src/core/pcwstr.rs
Lines 46 to 50 in 5ea7a54
This allocates a new
Vec<u16>
with the UTF-16 characters. That in turn is passed to heap_string:windows-rs/crates/libs/windows/src/core/heap.rs
Lines 43 to 51 in 5ea7a54
This function allocates a new string with len + 1, copies the existing Vec into it and appends a null terminating character. This could be simplified by directly allocating a new UTF-16 Vec with space for all UTF-8 units + 1 and doing the UTF-8 to UTF-16 translation in place. This saves one copy and one allocation.
The text was updated successfully, but these errors were encountered: