Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IntoParam<PCWSTR> for &str unnecessarily allocates twice #1712

Closed
AronParker opened this issue Apr 22, 2022 · 9 comments · Fixed by #1713
Closed

IntoParam<PCWSTR> for &str unnecessarily allocates twice #1712

AronParker opened this issue Apr 22, 2022 · 9 comments · Fixed by #1713
Labels
enhancement New feature or request

Comments

@AronParker
Copy link
Contributor

AronParker commented Apr 22, 2022

When converting from a &str to a PCWSTR, encode_utf16().collect() is first called to translate UTF-8 to UTF-16:

impl<'a> IntoParam<'a, PCWSTR> for &str {
fn into_param(self) -> Param<'a, PCWSTR> {
Param::Boxed(PCWSTR(heap_string(&self.encode_utf16().collect::<alloc::vec::Vec<u16>>())))
}
}

This allocates a new Vec<u16> with the UTF-16 characters. That in turn is passed to heap_string:

pub fn heap_string<T: Copy + Default + Sized>(slice: &[T]) -> *const T {
unsafe {
let buffer = heap_alloc((slice.len() + 1) * std::mem::size_of::<T>()).expect("could not allocate string") as *mut T;
assert!(buffer.align_offset(std::mem::align_of::<T>()) == 0, "heap allocated buffer is not properly aligned");
buffer.copy_from_nonoverlapping(slice.as_ptr(), slice.len());
buffer.add(slice.len()).write(T::default());
buffer
}
}

This function allocates a new string with len + 1, copies the existing Vec into it and appends a null terminating character. This could be simplified by directly allocating a new UTF-16 Vec with space for all UTF-8 units + 1 and doing the UTF-8 to UTF-16 translation in place. This saves one copy and one allocation.

@ryancerium
Copy link
Contributor

I don't think you could know the length of the UTF-16 array until you iterate the result of encode_utf16(), which would then require iterating it again to do the actual copy. Converting and iterating twice is probably cheaper than two (or more) heap allocations and copies, but I'm not sure - I'm not an expert on UTF conversions. The collect() call might cause the vector to have to do multiple reallocations and copies to store increasing amounts of data, i.e. multiple KB or MB or GB (!) of string data.

@AronParker
Copy link
Contributor Author

I don't think you could know the length of the UTF-16 array until you iterate the result of encode_utf16(), which would then require iterating it again to do the actual copy. Converting and iterating twice is probably cheaper than two (or more) heap allocations and copies, but I'm not sure - I'm not an expert on UTF conversions.

You can't know, but you can have a good estimate. Most strings are ASCII, so when considering a string of N chars (UTF-8), you could allocate N wchars (UTF-16). My idea here is not to iterate twice, but to set a bound. And again, just one copy and one allocation would be made here as opposed to the two ones that exist right now.

The collect() call might cause the vector to have to do multiple reallocations and copies to store increasing amounts of data, i.e. multiple KB or MB or GB (!) of string data.

Yes the collect call does reallocate too, which is a problem on its own. But GB is out of question here, the most MSVC allows is 64K strings. Additionally these strings are used ephemerally, they're just made to be compatible with the windows API and freed afterwards either way.

@kennykerr
Copy link
Collaborator

The challenge is that you need to construct a Param::Boxed(PCWSTR(value)) to hold the temporary value, otherwise just using a Vec would be ideal.

@AronParker
Copy link
Contributor Author

The challenge is that you need to construct a Param::Boxed(PCWSTR(value)) to hold the temporary value, otherwise just using a Vec would be ideal.

Can't we build a Vec::with_capacity use Vec::extend(self.encode_utf16()), leak it and then free it via the Param::Boxed function? We know it's gonna be from the Global Allocator of rust, or does the Allocator matter too?

@AronParker
Copy link
Contributor Author

AronParker commented Apr 22, 2022

Here is an example of how this could work:

pub fn allocate_string(slice: &str) -> Result<*const u16> {
    unsafe {
        let len_with_nul = slice.len() + 1;
        let ptr = HeapAlloc(
            GetProcessHeap()?,
            HEAP_ZERO_MEMORY,
            len_with_nul * mem::size_of::<u16>(),
        );

        if ptr.is_null() {
            return Err(E_OUTOFMEMORY.into());
        }

        {
            let input = slice.as_bytes();
            let output = slice::from_raw_parts_mut(ptr as *mut u16, len_with_nul);
            let written = MultiByteToWideChar(
                CP_UTF8,
                MB_ERR_INVALID_CHARS | MB_PRECOMPOSED,
                input,
                output,
            );

            // TODO: MultiByteToWideChar error handling

            output[written as usize] = 0;
        }

        Ok(ptr as *const u16)
    }
}

pub fn free_string(ptr: *const u16) {
    unsafe {
        if let Ok(heap) = GetProcessHeap() {
            HeapFree(heap, HEAP_ZERO_MEMORY, ptr as *const c_void);
        }
    }
}

@tim-weis
Copy link
Contributor

tim-weis commented Apr 23, 2022

You can't know, but you can have a good estimate.

You can ask MultiByteToWideChar to calculate the required output buffer size, turning the estimate into an accurate quantity. This would require passing ptr::null_mut(), 0 as the final two arguments.

Not currently supported by the windows crate, that translates the lpWideCharStr/cchWideChar-pair into a &mut [u16] slice. I don't know whether this is caused by insufficient/incorrect metadata information or an oversight in the code generator (or a combination of both).

Luckily, you can also ask Rust to calculate the resulting UTF-16 string length, using its internal UTF-16 encoder implementation (encode_utf16().count()). This doesn't incur any heap allocations either.

While I cannot see an implementation using either of the above to precalculate the required final buffer size that wouldn't add some complexity, saving a (transient) heap allocation is a pretty substantial justification for doing so (if added complexity is indeed a consequence, and I'm not just blind in seeing the solution).

@AronParker
Copy link
Contributor Author

AronParker commented Apr 23, 2022

You can ask MultiByteToWideChar to calculate the required output buffer size, turning the estimate into an accurate quantity.

That's true of course! I meant it in the way of "you can't know unless you try" while MultiByteToWideChar certainly doesn't write anything when passing ptr:null_mut(), 0 as arguments, it still decodes UTF8 to UTF16 and returns the number of UTF-16 characters. So you'd effectively decode it twice, which arguably might still be better than allocation twice, but still incurs a performance cost.

In light of these strings primarily used ephemerally, I do believe it is acceptable to "over-allocate" to the upper bound and just do the UTF-8 to UTF-16 transcode process once and be done with it, no additional copies, encodings and allocations.

Additionally, when allocating the memory from a zero initialized heap we get two advantages:

  • We can use the safe MultiByteToWideChar function by making a slice to the newly allocated memory. This wouldn't normally be allowed, as the memory must point to properly initialized values, but since we know it is safely initialized to zero we can.
  • The string is implicitly null terminated if we ensure that we allocate the upper bound + 1

@kennykerr
Copy link
Collaborator

There's also the approach that HSTRING takes:

https://github.com/microsoft/windows-rs/blob/master/crates/libs/windows/src/core/hstring.rs

Ideally, we can just have a single approach. We could do something similar for PCWSTR to avoid the double allocation.

@AronParker
Copy link
Contributor Author

There's also the approach that HSTRING takes:

That's a great idea! I'd heavily favour this one as it is most elegant and allows us to stay inside rust code. I've attempted to implement it: #1713

@kennykerr kennykerr added the enhancement New feature or request label Apr 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants