`IntoParam<PCWSTR> for &str` unnecessarily allocates twice #1712

AronParker · 2022-04-22T10:30:30Z

When converting from a &str to a PCWSTR, encode_utf16().collect() is first called to translate UTF-8 to UTF-16:

windows-rs/crates/libs/windows/src/core/pcwstr.rs

Lines 46 to 50 in 5ea7a54

    
           impl<'a> IntoParam<'a, PCWSTR> for &str { 
        
               fn into_param(self) -> Param<'a, PCWSTR> { 
        
                   Param::Boxed(PCWSTR(heap_string(&self.encode_utf16().collect::<alloc::vec::Vec<u16>>()))) 
        
               } 
        
           }

This allocates a new Vec<u16> with the UTF-16 characters. That in turn is passed to heap_string:

windows-rs/crates/libs/windows/src/core/heap.rs

Lines 43 to 51 in 5ea7a54

    
           pub fn heap_string<T: Copy + Default + Sized>(slice: &[T]) -> *const T { 
        
               unsafe { 
        
                   let buffer = heap_alloc((slice.len() + 1) * std::mem::size_of::<T>()).expect("could not allocate string") as *mut T; 
        
                   assert!(buffer.align_offset(std::mem::align_of::<T>()) == 0, "heap allocated buffer is not properly aligned"); 
        
                   buffer.copy_from_nonoverlapping(slice.as_ptr(), slice.len()); 
        
                   buffer.add(slice.len()).write(T::default()); 
        
                   buffer 
        
               } 
        
           }

This function allocates a new string with len + 1, copies the existing Vec into it and appends a null terminating character. This could be simplified by directly allocating a new UTF-16 Vec with space for all UTF-8 units + 1 and doing the UTF-8 to UTF-16 translation in place. This saves one copy and one allocation.

The text was updated successfully, but these errors were encountered:

ryancerium · 2022-04-22T16:40:44Z

I don't think you could know the length of the UTF-16 array until you iterate the result of encode_utf16(), which would then require iterating it again to do the actual copy. Converting and iterating twice is probably cheaper than two (or more) heap allocations and copies, but I'm not sure - I'm not an expert on UTF conversions. The collect() call might cause the vector to have to do multiple reallocations and copies to store increasing amounts of data, i.e. multiple KB or MB or GB (!) of string data.

AronParker · 2022-04-22T17:34:17Z

I don't think you could know the length of the UTF-16 array until you iterate the result of encode_utf16(), which would then require iterating it again to do the actual copy. Converting and iterating twice is probably cheaper than two (or more) heap allocations and copies, but I'm not sure - I'm not an expert on UTF conversions.

You can't know, but you can have a good estimate. Most strings are ASCII, so when considering a string of N chars (UTF-8), you could allocate N wchars (UTF-16). My idea here is not to iterate twice, but to set a bound. And again, just one copy and one allocation would be made here as opposed to the two ones that exist right now.

The collect() call might cause the vector to have to do multiple reallocations and copies to store increasing amounts of data, i.e. multiple KB or MB or GB (!) of string data.

Yes the collect call does reallocate too, which is a problem on its own. But GB is out of question here, the most MSVC allows is 64K strings. Additionally these strings are used ephemerally, they're just made to be compatible with the windows API and freed afterwards either way.

kennykerr · 2022-04-22T17:38:19Z

The challenge is that you need to construct a Param::Boxed(PCWSTR(value)) to hold the temporary value, otherwise just using a Vec would be ideal.

AronParker · 2022-04-22T18:25:15Z

The challenge is that you need to construct a Param::Boxed(PCWSTR(value)) to hold the temporary value, otherwise just using a Vec would be ideal.

Can't we build a Vec::with_capacity use Vec::extend(self.encode_utf16()), leak it and then free it via the Param::Boxed function? We know it's gonna be from the Global Allocator of rust, or does the Allocator matter too?

AronParker · 2022-04-22T19:10:37Z

Here is an example of how this could work:

pub fn allocate_string(slice: &str) -> Result<*const u16> {
    unsafe {
        let len_with_nul = slice.len() + 1;
        let ptr = HeapAlloc(
            GetProcessHeap()?,
            HEAP_ZERO_MEMORY,
            len_with_nul * mem::size_of::<u16>(),
        );

        if ptr.is_null() {
            return Err(E_OUTOFMEMORY.into());
        }

        {
            let input = slice.as_bytes();
            let output = slice::from_raw_parts_mut(ptr as *mut u16, len_with_nul);
            let written = MultiByteToWideChar(
                CP_UTF8,
                MB_ERR_INVALID_CHARS | MB_PRECOMPOSED,
                input,
                output,
            );

            // TODO: MultiByteToWideChar error handling

            output[written as usize] = 0;
        }

        Ok(ptr as *const u16)
    }
}

pub fn free_string(ptr: *const u16) {
    unsafe {
        if let Ok(heap) = GetProcessHeap() {
            HeapFree(heap, HEAP_ZERO_MEMORY, ptr as *const c_void);
        }
    }
}

tim-weis · 2022-04-23T10:09:56Z

You can't know, but you can have a good estimate.

You can ask MultiByteToWideChar to calculate the required output buffer size, turning the estimate into an accurate quantity. This would require passing ptr::null_mut(), 0 as the final two arguments.

Not currently supported by the windows crate, that translates the lpWideCharStr/cchWideChar-pair into a &mut [u16] slice. I don't know whether this is caused by insufficient/incorrect metadata information or an oversight in the code generator (or a combination of both).

Luckily, you can also ask Rust to calculate the resulting UTF-16 string length, using its internal UTF-16 encoder implementation (encode_utf16().count()). This doesn't incur any heap allocations either.

While I cannot see an implementation using either of the above to precalculate the required final buffer size that wouldn't add some complexity, saving a (transient) heap allocation is a pretty substantial justification for doing so (if added complexity is indeed a consequence, and I'm not just blind in seeing the solution).

AronParker · 2022-04-23T16:14:50Z

You can ask MultiByteToWideChar to calculate the required output buffer size, turning the estimate into an accurate quantity.

That's true of course! I meant it in the way of "you can't know unless you try" while MultiByteToWideChar certainly doesn't write anything when passing ptr:null_mut(), 0 as arguments, it still decodes UTF8 to UTF16 and returns the number of UTF-16 characters. So you'd effectively decode it twice, which arguably might still be better than allocation twice, but still incurs a performance cost.

In light of these strings primarily used ephemerally, I do believe it is acceptable to "over-allocate" to the upper bound and just do the UTF-8 to UTF-16 transcode process once and be done with it, no additional copies, encodings and allocations.

Additionally, when allocating the memory from a zero initialized heap we get two advantages:

We can use the safe MultiByteToWideChar function by making a slice to the newly allocated memory. This wouldn't normally be allowed, as the memory must point to properly initialized values, but since we know it is safely initialized to zero we can.
The string is implicitly null terminated if we ensure that we allocate the upper bound + 1

kennykerr · 2022-04-24T02:54:37Z

There's also the approach that HSTRING takes:

https://github.com/microsoft/windows-rs/blob/master/crates/libs/windows/src/core/hstring.rs

Ideally, we can just have a single approach. We could do something similar for PCWSTR to avoid the double allocation.

AronParker · 2022-04-24T10:01:27Z

There's also the approach that HSTRING takes:

That's a great idea! I'd heavily favour this one as it is most elegant and allows us to stay inside rust code. I've attempted to implement it: #1713

AronParker mentioned this issue Apr 24, 2022

Avoid double allocation when passing strings via IntoParam #1713

Merged

kennykerr added the enhancement New feature or request label Apr 25, 2022

rylev closed this as completed in #1713 Apr 28, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`IntoParam<PCWSTR> for &str` unnecessarily allocates twice #1712

`IntoParam<PCWSTR> for &str` unnecessarily allocates twice #1712

AronParker commented Apr 22, 2022 •

edited

Loading

ryancerium commented Apr 22, 2022

AronParker commented Apr 22, 2022

kennykerr commented Apr 22, 2022

AronParker commented Apr 22, 2022

AronParker commented Apr 22, 2022 •

edited

Loading

tim-weis commented Apr 23, 2022 •

edited

Loading

AronParker commented Apr 23, 2022 •

edited

Loading

kennykerr commented Apr 24, 2022

AronParker commented Apr 24, 2022

IntoParam<PCWSTR> for &str unnecessarily allocates twice #1712

IntoParam<PCWSTR> for &str unnecessarily allocates twice #1712

Comments

AronParker commented Apr 22, 2022 • edited Loading

ryancerium commented Apr 22, 2022

AronParker commented Apr 22, 2022

kennykerr commented Apr 22, 2022

AronParker commented Apr 22, 2022

AronParker commented Apr 22, 2022 • edited Loading

tim-weis commented Apr 23, 2022 • edited Loading

AronParker commented Apr 23, 2022 • edited Loading

kennykerr commented Apr 24, 2022

AronParker commented Apr 24, 2022

`IntoParam<PCWSTR> for &str` unnecessarily allocates twice #1712

`IntoParam<PCWSTR> for &str` unnecessarily allocates twice #1712

AronParker commented Apr 22, 2022 •

edited

Loading

AronParker commented Apr 22, 2022 •

edited

Loading

tim-weis commented Apr 23, 2022 •

edited

Loading

AronParker commented Apr 23, 2022 •

edited

Loading