-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add utf16_units
#14613
Add utf16_units
#14613
Conversation
@@ -720,7 +720,7 @@ fn with_envp<T>(env: Option<&[(CString, CString)]>, cb: |*mut c_void| -> T) -> T | |||
let kv = format!("{}={}", | |||
pair.ref0().as_str().unwrap(), | |||
pair.ref1().as_str().unwrap()); | |||
blk.push_all(kv.to_utf16().as_slice()); | |||
blk.extend(kv.to_utf16()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did this method mean to be extend(kv.utf16_iter())
?
Out of curiosity, what does the perf look like with an iterator vs a hard-coded |
extra: u16 | ||
} | ||
|
||
impl<'a> Iterator<u16> for Utf16Codeunits<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this override size_hint as well?
Is there any special reason why |
On the other hand, a UTF-16 vector does not have the same memory layout as a string, the |
/// External iterator for a string's UTF16 codeunits. | ||
/// Use with the `std::iter` module. | ||
#[deriving(Clone)] | ||
pub struct Utf16Codeunits<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Utf16CodeUnits would be more consistent with Rust's style guide
Fixed tests, squashed and rebased. ping @huonw |
It feels unnecessary to have both |
This PR is doing this already: deprecation is how functions are removed (deprecated things are properly deleted a month or so after they were deprecated).
Using
This is a good idea. Theoretically both of these can just be iterator adaptors, that is fn from_utf16<I: Iterator<u16>>(it: I) -> FromUtf16<I> { ... }
fn to_utf16<I: Iterator<char>>(it: I) -> ToUtf16<I> { ... }
impl<I: Iterator<u16>> Iterator<char> for FromUtf16<I> { ... }
impl<I: Iterator<char>> Iterator<u16> for ToUtf16<I> { ... } and this would then be called like |
(BTW, I edited a [breaking-change] description into the PR text, since this "removes" a method from strings.) |
Closes #14358. The tests are not yet moved to `utf16_iter`, so this probably won't compile. I'm submitting this PR anyway so it can be reviewed and since it was mentioned in #14611. This deprecates `.to_utf16`. `x.to_utf16()` should be replaced by either `x.utf16_iter().collect::<Vec<u16>>()` (the type annotation may be optional), or just `x.utf16_iter()` directly, if it can be used in an iterator context. [breaking-change] cc @huonw
@huonw r? |
let (low, high) = self.chars.size_hint(); | ||
// we could be entirely valid surrogates (2 elements per | ||
// char), or entirely non-surrogates (1 element per char) | ||
(low / 2, high) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realised this is wrong: it should be (low, high * 2)
, since every char
gets either one u16
or two u16
, meaning the length of this iterator is between 1 or 2 times as long as the underlying one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, and it needs to use CheckedMul
(and I guess it needs to handle the Option
): high.and_then(|n| n.checked_mul(&2))
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh boy, that's embarrassing... I've pushed a fix.
This is tricky, since it touches a bunch of Windows-related things in @huonw r? Sorry for spamming about this, didn't anticipate the libnative errors... |
Shouldn't this be called utf16() or something like utf16_units() without _iter in the name? |
I asked on IRC. r=me if the name changes to |
This deprecates `.to_utf16`. `x.to_utf16()` should be replaced by either `x.utf16_units().collect::<Vec<u16>>()` (the type annotation may be optional), or just `x.utf16_units()` directly, if it can be used in an iterator context. Closes rust-lang#14358 [breaking-change]
Closes #14358. ~~The tests are not yet moved to `utf16_iter`, so this probably won't compile. I'm submitting this PR anyway so it can be reviewed and since it was mentioned in #14611.~~ EDIT: Tests now use `utf16_iter`. This deprecates `.to_utf16`. `x.to_utf16()` should be replaced by either `x.utf16_iter().collect::<Vec<u16>>()` (the type annotation may be optional), or just `x.utf16_iter()` directly, if it can be used in an iterator context. [breaking-change] cc @huonw
Yay, finally! |
Closes #14358.
The tests are not yet moved toEDIT: Tests now useutf16_iter
, so this probably won't compile. I'm submitting this PR anyway so it can be reviewed and since it was mentioned in #14611.utf16_iter
.This deprecates
.to_utf16
.x.to_utf16()
should be replaced by eitherx.utf16_iter().collect::<Vec<u16>>()
(the type annotation may be optional), or justx.utf16_iter()
directly, if it can be used in an iterator context.[breaking-change]
cc @huonw