-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Char::to_{lower,upper}case
should return Option<&'static str>
instead of char
#20333
Labels
A-collections
Area: `std::collection`
Comments
Another possibility is to make the |
bors
added a commit
that referenced
this issue
Jan 5, 2015
cc #19260 The casing transformations are left unstable (it is highly likely to be better to adopt the proper non-1-to-1 case mappings, per #20333) as are `is_xid_*`. I've got a little todo list in the last commit of things I thought about/was told about that I haven't yet handled (I'd also like some feedback).
bors
added a commit
that referenced
this issue
Mar 10, 2015
This commit performs another pass over the `std::char` module for stabilization. Some minor cleanup is performed such as migrating documentation from libcore to libunicode (where the `std`-facing trait resides) as well as a slight reorganiation in libunicode itself. Otherwise, the stability modifications made are: * `char::from_digit` is now stable * `CharExt::is_digit` is now stable * `CharExt::to_digit` is now stable * `CharExt::to_{lower,upper}case` are now stable after being modified to return an iterator over characters. While the implementation today has not changed this should allow us to implement the full set of case conversions in unicode where some characters can map to multiple when doing an upper or lower case mapping. * `StrExt::to_{lower,upper}case` was added as unstable for a convenience of not having to worry about characters expanding to more characters when you just want the whole string to get into upper or lower case. This is a breaking change due to the change in the signatures of the `CharExt::to_{upper,lower}case` methods. Code can be updated to use functions like `flat_map` or `collect` to handle the difference. [breaking-change] Closes #20333
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
These two methods should not be stabilized as-is. They should be changed to return a variable number of code points (between one and three), per Unicode’s
SpecialCasing.txt
.Such results could be represented as
&'static str
or&'static [char]
slices of a static table in libunicode. The former avoid re-encoding to UTF-8 when accumulating results in aString
. To avoid having an entry in that table for every one of the 1114111 code points, the return type could be anOption
, whereNone
means that the code point is unchanged by the mapping. (This is by large the common case.) Or it could be a new special-purpose type likeenum CaseMappingResult { Unchanged, MappedTo(&'static str) }
.Since the
Char
methods become less convenient to use, there should bestr::to_{lower,upper}case() -> String
wrappers.SpecialCasing.txt
also defines some language-sensitive mappings for Turkish and Lithuanian, but I suggest not including them, for a few reasons:Using the system’s locale is a very bad idea. Programs behaving differently on different systems is a source of countless bugs, and the system’s locale may not even be that of the end users (e.g for server-side software.)
Forcing users to specify a language is counter-productive since it might often end up being hard-coded to English or something. There should be a default.
Users who do care about language-specific tailoring may want to do more anyway.
SpecialCasing.txt
says:Finally, there are conditional mappings that depend on the context of surrounding code points, but not on the language. They could be special cases in the
str
methods, but I don’t know if it’s worth the bother since there is currently only one such special case. (Greek capital sigma at the end of a word.)More background on Unicode case mappings:
http://unicode.org/faq/casemap_charprop.html
http://www.unicode.org/reports/tr44/tr44-14.html#Casemapping
CC @huonw, @aturon
The text was updated successfully, but these errors were encountered: