What should multicharacter parameters do? #3
It seems to me like there's a simple answer here: instead of taking trim characters (which raises the question of what a character is: code units? grapheme clusters? etc.), why not follow string padding and take a trim string? See https://tc39.es/ecma262/#sec-stringpad. In other words, whatever string is passed in is simply repeated to pad the string. Similarly, the trim methods could take a trim string and remove all complete copies of that string from the ends.
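A minimal sketch of the trim-string idea, assuming a hypothetical standalone helper named trimWhole (not an existing or proposed API). It removes complete copies of the trim string from both ends, mirroring how padStart/padEnd repeat their fill string. Only whole copies are removed, so a partial leftover such as the lone "x" in the last example stays.

```javascript
// Hypothetical helper sketching "take a trim string"; the name is an assumption.
// Removes complete copies of trimStr from both ends of str.
function trimWhole(str, trimStr) {
  // An empty trim string would never advance the loops below.
  if (trimStr === "") return str;
  const len = trimStr.length;
  let start = 0;
  let end = str.length;
  // Skip leading whole copies.
  while (str.startsWith(trimStr, start)) start += len;
  // Skip trailing whole copies without crossing the leading cut.
  while (end - len >= start && str.endsWith(trimStr, end)) end -= len;
  return str.slice(start, end);
}

// padStart/padEnd repeat "ab" to fill; trimWhole removes whole copies of it again.
const padded = "5".padStart(5, "ab").padEnd(9, "ab");
console.log(padded); // "abab5abab"
console.log(trimWhole(padded, "ab")); // "5"

// Only complete copies are removed.
console.log(trimWhole("xyxhixy", "xy")); // "xhi"
```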
Personally, I'm OK with that. It is more intuitive, and honestly, that's what I assumed when I first saw this. But it would create more differences from other languages and existing libraries. When you have some experience with other languages, or you have used existing libraries like … This means that, as a user, I would want it to work like …
This is a fair point. My rebuttal would be that either option would cause confusion for different groups of people. If you wrongly assume the behavior is …
Another point about Unicode that I'd raise:
This means we have to correctly support code points (e.g., a single emoji). But regarding combining characters (e.g., an emoji with a skin-tone modifier), I found this:
That's the code unit/code point/grapheme problem, and the language doesn't really handle graphemes holistically.
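To make the three levels concrete in JavaScript terms, here is a small illustration using the waving-hand emoji with a skin-tone modifier discussed in this thread, written with escapes so the code points are unambiguous. A string's length counts UTF-16 code units, while iterating a string goes by code point; neither corresponds to the single grapheme a reader sees.

```javascript
// "👋🏾" written with escapes: U+1F44B WAVING HAND SIGN + U+1F3FE skin-tone modifier.
const wave = "\u{1F44B}\u{1F3FE}";

// .length counts UTF-16 code units: each of these code points needs a surrogate pair.
console.log(wave.length); // 4

// Spreading a string iterates by code point, so each surrogate pair is reassembled.
console.log([...wave].length); // 2

// The code points in hex; a reader still perceives a single emoji (one grapheme).
console.log([...wave].map((c) => c.codePointAt(0).toString(16)));
```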
Okay, I agree. I realized that, unfortunately, none of the existing built-in string methods handle the code unit/code point/grapheme distinction. So Unicode should no longer be a major point of this proposal. We have two options here, though:
A lot of discussion on this topic has already happened in this other, unrelated issue. Refer to it for more context and information.
The question is how .trim/.trimStart/.trimEnd should behave when they receive a string with multiple characters. Here are some options:
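1. Treat the argument as an unordered set of characters (most languages seem to follow this approach): "abaabc".trimStart("ab") === 'c'
2. Treat the argument as a complete string and remove whole copies of it: "ababac".trimStart("ab") === 'ac'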
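The two options can be sketched as hypothetical standalone helpers; the names trimStartChars and trimStartString are illustrative only, not proposed APIs. Option 1 here compares UTF-16 code units, which is an assumption about how a non-Unicode-aware version would behave. Running both list examples through both options shows where they diverge.

```javascript
// Option 1 (hypothetical): treat `chars` as an unordered set of UTF-16 code units.
function trimStartChars(str, chars) {
  let start = 0;
  while (start < str.length && chars.includes(str[start])) start++;
  return str.slice(start);
}

// Option 2 (hypothetical): strip leading whole copies of `prefix`.
function trimStartString(str, prefix) {
  if (prefix === "") return str; // avoid looping forever on an empty prefix
  let start = 0;
  while (str.startsWith(prefix, start)) start += prefix.length;
  return str.slice(start);
}

console.log(trimStartChars("abaabc", "ab"));  // "c"
console.log(trimStartString("abaabc", "ab")); // "aabc"
console.log(trimStartChars("ababac", "ab"));  // "c"
console.log(trimStartString("ababac", "ab")); // "ac"

// Option 1 also reproduces the Python lstrip pitfall: "s" and "p" are in the set.
console.log(trimStartChars("https://spam.com", "https://"));  // "am.com"
console.log(trimStartString("https://spam.com", "https://")); // "spam.com"
```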
Here are some of my thoughts on the matter:
@jamiebuilds previously mentioned an informal survey he ran, in which he asked his Twitter followers which behavior they would expect. The overwhelming majority said they would expect option 2. Honestly, I would too; I wouldn't be surprised if I had made the wrong assumption in Python a couple of times and written incorrect code like this:
'https://example.com'.lstrip('https://')
which appears to work, until the domain starts with "h", "t", "p", or "s". It feels weird to me to treat a string as an unordered set of characters. If we want that kind of behavior, then this parameter should accept an optional array of characters, not a string. That would let us support both behaviors if wanted: if the argument is a string, do option 2; if it's an array, do option 1. It would also let us accept an array of multi-character strings, effectively doing options 1 and 2 at the same time. I'm not necessarily advocating for this hybrid approach, just mentioning that it's an option.

What's more, I feel like option 2 fits much better with Unicode. If we use option 1 and this function isn't Unicode-aware, the following footgun may happen:
The "👋🏾" emoji is composed of four UTF-16 code units: two for the hand and two for the brown skin-tone modifier. "👌🏾" is also composed of four code units, including the exact same brown skin-tone modifier. Thus, half of the "👌🏾" emoji can be used to remove the brown skin tone from the "👋🏾" emoji.
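The hybrid idea and the emoji footgun can be sketched together with one hypothetical helper, trimEndWith (an illustrative name, not a proposed API). A string argument is stripped only as a whole (option 2), while an array strips any of its entries (option 1, generalized to multi-character entries). Passing "👌🏾".split(""), which yields its four UTF-16 code units, models a non-Unicode-aware option 1.

```javascript
// Hypothetical hybrid helper: string => option 2, array => option 1 (generalized).
function trimEndWith(str, arg) {
  // A string is one whole piece; an array lists alternatives. Empty pieces are
  // dropped so every removal shortens the string and the loop terminates.
  const pieces = (typeof arg === "string" ? [arg] : arg).filter((p) => p.length > 0);
  let end = str.length;
  let removed = true;
  while (removed) {
    removed = false;
    for (const p of pieces) {
      // endsWith(search, endPosition) compares UTF-16 code units.
      if (str.endsWith(p, end)) {
        end -= p.length;
        removed = true;
        break;
      }
    }
  }
  return str.slice(0, end);
}

const wave = "\u{1F44B}\u{1F3FE}";   // 👋🏾
const okHand = "\u{1F44C}\u{1F3FE}"; // 👌🏾

// Option 2: the wave does not end with the whole OK-hand emoji, so nothing changes.
console.log(trimEndWith(wave, okHand) === wave); // true

// Option 1 over code units: the shared skin-tone modifier is stripped.
console.log(trimEndWith(wave, okHand.split("")) === "\u{1F44B}"); // true

// An array of multi-character strings mixes both behaviors.
console.log(trimEndWith("file.tar.gz", [".gz", ".tar"])); // "file"
```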
I presume there are ways to make this Unicode-aware, but option 2 (or the hybrid approach) does not exhibit this issue. Plus, there are benefits to not making this function Unicode-aware: for example, maybe your string contains binary data, and you want .trimEnd() to operate on the bytes.

For those whose devices don't render the above Unicode characters correctly, here's a screenshot of the presented code snippet: