-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix nextind bug for invalid UTF-8 #25214
Conversation
I use this code to test it:
and |
I decided that it made sense for |
Is this really useful, given that you can use |
|
Yeah, |
The whole point of
Sure, but the opposite does make sense: knowing that an index is in bounds but not knowing if it's valid. That's why these aren't perfectly coupled. |
Right.
I'm not sure. In what situations would you be sure the index is in bounds but not whether it's valid? |
Split a string in half by types at the closest character: |
After thinking about it I would support @nalimilan, i.e.:
if we were sure that in performance sensitive code Checking the sources
|
I think this is not the right design. Checking for out-of-bounds errors versus index error have very different threat models: if you accidentally access out of bounds memory, your program may crash; if you look for a character at an in-bounds invalid index, you get a garbled character value. We need to be very vigilant against the former, we do not need to be nearly so vigilant about the latter. We don't generally automatically add I also think the proposal doesn't actually protect against any realistic user error. Here are the possible modes of failure in the current design:
The first scenario we can't really do anything about – bugs in string type implementations will lead to garbled characters no matter what we do. The second scenario seems likely and we defend against it. The third scenario just seems pretty unlikely. If the user is advanced enough to use |
Thank you for a detailed explanation (actually it is very good to have it put down in writing 👍). |
Reported by @bkamins in #24420. More comprehensive tests to come next week.