-
Notifications
You must be signed in to change notification settings - Fork 30.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
buffer: fix lastIndexOf and indexOf in various edge cases #6511
buffer: fix lastIndexOf and indexOf in various edge cases #6511
Conversation
You beat me to it. Had the fix ready to push lol. |
@jasnell … sorry. :-/ Sure. |
fb4ec11
to
2d5f775
Compare
Updated it. CI: https://ci.nodejs.org/job/node-test-commit/3131/ |
Having trouble getting to github from my laptop so I can't test your patch just yet. Can you give this one a try with your patch? var b = Buffer.from([1,2]);
b.lastIndexOf('€a', 0, 'ucs2') |
$ ./node
> var b = Buffer.from([1,2]);
undefined
> b.lastIndexOf('€a', 0, 'ucs2')
-1 And yep, without this patch the same code crashes Node. |
Ok. Might want to add a few tests with characters that definitely expand out to multi-bytes under utf8 also. Just to be extra safe. |
Looking good so far. @trevnorris ... PTAL |
Hmmmm, you’re right – but the other way around: Multi-byte characters with UTF-8 are fine, but ASCII characters with UTF-16 fail because |
Yep, I was just noticing the same thing. In v5.11.0 the following dies:
|
Updated again, assuming |
+1 ... I'm going to look at this a bit more in the morning to find more ways of breaking this. It definitely looks like v5.x is having some related issues also so we'll need to look at backporting. I haven't tested v4 yet but I suspect it's an issue there also.
|
Do that, please. That specific last one you posted would be fixed with this, too, but it seems to imply that there’s something more going wrong, doesn’t it? In any case, you have write perms to my repo if you want to add something here. |
Yeah, that's what I'm afraid of. Really appreciate you looking at this also |
26596a1
to
dc4b946
Compare
@jasnell Heh, that last one you found is funny. Node v5.11.0: Buffer.from("abc").indexOf("ab","ucs2") === 2
Buffer.from("abcd").indexOf("ab","ucs2") === -1
Buffer.from("abcde").indexOf("ab","ucs2") === 4
Buffer.from("abcdef").indexOf("ab","ucs2") === -1
Buffer.from("abcdefg").indexOf("ab","ucs2") === 6 This seems to be because the Updating again with a fix for that, too, and reordered the commits so that the ones that may be eligible for backporting come first in this PR. |
dc4b946
to
414877c
Compare
nice. well, at least it's consistently wrong :-) when I look at this more tomorrow I'm definitely going to test things out on v4 also. I think you're right that this has likely been a problem for quite some time. |
414877c
to
a06edb1
Compare
@addaleax ... ok, ran this through a few paces and couldn't find anywhere else it breaks down. The abort on that ascii/ucs2 case definitely happens in v4 also so we'll need to backport at least part of this to both v5 and v4. Also can confirm that the following fails in v4:
|
The first two commits here cherry-pick cleanly to the v4.x and v5.x branches, so yep, sounds like a plan. |
Awesome. keep in mind tho that |
// Extended latin-1 characters are 2 bytes in Utf8. | ||
const size_t needle_length = | ||
enc == BINARY ? needle->Length() : needle->Utf8Length(); | ||
// Round down to the neares multiple of 2 in case of UCS2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/neares/nearest
Fix `buffer.indexOf` for the case that the haystack has odd length and the needle is not found in it. `StringSearch()` would return the length of the buffer in multiples of `sizeof(uint16_t)`, but checking that against `haystack_length` would not work if the latter one was odd. PR-URL: #6511 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Trevor Norris <[email protected]>
PR-URL: #6511 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Trevor Norris <[email protected]>
Fix `buffer.lastIndexOf()` for the case that the first character of the needle is contained in the haystack, but in a location that makes it impossible to be part of a full match. For example, when searching for 'abc' in 'abcdef', only the 'cdef' part needs to be searched for 'c', because earlier locations can be excluded by index calculations alone. Previously, such a search would result in an assertion failure. This applies only to Node.js v6, as `lastIndexOf` was added in it. PR-URL: #6511 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Trevor Norris <[email protected]>
Return -1 in `Buffer.lastIndexOf` if the needle is longer than the haystack. The previous check only tested the corresponding condition for forward searches. This applies only to Node.js v6, as `lastIndexOf` was added in it. Fixes: #6510 PR-URL: #6511 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Trevor Norris <[email protected]>
Use `StringBytes::Size` to determine the needle string length instead of assuming latin-1 or UTF-8. Previously, `Buffer.indexOf` could fail with an assertion failure when the needle's byte length, but not its character count, exceeded the haystack's byte length. PR-URL: #6511 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Trevor Norris <[email protected]>
Fix `buffer.indexOf` for the case that the haystack has odd length and the needle is not found in it. `StringSearch()` would return the length of the buffer in multiples of `sizeof(uint16_t)`, but checking that against `haystack_length` would not work if the latter one was odd. PR-URL: #6511 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Trevor Norris <[email protected]>
PR-URL: #6511 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Trevor Norris <[email protected]>
- **buffer**: fix lastIndexOf and indexOf in various edge cases (Anna Henningsen) [#6511](#6511) - **child_process**: use /system/bin/sh on android (Ben Noordhuis) [#6745](#6745) - **deps**: - upgrade npm to 3.8.9 (Rebecca Turner) [#6664](#6664) - upgrade to V8 5.0.71.47 (Ali Ijaz Sheikh) [#6572](#6572) - upgrade libuv to 1.9.1 (Saúl Ibarra Corretgé) [#6796](#6796) - Intl: ICU 57 bump (Steven R. Loomis) [#6088](#6088) - **repl**: - copying tabs shouldn't trigger completion (Eugene Obrezkov) [#5958](#5958) - exports `Recoverable` (Blake Embrey) [#3488](#3488) - **src**: add O_NOATIME constant (Rich Trott) [#6492](#6492) - **src,module**: add --preserve-symlinks command line flag (James M Snell) [#6537](#6537) - **util**: adhere to `noDeprecation` set at runtime (Anna Henningsen) [#6683](#6683)
- **buffer**: fix lastIndexOf and indexOf in various edge cases (Anna Henningsen) [#6511](#6511) - **child_process**: use /system/bin/sh on android (Ben Noordhuis) [#6745](#6745) - **deps**: - upgrade npm to 3.8.9 (Rebecca Turner) [#6664](#6664) - upgrade to V8 5.0.71.47 (Ali Ijaz Sheikh) [#6572](#6572) - upgrade libuv to 1.9.1 (Saúl Ibarra Corretgé) [#6796](#6796) - Intl: ICU 57 bump (Steven R. Loomis) [#6088](#6088) - **repl**: - copying tabs shouldn't trigger completion (Eugene Obrezkov) [#5958](#5958) - exports `Recoverable` (Blake Embrey) [#3488](#3488) - **src**: add O_NOATIME constant (Rich Trott) [#6492](#6492) - **src,module**: add --preserve-symlinks command line flag (James M Snell) [#6537](#6537) - **util**: adhere to `noDeprecation` set at runtime (Anna Henningsen) [#6683](#6683) As of this release the 6.X line now includes 64-bit binaries for Linux on Power Systems running in big endian mode in addition to the existing 64-bit binaries for running in little endian mode. PR-URL: #6810
- **buffer**: fix lastIndexOf and indexOf in various edge cases (Anna Henningsen) [#6511](#6511) - **child_process**: use /system/bin/sh on android (Ben Noordhuis) [#6745](#6745) - **deps**: - upgrade npm to 3.8.9 (Rebecca Turner) [#6664](#6664) - upgrade to V8 5.0.71.47 (Ali Ijaz Sheikh) [#6572](#6572) - upgrade libuv to 1.9.1 (Saúl Ibarra Corretgé) [#6796](#6796) - Intl: ICU 57 bump (Steven R. Loomis) [#6088](#6088) - **repl**: - copying tabs shouldn't trigger completion (Eugene Obrezkov) [#5958](#5958) - exports `Recoverable` (Blake Embrey) [#3488](#3488) - **src**: add O_NOATIME constant (Rich Trott) [#6492](#6492) - **src,module**: add --preserve-symlinks command line flag (James M Snell) [#6537](#6537) - **util**: adhere to `noDeprecation` set at runtime (Anna Henningsen) [#6683](#6683) As of this release the 6.X line now includes 64-bit binaries for Linux on Power Systems running in big endian mode in addition to the existing 64-bit binaries for running in little endian mode. PR-URL: #6810
Use `StringBytes::Size` to determine the needle string length instead of assuming latin-1 or UTF-8. Previously, `Buffer.indexOf` could fail with an assertion failure when the needle's byte length, but not its character count, exceeded the haystack's byte length. PR-URL: #6511 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Trevor Norris <[email protected]>
Fix `buffer.indexOf` for the case that the haystack has odd length and the needle is not found in it. `StringSearch()` would return the length of the buffer in multiples of `sizeof(uint16_t)`, but checking that against `haystack_length` would not work if the latter one was odd. PR-URL: #6511 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Trevor Norris <[email protected]>
PR-URL: #6511 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Trevor Norris <[email protected]>
Notable changes: * **buffer**: * Buffer no longer errors if you call lastIndexOf with a search term longer than the buffer (Anna Henningsen) #6511 * deps: * update npm to 2.15.5 (Rebecca Turner) #6663 * http: * Invalid status codes can no longer be sent. Limited to 3 digit numbers between 100 - 999 (Brian White) #6291
👍 good work |
Notable changes: * **buffer**: * Buffer no longer errors if you call lastIndexOf with a search term longer than the buffer (Anna Henningsen) #6511 * contextify: * Context objects are now properly garbage collected, this solves a problem some individuals were experiencing with extreme memory growth (Ali Ijaz Sheikh) #6871 * deps: * update npm to 2.15.5 (Rebecca Turner) #6663 * http: * Invalid status codes can no longer be sent. Limited to 3 digit numbers between 100 - 999 (Brian White) #6291
Notable changes: * **buffer**: * Buffer no longer errors if you call lastIndexOf with a search term longer than the buffer (Anna Henningsen) #6511 * contextify: * Context objects are now properly garbage collected, this solves a problem some individuals were experiencing with extreme memory growth (Ali Ijaz Sheikh) #6871 * deps: * update npm to 2.15.5 (Rebecca Turner) #6663 * http: * Invalid status codes can no longer be sent. Limited to 3 digit numbers between 100 - 999 (Brian White) #6291
Notable changes: * **buffer**: * Buffer no longer errors if you call lastIndexOf with a search term longer than the buffer (Anna Henningsen) #6511 * contextify: * Context objects are now properly garbage collected, this solves a problem some individuals were experiencing with extreme memory growth (Ali Ijaz Sheikh) #6871 * deps: * update npm to 2.15.5 (Rebecca Turner) #6663 * http: * Invalid status codes can no longer be sent. Limited to 3 digit numbers between 100 - 999 (Brian White) #6291
Notable changes: * **buffer**: * Buffer no longer errors if you call lastIndexOf with a search term longer than the buffer (Anna Henningsen) #6511 * contextify: * Context objects are now properly garbage collected, this solves a problem some individuals were experiencing with extreme memory growth (Ali Ijaz Sheikh) #6871 * deps: * update npm to 2.15.5 (Rebecca Turner) #6663 * http: * Invalid status codes can no longer be sent. Limited to 3 digit numbers between 100 - 999 (Brian White) #6291
Checklist
Affected core subsystem(s)
buffer
Description of change
Return -1 in
Buffer.lastIndexOf
if the needle is longer than the haystack. The previous check only tested the corresponding condition for forward searches.Fixes: #6510