Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect results for binary search on large arrays due to miscomputation of midpoint #261

Open
yurivish opened this issue Aug 30, 2022 · 11 comments

Comments

@yurivish
Copy link

yurivish commented Aug 30, 2022

I think I've spotted a bug in d3's binary search implementations that arises due to the use of unsigned bit shifts for the midpoint computation, which is a good approach when the numbers are actually unsigned integers, but not when they're floats:

https://observablehq.com/d/229d8bcf448308b9

If I've understood the problem correctly, the fix is simple, replacing

const mid = (lo + hi) >>> 1;

in bisector.left and bisector.right with

const mid = Math.floor(lo + (hi - lo) / 2)
@Fil
Copy link
Member

Fil commented Aug 31, 2022

@Fil
Copy link
Member

Fil commented Aug 31, 2022

Fun! Does this manifest in practice? The largest array I've been able to generate has 2,145,386,496 Uint8 elements, and it seems that d3 bisects it correctly.

@yurivish
Copy link
Author

yurivish commented Aug 31, 2022

In Safari the following code freezes the tab, which I think is compatible with the idea described here.

image

a = new Uint8Array(2147483649)

d3.bisectRight(a, 0)

https://github.com/d3/d3-array/blob/main/src/bisector.js#L34-L44

The comparison will always be zero, so lo will be incremented, even though mid is no longer in [lo, hi).

@Fil
Copy link
Member

Fil commented Aug 31, 2022 via email

@yurivish
Copy link
Author

Confirmed with Safari's debugger – after a few iterations, e is the left endpoint, i is the (exclusive) right endpoint, and r is the midpoint (zero!)

image

@Fil
Copy link
Member

Fil commented Aug 31, 2022 via email

@yurivish
Copy link
Author

yurivish commented Aug 31, 2022

All good questions.

It looks like this isn't a common issue right now, but given the direction things are heading I wouldn't be surprised if Chrome lifts the limit at some point in the next few years. Binary search has a rich history of bugs like this, which weren't discovered until many years later when data sizes increased. :)

I took a quick look and d3 uses the x >>> 1 idiom in other places, some of which would probably want a similar update.

FWIW I noticed this while experimenting with techniques for interactively zooming and panning ~billion-point datasets on the client side, which is a fairly niche use case right now.

@yurivish
Copy link
Author

yurivish commented Nov 26, 2022

Actually, I think we could just use

const mid = lo + (hi - lo) >>> 1;

as a strict improvement, which will work correctly for a larger range of values.

I was also reminded today of the existence of Math.trunc, which would work for this case and is said to be faster than Math.floor, though I haven't benchmarked it:

const mid = lo + Math.trunc((hi - lo) / 2);

@mbostock
Copy link
Member

mbostock commented Dec 3, 2022

Is there any reason not to use the simpler form?

const mid = Math.trunc((lo + hi) / 2);

I mean sure, you could run out of integer precision when adding lo + hi, but that should only happen if the combined index is greater than or equal to 2^53 (9,007,199,254,740,992). Which at a minimum would represent 9 petabytes of memory. And if d3-array is being used on a supercomputer of that size… well, they’d probably find other problems lol.

@yurivish
Copy link
Author

yurivish commented Dec 3, 2022

That's a good point. I don't see any issues with the simpler form.

Which at a minimum would represent 9 petabytes of memory. And if d3-array is being used on a supercomputer of that size… well, they’d probably find other problems lol.

😄

@mbostock
Copy link
Member

mbostock commented Dec 7, 2022

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants