[NO QA] Fix random numbers again #10700

tylerkaraszewski · 2022-08-30T19:51:19Z

A bit more testing required. Feel free to review.

Thread is here: https://expensify.slack.com/archives/C02HWMSMZEC/p1661439005224499

Details

After discussing with @quinthar, I tested the existing implementation and found that it produced a significant number of collisions (5707 collisions in 20,000,000 IDs).

David had a suggestion for a simpler implementation. It seems to do almost exactly the same thing, but does not result in the very high number of collisions. I have not yet discovered what in the older algorithm caused the collisions.
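For a sense of scale, here is a rough sketch using the standard birthday approximation (about n * (n - 1) / (2 * N) colliding pairs for n uniform draws from N values). The helper names are made up for illustration, and the "implied bits" figure assumes uniformly distributed IDs, so it only indicates how far the old results were from a 53-bit space. It says nothing about the cause.

```javascript
// Birthday approximation: expected number of colliding pairs among n IDs
// drawn uniformly at random from a space of spaceSize values.
function expectedCollisions(n, spaceSize) {
    return (n * (n - 1)) / (2 * spaceSize);
}

// Inverse of the same formula: how many bits of uniform randomness would
// produce roughly the observed number of collisions.
function impliedSpaceBits(n, observedCollisions) {
    return Math.log2((n * (n - 1)) / (2 * observedCollisions));
}

const idCount = 20000000;
const expectedFor53Bits = expectedCollisions(idCount, 2 ** 53);
const impliedBits = impliedSpaceBits(idCount, 5707);

console.log(expectedFor53Bits.toFixed(4)); // well below 1 collision expected
console.log(impliedBits.toFixed(1));       // far fewer than 53 bits
```

With a full 53 bits, 20,000,000 IDs should almost never collide, so thousands of collisions point to much less effective randomness in the old generator.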

Let me explain the new code:

return (Math.floor(Math.random() * (2 ** 21)) * (2 ** 32)) + Math.floor(Math.random() * (2 ** 32));

We take:

Math.floor(Math.random() * (2 ** 21)) * (2 ** 32)

And add:

Math.floor(Math.random() * (2 ** 32))

And return it.

The second part is pretty straightforward: we add a random 32-bit number. What about the first part? First we generate a 21-bit number (remember, we're aiming for 53 total bits: 21 + 32 = 53):

Math.floor(Math.random() * (2 ** 21))

Then we multiply by:

(2 ** 32)

This has the effect of moving the 21 bits we generated 32 places to the left. If you're bad at binary math, think of it with decimal. Let's say we want an 8-digit number, but we can only generate 6 digits at a time (because 32 digits is a lot to type). Remember that a bit is a binary digit.

Anyway, we want an 8 digit decimal number. First we generate a two digit decimal number:

let randomNum = Math.floor(Math.random() * 100); // say we get 42.

Now we move it over to the left by 6 places:

randomNum *= (10 ** 6);

10^6 is 1,000,000. Multiplying our example value 42 by 1,000,000 gives 42,000,000. See, we've moved 42 six places to the left. Binary works the same way, except instead of raising 10 to the 6th power for six digits, we raise 2 to the 32nd power for 32 binary digits.

Then we add the rest to the end. Generate a second number, this time with 6 digits:

randomNum += Math.floor(Math.random() * 1000000);

It doesn't matter what number this generates, it will fit in the six 0s at the right of randomNum, so we can just add it to 42,000,000 and we'll have some random 8-digit decimal number.

We've done the exact same thing in binary but with longer numbers.
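To make the binary version concrete, here is a small sketch that takes an ID apart again. generateReportID is the one-liner from this PR; splitReportID and toBits53 are hypothetical helpers added only for this illustration.

```javascript
function generateReportID() {
    return (Math.floor(Math.random() * (2 ** 21)) * (2 ** 32)) + Math.floor(Math.random() * (2 ** 32));
}

// Hypothetical helper: recover the 21-bit high part and the 32-bit low part.
function splitReportID(id) {
    const low = id % (2 ** 32);
    const high = (id - low) / (2 ** 32);
    return {high, low};
}

// Hypothetical helper: show a non-negative safe integer as 53 binary digits.
function toBits53(value) {
    return BigInt(value).toString(2).padStart(53, '0');
}

const id = generateReportID();
const parts = splitReportID(id);

// The first 21 digits come from the high part, the last 32 from the low part.
console.log(toBits53(id));
console.log(parts.high < 2 ** 21, parts.low < 2 ** 32);
console.log((parts.high * (2 ** 32)) + parts.low === id);
```

Because the low part is always smaller than 2 ** 32, it never spills into the digits occupied by the high part, and the sum stays below 2 ** 53, which is within the range where a JS number represents integers exactly.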

Fixed Issues

$ https://github.com/Expensify/Expensify/issues/224792

Tests

Verify that no errors appear in the JS console

Here's the elaborate testing process that was used for this:

Using the following test page, generate 1,000,000 random reportIDs in Chrome. Repeat this 20 times by refreshing, each time appending the output to a text file:

<!DOCTYPE html>
<html> <head>
<script type="text/javascript">
function generateReportID() {
    return (Math.floor(Math.random() * (2 ** 21)) * (2 ** 32)) + Math.floor(Math.random() * (2 ** 32));
}

// generate a million random 53-bit values.
let result = [];
for (var i = 0; i < 1000000;i++) {
    result.push(generateReportID());
}

window.onload = function() {
    let r = "";
    for (let i = 0; i < 1000000; i++) {
        r += result[i] + "\n";
    }
    document.getElementById("randomhere").innerText = r;
}
</script> </head> <body> <pre id="randomhere"> </pre> </body> </html>

I saved the output in final_20m_test.txt. I was going to attach it here, but it's 337 MB, so I put it on bastion1.sjc instead. If you want to look at it, it's at /home/tyler/final_20m_test.txt.zip.

Then run the following command to sort the output from your random number generator:

> sort final_20m_test.txt > final_20m_test_sorted.txt

This takes a few minutes (it's 337mb of text) and saves all the ids in (lexically, not numerically, but it doesn't matter) sorted order.

Once sorted, run it through uniq:

> uniq -d final_20m_test_sorted.txt

This will output one copy of any line that occurs more than once. If there's no output, there are no collisions. The set I generated above has no collisions. I've done this twice to verify. You're welcome to repeat.
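For smaller samples, the same check can be sketched directly in JS. findDuplicates is a hypothetical helper, not part of the PR. For the full 20,000,000-line file the sort and uniq pipeline above is still the practical route, since a single Set of that size may run into memory or engine collection-size limits.

```javascript
// Return each value that occurs more than once, reported once,
// similar in spirit to `sort | uniq -d`.
function findDuplicates(ids) {
    const seen = new Set();
    const duplicates = new Set();
    for (const id of ids) {
        if (seen.has(id)) {
            duplicates.add(id);
        } else {
            seen.add(id);
        }
    }
    return [...duplicates];
}

// Example with a deliberately tiny ID space so that collisions show up.
const sample = [];
for (let i = 0; i < 1000; i++) {
    sample.push(Math.floor(Math.random() * 100));
}
console.log(findDuplicates(sample).length > 0);
```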

Note: this has only been tried with Chrome's Math.random() and nothing is guaranteed from any other implementation, but this seems to work correctly.

Simple test run making sure web doesn't seem broken:

PR Review Checklist

Contributor (PR Author) Checklist

PR Reviewer Checklist

The Contributor+ will copy/paste it into a new comment and complete it after the author checklist is completed

QA Steps

Verify that no errors appear in the JS console

Screenshots

Web

Mobile Web

Desktop

iOS

Android

mountiny

This looks way simpler and great write up.

Definitely curious to see where the previous approach fails.

roryabraham

Also, is this implementation vulnerable to something I just learned about called "modulo bias"?

roryabraham · 2022-08-30T23:26:47Z

src/libs/ReportUtils.js

-    }
-
-    return result;
+    return (Math.floor(Math.random() * (2 ** 21)) * (2 ** 32)) + Math.floor(Math.random() * (2 ** 32));


I'd like to see this moved over to NumberUtils.js instead of ReportUtils.js

We could do that, but it really shouldn't be used for anything but report IDs; everything else will be 64 bits. I don't have a hugely strong opinion, but leaving it here at least implies that it shouldn't be used for other IDs.

We'll be using it for IOUReportIDs too, since those are integers as well.

I agree that leaving it in reportUtils is fine since it's only used for reportIDs

tylerkaraszewski · 2022-08-31T07:11:29Z

Also, is this implementation vulnerable to something I just learned about called "modulo bias"?

* https://stackoverflow.com/questions/10984974/why-do-people-say-there-is-modulo-bias-when-using-a-random-number-generator/10984975#10984975

* https://crypto.stackexchange.com/questions/39186/what-does-it-mean-for-a-random-number-generator-to-be-cryptographically-secure/39188#39188?newreg=6e4d42032f4f452f8257c311c81413bd

* https://pthree.org/2018/06/13/why-the-multiply-and-floor-rng-method-is-biased/

* https://research.kudelskisecurity.com/2020/07/28/the-definitive-guide-to-modulo-bias-and-how-to-avoid-it/

Yes, probably. It's probably vulnerable to most RNG problems. The spec for Math.random() doesn't require it to be cryptographically secure, which is why crypto.getRandomValues() exists.

In theory, if we generate 32 bit numbers and only ever multiply them by even powers of two, then the modulo problem shouldn't be a thing. I.e., if we generate an underlying 32 bit random number and scale it to fit in 21 bits (like we do here), then this shouldn't be an issue, since these sizes divide into each other evenly. For example, 2^32 / 2^21 = exactly 2048. Each number in a 2^21 range should have exactly the same chance (2048 of the total 4294967296 possible options) of getting generated from 32 bits of randomness.
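A scaled-down sketch of that argument, using 8 bits of underlying randomness instead of 32 so every input can be enumerated. bucketCounts is a hypothetical helper, and the assumption (as in the paragraph above) is that the underlying generator produces each of its 2 ** sourceBits values equally often.

```javascript
// Count how many of the 2 ** sourceBits equally likely inputs land on each
// output value when scaled with the multiply-and-floor method.
function bucketCounts(sourceBits, outputRange) {
    const sourceSize = 2 ** sourceBits;
    const counts = new Array(outputRange).fill(0);
    for (let x = 0; x < sourceSize; x++) {
        counts[Math.floor((x / sourceSize) * outputRange)] += 1;
    }
    return counts;
}

const powerOfTwoRange = bucketCounts(8, 2 ** 3); // 256 inputs onto 8 outputs
const otherRange = bucketCounts(8, 6);           // 256 inputs onto 6 outputs

console.log(powerOfTwoRange); // every output has the same count
console.log(otherRange);      // counts differ by one between some outputs
```

When the output range is a power of two that divides the input space, every output gets the same share. With a range like 6, some outputs get one more input than others, which is the bias the linked articles describe.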

But we don't know the details of every Math.random() implementation we're using, so we're not really sure. It shouldn't matter critically that we have slightly less than perfectly even distributions here, since the number of possible options is still so high. If you have a megajillion options, and some of them have a 90-bazillion/megajillion chance of getting chosen, and the rest of them have a (90-bazillion - 1)/megajillion chance, it's not that huge of a deal.

There are lots of details to care about with RNGs, but this 53-bit generator just has to be good enough to get us through until we can deprecate oldDot with the fewest collisions in the meantime (then we need to worry about our 64-bit generator).

The "real" solution is probably to use crypto.getRandomValues(), but Hermes currently doesn't support it. We can switch to it later easily enough, or switch to an implementation that supports it.
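For reference, a minimal sketch of what that later switch might look like. This is a hypothetical alternative, not code from this PR, and it assumes a Web Crypto implementation is available (browsers expose it as globalThis.crypto; the require() fallback is only there for older Node.js versions).

```javascript
// Assumption: Web Crypto is available in the runtime.
const webCrypto = globalThis.crypto || require('crypto').webcrypto;

// Hypothetical alternative to generateReportID(): same 21 + 32 bit layout,
// but both parts come from crypto.getRandomValues().
function generateSecureReportID() {
    const words = new Uint32Array(2);
    webCrypto.getRandomValues(words);
    const high = words[0] >>> 11; // keep the top 21 of the 32 random bits
    const low = words[1];         // use all 32 random bits
    return (high * (2 ** 32)) + low;
}

console.log(generateSecureReportID());
```

Dropping 11 bits with a shift keeps the scaling to a power of two, so it does not add the kind of bias discussed above.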

The real goal here is just to be able to generate numbers that don't overlap for a year or two, so we don't need to be as good as NSA encryption keys.

Julesssss

Thanks for explaining!

roryabraham · 2022-08-31T07:35:26Z

Yes, thanks for explaining!

In theory, if we generate 32 bit numbers and only ever multiply them by even powers of two, then the modulo problem shouldn't be a thing

So a potential problem is that we don't know how many bits are used in Math.random across JS engines here? Seems like it's 32 on chrome but we don't know for sure in other browsers ... could be 53?

OSBotify · 2022-08-31T07:36:58Z

✋ This PR was not deployed to staging yet because QA is ongoing. It will be automatically deployed to staging after the next production release.

tylerkaraszewski · 2022-08-31T07:46:20Z

So a potential problem is that we don't know how many bits are used in Math.random across JS engines here? Seems like it's 32 on chrome but we don't know for sure in other browsers ... could be 53?

We're basically basing our data on this 7-year-old SO post: https://stackoverflow.com/questions/3344447/precision-of-math-random#:~:text=It's%20browser%2FJavaScript%20engine%20dependent,16%20decimals%2C%20see%20Sly1024's%20answer

There's not really a way to know without looking at the source for Math.random() in each implementation.

tylerkaraszewski · 2022-08-31T07:47:51Z

There's actually a comment in there saying:

Chrome seems to have upgraded their precision, I can no longer observe the 21 consistently 0 bits. –
Daniel Vestøl
Aug 11, 2019 at 16:25

That might be true, I haven't looked at recent Chrome source.
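One way to get empirical evidence without reading engine source is a probe like the sketch below. observedLowBits is a hypothetical helper. It can only suggest that low bits are unused (bits that never show up across many samples); it can't prove how many bits of randomness an engine really has.

```javascript
// Scale each Math.random() value to a 53-bit integer and OR together the
// lowest 21 bits. A bit that is still 0 after many samples was never set.
function observedLowBits(samples) {
    let seen = 0;
    for (let i = 0; i < samples; i++) {
        const scaled = Math.floor(Math.random() * (2 ** 53));
        seen |= scaled % (2 ** 21); // fits in 21 bits, safe for bitwise OR
    }
    return seen;
}

const lowBitMask = observedLowBits(100000);

// All zeros would match the old "21 consistently 0 bits" observation;
// mostly ones would match the newer comment about upgraded precision.
console.log(lowBitMask.toString(2).padStart(21, '0'));
```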

OSBotify · 2022-08-31T09:32:12Z

🚀 Deployed to staging by @roryabraham in version: 1.1.95-0 🚀

platform	result
🤖 android 🤖	success ✅
🖥 desktop 🖥	success ✅
🍎 iOS 🍎	success ✅
🕸 web 🕸	success ✅

This seems to work with one round of testing

3665c27

tylerkaraszewski requested a review from a team as a code owner August 30, 2022 19:51

melvin-bot bot requested review from ctkochan22 and removed request for a team August 30, 2022 19:51

tylerkaraszewski requested a review from roryabraham August 30, 2022 19:56

tylerkaraszewski assigned luacmartins Aug 30, 2022

tylerkaraszewski requested review from luacmartins, flodnv, stitesExpensify and Julesssss and removed request for ctkochan22 August 30, 2022 19:56

tylerkaraszewski added 2 commits August 30, 2022 21:59

Appease the linter

57debb2

Appease the linter

73071fd

tylerkaraszewski assigned tylerkaraszewski and unassigned luacmartins Aug 30, 2022

tylerkaraszewski changed the title ~~[WIP] Fix random numbers again~~ Fix random numbers again Aug 30, 2022

tylerkaraszewski changed the title ~~Fix random numbers again~~ [NO QA] Fix random numbers again Aug 30, 2022

mountiny approved these changes Aug 30, 2022

roryabraham reviewed Aug 31, 2022

tgolen approved these changes Aug 31, 2022

luacmartins approved these changes Aug 31, 2022

stitesExpensify approved these changes Aug 31, 2022

roryabraham approved these changes Aug 31, 2022

roryabraham merged commit 928edfc into main Aug 31, 2022

roryabraham deleted the tyler-fix-random-ids-again branch August 31, 2022 07:33

Julesssss reviewed Aug 31, 2022

OSBotify mentioned this pull request Aug 31, 2022

Deploy Checklist: New Expensify 2022-08-31 #10712

Closed

15 tasks
