-
-
Notifications
You must be signed in to change notification settings - Fork 410
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use NaN-boxing on value::InnerValue #4091
base: main
Are you sure you want to change the base?
Conversation
All tests in |
Still failing some tests:
Broken tests (10):
But the bigger issue that I see is performance. I ran the benchmarks locally and the performance was worse in all benchmarks, but in some up to almost 40%. I though a performance tradeoff vs lower memory usage seemed reasonable and expected for this change, but this seems way to extreme. Benchmarks before: RESULT Richards 78.5
RESULT DeltaBlue 71.8
RESULT Crypto 80.3
RESULT RayTrace 182
RESULT EarleyBoyer 227
RESULT RegExp 43.7
RESULT Splay 326
RESULT NavierStokes 187
SCORE 122 After: RESULT Richards 49.9
RESULT DeltaBlue 43.7
RESULT Crypto 77.7
RESULT RayTrace 138
RESULT EarleyBoyer 161
RESULT RegExp 38.9
RESULT Splay 214
RESULT NavierStokes 163
SCORE 92.0 |
@raskad @hansl As you probably already know, not all memory optimizations will yield (or even preserve) performance gains. It may be that way sometimes, but there's this balance between memory and speed that big memory optimizations (such as this one) will have to deal with. The first step is to make every test work, then we'd have to optimize repeatedly to recover all the lost performance, and hopefully we'll have an engine that's as fast as before but using a fraction of the memory. Having said that, we should see what the overall gains in memory are, because this should at least save a considerable amount of memory to be worth putting more work on. |
There's no rush for this, since the actual breaking change already happened :) I'll run benchmarks on my own, and I'll try to check a profiler for memory. In the next 2 weeks or so. It's also possible that I'm missing a few easy inline, and adding some bit magic might make things faster down the line. First things first, I'll make sure test262 has no regressions. |
@raskad BTW this should cover both; it should take less memory, AND be more performant. But I guess we pass JsValue more by reference than by value. Now that we can move JsValue for free, maybe we should reconsider some other parts of hte code as well. Later work. This article shows a 28% increase in performance for a tagged union of pointers, in JavaScriptCore (Mozilla's engine). Tough TBF we weren't using pointers. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4091 +/- ##
==========================================
+ Coverage 47.24% 53.85% +6.60%
==========================================
Files 476 486 +10
Lines 46892 48534 +1642
==========================================
+ Hits 22154 26136 +3982
+ Misses 24738 22398 -2340 ☔ View full report in Codecov by Sentry. |
@hansl I did a quick perf run on the |
That makes a lot of sense actually, because clones for The "proper" solution would be to make |
@raskad If we nan boxed Let me make sure we don't regress on test262 then I'll look into the performance. |
Okay this one is fascinating... If I only create 1 value in an expression, e.g. If I create multiple negative zeroes in an expression, only the first one keeps the parity. Cloning the negative zero keeps the parity, copying is not supported, ... It seems the bug is not in my code, but that's the only thing I changed... Any idea? |
There's something funny with the parser and/or optimizer, but in any case I'm still planning on running the benchmark before the end of the year. |
Latest on my laptop: Before:
After:
So not as dramatic as @raskad's result (though I didn't run the benchmark at that commit). This is with the latest changes of using ranges and const bit masks. |
After the last fix which should fix everything, this is the latest benchmark:
So a bit better. The next steps would be to remove the extra deref of pointers (bigint, object, symbol and string), which I'll argue is out of scope for this PR. I'll cleanup some more (some magic numbers still in), but this should be ready to review soon. |
Test262 conformance changes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already have some comments, but I still have to look at the real inner value code. Only skimmed it until now.
All in all I really nice work!
I ran the benchmarks again and noticed that while there are some significant variants between the individual benchmarks, the overall score is now almost on par with the main branch. Specifically the Crypto
and NavierStokes
benchmarks got faster. @hansl Is this trend also also something that you can confirm when you run them? Otherwise I'll have to take a look why this might be the case on my setup.
One general question regarding the feature flag: Do we plan to remove it depending on some certain features, or do we want to keep that flag? If we want to keep it, I think we probably need to think about how we can test the code with / without to be sure that everything works with both features. Also we would have to choose a default feature for our benchmarks etc.
Based on just the additional effort with all the tooling, I would tend to remove it at some point. Any other thoughts?
} else if let Some(rational) = self.0.as_float64() { | ||
// Use this poor-man's check as `[f64::fract]` isn't const. | ||
if rational == ((rational as i32) as f64) { | ||
Some(rational as i32) | ||
} else { | ||
None | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could remove this branch now, right? I ran the 262 tests locally with the f64 part removed all tests passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll have to explain to me what code you removed. I removed that let Some(rational) = self.0.as_float64()
branch and got a lot of tests to fail. Is that the branch you were meaning?
My plan when adding the feature flag was to be quickly able to switch between the two implementations to run benchmarks and tests for this PR. We might decide to merge the flag in the |
Benchmarks (using Enum-based
NaN-Boxed
|
On the latest commit, I changed the pointer tagging to using the top bits of the pointer. This means that we lose bits on the pointer value itself (after doing some research 48 bits should be enough), but we gain on performance by having non overlapping ranges for pointers. Seems like it's speeding it up a bit overall:
The final score is getting real close to before this PR on my machine. |
//! | `BigInt` | `7FF8:PPPP:PPPP:PPPP` | 49-bits pointer. Assumes non-null pointer. | | ||
//! | `Object` | `7FFA:PPPP:PPPP:PPPP` | 49-bits pointer. | | ||
//! | `Symbol` | `7FFC:PPPP:PPPP:PPPP` | 49-bits pointer. | | ||
//! | `String` | `7FFE:PPPP:PPPP:PPPP` | 49-bits pointer. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in the pointer space we only have 1 unused slot left right which is 7FF8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7FF8
(or b0111_1111_1111_1000
) is used by bigint if the pointer is non-null. The 16th MSB isn't used (yet?), so pointers are actually 48 bits (I'll correct the comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if the commit helps clarifying this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes sorry I meant to ask if 7FFF is unused and reserved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unused and reserved for now.
This adds a new
InnerValue
that uses NaN-boxing to allowJsValue
to ultimately only take 64-bits (8 bytes). This allowsJsValue
to fit into one x86_64 register which should greatly improve performance.For more details on NaN-boxing (and its alternative, Pointer tagging), see https://piotrduperas.com/posts/nan-boxing which is a great tutorial (in C) on how to get there. A proper explanation of each tags/range of values is described in the header of the
inner.rs
module.Fixes #1373.