-
Notifications
You must be signed in to change notification settings - Fork 256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wasmparser
: Performance regressions since v0.100.2
#1701
Comments
Do you have a link to the module with the largest regression? That'd probably be the easiest to start with in terms of evaluating validation performance on that module and seeing what sticks out in perf |
Do you mean Wasm module? From what I saw
The UpdateI just found one reason why it regressed: I benchmarked without setting the |
Ok I've been poking at this a bit and so far what I've found is that #701, the first commit after 0.100.0, led to a precipitous drop in performance. That was known at the time however. There are other smaller regressions here and there but they're looking to be more difficult to pin down. Profiling the code itself does not show any unexpected hot spots. That means that it's just sort of spread out a lot. Doing various tweaks seems to help here and there but nothing is a major improvement. For example outlining string-based errors to their own One improvement that has shown promise is skipping initialization checking for locals entirely. This was first added in the function-references proposal and later proposals like gc rely on it as well. This yields a ~5% gain (ish) but requires completely removing the check. That means that only way to achieve this would be to add compile-time features to wasmparser. Overall my guess is that the wasm language has just gotten more complicated over time which has slowed down the validator. I don't know of a great way to handle this other than to add a bunch of Cargo compile-time features for each wasm proposal. Doing that is not something I think would be easy to design but I think would provide the wins necessary here. |
Thank you for the investigation @alexcrichton ! If your assessment is correct I have to admit that it is a bit worrysome to me that the Wasm spec seems to develop into a direction that may not be suitable for all of its original users and stake holders. I remember that you said that My assumption of how Wasm features are intended to work was that they should (ideally) not significantly influence the performance if they are unused. So only if I actually were to use the Conditional compilation wouldn't solve the problem from my point of view since eventually all Wasm runtimes (including Wasmi) want to adopt all stabilized features anyways. The Wasm standard is not modularised at the moment and thus does not allow to be picky. |
In my last performance assessment I made a mistake by not also enabling We see a performance regression between 15-45%. Still, |
I would be hesitant to generalize wasm-tools's implementation to the entire spec development and other engines in the sense that we could just perhaps be bitten by our own decisions. This could be wasmparser-specific and not affect other implementations, I'm not sure. When I've benchmarked historically, for example, I believe that v8's validator (via I also saw similar hot spots as you locally, but I was unable to figure out how best to optimize them and local attempts weren't bearing much fruit. The initialization checking looks like this. That's an array lookup plus a boolean test of what's in the array. My guess is that everything is a well-predicted branch and in-cache since locals typically go to the same local. The 5% perf win I got (ish) was removing those checks entirely. That's not spec-compliant so we can't land it but it was the best I could do. |
You are right that this might very well be just an implementation detail that can be fixed. Thanks a lot for the link to the |
If it helps, here is the implementation in SpiderMonkey [1]. We do an upfront check for if the local index is less than where the first non-nullable local is and skip any boolean checking after that. Then we consult a bitmap. For setting a local, we also skip any consultation of the bitmap in the case where the local has been set [2]. So for applications that are not using any GC types, the only cost on local.get/set is the upfront check against the first non-nullable local index (which will always fail). [1] https://searchfox.org/mozilla-central/rev/dd421ae14997e3bebb9c08634633a4a3e3edeffc/js/src/wasm/WasmOpIter.h#347-352 |
Thanks so much for sharing this information! That's invaluable. :) |
Indeed thanks @eqrion! I suspect that both SpiderMonkey (and possibly v8 too) would be good sources of inspiration for algorithmic changes to the validator in wasmparser. Around locals specifically another thing that wasmparser does is it doesn't maintain Another possibility is that there is currently zero |
I updated the Wasmi PR to update Despite the performance regressions I consider merging the PR to finally make use of the newer |
FWIW I'm poking around in #1845 about slimming down wasmparser optionally at compile-time and one thing I was thinking of was adding a GC-like feature which might help recover some performance here. I'm not 100% sure it'll work though nor if it's worth it, but figured it's worth poking around at least. |
Oh thanks a lot for the information! I like #1845 a lot even if it won't improve performance, reducing compile times is also a huge gain already. :) |
I opened #1870 for minor gains in the However, as the Wasmi benchmarks suggest, most of the performance regressions are in the Wasm validation parts outside the |
@alexcrichton I reran It is the From what I can see, the following sub-parts take the most time in the benchmarks:
All of which seem to be related to the In comparison the sub-parts that take the most time for
It seems that the (computationally expensive) I.e. use an efficient way to register types that also detects (potentially) recursive references within them, and only use the expensive This way users only pay for what they use. |
Thanks for measuring this @Robbepop! That intuitively makes sense to me and we historically have spent most of our time looking at optimizing the operator validator rather than the general module validator. Would you be able to extract an example module or two that showcases this slowdown? For example a module that only has a type section should be sufficient to showcase the slowdown and additionally track improvements. |
(sorry still recovering from jetlag so responding to the rest of your comment now too)
Makes sense to me! I'd prefer to lean on "detect a recursive type" if possible and only use |
There already is a
I too think it is better not to use the |
All that sounds reasonable to me yeah. I haven't dug into why this is so much slower than before myself so I can't speak too much as to whether it seems like the best route forward, but I'm happy to defer to you as well. |
So far I have only digged into |
@alexcrichton I have been looking into the type section validation of the I implemented specialized type section validation here: #1906 |
I just tested this PR on Wasmi's
|
Benchmark | v0.100 | v0.220 | PR |
---|---|---|---|
eager/checked |
76.3µs | 83.9µs (+10.0%) | 82.6µs (+8.2%) |
eager/checked+metered |
78.3µs | 87.3µs (+11.3%) | 85.0µs (+8.3%) |
eager/unchecked |
58.5µs | 58.5µs | 59.3µs |
lazy-translation/checked |
32.6µs | 40.8µs (+21.8%) | 38.3µs (+14.3%) |
lazy/checked |
6.3µs | 8.8µs (+35.5%) | 7.2µs (+11.3% |
lazy/unchecked |
3.6µs | 3.9µs (+6.1%) | 3.8µs (+5.9%) |
spidermonkey.wasm
Benchmark | v0.100 | v0.220 | PR |
---|---|---|---|
eager/checked |
35.7ms | 38ms (+5.6%) | 38.5ms (+5.9%) |
eager/checked+metered |
37.0ms | 40.1ms (+6.6%) | 40.1ms (+6.6% |
eager/unchecked |
28.5ms | 28.9ms | 29.1ms |
lazy-translation/checked |
15.2ms | 16.9ms (+10.0%) | 16.8ms (+9.8%) |
lazy/checked |
1.0ms | 1.1ms (+6.0%) | 1.0ms |
lazy/unchecked |
725µs | 740µs | 745µs |
Summary: Still not ideal for small Wasm files, but much better compared to v0.220.
Due to technical reasons I had to use the (by now) ancient
wasmparser 0.100.2
in Wasmi and was using a fork calledwasmparser-nostd
. The fork had no significant impact on the performance.Recently I updated Wasmi to use the newest
wasmparser v0.214.0
and unfortunately saw massive performance regressions, especially for Wasm validation. Many Wasmi benchmarks regressed by 15-35% and some even by 45% compared to the Wasmi that uses the ancientwasmparser v0.100.2
.The Wasmi upgrade PR is here: wasmi-labs/wasmi#1141
Maybe I did some mistakes when upgrading which explain the performance regressions.
Since Wasmi is very focused on speedy translation times those performance regressions unfortunately are a show stopper for using the new
wasmparser
versions.The text was updated successfully, but these errors were encountered: