-
Notifications
You must be signed in to change notification settings - Fork 113
Potential high CPU usage causing stuttering UI #802
Comments
@pgebheim do you have any other WebAssembly running at the same time as Mesh? I noticed a big discrepancy in the profile data that you sent me compared to running the example in the Mesh repo. Here's our example. This is a profile captured over a time period of 30 seconds after Mesh has already warmed up (it was running for about 60 seconds prior to taking the capture). I used the Task Manager feature of Chrome to keep an eye on the CPU usage. It typically stayed around 7-10% with occasional spikes to 25% when there is a new batch of incoming orders. (Update: I did see it spike to 60% after letting it run for a lot longer. It sunk back down to 5% after about 1 second, which I believe is the polling rate of the Task Manager). This is on mainnet with around 2k-4k orders. Two things to note:
Here is the profile data that you sent me: While there are a lot of similarities, the top entry when sorted by "Self Time" is something called "wasm-unnamed". I have not seen this before when looking at Mesh profiles, and I think it may be unrelated to Mesh. Is that possible? |
I captured my own performance profile by following the instructions from #803 and building and running Augur locally. I waited a little while for the application to warm up and then captured 30 seconds. Interestingly, I don't see anything called "wasm-unnamed". I wonder if this is just a difference in Chrome versions or Chrome settings? Here's what it looks like for me: Note that the top entry is something called "Function Call" which unfortunately doesn't tell us much. Digging into the flame graph gives a better picture of what is going on: The red marks are pointing out function calls that took longer than usual. Looks like there is one every few seconds. Let's zoom into one of these calls: I highlighted in red the portion of CPU time that is coming from Mesh. It looks like most of the time comes from Here's the snippet of code that I believe corresponds this particular "Function Call": private notifyNewBlockEvent = async (blockNumber: number, logs: ParsedLog[]): Promise<void> => {
let lowestBlock = await (await this
.db).syncStatus.getLowestSyncingBlockForAllDBs();
if (lowestBlock === -1) {
lowestBlock = blockNumber;
}
const blocksBehindCurrent = blockNumber - lowestBlock;
const percentSynced = ((lowestBlock / blockNumber) * 100).toFixed(4);
const timestamp = await this.augur.getTimestamp();
let stats: ZeroXStats = {peers: 0, orders: 0};
if(this.augur.zeroX) {
stats = await this.augur.zeroX.getStats();
}
this.augur.events.emit(SubscriptionEventName.NewBlock, {
eventName: SubscriptionEventName.NewBlock,
highestAvailableBlockNumber: blockNumber,
lastSyncedBlockNumber: lowestBlock,
blocksBehindCurrent,
percentSynced,
timestamp: timestamp.toNumber(),
logs,
...stats
});
}; Digging into the code further it looks like However, as far as I can tell, this is only taking up about ~25% of the CPU time associated with what the Chrome profiler calls "Function Call". If you want to reduce the stress on the CPU, based on these results the first place to look should be the rest of the If you do need to reduce this ~25% CPU time that Mesh is using without waiting for our database upgrade, we have a few options for temporary workarounds. The most straightforward would be simply not calling |
Note that I am not trying to measure CPU time on startup or during the warmup period while Mesh is getting orders for the first time. The reason is that there are a lot of variables that come into play (how good your browser is at compiling WebAssembly, how many orders are in the network, what's in the cache, etc.). I also think #803 is a confounding factor and could potentially be affecting the results. We can take a closer look at CPU time during startup after #803 is addressed. |
Oh, we can remove the call to |
@pgebheim I removed the call to It's very similar to before except that the part of the CPU time that Mesh was responsible for is gone. I think you should take a look at what else is going on inside of |
Just updating this issue for the sake of consistency. While it is true that under the conditions described above Mesh is not responsible for a significant portion of the CPU time (after removing the unnecessary call
@jalextowle please feel free to update as needed. |
@pgebheim The main focus right now has been on optimizing the current JSON schema validation. This is how order filters are used to validate orders, and it's become apparent that our current JSON schema validation logic is quite slow when it is executed in the browser. I'm currently working on attempting to use a JSON schema validator library called Another optimization that I have worked on is attempting to release the main thread using go-routines. This seems to have been a dead end, but we are communicating with the Go team in order to see if there is a way to accomplish this goal in Go that has been compiled to WebAssembly. I'm interested to hear how things are working when you are running the browser node in a Web Worker. |
@jalextowle -- Thanks for the update. We're working on getting an initial cut of the PR in. According to the PR author this makes the UI not freeze up at all -- though the syncing is still rather slow. |
Any news? |
@loca555 We have recently released version 9.4.0 of the Mesh network. This release includes a performance optimization that reduces the CPU usage of one hot spot that we identified while digging into this issue. We are now working on several more optimizations, and I'll post any progress that we make here. |
See #820 for another optimization. |
#840 includes an optimization for |
#846 includes an optimization for |
After a few rounds of optimization, we have greatly reduced CPU usage in the browser. In Chrome, we are seeing that Mesh does not hold onto the main thread for longer than 100ms at a time in most circumstances. In addition, Augur is now using Mesh in a Web Worker which eliminates stuttering altogether. There is still room for further CPU optimization if needed, but I think at this point we have solved the immediate problem of UI stuttering. @pgebheim is it okay if we close this? |
Context
Augur has reported this issue on Discord. They are seeing UI stutters which they believe may be due to Mesh taking up too much CPU resources.
In the browser. Specifically Chrome. @pgebheim Do you know if this affects more browsers?
8.0.1
instead oflatest
or8
).@pgebheim can you answer this?
Expected Behavior
After an initial spike while compiling the Wasm, Mesh should not be using too much CPU.
Current Behavior
Mesh is apparently taking up a lot of CPU time and causing the UI to stutter.
Failure Information (for bugs)
See below regarding logs.
Steps to Reproduce
@pgebheim can you answer this? You mentioned that you are using a script to create Kovan orders? Can you send that our way so we can reproduce exactly what you are seeing?
Failure Logs
Logs and profiler output from Chrome have been sent to myself and @jalextowle but are too big for GitHub. Please reach out to me if you need them.
The text was updated successfully, but these errors were encountered: