Idea: Timeline log with performance counters #1011
Comments
So latest crazy idea: Suppose we analyzed Snabb performance by looking at individual breaths rather than whole end-to-end benchmarks. Then each benchmark run would produce not one metric (e.g. overall average throughput) but more like 100,000 metrics (the performance of a sample of breaths). This way, when Hydra runs an intense benchmark (a machine-week or so), we would have around a billion data points to analyze instead of the 10,000 or so that we have now. The analysis could be done using models like in #1007 (comment). Potential advantages:
So still a pipe dream for now, but it could be very interesting to turn a timeline log into a million-row CSV file and see what R can make of it. Incidentally, the data above ^^^ is a little interesting. This breath is from snabbnfv between two VMs doing iperf with jumbo frames (MTU 9000). This is fun because when we are copying packets to the VMs we are probably using at least 2MB of cache per 100 packets. So we see quite a bit of activity in terms of L3 hits and even L3 misses (RAM access). On the other hand, the overall performance is excellent at 20 bits of throughput per CPU cycle. So even if the engine is perhaps not optimally tuned for jumbo frames, they are still a very easy workload, and with ~100 packets per breath it seems like we could do 20 Gbps of traffic for each 1 GHz of CPU. This armchair analysis might be much more satisfying as a formal model fitted to the data though...
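For what it's worth, the arithmetic behind those figures can be checked in a few lines. This is only a back-of-envelope sketch: the inputs are the approximate numbers quoted above, and the "each packet is read once and written once" copy model is my assumption, not something measured.

```python
# Back-of-envelope check of the figures quoted in the comment above.
# All inputs are approximate numbers from the text, not measurements.

MTU_BYTES = 9000           # jumbo frame size
PACKETS_PER_BREATH = 100   # "~100 packets per breath"
BITS_PER_CYCLE = 20        # quoted throughput per CPU cycle


def copy_footprint_bytes(packets, packet_bytes):
    """Bytes touched by a copy, ASSUMING each packet is read once and written once."""
    return 2 * packets * packet_bytes


def throughput_gbps(bits_per_cycle, clock_ghz):
    """bits/cycle times 1e9 cycles/second per GHz gives Gbit/s."""
    return bits_per_cycle * clock_ghz


footprint = copy_footprint_bytes(PACKETS_PER_BREATH, MTU_BYTES)
print(f"copy footprint: {footprint / 1e6:.1f} MB per breath")
print(f"throughput at 1 GHz: {throughput_gbps(BITS_PER_CYCLE, 1.0):.0f} Gbps")
```

Under that copy model the payload alone comes to about 1.8 MB per 100 packets, which is in the neighborhood of the 2MB estimate, and 20 bits per cycle at 1 GHz is 20 Gbps by definition.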
Related idea: We could extend LuaJIT with a global counter
Just an idea that I wanted to share: I experimented with adding CPU performance monitoring counters to the timeline (#916) log entries.
In this mode each log message records not only the elapsed time (cycles) but also the number of L1/L2/L3/RAM accesses. It also records the number of instructions executed and the effective clock speed (adjusted for frequency scaling, Turbo Boost, AVX2 frequency penalty, etc).
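To make the idea concrete, here is a minimal sketch of how per-message counter snapshots could be turned into deltas and derived metrics. The `Counters` record, its field names, and the sample values are all hypothetical and are not the timeline's actual format. Deriving the effective clock from the ratio of core cycles to reference cycles is also an assumption on my part about how one might compute it.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Counters:
    """Hypothetical counter snapshot taken when a log message is written."""
    ref_cycles: int    # fixed-rate reference cycles (elapsed time)
    core_cycles: int   # cycles at the actual, scaled clock
    instructions: int
    l1: int            # accesses served by each level
    l2: int
    l3: int
    ram: int


def delta(prev, curr):
    """Counter increments between two consecutive log messages."""
    return Counters(*(getattr(curr, f.name) - getattr(prev, f.name)
                      for f in fields(Counters)))


def ipc(d):
    """Instructions per (core) cycle over the interval."""
    return d.instructions / d.core_cycles


def clock_ratio(d):
    """Effective clock relative to the reference clock (above 1.0 suggests turbo)."""
    return d.core_cycles / d.ref_cycles


# Made-up snapshots, purely for illustration.
prev = Counters(1000, 1200, 2000, 500, 40, 10, 2)
curr = Counters(2000, 2500, 4600, 900, 70, 25, 5)
d = delta(prev, curr)
print(d)
print(f"IPC: {ipc(d):.2f}, clock ratio: {clock_ratio(d):.2f}")
```

Storing raw snapshots and computing deltas at analysis time keeps logging cheap. The derived metrics can then be changed later without re-running anything.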
Here is a little demo in the Snabb Studio prototype. First you see a list of breaths where you can select one that looks interesting:
Then you can see the detailed processing steps for that breath, each annotated with performance counter deltas and useful metrics like Turbo level and Instructions Per Cycle:
The idea is that this tooling could take some of the mystery out of performance analysis. Perhaps we could come up with more systematic ways to optimize application performance:
The challenge and opportunity is how to make sense of these logs when we have a million-or-so entries. Experience is needed here... Can we skim them by hand? Do we need special visualizations? Can we capture the important details with a few key metrics? I don't know yet. This is why I am tending to keep experimenting rather than pushing the prototype tools on other people for the moment :).
One more important direction is being able to deal with log files from executions where a lot of different things happened. For example, it would be wonderful to be able to torture a Snabb process with many different non-deterministic workloads and then extract well-defined performance results directly from the timeline files.
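As a rough illustration of what "extracting results" from a mixed log might look like, here is a sketch that groups per-breath rows by workload and reports a median efficiency for each. It assumes the timeline has already been flattened to CSV and that each row carries a workload label. The column names, the labels, and the rows are all invented for the example. How to actually attribute breaths to workloads is the open question.

```python
import csv
import io
import statistics
from collections import defaultdict

# Invented per-breath rows; a real export would come from the timeline file.
CSV_TEXT = """workload,packets,bits,cycles
iperf_jumbo,100,7200000,360000
iperf_jumbo,100,7200000,400000
iperf_jumbo,100,7200000,300000
small_packets,100,51200,25600
small_packets,100,51200,20480
small_packets,100,51200,51200
"""


def bits_per_cycle_by_workload(csv_file):
    """Median bits-per-cycle of the breaths belonging to each workload label."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["workload"]].append(int(row["bits"]) / int(row["cycles"]))
    return {name: statistics.median(vals) for name, vals in groups.items()}


result = bits_per_cycle_by_workload(io.StringIO(CSV_TEXT))
for name, value in result.items():
    print(f"{name}: median {value:.1f} bits/cycle")
```

A median is used here only because it is robust to the odd slow breath. A fitted model would presumably replace it.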
End braindump.