Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Max's async Snabb changes + public packet_t #997

Merged
merged 10 commits into from
Aug 26, 2016
Merged

Max's async Snabb changes + public packet_t #997

merged 10 commits into from
Aug 26, 2016

Conversation

kbara
Copy link
Contributor

@kbara kbara commented Aug 23, 2016

From #950 (async changes, in accordance with Luke's vision in #656) and #996 (making packet_t public, in accordance with documentation and to decrease redundant ctype pollution).

@kbara kbara changed the title Max's async Snabb changes Max's async Snabb changes + public packet_t Aug 23, 2016
@lukego
Copy link
Member

lukego commented Aug 24, 2016

@kbara Very interesting benchmarks results for this branch on next-tributary. Mixed but broadly positive. I have defined a new next-tributary-murren jobset to benchmark more configurations on generic hardware too.

Can be that the effect of increasing the struct link size is related to the effect of increasing the virtio vring size with a patched QEMU in #976 (comment).

Let me just dig a bit to make sure the performance impact is an overall net positive. I'd love to understand what causes the differences. Plug: Snabb Studio prototype demo that may become a game changer here :).

@kbara
Copy link
Contributor Author

kbara commented Aug 24, 2016

Wow, it looks like tooling is getting really exciting. :-)

@lukego lukego self-assigned this Aug 25, 2016
@lukego
Copy link
Member

lukego commented Aug 25, 2016

Gee it is fun browsing the benchmark datasets that Hydra produces automagically :).

Here is a full matrix benchmark report that includes kbara-next. This is testing many different configurations (NFV features for iperf, Virtio-net features for l2fwd) on generic hardware (Hetzner "murren" servers w/o hardware NIC).

Observations:

  1. Overall summary looks good. Seems like performance is becoming better and more consistent: the kbara-next (red) line has a new hump that seems to have been formed by some scores increasing (moving right).
  2. Split by benchmark shows different effects for iperf (more zero scores i.e. test failures) and for l2fwd (more high scores at ~6.5 Mpps).
  3. iperf by configuration seems to show a speedup on both the base and l2tpv3 cases (good) but a 100% failure rate on the filter case (don't worry, innocent explanation below).
  4. l2fwd by configuration shows definite speedups on packet forwarding with the base and noind configurations (less clear on nomrg). Good stuff and closely related to the QEMU vring patch that I have been exploring (in fact I will include kbara-next in those tests to see this side-by-side and in combination with the patched QEMU too).

So overall this looks great. The failures with iperf when packet filtering is enabled would in principle be a blocker but I know this is because a specific fix (commit 2a98aa5) has not landed in kbara-next yet:

$ git merge-base --is-ancestor 2a98aa5 snabbco/kbara-next ; echo $?
1

So I am confident that merging this code into next will improve performance overall. Great stuff 👍

Couple more things I want to look at before pressing merge:

  • How come the basic1 benchmark score went down? any idea @eugeneia? This is not a deal-breaker since it is a very synthetic benchmark but could be a sign of a real problem so I don't want to take a regression there too lightly.
  • Gather my thoughts on the packet_t API change.

Note: In a perfect world my next-tributary jobset would not be testing the tributary branches directly but rather test next with each tributary branch merged. Then the results would tell me what the consequences are of merging the branch with next and e.g. we would not see the iperf "regression" because the fix is already present on next and would be included in the test. However, Hydra doesn't work that way at the moment, and it's not the end of the world because we will at least find problems once they land on next i.e. before release.

@kbara
Copy link
Contributor Author

kbara commented Aug 25, 2016

Given that the changes get merged onto next, it sounds like the easy solution to that problem is to simply make sure that kbara-next (and presuambly max-next) frequently merge next into themselves?

@eugeneia
Copy link
Member

eugeneia commented Aug 25, 2016

@lukego I think basic1 score went down because it is processing less packets per breath due to being limited to engine.pull_npackets (see https://github.com/snabbco/snabb/pull/997/files#diff-9f48c75e1c64e2dff09340bf4a6d9815L22). Before it would have filled up the whole link on each breath.

@eugeneia
Copy link
Member

Might actually make sense to revert the change I referred to above and see how that affects basic1. A case could be made for the original behavior, since Source, being a load generator, should keep its output links saturated? On the other hand, limiting to engine.pull_npackets emulates the behavior of a NIC app and might yield more realistic results. We can now also tweak engine.pull_npackets and see how that affects the various benchmarks.

@lukego
Copy link
Member

lukego commented Aug 26, 2016

@eugeneia Thanks for that explanation. I think it is okay for basic1 benchmark scores to decrease if the reason is that it is processing fewer packets per breath. That benchmark mostly exists to catch regressions on basic operations like packet allocation, link transmit/receive, and so on. Could even be positive that it is processing smaller batches because then it will also be more sensitive to the performance of the engine itself e.g. switching between apps, polling timers, etc.

@lukego lukego merged commit cbed511 into next Aug 26, 2016
lukego added a commit that referenced this pull request Aug 26, 2016
@lukego
Copy link
Member

lukego commented Aug 26, 2016

Merged, thanks! And sorry about the wait.

I am actually not sold on putting packet_t into the API and wonder if we should revert that:

  • Redundancy: We can already refer to the packet struct by the name struct packet.
  • Consistency: The name struct packet works in Lua, C, and assembly.
  • Necessity: Does this really solve an "excessive Ctypes" problem? I don't think that referencing an existing struct by name creates a new ctype.

Here is a quick experiment suggesting to me that ffi.typeof() will return the same Ctype for multiple calls if you are referencing an existing definition (rather than e.g. defining an anonymous struct):

$ sudo ./snabb snsh -i
Snabb> ffi = require("ffi")
Snabb> a = ffi.typeof("struct packet")
Snabb> b = ffi.typeof("struct packet")
Snabb> =a
ctype<struct packet>
Snabb> =b
ctype<struct packet>
Snabb> =(a==b)
true

So I wonder if there is really a problematic case that will be solved with packet_t?

@eugeneia
Copy link
Member

Good analysis. My mistake thinking ffi.typeof would create a new ctype every time. I guess reverting cbed511 and updating src/README to refer to struct packet instead of packet.packet_t is the way forward then?

@lukego
Copy link
Member

lukego commented Aug 26, 2016

I can do that over on next.

@lukego
Copy link
Member

lukego commented Aug 26, 2016

btw it is very good that you have your eyes peeled for "ctype diversity" problems :). This is relatively easy to fix once you know what to look for but boy did @alexandergall do a lot of work to discover this from first principles over in #612. Finding such issues in submitted PRs seems like a really valuable bit of review.

@kbara
Copy link
Contributor Author

kbara commented Aug 27, 2016

Oops, should've verified that it actually caused a ctype diversity problem. I'm happy to merge a reversion.

lukego added a commit that referenced this pull request Aug 29, 2016
Grabbing revert of the packet_t type since we decided to go with
'struct packet' directly instead.

See discussion at #997 (comment)
@lukego
Copy link
Member

lukego commented Aug 29, 2016

Given that the changes get merged onto next, it sounds like the easy solution to that problem is to simply make sure that kbara-next (and presuambly max-next) frequently merge next into themselves?

@kbara Interesting thought. This seems like it would work well from my perspective. I suppose we would have to try it to understand all the implications. I would be happy to pull from next-tributary branches that have merged from next.

(Separately I still think it makes sense for new features and fixes to be based on master rather than next when possible, but that is the matter of creating new changes rather then merging and integrating them.)

@kbara
Copy link
Contributor Author

kbara commented Aug 30, 2016

Sounds reasonable. I figure it has handy side effects, like Hydra performance evaluations being based on the combination of patches we'll actually merge; usually it won't matter, sometimes it will.

wingo pushed a commit that referenced this pull request Nov 28, 2017
Rename ExprAlarm to CallbackAlarm
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants