
Add leaky=false mode to RateLimiter. #793

Closed
wants to merge 3 commits

Conversation

@wingo (Contributor) commented Mar 1, 2016

This adds a configuration argument to RateLimiter, leaky. If it's true, you have the previous mode of operation where incoming traffic that's above the threshold is dropped. Otherwise it is left on the incoming ring buffer.

As a functional change, this new parameter defaults to false, meaning that users of RateLimiter should update their configurations.

Also, RateLimiter now provides default values for rate and bucket_capacity.

I have tested this code in the lwaftr test harness (Igalia#281), and it seems to work correctly. I have not tested the change to the NFV. @eugeneia would you mind having a look? Thanks in advance :) (This is a follow-on to #782.)
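For readers unfamiliar with the two modes, here is a hedged Python sketch of the token-bucket behavior being described. The class and method names, the refill step, and the deque-based input ring are illustrative assumptions, not Snabb's actual Lua implementation:

```python
from collections import deque

class RateLimiterSketch:
    """Illustrative token-bucket sketch of the leaky/non-leaky modes
    described above (hypothetical names, not Snabb's Lua code)."""

    def __init__(self, rate, bucket_capacity, leaky=False):
        self.rate = rate                        # bytes per second
        self.bucket_capacity = bucket_capacity  # bytes
        self.leaky = leaky
        self.tokens = bucket_capacity           # bucket starts full

    def refill(self, elapsed_seconds):
        # Periodically add tokens, capped at the bucket capacity.
        self.tokens = min(self.bucket_capacity,
                          self.tokens + self.rate * elapsed_seconds)

    def push(self, input_ring):
        # input_ring: deque of packet sizes in bytes.
        forwarded, dropped = [], 0
        while input_ring:
            size = input_ring[0]
            if size <= self.tokens:
                self.tokens -= size
                forwarded.append(input_ring.popleft())
            elif self.leaky:
                input_ring.popleft()  # over the threshold: drop (old behavior)
                dropped += 1
            else:
                break                 # leaky=false: leave it on the ring
        return forwarded, dropped
```

With leaky=true an over-limit packet is consumed and dropped; with leaky=false it stays on the incoming ring, applying back-pressure upstream.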

wingo added 2 commits March 1, 2016 14:14
This adds a configuration argument to RateLimiter, leaky.  If it's true,
you have the previous mode of operation where incoming traffic that's
above the threshold is dropped.  Otherwise it is left on the incoming
ring buffer.

As a functional change, this new parameter defaults to false, meaning
that users of RateLimiter should update their configurations.
@mention-bot

By analyzing the blame information on this pull request, we identified @lukego, @eugeneia and @altexy to be potential reviewers

-- By default, limit to 10 Mbps, just to have a default.
conf.rate = conf.rate or (10e6 / 8)
-- By default, allow for 255 standard packets in the queue.
conf.bucket_capacity = conf.bucket_capacity or (255 * 1500)
Member

“Just to have a default” is not an argument for having a default. I don't see a case where a default is useful, and not having an error when accidentally failing to supply rate and bucket_capacity is dangerous. Please revert this.

Contributor Author

I don't understand what sort of error could occur if the user fails to supply one of these parameters. For what it's worth, we have found this bucket capacity to be a fine default in the lwaftr's load tester, across a broad range of throughputs, and there has been no problem. As for rate, if you use the set_rate() API exclusively, as we do in the load tester via the promise-based transients, it doesn't matter what initial rate you set, because you overwrite it from the start.

I think the burden is rather on you to specify the way in which these are harmful.

Contributor Author

(Apologies, my message sounded rude; I didn't intend that. My question was, what is the nature of the bug you are worried about when adding these defaults?)

Member

No offense taken. If, in the future, I add a typo to nfvconfig, it could lead to a hard-to-detect bug where Snabb NFV is broken because it fails to configure RateLimiter. In my mind a default rate is always wrong because there is no sensible default rate, and the whole raison d'être of RateLimiter is to select a rate. I see using set_rate in any case other than to dynamically change the rate as a hack to avoid the declarative nature of app networks. Why do you need a procedural interface to something that appears to be solvable in a declarative fashion?

Contributor Author

Right. So, we need to be able to create load transients -- loads whose applied throughput varies over time. We use the set_rate() interface for this. Because a timeline naturally has a starting point, the natural thing for it to do is to call set_rate() in the beginning, and so it doesn't make sense to manage the limit of the rate limiter outside the timeline.

Regarding typos, I agree that it can be a problem, and it goes for anything that takes tables of options. I think rather than forbidding defaults, we should assert that options that are specified are known. See https://github.com/SnabbCo/snabbswitch/blob/next/src/lib/ctable.lua#L74 for one possible solution.

Regarding the bucket capacity -- while I think it can make sense to force the user to know about the algorithm, it's also OK to provide a default that just works. It would seem that we can provide one, especially for synthetic workloads where we are not dropping packets. Maybe it's just that I'm mostly concerned with that workload, and that with real traffic you want to think more about the bucket capacity.

Member

I don't see why you can't supply the default values in your application. I will not accept this particular change.

Contributor Author

Please provide a reason. Thanks.

Contributor Author

To clarify: if you think my use case is invalid, that is fine; I can go back to using the specialized app. Please let me know, thanks.

Member

I think the use case is perfectly valid and I am completely on board with this PR, just not with the particular change of adding default values for rate and bucket_capacity, as there are no sensible defaults. You can set default values in your program when instantiating the app, without pushing the defaults onto other users of the app. Rejecting the default values does not conflict with your use case as far as I can tell.

@eugeneia self-assigned this Mar 1, 2016
@lukego (Member) commented Mar 2, 2016

Sorry to be late to the party here...

The big question: Do we rely on flow control (back pressure) in the Snabb app network or not?

If we do then this change makes sense (rate limiting via app composition) but if we don't then #782 makes sense instead (rate limiting as a feature of synthetic packet source apps).

My mental model is that we don't want to use back pressure in the app network (#656) and in that case the original PR #782 makes more sense. This PR seems like a good test case for whether there is agreement on that idea or not.

@wingo (Contributor, Author) commented Mar 2, 2016

Morning :)

I agree that in general we do not want to apply back-pressure. However, for synthetic loads, you often do. I would agree with the sentiment, @lukego, that this functionality could belong in a specialized app, but there are a few of these apps already -- the repeater (when joined with the pcap reader), the packet generator in snabb lwaftr, and I think Marcel has one in Snabb VMx. This functionality should exist in one place. Whether it's an app or a library is not so important to me, but adding the functionality to RateLimiter does seem to work.

@lukego (Member) commented Mar 2, 2016

@wingo Reasonable. I can accept that making all the upstream apps consistent in their buffering behavior is a separate piece of work (potentially containing some surprises) and until that is done it should not block adding new functionality like this.

I am also skeptical of the defaults. The throttle rate seems naturally like a mandatory parameter to me too. I would also expect the "leaky" behavior to be the default for two reasons: it's more consistent with the plan to avoid back-pressure in apps (i.e. config option is to switch to the unusual behavior) and I also think it is what you normally want in production (where early and deliberate packet drops tend to have better end-to-end behavior than nebulous attempts at link-layer reliability at least in my experience.)

@wingo (Contributor, Author) commented Mar 2, 2016

OK. I will change this code to require rate and bucket size even when it's not necessary; a genuine grumble here, but whatever. (I feel that my use case is not being considered as good or as important, and that the existing use case is forcing my code to go through ceremony for no benefit. The bucket size in particular is a useless knob for synthetic workloads.)

@lukego (Member) commented Mar 2, 2016

FWIW I have no objection to a default bucket size.

I do think that defaults should favor production environments rather than test environments, or else we should have entirely separate apps like in the other PR (which I personally liked better).

I may be overly sensitive because I have spent several years of my life compensating for the side-effects of well-meaning attempts to prevent packet loss on links carrying ethernet traffic, particularly side-effects introduced by ARQ in mobile networks.

(Lots of other people are likely sensitive because of experience with buffer bloat too. I am probably over-reacting in this case but I do think one could paraphrase jwz: "A programmer had a problem with packet loss. I know, he said, I will eliminate this by making the links buffer packets. Now he had two problems.")

@kbara (Contributor) commented Mar 2, 2016

My biases mirror those of @lukego on this, for what it's worth. Buffer bloat reached ridiculous proportions in some systems.

@wingo (Contributor, Author) commented Mar 3, 2016

Welp, I guess we'll just keep this module local to the lwaftr then; I don't need this on the blocking path to a merge and we can move things around later I guess.

@lukego (Member) commented Mar 3, 2016

@wingo Understandable :).

This discussion has been valuable in terms of coming towards a common understanding of how to handle default parameters and ratification of the "forward-or-drop-but-don't-hold" app behavior.

@wingo (Contributor, Author) commented Mar 3, 2016

I guess, if you say so ;) The discussion has shown me a few things: that I should never treat a merge as a no-brainer; that there are many people who can say no, and no one to say yes; that there are many use cases, and the ones already in git come first; that if one reviewer suggests A and you do it, another will come later and say that the original thing was better; and so on :)

@wingo closed this Mar 3, 2016
@lukego (Member) commented Mar 3, 2016

@wingo Could be that I should have held my input until Max sent the change upstream to me. Then I could have haggled with him about the changes that I want, e.g. grumbled about introducing backpressure into the app network and asked him to push back on that in the future.

I chimed in early because I thought you might want the chance to address the feedback earlier since you have previously expressed concern about people changing your code after the initial merge. On the other hand, that is also something that could be addressed with a follow-on PR to next if you wanted to propose undoing that.

Still working all this stuff out. Sorry to meddle.

@lukego (Member) commented Mar 3, 2016

(I am sure that it is a newbie maintainer mistake to over-estimate how interested people will be in your feedback. Forgive me these small missteps :))

@eugeneia (Member) commented Mar 3, 2016

I think there was a misunderstanding and I do not think this PR should be closed.

The issue about buffering/dropping packets is solved to my satisfaction in this PR, and can be revisited later by a more general initiative about buffering in app networks (I do not like that this PR changes a default for the sake of avoiding a boolean true default value, but this is not a blocker for me).

On the issue of the default rate I stand firmly as explained in the respective sub-thread, but I do not think that this small change conflicts with the PR as a whole.

@wingo (Contributor, Author) commented Mar 3, 2016

I think, @lukego, that your input was timely, much better now than later. I see the fundamental conflict as this: Snabb was made with some apps shaping its development, and extensions proposed during the course of new development might not be as well suited to the needs of those apps. In this concrete case, apps which generate synthetic test workloads do not have the same requirements as apps which limit real traffic. They differ in their approaches to backpressure, but also, as we see here, in default values (acceptable in one use case, apparently not in the other).

@wingo (Contributor, Author) commented Mar 3, 2016

Of course it's harder to understand how a new use case could affect existing core code when you can't see how it affects the new code as it's not in git yet. I think it's a definite best practice to only touch your app/ and program/ directories, and I think we would have avoided a lot of problems in the lwaftr upstreaming if we did that.

@lukego (Member) commented Mar 3, 2016

@wingo Here are the invariants I am ham-fistedly trying to protect:

  • Backpressure is not used in the Snabb app network. (That is a legacy misfeature; see #656, "Buffering in the app network: when to hold and when to drop".)
  • Apps designed for live traffic should not have defaults chosen for test traffic.
  • Defaults should only be provided when they are broadly applicable, e.g. for the bucket size (who cares) but not for the maximum traffic rate (epsilon chance that the default is appropriate).

Sorry that I have evidently not done this very well.

@lukego (Member) commented Mar 3, 2016

s/protect/establish/

@wingo (Contributor, Author) commented Mar 3, 2016

@lukego is backpressure never appropriate for test generators? It seems silly to include rate-limiting functionality in every test-generating app. Dunno.

@lukego (Member) commented Mar 3, 2016

@lukego is backpressure never appropriate for test generators?

That is an important question that can only be answered through discussions like this :).

I would argue that the core simplicity of never using backpressure more than compensates for putting rate limiting into synthetic load generation apps. (End-to-end principle: simplify the core at the expense of the edges.) However, I feel that I need to introduce a PR to pitch that idea in terms of the existing codebase before I push too hard on this with new apps.

@wingo (Contributor, Author) commented Mar 3, 2016

@lukego ok :) bear in mind that the future of Snabb includes a RateLimitingPcapReader and a RateLimitingAndRepeatingPcapReader :)

@eugeneia (Member) commented Mar 3, 2016

bucket size (who cares)

This is a misrepresentation according to the API.

— Key bucket_capacity

Required. Bucket capacity in bytes. Should be equal to or greater than rate. Otherwise the effective rate may be limited.

@wingo (Contributor, Author) commented Mar 3, 2016

@eugeneia are you sure? 255*1500 suffices to rate-limit to anything between 0 and 10Gb in our test harness.

@wingo (Contributor, Author) commented Mar 3, 2016

I think rather that a bigger bucket just allows for spikier traffic. E.g. if you limit to 1Gb and your bucket is 1Gb, you could push 10Gb for 0.1s and 0 for the rest, and you still fit under the limit.
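The arithmetic behind this can be checked with a quick sketch. The helper is hypothetical, and the figures approximate the 1 Gb / 10 Gb example above:

```python
def burst_duration(rate, bucket_capacity, burst_rate):
    """Seconds a burst at burst_rate can be sustained before a full bucket
    empties, given the bucket refills at `rate` (rates in bytes/s,
    capacity in bytes). Hypothetical helper for illustration."""
    assert burst_rate > rate
    return bucket_capacity / (burst_rate - rate)

rate = 1e9 / 8           # limit: 1 Gbps, expressed in bytes/s
bucket = 1e9 / 8         # bucket: 1 Gbit of capacity, in bytes
t = burst_duration(rate, bucket, 10e9 / 8)  # burst at 10 Gbps
# t is about 0.11 s, roughly the "10 Gb for 0.1 s" spike described above;
# the long-run average still converges to the 1 Gbps limit.
```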

@lukego (Member) commented Mar 4, 2016

@eugeneia Good catch to flag the difference between comments here vs documentation in the rate limiter readme. Certainly the docs should reflect whatever we really think.

Requiring the bucket size as a parameter does seem like a poor user interface to me. My intuition is that, as @wingo says, there will be a sensible default value (either a constant or a function of the rate) and it would be better for the app to know/calculate this than expect the user to do so. The reason it is a required parameter is probably that the person writing the code was not confident of the answer and so punted the issue to the user (reasonable, and certainly better than choosing a bad default, but not optimal).

I would intuitively imagine that around 1 millisecond worth of bucket capacity would be adequate but [citation needed].
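That intuition can be written down as a sizing rule. The helper is hypothetical (not part of RateLimiter's API), and the 1 ms window carries the same [citation needed] caveat:

```python
def bucket_for_window(rate_bytes_per_sec, window_seconds=1e-3):
    """Size the bucket to hold `window_seconds` worth of traffic at the
    configured rate. Hypothetical helper, for illustration only."""
    return rate_bytes_per_sec * window_seconds

# At a 10 Gbps rate, a 1 ms window gives 1.25e6 bytes of capacity,
# i.e. roughly 833 standard 1500-byte packets.
capacity = bucket_for_window(10e9 / 8)
```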

@lukego (Member) commented Mar 4, 2016

bear in mind that the future of Snabb includes a RateLimitingPcapReader and a RateLimitingAndRepeatingPcapReader :)

@wingo :-)

On reflection I see this "leaky vs non-leaky apps" question as like "eager vs lazy functions" in a programming language. Lazy functions do permit some really lovely formulations of certain problems, but they can also have surprising consequences (e.g. if you try to sum an infinite list), and if you mix both kinds of functions then it's a whole new can of worms about what happens when lazy and eager functions call each other.

So with that analogy it seems like in Snabb we started off without committing to lazy vs eager, then we wrote a bunch of code that is lazy, then we got a vaguely uncomfortable feeling that we should probably pick one way or the other (probably eager), and now on this PR we start to add library code that takes a parameter saying whether to behave eager or lazy.

This seems to open a whole can of worms where users have to understand both evaluation models and the consequences of mixing things together. "How come my router is showing egress packet drops after the latest update?" "Well, your RateLimiter app is now applying backpressure to your Intel10G app, which is causing the NIC to run out of receive buffers, which is causing it to send Ethernet PAUSE frames, which is engaging flow-control on your router port, which is causing the router to drop the packets that your rate limiter doesn't want." with addendum: "Be careful not to add a leaky app between the NIC and the rate limiter, like an L2 switch, because then the backpressure will stop there and your router behavior will change again."

Too hard. Better to just pick eager evaluation for its predictability and live with the limitations.

The PR that I am imagining formulating btw is to replace this "lazy" idiom:

for i = 1, link.nwritable(l) do ... end

with this "eager" alternative:

for i = 1, link.burst(l) do ... end

and for a simple example link.burst() could return a constant e.g. 1/10th of the size of the ring buffer for the link (e.g. bump link capacity up to 1000 elements and burst 100 packets). That way on a breath each app would burst up to 100 packets (by convention), these would all be processed by the app network before the next pull(), and no links should overflow because the ring buffers are generously dimensioned.

Taking this a step further we could have a related idiom for any app that generates synthetic load:

for i = 1, ratelimiter:clamp(l) do ... end

which does not seem so bad. Kind of like how in an eager language every function that returns a list needs to take a length argument saying how many elements you want.
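A hedged sketch of what such a clamp might compute, assuming a fixed packet size for simplicity (all names here are hypothetical, not an existing Snabb API):

```python
def clamp(tokens_bytes, burst_limit_packets, packet_size=1500):
    """Packets a synthetic source may emit this breath: the smaller of
    the by-convention burst limit and what the byte budget allows. A real
    implementation would handle variable packet sizes, since the limit
    is on bytes rather than packets."""
    return min(burst_limit_packets, int(tokens_bytes // packet_size))

# With 150 kB of tokens the 100-packet burst convention is the binding
# limit; with only 60 kB the token budget allows just 40 packets.
```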

@lukego (Member) commented Mar 4, 2016

... So when RateLimitedPcapReader comes along then hopefully we will see a clear pattern and be able to simply add a rate parameter to PcapReader and every other app that generates packets.

This could also happen asynchronously with follow-on PRs if it starts to bother us that we have too many RateLimitedFoo apps in the code.

@wingo (Contributor, Author) commented Mar 4, 2016

Hi :) Good morning, I like the musings. I admit that it can be frustrating (to me) when a PR causes a bunch of musings ;) but I prefer that we share a common understanding of the goals and patterns of the project and this seems an OK way to come to it.

@lukego I think you will find that the concrete approach you are suggesting for rate limiting won't work, because the rate limit is on bytes and not packets.

I think the logical end of your hack-direction is to remove the repeater also, and any other synthetic-test-related middlebox. It could be this is the right thing for production data-planes, but it will make writing test applications much harder -- backpressure is perfectly fine in those situations and building app networks is an expressive way to compose a compound network function. Removing backpressure entirely will leave a hole that should be filled by some pattern :)

@lukego (Member) commented Mar 4, 2016

@wingo So next step for me is to put my code where my mouth is and propose my design in a PR, if I can really formulate it, and see if it gets shot down. Sounds good :).

Musing and bikeshedding are a major potential problem. My mindset here is that the foreground discussion is between the submitter (@wingo) and the upstream (@eugeneia) and everything else is parenthetical background discussion that you can cherry-pick ideas from but otherwise ignore. If this is hard in practice, because the chatter is distracting, then we could declare that general musings are off-topic for incoming PRs and should be raised separately e.g. as their own PRs.

Success here is if we land a steady stream of improvements upstream on master. If musings end up derailing this then they should be curbed somehow e.g. expressing ideas as [sketch] PRs rather than comments on other work.

wingo added a commit to Igalia/snabb that referenced this pull request Mar 4, 2016
@eugeneia (Member) commented Mar 7, 2016

Musing and bikeshedding are a major potential problem. My mindset here is that the foreground discussion is between the submitter (@wingo) and the upstream (@eugeneia) and everything else is parenthetical background discussion that you can cherry-pick ideas from but otherwise ignore. If this is hard in practice, because the chatter is distracting, then we could declare that general musings are off-topic for incoming PRs and should be raised separately e.g. as their own PRs.

This! In my opinion this PR should have landed (minus arbitrary defaults, and ideally retaining the default behavior). It adds a little feature (back-pressure mode in RateLimiter) that we don't understand very well but that has proven to be useful and a natural way forward in our orthogonal app ecosystem, and most of all it's not intrusive (given it retains the default).

It's perfectly fine to replace this feature with something else once we have figured out a general strategy, but right now I think it can't hurt to spell out a requirement (a composable RateLimiter) for that strategy in code that is actually in use. Now that @wingo has moved this feature back into a private API, it is less visible than it could and should be, IMHO.

So I think good etiquette would be to note musings as comments pointing to an issue where those musings can be discussed further. Also, don't let anyone other than the assignee talk you into closing a PR; what did he review the changes for if they just get closed under his nose? ;-)

lukego pushed a commit to lukego/snabb that referenced this pull request May 22, 2017
Allow list statements without "key"s when "config false" is set