-
Notifications
You must be signed in to change notification settings - Fork 309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Buffer debloating blog post #524
base: asf-site
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Section 1 (What is the article about?)
- "instabilities", do you mean "imbalances" or "varying conditions" (instabilities refer rather to failures than to stable conditions)
- why "minimum possible overhead", do you mean "minimum latency" (typical duo is "maximum throughput" and/vs "minimum latency")
- simplify example: instead of record1, record2 etc. It's not clear to me what the example should illustrate.
- What is the problem that buffer debloating tries to solve? (Also it is not mentioned that "buffer debloating" refers to network buffers and how they can bloat.)
Section 2
- For Java non-networking/Netty affiliated people, make clear that the term buffer is here used as a data structure to store records but also as the unit that is transfered via network.
- I don't understand "network buffer is a good tread off". You mean trade-off ... a buffer cannot be a trade-off.
Section 3 (What else can go wrong?)
- Here, you describe the actual problem. This is too late. The problem (on some abstract level) should be described in the summary at the top. Up to here, I actually don't know what it's all about. Then, step by step break it down and decsribe aspects like the relation to backpressure. At the finest level, describe the concrete mechanics as an example but explain on an abstract level. This makes it possible to understand. (Example: The problem is X. The root cause of X is Y and how it interacts with Z. For example, as illustrated, if y1 and y2 happen, z1 and z2 cause y3 ... and finally X.)
Section 4 & 5
- Seems okay so far. Maybe some details which we can address later.
General:
- Remove title "What is this article about?"
- The text is very verbose. It should be more concise.
- Add spaces before parentheses
- Don't use "you" form if not really necessary and in this text "you" is not necessary (instead speak about the application or the system not about the reader "you")
- Also don't switch between forms (I vs. you), avoid both
Let's discuss it later but I think that
latency in general is not so important here. The latency for aligned checkpoints and memory usage for unaligned checkpoints are important. So I try to find somehow the correct word for both of these cases. Perhaps, something like "maximum throughput" vs "minimum effective memory"( or "memory overhead")
It is exactly what is described in |
With instabilities you're in fact right because the parameters are not stable. Nevertheless, I find it confusing because instability usually refers to some "collapse" scenario which is not the main aspect here, right? I find "varying conditions" a bit more precise here because varying conditions require to adapt which buffer debloating tries to approach. "gather records" and "smallest unit of sending data" is a bit confusing/non-technical. For example, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It like it and the structure looks really good. I have some comments below and I think the rest is then a bit tuning formulations. :)
Summary
- Looks good.
Network stack
- Looks good.
Buffer debloating
- Introduce the term "buffer debloating" earlier in this section, e.g., after the explanation about "time instead of amount of data".
- Question here: For the case of backpressure and no backpressure, what is exactly meant? Is it backpressure to the upstream task or from downstream task? I assume the first, but I would suggest to make this explicit here.
How does it work?
- I really like that you open up the section from a top level perspective: two parameters, current implementation ajusts via buffer size only.
- Is it important to note that, in theory, one could adjust both parameters? Or is there a conceptual reason to adjust only buffer size? If not, I would remove that sentence.
- What I would add here is the implementation detail that it doesn't change the actual buffer size but limits the usage of each buffer to some adaptable maximum. As a consequence, it does not reduce the actual heap memory usage but reduces the amount of data and, hence, comes with the mentioned benefits.
- Maybe introduce numbered lists to describe steps, e.g.:
- Consistently calculate throughput.
- Calculate buffer size
- Adapt smoothly.
- I would remove the last sentence "It is why some specifc cases can be ...". I understand that you introduce the next section but it's not necessary. The headline of the next section does that.
Usage
- Looks good but why is 'Multiple inputs and unions' section from the linked manual missing?
I didn't get the question. It is backpressure to the upstream task from downstream task(there is no any
I think it is important to know that since if the user know about two parameters(buffer size, buffer numbers) it is obvious question what and how buffer debloating change. Ideally, we should change both these parameters but by simplicity reason(at least as first implementation) we decided to change only buffer size. Perhaps, we can just simplify this sentence like
We actually already implemented that so we should fix the original documentation instead. at least we need to check what was implemented already. I will do that. |
To checkpointing:
You're right I find my question confusing myself. What I actually mean is, why do you distinguish here backpressure and no backpressure scenarios? That's not so intuitive to me because the benefits could theoretically (I admit only theoretically and probably never in practice) be visible also if there is no backpressure, i.e., if input buffers have high fill level whilst never being full with no backpressure such that buffer debloating limits data in the buffers and reduces checkpoint time and checkpoint size likewise. Nevertheless, if I get your idea right I would suggest to have a little transition like and make your intuition explicit, e.g.: The benefits of buffer debloating are best visible when backpressure occurs. The reason is that no backpressure means that the downstream task is processing input faster than (or equally fast as) data arrives such that its input buffer is barely filled and, consequently, buffer debloating has almost no visible effect. This is the opposite when backpressure occurs as it means input buffers are frequently filled. Since buffer debloating limits the amount of data in the input buffer, we observe the following benefits with regard to checkpointing: ... To buffer debloating mechanism:
I would add a side phrase with exactly that. For simplicity, buffer debloating currently only caps the maximal used buffer size ... and at the end of the paragraph mention Nevertheless, the benefits of buffer debloating with regard to checkpointing are effective as described before. Other comments:I think you accidentally reverted some changes, e.g., in the summary: 2f5be59 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think remaining are some typos and grammar issues. Besides that, we need few corrections in the GIFs:
- buffer-debloat.gif: Keep the arrows with buffer size update longer because they disappear too fast.
- checkpoint-barrier.png: It's a static image. Is that as you wanted it?
- simple_problem.gif is currently included twice in img/blog/ and second in content/img/blog. Also the label "Network/Local connection" oscillates with and without a linebreak. And the label "Waiting for network" is not at the same height for the subtasks.
@akalash Didn't you want to ultimately complete this blog post? |
Oh, it is so a good point. I surely have to find time for this. |
No description provided.