Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove atomic ops in APM as updates and calculations are serialized. #753

Merged
merged 9 commits into from
Jun 30, 2017

Conversation

keshonok
Copy link
Contributor

No description provided.

keshonok added 7 commits June 18, 2017 14:02
With the change of the concept atomic ops had became unnecessary.
Please see the comment here:
#712 (comment)
Make sure the function is not called when r equals zero (the first
range the left boundary of which is fixed at 1ms).
After the counters are coalesced to the left half of the range,
zero out the right half of the range when the range is extended.
The left boundary of a range expands using a special algorithm.
When RTT value is big enough (but still within the range of
unsigned short), the next value of the left boundary may exceed
both the value of RTT and the maximum value of unsigned short.
In that case a wraparound occurs which leads to an endless loop.
The percentiles were calculated only if there were updates of stats
data. That lead to situations where percentiles were "stuck" at
some previous values even though the time frame has moved forward.
TfwPcntCtl{} has supported RTT values up to about 65 seconds.
It's possible that higher values may be seen in the wild.
This patch makes higher values acceptable.
Copy link
Contributor

@krizhanovsky krizhanovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems well done, there are mostly questions to discuss in the chat and small adjustments.

rng->cnt[TFW_STATS_RLAST][2 * i]
+ rng->cnt[TFW_STATS_RLAST][2 * i + 1];
for (i = TFW_STATS_BCKTS / 2; i < TFW_STATS_BCKTS; ++i)
rng->cnt[TFW_STATS_RLAST][i] = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix.

If cnt isn't atomic, then why not to use memset()?

Please update t/unit/user_space/percentiles.c. The reason why I ask you to update the test, is that the change is not obvious (I remember that I thought about zeroing), so it'd much easier to see a test which shows the need of zeroing.

One more question, also good to be covered by a test, is that in the loop above we increment order of the range, i.e. we can grow it 4, 8 or more times, but we coalesce only halves of the reange, which is correct for order 2 growing only. Seems like a bug, right?

*r_time = end = prend;
if (end == pc->end)
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to chop r_time value? Now end is 32-bit unsigned integer and can handle large values. Meantime, in fact, we can easily face real life scenario with response time larger than 65 seconds. Moreover, it'd be good to understand why do we have so long response time in our simple wrk workload from #752. I think it's due to too aggressive and unlimited server pipelining, but maybe I'm wrong.

unsigned long prend, cnt = 0, sum = 0, max = 0, i_max = 0;

if (unlikely(r == 0))
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's never called with r = 0, so there could be BUG_ON() or just remove the check.

keshonok added 2 commits June 28, 2017 23:50
As the value of 'order' increases, the range may grow by 2, 4, 8,
etc. times. Yet just a half of the range was coalesced.

Also, this patch removes the code that was intended to deal with
an overflow of TfwPcntCtl{}->end. That's unnecessary now that it
is of type unsigned int.
Remove atomic ops. Fix all bugs that have been found so far.
@krizhanovsky
Copy link
Contributor

Good to merge

@keshonok keshonok merged commit d791708 into master Jun 30, 2017
@keshonok keshonok deleted the ab-nonatomic-apm branch June 30, 2017 09:47
@keshonok keshonok mentioned this pull request Jun 30, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants