fix: stream memory tracking #4067
Conversation
src/core/compact_object.cc (Outdated)
```cpp
while (raxNext(&ri)) {
  auto* lp = (unsigned char*)ri.data;
  lpsize += zmalloc_size(lp);
}
raxStop(&ri);
```
I'm not entirely sure how streams work, but I guess this operation is not O(1), right?
All other types' memory usage is very quick, and I think(?) this is a requirement
Nope, what you see here is not O(1), as it requires traversing the radix tree. Valkey approximates this with sampling.

> All other types' memory usage is very quick, and I think(?) this is a requirement

Yes, I agree that for other types it's more trivial, but this is a data structure we don't control (we would need to change the radix tree to track how much memory it consumes in order to make this a constant-time operation).

I am open to suggestions on how we can improve here.
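For illustration, here is a minimal sketch of what a Valkey-style sampling estimate could look like on top of the same rax iterator API used above. `EstimateListpackBytes` and `kMaxSamples` are made-up names, it assumes the same rax/zmalloc headers already used by compact_object.cc, and it is only an approximation idea, not what this PR implements:

```cpp
// Sketch only: estimate listpack memory by sampling the first kMaxSamples
// rax entries and extrapolating by the total entry count (raxSize).
constexpr size_t kMaxSamples = 128;

size_t EstimateListpackBytes(rax* entries) {
  raxIterator ri;
  raxStart(&ri, entries);
  raxSeek(&ri, "^", nullptr, 0);  // position on the smallest key

  size_t sampled_bytes = 0, samples = 0;
  while (samples < kMaxSamples && raxNext(&ri)) {
    auto* lp = (unsigned char*)ri.data;
    sampled_bytes += zmalloc_size(lp);
    ++samples;
  }
  raxStop(&ri);

  if (samples == 0)
    return 0;

  // Average sampled node size, extrapolated to all nodes in the tree.
  return static_cast<size_t>((sampled_bytes / samples) * raxSize(entries));
}
```

The tradeoff is bounded work per call at the cost of an estimate rather than an exact figure.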
First, you should verify if this is indeed a requirement. I think it is, but it doesn't mean I'm right :)
Look at which flows call this, and whether it could be in hot paths, etc.
If it is, then perhaps we should keep some cache for this value and update it on the modification operations. We do something similar in Json; look for JsonMemTracker.
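As a rough illustration of that cached-delta idea (this is not the actual JsonMemTracker code; `AllocatedBytes()` and `StreamObj` are stand-ins for whatever allocator statistic and object type the real code would use): measure the allocator usage before a mutating stream command and store only the net change on the object, so reading the size later stays O(1).

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for a per-shard allocator statistic (assumption, not a real API).
size_t AllocatedBytes();

// Stand-in for the object that caches its own stream memory usage.
struct StreamObj {
  int64_t cached_bytes = 0;
  void AddStreamSize(int64_t delta) { cached_bytes += delta; }  // O(1) to read back
};

// Capture usage before the command; apply the net change afterwards.
class StreamMemTracker {
 public:
  StreamMemTracker() : start_(AllocatedBytes()) {}

  void Update(StreamObj* obj) const {
    int64_t delta = static_cast<int64_t>(AllocatedBytes()) - static_cast<int64_t>(start_);
    obj->AddStreamSize(delta);  // negative when the command freed memory (e.g. XDEL)
  }

 private:
  size_t start_;
};
```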
It is a requirement, but this PR is not a high priority compared to other tasks (big value serialization).
> If it is, then perhaps we should keep some cache for this value, and update it on the modification operations. We do something similar in Json, look for JsonMemTracker.

Yep +1

> it is a requirement. but this PR is not a high priority compared to other tasks (big value serialization)

+1
```cpp
@@ -811,6 +810,13 @@ void CompactObj::SetJsonSize(int64_t size) {
  }
}

void CompactObj::SetStreamSize(int64_t size) {
  if (size < 0) {
    DCHECK(static_cast<int64_t>(u_.r_obj.Size()) >= size);
```
Can you add a comment explaining this? I don't understand, and a negative size isn't intuitive to me
Yes! It's similar to Json: we may deallocate some part of the stream (say, because you deleted a consumer), so the net change will be negative.
Now I realize that this function accepts a delta and not a real size. I'd strongly recommend renaming it to reflect that. Definitely the argument (call it `delta`), but also the name (maybe `AddStreamSize()`?)
Yes, or `IncrementStreamSize`, although `Increment` means "increasing" and it's not very accurate 😄
Indeed, increment, to me, sounds like it's ever increasing. You could `Add()` negative numbers, though.
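To make the rename concrete, a short sketch of what the delta-based setter could look like; the member name `stream_bytes_` is illustrative, not the actual CompactObj field:

```cpp
// Sketch: `delta` is a net change, not an absolute size, so it may be negative.
void AddStreamSize(int64_t delta) {
  if (delta < 0) {
    // We should never release more memory than we currently track.
    DCHECK_GE(static_cast<int64_t>(stream_bytes_), -delta);
  }
  stream_bytes_ += delta;
}
```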
src/server/stream_family.cc (Outdated)
```cpp
// If the diff is 0 it means the object uses the same memory as before. No action needed.
if (diff == 0) {
  return;
}
```
Why is this needed? It doesn't seem like setting the same size has any performance implications, no?
It's not, just me overengineering this condition when the function below covers it :)
FREAKING NICE! It found a VERY subtle bug. Nice catch dude! 💯
```python
# There is a big rss spike when this test is run in one of the gh runners (not the self-hosted one)
# and it fails. This rss spike is not observed locally or on our self-hosted runner, so
# this adjustment is mostly for CI.
```
It's fine to break this out into another PR, but to me this means we still don't track all stream usage, no?
It could be something ephemeral, which makes it harder to catch locally.
> but this to me means we still don't track all stream usage.

I would say maybe, because the RSS spike is not necessarily correlated with us not properly tracking memory; it could also be due to some pages not being properly released by the allocator. I tried `MEMORY DECOMMIT` but it did not have an effect. The difference in memory on both my machine and our own gh runner is less than 50mb, so based on that I would say we do track it pretty accurately. But I could be wrong, as I have been many times in the past 😄
Stream memory tracking was a noop. This PR fixes that.
Resolves #4028