
fix: stream memory tracking #4067

Merged
merged 8 commits into main from kpr1 on Nov 27, 2024

Conversation

kostasrim (Contributor)

Stream memory tracking was a noop. This PR fixes that.

Resolves #4028

src/core/compact_object.cc (review comment: outdated, resolved)
@adiholden requested review from chakaz and removed the review request for adiholden on November 10, 2024 08:22
Comment on lines 121 to 125
while (raxNext(&ri)) {  // visit every listpack node in the stream's rax
  auto* lp = (unsigned char*)ri.data;
  lpsize += zmalloc_size(lp);
}
raxStop(&ri);
Collaborator

I'm not entirely sure how streams work, but I guess this operation is not O(1), right?
All other types' memory usage is very quick, and I think(?) this is a requirement

Contributor Author (kostasrim)

Nope, what you see here is not O(1), as it requires traversing the radix tree. Valkey approximates this with sampling.

> All other types' memory usage is very quick, and I think(?) this is a requirement

Yes, I agree it's more trivial for the other types, but this is a data structure we don't control (we would need to change the radix tree to track how much memory it consumes in order to make this a constant-time operation).

I am open to suggestions on how we can improve here.
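
For context, here is a minimal sketch of the kind of full traversal being discussed, assuming the standard rax iterator API (raxStart / raxSeek / raxNext / raxStop) and zmalloc_size from the vendored Redis code; the function name and surrounding code are illustrative, not the actual Dragonfly implementation:

```cpp
// Illustrative sketch only, not the actual Dragonfly code: the cost is
// proportional to the number of listpack nodes in the stream's radix tree,
// which is why this is not O(1).
size_t StreamListpackBytes(stream* s) {
  size_t lpsize = 0;
  raxIterator ri;
  raxStart(&ri, s->rax);       // the stream stores its entries in a rax
  raxSeek(&ri, "^", NULL, 0);  // position the iterator at the first key
  while (raxNext(&ri)) {
    auto* lp = (unsigned char*)ri.data;  // each node holds a listpack
    lpsize += zmalloc_size(lp);          // allocated size of that listpack
  }
  raxStop(&ri);
  return lpsize;
}
```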

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, you should verify whether this is indeed a requirement. I think it is, but that doesn't mean I'm right :)
Look at which flows call this and whether it could be on hot paths, etc.

If it is, then perhaps we should keep a cache for this value and update it on the modification operations. We do something similar for JSON; look for JsonMemTracker.
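
To make that concrete, here is a rough sketch of the suggested pattern, loosely modeled on the idea rather than the actual JsonMemTracker code; the class name and the use of zmalloc_used_memory() as the measurement hook are assumptions for illustration only:

```cpp
// Sketch of the suggested caching pattern (hypothetical, not JsonMemTracker):
// measure allocator usage around a mutation and store only the net change,
// so reading the tracked size stays O(1).
class StreamMemTracker {
 public:
  StreamMemTracker() : before_(zmalloc_used_memory()) {}  // assumed hook

  // Call once the stream mutation is done; applies the net delta to the object.
  void UpdateStreamSize(CompactObj& cobj) const {
    int64_t after = static_cast<int64_t>(zmalloc_used_memory());
    cobj.SetStreamSize(after - static_cast<int64_t>(before_));  // may be negative
  }

 private:
  size_t before_;
};

// Rough usage around a write command:
//   StreamMemTracker tracker;
//   ... apply the XADD / XDEL / XGROUP mutation ...
//   tracker.UpdateStreamSize(cobj);
```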

Collaborator

It is a requirement, but this PR is not a high priority compared to other tasks (big value serialization)

Contributor Author (@kostasrim, Nov 11, 2024)

> If it is, then perhaps we should keep a cache for this value and update it on the modification operations. We do something similar for JSON; look for JsonMemTracker.

Yep, +1

> It is a requirement, but this PR is not a high priority compared to other tasks (big value serialization)

+1

@@ -811,6 +810,13 @@ void CompactObj::SetJsonSize(int64_t size) {
  }
}

void CompactObj::SetStreamSize(int64_t size) {
  if (size < 0) {
    DCHECK(static_cast<int64_t>(u_.r_obj.Size()) >= size);
Collaborator

Can you add a comment explaining this? I don't understand, and a negative size isn't intuitive to me

Contributor Author (kostasrim)

Yes! It's similar to JSON: we may deallocate part of the stream (say, because you deleted a consumer), so the net change will be negative.
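
For illustration only (the numbers are made up), this is how a negative net change would reach SetStreamSize():

```cpp
// Hypothetical numbers: a mutation that frees memory makes the "after"
// measurement smaller than "before", so the delta is negative and the
// tracked stream size shrinks accordingly.
int64_t before = 1200;               // bytes attributed to the stream before the delete
int64_t after = 900;                 // bytes after the consumer was freed
cobj.SetStreamSize(after - before);  // passes -300; tracked size drops by 300 bytes
```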

Collaborator

Now I realize that this function accepts a delta and not a real size. I'd strongly recommend renaming it to reflect that. Definitely the argument (call it delta), but also the name (maybe AddStreamSize()?)

Contributor Author (kostasrim)

Yes, or IncrementStreamSize, although "increment" implies increasing, which isn't very accurate 😄

Collaborator

Indeed, "increment", to me, sounds like it's ever increasing. You could Add() negative numbers, though.

Comment on lines 615 to 618
// If the diff is 0, the object uses the same memory as before. No action needed.
if (diff == 0) {
  return;
}
Collaborator

Why is this needed? It doesn't seem like setting the same size has any performance implications, no?

Contributor Author (kostasrim)

It's not; it's just me overengineering this condition when the function below already covers it :)

Contributor Author (kostasrim)

FREAKING NICE! It found a VERY subtle bug. Nice catch dude! 💯

src/server/stream_family.cc (review comment: resolved)
@kostasrim requested a review from chakaz on November 27, 2024 09:36
Comment on lines +32 to +34
# There is a big RSS spike when this test is run on one of the GH runners (not the
# self-hosted one) and it fails. The RSS spike is not observed locally or on our
# self-hosted runner, so this adjustment is mostly for CI.
Collaborator

It's fine to break this out into another PR, but to me this means we still don't track all stream usage, no?
It could be something ephemeral, which makes it harder to catch locally.

Contributor Author (@kostasrim, Nov 27, 2024)

> but to me this means we still don't track all stream usage.

I would say maybe, because the RSS spike is not necessarily correlated with us failing to track memory; it can also be caused by pages not being released promptly by the allocator. I tried MEMORY DECOMMIT but it did not have an effect. The difference in memory on both my machine and on our own GH runner is less than 50 MB, so based on that I would say we do track it pretty accurately. But I could be wrong, as I have been many times in the past 😄

@kostasrim merged commit 66e0fd0 into main on Nov 27, 2024 (9 checks passed)
@kostasrim deleted the kpr1 branch on November 27, 2024 10:41

Successfully merging this pull request may close these issues.

MEMORY USAGE for STREAM keys always return (integer) 0
3 participants