-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sqlstats/insights: fix mem leaks on session close #128400
Conversation
I think we might want to add a memory monitor to the insights system but that work seems a bit more involved (gotta inject some contexts into some funcs) so I think I'd rather get this in first for backportability. LMK what y'all think though. |
8365aab
to
179d1cb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1, 3 of 11 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kyle-a-wong and @xinhaoz)
-- commits
line 7 at r1:
This wording is a bit vague regarding impact. Can you provide a bit more clarity on why this pool is safe to remove? Is it possible that an idiosyncratic workload could generate an insights object for every execution? For instance, if a lot of failures accumulate, do we generate an insight for every single one?
There's a prior issue here that led to some memory pooling changes: #89918
Can we run the same benchmark and confirm that we're not making this worse?
-- commits
line 14 at r1:
nit: release note
-- commits
line 36 at r2:
nit: space
pkg/sql/sqlstats/insights/ingester.go
line 148 at r2 (raw file):
} else if e.transaction != nil { i.registry.ObserveTransaction(e.sessionID, e.transaction) } else if e.sessionID != (clusterunique.ID{}) {
There should be a comment here clarifying that this is a "synthetic" event generated in ClearSession
, otherwise it's confusing as to why this clause can be triggered.
pkg/sql/sqlstats/insights/provider.go
line 50 at r2 (raw file):
// ActiveSessions returns the IDs of all sessions that are currently being observed. // Used for testing. func (p *Provider) ActiveSessions() []clusterunique.ID {
nit: I would rename this to include a ForTesting
suffix so that it's not misused in the future.
pkg/sql/sqlstats/insights/registry.go
line 31 at r2 (raw file):
// statements can be accessed concurrently in testing. statements map[clusterunique.ID]*statementBuf
I'm not sure if the test is adequate motivation to introduce a mutex here. One of the features of the lockingRegistry
seems to be that it's accessed in a single async task and invoked in batches. This change makes that nature confusing.
Is there a way that the tests can be modified to avoid this?
Previously, dhartunian (David Hartunian) wrote…
One alternative I was thinking about is to introduce a testing knob callback that we can call when the clear session event is processed. We can manage the sync structure in the test, then and not in the locking registry directly. WDYT? |
Previously, dhartunian (David Hartunian) wrote…
The fix for that issue was applied when we generated insights objects at the statement level (it wrapped each stmt insight object). Now we generate an insight object per txn. Initial reasoning for deleting this was that we don't pool the allocation for Statement and Transaction objects, which are created more or just as frequently as this object is. The PR addressing the initial memory issue added pooling in some other areas so it could be that this specific one was not as impactful. Maybe it does not hurt to keep it around for the problematic workloads you described? I think we should be pooling the Statement and Transaction objects in the insights package if we are doing it here for this, thoughts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @kyle-a-wong)
Previously, xinhaoz (Xin Hao Zhang) wrote…
The fix for that issue was applied when we generated insights objects at the statement level (it wrapped each stmt insight object). Now we generate an insight object per txn. Initial reasoning for deleting this was that we don't pool the allocation for Statement and Transaction objects, which are created more or just as frequently as this object is. The PR addressing the initial memory issue added pooling in some other areas so it could be that this specific one was not as impactful. Maybe it does not hurt to keep it around for the problematic workloads you described? I think we should be pooling the Statement and Transaction objects in the insights package if we are doing it here for this, thoughts?
Thought about a little more and ran some ycsbA to look at the heap. I think we should pursue pooling the insight Statement and Transaction objects. Going to do that in a separate PR and keep this PR ot just clearing the object when we return it to the pool, without adding or removing things :)
pkg/sql/sqlstats/insights/registry.go
line 31 at r2 (raw file):
Previously, xinhaoz (Xin Hao Zhang) wrote…
One alternative I was thinking about is to introduce a testing knob callback that we can call when the clear session event is processed. We can manage the sync structure in the test, then and not in the locking registry directly. WDYT?
Going to pursue this.
179d1cb
to
c469bc9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @kyle-a-wong)
Previously, dhartunian (David Hartunian) wrote…
nit: release note
Done
Previously, dhartunian (David Hartunian) wrote…
nit: space
Done.
pkg/sql/sqlstats/insights/ingester.go
line 148 at r2 (raw file):
Previously, dhartunian (David Hartunian) wrote…
There should be a comment here clarifying that this is a "synthetic" event generated in
ClearSession
, otherwise it's confusing as to why this clause can be triggered.
Done.
f0739a6
to
578c250
Compare
578c250
to
7ac19b6
Compare
Ensure that when we return objects into the Insight pool that we release the statmeents in the slice. This fixes an issue in the logic that returned this object to the pool which did not nil the `Statements` slice, making it possible for these slices to grow in capacity in the node's lifetime, holding onto garbage Statement objects. This is a lead up to cockroachdb#128199 which will likely remove this pool and reuse the existing statmentBuf pool. Epic: none Release note: None
7ac19b6
to
b727ee1
Compare
Add insights specific testing knobs. We'll add some knobs in later commts. Epic: none Release note: None
The insights locking registry buffers statement insight objects by session id until receiving a transaction insight, when the buffer is emptied. These buffers can leak if the session is closed midway through a transaction since the registry will never receive a transaction event for the session. This commit ensures we clear any memory allocated in insights for a session by sending an event to clear the container's session entry, if it exists, on session close. A testing knob was added, OnCloseSession, which is called when the locking registry clears a session. Epic: none Fixes: cockroachdb#128213 Release note (bug fix): Fixes a memory leak where statement insight objects may leak if the session is closed without the transaction finishing.
b727ee1
to
d8c9e2d
Compare
Move some insights testing knobs that were on the sqlstats testing knobs to insights pkg. Epic: none Release note: None
d8c9e2d
to
2da6863
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work. thanks for the separate commits.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @kyle-a-wong)
TFTR! |
128400: sqlstats/insights: fix mem leaks on session close r=xinhaoz a=xinhaoz insights: ensure releasing to Insight pool clears slice Ensure that when we return objects into the Insight pool that we release the statmeents in the slice. This fixes an issue in the logic that returned this object to the pool which did not nil the `Statements` slice, making it possible for these slices to grow in capacity in the node's lifetime, holding onto garbage Statement objects. This is a lead up to cockroachdb#128199 which will likely remove this pool and reuse the existing statmentBuf pool. Epic: none Release note: None insights: add testing knobs Add insights specific testing knobs. We'll add some knobs in later commts. Epic: none Release note: None sqlstats/insights: free memory allocated per session on session close The insights locking registry buffers statement insight objects by session id until receiving a transaction insight, when the buffer is emptied. These buffers can leak if the session is closed midway through a transaction since the registry will never receive a transaction event for the session. This commit ensures we clear any memory allocated in insights for a session by sending an event to clear the container's session entry, if it exists, on session close. A testing knob was added, OnCloseSession, which is called when the locking registry clears a session. Epic: none Fixes: cockroachdb#128213 Release note (bug fix): Fixes a memory leak where statement insight objects may leak if the session is closed without the transaction finishing. insights: move insights testing knobs from sqlstats Move some insights testing knobs that were on the sqlstats testing knobs to insights pkg. Epic: none Release note: None Co-authored-by: Xin Hao Zhang <[email protected]>
In cockroachdb#128400 we nil'd the entries in the Insight.Statements slice on `releaseInsight`, which is called when evicting insights from the cache. This caused a race leading to a nil pointer dereference: The ListExecutionInsights status server grpc was making shallow copies of the insights objects from the cache. After releasing the RLock we marshal the shallow copies into the grpc response. If insights are simultaneously getting evicted during that process, they clear the shared `Statements` slice leading to a nil pointer dereference in the marhsaling code. We address this by creating a new slice for each insight when creating the grpc response. Additionally, Insights retrieved from the pool are modified so that they reuse the Statements slice that was prepped during `releaseInsight`, something it was not previously doing. Epic: none Fixes: cockroachdb#130290 Fixes: cockroachdb#129936 Release note: None
insights: ensure releasing to Insight pool clears slice
Ensure that when we return objects into the Insight pool
that we release the statmeents in the slice. This fixes
an issue in the logic that returned this object to the pool
which did not nil the
Statements
slice, making it possiblefor these slices to grow in capacity in the node's lifetime,
holding onto garbage Statement objects.
This is a lead up to #128199 which will likely remove
this pool and reuse the existing statmentBuf pool.
Epic: none
Release note: None
insights: add testing knobs
Add insights specific testing knobs. We'll add some
knobs in later commts.
Epic: none
Release note: None
sqlstats/insights: free memory allocated per session on session close
The insights locking registry buffers statement insight objects
by session id until receiving a transaction insight, when the
buffer is emptied. These buffers can leak if the session is closed
midway through a transaction since the registry will never
receive a transaction event for the session. This commit ensures
we clear any memory allocated in insights for a session by
sending an event to clear the container's session entry, if it
exists, on session close.
A testing knob was added, OnCloseSession, which is called
when the locking registry clears a session.
Epic: none
Fixes: #128213
Release note (bug fix): Fixes a memory leak where statement
insight objects may leak if the session is closed without
the transaction finishing.
insights: move insights testing knobs from sqlstats
Move some insights testing knobs that were on the
sqlstats testing knobs to insights pkg.
Epic: none
Release note: None