-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce chunk write queue memory usage #131
Conversation
6448d4b
to
20f3746
Compare
8b4bfa2
to
7cb970a
Compare
I don't know how to make this CI pass... It seems to be complaining about stuff which I haven't changed. I don't want to fix everything it complains about as part of this PR because doing so would blow up the scope unnecessarily. |
@codesome @pracucci @bboreham if you get a chance I'd appreciate some feedback on this proposal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The functionality seems fine; I had some comments about naming / definition.
tsdb/chunks/chunk_write_queue.go
Outdated
@@ -121,6 +168,8 @@ func (c *chunkWriteQueue) addJob(job chunkWriteJob) (err error) { | |||
} | |||
}() | |||
|
|||
// c.isRunningMtx serializes the adding of jobs to the c.chunkRefMap, if c.jobs is full then c.addJob() will block |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me this statement needs to go on the definition of isRunningMtx
.
Ideally things should be named after what they do, so perhaps change the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the comment. Kept the name isRunningMtx
because it's also protecting isRunning
field.
tsdb/chunks/chunk_write_queue.go
Outdated
// freeChunkRefMap checks whether the conditions to free the chunkRefMap are met, | ||
// if so it re-initializes it and frees the memory which it currently uses. | ||
// The chunkRefMapMtx must be held when calling this method. | ||
func (c *chunkWriteQueue) freeChunkRefMap() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me "free" is an implementation detail; what you are trying to do is "shrink" the data structure.
(Possibly needs a note that current Go runtime never releases the internal memory used for a map, which is why you are discarding the old one and making a new one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed "free" to "shrink" everywhere, and added comment about Go runtime.
I'm fine with them being constants; suggest putting a note in the diary to review the stats after 1 week in dev and after a few weeks in prod. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method looks good
I just had another thought: In the upstream PR, we thought that because addJob would block if channel is full, the map would not grow beyond. That seems to be right, but there was a subtle bug there. Is this below patch enough and we don't require complex recycling since we always need to be ready for the peak usage: diff --git a/tsdb/chunks/chunk_write_queue.go b/tsdb/chunks/chunk_write_queue.go
index 5cdd2e81f..6502c570e 100644
--- a/tsdb/chunks/chunk_write_queue.go
+++ b/tsdb/chunks/chunk_write_queue.go
@@ -128,12 +128,12 @@ func (c *chunkWriteQueue) addJob(job chunkWriteJob) (err error) {
return errors.New("queue is not started")
}
+ c.jobs <- job
+
c.chunkRefMapMtx.Lock()
c.chunkRefMap[job.ref] = job.chk
c.chunkRefMapMtx.Unlock()
- c.jobs <- job
-
return nil
} (basically moves the blocking channel add before we update the map, otherwise we just update the map and then wait for the channel now) |
I don't think we should apply that patch, for two reasons:
So if another thread calls
|
Oh right, missed the other lock, makes sense. |
I've addressed review feedback, and would like to get this merged. Can you please re-review the PR? I have a doubt about using 10 minutes timeout for shrinking the map. It seems too big to me. If map is empty, we can shrink it faster (eg. after 1 min) and pay the allocation price instead after this time. My other concern is that PR only focuses on map size, but doesn't address channel size. Allocating channel with buffer of size 1000000 will allocate all memory upfront and never release it. Since But let's see how this works in practice and decide if we need to address timeout and/or channel size based on our experience. |
Signed-off-by: Mauro Stettler <[email protected]>
Signed-off-by: Mauro Stettler <[email protected]>
Signed-off-by: Mauro Stettler <[email protected]>
Signed-off-by: Mauro Stettler <[email protected]>
Signed-off-by: Mauro Stettler <[email protected]>
Signed-off-by: Mauro Stettler <[email protected]>
Signed-off-by: Mauro Stettler <[email protected]>
…ed comments and threshold to 1000.
d7a75bc
to
56e3a4a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems plausible.
Given the title of the PR, I would expect some data on results, i.e. memory usage measured before and after.
Signed-off-by: Mauro Stettler <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for picking this up @pstibrany !
Your changes look good to me.
(I can't approve my own PR, otherwise I would)
Testing on Mimir ingesters with 7 tenants shows drop of memory allocated by |
#247 addresses memory usage of channel. |
Included changes: grafana/mimir-prometheus#131 grafana/mimir-prometheus#247 These should result is lower memory usage by chunk mapper.
Included changes: grafana/mimir-prometheus#131 grafana/mimir-prometheus#247 These should result is lower memory usage by chunk mapper.
Included changes: grafana/mimir-prometheus#131 grafana/mimir-prometheus#247 These should result is lower memory usage by chunk mapper.
* Update Prometheus with async chunk mapper changes. Included changes: grafana/mimir-prometheus#131 grafana/mimir-prometheus#247 These result is lower memory usage by chunk mapper. Signed-off-by: Peter Štibraný <[email protected]>
Sent to upstream Prometheus as prometheus/prometheus#10873. |
This avoids wasting memory on the
c.chunkRefMap
by re-initializing it regularly. When re-initializing it, it gets initialized with a size which is half of the peak usage of the time period since the last re-init event, for this we also track the peak usage and reset it on every re-init event.Very frequent re-initialization of the map would cause unnecessary allocations, to avoid that there are two factors which limit the frequency of the re-initializations:
10
(objects inc.chunkRefMap
).When re-initializing it we initialize it to half of the peak usage since the last re-init event to try to hit the sweet spot in the trade-off between initializing it to a very low size potentially resulting in many allocations to grow it, and initializing it to a large size potentially resulting in unused allocated memory.
With this solution we have the following advantages: