-
Notifications
You must be signed in to change notification settings - Fork 449
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add metrics to track time compaction jobs are queued #4980
Conversation
This adds 2 new groups of stats to track information about queued compaction jobs. The first stat is a timer that keeps track of when jobs are being polled and give information on how often/fast jobs are exiting the queue. The second group of stats is a min/max/avg and is tracking age information about how long jobs are waiting on the queue. This closes apache#4945
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we record a time of 0 if there's a job immediately available in both poll()?
I do not understand this question.
Maybe we just record a time of 0 for async when a future is completed?
I think that would be a good thing to do when future is successfully completed. That will cause the avg and count associated with the timer to reflect what is actually happening.
...r/src/main/java/org/apache/accumulo/manager/compaction/queue/CompactionJobPriorityQueue.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/org/apache/accumulo/manager/compaction/queue/CompactionJobPriorityQueue.java
Show resolved
Hide resolved
...r/src/main/java/org/apache/accumulo/manager/compaction/queue/CompactionJobPriorityQueue.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/org/apache/accumulo/manager/compaction/queue/CompactionJobPriorityQueue.java
Outdated
Show resolved
Hide resolved
Oops, I think that was supposed to say "Should we record a time of 0 if there's a job immediately in poll()" but that doesn't make sense either because we are using the jobAges map to record the delay so you can just ignore that question. |
The best way I know to test these types of things is to add the ability to inject a time source. That is somewhat doable in unit test, but not realistically achievable in the ITs. |
@keith-turner - I made a few changes and I think this is ready for review again:
|
One thing to note as I was working on this was I started to wonder if it made sense to get rid of the jobAges map and just move the Timer into the TabletJobs object as it's always a 1 to 1 relationship that we have to keep in sync. This would make managing it easier, but the primary downside is being unable to iterate over the timers to collect the current stats without locking. Right now with the separate map it's a concurrent hashmap so we can iterate and collect data as a snapshot in the memoized field and I think that makes it worthwhile to keep separate, even if it's more work to make things correct. |
This adds 2 new groups of stats to track information about queued compaction jobs. The first stat is a timer that keeps track of when jobs are being polled and give information on how often/fast jobs are exiting the queue. The second group of stats is a min/max/avg and is tracking age information about how long jobs are waiting on the queue.
This closes #4945
This is a draft for now because there is a couple outstanding things to do and it probably still needs a little bit of polishing but I wanted to post what I had to get feedback. The tests could also be improved a bit I think. I was trying to think of a good way to check that the stats are correct but the values are going to be non-deterministic as it's all timing in based so it would be hard to check exact values.
Todo/questions: