[mongo] Add WiredTiger metrics #1825

Closed

benmccann
Contributor

This adds new metrics from MongoDB 3.0 that aren't already included in #1798.

@benmccann benmccann closed this Aug 11, 2015
@benmccann benmccann reopened this Aug 11, 2015
@benmccann benmccann force-pushed the wiredtiger branch 3 times, most recently from 2296edd to 5b04379 Compare August 12, 2015 18:28
@benmccann benmccann changed the title MongoDB: Add WiredTiger metrics [mongo] Add WiredTiger metrics Aug 12, 2015
@yannmh
Member

yannmh commented Aug 12, 2015

It looks great, thanks a lot @benmccann !

We froze the 5.5.0 agent state in order to focus on polishing the release, and fixing the last bugs. As you are introducing a lot of new metrics, we think it'd be more reasonable to schedule it for the 5.6.0 agent release. We'll likely add more tests, in the meantime, to assert that all metrics are properly collected.

Of course, you can still run this check as a custom one to replace the default, until everything is released.

@yannmh yannmh added this to the 5.6.0 milestone Aug 12, 2015
@yannmh yannmh self-assigned this Aug 12, 2015
@benmccann
Contributor Author

Happy to help. I understand waiting, though I certainly wouldn't mind getting this into 5.5.0 either. If there's anything I can do, testing or otherwise, that would give you enough confidence in the change to get it in now, I'd be happy to do it. It doesn't seem particularly high risk compared to other changes going in, so I'm happy to help vet it. I've already deployed the change to our own servers and it's working well there.

@benmccann
Contributor Author

I'm so in love with Datadog right now. Just got weeks of persistent outages figured out due to the data we gathered with it. Some of the data in this PR was a big part of that, so really hoping we can get it in :-) How long do you have between releases typically?

@yannmh
Member

yannmh commented Aug 14, 2015

Thank you so much for your support @benmccann, it's much appreciated. I'll transmit your words to the team 😃

We typically try to keep a 6-7 week release schedule. We mainly focus on new features during the first 3-4 weeks, then 'freeze' the release to polish it and land bugfixes.

As your changes add a lot of metrics, we think it's more reasonable to postpone them to the 5.6.0 agent release. That would give us some time to assess the changes.
In particular, do you think we should add all of these new metrics by default? Right now the check covers the most common usage. I am worried that adding 150+ metrics could be confusing.
A first alternative would be to have some options in the YAML configuration file to 'enable' different levels of metrics. We could go even further and have a 'custom metrics' section, so everyone would be able to add the metrics they want.
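
A minimal sketch of how the 'levels' idea could look on the check side, assuming the YAML instance has already been parsed into a dict. The option names `additional_metrics` and `custom_metrics`, the group names, and the split between default and optional metrics are all hypothetical and chosen only for illustration; they are not the agent's actual configuration.

```python
# Hypothetical defaults: a small set collected for every instance.
DEFAULT_METRICS = [
    "wiredTiger.cache.bytes currently in the cache",
    "wiredTiger.cache.maximum bytes configured",
    "wiredTiger.concurrentTransactions.read.available",
    "wiredTiger.concurrentTransactions.write.available",
]

# Hypothetical opt-in groups ("levels") that a user enables by name.
OPTIONAL_METRICS = {
    "wiredtiger_log": [
        "wiredTiger.log.log bytes written",
        "wiredTiger.log.log sync operations",
    ],
    "wiredtiger_cursor": [
        "wiredTiger.cursor.cursor insert calls",
        "wiredTiger.cursor.cursor search calls",
    ],
}


def select_metrics(instance):
    """Return the metric names to collect for one parsed YAML instance."""
    selected = list(DEFAULT_METRICS)
    for group in instance.get("additional_metrics", []):
        if group not in OPTIONAL_METRICS:
            raise ValueError("unknown metric group: %s" % group)
        selected.extend(OPTIONAL_METRICS[group])
    # Hypothetical 'custom metrics' section: names are passed through as-is.
    selected.extend(instance.get("custom_metrics", []))
    return selected


# Example instance, as it might look after loading the YAML file.
instance = {
    "additional_metrics": ["wiredtiger_log"],
    "custom_metrics": ["wiredTiger.session.open cursor count"],
}
for name in select_metrics(instance):
    print(name)
```

With this shape, an instance that sets neither option keeps the current default behaviour, which addresses the concern about surprising users with 150+ new metrics.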

@irabinovitch any thoughts?

Best,

@benmccann
Contributor Author

OK, I'll trim this down to the most useful ones. Posting the whole list here in case we want to revisit it in the future:

    "wiredTiger.LSM.sleep for LSM checkpoint throttle",
    "wiredTiger.LSM.sleep for LSM merge throttle",
    "wiredTiger.LSM.rows merged in an LSM tree",
    "wiredTiger.LSM.application work units currently queued",
    "wiredTiger.LSM.merge work units currently queued",
    "wiredTiger.LSM.tree queue hit maximum",
    "wiredTiger.LSM.switch work units currently queued",
    "wiredTiger.LSM.tree maintenance operations scheduled",
    "wiredTiger.LSM.tree maintenance operations discarded",
    "wiredTiger.LSM.tree maintenance operations executed",
    "wiredTiger.async.number of allocation state races",
    "wiredTiger.async.number of operation slots viewed for allocation",
    "wiredTiger.async.current work queue length",
    "wiredTiger.async.number of flush calls",
    "wiredTiger.async.number of times operation allocation failed",
    "wiredTiger.async.maximum work queue length",
    "wiredTiger.async.number of times worker found no work",
    "wiredTiger.async.total allocations",
    "wiredTiger.async.total compact calls",
    "wiredTiger.async.total insert calls",
    "wiredTiger.async.total remove calls",
    "wiredTiger.async.total search calls",
    "wiredTiger.async.total update calls",
    "wiredTiger.block-manager.mapped bytes read",
    "wiredTiger.block-manager.bytes read",
    "wiredTiger.block-manager.bytes written",
    "wiredTiger.block-manager.mapped blocks read",
    "wiredTiger.block-manager.blocks pre-loaded",
    "wiredTiger.block-manager.blocks read",
    "wiredTiger.block-manager.blocks written",
    "wiredTiger.cache.tracked dirty bytes in the cache",
    "wiredTiger.cache.tracked bytes belonging to internal pages in the cache",
    "wiredTiger.cache.bytes currently in the cache",
    "wiredTiger.cache.tracked bytes belonging to leaf pages in the cache",
    "wiredTiger.cache.maximum bytes configured",
    "wiredTiger.cache.tracked bytes belonging to overflow pages in the cache",
    "wiredTiger.cache.bytes read into cache",
    "wiredTiger.cache.bytes written from cache",
    "wiredTiger.cache.pages evicted by application threads",
    "wiredTiger.cache.checkpoint blocked page eviction",
    "wiredTiger.cache.unmodified pages evicted",
    "wiredTiger.cache.page split during eviction deepened the tree",
    "wiredTiger.cache.modified pages evicted",
    "wiredTiger.cache.pages selected for eviction unable to be evicted",
    "wiredTiger.cache.pages evicted because they exceeded the in-memory maximum",
    "wiredTiger.cache.pages evicted because they had chains of deleted items",
    "wiredTiger.cache.failed eviction of pages that exceeded the in-memory maximum",
    "wiredTiger.cache.hazard pointer blocked page eviction",
    "wiredTiger.cache.internal pages evicted",
    "wiredTiger.cache.maximum page size at eviction",
    "wiredTiger.cache.eviction server candidate queue empty when topping up",
    "wiredTiger.cache.eviction server candidate queue not empty when topping up",
    "wiredTiger.cache.eviction server evicting pages",
    "wiredTiger.cache.eviction server populating queue, but not evicting pages",
    "wiredTiger.cache.eviction server unable to reach eviction goal",
    "wiredTiger.cache.pages split during eviction",
    "wiredTiger.cache.pages walked for eviction",
    "wiredTiger.cache.eviction worker thread evicting pages",
    "wiredTiger.cache.in-memory page splits",
    "wiredTiger.cache.percentage overhead",
    "wiredTiger.cache.tracked dirty pages in the cache",
    "wiredTiger.cache.pages currently held in the cache",
    "wiredTiger.cache.pages read into cache",
    "wiredTiger.cache.pages written from cache",
    "wiredTiger.connection.pthread mutex condition wait calls",
    "wiredTiger.connection.files currently open",
    "wiredTiger.connection.memory allocations",
    "wiredTiger.connection.memory frees",
    "wiredTiger.connection.memory re-allocations",
    "wiredTiger.connection.total read I/Os",
    "wiredTiger.connection.pthread mutex shared lock read-lock calls",
    "wiredTiger.connection.pthread mutex shared lock write-lock calls",
    "wiredTiger.connection.total write I/Os",
    "wiredTiger.cursor.cursor create calls",
    "wiredTiger.cursor.cursor insert calls",
    "wiredTiger.cursor.cursor next calls",
    "wiredTiger.cursor.cursor prev calls",
    "wiredTiger.cursor.cursor remove calls",
    "wiredTiger.cursor.cursor reset calls",
    "wiredTiger.cursor.cursor search calls",
    "wiredTiger.cursor.cursor search near calls",
    "wiredTiger.cursor.cursor update calls",
    "wiredTiger.data-handle.connection dhandles swept",
    "wiredTiger.data-handle.connection candidate referenced",
    "wiredTiger.data-handle.connection sweeps",
    "wiredTiger.data-handle.connection time-of-death sets",
    "wiredTiger.data-handle.session dhandles swept",
    "wiredTiger.data-handle.session sweep attempts",
    "wiredTiger.log.log buffer size increases",
    "wiredTiger.log.total log buffer size",
    "wiredTiger.log.log bytes of payload data",
    "wiredTiger.log.log bytes written",
    "wiredTiger.log.yields waiting for previous log file close",
    "wiredTiger.log.total size of compressed records",
    "wiredTiger.log.total in-memory size of compressed records",
    "wiredTiger.log.log records too small to compress",
    "wiredTiger.log.log records not compressed",
    "wiredTiger.log.log records compressed",
    "wiredTiger.log.maximum log file size",
    "wiredTiger.log.pre-allocated log files prepared",
    "wiredTiger.log.number of pre-allocated log files to create",
    "wiredTiger.log.pre-allocated log files used",
    "wiredTiger.log.log read operations",
    "wiredTiger.log.log release advances write LSN",
    "wiredTiger.log.records processed by log scan",
    "wiredTiger.log.log scan records requiring two reads",
    "wiredTiger.log.log scan operations",
    "wiredTiger.log.consolidated slot closures",
    "wiredTiger.log.logging bytes consolidated",
    "wiredTiger.log.consolidated slot joins",
    "wiredTiger.log.consolidated slot join races",
    "wiredTiger.log.slots selected for switching that were unavailable",
    "wiredTiger.log.record size exceeded maximum",
    "wiredTiger.log.failed to find a slot large enough for record",
    "wiredTiger.log.consolidated slot join transitions",
    "wiredTiger.log.log sync operations",
    "wiredTiger.log.log sync_dir operations",
    "wiredTiger.log.log server thread advances write LSN",
    "wiredTiger.log.log write operations",
    "wiredTiger.reconciliation.page reconciliation calls",
    "wiredTiger.reconciliation.page reconciliation calls for eviction",
    "wiredTiger.reconciliation.split bytes currently awaiting free",
    "wiredTiger.reconciliation.split objects currently awaiting free",
    "wiredTiger.session.open cursor count",
    "wiredTiger.session.open session count",
    "wiredTiger.thread-yield.page acquire busy blocked",
    "wiredTiger.thread-yield.page acquire eviction blocked",
    "wiredTiger.thread-yield.page acquire locked blocked",
    "wiredTiger.thread-yield.page acquire read blocked",
    "wiredTiger.thread-yield.page acquire time sleeping (usecs)",
    "wiredTiger.transaction.transaction begins",
    "wiredTiger.transaction.transaction checkpoints",
    "wiredTiger.transaction.transaction checkpoint generation",
    "wiredTiger.transaction.transaction checkpoint currently running",
    "wiredTiger.transaction.transaction checkpoint max time (msecs)",
    "wiredTiger.transaction.transaction checkpoint min time (msecs)",
    "wiredTiger.transaction.transaction checkpoint most recent time (msecs)",
    "wiredTiger.transaction.transaction checkpoint total time (msecs)",
    "wiredTiger.transaction.transactions committed",
    "wiredTiger.transaction.transaction failures due to cache overflow",
    "wiredTiger.transaction.transaction range of IDs currently pinned by a checkpoint",
    "wiredTiger.transaction.transaction range of IDs currently pinned",
    "wiredTiger.transaction.transactions rolled back",
    "wiredTiger.concurrentTransactions.write.out",
    "wiredTiger.concurrentTransactions.write.available",
    "wiredTiger.concurrentTransactions.write.totalTickets",
    "wiredTiger.concurrentTransactions.read.out",
    "wiredTiger.concurrentTransactions.read.available",
    "wiredTiger.concurrentTransactions.read.totalTickets",

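For readers wondering how names like these are used, here is a minimal sketch. It assumes each entry is a dotted path into the nested document returned by MongoDB's `serverStatus` command, and that no key itself contains a dot (true for the list above). The helper names `resolve` and `normalize`, the normalization scheme, and the sample values are hypothetical and not taken from the PR.

```python
import re


def resolve(status, dotted_name):
    """Walk a nested dict following a dotted path such as 'a.b.c d'."""
    value = status
    for key in dotted_name.split("."):
        value = value[key]  # raises KeyError if the server lacks this stat
    return value


def normalize(dotted_name):
    """Turn a raw path into a lowercase, underscore-separated metric name."""
    cleaned = re.sub(r"[^a-z0-9.]+", "_", dotted_name.lower())
    return cleaned.strip("_")


# Made-up sample shaped like a fragment of serverStatus output.
sample_status = {
    "wiredTiger": {
        "cache": {"bytes currently in the cache": 123456},
        "concurrentTransactions": {"read": {"available": 128}},
    }
}

name = "wiredTiger.cache.bytes currently in the cache"
print(normalize(name), resolve(sample_status, name))
```
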
@benmccann
Contributor Author

@yannmh I toned this down a lot and only added the really useful ones. Many of the others seem to be either always 0 or monotonically increasing counters of the form "number of times operation X has been performed since the server started". If Datadog can transform those so that each minute it takes the increase since the last data point and reports operations per minute, they may be more useful. As it stands, most of them don't provide value in this form.
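
The counter-to-rate transformation described here can be sketched as follows. This is a standalone illustration, not the agent's implementation: the function name `per_second_rate` and the choice to skip a sample when the counter goes backwards (for example after a server restart) are assumptions.

```python
def per_second_rate(prev, curr):
    """Convert two samples of a monotonic counter into a per-second rate.

    Each sample is a (timestamp_in_seconds, counter_value) tuple.
    Returns None when no meaningful rate can be computed.
    """
    (t0, v0), (t1, v1) = prev, curr
    if t1 <= t0:
        return None  # samples out of order or taken at the same instant
    if v1 < v0:
        return None  # counter went backwards, e.g. the server restarted
    return (v1 - v0) / float(t1 - t0)


# Two made-up samples of a "cursor insert calls" style counter, 60 s apart.
previous = (0, 1000)
current = (60, 1600)

rate = per_second_rate(previous, current)
print("per second:", rate)
print("per minute:", rate * 60)
```

Reported this way, a counter that only ever grows becomes a graph of activity over time, which is usually what one wants to alert on.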

@benmccann benmccann force-pushed the wiredtiger branch 3 times, most recently from 780e286 to 7cc307d Compare August 20, 2015 18:22
@benmccann
Contributor Author

Any thoughts about getting this in now that 5.5 has been released? I'd like to get it in while it's still somewhat fresh in my mind.

@yannmh
Member

yannmh commented Oct 16, 2015

I rebased your work to sync with #1979. Please let me know what you think about it 😄
I opened a new #1980 PR.

@yannmh yannmh closed this Oct 16, 2015