
[hdfs_jmx] An HDFS agent that uses JMX #2235

Merged
merged 1 commit into from
Feb 8, 2016

Conversation

@zachradtka (Contributor)

An agent check for HDFS that uses the JMX interface to gather HDFS metrics from individual HDFS data nodes.

The metrics collected are

hdfs.dfs_remaining                  The remaining disk space left in bytes
hdfs.storage_info                   Path to HDFS storage location
hdfs.dfs_capacity                   Disk capacity in bytes
hdfs.dfs_used                       Disk usage in bytes
hdfs.cache_capacity                 Cache capacity in bytes
hdfs.num_failed_volumes             Number of failed volumes
hdfs.last_volume_failure_date       Date the last volume failed
hdfs.estimated_capacity_lost_total  The estimated capacity lost in bytes
hdfs.num_blocks_cached              The number of blocks cached
hdfs.num_blocks_failed_to_cache     The number of blocks that failed to cache
hdfs.num_blocks_failed_to_uncache   The number of failed blocks to remove from cache
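
Hadoop daemons expose these values over HTTP through the built-in /jmx servlet, which returns JSON. A minimal sketch of how a check like this could map JMX attributes onto the metric names above; the sample payload, values, and the mapping subset are illustrative, not taken from the PR's actual code (FSDatasetState is a real DataNode MBean, though):

```python
import json

# Illustrative /jmx response body in the standard Hadoop JSON shape;
# the numeric values here are made up for the example.
SAMPLE_JMX = json.dumps({
    "beans": [{
        "name": "Hadoop:service=DataNode,name=FSDatasetState",
        "Remaining": 27914511007744,
        "Capacity": 41717338406912,
        "DfsUsed": 501932032,
        "NumFailedVolumes": 0,
    }]
})

# Subset of the JMX-attribute-to-metric mapping (illustrative)
ATTRIBUTE_TO_METRIC = {
    "Remaining": "hdfs.dfs_remaining",
    "Capacity": "hdfs.dfs_capacity",
    "DfsUsed": "hdfs.dfs_used",
    "NumFailedVolumes": "hdfs.num_failed_volumes",
}

def extract_metrics(jmx_body):
    """Pull the attributes we report out of a /jmx response body."""
    metrics = {}
    for bean in json.loads(jmx_body).get("beans", []):
        if "FSDatasetState" in bean.get("name", ""):
            for attr, metric in ATTRIBUTE_TO_METRIC.items():
                if attr in bean:
                    metrics[metric] = bean[attr]
    return metrics

print(extract_metrics(SAMPLE_JMX))
```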

Authors:
@zachradtka
@wjsl

@olivielpeau (Member)

Thanks @zachradtka and @wjsl for your contributions!

We'll review your PRs in depth soon.

@olivielpeau olivielpeau added this to the Triage milestone Feb 1, 2016
@zachradtka (Contributor, Author)

I just pushed a quick change that adds a few metrics for the HDFS namenode and splits the agent check into separate checks for the datanodes and the namenode.

The metrics for the namenode are as follows.

hdfs.namenode.capacity_total                    Total disk capacity in bytes
hdfs.namenode.capacity_used                     Disk usage in bytes
hdfs.namenode.capacity_remaining                Remaining disk space left in bytes
hdfs.namenode.total_load                        Total load on the file system
hdfs.namenode.fs_lock_queue_length              Lock queue length
hdfs.namenode.blocks_total                      Total number of blocks
hdfs.namenode.max_objects                       Maximum number of files HDFS supports
hdfs.namenode.files_total                       Total number of files
hdfs.namenode.pending_replication_blocks        Number of blocks pending replication
hdfs.namenode.under_replicated_blocks           Number of under replicated blocks
hdfs.namenode.scheduled_replication_blocks      Number of blocks scheduled for replication
hdfs.namenode.pending_deletion_blocks           Number of pending deletion blocks
hdfs.namenode.num_live_data_nodes               Total number of live data nodes
hdfs.namenode.num_dead_data_nodes               Total number of dead data nodes
hdfs.namenode.num_decom_live_data_nodes         Number of decommissioning live data nodes
hdfs.namenode.num_decom_dead_data_nodes         Number of decommissioning dead data nodes
hdfs.namenode.volume_failures_total             Total volume failures
hdfs.namenode.estimated_capacity_lost_total     Estimated capacity lost in bytes
hdfs.namenode.num_decommissioning_data_nodes    Number of decommissioning data nodes
hdfs.namenode.num_stale_data_nodes              Number of stale data nodes
hdfs.namenode.num_stale_storages                Number of stale storages
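
The namenode serves these counters from its own /jmx servlet. A sketch of how the request URL could be built; the hostname is hypothetical, 50070 is the Hadoop 2.x default namenode web port, and `?qry=` is the servlet's real bean filter:

```python
try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3

def jmx_url(host, port, bean):
    # The /jmx servlet accepts a ?qry= filter that narrows the response
    # to a single MBean instead of dumping every bean the daemon exposes.
    return "http://{0}:{1}/jmx?{2}".format(host, port, urlencode({"qry": bean}))

# Hypothetical host; FSNamesystemState is a real NameNode MBean.
print(jmx_url("namenode.example.com", 50070,
              "Hadoop:service=NameNode,name=FSNamesystemState"))
```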

Sorry for the late add, but I really felt these metrics would be helpful.

return MockResponse(body, 200)

class HDFSDataNode(AgentCheckTest):
CHECK_NAME = 'hdfs_jmx'
Review comment (Member):
Don't forget to update CHECK_NAME to hdfs_datanode ;)

@olivielpeau (Member)

Added a bunch of comments; most of them are nitpicks. The checks look good overall, thanks!

I've only added comments on the namenode check, but since the datanode check is pretty similar, could you take them into account for the datanode check as well?

One last thing: we try to keep our metrics' and service checks' prefixes similar to the checks' names, so could you rename all the metrics and service checks to hdfs_namenode.[...] and hdfs_datanode respectively?

Thanks again!

@zachradtka (Contributor, Author)

Thanks for the comments; I addressed all of them on both the datanode and namenode agent checks.


# Add query_params as arguments
if query_params:
query = '&'.join(['{}={}'.format(key, value) for key, value in query_params.iteritems()])
Review comment (Member):
We support python 2.6 so you have to use '{0}={1}'.format(key, value) here (i.e. number the fields)
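A quick sketch of the compatible form (the dict literal is hypothetical): explicit `{0}`/`{1}` indices work on Python 2.6, and `items()`, unlike `iteritems()`, also exists on Python 3:

```python
# Hypothetical query parameters; Python 2.6's str.format requires explicitly
# numbered fields, and items() works on both Python 2 and Python 3.
query_params = {"qry": "Hadoop:service=NameNode,name=FSNamesystemState"}

query = '&'.join('{0}={1}'.format(key, value)
                 for key, value in sorted(query_params.items()))
print(query)  # qry=Hadoop:service=NameNode,name=FSNamesystemState
```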

@olivielpeau (Member)

Thanks for addressing my comments! I've added a few more comments and a question; once they're addressed, the check should be good to go.

@zachradtka (Contributor, Author)

All comments are always welcome!

I have addressed all of the concerns for both the datanode and namenode checks. Let me know what to do next.

@olivielpeau (Member)

Added one comment; once it's addressed, could you squash your commits into one?

Thanks!

@zachradtka (Contributor, Author)

OK, all commits are squashed and rebased onto the latest master.

@olivielpeau olivielpeau modified the milestones: 5.7.0, Triage Feb 5, 2016
@olivielpeau (Member)

Thanks again!

Looks good, I'll merge once the CI passes.

olivielpeau added a commit that referenced this pull request Feb 8, 2016
[hdfs_jmx] An HDFS agent that uses JMX
@olivielpeau olivielpeau merged commit 7f95b83 into DataDog:master Feb 8, 2016