
[metricbeat] Raid metricset with expanded disk states, using /sys/block #11613

Merged · 16 commits · Apr 23, 2019

Conversation

fearful-symmetry
Contributor

@fearful-symmetry fearful-symmetry commented Apr 2, 2019

I suggest y'all read #11292 first, to catch up on the conversation around expanding the RAID metricset data, as well as #5600, the original issue around expanding the data collection for RAID. This PR is meant as a sort of comparison to #11292, and addresses some issues that the earlier PR doesn't.

Summary: our current implementation of the RAID metricset uses /proc/mdstat, which has a number of limitations:

  • This is only available on Linux. The procfs compat layer for FreeBSD does not include mdstat.
  • This is meant as a human-readable interface, so the format is fairly unstable (I found a number of different reporting forms between different /proc/mdstat samples, presumably from different kernel versions) and not well documented as far as machine readability goes. Also, lots of regex. Ew.
  • Compared to other interfaces, the data we can get from it is limited.

My previous PR (#11292) uses ioctl to gather expanded disk metrics, in line with the request made in #5600. While this works, it adds an extra dependency, and creating a mock test harness around ioctl is rather ugly.

In addition to expanding the disk metrics we get, this implementation uses /sys/block and only /sys/block. No mdstat. This means we have one dependency that can be easily mock-tested using regular files, though we trade regex rules for lots of file operations and directory traversal. It also provides us with a much richer source of data compared to mdstat.

Unlike the previous PR, I wrote this with a longing glance towards cross-compatibility. Like our current implementation, this is linux-only. However, I did try to address a complicated problem, which is how different OSes report RAID status. This is the primary reason why this is a draft PR, as I'd like input from others with regards to how we do this.

Right now, we report two disk states: total and available. This PR adds failed and spare. However, the sysfs subsystem provides a number of different statuses that disks can have:

  • faulty
  • in_sync
  • writemostly
  • blocked
  • spare
  • write_error
  • want_replacement
  • replacement

You can read more about the states in the kernel's md documentation.

FreeBSD has a number of states, mostly different from Linux (usual caveat: I'm not a FreeBSD expert):
- NODISK
- NONE
- NEW
- SYNCHRONIZING
- DISCONNECTED
- OFFLINE
- DISABLED
- FAILED
- STALE_FAILED
- SPARE
- STALE
- ACTIVE
- DESTROY
- INVALID

(see /sys/geom on freebsd)

I spent a lot of time thinking about how we want to handle this. My current solution (which I'm not married to, by any means) is to "standardize" them into active, total, spare, and failed, and provide an array with the original states for the sake of data preservation. This is somewhat problematic, as it requires some editorializing as to what constitutes "active" and "failed" and so on. It can also be problematic if kernel updates add new statuses. (Do we assume an unknown status is good or bad?)
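The "editorializing" step could look something like the following sketch. The bucket names and the mapping itself are illustrative choices for discussion, not the final mapping the PR shipped; the `unknown` bucket is one answer to the "new kernel statuses" question.

```go
package main

import "fmt"

// categorize maps a raw Linux sysfs disk state onto one of the standardized
// buckets discussed above. Which raw state lands in which bucket is exactly
// the opinionated part; this particular assignment is a hypothetical example.
func categorize(raw string) string {
	switch raw {
	case "in_sync", "replacement", "writemostly":
		return "active"
	case "spare", "want_replacement":
		return "spare"
	case "faulty", "blocked", "write_error":
		return "failed"
	default:
		// States added by future kernels land here, so they stay visible
		// instead of being silently assumed good or bad.
		return "unknown"
	}
}

func main() {
	for _, s := range []string{"in_sync", "spare", "faulty", "journal"} {
		fmt.Printf("%s -> %s\n", s, categorize(s))
	}
}
```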

The other solution is to keep the strings the same, and just reduce them, so {"in_sync","in_sync","spare"} becomes {"in_sync":2, "spare":1} in the final mapping. This has some implications for mapping, and could make alerting/visualizing harder. On the other hand, it's completely un-opinionated. This doesn't take into account MacOS and Windows, where I have no idea how RAID status is reported. Although it's not included in this initial commit, adding FreeBSD support to this shouldn't be too difficult.

Some other things to take note of:

  • activity_state was turned into status and the string status codes are different from mdstat.
  • Aside from this, I did try to make sure we were backwards-compatible with how the mdstat metricset worked.
  • This doesn't need to use only sysfs. We could use /sys/block for expanded disk states, and continue to use mdstat for backwards compatibility or other things. I have no problems with this.
  • There's a lot more data we could mine from /sys/block if we wanted. Error counters, and other things. Check out the above docs link.
  • We added a sync_action field, which further expands on the data gathered.
  • I added a little mock sysfs filesystem to our testdata. In real life, I've tested this extensively on my home lab, which has an x86 node with raid0 and raid1 arrays.

@ruflin
Collaborator

ruflin commented Apr 3, 2019

Thanks for the detailed PR description, much appreciated.

I like the idea of having the standardised fields + the originals. For the data structure I prefer your suggestion with "states": {"in_sync":2, "spare":1} over repeating the entries, this will make querying easier.

We then can have something like "raw_states": {"foo":2, "bar":7}. Template-wise we can solve it with dynamic templates, as we know all the fields inside will be integers (correct?).

For our visualisations I would only use the standardised states and if we have a state that we don't know, we count it as unknown.

@fearful-symmetry
Contributor Author

@ruflin So you're saying we only use the standardized states inside visualizations, and all reporting is just the 'raw' states we get from the subsystems?

All the values should be integers, yeah. Don't think I'm familiar with dynamic templates.

@ruflin
Collaborator

ruflin commented Apr 3, 2019

I think we should also report the standardised states in the event itself and these are the ones we use for the visualisation, but the same doc will also contain the raw states.

For dynamic templates, have a look here: https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html And for an example in Beats: https://github.com/elastic/beats/blob/master/metricbeat/module/docker/cpu/_meta/fields.yml#L39

@fearful-symmetry
Contributor Author

Ahh, alright.

So we have standardized states, then keep the raw_states the same, but move it to a dict. If we're keeping the original status strings in a dict, do we want to keep our current standardized states (working, total, failed, spare) as-is?

@ruflin
Collaborator

ruflin commented Apr 4, 2019

Yes, assuming what you mean by dict is looking something like this:

{
  ...
  "states": {
    "in_sync": 2,
    "spare": 1
  },
  "raw_states": {
    "foo": 2,
    "bar": 7
  }
  ...
}

To make it a non-breaking change, we could keep states at the level it is at the moment; this would probably be a bit more user friendly, even though not 100% clean.

Do the current state names match with what you have in mind for standardisation?

@fearful-symmetry
Contributor Author

@ruflin The current standardized names are total, working, failed or spare and I'm hoping that states reported by other OSes can be crammed into one of those categories.

@fearful-symmetry
Contributor Author

Part of me still wants to forgo the standardized fields and just use raw names (maybe keeping the total/active fields), particularly as we're now leaning towards just having two maps reporting nearly the same data.

@ruflin
Collaborator

ruflin commented Apr 4, 2019

In any case for the existing platforms we need to keep the old names around for a bit as otherwise it would be a breaking change. We could try for the additional platforms to only use the raw values. The problem I see with that is that if we want to build a dashboard, we need to build one for each platform?

@fearful-symmetry
Contributor Author

Right now our only existing disk states are total and working, and keeping those around should be fine. But yah, I see your point, this makes a handful of other things harder.

@ruflin
Collaborator

ruflin commented Apr 4, 2019

One more idea: We standardise but make the raw states opt-in through a config flag. Like this the base event stays simple.

@fearful-symmetry
Contributor Author

That could work!

The thing I'm more worried about is the standardized fields. I'm having trouble finding docs for FreeBSD's geom raid states, and some other platforms might be just as bad. We also probably won't find out about new states in kernel updates until someone files an issue. Considering the other things we have going on across the systems module, that's probably not too bad though?

@fearful-symmetry
Contributor Author

I suppose that could also be a "cross that bridge when we get to it" problem, as right now this is still linux-only.

@fearful-symmetry fearful-symmetry added the Team:Integrations Label for the Integrations team label Apr 4, 2019
@ruflin
Collaborator

ruflin commented Apr 5, 2019

+1 on "cross that bridge when we get there" and apply the 80/20 rule :-)

@fearful-symmetry
Contributor Author

@ruflin the latest commit I put in last night moves the raw states under disks; the end data now looks like this:

   "system": {
     "raid": {
       "status": "clean",
       "disks": {
         "active": 2,
         "total": 3,
         "spare": 1,
         "failed": 0,
         "states": {
           "in_sync": 2,
           "spare": 1
         }
       },
       "blocks": {
         "synced": 4189184,
         "total": 4189184
       },
       "sync_action": "idle",
       "name": "md0"
     }
   }
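The sample above can be assembled as a nested map; this sketch is illustrative (buildEvent is a hypothetical helper, not the metricset's actual code, and the rule that "faulty" counts as failed is an assumed mapping):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildEvent assembles a per-device event in the shape shown above, deriving
// the standardized disk counters from the raw state counts. The derivation
// rules here (spare counts as spare, faulty counts as failed, everything else
// as active) are an assumption for illustration.
func buildEvent(name, status, syncAction string, states map[string]int) map[string]interface{} {
	total, spare, failed := 0, 0, 0
	for s, n := range states {
		total += n
		switch s {
		case "spare":
			spare += n
		case "faulty":
			failed += n
		}
	}
	return map[string]interface{}{
		"name":        name,
		"status":      status,
		"sync_action": syncAction,
		"disks": map[string]interface{}{
			"active": total - spare - failed,
			"total":  total,
			"spare":  spare,
			"failed": failed,
			"states": states,
		},
	}
}

func main() {
	ev := buildEvent("md0", "clean", "idle", map[string]int{"in_sync": 2, "spare": 1})
	out, _ := json.MarshalIndent(ev, "", "  ")
	fmt.Println(string(out))
}
```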

@ruflin
Collaborator

ruflin commented Apr 8, 2019

@fearful-symmetry What you have in the above comment LGTM.

Member

@jsoriano jsoriano left a comment


I like that we can get all the info from a single place! This is looking good. I have added some comments. I'd be fine with continuing with this idea, and we can leave support for other implementations for the future.

	// Fetch fetches one event for each device
	func (m *MetricSet) Fetch(r mb.ReporterV2) {
	-	stats, err := m.fs.ParseMDStat()
	+	devices, err := blockinfo.ListAllMDDevices(m.sysfs)
Member


Some ideas to make this more independent of the OS and the raid subsystem:

  • Make blockinfo an object that implements an interface
  • This object lists directly the objects, so nothing like GetMDDevice is needed later

So at the end you can have something like:

	devices, err := m.blockinfo.ListAll()
	...
	for _, device := range devices {
	    disks := device.Disks()
	    ...
	}

Also take into account that even on Linux there can be different RAID implementations.

Contributor Author


Make blockinfo an object that implements an interface

I actually had this idea fairly early on, but ended up not implementing it for the sake of getting a working PoC down. I should do that.
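The interface idea being quoted could be sketched roughly as follows. All names here are illustrative (`Lister`, `Device`, the mock types), not what the PR actually defines; the point is that the sysfs backend, a future FreeBSD geom backend, or a test double can all sit behind the same interface.

```go
package main

import "fmt"

// Device is one RAID device, regardless of which backend produced it.
type Device interface {
	Name() string
	Disks() []string // raw per-disk states
}

// Lister enumerates devices; blockinfo would implement this for sysfs,
// and other backends (or test mocks) can implement it too.
type Lister interface {
	ListAll() ([]Device, error)
}

// mockDevice and mockLister show how a test double satisfies the interface
// without touching the filesystem.
type mockDevice struct {
	name  string
	disks []string
}

func (d mockDevice) Name() string    { return d.name }
func (d mockDevice) Disks() []string { return d.disks }

type mockLister struct{ devices []Device }

func (l mockLister) ListAll() ([]Device, error) { return l.devices, nil }

func main() {
	var bi Lister = mockLister{devices: []Device{
		mockDevice{name: "md0", disks: []string{"in_sync", "in_sync", "spare"}},
	}}
	devices, _ := bi.ListAll()
	for _, device := range devices {
		fmt.Println(device.Name(), device.Disks())
	}
}
```

With this shape, the Fetch loop only ever talks to `Lister` and `Device`, so nothing like GetMDDevice is needed later.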

@jsoriano
Member

jsoriano commented Apr 8, 2019

The current standardized names are total, working, failed or spare and I'm hoping that states reported by other OSes can be crammed into one of those categories.

+1 to these categories. I am only missing something for synced/synchronizing — can we get this per disk in Linux? Are disks being synchronized included in working?

@fearful-symmetry
Contributor Author

I am only missing something for synced/synchronizing, can we get this per disk in Linux? are disks being synchronized included in working?

I believe so, I can't find anything suggesting an explicit resync state. If you force a repair on the array, the disks will stay in_sync unless something goes wrong.

@fearful-symmetry
Contributor Author

Just pushed a commit that fixes most of the issues Jaime mentioned. Will implement this next:

Make blockinfo an object that implements an interface
This object lists directly the objects, so nothing like GetMDDevice is needed later

@fearful-symmetry
Contributor Author

Just pushed another commit to simplify the API that actually reads from sysfs, so now we have a single array that's returned from a ListAll call.

I'm not sure if there's a way to further 'reduce' this for the sake of cross-compatibility. Most of the cross-compatible system metricsets end up populating a struct, and we have a fairly large number of fields that eventually need to make their way to the metricset.

@fearful-symmetry
Contributor Author

Okay, just made another commit that simplifies the whole interface to /sys/block.

Looking at how other parts of the system module do cross-compatibility, it looks like gosigar declares all its structs in a global file, with method implementations in per-target files. This is a step towards that: we have two files that are just the linux implementation, and a global file that declares the structs.

@fearful-symmetry
Contributor Author

jenkins, test this

@ruflin
Collaborator

ruflin commented Apr 15, 2019

@fearful-symmetry I think it's time to get this out of a "Draft" ;-)

@fearful-symmetry fearful-symmetry marked this pull request as ready for review April 15, 2019 13:09
@fearful-symmetry fearful-symmetry requested a review from a team as a code owner April 15, 2019 13:09
@fearful-symmetry fearful-symmetry requested a review from a team April 15, 2019 13:10
@fearful-symmetry
Contributor Author

Finally added an entry to the changelog.

Member

@jsoriano jsoriano left a comment


Sorry, I missed that the unknown count was still around, and a small detail on the docs. For the rest it LGTM.

@fearful-symmetry
Contributor Author

jenkins, test this

Contributor

@kaiyan-sheng kaiyan-sheng left a comment


LGTM 👍

@fearful-symmetry fearful-symmetry merged commit 3d95769 into elastic:master Apr 23, 2019