Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added function to return pretty names on "system" module "process summary" metricset #8275

Merged
merged 4 commits into from
Sep 14, 2018

Conversation

sayden
Copy link
Contributor

@sayden sayden commented Sep 10, 2018

Following this discuss https://discuss.elastic.co/t/errors-in-metric-beat-log/147918/3 It's true that I found that the error message was a bit cryptic.

According to this http://thelinuxstuff.blogspot.com/2012/08/process-state-codes-in-ps-output.html the state 88, which corresponds to X state in a linux process corresponds to dead state.

I thought it could be nice to have a bit more descriptive error message in such case to help user debugging his own problems:

So, in the case of the discuss issue where the message had this format:

Unknown state <88> for process with pid 30794

It will become this:

Unknown or unexpected state <X> (dead) for process with pid 30794

BTW, I couldn't manage to compile any test, system was complaining about some redefined variables:

GOROOT=/usr/lib/go #gosetup
GOPATH=/home/caster/go #gosetup
/usr/lib/go/bin/go test -c -i -tags "darwin freebsd linux windows" -o /tmp/___TestFetch_in_github_com_elastic_beats_metricbeat_module_system_process_summary github.com/elastic/beats/metricbeat/module/system/process_summary #gosetup
# runtime
/usr/lib/go/src/runtime/lock_sema.go:30:20: active_spin redeclared in this block
	previous declaration at /usr/lib/go/src/runtime/lock_futex.go:30:20
/usr/lib/go/src/runtime/lock_sema.go:31:20: active_spin_cnt redeclared in this block
	previous declaration at /usr/lib/go/src/runtime/lock_futex.go:31:20
/usr/lib/go/src/runtime/lock_sema.go:32:20: passive_spin redeclared in this block
	previous declaration at /usr/lib/go/src/runtime/lock_futex.go:32:20
/usr/lib/go/src/runtime/lock_sema.go:35:14: lock redeclared in this block
	previous declaration at /usr/lib/go/src/runtime/lock_futex.go:46:14
/usr/lib/go/src/runtime/lock_sema.go:94:16: unlock redeclared in this block
	previous declaration at /usr/lib/go/src/runtime/lock_futex.go:106:16
/usr/lib/go/src/runtime/lock_sema.go:124:19: noteclear redeclared in this block
	previous declaration at /usr/lib/go/src/runtime/lock_futex.go:126:19
/usr/lib/go/src/runtime/lock_sema.go:128:20: notewakeup redeclared in this block
	previous declaration at /usr/lib/go/src/runtime/lock_futex.go:130:20
/usr/lib/go/src/runtime/lock_sema.go:151:19: notesleep redeclared in this block
	previous declaration at /usr/lib/go/src/runtime/lock_futex.go:139:19
/usr/lib/go/src/runtime/lock_sema.go:180:68: notetsleep_internal redeclared in this block
	previous declaration at /usr/lib/go/src/runtime/lock_futex.go:164:45
/usr/lib/go/src/runtime/lock_sema.go:263:36: notetsleep redeclared in this block
	previous declaration at /usr/lib/go/src/runtime/lock_futex.go:210:36
/usr/lib/go/src/runtime/lock_sema.go:263:36: too many errors

@jsoriano
Copy link
Member

If it is possible to have "dead" processes, as it seems by the discuss thread and this stackexchange comment, I'd propose to count or ignore them and in any case don't report them as unexpected state to don't flood logs with something that is not really an error.

@@ -96,7 +96,7 @@ func (m *MetricSet) Fetch() (common.MapStr, error) {
case 'Z':
summary.zombie++
default:
logp.Err("Unknown state <%v> for process with pid %d", state.State, pid)
logp.Err("Unknown or unexpected state <%c> (%s) for process with pid %d", state.State, prettyNameOf(state.State), pid)
Copy link
Member

@jsoriano jsoriano Sep 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the change to print the character value, this is much better than a random number 😄 But I'd keep this error log line only for unknown values, and not for not-so-expected states as could be the dead one.

@jsoriano
Copy link
Member

Regarding the error on tests, how are you running them? this seems to be adding all platforms as tags, what will include the implementations for all platforms, this is why you see redefined declarations. You should be able to run the tests for this metricset with

go test ./metricbeat/module/system/process_summary/...

@sayden
Copy link
Contributor Author

sayden commented Sep 11, 2018

Found the solution to the test error (in the provided log you can see that it's trying to compile it to all platforms, I just needed to only set the platform linux). It was Goland that sets a lot of flags when running tests through the UI. Running them the easy way through the command line works fine.

Ok, so:

  • We stay with the character instead of the ASCII number.
  • I wouldn't hide the log. I understand that it is a hassle to flood the logs with this messages but it also helps users with debugging.
  • dead is, currently, an unexpected process. I wouldn't like to hide it (means, to say that it's unknown). Instead I'd prefer to count it too and report it but this causes a change in the common.MapStr schema that I don't know if it can affect somewhere (I guess it won't but I prefer to be cautious)

I still have to dig a bit more in the jenkins error, I'm not sure about what's the problem

@jsoriano
Copy link
Member

jsoriano commented Sep 11, 2018

I wouldn't hide the log. I understand that it is a hassle to flood the logs with this messages but it also helps users with debugging.

I meant to hide the log for the case of dead, that is a possible known case, for unknown cases we should continue logging them. My concern is that we are logging as error something that is a possible value.

dead is, currently, an unexpected process. I wouldn't like to hide it (means, to say that it's unknown). Instead I'd prefer to count it too and report it but this causes a change in the common.MapStr schema that I don't know if it can affect somewhere (I guess it won't but I prefer to be cautious)

Yes, this was one of my proposals, counting them as any other state, but in any case a process is this state is going to disappear very soon, so I find it good too to just ignore them (that is not the same as unknown, because I agree that unknown states should be logged).

Regarding changes in the fields, adding them don't break compatibility.

@@ -109,6 +112,7 @@ func (m *MetricSet) Fetch() (common.MapStr, error) {
"stopped": summary.stopped,
"zombie": summary.zombie,
"unknown": summary.unknown,
"dead": summary.dead,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will need to add this new field to metricbeat/module/system/process_summary/_meta/fields.yml, after adding it you will also need to run make update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in e4c9bab8e90c4151813763c544210eb2a08f491e

@@ -109,6 +112,7 @@ func (m *MetricSet) Fetch() (common.MapStr, error) {
"stopped": summary.stopped,
"zombie": summary.zombie,
"unknown": summary.unknown,
"dead": summary.dead,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, add also a changelog entry regarding this new field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in e4c9bab8e90c4151813763c544210eb2a08f491e

@@ -119,6 +119,7 @@ https://github.com/elastic/beats/compare/v6.4.0...master[Check the HEAD diff]
- Add experimental socket summary metricset to system module {pull}6782[6782]
- Increase ignore_above for system.process.cmdline to 2048. {pull}8101[8100]
- Add support to renamed fields planned for redis 5.0. {pull}8167[8167]
- Added 'died' PID state to process_system metricset on system module
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add also a reference to the pull request as in other entries.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And you will also need to update the branch with master, there is a conflict in the changelog.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed both in 503c282

@sayden sayden force-pushed the added-pretty-output-to-system-states branch from e4c9bab to 503c282 Compare September 14, 2018 08:01
@sayden sayden merged commit 457a784 into elastic:master Sep 14, 2018
@jsoriano jsoriano added needs_backport PR is waiting to be backported to other branches. and removed needs tests labels Sep 14, 2018
sayden added a commit to sayden/beats that referenced this pull request Sep 14, 2018
…mary" metricset (elastic#8275)

Added function to return pretty names on "system" module "process summary" metricset

(cherry picked from commit 457a784)
@sayden sayden added v6.5.0 and removed needs_backport PR is waiting to be backported to other branches. labels Sep 14, 2018
sayden added a commit that referenced this pull request Sep 19, 2018
…mary" metricset (#8275) (#8312)

Added function to return pretty names on "system" module "process summary" metricset
@sayden sayden deleted the added-pretty-output-to-system-states branch September 27, 2018 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants