
Add a queue metrics capture method to container manager #206

Conversation

yaacov (Member) commented Jan 7, 2018:

Description
Add a method to queue reading of metrics.

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1527877
Target: 5.9.1

@yaacov yaacov force-pushed the add-a-queue_metrics_capture-method-to-container-manager branch from 5a6b780 to b5d7103 Compare January 7, 2018 12:49
yaacov (Member Author) commented Jan 7, 2018:

@miq-bot add_label bug, metrics, gaprindashvili/yes

@cben @moolitayer @zeari @Ladas please review

@yaacov yaacov force-pushed the add-a-queue_metrics_capture-method-to-container-manager branch from b5d7103 to 096d2c3 Compare January 7, 2018 12:54
cben (Contributor) left a comment:

LGTM.

begin
target.perf_capture_queue(interval_name, options)
rescue StandardError => err
_log.warn("Failed to queue perf_capture for target [#{target.class.name}], [#{target.id}], [#{target.name}]: #{err}")
Contributor:

👍 perf_capture_queue may raise exceptions, especially ArgumentError.
However, I expect in most cases where one target raises, all would raise,
so perhaps this function should just let the exception propagate?
[Your call, I don't have an opinion here]

Contributor:

Yeah, we should let it propagate and do _log.error instead; there shouldn't be a case where only one target fails.

yaacov (Member Author):

thanks @cben @Ladas 👍

now it's _log.error and propagating 🌸

@yaacov yaacov force-pushed the add-a-queue_metrics_capture-method-to-container-manager branch 2 times, most recently from 114804c to 3401100 Compare January 8, 2018 07:59
Ladas (Contributor) left a comment:

Looks great now

zeari left a comment:

LGTM

@@ -266,4 +266,17 @@ def evaluate_alert(_alert_id, _event)
# and these should automatically be translated to alerts.
true
end

def queue_metrics_capture(interval_name = 'realtime', options = {})


Since this is meant for queuing from the UI, I would prefer that you remove both parameters.


targets.each do |target|
begin
target.perf_capture_queue(interval_name, options)


See Metric::CiMixin::Capture.perf_capture_realtime_now
can you use that here?

@yaacov yaacov force-pushed the add-a-queue_metrics_capture-method-to-container-manager branch from 3401100 to 46d118d Compare January 8, 2018 15:50
miq-bot (Member) commented Jan 8, 2018:

Checked commit yaacov@46d118d with ruby 2.3.3, rubocop 0.47.1, haml-lint 0.20.0, and yamllint 1.10.0
1 file checked, 0 offenses detected
Everything looks fine. 👍

moolitayer commented:

Update: yaacov and I observed some errors in the log after queue_metrics_capture calls.
They seem to be related to subsequent requests where the calculated gap from the last refresh is too small.

We also observed some cases where the end_time - start_time for metrics collection is very large (PrometheusCaptureContext.fetch_counters_data), but it is not clear whether that is a problem or related to this PR.

yaacov (Member Author) commented Jan 9, 2018:

> Update: yaacov and I observed some errors in the log after queue_metrics_capture calls.

These errors have no connection to this PR.

> Seems to be related to subsequent requests where the calculated gap from the last refresh is too small.

No, they are related to Error/CrashLoop pods/containers that have no metric data for the requested time slot.

> We also observed some cases where the end_time - start_time for metrics collection is very large

Regular 'historical' requests ask for different time slots depending on gaps and missing data; it's OK to have strange :-) time slots when queued by the system worker.

targets.each do |target|
begin
target.perf_capture_queue('realtime', :priority => MiqQueue::HIGH_PRIORITY)
rescue StandardError => err
Contributor:

[cosmetic, LGTM either way] Since you now re-raise, you could move this rescue to the top level of the function, saving the begin..end.

yaacov (Member Author):

👍

yaacov (Member Author):

Ahh, I want to have the target that raised the error. If I move the rescue to the top level, I lose this info :-(

Contributor:

ah, right. remind me to resist making completely non-important code comments :)
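Putting the thread's conclusion together, the settled shape is: rescue inside the loop so the failing target's details can be logged, log at error level, then re-raise so the exception still propagates. A self-contained sketch of that pattern follows; the `Target` struct, `LOG`, and the value of `MiqQueue::HIGH_PRIORITY` are stand-ins for the real ManageIQ objects, not the actual implementation:

```ruby
require 'logger'

# Stand-ins so the sketch runs outside ManageIQ.
module MiqQueue
  HIGH_PRIORITY = 20 # hypothetical value, for illustration only
end

Target = Struct.new(:id, :name) do
  # Simulates the real perf_capture_queue, which may raise (e.g. ArgumentError).
  def perf_capture_queue(_interval_name, _options = {})
    raise ArgumentError, "no metrics endpoint" if name == "broken"
    :queued
  end
end

LOG = Logger.new($stdout)

# Queue a realtime metrics capture for every target. The rescue stays
# inside the loop so the failing target's class, id, and name can be
# logged before the exception propagates; hoisting the rescue to the
# top level of the method would lose that information.
def queue_metrics_capture(targets)
  targets.each do |target|
    begin
      target.perf_capture_queue('realtime', :priority => MiqQueue::HIGH_PRIORITY)
    rescue StandardError => err
      LOG.error("Failed to queue perf_capture for target " \
                "[#{target.class.name}], [#{target.id}], [#{target.name}]: #{err}")
      raise
    end
  end
end
```

The bare `raise` re-raises the current exception unchanged, which is what makes this per-target rescue purely additive: callers see the same failure they would without it, plus one log line identifying the target.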

moolitayer left a comment:

LGTM 👍

@moolitayer moolitayer merged commit 084c88c into ManageIQ:master Jan 9, 2018
@moolitayer moolitayer self-assigned this Jan 9, 2018
@moolitayer moolitayer added this to the Sprint 77 Ending Jan 15, 2018 milestone Jan 9, 2018

simaishi commented Mar 6, 2018

Removed gaprindashvili/yes as the BZ was closed as not a bug. Do we need to revert this from master branch? If so, please create a PR.
