[Improve crm report] Better performance #880

Merged

@liangxin1300 (Collaborator) commented Oct 20, 2021

Faster

By using multiprocessing.Process correctly and running all collect-related functions in parallel:

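| Number of containers | 2  | 5   | 10  | 15  | 20   |
|----------------------|----|-----|-----|-----|------|
| Current              | 6s | 14s | 30s | 48s | 1m8s |
| This PR              | 4s | 5s  | 10s | 16s | 24s  |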

In the current code, the report process does not run in parallel:
Screenshot from 2021-10-25 15-28-24

With this PR, the report process runs in parallel:
Screenshot from 2021-10-25 15-15-17

Note: with this PR the time still grows roughly linearly as the number of nodes increases, because the analyze and sanitize functions in the master have more data to process.

Changes

  • Dev: crm report: Collect report using multiprocessing correctly
  • Dev: crm report: Consolidate collect functions in collect.py and running them in parallel
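
A minimal sketch of the idea behind these changes, not the actual crmsh code: a list of collect functions is handed to a `multiprocessing.Pool` so that they run concurrently instead of one after another. The `collect_*` functions and the body of `generate_collect_functions` below are hypothetical stand-ins that only sleep to simulate work.

```python
import multiprocessing
import time


# Hypothetical stand-ins for collect functions; each sleeps to simulate I/O.
def collect_cluster_info():
    time.sleep(0.1)
    return "cluster_info"


def collect_system_info():
    time.sleep(0.1)
    return "system_info"


def collect_log_info():
    time.sleep(0.1)
    return "log_info"


def generate_collect_functions():
    # Assumption: the real helper of this name may build its list differently.
    return [collect_cluster_info, collect_system_info, collect_log_info]


def run_serial(funcs):
    # Baseline: total time is the sum of all function run times.
    return [func() for func in funcs]


def run_parallel(funcs):
    # One worker per function; total time approaches the slowest function.
    with multiprocessing.Pool(processes=len(funcs)) as pool:
        pending = [pool.apply_async(func) for func in funcs]
        # get() blocks until the worker finishes and re-raises its exception.
        return [result.get() for result in pending]


if __name__ == "__main__":
    funcs = generate_collect_functions()
    for runner in (run_serial, run_parallel):
        start = time.perf_counter()
        runner(funcs)
        print(f"{runner.__name__}: {time.perf_counter() - start:.2f}s")
```

Collecting the `AsyncResult` objects first and calling `get()` afterwards is what lets the workers overlap; calling `get()` right after each `apply_async()` would serialize them again.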

@liangxin1300 force-pushed the 20211020_report_performance branch 4 times, most recently from 9cdee81 to f07c71b on October 21, 2021 08:57
for p in process_list:
    p.join()
collect_func_list = generate_collect_functions()
pool = multiprocessing.Pool(processes=len(collect_func_list))
Contributor

I suggest the following instead:

processes=round(0.8*multiprocessing.cpu_count())

First, running more processes than cpu_count() causes unnecessary CPU context switches.
Second, it makes sense to me to limit crm report conservatively, so that it does not occupy all CPUs.
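
As a rough sketch of this suggestion (the helper name `pool_size` and its parameters are hypothetical, not part of crmsh), the worker count could be capped at about 80% of the CPUs, never dropping below one worker and never exceeding the number of tasks:

```python
import multiprocessing


def pool_size(task_count, cpu_count=None, fraction=0.8):
    """Return a conservative worker count for a multiprocessing.Pool.

    Hypothetical helper: caps workers at `fraction` of the CPUs, never
    below 1 and never above the number of tasks to run.
    """
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()
    # round() can yield 0 for tiny fractions, and Pool rejects processes=0.
    capped = max(1, round(fraction * cpu_count))
    return max(1, min(task_count, capped))


if __name__ == "__main__":
    print(pool_size(task_count=20))
```

On small machines the 0.8 factor has little effect: with 2 CPUs, `round(1.6)` is still 2, so a lower fraction would be needed there to leave a CPU free.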

Collaborator Author

OK.
BTW, that reminds me to combine some "collect_" functions that have similar logic into one, which would also reduce the number of processes.

@liangxin1300 force-pushed the 20211020_report_performance branch from f07c71b to 7ba7467 on October 25, 2021 06:57
@liangxin1300 merged commit 56a913a into ClusterLabs:master on Oct 25, 2021
2 participants