
tests/unit-tests/margo-scheduling reports busy spin on abt mutexes #248

Closed
carns opened this issue Feb 13, 2023 · 4 comments · Fixed by #249

carns (Member) commented Feb 13, 2023

This isn't supposed to happen with current versions of Argobots; see pmodels/argobots#361.

carns-x1-7g ~/w/s/m/m/build (main=)> tests/unit-tests/margo-scheduling --no-fork
Running test suite with seed 0x257bf8b0...
/margo/abt_mutex_cpu                 User CPU time used: 5.198903
	detected that Argobots mutexes may busy spin.
[ OK    ] [ 5.20033550 / 5.19891951 CPU ]
1 of 1 (100%) tests successful, 0 (0%) test skipped.

carns-x1-7g ~/w/s/m/m/build (main=)> spack find 
==> In environment mochi-margo-dev-abt-main
==> Root specs
argobots@main  cflags="-g"   json-c  mercury  cflags="-g" 

==> Installed packages
-- linux-ubuntu22.10-skylake / [email protected] -----------------------
argobots@main    [email protected]      [email protected]    [email protected]
[email protected]    [email protected]       [email protected]  [email protected]
[email protected]  [email protected]  [email protected]
==> 11 installed packages

The above test is with current origin/main of mochi-margo and origin/main of argobots.

Need to validate what mutex mode is being used by Argobots.

carns self-assigned this Feb 13, 2023
carns (Member, Author) commented Feb 13, 2023

The mutex test is a red herring; the test reports high CPU usage even if the thread that is meant to block on the mutex is never launched. Something seems to be making Margo busy spin on its own in this example.

mdorier (Contributor) commented Feb 13, 2023

Does this happen with transports other than na+sm?

carns (Member, Author) commented Feb 13, 2023

Yes, I can test with both na+sm and ofi+tcp on my laptop, and both busy spin without launching the test ULT.

I can stop the busy spinning by commenting out the mii.json_config = "{ \"rpc_thread_count\":1}"; line in the test code, though. Maybe it has something to do with the configuration of the RPC pool? I'm still digging.
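For context, that line hands Margo a JSON configuration fragment at init time. A minimal sketch of the fragment in question (copied from the test code quoted above) is:

```json
{ "rpc_thread_count": 1 }
```

As I understand it, a non-zero rpc_thread_count asks Margo to create a dedicated pool and execution stream for running RPC handlers, separate from the progress loop, which is why removing it changes the scheduling behavior observed here.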

The monitoring (enabled via export MARGO_ENABLE_MONITORING=1) is immediately very helpful. I thought the most likely culprit was the logic in the progress loop that governs whether to use a timeout with HG_Progress() or not, but runs with and without the external RPC pool show very similar statistics there despite producing very different CPU utilization results. The important thing is how much cumulative time is spent in progress calls with a timeout. In both cases it is about 5 seconds as expected for this artificial test case.

From an example that busy spins:

    "progress_with_timeout": {
      "num": 52,
      "min": 0.092145205,
      "max": 0.10017705,
      "avg": 0.099984201,
      "var": 6.2693e-05,
      "sum": 5.199178457
    },

From an example that does not busy spin:

    "progress_with_timeout": {
      "num": 52,
      "min": 0.087362051,
      "max": 0.10038209,
      "avg": 0.09996797,
      "var": 0.000162242,
      "sum": 5.198334455
    },
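A quick sanity check of the two stat blocks above (a sketch using the numbers copied verbatim from the monitoring output) confirms they are internally consistent and nearly identical in total blocked time:

```python
# Compare the "progress_with_timeout" stats from the two monitoring runs
# (values copied from the margo monitoring output above).
spinning = {"num": 52, "avg": 0.099984201, "sum": 5.199178457}
not_spinning = {"num": 52, "avg": 0.09996797, "sum": 5.198334455}

for name, s in [("busy-spinning", spinning), ("non-spinning", not_spinning)]:
    # Internal consistency: sum should be roughly num * avg.
    assert abs(s["sum"] - s["num"] * s["avg"]) < 1e-3
    print(f"{name}: {s['sum']:.3f}s total in timed HG_Progress() calls")

# Both runs block in timed progress for ~5.2 s, so the extra CPU time in
# the busy-spinning case must be accrued elsewhere, not in the progress loop.
delta = abs(spinning["sum"] - not_spinning["sum"])
assert delta < 0.01
```

In other words, both configurations spend essentially the same ~5.2 seconds inside timed progress calls, which rules out the HG_Progress() timeout logic as the source of the extra CPU utilization.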

carns (Member, Author) commented Feb 13, 2023

Whoops, I just edited the above comment; I had accidentally pasted the same output twice, making the two runs appear identical rather than merely similar :)
