[feat] Add new command line option to distribute single node jobs on multiple cluster nodes #2458
Hello @ekouts! Thank you for updating! Cheers! There are no PEP8 issues in this Pull Request! Do see the ReFrame Coding Style Guide. (Comment last updated at 2022-04-13 15:57:30 UTC)
Codecov Report
@@ Coverage Diff @@
## master #2458 +/- ##
==========================================
+ Coverage 85.81% 85.87% +0.06%
==========================================
Files 57 58 +1
Lines 10705 10772 +67
==========================================
+ Hits 9186 9250 +64
- Misses 1519 1522 +3
@vkarak I moved
Just some minor things still in the code, plus the following, which I discovered by running it:
- You need to define a formatting function for the `$nid` parameter so that it is formatted in a more friendly way. Practically, convert the list into an abbreviated host list.
- I don't think that we need the `_D_` prefix/suffix, since we already change the name based on the partition's name.
- The `--distribute` option should assume a default (perhaps `idle`) if no argument is passed.
- When I run `--distribute=idle -J nodelist='nid0000[1-3]'`, the test is not parametrised on just the three nodes I pass, but on all the idle nodes of the system.

Other than that, this feature is beautiful!
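The first review point asks for the `$nid` parameter to be rendered as an abbreviated host list. Below is a minimal sketch of such a formatting function; `abbreviate_nodelist` is a hypothetical name, not ReFrame's actual helper, and it assumes node names end in a numeric suffix (as in `nid00001`). Consecutive numbers are folded into ranges using the bracketed notation that Slurm accepts for `--nodelist`.

```python
import re
from collections import defaultdict


def _fmt_range(lo, hi, width):
    # Re-apply the original zero padding when printing the numbers
    if lo == hi:
        return f'{lo:0{width}d}'
    return f'{lo:0{width}d}-{hi:0{width}d}'


def abbreviate_nodelist(nodes):
    """Compress node names into a Slurm-style host list (hypothetical helper).

    Nodes are grouped by their non-numeric prefix and the width of their
    numeric suffix; consecutive numbers within a group become ranges,
    e.g. nid00001, nid00002, nid00003 -> nid[00001-00003].
    """
    groups = defaultdict(set)
    plain = set()
    for name in nodes:
        m = re.fullmatch(r'(.*?)(\d+)', name)
        if m is None:
            # Names without a numeric suffix are kept verbatim
            plain.add(name)
        else:
            prefix, digits = m.groups()
            groups[(prefix, len(digits))].add(int(digits))

    parts = sorted(plain)
    for (prefix, width), numbers in sorted(groups.items()):
        nums = sorted(numbers)
        ranges = []
        start = prev = nums[0]
        for n in nums[1:]:
            if n == prev + 1:
                prev = n
                continue
            ranges.append((start, prev))
            start = prev = n
        ranges.append((start, prev))

        body = ','.join(_fmt_range(lo, hi, width) for lo, hi in ranges)
        # A single node needs no brackets
        parts.append(f'{prefix}{body}' if len(nums) == 1
                     else f'{prefix}[{body}]')

    return ','.join(parts)


print(abbreviate_nodelist(['nid00003', 'nid00001', 'nid00002', 'nid00005']))
```

Grouping by suffix width keeps names such as `node9` and `node10` apart, so the zero padding of each node name is reproduced exactly.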
I would probably remove it completely.
Just a minor suggestion still.
Indeed it works with
Looks good to me now!
This PR adds two new features for the Slurm and local partitions:
1. The job object includes the `pin_nodes` attribute. The attribute has an effect only on Slurm, where it passes `--nodelist={}` in the job script with an abbreviated string of the nodes. On the other schedulers it is ignored.
2. The user can pass the argument `--distribute` to parameterize all the tests from the command line. By default, ReFrame will parameterize all the selected tests over all the nodes of the valid partitions in the requested state, taking into account the command-line job options. ReFrame first selects the tests using the usual options (`-n`, `--system`, etc.) and then dynamically creates parameterized tests on each partition for all the nodes. With `--distribute=idle` we get all nodes in state `idle`. To get all nodes of the partition we need to pass `--distribute=all`.

Closes #2334.