-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#2098: runtime: add code to generate node IDs #2109
Conversation
3076672
to
5b1a93c
Compare
I think this a much better approach than |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks decent, though it would still be nice to have something like the host name, to be able to identify slow nodes
Pipelines resultsPR tests (gcc-12, ubuntu, mpich) Build for 0968638 (2023-03-27 22:41:20 UTC)
PR tests (clang-9, ubuntu, mpich) Build for 0968638 (2023-03-27 22:41:20 UTC)
PR tests (gcc-9, ubuntu, mpich, zoltan) Build for 0968638 (2023-03-27 22:41:20 UTC)
PR tests (gcc-10, ubuntu, openmpi, no LB) Build for 0968638 (2023-03-27 22:41:20 UTC)
PR tests (gcc-7, ubuntu, mpich, trace runtime, LB) Build for 0968638 (2023-03-27 22:41:20 UTC)
PR tests (clang-11, ubuntu, mpich) Build for 0968638 (2023-03-27 22:41:20 UTC)
PR tests (gcc-8, ubuntu, mpich, address sanitizer) Build for 0968638 (2023-03-27 22:41:20 UTC)
PR tests (clang-14, ubuntu, mpich) Build for 0968638 (2023-03-27 22:41:20 UTC)
PR tests (clang-10, ubuntu, mpich) Build for 0968638 (2023-03-27 22:41:20 UTC)
PR tests (nvidia cuda 11.0, ubuntu, mpich) Build for 0968638 (2023-03-27 22:41:20 UTC)
PR tests (clang-13, ubuntu, mpich) Build for 0968638 (2023-03-27 22:41:20 UTC)
PR tests (clang-12, ubuntu, mpich) Build for 0968638 (2023-03-27 22:41:20 UTC)
PR tests (gcc-11, ubuntu, mpich, json schema test) Build for 0968638 (2023-03-27 22:41:20 UTC)
PR tests (intel icpc, ubuntu, mpich) Build for 0968638 (2023-03-27 22:41:20 UTC)
PR tests (nvidia cuda 11.2, ubuntu, mpich) Build for ( UTC)
|
@lifflander I can confirm that this worked in the following configurations: (a) 48 ranks per node using 8 nodes, where the first node gets ranks 0-47, the second gets ranks 48-95, etc.; |
5b1a93c
to
f33bf5c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The schema validator is failing in one of the pipelines. The switch from a simple string schema type to using hard-coded json snippets in several places (including tests) feels hackish. Would it really be that hard to fix the interface so that we can use the json utility to write all the json?
Looks great to me! |
93ec3f1
to
9edba6e
Compare
As a side note, this branch's JSON schema validator is passing for both the "toy user defined" problem:
as well as for the "challenging problem with fewer tasks":
I would like to suggest @nlslatt that the version of the validator also be reported in the logger output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with the node identification code. Maybe Philippe should review the json bits.
int physical_node_id = -1; | ||
int physical_num_nodes = -1; | ||
int physical_node_size = -1; | ||
int physical_node_rank = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these all be suffixed with underscores?
96607e3
to
0968638
Compare
As I don't think we have any version number for this right now, let's defer this to another issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I confirmed that the code to analyze the physical nodes works and that the resulting JSON is as expected.
Fixes #2098