Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

let reframe stage the input files for CP2K #207

Closed
wants to merge 2 commits into from

Conversation

smoors
Copy link
Collaborator

@smoors smoors commented Nov 17, 2024

reframe automatically stages all files in the src dir, so let's use that

Copy link
Collaborator

@casparvl casparvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, you can set the sourcesdir to something other than the standard src as well. I don't mind the solution of renaming the dir to src either. Either way works. Let me know what you prefer - if you want to keep it like this, that's fine by me.

The other thing is that indeed, the fixture is not strictly needed for CP2K. However, I'd like to have a more general discussion where we draw the line. The CP2K files are 100kb in total. That's not a lot. But, if I run this on Snellius with 2 CP2K modules, and 2 partitions, I get 156 test cases. That's still 10MB of data in the staging dir. Not a huge issue, but my point is: it explodes, rapidly. The downside of a separate staging step is, of course, additional code/logic that needs to be maintained.

I'm wondering if, since again it's a pretty standard operation, we could somehow make this more generic and make it part of the eessi_mixin.py. Then, call something like

stage_session_files = fixture(EESSI_<some_name>_stage_input, scope='session')

in the EESSI_Mixin class and document that users can do something like:

 self.executable_opts += ['-i', f'{self.stage_session_files.stagedir}/input/QS/{os.path.basename(self.bench_name)}.inp']

to use staged files. We'd have to document that this is what they should do for files that they only need once per session, but I think that applies to 80% of the cases. I guess the only exception is if your code also writes to those files?

Anyway, I'm not immediately sure what the implementation should look like. We'd need to somehow generate these classes on the fly. And the child class should contain something like a directory list or file list of the files / dirs (as a property, e.g. stage_files_per_session = ['A/input1', 'A/input2', 'B/input3'] or stage_dirs_per_session = ['A', 'B']) that need to be staged (which the generic staging class then picks up on). Then, we'd need it to programmatically create a staging class for each child class that defines such a file/dir list. That means we need some kind of generator-function that returns a class with a particular name (such as cp2k_staging_session).

I think the CP2K staging class in https://github.com/EESSI/test-suite/pull/205/files is a pretty good example of what that might look like - including a sanity check that all of those files are indeed staged correctly.

I'm a bit worried that if we don't do this deduplication (where possible), the amount of staging data might get out of hand quicker than we think. If ReFrame is good at anything, it's generating a vast amount of test instances, because of the multiplication between test cases, modules that we want to test, and system partitions. Again, the above 10MB might not seem like a problem. But if I add 2 more partitions and 2 more modules I have again another factor of 4 extra in terms of tests, for 40MB of staging data. If I have that for 20 tests, that's already 1 GB of footprint.

Either that, or I have to go back on my original statement and just accept that we do not facilitate running the test suite from a node-local dir. And then we just mark as much as we can as read_only (admittedly, from a technical perspective, this is the easier option).

What do you think?

@smoors
Copy link
Collaborator Author

smoors commented Nov 20, 2024

ok, i think i understand now.

a fixture would indeed be nice if we can implement it in the mixin class.
i want to avoid that every test has to implement its own fixture, as this increases again the barrier to contribution.

can you remind me what was the use case for "running the test suite from a node-local dir"

@casparvl
Copy link
Collaborator

can you remind me what was the use case for "running the test suite from a node-local dir"

Concrete use case is that this is how our CI/run_reframe.sh script currently does things for the regular test runs. It clones the necessary repo's in /tmp, runs stuff, then cleans up. Sure, we could clone in a homedir (and assume/hope that that's big enough on all systems), but if cleanup fails, it leaves trash in your homedir => not very nice.

Also: the fact that a node-local installation doesn't work might be a surprise to users. ReFrame as a whole typically only assumes the staging dir is on shared FS (unless read_only is used, clearly). In fact, it works just fine if the ReFrame install and tests are only present on the node running the runtime.

@smoors
Copy link
Collaborator Author

smoors commented Nov 20, 2024

ok, i'll close this PR as it is clearly not an ideal solution

@smoors smoors closed this Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants