Safe management of session working files #79

Open
11 tasks
eirrgang opened this issue May 29, 2018 · 1 comment

@eirrgang
Collaborator

eirrgang commented May 29, 2018

The context needs to correctly determine whether the working directories for a session already exist, so that it does not overwrite previous work. It should also be able to determine whether the work is finished or ready to be restarted, but that will require additional features. This task is about file management.
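As a minimal sketch of the existence check described above, directory creation itself can serve as the test. The function name `claim_workdir` is hypothetical and not part of any existing API; the sketch only assumes that a session gets one directory and that an existing path must be left alone.

```python
from pathlib import Path


def claim_workdir(path):
    """Create `path` for a new session, or report that it already exists.

    Returns True if this call created the directory and False if something
    was already present at that path, in which case nothing is modified.
    """
    workdir = Path(path)
    try:
        # exist_ok=False makes creation double as the existence test, which
        # avoids a check-then-create race between concurrent sessions.
        workdir.mkdir(parents=True, exist_ok=False)
    except FileExistsError:
        return False
    return True
```

A caller that gets `False` back would then need to inspect the directory before deciding whether to resume or refuse, which is the part that needs the additional features mentioned above.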

There are no complete and comprehensive GROMACS tools to deal with this situation, so I probably need to write them, but with enough of the original files we should be able to generate a new input file for the forked run. Note that we should confirm that the checkpoint file used matches the step number that we expect.

Several related issues to consider:

GNU filesystem utilities indicate that the process's current working directory is used to resolve relative paths when fopen() opens a file, but it is unclear whether the semantics are universally well defined if the current working directory changes while a file that was opened by a relative path is still held open.

We should specify all input and output files rather than rely on libgromacs default behavior.

We should avoid ambiguity by making sure that we pass absolute paths to libgromacs.

We should cease the practice of changing working directory during Session launch (with the possible exception of dispatching to another Context, which should be done in a separate process).

The Context implementation should handle shuffling of filesystem artifacts for such use cases as forking trajectories.

Independently of further discussions about input and output paradigms, we can achieve predictable behavior in the short term by distinguishing between a trajectory that is a continuation and a trajectory that is initialized as a fork of another trajectory.

As a further simplification of the last point, we can accept that our trajectory forking operation will be a freshly initialized simulation whose zeroth MD microstate is not exactly equal to that from which it was forked. It's time to write some utilities to extract / manipulate input components because right now I don't think there is a way to get a topology that grompp can use back out of a TPR file. For a proof-of-concept trajectory fork, we could wrap the following commands:

 gmx dump -s old.tpr -om temp.mdp
 gmx grompp -f temp.mdp -p topol.top -c state.cpt -o new.tpr

where old.tpr is available from the original load_tpr operation, temp.mdp is a temporary file managed by the Context implementation, topol.top can be provided as a parameter to the fork_trajectory() operation, state.cpt is already managed by the C++ Session, and new.tpr is an output that becomes the input for the forked md operation. But this is already convoluted enough that I should just make proper API tools.
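A rough sketch of what wrapping those two commands could look like, with every file passed as an absolute path and no chdir in the calling process. The function names and the `scratch_dir` parameter are assumptions made for illustration. The `gmx` flags are copied as written above and have not been checked against a particular GROMACS version, so verify them with `gmx help dump` and `gmx help grompp` before relying on this.

```python
import subprocess
from pathlib import Path


def fork_trajectory_commands(old_tpr, topology, checkpoint, scratch_dir):
    """Build the two command lines from the issue using absolute paths only."""
    scratch = Path(scratch_dir).resolve()
    old_tpr = Path(old_tpr).resolve()
    topology = Path(topology).resolve()
    checkpoint = Path(checkpoint).resolve()
    temp_mdp = scratch / "temp.mdp"  # temporary file owned by the Context
    new_tpr = scratch / "new.tpr"    # input for the forked md operation
    dump = ["gmx", "dump", "-s", str(old_tpr), "-om", str(temp_mdp)]
    grompp = ["gmx", "grompp", "-f", str(temp_mdp), "-p", str(topology),
              "-c", str(checkpoint), "-o", str(new_tpr)]
    return [dump, grompp], new_tpr


def fork_trajectory(old_tpr, topology, checkpoint, scratch_dir):
    """Run the commands in order and return the path of the new TPR file."""
    commands, new_tpr = fork_trajectory_commands(
        old_tpr, topology, checkpoint, scratch_dir)
    for argv in commands:
        # cwd= sets the child's working directory only; the parent process
        # never calls os.chdir(), so its own relative paths are unaffected.
        subprocess.run(argv, cwd=scratch_dir, check=True)
    return new_tpr
```

Keeping command construction separate from execution makes the path handling testable on a machine without GROMACS installed. Running the children with `cwd=scratch_dir` is a design choice meant to contain any default-named outputs the tools write.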

  • pass absolute paths to libgromacs
  • preempt default file naming to allow abstraction of working directory
  • remove chdir from Session launch
  • track filesystem artifacts in Context
  • create fork_trajectory() operation (proof-of-concept wraps command line)
  • Session should use working directory keyed by WorkSpec unique identifier.
  • Existing directory should not be corrupted.
  • Existing directory should be checked for state.
  • File inputs should be made accessible to the Session.
  • Filesystem artifacts from an element should be accessible by another element.
  • Filesystem artifacts should be made accessible to the client.
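
One possible shape for the "working directory keyed by WorkSpec unique identifier" and "checked for state" items, as a sketch only. It assumes the work specification can be serialized to JSON and uses a SHA-256 digest of that serialization as the identifier; both the hashing scheme and the names `workspec_id`, `session_dir`, and `directory_state` are hypothetical, not the project's actual implementation.

```python
import hashlib
import json
from pathlib import Path


def workspec_id(work_spec):
    """Hypothetical identifier: SHA-256 of the spec as canonical JSON."""
    # sort_keys makes the digest independent of dictionary ordering.
    canonical = json.dumps(work_spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def session_dir(root, work_spec):
    """Absolute directory for a session, derived from the spec, not the cwd."""
    return Path(root).resolve() / workspec_id(work_spec)


def directory_state(path):
    """Classify a session directory as 'new', 'empty', or 'in_use'."""
    path = Path(path)
    if not path.exists():
        return "new"
    if any(path.iterdir()):
        return "in_use"
    return "empty"
```

The three states here only cover the file-management question. Telling a finished run apart from a restartable one would need information from the checkpoint or trajectory files themselves.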
@eirrgang
Collaborator Author

eirrgang commented Jun 4, 2018

This issue name may be based on a false premise. We probably don't want to think of working directories in the sense of process environment at all. We should start rigorously managing file paths with context resources. The fact that the current ParallelArrayContext does a chdir should probably be considered a bug...
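
To illustrate what managing file paths as context resources might mean in practice, here is a small sketch of a registry that hands out absolute paths by name, so that no code needs to know or change the process working directory. The class `ArtifactRegistry` and its methods are invented for this example and do not exist in the current code base.

```python
from pathlib import Path


class ArtifactRegistry:
    """Hypothetical Context-side map from (element, label) to absolute path."""

    def __init__(self, root):
        self._root = Path(root).resolve()
        self._paths = {}

    def register(self, element, label, filename):
        """Reserve an absolute path for an artifact; refuse to rebind a key."""
        key = (element, label)
        if key in self._paths:
            raise KeyError(f"artifact already registered: {key}")
        path = self._root / element / filename
        self._paths[key] = path
        return path

    def path(self, element, label):
        """Look up an artifact registered by this or another element."""
        return self._paths[(element, label)]
```

For example, an md element could call `register("md_0", "checkpoint", "state.cpt")`, and a later element or the client could retrieve the same absolute path with `path("md_0", "checkpoint")`.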

(update: changed issue name)

eirrgang changed the title from "Safe management of session working directories" to "Safe management of session working files" on Jun 27, 2018