Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

steps keyword argument has surprising/ambiguous behavior #130

Closed
1 task done
eirrgang opened this issue Jun 28, 2018 · 3 comments
Closed
1 task done

steps keyword argument has surprising/ambiguous behavior #130

eirrgang opened this issue Jun 28, 2018 · 3 comments
Assignees
Labels

Comments

@eirrgang
Copy link
Collaborator

eirrgang commented Jun 28, 2018

Initial support for the steps keyword argument in gmx.workflow.from_tpr() has potentially surprising behavior (and will need to exhibit different behavior in the future as noted in #56 ).

Namely, not specifying steps for a simulation that is using a previous simulation's output as input can result in a target step of infinity. (Note: for gmx mdrun, -nsteps -1 is described as having this behavior, and only -1.) This is potentially more surprising to a user because -cpi is implicitly set for MD operations in gmxapi 0.0.6 in an attempt to have consistent behavior after an interruption and to make it easier to continue a trajectory in a

I'm not sure if it is worth a complete resolution now, given longer-term plans. The workaround is to use a clean working directory, or at least to remove state.cpt and set append=False, and to be aware of the semantics.

  • improve documentation for gmx.workflow.from_tpr() steps keyword argument.
@eirrgang eirrgang added the bug label Jun 28, 2018
@eirrgang eirrgang self-assigned this Jun 28, 2018
eirrgang added a commit to eirrgang/gromacs-gmxapi that referenced this issue Jun 28, 2018
Make sure session close() results in verifiably finalized log file.
Make sure to propagate session closure down through API.

* Destroy the gmx::Mdrunner in Session::close().
* Flush fplog file descriptor before returning control to API after mdrunner().
* When fplog managed by runner is closed, set the FILE* pointer to null.
* clean up tests and handle the caveat also described in
kassonlab/gmxapi#130
eirrgang added a commit to eirrgang/gmxapi that referenced this issue Jun 28, 2018
eirrgang added a commit to kassonlab/gromacs-gmxapi that referenced this issue Jun 28, 2018
Make sure session close() results in verifiably finalized log file.
Make sure to propagate session closure down through API.

* Destroy the gmx::Mdrunner in Session::close().
* Flush fplog file descriptor before returning control to API after mdrunner().
* When fplog managed by runner is closed, set the FILE* pointer to null.
* clean up tests and handle the caveat also described in
kassonlab/gmxapi#130
@peterkasson
Copy link
Collaborator

I think here we should replicate the normal mdrun behavior: if -nsteps is not specified, it uses the number of steps in the TPR (counting a checkpoint restart as a continuation). Appending or not doesn't affect this. This should be the desired gmxapi behavior also.

@eirrgang
Copy link
Collaborator Author

I agree that unspecified steps should advance the trajectory to the point specified in the TPR file.

Right now it doesn't do that if there is a checkpoint that has gone past the step specified in the TPR file.

There are at least two cases, it seems, where the command line behavior can lead to infinite trajectories, and at least in the case described here, it seems like unintended behavior that could cause a user to accidentally burn up a lot of SUs, so I want to address this potentially surprising behavior and at least make sure it is well documented to disambiguate it a bit. I think this is achieved in 2be078d

Setting an infinite number of steps could effectively be the way to replace the nsteps stop condition with some other, but it shouldn't be accidental the way it currently is.

I don't think this behavior is affected by whether or not appending is happening or not. I was thinking that appending could not be enabled for the workaround of removing the checkpoint file, but it may not be necessary to explicitly turn it off.

Note, though, that for automatic fault tolerance and reproducibility, we always restart from a checkpoint if one is found and we append to output files by default, but we don't have full workflow checkpointing yet, so these are implementation details that are very much open for discussion. But they are also likely to evolve naturally as we get better handling of simulation inputs and outputs, both with #79 and others. For more on appending / not appending to existing files, see #40 and #126

eirrgang added a commit that referenced this issue Jul 24, 2018
Relates to issue #40 and to bug #130

Resolves task in #130 when merged.
@eirrgang
Copy link
Collaborator Author

eirrgang commented Nov 9, 2018

Several aspects of simulation continuation behavior were recently declared to be GROMACS bugs (see https://redmine.gromacs.org/issues/2717) and behavior has changed in recent GROMACS patches. We need to check again that our behavior is sensible and accurately documented.

See, for example, https://gerrit.gromacs.org/c/8673/4 and https://gerrit.gromacs.org/c/8618/
See also https://redmine.gromacs.org/issues/2717

@eirrgang eirrgang reopened this Nov 9, 2018
eirrgang added a commit to eirrgang/gmxapi that referenced this issue Nov 9, 2018
resolves kassonlab#175

Behavior is well-defined as of GROMACS 2019 beta 3.
If additional documentation updates are necessary, we will address them
in a separate issue.
eirrgang added a commit to eirrgang/gmxapi that referenced this issue Nov 13, 2018
It looks like we should be fine in our tests and everything since we
rely on `end_time` instead of `steps`, but upstream changes to GROMACS
may cause a behavior change in GROMACS 2019 for simulations that use
the argument to override the number of steps specified in the TPR file.
The command line `nsteps` argument will be removed in GROMACS master
after the GROMACS 2019 branch, so we need to prepare for removal of our
version of it. Now issue a deprecation warning when "steps" or "nsteps"
is specified to `gmx.workflow.from_tpr()`

Resolves kassonlab#130
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants