-
Notifications
You must be signed in to change notification settings - Fork 667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make the AnalysisBase
compatible with live simulation streams
#4828
Comments
Can this issue be addressed independently from #4827 ? |
Fixing this relies on being able to check if a reader inherits from |
Generally, I advocate for including essential (and lightweight) components from
|
@yuxuanzhuang Thank you for the in-depth look! Overall, I agree with you that we could adapt the reader to be more compatible. As for the initialization of results, my thinking was that we would use python lists to hold intermediate results, and when iteration stops and Maybe we allow slicing, but raise a That might be the best path forward- requiring people to pass in an end frame, defaulting to 0 for the start frame, and failing out if the end frame isn't reached, perhaps with some kind of option to save intermediate results to csv or something. That would also allow us to avoid adding complexity with parsing logic for all the different analysis class' frame-related kwargs like Edit: Say something like |
Many analyses also rely on array assignment within I don’t have a fully fleshed-out solution yet, but I wonder if it might be possible to implement a “block” analysis approach so it could be processed in chunks of a fixed number of frames, and then the results from each block could be concatenated during |
Good idea. I hope that the discussion on this issue leads to a feasible and general solution, similar to how the parallelization was refined through discussions. It‘s important for the future of MDAnalysis. |
This is a really clever idea IMO, if the trajectory could be processed in blocks guaranteeing that |
I haven't had time to think this through fully yet, but one initial comment / thought based on @yuxuanzhuang's comment "Instead of modifying AnalysisBase to be streamable, would it be feasible to make IMDReader “AnalysisBase-compatible”?" I'm not sure if it's in the same spirit, but I am concerned that we are following the path of "let's keep on adding things to AnalysisBase because we can". It sounds to me like enabling analyses on streams is going to need a lot of analysis-specific work. Is this a fair assessment? If so, is it maybe worth consider if we need either a mixin or a separate subclass to make analyses stream compatible? (rather than adding more to the one base class) |
Yes, exactly. With such, I think the existing |
Is your feature request related to a problem?
Related to #4827: for MDAnalysis to function with live simulation streams, the
AnalysisBase
must be modified to account for a few quirks of live-streamed trajectories:len(u.trajectory)
andu.trajectory.n_frames
will fail)u.trajectory[:]
andu.trajectory[::n]
(where n is positive) so they are inherently incompatible with parallelization schemesDescribe the solution you'd like
Like with parallelization, analysis classes that inherit from
AnalysisBase
should "opt-in" to stream usage through a class property similar to theparallelizable
property (likestreamable
or similar). To opt-in, inherited classes must also avoid attempting to access the trajectory's length or slice it. Therms.RMSF
class is already such a class.The
AnalysisBase
must also have a_streamed_run
method that never attempts to slice a trajectory or access its length. The imdclient MDAKit ships such a method already (and monkey patches it in): https://github.com/Becksteinlab/imdclient/blob/main/imdclient/streamanalysis.pyDescribe alternatives you've considered
Don't allow live simulation streaming, or don't allow builtin analysis classes to be used with these streams
The text was updated successfully, but these errors were encountered: