-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable aggregate logic in flow #324
Conversation
The tests passed locally. The test [EDIT] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't fully reviewed this PR (I stopped around line 550 of project.py) but here are some starting comments for improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't finished my review, yet, but wanted to get some comments out there. There may be some places where we will have to technically change the user API for aggregation. In general, many of these could be made private. I think having that discussion would be helpful for development. One of these #313, has already been talked about.
I have optimized the way we fetch status, now it's much more efficient than the previous implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is really coming along. I have a few comments. I like @bdice's idea regarding using tuples internally.
flow/project.py
Outdated
assert isinstance(jobs, list) | ||
|
||
self._jobs = jobs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to use tuples for immutability which makes sense here we would want the structure of aggregation to be a generator or list of aggregate tuples. I don't have a strong preference for the assert. Ideally we wouldn't need an assert or explicit conversion.
assert len(self._jobs) == 1 | ||
return f"{self.name}({str(self._jobs[0])})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an internal class we don't need the len == 1
assumption. We could use this in JobOperation
though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am aware. I more mean, we can do that now given this is an internal class now. We just need this behavior for JobOperation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assert should be in place wherever we actually make that assumption. Once that assumption is removed (e.g. in #334), the asserts will automatically alert us to spots in the code that require a revised implementation.
Since the team agrees upon the use of tuples internally rather than jobs, I'll now go ahead and implement this functionality. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed the comments you left @kidrahahjo.
assert len(self._jobs) == 1 | ||
return f"{self.name}({str(self._jobs[0])})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am aware. I more mean, we can do that now given this is an internal class now. We just need this behavior for JobOperation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes. When I said to use _verify_aggregate_project
where we use jobs
in the function signature, I meant for the public facing functions. Internally we should be guaranteed not to add jobs not a part of the project. This does include the command line interface so the _main...
functions.
@b-butler I have added the job check for these methods |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow GitHub dropped my previous review, but comments were still posted.
@csadorf I addressed the comments which were not resolved in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've commented on the f-string issue, would still like to see that addressed, but this is not critical. LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few small changes and I think we are good to go on this PR.
…o be implemented in glotzerlab#336
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
As for pros/cons on lists versus tuples. I don't think there are any cons with the tuples approach. Once a particular aggregate is generated, the idea is it is used immediately to create a |
@bdice Do you want to have a final look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just looked at open conversations again. I left some unresolved so that they can be remembered in follow-up work. Otherwise this looks good!
* Add the private Aggregate class having fundamental concepts * Tests for _Aggregate class added * Logic for _condition, BaseFlowOperation, FlowGroup added * Logic extended for exec command * Logic extended for next command * Logic for run command * Logic for script command added * Logic for submit and status enabled * Make tests pass, replaced by-job grouping with by-op grouping * Style Fix * Remove print statement * Suggested changes implemented with refactoring of Aggregate class and FlowGroup * Documentation Edit and implement suggested changes * Improve status fetching performance * Use get_id() instead of the id property * Remove the use of Aggregate class and refactored * Minute document change * Apply suggested changes * Deprecate eligible and complete methods of Flow{Group|Operation} * Move property method job to JobOperation only * Apply suggested changes * Apply suggested changes * Apply suggested changes * Change variable name * Insert assert statement to check len(_JobOperation._jobs) == 1 * Internally change the use to lists to tuples * Additional suggested changes from code review * Add suggested changes * Revert accidental changes * Use repr(job) instead of str(job) for __repr__ method and remove instance check * Use f-strings * Remove checks from run, submit, print_status, _generate_id methods. To be implemented in #336
* Add the private Aggregate class having fundamental concepts * Tests for _Aggregate class added * Logic for _condition, BaseFlowOperation, FlowGroup added * Logic extended for exec command * Logic extended for next command * Logic for run command * Logic for script command added * Logic for submit and status enabled * Make tests pass, replaced by-job grouping with by-op grouping * Style Fix * Remove print statement * Suggested changes implemented with refactoring of Aggregate class and FlowGroup * Documentation Edit and implement suggested changes * Improve status fetching performance * Use get_id() instead of the id property * Remove the use of Aggregate class and refactored * Minute document change * Apply suggested changes * Deprecate eligible and complete methods of Flow{Group|Operation} * Move property method job to JobOperation only * Apply suggested changes * Apply suggested changes * Apply suggested changes * Change variable name * Insert assert statement to check len(_JobOperation._jobs) == 1 * Internally change the use to lists to tuples * Additional suggested changes from code review * Add suggested changes * Revert accidental changes * Use repr(job) instead of str(job) for __repr__ method and remove instance check * Use f-strings * Remove checks from run, submit, print_status, _generate_id methods. To be implemented in #336
Enabling aggregate logic in
flow
(enabling internal handling of lists) without any change in the API.Description
The original PR for enabling aggregate operations #289 is too large.
This PR reduces the scope of that PR by handling only aggregates of 1.
Motivation and Context
Suggested by @glotzerlab/signac-gsoc-mentors
Types of Changes
1The change breaks (or has the potential to break) existing functionality.
Checklist:
If necessary: