Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable aggregate logic in flow #324

Merged
merged 38 commits into from
Aug 12, 2020

Conversation

kidrahahjo
Copy link
Collaborator

@kidrahahjo kidrahahjo commented Jul 16, 2020

Enabling aggregate logic in flow (enabling internal handling of lists) without any change in the API.

Description

The original PR for enabling aggregate operations #289 is too large.
This PR reduces the scope of that PR by handling only aggregates of 1.

Motivation and Context

Suggested by @glotzerlab/signac-gsoc-mentors

Types of Changes

  • Documentation update
  • Bug fix
  • New feature
  • Breaking change1

1The change breaks (or has the potential to break) existing functionality.

Checklist:

If necessary:

  • I have updated the API documentation as part of the package doc-strings.
  • I have created a separate pull request to update the framework documentation on signac-docs and linked it here.
  • I have updated the changelog.

@kidrahahjo kidrahahjo requested review from csadorf and a team July 16, 2020 12:31
@kidrahahjo kidrahahjo requested review from a team as code owners July 16, 2020 12:31
@kidrahahjo kidrahahjo requested review from yuanzhou0827 and removed request for a team and yuanzhou0827 July 16, 2020 12:31
@kidrahahjo
Copy link
Collaborator Author

kidrahahjo commented Jul 16, 2020

The tests passed locally. The test test_status_performance passed locally but is failing here (The time on my machine was 6.59 s)
Apart from this, I think we'd want to deprecate the get_job_status method or maybe refactor it. Internally get_job_status is not used now, after this implementation, this method will just float in the API.
Also, I think that this is ready for review.

[EDIT]
In this PR:
Only the internal methods which previously took a single job as an argument now takes in a list of jobs instead.

@kidrahahjo kidrahahjo self-assigned this Jul 16, 2020
@kidrahahjo kidrahahjo added GSoC Google Summer of Code aggregation labels Jul 16, 2020
Copy link
Member

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't fully reviewed this PR (I stopped around line 550 of project.py) but here are some starting comments for improvement.

flow/aggregate.py Outdated Show resolved Hide resolved
flow/aggregate.py Outdated Show resolved Hide resolved
flow/aggregate.py Outdated Show resolved Hide resolved
flow/aggregate.py Outdated Show resolved Hide resolved
flow/project.py Outdated Show resolved Hide resolved
flow/project.py Outdated Show resolved Hide resolved
flow/project.py Outdated Show resolved Hide resolved
flow/project.py Outdated Show resolved Hide resolved
flow/project.py Outdated Show resolved Hide resolved
flow/project.py Show resolved Hide resolved
Copy link
Member

@b-butler b-butler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't finished my review, yet, but wanted to get some comments out there. There may be some places where we will have to technically change the user API for aggregation. In general, many of these could be made private. I think having that discussion would be helpful for development. One of these #313, has already been talked about.

flow/aggregate.py Outdated Show resolved Hide resolved
flow/aggregate.py Outdated Show resolved Hide resolved
flow/aggregate.py Outdated Show resolved Hide resolved
flow/project.py Outdated Show resolved Hide resolved
flow/project.py Outdated Show resolved Hide resolved
flow/project.py Outdated Show resolved Hide resolved
flow/project.py Outdated Show resolved Hide resolved
flow/project.py Outdated Show resolved Hide resolved
flow/project.py Outdated Show resolved Hide resolved
flow/project.py Outdated Show resolved Hide resolved
@kidrahahjo
Copy link
Collaborator Author

I have optimized the way we fetch status, now it's much more efficient than the previous implementation.
Given this, I think now the tests will pass.
I have not resolved the conversations which I feel would be helpful in future to keep a track of the changes we introduced.

Copy link
Member

@b-butler b-butler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is really coming along. I have a few comments. I like @bdice's idea regarding using tuples internally.

flow/project.py Outdated
Comment on lines 280 to 282
assert isinstance(jobs, list)

self._jobs = jobs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to use tuples for immutability which makes sense here we would want the structure of aggregation to be a generator or list of aggregate tuples. I don't have a strong preference for the assert. Ideally we wouldn't need an assert or explicit conversion.

Comment on lines +307 to +308
assert len(self._jobs) == 1
return f"{self.name}({str(self._jobs[0])})"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an internal class we don't need the len == 1 assumption. We could use this in JobOperation though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After we include actual aggregation, this will have to change to something like.
f"{self.name}-{len(self._jobs)}: ({str(self._jobs[0])}-{str(self._jobs[-1])})" for every aggregate (even for length 1).

Please see this comment by @csadorf

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am aware. I more mean, we can do that now given this is an internal class now. We just need this behavior for JobOperation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@b-butler If we do this, I'll have to change the template-reference-data twice(once here and another in #334 )
This will result in a lot of additional work.
I think this should be addressed in #334

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assert should be in place wherever we actually make that assumption. Once that assumption is removed (e.g. in #334), the asserts will automatically alert us to spots in the code that require a revised implementation.

flow/project.py Show resolved Hide resolved
flow/project.py Outdated Show resolved Hide resolved
flow/project.py Show resolved Hide resolved
flow/project.py Outdated Show resolved Hide resolved
flow/project.py Outdated Show resolved Hide resolved
flow/project.py Show resolved Hide resolved
flow/project.py Show resolved Hide resolved
flow/project.py Outdated Show resolved Hide resolved
@kidrahahjo kidrahahjo requested a review from csadorf August 10, 2020 16:06
@kidrahahjo
Copy link
Collaborator Author

Since the team agrees upon the use of tuples internally rather than jobs, I'll now go ahead and implement this functionality.

Copy link
Member

@b-butler b-butler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed the comments you left @kidrahahjo.

Comment on lines +307 to +308
assert len(self._jobs) == 1
return f"{self.name}({str(self._jobs[0])})"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am aware. I more mean, we can do that now given this is an internal class now. We just need this behavior for JobOperation

flow/project.py Outdated Show resolved Hide resolved
flow/project.py Show resolved Hide resolved
flow/project.py Show resolved Hide resolved
@kidrahahjo kidrahahjo requested a review from b-butler August 10, 2020 19:14
flow/project.py Outdated Show resolved Hide resolved
Copy link
Member

@b-butler b-butler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes. When I said to use _verify_aggregate_project where we use jobs in the function signature, I meant for the public facing functions. Internally we should be guaranteed not to add jobs not a part of the project. This does include the command line interface so the _main... functions.

flow/project.py Outdated Show resolved Hide resolved
@kidrahahjo
Copy link
Collaborator Author

@b-butler I have added the job check for these methods print_status, run, submit. Other than these, I don't think there's a need to add this check. Please suggest.

Copy link
Contributor

@csadorf csadorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow GitHub dropped my previous review, but comments were still posted.

@kidrahahjo
Copy link
Collaborator Author

@csadorf I addressed the comments which were not resolved in the diff. Please have a look

@kidrahahjo kidrahahjo requested review from b-butler and csadorf August 11, 2020 11:02
Copy link
Contributor

@csadorf csadorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've commented on the f-string issue, would still like to see that addressed, but this is not critical. LGTM!

Copy link
Member

@b-butler b-butler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few small changes and I think we are good to go on this PR.

flow/project.py Show resolved Hide resolved
flow/project.py Outdated Show resolved Hide resolved
Copy link
Member

@b-butler b-butler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@b-butler
Copy link
Member

As for pros/cons on lists versus tuples. I don't think there are any cons with the tuples approach. Once a particular aggregate is generated, the idea is it is used immediately to create a _JobOperation which is not designed to have its jobs change. A tuple explicitly states this assumption. Any dynamic nature to aggregates is to be found in their generation.

@csadorf
Copy link
Contributor

csadorf commented Aug 11, 2020

@bdice Do you want to have a final look?

Copy link
Member

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just looked at open conversations again. I left some unresolved so that they can be remembered in follow-up work. Otherwise this looks good!

@bdice bdice merged commit 3fe6006 into glotzerlab:master Aug 12, 2020
@bdice bdice added this to the v0.11.0 milestone Aug 12, 2020
kidrahahjo added a commit that referenced this pull request Aug 12, 2020
* Add the private Aggregate class having fundamental concepts

* Tests for _Aggregate class added

* Logic for _condition, BaseFlowOperation, FlowGroup added

* Logic extended for exec command

* Logic extended for next command

* Logic for run command

* Logic for script command added

* Logic for submit and status enabled

* Make tests pass, replaced by-job grouping with by-op grouping

* Style Fix

* Remove print statement

* Suggested changes implemented with refactoring of Aggregate class and FlowGroup

* Documentation Edit and implement suggested changes

* Improve status fetching performance

* Use get_id() instead of the id property

* Remove the use of Aggregate class and refactored

* Minute document change

* Apply suggested changes

* Deprecate eligible and complete methods of Flow{Group|Operation}

* Move property method job to JobOperation only

* Apply suggested changes

* Apply suggested changes

* Apply suggested changes

* Change variable name

* Insert assert statement to check len(_JobOperation._jobs) == 1

* Internally change the use to lists to tuples

* Additional suggested changes from code review

* Add suggested changes

* Revert accidental changes

* Use repr(job) instead of str(job) for __repr__ method and remove instance check

* Use f-strings

* Remove checks from run, submit, print_status, _generate_id methods. To be implemented in #336
kidrahahjo added a commit that referenced this pull request Sep 1, 2020
* Add the private Aggregate class having fundamental concepts

* Tests for _Aggregate class added

* Logic for _condition, BaseFlowOperation, FlowGroup added

* Logic extended for exec command

* Logic extended for next command

* Logic for run command

* Logic for script command added

* Logic for submit and status enabled

* Make tests pass, replaced by-job grouping with by-op grouping

* Style Fix

* Remove print statement

* Suggested changes implemented with refactoring of Aggregate class and FlowGroup

* Documentation Edit and implement suggested changes

* Improve status fetching performance

* Use get_id() instead of the id property

* Remove the use of Aggregate class and refactored

* Minute document change

* Apply suggested changes

* Deprecate eligible and complete methods of Flow{Group|Operation}

* Move property method job to JobOperation only

* Apply suggested changes

* Apply suggested changes

* Apply suggested changes

* Change variable name

* Insert assert statement to check len(_JobOperation._jobs) == 1

* Internally change the use to lists to tuples

* Additional suggested changes from code review

* Add suggested changes

* Revert accidental changes

* Use repr(job) instead of str(job) for __repr__ method and remove instance check

* Use f-strings

* Remove checks from run, submit, print_status, _generate_id methods. To be implemented in #336
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aggregation GSoC Google Summer of Code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants