Enable aggregate logic in flow #324

kidrahahjo · 2020-07-16T12:31:10Z

Enabling aggregate logic in flow (enabling internal handling of lists) without any change in the API.

Description

The original PR for enabling aggregate operations #289 is too large.
This PR reduces the scope of that PR by handling only aggregates of 1.

Motivation and Context

Suggested by @glotzerlab/signac-gsoc-mentors

Types of Changes

Documentation update
Bug fix
New feature
Breaking change¹

¹The change breaks (or has the potential to break) existing functionality.

Checklist:

I am familiar with the Contributing Guidelines.
I agree with the terms of the Contributor Agreement.
My name is on the list of contributors.
My code follows the code style guideline of this project.
The changes introduced by this pull request are covered by existing or newly introduced tests.

If necessary:

I have updated the API documentation as part of the package doc-strings.
I have created a separate pull request to update the framework documentation on signac-docs and linked it here.
I have updated the changelog.

kidrahahjo · 2020-07-16T12:39:08Z

The tests passed locally. test_status_performance also passed on my machine (6.59 s) but is failing here.
Apart from this, I think we should deprecate or refactor the get_job_status method. It is no longer used internally, so after this change it will just float in the API.
Also, I think this is ready for review.
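For the deprecation idea mentioned above, a minimal sketch using only the standard library. The class `LegacyProjectSketch`, the helper `_fetch_status`, and the warning message are hypothetical and only illustrate the pattern; this is not flow's actual implementation.

```python
import warnings


class LegacyProjectSketch:
    """Hypothetical stand-in for a project class; not flow's real API."""

    def _fetch_status(self, job):
        # Placeholder for whatever internal status lookup replaces the old path.
        return {"job": job, "operations": {}}

    def get_job_status(self, job):
        """Public method kept for compatibility while it is phased out."""
        warnings.warn(
            "get_job_status is deprecated (illustrative message).",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._fetch_status(job)


# Record warnings so the deprecation can be observed explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    status = LegacyProjectSketch().get_job_status("job_a")
```

Callers keep working, but they are told that the method is on its way out.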

[EDIT]
In this PR:
Only the internal methods that previously took a single job as an argument now take a list of jobs instead.

bdice

I haven't fully reviewed this PR (I stopped around line 550 of project.py) but here are some starting comments for improvement.

flow/aggregate.py

flow/project.py


b-butler

I haven't finished my review yet, but wanted to get some comments out there. There may be some places where we will technically have to change the user API for aggregation. In general, many of these could be made private. I think having that discussion would be helpful for development. One of these, #313, has already been discussed.

flow/aggregate.py

flow/project.py

kidrahahjo · 2020-07-18T09:35:56Z

I have optimized the way we fetch status; it is now much more efficient than the previous implementation.
Given this, I think the tests will now pass.
I have left unresolved the conversations that I feel will help us keep track of the changes we introduced.

b-butler

This PR is really coming along. I have a few comments. I like @bdice's idea regarding using tuples internally.

b-butler · 2020-08-10T15:18:06Z

flow/project.py

+        assert isinstance(jobs, list)
+
+        self._jobs = jobs


If we want to use tuples for immutability, which makes sense here, we would want the structure of aggregation to be a generator or list of aggregate tuples. I don't have a strong preference about the assert. Ideally we wouldn't need an assert or an explicit conversion.
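One way to read this suggestion, as a sketch only: the producer yields aggregate tuples, so the consumer needs neither an assert nor a conversion. The names `aggregates_of_one` and `JobOperationSketch` are hypothetical, and jobs are represented by plain strings.

```python
from typing import Iterable, Iterator, Tuple


def aggregates_of_one(jobs: Iterable[str]) -> Iterator[Tuple[str, ...]]:
    """Yield each job wrapped in a one-element tuple (hypothetical helper)."""
    for job in jobs:
        yield (job,)


class JobOperationSketch:
    """Illustrative stand-in for an internal operation class."""

    def __init__(self, name: str, jobs: Tuple[str, ...]) -> None:
        # The generator already hands over a tuple, so no isinstance
        # assert and no list-to-tuple conversion is needed here.
        self.name = name
        self._jobs = jobs


operations = [
    JobOperationSketch("compute", aggregate)
    for aggregate in aggregates_of_one(["job_a", "job_b"])
]
```

Larger aggregates would only require a different generator; the consuming class would stay the same.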

b-butler · 2020-08-10T15:19:29Z

flow/project.py

+        assert len(self._jobs) == 1
+        return f"{self.name}({str(self._jobs[0])})"


This is an internal class, so we don't need the len == 1 assumption. We could use this in JobOperation though.

After we include actual aggregation, this will have to change to something like
f"{self.name}-{len(self._jobs)}: ({str(self._jobs[0])}-{str(self._jobs[-1])})" for every aggregate (even for length 1).
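To make the proposed format concrete, here is a small sketch that applies it to aggregates of different lengths. The function name `aggregate_label` is hypothetical, and jobs are represented by plain strings.

```python
from typing import Sequence


def aggregate_label(name: str, jobs: Sequence[str]) -> str:
    """Build a label in the format proposed above (hypothetical helper)."""
    return f"{name}-{len(jobs)}: ({jobs[0]}-{jobs[-1]})"


single = aggregate_label("compute", ("job_a",))
triple = aggregate_label("compute", ("job_a", "job_b", "job_c"))

print(single)  # compute-1: (job_a-job_a)
print(triple)  # compute-3: (job_a-job_c)
```

Note that an aggregate of length 1 repeats the same job at both ends, so the label differs from the current `name(job)` form.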

Please see this comment by @csadorf

I am aware. I mean that we can do that now, given that this is now an internal class. We just need this behavior for JobOperation.

@b-butler If we do this, I'll have to change the template-reference-data twice (once here and again in #334).
This would mean a lot of additional work.
I think this should be addressed in #334.

The assert should be in place wherever we actually make that assumption. Once that assumption is removed (e.g. in #334), the asserts will automatically alert us to spots in the code that require a revised implementation.
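A sketch of that argument, using a hypothetical `OperationSketch` class with strings standing in for jobs: the assert sits in the one method that relies on a single job, so it fails as soon as a longer aggregate reaches it.

```python
class OperationSketch:
    """Illustrative stand-in for an internal operation class."""

    def __init__(self, name, jobs):
        self.name = name
        self._jobs = jobs

    def __str__(self):
        # The assumption is asserted exactly where it is relied upon.
        assert len(self._jobs) == 1
        return f"{self.name}({self._jobs[0]})"


label = str(OperationSketch("compute", ("job_a",)))

# Once real aggregation arrives, the assert points at code needing revision.
try:
    str(OperationSketch("compute", ("job_a", "job_b")))
except AssertionError:
    flagged = True
else:
    flagged = False
```

Keep in mind that Python skips assert statements when run with the -O flag, so this only works as a development-time alert.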

flow/project.py

kidrahahjo · 2020-08-10T17:06:02Z

Since the team agrees on using tuples rather than lists of jobs internally, I'll go ahead and implement this.

b-butler

Addressed the comments you left @kidrahahjo.

b-butler · 2020-08-10T18:12:50Z

flow/project.py

+        assert len(self._jobs) == 1
+        return f"{self.name}({str(self._jobs[0])})"


I am aware. I mean that we can do that now, given that this is now an internal class. We just need this behavior for JobOperation.

flow/project.py

b-butler

Thanks for the changes. When I said to use _verify_aggregate_project where we use jobs in the function signature, I meant the public-facing functions. Internally, we should be guaranteed not to add jobs that are not part of the project. The public-facing functions do include the command line interface, i.e. the _main... functions.
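A sketch of the split described here, validating at the public boundary and trusting internal callers. `ProjectBoundarySketch`, `_verify_jobs`, and the use of LookupError are assumptions for illustration; the real _verify_aggregate_project may look quite different.

```python
from typing import Iterable, List, Set, Tuple


class ProjectBoundarySketch:
    """Hypothetical project holding a set of known job ids."""

    def __init__(self, job_ids: Iterable[str]) -> None:
        self._job_ids: Set[str] = set(job_ids)

    def _verify_jobs(self, jobs: Tuple[str, ...]) -> None:
        # Reject any job that does not belong to this project.
        unknown = [job for job in jobs if job not in self._job_ids]
        if unknown:
            raise LookupError(f"Jobs not in project: {unknown}")

    def run(self, jobs: Iterable[str]) -> List[str]:
        """Public entry point: validate user input once, then go internal."""
        aggregate = tuple(jobs)
        self._verify_jobs(aggregate)
        return self._run(aggregate)

    def _run(self, jobs: Tuple[str, ...]) -> List[str]:
        # Internal code trusts its callers and does not re-validate.
        return [f"ran {job}" for job in jobs]


project = ProjectBoundarySketch(["job_a", "job_b"])
result = project.run(["job_a"])

try:
    project.run(["job_z"])
except LookupError:
    rejected = True
else:
    rejected = False
```

The check runs once per user-facing call, so internal paths stay cheap.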

flow/project.py

kidrahahjo · 2020-08-10T21:11:12Z

@b-butler I have added the job check to these methods: print_status, run, and submit. Other than these, I don't think the check is needed anywhere else. Suggestions are welcome.

csadorf

Somehow GitHub dropped my previous review, but comments were still posted.


kidrahahjo · 2020-08-11T10:01:11Z

@csadorf I addressed the comments that were not resolved in the diff. Please have a look.

csadorf

I've commented on the f-string issue and would still like to see that addressed, but this is not critical. LGTM!

b-butler

I have a few small changes and I think we are good to go on this PR.

flow/project.py


b-butler

LGTM

b-butler · 2020-08-11T16:08:31Z

As for the pros and cons of lists versus tuples, I don't think there are any cons to the tuple approach. Once a particular aggregate is generated, the idea is that it is used immediately to create a _JobOperation, which is not designed to have its jobs change. A tuple states this assumption explicitly. Any dynamic nature of aggregates is to be found in their generation.
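A short sketch of the property being relied on, with strings standing in for jobs: a tuple cannot be changed after creation. The dictionary lookup is an extra side benefit shown for illustration, not something claimed in the discussion.

```python
jobs_list = ["job_a", "job_b"]
jobs_tuple = tuple(jobs_list)

# A list can still be changed after it has been handed to a consumer.
jobs_list.append("job_c")

# A tuple has no mutating methods, so the same attempt fails.
try:
    jobs_tuple.append("job_c")
except AttributeError:
    tuple_rejected_append = True
else:
    tuple_rejected_append = False

# Tuples of hashable items are hashable and can serve as dictionary keys.
status_by_aggregate = {jobs_tuple: "eligible"}
```

The tuple keeps its original two jobs even though the source list was changed afterwards.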

csadorf · 2020-08-11T16:50:54Z

@bdice Do you want to have a final look?

bdice

I just looked at open conversations again. I left some unresolved so that they can be remembered in follow-up work. Otherwise this looks good!

* Add the private Aggregate class having fundamental concepts
* Tests for _Aggregate class added
* Logic for _condition, BaseFlowOperation, FlowGroup added
* Logic extended for exec command
* Logic extended for next command
* Logic for run command
* Logic for script command added
* Logic for submit and status enabled
* Make tests pass, replaced by-job grouping with by-op grouping
* Style Fix
* Remove print statement
* Suggested changes implemented with refactoring of Aggregate class and FlowGroup
* Documentation Edit and implement suggested changes
* Improve status fetching performance
* Use get_id() instead of the id property
* Remove the use of Aggregate class and refactored
* Minute document change
* Apply suggested changes
* Deprecate eligible and complete methods of Flow{Group|Operation}
* Move property method job to JobOperation only
* Apply suggested changes
* Apply suggested changes
* Apply suggested changes
* Change variable name
* Insert assert statement to check len(_JobOperation._jobs) == 1
* Internally change the use to lists to tuples
* Additional suggested changes from code review
* Add suggested changes
* Revert accidental changes
* Use repr(job) instead of str(job) for __repr__ method and remove instance check
* Use f-strings
* Remove checks from run, submit, print_status, _generate_id methods. To be implemented in #336

kidrahahjo added 9 commits July 15, 2020 22:00

Add the private Aggregate class having fundamental concepts

4eb97a0

Tests for _Aggregate class added

07c5e16

Logic for _condition, BaseFlowOperation, FlowGroup added

5cc7912

Logic extended for exec command

b9c4655

Logic extended for next command

9a7ca7e

Logic for run command

15b79cf

Logic for script command added

df99e50

Logic for submit and status enabled

5f1e060

Make tests pass, replaced by-job grouping with by-op grouping

0a33a27

kidrahahjo requested review from csadorf and a team July 16, 2020 12:31

kidrahahjo requested review from a team as code owners July 16, 2020 12:31

kidrahahjo requested review from yuanzhou0827 and removed request for a team and yuanzhou0827 July 16, 2020 12:31

Style Fix

6a37920

kidrahahjo self-assigned this Jul 16, 2020

kidrahahjo added the GSoC (Google Summer of Code) and aggregation labels Jul 16, 2020

Remove print statement

39b4a3c

bdice reviewed Jul 16, 2020

Suggested changes implemented with refactoring of Aggregate class and FlowGroup

7478f4d

b-butler requested changes Jul 17, 2020

kidrahahjo added 2 commits July 18, 2020 01:23

Documentation Edit and implement suggested changes

b5ba89f

Improve status fetching performance

1b07f7b

Use get_id() instead of the id property

6da1cf1

b-butler requested changes Aug 10, 2020

kidrahahjo requested a review from csadorf August 10, 2020 16:06

Resolve merge conflicts

9181b27

b-butler reviewed Aug 10, 2020

kidrahahjo added 3 commits August 11, 2020 00:29

Internally change the use to lists to tuples

e8518f5

Additional suggested changes from code review

3ab294f

Add suggested changes

8338517

kidrahahjo requested a review from b-butler August 10, 2020 19:14

kidrahahjo commented Aug 10, 2020

b-butler requested changes Aug 10, 2020

Revert accidental changes

d3f5f8f

csadorf reviewed Aug 11, 2020

Use repr(job) instead of str(job) for __repr__ method and remove instance check

2333a63

kidrahahjo requested review from b-butler and csadorf August 11, 2020 11:02

csadorf approved these changes Aug 11, 2020

Use f-strings

069657f

b-butler requested changes Aug 11, 2020

Remove checks from run, submit, print_status, _generate_id methods. To be implemented in glotzerlab#336

417f0f2

b-butler approved these changes Aug 11, 2020

bdice approved these changes Aug 12, 2020

bdice merged commit 3fe6006 into glotzerlab:master Aug 12, 2020

bdice added this to the v0.11.0 milestone Aug 12, 2020
