Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Aggregation Feature to Flow #336

Closed
wants to merge 96 commits into from

Conversation

kidrahahjo
Copy link
Collaborator

Description

Introducing the use of Aggregate class in flow.
This class includes features like:

  1. Aggregating by numbers
  2. Aggregating by a condition/statepoint-param
  3. Sorting by a statepoint-parameter
  4. Filtering jobs using the select parameter

Motivation and Context

The current PR for aggregation is too big. Now that PR will track my project and I'll be splitting the work into several small PRs.

Types of Changes

  • Documentation update
  • Bug fix
  • New feature
  • Breaking change1

1The change breaks (or has the potential to break) existing functionality.

Checklist:

If necessary:

  • I have updated the API documentation as part of the package doc-strings.
  • I have created a separate pull request to update the framework documentation on signac-docs and linked it here.
  • I have updated the changelog.

Copy link
Member

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few initial comments -- I'll do a more thorough review later, but this should be a good starting point for general style-related things to fix throughout the PR.

return [jobs]

if not callable(aggregator):
raise TypeError("Expected callable for aggregator, got {}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use f-strings, here and below.

flow/aggregate.py Outdated Show resolved Hide resolved
flow/aggregate.py Outdated Show resolved Hide resolved
Comment on lines +67 to +70
if getattr(aggregator, '_num', False):
self._is_aggregate = False if aggregator._num == 1 else True
else:
self._is_aggregate = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is equivalent - please check it.

Suggested change
if getattr(aggregator, '_num', False):
self._is_aggregate = False if aggregator._num == 1 else True
else:
self._is_aggregate = True
self._is_aggregate = getattr(aggregator, '_num', 0) != 1

flow/aggregate.py Show resolved Hide resolved
"got {}".format(type(default)))

def keyfunction(job):
return [job.sp.get(key, default[i]) for i, key in enumerate(keys)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rewrite this to use zip instead of enumerate. This might work:

Suggested change
return [job.sp.get(key, default[i]) for i, key in enumerate(keys)]
return [job.sp.get(key, default_value) for key, default_value in zip(keys, default)]

r"""This class handles the creation of aggregates.

.. note::
This class should not be instantiated by users directly.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class should be private, according to this comment.

flow/aggregate.py Outdated Show resolved Hide resolved
if len(jobs) == 1:
return str(jobs[0]) # Return job id as it's already unique

blob = ''.join((job.get_id() for job in jobs))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to separate the job ids with a comma for conceptual clarity. I know this isn't really "meaningful" since it's hashed, but I think it makes the purpose clearer.

Suggested change
blob = ''.join((job.get_id() for job in jobs))
blob = ','.join((job.get_id() for job in jobs))

flow/aggregate.py Outdated Show resolved Hide resolved
@kidrahahjo
Copy link
Collaborator Author

@bdice I addressed your comments in #348

@kidrahahjo kidrahahjo marked this pull request as draft August 31, 2020 16:35
@kidrahahjo
Copy link
Collaborator Author

To be marked as ready for review after #335 gets merged

@kidrahahjo kidrahahjo force-pushed the feature/enable-aggregate-status-logic branch from e3c94ec to 619f9bf Compare September 1, 2020 15:19
kidrahahjo added a commit that referenced this pull request Sep 1, 2020
* Add the private Aggregate class having fundamental concepts

* Tests for _Aggregate class added

* Logic for _condition, BaseFlowOperation, FlowGroup added

* Logic extended for exec command

* Logic extended for next command

* Logic for run command

* Logic for script command added

* Logic for submit and status enabled

* Make tests pass, replaced by-job grouping with by-op grouping

* Style Fix

* Remove print statement

* Suggested changes implemented with refactoring of Aggregate class and FlowGroup

* Documentation Edit and implement suggested changes

* Improve status fetching performance

* Use get_id() instead of the id property

* Remove the use of Aggregate class and refactored

* Minute document change

* Apply suggested changes

* Deprecate eligible and complete methods of Flow{Group|Operation}

* Move property method job to JobOperation only

* Apply suggested changes

* Apply suggested changes

* Apply suggested changes

* Change variable name

* Insert assert statement to check len(_JobOperation._jobs) == 1

* Internally change the use to lists to tuples

* Additional suggested changes from code review

* Add suggested changes

* Revert accidental changes

* Use repr(job) instead of str(job) for __repr__ method and remove instance check

* Use f-strings

* Remove checks from run, submit, print_status, _generate_id methods. To be implemented in #336
@kidrahahjo kidrahahjo force-pushed the feature/enable-aggregate-status-logic branch from 619f9bf to 5493b4b Compare September 1, 2020 15:28
Base automatically changed from feature/enable-aggregate-status-logic to feature/enable-aggregation November 16, 2020 15:39
Base automatically changed from feature/enable-aggregation to next November 16, 2020 16:09
@bdice
Copy link
Member

bdice commented Nov 23, 2020

This PR is on hold until we make improvements to next as discussed in #362, in addition to code cleanup and performance improvement. (Discussed with @vyasr and @kidrahahjo via Slack.)

@csadorf csadorf added the blocked Dependent on something else label Nov 27, 2020
Base automatically changed from next to master January 20, 2021 18:59
@kidrahahjo kidrahahjo removed the blocked Dependent on something else label Jan 28, 2021
@kidrahahjo kidrahahjo added this to the v0.13.0 milestone Jan 28, 2021
@kidrahahjo
Copy link
Collaborator Author

Moved to #464

@kidrahahjo kidrahahjo closed this Feb 11, 2021
@kidrahahjo kidrahahjo deleted the feature/introduce-aggregator-classes branch February 11, 2021 12:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aggregation GSoC Google Summer of Code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants