Implement task restart policies #280

Merged 63 commits, Jan 23, 2025
Commits
7f752b3
Added placeholder tests for proposed methods
ianmkenney Jul 16, 2024
dd8f0e9
Added models for new node types
ianmkenney Jul 16, 2024
da17e45
Updated new GufeTokenizable models in statestore
ianmkenney Jul 17, 2024
b7f63d4
Added placeholder unit tests for new models
ianmkenney Jul 17, 2024
6a167f1
Added validation and unit tests for storage models
ianmkenney Jul 18, 2024
a10e235
Added `taskhub_sk` to `TaskRestartPattern`
ianmkenney Jul 22, 2024
b99d8ef
Added `statestore` methods for restart patterns
ianmkenney Jul 22, 2024
39f9868
Added APPLIES relationship when adding pattern
ianmkenney Jul 25, 2024
988155f
Establish APPLIES when actioning a Task
ianmkenney Jul 26, 2024
d3f25f8
Canceling a Task removes the APPLIES relationship
ianmkenney Jul 26, 2024
510ae66
Task status changes affect APPLIES relationship
ianmkenney Aug 1, 2024
2310fd5
Tests for Task status change on APPLIES
ianmkenney Aug 4, 2024
ea2851f
Added method (unimplemented) calls for restarts
ianmkenney Aug 4, 2024
8e011be
Implemented add_protocol_dag_result_ref_traceback
ianmkenney Aug 5, 2024
4f07dde
Started implementation of restart resolution
ianmkenney Aug 6, 2024
78c4551
Tracebacks now include key data from its source units
ianmkenney Aug 7, 2024
7acc003
Built out custom fixture for testing restart policies
ianmkenney Aug 13, 2024
03d9fa1
Added the `chainable` decorator to Neo4jStore
ianmkenney Aug 19, 2024
aad97e3
Resolve task restarts now sets all remaining tasks to waiting
ianmkenney Aug 19, 2024
a655dc7
Corrected resolution logic
ianmkenney Aug 19, 2024
5bb6700
Extracted complexity out of test_resolve_task_restarts
ianmkenney Aug 23, 2024
fe4b87b
resolve restart of tasks with no tracebacks
ianmkenney Aug 23, 2024
8a6f980
Replaced many maps with a for loop
ianmkenney Aug 23, 2024
93eb5f5
Small changes from review
dotsdl Sep 4, 2024
0900f39
Chainable now uses the update_wrapper function
ianmkenney Sep 9, 2024
c8ddafc
Updated Traceback class
ianmkenney Sep 9, 2024
2a59499
Renamed Traceback to Tracebacks
ianmkenney Sep 9, 2024
148d048
Updated cancel and increment logic
ianmkenney Sep 9, 2024
645b2e4
Fixed query for deleting the APPLIES relationship
ianmkenney Sep 9, 2024
3a8eeca
Removed unused testing fixture
ianmkenney Sep 9, 2024
ea6e66f
Clarified comment and added complimentary assertion
ianmkenney Sep 9, 2024
7a4b114
Small changes to Tracebacks
dotsdl Sep 13, 2024
cf0e961
Merge pull request #286 from OpenFreeEnergy/feature/iss-277-restart-p…
ianmkenney Sep 19, 2024
6066796
Fix for Tracebacks unit tests
ianmkenney Sep 24, 2024
fcf77a0
Added API endpoints for managing restart policies
ianmkenney Sep 25, 2024
cea16bc
Added untested client method for task restart policies
ianmkenney Oct 1, 2024
a4da776
Added testing for client methods dealing with restart policies
ianmkenney Oct 1, 2024
fdc25a7
`get_taskhub` calls `get_taskhubs`
ianmkenney Oct 7, 2024
51194ff
Updated docstrings
ianmkenney Oct 7, 2024
f03417c
Merge branch 'main' into feature/iss-277-restart-policy
ianmkenney Oct 8, 2024
977c896
Added docstrings to client methods
ianmkenney Oct 21, 2024
2d2d8f6
Added Task restart patterns to user guide
ianmkenney Oct 21, 2024
d7dcd5c
Link to python classes and methods in restart pattern section
ianmkenney Oct 21, 2024
006e689
Merge branch 'main' into feature/iss-277-restart-policy
dotsdl Oct 25, 2024
d331cc4
Merge branch 'main' into feature/iss-277-restart-policy
dotsdl Dec 3, 2024
c468b43
statestore edits from review
dotsdl Jan 3, 2025
bb5dbcd
Tracebacks model doc fix
dotsdl Jan 3, 2025
3776c7a
Consistency fix to TaskRestartPattern._defaults
dotsdl Jan 3, 2025
b4865fd
Docstring updates to client; token validation to interface api restar…
dotsdl Jan 3, 2025
555ba62
Merge branch 'main' into feature/iss-277-restart-policy
dotsdl Jan 3, 2025
893a790
Review edits
dotsdl Jan 3, 2025
2787527
Edits from review
dotsdl Jan 20, 2025
7ba1b4f
Black!
dotsdl Jan 20, 2025
0220e00
User guide fixes, consistency edits
dotsdl Jan 20, 2025
ae584eb
Cypher fix
dotsdl Jan 20, 2025
e6d1ece
Test fix
dotsdl Jan 20, 2025
cf4ebb3
Another test fix
dotsdl Jan 21, 2025
2af71b9
Remove testing of GufeTokenizable level dict keys
ianmkenney Jan 21, 2025
9e991d6
Remove unnecessary comment about taskhub validation
ianmkenney Jan 21, 2025
d1e4726
Remove unused tests that will not be implemented
ianmkenney Jan 21, 2025
280ef48
Compare `applies_count` to expected value in tests
ianmkenney Jan 21, 2025
7f546d0
Removed __eq__ method for TaskRestartPattern
ianmkenney Jan 21, 2025
4056752
Use standard dict instead of defaultdict
ianmkenney Jan 22, 2025
Implemented add_protocol_dag_result_ref_traceback
* Renamed add_task_traceback to add_protocol_dag_result_ref_traceback
* Added tests for add_protocol_dag_result_ref_traceback
ianmkenney committed Aug 5, 2024
commit 8e011beccce70dc4fcac041c36dc673d087990b0
4 changes: 3 additions & 1 deletion alchemiscale/compute/api.py
@@ -271,7 +271,9 @@ def set_task_result(
     if protocoldagresultref.ok:
         n4js.set_task_complete(tasks=[task_sk])
     else:
-        n4js.add_task_traceback(task_sk, pdr.protocol_unit_failures, result_sk)
+        n4js.add_protocol_dag_result_ref_traceback(
+            pdr.protocol_unit_failures, result_sk
+        )
         n4js.set_task_error(tasks=[task_sk])
         n4js.resolve_task_restarts(tasks=[task_sk])

44 changes: 41 additions & 3 deletions alchemiscale/storage/statestore.py
@@ -30,6 +30,7 @@
     TaskHub,
     TaskRestartPattern,
     TaskStatusEnum,
+    Traceback,
 )
 from ..strategies import Strategy
 from ..models import Scope, ScopedKey
@@ -2417,13 +2418,50 @@ def get_task_failures(self, task: ScopedKey) -> List[ProtocolDAGResultRef]:
         """
         return self._get_protocoldagresultrefs(q, task)

-    def add_task_traceback(
+    def add_protocol_dag_result_ref_traceback(
         self,
-        task_scoped_key: ScopedKey,
         protocol_unit_failures: List[ProtocolUnitFailure],
         protocol_dag_result_ref_scoped_key: ScopedKey,
     ):
-        raise NotImplementedError
+        subgraph = Subgraph()
+
+        with self.transaction() as tx:
+
+            query = """
+            MATCH (pdrr:ProtocolDAGResultRef {`_scoped_key`: $protocol_dag_result_ref_scoped_key})
+            RETURN pdrr
+            """
+
+            pdrr_result = tx.run(
+                query,
+                protocol_dag_result_ref_scoped_key=str(
+                    protocol_dag_result_ref_scoped_key
+                ),
+            ).to_eager_result()
+
+            try:
+                protocol_dag_result_ref_node = record_data_to_node(
+                    pdrr_result.records[0]["pdrr"]
+                )
+            except IndexError:
+                raise KeyError("Could not find ProtocolDAGResultRef in database.")
+
+            tracebacks = list(map(lambda puf: puf.traceback, protocol_unit_failures))
+            traceback = Traceback(tracebacks)
+
+            _, traceback_node, _ = self._gufe_to_subgraph(
+                traceback.to_shallow_dict(),
+                labels=["GufeTokenizable", traceback.__class__.__name__],
+                gufe_key=traceback.key,
+                scope=protocol_dag_result_ref_scoped_key.scope,
+            )
+
+            subgraph |= Relationship.type("DETAILS")(
+                traceback_node,
+                protocol_dag_result_ref_node,
+            )
+
+            merge_subgraph(tx, subgraph, "GufeTokenizable", "_scoped_key")

     def set_task_status(
         self, tasks: List[ScopedKey], status: TaskStatusEnum, raise_error: bool = False
58 changes: 58 additions & 0 deletions alchemiscale/tests/integration/storage/test_statestore.py
@@ -1944,6 +1944,64 @@ def test_get_task_failures(
         assert pdr_ref_sk in failure_pdr_ref_sks
         assert pdr_ref2_sk in failure_pdr_ref_sks

+    @pytest.mark.parametrize("failure_count", (1, 2, 3, 4))
+    def test_add_protocol_dag_result_ref_traceback(
+        self,
+        network_tyk2_failure,
+        n4js,
+        scope_test,
+        transformation_failure,
+        protocoldagresults_failure,
+        failure_count: int,
+    ):
+
+        an = network_tyk2_failure.copy_with_replacements(
+            name=network_tyk2_failure.name
+            + "_test_add_protocol_dag_result_ref_traceback"
+        )
+        n4js.assemble_network(an, scope_test)
+        transformation_scoped_key = n4js.get_scoped_key(
+            transformation_failure, scope_test
+        )
+
+        # create a task; pretend we computed it, submit reference for pre-baked
+        # result
+        task_scoped_key = n4js.create_task(transformation_scoped_key)
+
+        protocol_unit_failure = protocoldagresults_failure[0].protocol_unit_failures[0]
+
+        pdrr = ProtocolDAGResultRef(
+            scope=task_scoped_key.scope,
+            obj_key=protocoldagresults_failure[0].key,
+            ok=protocoldagresults_failure[0].ok(),
+        )
+
+        # push the result
+        pdrr_scoped_key = n4js.set_task_result(task_scoped_key, pdrr)
+
+        protocol_unit_failures = []
+        for failure_index in range(failure_count):
+            protocol_unit_failures.append(
+                protocol_unit_failure.copy_with_replacements(
+                    traceback=protocol_unit_failure.traceback + "_" + str(failure_index)
+                )
+            )
+
+        n4js.add_protocol_dag_result_ref_traceback(
+            protocol_unit_failures, pdrr_scoped_key
+        )
+
+        query = """
+        MATCH (traceback:Traceback)-[:DETAILS]->(:ProtocolDAGResultRef {`_scoped_key`: $pdrr_scoped_key})
+        RETURN traceback
+        """
+
+        results = n4js.execute_query(query, pdrr_scoped_key=str(pdrr_scoped_key))
+
+        returned_tracebacks = results.records[0]["traceback"]["tracebacks"]
+
+        assert returned_tracebacks == [puf.traceback for puf in protocol_unit_failures]
+
     ### task restart policies

     class TestTaskRestartPolicy:
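
The shape of the change in this commit can be illustrated without a database. The sketch below is not alchemiscale code: `FailureStub` and `GraphStub` are hypothetical stand-ins (for `ProtocolUnitFailure` and the Neo4j-backed store, respectively), and a plain dict plays the role of the `DETAILS` relationship. It only mirrors the behavior visible in the diff: the traceback strings are collected in order from the failures, attached to a single result reference, and a missing reference raises `KeyError`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FailureStub:
    """Hypothetical stand-in for a ProtocolUnitFailure; only `traceback` is modeled."""

    traceback: str


@dataclass
class GraphStub:
    """Hypothetical in-memory stand-in for the state store (assumption, not the real API)."""

    result_refs: set = field(default_factory=set)
    # Maps a result-ref key to its list of traceback strings,
    # playing the role of the DETAILS relationship in the diff.
    details: dict = field(default_factory=dict)

    def add_traceback(self, failures, result_ref_key):
        # As in the diff, a result ref that cannot be found surfaces as a KeyError.
        if result_ref_key not in self.result_refs:
            raise KeyError("Could not find ProtocolDAGResultRef in database.")
        # One entry per result ref, holding every unit traceback in input order.
        self.details[result_ref_key] = [f.traceback for f in failures]


store = GraphStub(result_refs={"pdrr-1"})
failures = [FailureStub(f"boom_{i}") for i in range(3)]
store.add_traceback(failures, "pdrr-1")
```

Note that the method in the diff no longer takes a task key: the tracebacks hang off the result reference alone, which is why the stub is keyed only by `result_ref_key`.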