Duplicate entity name prevention #480

amandarichardsonn · 2024-02-07T22:46:36Z

This PR prevents the launch of entities with names that already exist in the Experiment.

MattToast

Gave a quick skim: just a couple of small requests and some threads to pull on while you work out the CI

smartsim/_core/control/controller.py

MattToast · 2024-02-07T23:25:38Z

smartsim/_core/control/controller.py

@@ -512,6 +512,14 @@ def _launch_step(
        :raises SmartSimError: if launch fails
        """
        try:
+            if (
+                entity.name in self._jobs.completed


Are we sure about this check? Won't this prevent relaunching a completed model?

    exp = Experiment(...)
    rs = exp.create_run_settings(...)
    model = exp.create_model(run_settings=rs, ...)
    exp.start(model, block=True)  # Very explicitly wait for the model to complete
    exp.start(model, block=True)  # <-- I do not think we want this to error

@MattToast I think I pushed this idea because running the same model would overwrite outputs from the first run. Do we care about the outputs?

It's not super common for anything I write, but I know there have been use cases where users run a model, ignore the outputs or programmatically move them somewhere else, re-generate the model dir, and then re-run the same model.

Another, more common case (at least for me) is when a user working in a Jupyter notebook starts up an orchestrator, stops it for some reason, and then decides to relaunch the same orc object without restarting the kernel or creating a new experiment.

I'd be hesitant to make this change just because this is behavior that we have billed as a feature in the past (see tests/on_wlm/test_restart.py) and would constitute a pretty significant API break.

Which isn't to say that I think the API itself isn't somewhat incredibly convoluted, especially now that we are specifically trying to enforce a name uniqueness requirement that was, in the past, at best a suggestion, despite our assumption that names were unique. Unfortunately, that does mean that a user could start another model of the same name if they let the first model finish, e.g.

    exp = Experiment("multi-start")
    rs = exp.create_run_settings("echo", ["this", "just", "works"])
    model_1 = exp.create_model("my-model", rs)
    model_2 = exp.create_model("my-model", rs)  # <-- note the same name
    exp.start(model_1, block=True)
    exp.start(model_2, block=True)  # <-- This, unfortunately, now just works
                                    #     which, I would argue, is partially what
                                    #     we are trying to discourage

I suppose we could check to see if the user is trying to start the same object with a

    if (entity.name in self._jobs.completed
            and entity is not self._jobs.completed[entity.name]):
        ...  # TODO: block the launch

condition, but I think the same problem exists as a model's run settings is a writable attribute. So a user could effectively just launch the same model, with the same name, with a completely different set of run settings, e.g.

    exp = Experiment("multi-start")
    rs_1 = exp.create_run_settings("echo", ["my", "model"])
    model = exp.create_model("my-model", rs_1)
    exp.start(model, block=True)
    rs_2 = exp.create_run_settings("echo", ["my", "completely", "different", "model"])
    model.run_settings = rs_2
    exp.start(model)  # <-- Effectively just ran two different models
                      #     with the same name

somewhat defeating the point of the check.

For that reason, I think we should scope this PR to "preventing duplicate simultaneously running entity names launched through the same experiment", and decide how (or even if) we want to enforce a true unique names constraint in a later spike. Does that seem like a good scope @amandarichardsonn @ankona?
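As a rough illustration of the narrower scope proposed above, the sketch below rejects a launch only while another entity with the same name is still running, and lets the name be reused once that run completes. `RunningNameRegistry` and `DuplicateNameError` are hypothetical names made up for this example; they are not part of SmartSim, and the real controller tracks jobs differently.

```python
class DuplicateNameError(Exception):
    """Hypothetical error: the name is in use by a running entity."""


class RunningNameRegistry:
    """Hypothetical stand-in that tracks only currently running names."""

    def __init__(self):
        self.running_names = set()

    def launch(self, name):
        # Block the launch only if the name is in use right now.
        if name in self.running_names:
            raise DuplicateNameError(f"'{name}' is already running")
        self.running_names.add(name)

    def complete(self, name):
        # Completed names are forgotten, so they can be launched again.
        self.running_names.discard(name)


registry = RunningNameRegistry()
registry.launch("my-model")
registry.complete("my-model")
registry.launch("my-model")  # allowed: the first run has finished
```

Under this rule, restarting a completed model or orchestrator keeps working, at the cost of not catching two different models that reuse a name one after the other.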

The current implementation, shown below, allows users to restart a completed entity but does not allow them to launch a different entity whose name matches one that is completed or running. I think this is a band-aid patch and should be revisited.

    completed_job = self._jobs.completed.get(entity_passed_in.name, None)
    if completed_job is None and entity_passed_in is not a running job:
        launch
    elif completed_job is not None and completed_job.entity is entity_passed_in:
        launch
    else:
        raise

However, with the change above, users are now required to specify a db_identifier if they launch multiple Orchestrators. The implementation caught a SmartSim test that launched two Orchestrators with no db_identifier, which therefore raised on the duplicate name. Do we want this?
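The launch rule from the pseudocode above can be sketched in a self-contained form to show why two distinct objects that share a name are rejected, as happened with the two Orchestrators that had no db_identifier. `Entity`, `Job`, `JobRegistry`, and `can_launch` are hypothetical stand-ins written for this example, not SmartSim classes, and the name "orchestrator" is only a placeholder for a shared default.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # entities are compared by identity, not by field values
class Entity:
    name: str


@dataclass
class Job:
    entity: Entity


@dataclass
class JobRegistry:
    running: dict = field(default_factory=dict)    # name -> Job
    completed: dict = field(default_factory=dict)  # name -> Job

    def can_launch(self, entity):
        completed_job = self.completed.get(entity.name)
        if completed_job is None and entity.name not in self.running:
            return True   # the name is neither completed nor running
        if completed_job is not None and completed_job.entity is entity:
            return True   # the same object is being restarted
        return False      # the name is running, or another object completed under it


registry = JobRegistry()
first = Entity("orchestrator")
second = Entity("orchestrator")  # distinct object, same name
registry.completed[first.name] = Job(first)

print(registry.can_launch(first))   # True
print(registry.can_launch(second))  # False
```

Giving the second object its own name would let it pass the first branch, which mirrors what specifying a db_identifier does in the case described above.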

I'd say that if this is more complex than the implementation I currently have, we scope down to what you mentioned, @MattToast. Before merging, should we discuss with the team whether to allow users to overwrite entity info by relaunching? Or is this more about supporting SmartSim's currently intended functionality?

tests/test_controller_errors.py

codecov · 2024-02-09T00:34:06Z

Codecov Report

Attention: Patch coverage is 40.00000%, with 12 lines in your changes missing coverage. Please review.

Project coverage is 90.75%. Comparing base (8408368) to head (81d9140).
Report is 12 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #480      +/-   ##
===========================================
- Coverage    90.83%   90.75%   -0.09%     
===========================================
  Files           60       60              
  Lines         3818     3838      +20     
===========================================
+ Hits          3468     3483      +15     
- Misses         350      355       +5

Files	Coverage Δ
smartsim/_core/control/controller.py	`85.95% <40.00%> (-1.26%)`	⬇️

... and 2 files with indirect coverage changes

MattToast

One last minor nit, but nothing worth holding approval up over! LGTM!!

tests/test_controller_errors.py

@MattToast

Removes behavior deprecated in #480 from test suite. [ committed by @MattToast ] [ reviewed by @mellis13 ]

amandarichardsonn added 5 commits February 6, 2024 18:53

pushing changes

204218d

pushing changes

5782bad

pushing to create PR

783774c

Merge branch 'develop' into double-entity-name

448e698

small changes

3ce069b

MattToast self-requested a review February 7, 2024 23:14

MattToast requested changes Feb 7, 2024

amandarichardsonn added 3 commits February 7, 2024 17:53

edits

78742b5

changes

1954c9c

Merge branch 'develop' into double-entity-name

37c53fb

fix issue

fb31308

amandarichardsonn requested review from ankona and MattToast February 9, 2024 18:38

MattToast approved these changes Feb 21, 2024

tests/test_controller_errors.py (outdated, resolved)

matt comments

81d9140

amandarichardsonn merged commit 39354db into CrayLabs:develop Feb 22, 2024
34 checks passed

amandarichardsonn deleted the double-entity-name branch February 22, 2024 23:01

MattToast mentioned this pull request Mar 13, 2024

Remove duplicate launched model names from full test suite #520

Merged

MattToast added a commit that referenced this pull request Mar 18, 2024

Remove duplicate launched model names from full test suite (#520)

6dea582

Removes behavior deprecated in #480 from test suite. [ committed by @MattToast ] [ reviewed by @mellis13 ]

amandarichardsonn mentioned this pull request Apr 29, 2024

Automate release notes #568

Merged

amandarichardsonn commented Feb 7, 2024

MattToast left a comment

MattToast Feb 7, 2024

ankona Feb 8, 2024 (edited)

MattToast Feb 8, 2024 (edited)

MattToast Feb 8, 2024 (edited)

amandarichardsonn Feb 9, 2024

amandarichardsonn Feb 9, 2024 (edited)

codecov bot commented Feb 9, 2024 (edited)

MattToast left a comment
