
Implement concurrent.futures.ProcessPoolExecutor interface #1155

Merged: 38 commits merged into main, Jul 7, 2023

Conversation

@jan-janssen (Member)

Example code:

```python
from concurrent.futures import ProcessPoolExecutor
from pyiron_atomistics import Project

pr = Project('test')

job = pr.create.job.Lammps("lmp")
job.structure = pr.create.structure.ase.bulk("Al", cubic=True)
job.server.executor = ProcessPoolExecutor()
job.server.cores = 2
fs = job.run()

print(fs.done())
print(fs.result())
print(fs.done())
```
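The `done()`/`result()`/`done()` pattern above is plain `concurrent.futures` semantics. As a minimal stand-alone sketch (using a `ThreadPoolExecutor` and a dummy `slow_task` purely to avoid pickling concerns; the `Future` behaviour is the same as with `ProcessPoolExecutor`):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_task():
    # stand-in for the Lammps calculation
    time.sleep(0.2)
    return 42

with ThreadPoolExecutor(max_workers=1) as pool:
    fs = pool.submit(slow_task)
    print(fs.done())    # typically False: the task is still running
    print(fs.result())  # blocks until the task finishes, then prints 42
    print(fs.done())    # True: the future has completed
```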
@jan-janssen (Member, Author)

An alternative suggestion would be to attach the future object to job.server.future; then the run function would not return any output, if that is more desirable.

@liamhuber (Member) left a review:

Yeah, looks good. I do think the future should be put somewhere else, probably on the server as you suggest. But that's the only non-local/big change; the rest is just a bunch of very specific requests.

EDIT: Ok actually, I have one more UX concern: the error raised when you mix executor children with masters is not very clear. E.g.

```python
from concurrent.futures import ProcessPoolExecutor
from pyiron_atomistics import Project

pr = Project('tmp')
pr.remove_jobs(recursive=True, silently=True)

job = pr.create.job.Lammps("lmp")
job.structure = pr.create.structure.ase.bulk("Al", cubic=True)
job.server.executor = ProcessPoolExecutor(max_workers=4)

master = pr.create.job.Murnaghan("murn")
master.ref_job = job
master.run()
```

Runs fine without the executor line, but with it gives a stack trace ending in

```
File ~/work/pyiron/pyiron_base/pyiron_base/jobs/master/parallel.py:556, in ParallelMaster.run_static(self)
    554             self._run_if_master_modal_child_non_modal(job)
    555         else:
--> 556             raise TypeError()
    557 else:
    558     self.status.collect = True

TypeError:
```

It would be great to have something more informative. I also checked SerialMasterBase and it's got the same safety `else: raise TypeError()` clause, which should keep the behaviour reasonable but will need its own informative error message.
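A more informative error could look something like the sketch below. The function name, the message, and the set of supported run modes are all illustrative assumptions, not the actual pyiron_base code:

```python
# Hypothetical sketch of a clearer masters-vs-executor error.
# The supported run modes listed here are an assumption for illustration.
def check_child_run_mode(child_run_mode: str) -> None:
    supported = ("modal", "non_modal", "queue")
    if child_run_mode not in supported:
        raise TypeError(
            f"Child jobs with run_mode '{child_run_mode}' are not supported "
            f"inside a ParallelMaster; supported run modes are {supported}."
        )
```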

Review threads (resolved) on: pyiron_base/jobs/job/extension/server/generic.py, pyiron_base/jobs/job/generic.py, pyiron_base/jobs/job/runfunction.py, tests/job/test_genericJob.py
@jan-janssen (Member, Author)

> Yeah, looks good. I do think the future should be put somewhere else, probably on the server as you suggest. But that's the only non-local/big change; the rest is just a bunch of very specific requests.

The future is now moved to job.server.future.

> EDIT: Ok actually, I have one more UX concern: the error raised when you mix executor children with masters is not very clear.

Done.

@liamhuber (Member) left a review:

I like the modifications and they address all my original concerns, but per my comment on #1154 I played around a bit combining it with other elements of the codebase and ran into trouble.

The issue I see is that this is not yet deeply integrated enough with the rest of the Server.run_mode options. What is needed is to think through the different combinations of user actions, decide what the outcome of these actions should be, and then enforce those outcomes in the tests -- and the test part really is critical; e.g. "turning it on and turning it off again" was never actually tested, and my original suggestion resulted in an exception!

I'm not familiar off-hand with all the ways we have to modify the run_mode, but here is one example of what I'm thinking of:

```python
from concurrent.futures import ProcessPoolExecutor
from pyiron_atomistics import Project

pr = Project('tmp')
pr.remove_jobs(recursive=True, silently=True)

job = pr.create.job.Lammps("lmp")
job.structure = pr.atomistics.structure.bulk("Al")
job.server.executor = ProcessPoolExecutor()
print(job.server.run_mode)
# >>> executor
job.interactive_open()
print(job.server.run_mode, job.server.executor is not None)
# >>> interactive True
job.run()
print(job.output.energy_pot, job.server.future)
# >>> [-3.36] None

job2 = job.copy()
print(job.server.run_mode, job2.server.run_mode)
# >>> interactive executor
```

I think it's fine that turning on interactive_open() supplanted the executor and left the future empty, but then it's weird that server.executor stays populated. Even worse, if we copy the interactively-opened job, the copy winds up with a different run mode! In this particular case, I guess setting the run mode to interactive needs to clear the executor to None?

I don't have super strong opinions right now about what the behaviour should be, but it shouldn't be left up to chance/side effects and decisions about the intended interface need to be enforced by the tests.

@samwaseda, I think that after this PR is resolved most of the remaining changes for flux in particular over in #1154 will be pretty straightforward; it's probably more worthwhile for you to review here before reading there.

Review thread (resolved) on: pyiron_base/jobs/job/extension/server/generic.py
@jan-janssen (Member, Author)

> I think it's fine that turning on interactive_open() supplanted the executor and left the future empty, but then it's weird that the server.executor stays populated. Even worse, if we copy the interactively-opened job, the copy winds up with a different run mode! In this particular case, I guess setting the run mode to interactive needs to clear the executor to None?

I fixed the issue: the executor is now reset when the run_mode is changed afterwards. Currently, the interactive mode is not yet compatible with the executor; these are two separate developments, and we have to think more about what an integration could look like.
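The fix described above (changing the run mode drops the executor) can be sketched with a property setter. The toy `Server` class below only mirrors the discussion; it is not the pyiron_base implementation:

```python
# Toy sketch of the described behaviour: switching the run mode away from
# "executor" clears the attached executor so the two settings cannot diverge.
class Server:
    def __init__(self):
        self._run_mode = "modal"
        self.executor = None
        self.future = None

    @property
    def run_mode(self):
        return self._run_mode

    @run_mode.setter
    def run_mode(self, mode):
        if mode != "executor":
            # an executor only makes sense in executor mode, so drop it
            self.executor = None
        self._run_mode = mode

s = Server()
s.executor = object()
s.run_mode = "interactive"
print(s.executor)  # None: switching run modes cleared the executor
```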

@jan-janssen (Member, Author)

@liamhuber, @niklassiemer and @samwaseda: any other feedback? Otherwise I would like to merge this pull request, so we can merge the flux pull request on Friday and have everything ready by next Monday.

@pmrv (Contributor) commented Jul 6, 2023

Am generally happy with this new interface; it feels more natural and powerful. If this works with other concurrent.futures executors (and I don't see why it wouldn't), it might also give us a way to replace remote jobs with specific executors, like parsl or dask (notwithstanding their other shortcomings).

A small additional feature I would like is that Project.wait_for_job is aware of executor jobs and waits on the future objects.

One can of worms I think we need to open is cancelling futures. Say I submit a bunch of things and make a mistake, or my workflow converges and I don't need the additional calculations. I suppose job.server.executor.shutdown() is always possible, but users will likely have one executor per notebook, so that's a bit harsh. job.server.future.cancel() I guess will work to stop the calculation, but probably won't update the database entry of the job. I'm not sure if it is possible to attach something to the future that will take care of this, but a minimal fix would be to make Project.refresh_job_status aware of executors, so that it can reset the job status if it finds one with a cancelled future. Whatever the solution, GenericJob.kill should also cancel the underlying future if there is one.

Other than that, I think it's good stuff.

@jan-janssen (Member, Author)

> A small additional feature I would like is that Project.wait_for_job is aware of executor jobs and waits on the future objects.

I like the idea, and I guess we can extend this concept so that if you remove jobs using Project.remove_jobs, the futures are also cancelled.

The other direction, cancelling the futures and having that change the job status, sounds more difficult to me and would require more of a rewrite of the integration of job.status with concurrent.futures. I agree that this would be great, but for me it is out of the scope of the current pull request.

@samwaseda (Member)

Do I understand correctly that with the last commit we can simply use job.status and access data from job if the job is finished? Or do I still need to call job.server.future.result()?

@jan-janssen (Member, Author)

> Do I understand correctly that with the last commit we can simply use job.status and access data from job if the job is finished? Or do I still need to call job.server.future.result()?

You can always use job.status to check the status of the job; this worked the whole time. The last commit just implements two safety checks:

  1. It cancels the future object when you remove the job before it is executed. This is helpful if you create ten jobs, submit them all to an executor, and then decide to remove jobs five to eight before they are executed. Without this fix, pyiron would start to run a job only to realize that it no longer exists; with this fix, the executor is informed when you delete the job.
  2. If the user decides to cancel the job using job.server.future.cancel() rather than the traditional job.status.aborted = True, this is now correctly transferred to the job status: the status is set to aborted when the future object is cancelled.
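The second check can in principle be wired up with the standard library's `Future.add_done_callback`, which also fires when a future is cancelled. A minimal sketch, where the `status` dict is a stand-in for pyiron's job status object (an assumption, not the real implementation):

```python
from concurrent.futures import Future

# `status` stands in for pyiron's job.status; the real object differs.
status = {"aborted": False}

def _on_done(future: Future) -> None:
    # add_done_callback also invokes the callback when the future is cancelled
    if future.cancelled():
        status["aborted"] = True

fs = Future()             # a bare, still-pending future
fs.add_done_callback(_on_done)
fs.cancel()               # succeeds because the task never started running
print(status["aborted"])  # True: cancellation was propagated to the status
```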

@samwaseda (Member)

OK, that was not quite my question. I wanted to ask whether job.server.future.result() has to be called before the final results can be accessed.

@jan-janssen (Member, Author)

> ok that was not quite my question. I wanted to ask whether job.server.future.result() has to be called before the final results can be accessed.

No, and this was never the case. You can use job.server.future.result() as an alternative to pr.wait_for_job(job), but it is not mandatory.

@liamhuber (Member)

I'm happy to stack a PR that resolves my outstanding concerns.

Aaand one of my kids needs to be picked up from daycare sick. Probably won't be able to do this.

@niklassiemer (Member) left a review:

I am fine with this state of the new interface.

@liamhuber (Member)

> I agree that the development releases are the right way to move forward, but this functionality is not yet available, so for now I aim to get this merged into the main branch.

Just for clarity, the development releases can target any branch (correct me if I'm wrong about this, @srmnitc). So while it looks like we'll merge this by the end of the week anyhow, there is no actual need to merge it in order to dev release the functionality -- the dev release can target #1154 directly. This is the point of the dev-release workflow: to allow releasing functionality without feeling time-pressured to merge things into main.

@jan-janssen (Member, Author)

> Just for clarity, the development releases can target any branch (correct me if I'm wrong about this, @srmnitc). So while it looks like we'll merge this by the end of the week anyhow, there is no actual need to merge it in order to dev release the functionality -- the dev release can target #1154 directly. This is the point of the dev-release workflow: to allow releasing functionality without feeling time-pressured to merge things into main.

As far as I can see the dev release is not yet pushed to conda.

@srmnitc (Member) commented Jul 7, 2023

> I agree that the development releases are the right way to move forward, but this functionality is not yet available, so for now I aim to get this merged into the main branch.
>
> Just for clarity, the development releases can target any branch (correct me if I'm wrong about this, @srmnitc). So while it looks like we'll merge this by the end of the week anyhow, there is no actual need to merge it in order to dev release the functionality -- the dev release can target #1154 directly. This is the point of the dev-release workflow: to allow releasing functionality without feeling time-pressured to merge things into main.

Yes correct!

@srmnitc (Member) commented Jul 7, 2023

> Just for clarity, the development releases can target any branch (correct me if I'm wrong about this, @srmnitc). So while it looks like we'll merge this by the end of the week anyhow, there is no actual need to merge it in order to dev release the functionality -- the dev release can target #1154 directly. This is the point of the dev-release workflow: to allow releasing functionality without feeling time-pressured to merge things into main.
>
> As far as I can see the dev release is not yet pushed to conda.

I was hoping some of you would add a review; since the tests pass, I will go ahead and merge.

@srmnitc (Member) commented Jul 7, 2023

@jan-janssen @liamhuber Both base and atomistics have the dev release functionality now.

@liamhuber (Member)

I'm re-reviewing. Almost done. Please hold changes for a minute

@liamhuber (Member) left a review:

  • I implemented your suggestion to not copy jobs with futures that aren't done and tested it
  • I still want the job-independent tests moved out of the job tests and into the server tests

Review threads on: pyiron_base/jobs/job/generic.py, tests/job/test_genericJob.py
@liamhuber (Member) commented Jul 7, 2023

> I'm re-reviewing. Almost done. Please hold changes for a minute

K super, thanks! EDIT: thanks for waiting. For future readers I'm not just thanking myself -- there was a timely thumbs up 😝

All done. A couple more minor requests re testing. I still get a little nervous about potential weird states that server.future can get into given how it's used elsewhere, but since no one can see any explicit issues or an alternative suggestion, let's just go for it. I definitely see your point that we don't want to just delete it immediately. Disallowing copying jobs with a non-done() future was a good suggestion though, so that's in and tested.
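The copy guard described above can be sketched as follows. The `Job` class is a toy stand-in for illustration, not the pyiron class, and the error message is hypothetical:

```python
from concurrent.futures import Future

# Illustrative guard: refuse to copy a job whose future is still pending.
class Job:
    def __init__(self):
        self.future = None

    def copy(self):
        if self.future is not None and not self.future.done():
            raise RuntimeError(
                "Cannot copy a job while its future is still pending; "
                "wait for it to finish or cancel it first."
            )
        return Job()

job = Job()
job.future = Future()  # a pending future, as if the job were submitted
try:
    job.copy()
except RuntimeError as err:
    print(err)  # copying is refused while the future is not done
```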

@jan-janssen (Member, Author)

@liamhuber I merged your changes plus fixes - so from my perspective it is ready to be merged now. What do you think?

@liamhuber (Member) left a review:

I would still like the server-only tests moved to the lowest place they can go, but that's not approval-prohibiting.

🚀

@jan-janssen (Member, Author)

> I would still like the server-only tests moved to the lowest place they can go, but that's not approval-prohibiting.

Done

@jan-janssen jan-janssen merged commit 2dd3b0b into main Jul 7, 2023
@delete-merged-branch delete-merged-branch bot deleted the future_executor branch July 7, 2023 16:34