
Implement concurrent.futures.ProcessPoolExecutor interface #1155

Merged: 38 commits merged into main, Jul 7, 2023

Conversation

@jan-janssen (Member)

Example code:

```python
from concurrent.futures import ProcessPoolExecutor
from pyiron_atomistics import Project

pr = Project('test')

job = pr.create.job.Lammps("lmp")
job.structure = pr.create.structure.ase.bulk("Al", cubic=True)
job.server.executor = ProcessPoolExecutor()
job.server.cores = 2
fs = job.run()

print(fs.done())
print(fs.result())
print(fs.done())
```
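The `done()`/`result()`/`done()` pattern above is plain `concurrent.futures` semantics. As a minimal stand-alone sketch (using a `ThreadPoolExecutor` and a dummy `slow_task` purely to avoid pickling concerns; the `Future` behaviour is the same as with `ProcessPoolExecutor`):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_task():
    # stand-in for the Lammps calculation
    time.sleep(0.2)
    return 42

with ThreadPoolExecutor(max_workers=1) as pool:
    fs = pool.submit(slow_task)
    print(fs.done())    # typically False: the task is still running
    print(fs.result())  # blocks until the task finishes, then prints 42
    print(fs.done())    # True: the future has completed
```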
@jan-janssen (Member, Author)

An alternative suggestion would be to attach the future object to job.server.future; then the run function would not return any output, if that is more desirable.

@liamhuber (Member) left a review:

Yeah, looks good. I do think the future should be put somewhere else, probably on the server as you suggest. But that's the only non-local/big change; the rest is just a bunch of very specific requests.

EDIT: Ok actually, I have one more UX concern: the error raised when you mix executor children with masters is not very clear. E.g.

```python
from concurrent.futures import ProcessPoolExecutor
from pyiron_atomistics import Project

pr = Project('tmp')
pr.remove_jobs(recursive=True, silently=True)

job = pr.create.job.Lammps("lmp")
job.structure = pr.create.structure.ase.bulk("Al", cubic=True)
job.server.executor = ProcessPoolExecutor(max_workers=4)

master = pr.create.job.Murnaghan("murn")
master.ref_job = job
master.run()
```

Runs fine without the executor line, but with it gives a stack trace ending in

```
File ~/work/pyiron/pyiron_base/pyiron_base/jobs/master/parallel.py:556, in ParallelMaster.run_static(self)
    554             self._run_if_master_modal_child_non_modal(job)
    555         else:
--> 556             raise TypeError()
    557 else:
    558     self.status.collect = True

TypeError:
```

It would be great to have something more informative. I also checked SerialMasterBase and it's got the same safety `else: raise TypeError()` clause, which should keep the behaviour reasonable but will need its own informative error message.
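A more informative error could look something like the sketch below. The function name, the message, and the set of supported run modes are all illustrative assumptions, not the actual pyiron_base code:

```python
# Hypothetical sketch of a clearer masters-vs-executor error.
# The supported run modes listed here are an assumption for illustration.
def check_child_run_mode(child_run_mode: str) -> None:
    supported = ("modal", "non_modal", "queue")
    if child_run_mode not in supported:
        raise TypeError(
            f"Child jobs with run_mode '{child_run_mode}' are not supported "
            f"inside a ParallelMaster; supported run modes are {supported}."
        )
```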

Review threads (resolved) on: pyiron_base/jobs/job/extension/server/generic.py, pyiron_base/jobs/job/generic.py, pyiron_base/jobs/job/runfunction.py, tests/job/test_genericJob.py
@jan-janssen (Member, Author)

> Yeah, looks good. I do think the future should be put somewhere else, probably on the server as you suggest. But that's the only non-local/big change; the rest is just a bunch of very specific requests.

The future is now moved to job.server.future.

> EDIT: Ok actually, I have one more UX concern: the error raised when you mix executor children with masters is not very clear.

Done.

@liamhuber (Member) left a review:

I like the modifications and they address all my original concerns, but per my comment on #1154 I played around a bit combining it with other elements of the codebase and ran into trouble.

The issue I see is that this is not yet deeply integrated enough with the rest of the Server.run_mode options. What is needed is to think through the different combinations of user actions, decide what the outcome of these actions should be, and then enforce those outcomes in the tests -- and the test part really is critical; e.g. "turning it on and turning it off again" was never actually tested, and my original suggestion resulted in an exception!

I'm not familiar off-hand with all the ways we have to modify the run_mode, but here is one example of what I'm thinking of:

```python
from concurrent.futures import ProcessPoolExecutor
from pyiron_atomistics import Project

pr = Project('tmp')
pr.remove_jobs(recursive=True, silently=True)

job = pr.create.job.Lammps("lmp")
job.structure = pr.atomistics.structure.bulk("Al")
job.server.executor = ProcessPoolExecutor()
print(job.server.run_mode)
# >>> executor
job.interactive_open()
print(job.server.run_mode, job.server.executor is not None)
# >>> interactive True
job.run()
print(job.output.energy_pot, job.server.future)
# >>> [-3.36] None

job2 = job.copy()
print(job.server.run_mode, job2.server.run_mode)
# >>> interactive executor
```

I think it's fine that turning on interactive_open() supplanted the executor and left the future empty, but then it's weird that server.executor stays populated. Even worse, if we copy the interactively-opened job, the copy winds up with a different run mode! In this particular case, I guess setting the run mode to interactive needs to clear the executor to None?

I don't have super strong opinions right now about what the behaviour should be, but it shouldn't be left up to chance/side effects and decisions about the intended interface need to be enforced by the tests.

@samwaseda, I think that after this PR is resolved most of the remaining changes for flux in particular over in #1154 will be pretty straightforward; it's probably more worthwhile for you to review here before reading there.

Review thread (resolved) on: pyiron_base/jobs/job/extension/server/generic.py
@jan-janssen (Member, Author)

> I think it's fine that turning on interactive_open() supplanted the executor and left the future empty, but then it's weird that the server.executor stays populated. Even worse, if we copy the interactively-opened job, the copy winds up with a different run mode! In this particular case, I guess setting the run mode to interactive needs to clear the executor to None?

I fixed the issue: the executor is now reset when the run_mode is changed afterwards. Currently, the interactive mode is not yet compatible with the executor; these are two separate developments, and we have to think more about what an integration could look like.
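The fix described above (changing the run mode drops the executor) can be sketched with a property setter. The toy `Server` class below only mirrors the discussion; it is not the pyiron_base implementation:

```python
# Toy sketch of the described behaviour: switching the run mode away from
# "executor" clears the attached executor so the two settings cannot diverge.
class Server:
    def __init__(self):
        self._run_mode = "modal"
        self.executor = None
        self.future = None

    @property
    def run_mode(self):
        return self._run_mode

    @run_mode.setter
    def run_mode(self, mode):
        if mode != "executor":
            # an executor only makes sense in executor mode, so drop it
            self.executor = None
        self._run_mode = mode

s = Server()
s.executor = object()
s.run_mode = "interactive"
print(s.executor)  # None: switching run modes cleared the executor
```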

@jan-janssen (Member, Author)

@liamhuber, @niklassiemer and @samwaseda: any other feedback? Otherwise I would like to merge this pull request, so we can merge the flux pull request on Friday and have everything ready by next Monday.

@pmrv (Contributor) commented Jul 6, 2023

Am generally happy with this new interface; it feels more natural and powerful. If this works with other concurrent.futures executors (and I don't see why it wouldn't), it might also give us a way to replace remote jobs with specific executors, like parsl or dask (notwithstanding their other shortcomings).

A small additional feature I would like is that Project.wait_for_job is aware of executor jobs and waits on the future objects.

One can of worms I think we need to open is cancelling futures. Say I submit a bunch of things and make a mistake, or my workflow converges and I don't need the additional calculations. I suppose job.server.executor.shutdown() is always possible, but users will likely have one executor per notebook, so that's a bit harsh. job.server.future.cancel() I guess will work to stop the calculation, but probably won't update the database entry of the job. I'm not sure if it is possible to attach something to the future that will take care of this, but a minimal fix would be to make Project.refresh_job_status aware of executors, so that it can reset the job status if it finds one with a cancelled future. Whatever the solution, GenericJob.kill should also cancel the underlying future if there is one.

Other than that, I think it's good stuff.

@jan-janssen (Member, Author)

> A small additional feature I would like is that Project.wait_for_job is aware of executor jobs and waits on the future objects.

I like the idea, and I guess we can extend this concept so that if you remove jobs using Project.remove_jobs, the futures are also cancelled.

The other direction, cancelling the futures and having that change the job status, sounds more difficult to me and would require more of a rewrite of the integration of job.status with concurrent.futures. I agree that this would be great, but for me it is out of the scope of the current pull request.

@samwaseda (Member)

Do I understand correctly that with the last commit we can simply use job.status and access data from job if the job is finished? Or do I still need to call job.server.future.result()?

@jan-janssen (Member, Author)

> Do I understand correctly that with the last commit we can simply use job.status and access data from job if the job is finished? Or do I still need to call job.server.future.result()?

You can always use job.status to check the status of the job; this worked the whole time. The last commit just implements two safety checks:

  1. It cancels the future object when you remove the job before it is executed. This is helpful if you create ten jobs, submit them all to an executor, and then decide to remove jobs five to eight before they are executed. Without this fix, pyiron would start to run a job only to realize that it no longer exists; with this fix, the executor is informed when you delete the job.
  2. If the user decides to cancel the job using job.server.future.cancel() rather than the traditional job.status.aborted = True, this is now correctly transferred to the job status: the status is set to aborted when the future object is cancelled.
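The second check can in principle be wired up with the standard library's `Future.add_done_callback`, which also fires when a future is cancelled. A minimal sketch, where the `status` dict is a stand-in for pyiron's job status object (an assumption, not the real implementation):

```python
from concurrent.futures import Future

# `status` stands in for pyiron's job.status; the real object differs.
status = {"aborted": False}

def _on_done(future: Future) -> None:
    # add_done_callback also invokes the callback when the future is cancelled
    if future.cancelled():
        status["aborted"] = True

fs = Future()             # a bare, still-pending future
fs.add_done_callback(_on_done)
fs.cancel()               # succeeds because the task never started running
print(status["aborted"])  # True: cancellation was propagated to the status
```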

@samwaseda (Member)

OK, that was not quite my question. I wanted to ask whether job.server.future.result() has to be called before the final results can be accessed.

@jan-janssen (Member, Author)

> ok that was not quite my question. I wanted to ask whether job.server.future.result() has to be called before the final results can be accessed.

No, and this was never the case. You can use job.server.future.result() as an alternative to pr.wait_for_job(job), but it is not mandatory.

@liamhuber (Member)

I'm happy to stack a PR that resolves my outstanding concerns.

Aaand one of my kids needs to be picked up from daycare sick. Probably won't be able to do this.

@niklassiemer (Member) left a review:

I am fine with this state of the new interface.

@liamhuber (Member)

> I agree that the development releases are the right way to move forward, but this functionality is not yet available, so for now I aim to get this merged into the main branch.

Just for clarity, the development releases can target any branch (correct me if I'm wrong about this, @srmnitc). So while it looks like we'll merge this by the end of the week anyhow, there is no actual need to merge it in order to dev release the functionality -- the dev release can target #1154 directly. This is the point of the dev-release workflow: to allow releasing functionality without feeling time-pressured to merge things into main.

@jan-janssen (Member, Author)

> Just for clarity, the development releases can target any branch (correct me if I'm wrong about this, @srmnitc). So while it looks like we'll merge this by the end of the week anyhow, there is no actual need to merge it in order to dev release the functionality -- the dev release can target #1154 directly. This is the point of the dev-release workflow: to allow releasing functionality without feeling time-pressured to merge things into main.

As far as I can see the dev release is not yet pushed to conda.

@srmnitc (Member) commented Jul 7, 2023

> I agree that the development releases are the right way to move forward, but this functionality is not yet available, so for now I aim to get this merged into the main branch.
>
> Just for clarity, the development releases can target any branch (correct me if I'm wrong about this, @srmnitc). So while it looks like we'll merge this by the end of the week anyhow, there is no actual need to merge it in order to dev release the functionality -- the dev release can target #1154 directly. This is the point of the dev-release workflow: to allow releasing functionality without feeling time-pressured to merge things into main.

Yes correct!

@srmnitc (Member) commented Jul 7, 2023

> Just for clarity, the development releases can target any branch (correct me if I'm wrong about this, @srmnitc). So while it looks like we'll merge this by the end of the week anyhow, there is no actual need to merge it in order to dev release the functionality -- the dev release can target #1154 directly. This is the point of the dev-release workflow: to allow releasing functionality without feeling time-pressured to merge things into main.
>
> As far as I can see the dev release is not yet pushed to conda.

I was hoping some of you would add a review; since the tests pass, I will go ahead and merge.

@srmnitc (Member) commented Jul 7, 2023

@jan-janssen @liamhuber Both base and atomistics have the dev release functionality now.

@liamhuber (Member)

I'm re-reviewing. Almost done. Please hold changes for a minute

@liamhuber (Member) left a review:

  • I implemented your suggestion to not copy jobs with futures that aren't done and tested it
  • I still want the job-independent tests moved out of the job tests and into the server tests

Review threads on: pyiron_base/jobs/job/generic.py, tests/job/test_genericJob.py
@liamhuber (Member) commented Jul 7, 2023

> I'm re-reviewing. Almost done. Please hold changes for a minute

K super, thanks! EDIT: thanks for waiting. For future readers I'm not just thanking myself -- there was a timely thumbs up 😝

All done. A couple more minor requests re testing. I still get a little nervous about potential weird states that server.future can get into given how it's used elsewhere, but since no one can see any explicit issues or an alternative suggestion, let's just go for it. I definitely see your point that we don't want to just delete it immediately. Disallowing copying jobs with a non-done() future was a good suggestion though, so that's in and tested.
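The copy guard described above can be sketched as follows. The `Job` class is a toy stand-in for illustration, not the pyiron class, and the error message is hypothetical:

```python
from concurrent.futures import Future

# Illustrative guard: refuse to copy a job whose future is still pending.
class Job:
    def __init__(self):
        self.future = None

    def copy(self):
        if self.future is not None and not self.future.done():
            raise RuntimeError(
                "Cannot copy a job while its future is still pending; "
                "wait for it to finish or cancel it first."
            )
        return Job()

job = Job()
job.future = Future()  # a pending future, as if the job were submitted
try:
    job.copy()
except RuntimeError as err:
    print(err)  # copying is refused while the future is not done
```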

@jan-janssen (Member, Author)

@liamhuber I merged your changes plus fixes - so from my perspective it is ready to be merged now. What do you think?

@liamhuber (Member) left a review:

I would still like the server-only tests moved to the lowest place they can go, but that's not approval-prohibiting.

🚀

@jan-janssen (Member, Author)

> I would still like the server-only tests moved to the lowest place they can go, but that's not approval-prohibiting.

Done

@jan-janssen jan-janssen merged commit 2dd3b0b into main Jul 7, 2023
@delete-merged-branch delete-merged-branch bot deleted the future_executor branch July 7, 2023 16:34