
make lazy input standard #588

Merged: 35 commits merged into master from spx_lazy on Dec 5, 2022

Conversation

samwaseda
Copy link
Member

I realised that SPHInX becomes much faster with lazy=True, so I am making it the standard.

@coveralls
Copy link

coveralls commented Apr 12, 2022

Pull Request Test Coverage Report for Build 3576649601

  • 3 of 6 (50.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.006%) to 68.662%

Changes missing coverage:

  File                                                     Covered   Changed/Added   %
  pyiron_atomistics/atomistics/job/interactivewrapper.py   1         2               50.0%
  pyiron_atomistics/sphinx/base.py                         2         4               50.0%

Totals: change from base Build 3565833200: -0.006%
Covered Lines: 12114
Relevant Lines: 17643

💛 - Coveralls

@stale
Copy link

stale bot commented Apr 27, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 27, 2022
@stale stale bot removed the stale label May 2, 2022
@samwaseda
Copy link
Member Author

spx = pr.create_job('Sphinx', 'spx_sxextopt_Al')
spx.structure = pr.create.structure.bulk('Al', cubic=True)
spx.server.run_mode.interactive = True
spx.calc_static()
sxextopt = spx.create_job('SxExtOptInteractive', 'sxextopt_Al')

sxextopt.save()
sxextopt = pr.load('sxextopt_Al')
sxextopt.run()

This is still not working and I have no idea why. Do you see a problem? @pmrv

@pmrv
Copy link
Contributor

pmrv commented May 17, 2022

> spx = pr.create_job('Sphinx', 'spx_sxextopt_Al')
> spx.structure = pr.create.structure.bulk('Al', cubic=True)
> spx.server.run_mode.interactive = True
> spx.calc_static()
> sxextopt = spx.create_job('SxExtOptInteractive', 'sxextopt_Al')
>
> sxextopt.save()
> sxextopt = pr.load('sxextopt_Al')
> sxextopt.run()
>
> This is still not working and I have no idea why. Do you see a problem? @pmrv

Not sure from the error message. Have you checked the HDF5 files themselves? Can it be that they are somehow overwritten in between and therefore the lazy loading reaches into nothing?

I probably won't have time this week to have a detailed look.

@samwaseda
Copy link
Member Author

I now understand in principle where the problem comes from, but I don't know how to solve it.

In ref_job_initialize there is a pop call, in which essentially the following lines are executed:

spx = sxextopt._load_all_child_jobs(sxextopt._load_job_from_cache(sxextopt._job_name_lst[0]))
spx.project_hdf5.remove_group()
spx.project_hdf5 = job.project_hdf5.__class__(sxextopt.project, job.job_name, h5_path="/" + spx.job_name)

Now the difference is that spx.input works after these lines when it's not lazy, while with lazy it doesn't.

@samwaseda
Copy link
Member Author

For now I guess I'll wait for your reply @pmrv. It might be important to figure this out, because I realised that lazy=True is apparently also not compatible with Murnaghan when the job is interactive. That doesn't come up often right now, but otherwise we may face the same problem in the near future.

@pmrv
Copy link
Contributor

pmrv commented May 27, 2022

> I now understand in principle where the problem comes from, but I don't know how to solve it.
>
> In ref_job_initialize there is a pop call, in which essentially the following lines are executed:
>
> spx = sxextopt._load_all_child_jobs(sxextopt._load_job_from_cache(sxextopt._job_name_lst[0]))
> spx.project_hdf5.remove_group()
> spx.project_hdf5 = job.project_hdf5.__class__(sxextopt.project, job.job_name, h5_path="/" + spx.job_name)
>
> Now the difference is that spx.input works after these lines when it's not lazy, while with lazy it doesn't.

This actually sounds like a latent bug in how this was handled before. I'm assuming this happens:

  1. job gets loaded
  2. ref job gets loaded:
    2a. if lazy=False, the ref job input is read from HDF immediately (i.e. copied)
    2b. if lazy=True, the ref job input is not read from HDF (i.e. the DataContainer holds references to the HDF file and the group where the ref job was written)
  3. the group where the ref job was written to is removed and replaced by the snippet you posted above
    3a. in the eager case this is no problem, because all the input is already loaded and will be written back once the full job is saved
    3b. in the lazy case we point to the group that was deleted above and we cannot restore it anymore

So the issue is created in step three, from the code you posted above. The solution is probably to check first whether this move of data is necessary (it seems to be deleting it and then writing it back to the same location to me) and if so reload the child job afterwards.
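The step-3b failure can be sketched with a toy stand-in, a plain dict playing the role of the HDF5 file; the class names here are hypothetical, not pyiron API:

```python
# Eager input copies everything at load time; lazy input keeps only a
# pointer into the backing storage, so deleting the group leaves it dangling.

class EagerInput:
    def __init__(self, storage, group):
        self.data = dict(storage[group])  # step 2a: copy at load time

    def read(self, key):
        return self.data[key]


class LazyInput:
    def __init__(self, storage, group):
        self.storage = storage  # step 2b: keep only a reference
        self.group = group

    def read(self, key):
        return self.storage[self.group][key]  # resolved on first access


hdf = {"/spx/input": {"encut": 500}}
eager = EagerInput(hdf, "/spx/input")
lazy = LazyInput(hdf, "/spx/input")

del hdf["/spx/input"]  # step 3: the group is removed before being rewritten

print(eager.read("encut"))  # 500: the copy is unaffected (step 3a)
try:
    lazy.read("encut")  # the lazy reference now points into nothing (3b)
except KeyError as err:
    print("lazy read failed:", err)
```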

@samwaseda
Copy link
Member Author

> So the issue is created in step three, from the code you posted above. The solution is probably to check first whether this move of data is necessary (it seems to me to be deleting it and then writing it back to the same location) and if so reload the child job afterwards.

That's what I guessed as well. The problem is that even without those lines there are errors (although probably different ones).

@samwaseda
Copy link
Member Author

I just realised that there's another problem:

job = pr.create.job.Sphinx('spx')
job.structure = pr.create.structure.bulk('Al', cubic=True)
job.run()
job = pr.load('spx')
job.input.copy()

This leads to a recursion error in the last line. It can be prevented by simply calling job.input once before copy(), but this could be an inherent problem of the lazy loading.
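One plausible mechanism for such a recursion error in a lazy container is the classic __getattr__ pitfall: copy.copy rebuilds the object without calling __init__, and a __getattr__ that itself reads an instance attribute then recurses forever on the half-built object. This is a generic sketch of the pitfall, not the actual DataContainer code:

```python
import copy


class LazyProxy:
    """Forwards unknown attribute lookups to a backing storage object."""

    def __init__(self, storage):
        self._storage = storage

    def __getattr__(self, name):
        # Reading self._storage re-enters __getattr__ whenever __dict__ is
        # still empty, e.g. on the half-built object copy.copy creates.
        return getattr(self._storage, name)


try:
    copy.copy(LazyProxy({"encut": 500}))
    print("copy succeeded")
except RecursionError:
    print("RecursionError during copy()")
```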

@samwaseda
Copy link
Member Author

And I just realised that what I said about the workaround (calling spx.input) actually doesn't work...

@samwaseda
Copy link
Member Author

OK, actually it works only if job.input is executed in a different cell before job.input.copy() is called. I have no idea how this can be the case.

@pmrv
Copy link
Contributor

pmrv commented Jun 10, 2022

> OK, actually it works only if job.input is executed in a different cell before job.input.copy() is called. I have no idea how this can be the case.

I know you told me not to say that all the time, but that sounds weird. I'll give it a try.

@pmrv
Copy link
Contributor

pmrv commented Jun 10, 2022

> OK, actually it works only if job.input is executed in a different cell before job.input.copy() is called. I have no idea how this can be the case.
>
> I know you told me not to say that all the time, but that sounds weird. I'll give it a try.

I ran your snippet in one cell and it comes out fine for me.

@samwaseda
Copy link
Member Author

> I ran your snippet in one cell and it comes out fine for me.

Are you sure that you did it in this branch (i.e. with lazy on)?

@pmrv
Copy link
Contributor

pmrv commented Jun 10, 2022

> I ran your snippet in one cell and it comes out fine for me.
>
> Are you sure that you did it in this branch (i.e. with lazy on)?

Ah, no, I did not. :|

@pmrv
Copy link
Contributor

pmrv commented Jun 20, 2022

Possible solution: copy the ref_job either in the pop method of InteractiveWrapper or in ref_job_initialize.
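As a toy sketch of that idea (a dict standing in for the HDF5 file; the function name is hypothetical, not pyiron API): force-load the child's data while its group still exists, and only then remove and rewrite the group:

```python
def pop_child(hdf, group):
    """Detach a child group safely: snapshot first, then delete and rewrite."""
    snapshot = dict(hdf[group])  # force-load while the group still exists
    del hdf[group]               # now the removal cannot strand lazy readers
    hdf[group] = snapshot        # rewrite at the (here unchanged) address
    return snapshot


hdf = {"/sxextopt_Al/spx": {"encut": 500}}
child = pop_child(hdf, "/sxextopt_Al/spx")
print(child["encut"])  # 500
```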

@samwaseda samwaseda added the integration Start the notebook integration tests for this PR label Oct 11, 2022
@samwaseda
Copy link
Member Author

So I found out again that if I call sxextopt.ref_job.input to show the content once before run, then it works, i.e.

Cell 1

spx = pr.create_job('Sphinx', 'spx_sxextopt_Al')
spx.structure = pr.create.structure.bulk('Al', cubic=True)
spx.server.run_mode.interactive = True
spx.calc_static()
sxextopt = spx.create_job('SxExtOptInteractive', 'sxextopt_Al')

sxextopt.save()
sxextopt = pr.load('sxextopt_Al')

Cell 2

sxextopt.ref_job.input

Cell 3

sxextopt.run()

This works. I remember there was something that had to be executed to have the same effect, right? @pmrv

@samwaseda
Copy link
Member Author

it's _force_load - got ya

@samwaseda
Copy link
Member Author

samwaseda commented Oct 12, 2022

https://github.com/pyiron/pyiron_base/blob/e15d4018beaddf8ecc45219729d848641bc0c9ca/pyiron_base/jobs/master/parallel.py#L804

I have a question for @pmrv and @jan-janssen because I'm not really sure what the solution should look like.

The current problem is that the input is not loaded in InteractiveWrapper after save and load. I am guessing the same problem does not occur in ParallelMaster, because the master job copies the child job whenever a new one is spawned (see the line linked above).

Now the question is obviously how to solve the problem for InteractiveWrapper. Do you see a possibility to do _force_load somewhere?

@samwaseda
Copy link
Member Author

samwaseda commented Oct 13, 2022

https://github.com/pyiron/pyiron_base/blob/e15d4018beaddf8ecc45219729d848641bc0c9ca/pyiron_base/jobs/master/generic.py#L223-L248

I made a bit more progress. In particular, I came back to the problem with pop. The following lines work with _force_load (shown commented out below), but don't without it:

spx = pr.create_job('Sphinx', 'spx_sxextopt_Al')
spx.structure = pr.create.structure.bulk('Al', cubic=True)
spx.server.run_mode.interactive = True
spx.calc_static()
sxextopt = spx.create_job('SxExtOptInteractive', 'sxextopt_Al')
sxextopt.save()
sxextopt = pr.load('sxextopt_Al')

### From here what `pop` does:
job_name_to_return = sxextopt._job_name_lst[-1]
job_to_return = sxextopt._load_all_child_jobs(sxextopt._load_job_from_cache(job_name_to_return))
# job_to_return.input._force_load()
job_to_return.project_hdf5.remove_group()
job_to_return.project_hdf5 = sxextopt.project_hdf5.__class__(
    sxextopt.project, job_to_return.job_name, h5_path="/" + job_to_return.job_name
)

@samwaseda
Copy link
Member Author

Here's my guess of what's happening inside pop:

  1. _load_all_child_jobs correctly loads the child job; at this point pyiron assumes that all the information is held by the child job
  2. Child job loses all the groups
  3. New hdf address is assigned to the child job

If the list above is correct, then there are several solution possibilities:

  1. Load the entire input when _load_all_child_jobs is called
  2. Rewrite the hdf address for the input when the new address is assigned
  3. Not rewrite the hdf address in the first place

I would love to avoid the first solution, because then the advantage of lazy loading is essentially lost. I don't know whether the second solution is feasible, because the master job has no way of knowing the input/output structure of the child job, and the child job doesn't know that its address has been rewritten. The third solution is actually my favourite, especially because, as I indicated in this issue, it's not clear why the rewriting is performed in the first place. @jan-janssen
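For illustration, solution 2 could look roughly like this toy sketch, where every unresolved value carries its backing reference and gets repointed when the group moves. The catch described above is that the container would have to expose exactly this hook to the master job. All names are hypothetical, with a dict standing in for the HDF5 file:

```python
class LazyValue:
    """A value that stays in storage until resolve() is called."""

    def __init__(self, storage, group, key):
        self.storage, self.group, self.key = storage, group, key

    def resolve(self):
        return self.storage[self.group][self.key]


def rewrite_address(lazy_values, new_group):
    # Solution 2: repoint every unresolved reference at the new group.
    for value in lazy_values:
        value.group = new_group


hdf = {"/old": {"encut": 500}}
val = LazyValue(hdf, "/old", "encut")

hdf["/new"] = hdf.pop("/old")   # the group is moved to a new address
rewrite_address([val], "/new")  # without this, resolve() raises KeyError

print(val.resolve())  # 500
```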

@samwaseda
Copy link
Member Author

I just had a discussion with @pmrv and realized that option 3 is actually not available.

@jan-janssen
Copy link
Member

Maybe we should simply have different levels of loading on the job level. We already have inspect and load, and lazy_load sounds like a third option to me. Adding a lazy flag to the load function would be a solution, and we could even make it the default; only in special cases, like copying jobs or extracting them from the HDF5 file of the master job, would we do a full load.
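A rough sketch of the suggested shape, with a dict standing in for the project database and callables standing in for lazily held values; none of these names are actual pyiron functions:

```python
def load(project, job_name, lazy=True):
    """Load a job; lazy by default, full load on request."""
    job = project["jobs"][job_name]
    if not lazy:
        job = materialize(job)  # special cases: copying, extracting children
    return job


def materialize(job):
    # Resolve every lazily held value (modelled as a callable) eagerly.
    return {key: (value() if callable(value) else value)
            for key, value in job.items()}


project = {"jobs": {"spx": {"encut": lambda: 500}}}

lazy_job = load(project, "spx")          # default: values stay unresolved
full_job = load(project, "spx", lazy=False)
print(callable(lazy_job["encut"]))  # True
print(full_job["encut"])            # 500
```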

@stale stale bot added the stale label Nov 12, 2022
@stale stale bot removed the stale label Nov 17, 2022
@samwaseda
Copy link
Member Author

Do the tests use the conda environment for pyiron_base? The tests should not fail with the changes in this PR.


@samwaseda samwaseda merged commit def92d1 into master Dec 5, 2022
@delete-merged-branch delete-merged-branch bot deleted the spx_lazy branch December 5, 2022 15:04