
PBP dataset performance enhancements #361

Merged: 30 commits into main from threading, Nov 25, 2024
Conversation

@M-R-Schaefer (Contributor) commented Nov 4, 2024

I have reworked the PBP dataset to enhance performance.
Previously, the buffer was filled with batches using multiprocessing, run dry, and then refilled.
Now there is a dedicated processing thread which uses multiprocessing internally.
That means the buffer is filled continuously without blocking the train-step compute.
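The producer pattern described above can be sketched roughly as follows. This is a hypothetical illustration, not apax's actual implementation: the class and function names (`ThreadedBatchBuffer`, `_process_batch`) are made up, and the real batch processing (padding, neighbor lists) is stubbed out.

```python
import queue
import threading
from multiprocessing import Pool

def _process_batch(indices):
    # Stand-in for the expensive per-batch work (padding, neighbor lists, ...).
    return [i * 2 for i in indices]

class ThreadedBatchBuffer:
    """Hypothetical sketch: a daemon thread keeps a bounded queue of
    processed batches filled, using a multiprocessing pool internally,
    so the consumer (the train step) never waits for a full refill."""

    def __init__(self, batch_indices, buffer_size=4, n_workers=2):
        self._queue = queue.Queue(maxsize=buffer_size)
        self._batches = batch_indices
        self._n_workers = n_workers
        self._thread = threading.Thread(target=self._fill, daemon=True)
        self._thread.start()

    def _fill(self):
        # Runs in the background thread; pool.imap preserves batch order.
        with Pool(self._n_workers) as pool:
            for batch in pool.imap(_process_batch, self._batches):
                self._queue.put(batch)  # blocks only when the buffer is full
        self._queue.put(None)  # sentinel: epoch finished

    def __iter__(self):
        while (batch := self._queue.get()) is not None:
            yield batch
```

The key difference to the old scheme is that `put` and `get` overlap: while the train step consumes one batch, the thread and its worker processes are already producing the next ones.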

On my home PC I achieved a 20 % reduction in epoch time compared to the cached dataset.
On my work laptop it is roughly 30 % slower than the cached dataset, but still much faster than the old PBP (old: 6.3 s, new: 1.3 s per epoch).

Apart from this dataset, there is also the CSVLogger class, which depends on TensorFlow.
If we write our own logger, we could make TensorFlow an optional dependency.
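A TensorFlow-free replacement could be quite small. A minimal sketch using only the standard library (the class name `SimpleCSVLogger` and its `log`/`close` interface are assumptions, not an existing apax API):

```python
import csv

class SimpleCSVLogger:
    """Hypothetical stdlib-only stand-in for a TensorFlow-based CSVLogger."""

    def __init__(self, path):
        self.path = path
        self._file = None
        self._writer = None

    def log(self, metrics: dict):
        # Lazily open the file and write the header from the first metrics dict.
        if self._file is None:
            self._file = open(self.path, "w", newline="")
            self._writer = csv.DictWriter(self._file, fieldnames=list(metrics))
            self._writer.writeheader()
        self._writer.writerow(metrics)
        self._file.flush()  # keep the log readable during long runs

    def close(self):
        if self._file is not None:
            self._file.close()
```

Deriving the header from the first logged dict keeps the logger agnostic to which metrics the training loop reports.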

TODO:

  • add vesin to poetry
  • swap out matscipy for vesin everywhere
  • remove matscipy dependency
  • make sure tests are running
  • rerun benchmark
  • add stress to batch processor

Update:

After rerunning the benchmark on my workstation, the cached dataset is still faster than the new PBP, especially in worst-case scenarios (same system size, low batch size, lightweight model).
300-atom system, batch size 4, rmax=6:

cached: 1.5 s/epoch
pbp new: 2.5 s/epoch
pbp old: 4.7 s/epoch

Batch size 8, n_ens = 8, rmax=5:

cached: 5.6 s/epoch
pbp new: 6.7 s/epoch

@M-R-Schaefer M-R-Schaefer marked this pull request as ready for review November 17, 2024 12:07
@M-R-Schaefer M-R-Schaefer requested review from PythonFZ and removed request for PythonFZ November 19, 2024 12:50
@PythonFZ (Contributor) left a comment:

Some short questions. I am struggling a bit to give a good review because I am not very familiar with multiprocessing to this extent.

apax/data/input_pipeline.py (resolved)
Comment on lines 419 to 421
labels["energy"][i] = lab["energy"]
if self.forces:
labels["forces"][i, : inp["n_atoms"]] = lab["forces"]
Contributor (@PythonFZ): How are other labels treated, e.g. stress?

@M-R-Schaefer (Contributor, Author): Oh, I removed the inclusion of stress... good catch.
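Following the pattern in the diff excerpt above (per-structure labels copied into preallocated padded arrays), stress could be reinstated alongside energy and forces. A hypothetical sketch, with the function name `fill_labels` and the flag arguments assumed for illustration; since stress is a per-structure 3×3 tensor, it needs no per-atom padding:

```python
import numpy as np

def fill_labels(inputs, labs, max_atoms, forces=True, stress=True):
    """Hypothetical sketch: copy per-structure labels into padded batch arrays."""
    n = len(inputs)
    labels = {"energy": np.zeros(n)}
    if forces:
        labels["forces"] = np.zeros((n, max_atoms, 3))  # zero-padded per atom
    if stress:
        labels["stress"] = np.zeros((n, 3, 3))  # per-structure, no padding needed
    for i, (inp, lab) in enumerate(zip(inputs, labs)):
        labels["energy"][i] = lab["energy"]
        if forces:
            labels["forces"][i, : inp["n_atoms"]] = lab["forces"]
        if stress:
            labels["stress"][i] = lab["stress"]
    return labels
```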

apax/data/input_pipeline.py (outdated; resolved)
apax/data/input_pipeline.py (resolved)
apax/data/input_pipeline.py (outdated; resolved)
apax/md/ase_calc.py (resolved)
@PythonFZ (Contributor) left a comment:

LGTM 👍

@M-R-Schaefer M-R-Schaefer merged commit 2409c83 into main Nov 25, 2024
2 checks passed
@M-R-Schaefer M-R-Schaefer deleted the threading branch November 25, 2024 16:14