
Training on GPU #76

Open
djcole56 opened this issue Oct 1, 2024 · 1 comment

Comments


djcole56 commented Oct 1, 2024

I hit an issue when trying to perform a training run on the GPU: the reference and predicted data were stored on different devices, leading to errors like `RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)`.

I can fix this by explicitly moving the reference data (energies, forces, and coords) to the GPU, e.g. changing `energy_ref = entry["energy"]` to:

        energy_ref = entry["energy"].cuda()
        forces_ref = entry["forces"].reshape(len(energy_ref), -1, 3).cuda()

        coords = (
            entry["coords"]
            .reshape(len(energy_ref), -1, 3)
            # move to the GPU *before* detach/requires_grad_, so coords stays
            # a leaf tensor and its .grad is populated during backprop
            .cuda()
            .detach()
            .requires_grad_(True)
        )

but something smarter is likely needed that can handle both CPU and GPU runs.
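For reference, the usual device-agnostic pattern is to resolve the target device once and move every tensor with `.to(device)` instead of hard-coding `.cuda()`. Below is a minimal sketch under the assumptions that `entry` has the same layout as in the snippet above; the helper name `move_entry_to_device` is hypothetical, not from the codebase.

```python
import torch


def move_entry_to_device(entry, device):
    """Move reference energies, forces, and coords onto one device.

    Assumes the entry dict layout from the snippet above; this is a
    sketch of the device-agnostic pattern, not the project's actual fix.
    """
    energy_ref = entry["energy"].to(device)
    forces_ref = entry["forces"].reshape(len(energy_ref), -1, 3).to(device)
    coords = (
        entry["coords"]
        .reshape(len(energy_ref), -1, 3)
        .to(device)          # move first, so coords below stays a leaf tensor
        .detach()
        .requires_grad_(True)
    )
    return energy_ref, forces_ref, coords


# Resolve the device once, then pass it everywhere; this runs unchanged
# on CPU-only machines and on GPU machines.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```

Since `.to(device)` is a no-op when the tensor is already on the target device, the same code path serves both CPU-only and GPU runs.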


djcole56 commented Oct 8, 2024

Ah, I believe this was solved by this PR from @jthorton: #72
