
Training on GPU #76

Open
djcole56 opened this issue Oct 1, 2024 · 1 comment

Comments


djcole56 commented Oct 1, 2024

I hit an issue when trying to perform a training run on the GPU: the reference and predicted data were stored on different devices, leading to errors like `RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)`.

I can fix this by explicitly moving the reference data (energies, forces, and coords) to the GPU, e.g. changing `energy_ref = entry["energy"]` to:

        energy_ref = entry["energy"].cuda()
        forces_ref = entry["forces"].reshape(len(energy_ref), -1, 3).cuda()

        coords = (
            entry["coords"]
            .reshape(len(energy_ref), -1, 3)
            # move to the GPU *before* detach/requires_grad_, so coords stays
            # a leaf tensor and its .grad is populated during backprop
            .cuda()
            .detach()
            .requires_grad_(True)
        )

but something smarter is likely needed that can handle both CPU and GPU runs.
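For reference, the usual device-agnostic pattern is to resolve the target device once and move every tensor with `.to(device)` instead of hard-coding `.cuda()`. Below is a minimal sketch under the assumptions that `entry` has the same layout as in the snippet above; the helper name `move_entry_to_device` is hypothetical, not from the codebase.

```python
import torch


def move_entry_to_device(entry, device):
    """Move reference energies, forces, and coords onto one device.

    Assumes the entry dict layout from the snippet above; this is a
    sketch of the device-agnostic pattern, not the project's actual fix.
    """
    energy_ref = entry["energy"].to(device)
    forces_ref = entry["forces"].reshape(len(energy_ref), -1, 3).to(device)
    coords = (
        entry["coords"]
        .reshape(len(energy_ref), -1, 3)
        .to(device)          # move first, so coords below stays a leaf tensor
        .detach()
        .requires_grad_(True)
    )
    return energy_ref, forces_ref, coords


# Resolve the device once, then pass it everywhere; this runs unchanged
# on CPU-only machines and on GPU machines.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```

Since `.to(device)` is a no-op when the tensor is already on the target device, the same code path serves both CPU-only and GPU runs.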


djcole56 commented Oct 8, 2024

Ah, I believe this was solved by this PR from @jthorton: #72
