Batch evaluation #233

M-R-Schaefer · 2024-02-26T12:50:00Z

Added a batch_eval method to the ASE calculator that processes data the same way we do during training, that is, it pads the inputs and unpads the outputs.
This should drastically accelerate the evaluation of whole datasets, especially if they consist of differently sized structures (which, until now, would trigger recompilations).
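To illustrate the pad/unpad idea described above, here is a minimal sketch in plain Python. It is not the apax implementation: the helper names pad_batch and unpad_batch are hypothetical, and each "structure" is reduced to a list with one scalar per atom. The point is only that every structure in a batch is brought to the same length before evaluation, and the stored atom counts are used to cut the padding off again afterwards.

```python
def pad_batch(structures, pad_value=0.0):
    """Pad per-atom lists to a common length (hypothetical helper).

    `structures` is a list of per-structure lists, one entry per atom.
    Returns the padded batch and the original atom counts.
    """
    n_atoms = [len(s) for s in structures]
    max_atoms = max(n_atoms)
    padded = [list(s) + [pad_value] * (max_atoms - len(s)) for s in structures]
    return padded, n_atoms


def unpad_batch(padded, n_atoms):
    """Drop the padding entries again using the stored atom counts."""
    return [row[:n] for row, n in zip(padded, n_atoms)]


# Three "structures" with 2, 4 and 3 atoms (one scalar per atom for brevity).
batch = [[1.0, 2.0], [3.0, 4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
padded, n_atoms = pad_batch(batch)      # every row now has length 4
restored = unpad_batch(padded, n_atoms)  # original ragged lists again
```

Because every padded row has the same length, a compiled model would see a single input shape for the whole batch instead of one shape per structure size.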

Chronum94 · 2024-02-26T14:45:39Z

apax/md/ase_calc.py

@@ -216,6 +230,40 @@ def calculate(self, atoms, properties=["energy"], system_changes=all_changes):
        self.results = {k: np.array(v, dtype=np.float64) for k, v in results.items()}
        self.results["energy"] = self.results["energy"].item()

+    def batch_eval(self, data, batch_size=64, silent=False):


Perhaps rename data to atoms_list, since that appears to be its role here, unless other parts of apax already use data by convention (in which case changing those as well may also be nice).

A minimal docstring would probably be good too, since this is a user-facing method (and likely a common use case).

Fair points, I'll adjust the name and add some type hints + docs.

Chronum94 · 2024-02-26T15:03:36Z

Running into an issue where, if the batch size isn't a perfect divisor of the number of evaluation samples, I get a list index out of range error.

So, for example, if I have 10 atoms objects in the list, batch sizes of 1, 2, 5, and 10 work as intended; all other batch sizes below 10 do not.

~/apax/apax/md/ase_calc.py:261, in ASECalculator.batch_eval(self, data, batch_size, silent)
    259 for j in range(batch_size):
    260     atoms = data[i].copy()
--> 261     atoms.calc = SinglePointCalculator(atoms=atoms, **unpadded_results[j])
    262     evaluated_data.append(atoms)
    263 pbar.update(batch_size)

This isn't really a problem for very large batched evals, unless the dataset size happens to have no convenient divisor (a very large prime number, say), but either a fix or documentation of this limitation (?feature?) would be good.
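One way to avoid the index error is to loop over the number of structures actually present in each batch rather than over batch_size. The following is a hedged sketch of that idea, not the code in ase_calc.py: iter_batches, batch_eval_sketch and the evaluate_batch callback are hypothetical names, and plain strings stand in for atoms objects.

```python
def iter_batches(items, batch_size):
    """Yield (start_index, batch) pairs; the last batch may be shorter."""
    for start in range(0, len(items), batch_size):
        yield start, items[start:start + batch_size]


def batch_eval_sketch(atoms_list, batch_size, evaluate_batch):
    """Pair every structure with its result, including a partial last batch.

    `evaluate_batch` is a stand-in for the model call and must return
    one result per structure in the batch it receives.
    """
    evaluated = []
    for start, batch in iter_batches(atoms_list, batch_size):
        results = evaluate_batch(batch)
        # Use len(batch), not batch_size: the final batch can be smaller.
        for j in range(len(batch)):
            evaluated.append((atoms_list[start + j], results[j]))
    return evaluated


# Toy run: 10 "structures", batch size 3, so the last batch holds 1 item.
structures = [f"atoms_{k}" for k in range(10)]
out = batch_eval_sketch(structures, 3, lambda b: [len(name) for name in b])
```

With this loop shape, batch sizes that do not divide the dataset size (3, 4, 7 in the 10-structure example) behave the same as those that do.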

Chronum94 · 2024-02-26T15:28:59Z

Potentially minor thing: Right now, the call signature looks something like so:

calc = ASECalculator(...)
...
calc.batch_eval(atoms_list)

But also requires that all of the atoms objects have a calculator attached. If there is nothing attached, we get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[43], line 1
----> 1 calc.batch_eval(atoms_list, 10)

File ~/apax/apax/md/ase_calc.py:236, in ASECalculator.batch_eval(self, data, batch_size, silent)
    234 if self.model is None:
    235     self.initialize(data[0])
--> 236 dataset = initialize_dataset(
    237     self.model_config, RawDataset(atoms_list=data), calc_stats=False
    238 )
    239 dataset.set_batch_size(batch_size)
    241 evaluated_data = []

File ~/apax/apax/data/initialization.py:56, in initialize_dataset(config, raw_ds, calc_stats)
     55 def initialize_dataset(config, raw_ds, calc_stats: bool = True):
---> 56     inputs, labels = create_dict_dataset(
     57         raw_ds.atoms_list,
     58         r_max=config.model.r_max,
     59         external_labels=raw_ds.additional_labels,
     60         disable_pbar=config.progress_bar.disable_nl_pbar,
     61         pos_unit=config.data.pos_unit,
     62         energy_unit=config.data.energy_unit,
     63     )
     65     if calc_stats:
     66         ds_stats = compute_scale_shift_parameters(
     67             inputs,
     68             labels,
   (...)
     72             config.data.scale_options,
     73         )

File ~/apax/apax/data/input_pipeline.py:102, in create_dict_dataset(atoms_list, r_max, external_labels, disable_pbar, pos_unit, energy_unit)
     94 def create_dict_dataset(
     95     atoms_list: list,
     96     r_max: float,
   (...)
    100     energy_unit: str = "eV",
    101 ) -> tuple[dict]:
--> 102     inputs, labels = atoms_to_arrays(atoms_list, pos_unit, energy_unit)
    104     if external_labels:
    105         for shape, label in external_labels.items():

File ~/apax/apax/utils/convert.py:112, in atoms_to_arrays(atoms_list, pos_unit, energy_unit)
    110 inputs["ragged"]["numbers"].append(atoms.numbers)
    111 inputs["fixed"]["n_atoms"].append(len(atoms))
--> 112 for key, val in atoms.calc.results.items():
    113     if key == "forces":
    114         labels["ragged"][key].append(
    115             val * unit_dict[energy_unit] / unit_dict[pos_unit]
    116         )

AttributeError: 'NoneType' object has no attribute 'results'
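The traceback shows label collection dereferencing atoms.calc unconditionally. A minimal sketch of making label reading optional is given below. It is an assumption about one possible shape of a fix, not the apax code: collect_labels and its read_labels flag are hypothetical, and FakeAtoms / FakeCalc are small stand-ins so the example runs without ASE.

```python
class FakeCalc:
    """Stand-in for a calculator that stores results (hypothetical)."""

    def __init__(self, results):
        self.results = results


class FakeAtoms:
    """Stand-in for an atoms object; `calc` may be None (hypothetical)."""

    def __init__(self, n_atoms, calc=None):
        self.n_atoms = n_atoms
        self.calc = calc


def collect_labels(atoms_list, read_labels=True):
    """Gather labels per key, or skip them entirely for pure inference."""
    labels = {}
    if not read_labels:
        # Inference only: never touch atoms.calc.
        return labels
    for idx, atoms in enumerate(atoms_list):
        if atoms.calc is None:
            # Fail with a clear message instead of an AttributeError.
            raise ValueError(f"structure {idx} has no calculator attached")
        for key, val in atoms.calc.results.items():
            labels.setdefault(key, []).append(val)
    return labels


labeled = [FakeAtoms(2, FakeCalc({"energy": -1.0})),
           FakeAtoms(3, FakeCalc({"energy": -2.5}))]
unlabeled = [FakeAtoms(2), FakeAtoms(3)]

train_labels = collect_labels(labeled)                     # labels are read
eval_labels = collect_labels(unlabeled, read_labels=False)  # labels skipped
```

Keeping the choice explicit means that a batch evaluation on bare structures never needs a calculator, while training data without labels still fails early with a readable message.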

M-R-Schaefer · 2024-02-26T17:24:05Z

Potentially minor thing: Right now, the call signature looks something like so:
calc = ASECalculator(...)
...
calc.batch_eval(atoms_list)
But also requires that all of the atoms objects have a calculator attached. If there is nothing attached, we get the following error:

This required a non-trivial overhaul of the input pipeline. I have implemented a draft, but I'll consult a colleague to see what he thinks. I don't particularly like it, and the tests would need to be adjusted.
I'll see if there is a more sensible option.

M-R-Schaefer · 2024-02-27T13:07:28Z

pre-commit.ci autofix

M-R-Schaefer added 7 commits January 20, 2024 17:21

sketch of batch eval

a37d4d1

Merge branch 'dev' into batch_eval

35ea0ec

implemented eval loop

465b96b

Merge branch 'nl_gas_fix' into batch_eval

fdb7501

added uncertaintties to ASE all properties

a57d06b

completed batch_eval

2830b4f

added h5 to gitignore

70aec8e

M-R-Schaefer added the enhancement New feature or request label Feb 26, 2024

M-R-Schaefer requested a review from Chronum94 February 26, 2024 12:50

pre-commit-ci bot and others added 3 commits February 26, 2024 12:50

[pre-commit.ci] auto fixes from pre-commit.com hooks

ef616d3

for more information, see https://pre-commit.ci

linting

4a16d25

[pre-commit.ci] auto fixes from pre-commit.com hooks

d1c86e6

for more information, see https://pre-commit.ci

Chronum94 reviewed Feb 26, 2024

M-R-Schaefer and others added 6 commits February 26, 2024 17:00

adde docstrings, type hints. Always check for num strucutres in batch.

6bc206b

split atoms to arrays into functions for inputs and labels respectively

3b9fe79

made dataset from dicts independent of the number of spllied dicts

2824432

allow for datasets without labels

ffae6a4

Merge branch 'batch_eval' of https://github.com/apax-hub/apax into ba…

debef1f

…tch_eval

[pre-commit.ci] auto fixes from pre-commit.com hooks

07b59cc

for more information, see https://pre-commit.ci

M-R-Schaefer and others added 8 commits February 27, 2024 11:16

made option to read labels explicit

9e3de27

attach external labels directly to atoms, removed RawDataset

7fdc331

made shape of additional labels something to be specified in the inpu…

59e7fbb

…t file.

Merge branch 'batch_eval' of https://github.com/apax-hub/apax into ba…

7f07025

…tch_eval

[pre-commit.ci] auto fixes from pre-commit.com hooks

6fd57df

for more information, see https://pre-commit.ci

updated import paths in tests

af3a19d

fixed tests

b3a4b20

Merge branch 'batch_eval' of https://github.com/apax-hub/apax into ba…

3fa443c

…tch_eval

[pre-commit.ci] auto fixes from pre-commit.com hooks

0f8694a

for more information, see https://pre-commit.ci

M-R-Schaefer requested a review from Tetracarbonylnickel February 27, 2024 13:06

M-R-Schaefer and others added 5 commits February 27, 2024 14:07

Merge branch 'dev' into batch_eval

cdbf75d

[pre-commit.ci] auto fixes from pre-commit.com hooks

554f850

for more information, see https://pre-commit.ci

linting

3af1be9

added additional property options to config

f6b4541

[pre-commit.ci] auto fixes from pre-commit.com hooks

9db40cb

for more information, see https://pre-commit.ci

Tetracarbonylnickel approved these changes Feb 28, 2024

M-R-Schaefer merged commit 047ae66 into dev Feb 28, 2024
3 checks passed
