'DataFrame' object has no attribute ‘oligomeric_detail’ when training on custom PDBs #25

ntoxeg · 2024-08-19T07:18:54Z

After processing some PDB files from PINDER with process_pdb_files.py, I get the following error when I run training:

python -W ignore experiments/train_se3_flows.py data.dataset=pdb
Error executing job with overrides: ['data.dataset=pdb']
Traceback (most recent call last):
  File "/home/greg/protein-frame-flow/experiments/train_se3_flows.py", line 111, in main
    exp = Experiment(cfg=cfg)
  File "/home/greg/protein-frame-flow/experiments/train_se3_flows.py", line 28, in __init__
    self._setup_dataset()
  File "/home/greg/protein-frame-flow/experiments/train_se3_flows.py", line 43, in _setup_dataset
    self._train_dataset, self._valid_dataset = eu.dataset_creation(
  File "/home/greg/protein-frame-flow/experiments/utils.py", line 179, in dataset_creation
    train_dataset = dataset_class(
  File "/home/greg/protein-frame-flow/data/datasets.py", line 332, in __init__
    metadata_csv = self._filter_metadata(self.raw_csv)
  File "/home/greg/protein-frame-flow/data/datasets.py", line 356, in _filter_metadata
    raw_csv.oligomeric_detail.isin(filter_cfg.oligomeric)]
  File "/home/greg/miniconda3/envs/fm/lib/python3.10/site-packages/pandas/core/generic.py", line 5989, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'oligomeric_detail'

While processing, the script gives me warnings like "UserWarning: Unlikely unit cell vectors detected in PDB file likely resulting from a dummy CRYST1 record. Discarding unit cell vectors." I’ve only found an old issue about this in the library’s repository, and it seems to have been solved years ago.

A side question, what do I need to do to make a .clusters file, as that doesn’t seem to get produced by the script either?


jasonkyuyim · 2024-08-19T19:47:40Z

Yes, you will need to manually add the oligomeric_detail column and populate it with the corresponding oligomeric state. This column is provided in the mmCIF files, but I realize now that I didn't code up the PDB parser to read the oligomeric detail. As a quick fix, you can populate the column with a default value or remove the filter altogether. Since PINDER is populated with multimers, you'll have to implement the chain and residue indices properly.
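A minimal sketch of the "default value" quick fix, assuming the metadata is a CSV loaded into a pandas DataFrame. The helper name add_default_oligomeric_detail, the pdb_name column and the default "monomeric" are placeholders for illustration, not taken from the repo. Whatever default you pick has to appear in the oligomeric list of your filter config, since the filter is an isin check against that list.

```python
import pandas as pd


def add_default_oligomeric_detail(
    metadata: pd.DataFrame, default: str = "monomeric"
) -> pd.DataFrame:
    """Return a copy of the metadata that has an oligomeric_detail column.

    If the column is missing it is created and filled with `default`;
    if it exists, only missing values are filled.
    """
    out = metadata.copy()
    if "oligomeric_detail" not in out.columns:
        out["oligomeric_detail"] = default
    else:
        out["oligomeric_detail"] = out["oligomeric_detail"].fillna(default)
    return out


# Toy table standing in for the metadata CSV (column values are made up).
toy = pd.DataFrame({"pdb_name": ["1abc", "2xyz"], "modeled_seq_len": [120, 85]})
patched = add_default_oligomeric_detail(toy)
```

In practice you would read the metadata CSV with pd.read_csv, apply the helper, and write the result back with to_csv(index=False) before starting training.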

Regarding the warnings, if it don't break then don't bother :). I ignore the warnings unless it seems bad.

The PINDER files are taken from the PDB, if I'm not mistaken, so the cluster file should have an entry for every PDB ID. The clusters I used are from 2021, so you can download a more recent cluster file from RCSB PDB (https://www.rcsb.org/docs/programmatic-access/file-download-services#sequence-clusters-data). Note that if a cluster isn't found for an entry, a new cluster will be assigned.
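A rough sketch of how such a cluster file could be turned into a PDB ID to cluster mapping, including the "new cluster if not found" behaviour described above. It assumes one cluster per line with whitespace-separated members of the form ID_suffix (for example 1ABC_1). That layout, both function names, and the choice to let the first occurrence of a PDB ID win are assumptions for illustration, so check them against the file you download and against the repo's own loader.

```python
from typing import Dict, Iterable


def build_pdb_to_cluster(lines: Iterable[str]) -> Dict[str, int]:
    """Map each PDB ID to the index of the first line (cluster) it appears on."""
    pdb_to_cluster: Dict[str, int] = {}
    for cluster_id, line in enumerate(lines):
        for member in line.split():
            pdb_id = member.split("_")[0].upper()
            # Assumption: the first cluster seen for a PDB ID is kept.
            pdb_to_cluster.setdefault(pdb_id, cluster_id)
    return pdb_to_cluster


def lookup_cluster(pdb_id: str, pdb_to_cluster: Dict[str, int]) -> int:
    """Return the cluster of pdb_id, assigning a fresh cluster if it is unknown."""
    key = pdb_id.upper()
    if key not in pdb_to_cluster:
        pdb_to_cluster[key] = max(pdb_to_cluster.values(), default=-1) + 1
    return pdb_to_cluster[key]


# Toy input standing in for the lines of a downloaded cluster file.
toy_lines = ["1ABC_1 2XYZ_1", "3DEF_1"]
mapping = build_pdb_to_cluster(toy_lines)
```

With a real file you would pass the open file handle instead of toy_lines, since iterating a text file yields its lines.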

ntoxeg · 2024-08-20T15:02:55Z

Sadly, there are more issues. Now I get a crash on pivot table generation:

Traceback (most recent call last):
  File "/home/greg/protein-frame-flow/experiments/train_se3_flows.py", line 111, in main
    exp = Experiment(cfg=cfg)
  File "/home/greg/protein-frame-flow/experiments/train_se3_flows.py", line 28, in __init__
    self._setup_dataset()
  File "/home/greg/protein-frame-flow/experiments/train_se3_flows.py", line 43, in _setup_dataset
    self._train_dataset, self._valid_dataset = eu.dataset_creation(
  File "/home/greg/protein-frame-flow/experiments/utils.py", line 179, in dataset_creation
    train_dataset = dataset_class(
  File "/home/greg/protein-frame-flow/data/datasets.py", line 333, in __init__
    metadata_csv = self._filter_metadata(self.raw_csv)
  File "/home/greg/protein-frame-flow/data/datasets.py", line 363, in _filter_metadata
    data_csv = _rog_filter(data_csv, filter_cfg.rog_quantile)
  File "/home/greg/protein-frame-flow/data/datasets.py", line 26, in _rog_filter
    y_quant = y_quant.radius_gyration.to_numpy()
  File "/home/greg/miniconda3/envs/fm/lib/python3.10/site-packages/pandas/core/generic.py", line 5989, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'radius_gyration'

Inspection shows that the table generated by

    y_quant = pd.pivot_table(
        df,
        values='radius_gyration',
        index='modeled_seq_len',
        aggfunc=lambda x: np.quantile(x, quantile)
    )

is empty.
The column radius_gyration is present in the metadata file and so is modeled_seq_len.

ntoxeg · 2024-08-22T00:24:06Z

Ok, it seems that I had to disable filtering based on oligomeric detail, otherwise it filtered down to nothing.
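For anyone hitting the same thing, a small check like the one below makes the cause visible before the later filters run. It is a sketch, not code from the repo: the function name keep_allowed_oligomers is hypothetical, and it only mirrors the isin filter seen in the first traceback.

```python
import pandas as pd


def keep_allowed_oligomers(metadata: pd.DataFrame, allowed) -> pd.DataFrame:
    """Apply the oligomeric_detail isin filter and fail loudly if nothing is left."""
    kept = metadata[metadata["oligomeric_detail"].isin(list(allowed))]
    if kept.empty:
        # Report what the column actually contains so the mismatch is obvious.
        found = sorted(metadata["oligomeric_detail"].dropna().astype(str).unique())
        raise ValueError(
            f"oligomeric filter {list(allowed)} removed all {len(metadata)} rows; "
            f"values present in the metadata: {found}"
        )
    return kept


# Toy metadata: none of these rows would pass a monomer-only filter.
toy = pd.DataFrame({"oligomeric_detail": ["dimeric", "dimeric", "trimeric"]})
kept = keep_allowed_oligomers(toy, ["dimeric"])
```

Calling keep_allowed_oligomers(toy, ["monomeric"]) raises a ValueError that lists the values actually present, instead of handing an empty table to the radius of gyration filter.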

jasonkyuyim · 2024-08-23T21:00:37Z

Yeah it sounds like you can remove the oligomeric detail filtering for your purposes.
