Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update CHARMM force field to July 2024 release #356

Open
wants to merge 43 commits into
base: main
Choose a base branch
from

Conversation

epretti
Copy link
Member

@epretti epretti commented Jan 23, 2025

This is a work-in-progress and I am opening this pull request as a draft to keep track of issues and potentially solicit feedback on some issues I have encountered during this. This updates the CHARMM force field distributed in FFXML format to the most recent as of this writing (July 2024) release.

  • Files that used to be excluded from the conversion that now read in properly:
    • toppar_all36_prot_aldehydes.str and toppar_all36_na_modifications.str appear to no longer cause the parameter inconsistency reported previously.
    • toppar_all36_prot_d_aminoacids.str was excluded for an unknown reason; this file does not cause any naming conflicts or parameter inconsistencies.
    • toppar_ions_won.str was excluded for an unknown reason from the water/ions files and is included again (one naming collision below could be worked around with a residue-specific exclusion).
  • Files that appear to contain duplicate sets of parameters, so one set must be chosen:
    • toppar_all36_prot_heme_for_new_psf_gen_code_2022.str and toppar_all36_na_rna_modified_for_new_psf_gen_code_2022.str are excluded in favor of toppar_all36_prot_heme.str and toppar_all36_na_rna_modified.str. I can't find documentation on what this is about, but I am guessing that a backwards-incompatible change in CHARMM led to a need for two versions of these files: no parameters seem to change, just a few entries regarding topology modification. ParmEd doesn't seem to complain with either version.
    • The lipid LJPME-optimized files are left out in favor of the standard lipid model. It seems like some of the lipid stream files have updated versions for this LJPME-optimized model, but others don't, so it was unclear that I would be generating a consistent set of parameters if I tried to include LJPME-optimized ones.
  • Inconsistencies in the CHARMM release that require exclusion of some files or residues:
    • Inconsistent parameter values for CG2R61-CG301 bond (bond from 6-membered aromatic ring carbon to aliphatic carbon with no hydrogens). The offending values are at line 574 of par_all36_cgenff.prm and line 22869 of toppar_all36_carb_glycolipid.str. The glycolipid file is excluded as many other files depend on the CGenFF file, and I am proceeding under the assumption that until we know better, it is better to leave out a file than to make manual modifications to one.
    • DMPR is defined as a residue at line 26903 of top_all36_cgenff.rtf (dimethylpropanamide) and as a patch at line 386 of toppar_all36_na_reactive_rna.str (thio-substitution of dimyristoyl-D-glycero-1-phosphatidic acid). This name collision prevents ParmEd from reading both files, so the reactive RNA file is excluded.
    • TM3P is defined as a residue at line 7093 of top_all36_cgenff.rtf (4'-methyl,3'-phosphate tetrahydrofuran) and Tm3p is defined as a residue at line 341 of toppar_ions_won.str (Thulium(III) ion). ParmEd uppercases many names read which causes a collision when the force field and water/ion XML files are read together. The ion version of the residue is excluded.
    • Nucleic acids remain excluded from the Drude conversion because ParmEd (still?) can't read the nucleic acid parameter file. It is not clear if the problems indicated in drude2019.yaml are still ones causing this issue, or if this is due to something else.
  • Some things whose use was unclear or that were effectively non-functional were removed. If someone knows that they are actually wanted in a working state, let me know:
    • protein.yaml to ostensibly create a protein-only version of the force field, except that no corresponding XML file exists in the repository.
    • Splitting out of "model compounds" into separate XML files. These toppar_all36_*_model.xml force fields don't actually contain any residues. Two are empty, containing nothing except some spurious references, and two contain a few atom types and parameters that are then never used.
  • Some missing references were added to charmm36.yaml.

Presently, I am trying to run the tests to ensure that this conversion is not completely incorrect. Unfortunately there are some issues because of distinct residue templates in the force field that OpenMM sees as topologically equivalent. Apparently this was not an issue before, but the test code may need modification. In the meantime, I wanted to document the current issues with the conversion for reference and in case anyone has encountered these exact problems before.

* Checked which excluded files that previously caused problems could now
  be included
* Checked which included files caused inconsistencies or other problems
  and needed to be excluded
* The protein-only force field didn't exist in the ffxml directory so I
  removed protein.yaml (not sure if this is desired)
* Added toppar_ions_won.str to the water models (a citation was present
  but the file was missing)
* Updated some missing citations
* Previously, there were various files for model compounds that were
  split out, but these XML files didn't have any residues in them and only
  had a few random atom types (or were entirely empty other than containing
  irrelevant references).  These have been removed.
@codecov-commenter
Copy link

codecov-commenter commented Jan 23, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69.75%. Comparing base (e6e6248) to head (6503664).

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #356   +/-   ##
=======================================
  Coverage   69.75%   69.75%           
=======================================
  Files           5        5           
  Lines         830      830           
=======================================
  Hits          579      579           
  Misses        251      251           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@peastman
Copy link
Member

toppar_all36_prot_d_aminoacids.str was excluded for an unknown reason; this file does not cause any naming conflicts or parameter inconsistencies.

Can these be distinguished from the regular amino acids? I would expect them to differ only in conformation, not topology, and therefore to match the same templates.

@epretti
Copy link
Member Author

epretti commented Jan 23, 2025

toppar_all36_prot_d_aminoacids.str was excluded for an unknown reason; this file does not cause any naming conflicts or parameter inconsistencies.

Can these be distinguished from the regular amino acids? I would expect them to differ only in conformation, not topology, and therefore to match the same templates.

This is part of the problem that I am encountering now. ParmEd will read and write them as distinct residues (possibly because they use different atom classes for their chiral centers), and OpenMM will read a force field with them. But when you try to create a system, OpenMM issues a "Multiple non-identical matching templates found" error.

This is not as simple as just excluding the D-amino acids, as some residues have topological equivalents elsewhere. For instance, so far I've run into ILE which has not only DILE from the D-amino acids, but also IIL (the diastereomer alloisoleucine) from another stream file. Do we want to exclude all of these cases (this will prevent users from simulating them but perhaps there is not a lot of demand for them)? If so, I will try to figure out how to go through and find all of them, and make sure I understand why ParmEd isn't eliminating them like it eliminates other topologically equivalent residues. There might have to be a table somewhere where we manually specify which are the "standard" ones that we want since they aren't all localized to one file.

@peastman
Copy link
Member

How about converting them, but keeping them as separate XML files? If you just include charmm36.xml you'll get only the standard amino acids. If you want the others (which should be pretty rare), you can include the second file. It will create conflicts, but you can resolve them with the residueTemplates argument to createSystem().

* Added functionality to split out residues from selected CHARMM
  parameter files

* Allow specifying "fixes" to the generated XML files: can provide XPath
  to find elements, and a tree of XML elements to append as children to
  those found
epretti and others added 13 commits February 3, 2025 15:29
* Copy improper objects before compression to work around ParmEd bug

* Add an XML fix for harmonic improper dihedrals to account for
  periodicity of the improper angle (to support equilibrium angles of pi)
* Adds functionality to output impropers in a way that should be
  compatible with their explicit specification in the CHARMM force field
  files

* Ensures that impropers spanning patch and residue atoms will not be
  neglected

* Ensures that duplicate improper entries will not be written to split
  files
An improper in a patch with some atoms in the patch might need to find
those atoms in a residue, but alternatively, the atoms might be in
another patch that also modified the residue, so search the residues
compatible with a patch for patches compatible with the residue during
improper processing.
Goal is to replace test_charmm.py and energy.py with something cleaner
and more flexible.  Can select CHARMM (if installed), OpenMM-CHARMM,
ParmEd-CHARMM, and OpenMM-FFXML modes and compare energies and forces
across force groups.  Can define tests in a directory containing a
test.yaml file instead of having things hardcoded.  Plan to port
remaining tests over.
* Uses a force field script to correctly insert impropers.  Should
  handle all cases except for ambiguous equivalent atoms (like OT1/OT2).

* Removed some miscellaneous test code from the conversion script since
  the test script handles this more thoroughly.
* New test script to test against CHARMM, OpenMM (reading a PSF) and
  OpenMM (reading FFXML).

* Converted old tests to new test format.

* Several new tests (amino acids and simple peptides for validation).

* Started reworking water box PSF generation as this was not placing the
  correct charges in the PSF files, leading to tests results that were
  not meaningful.
* Remove files left over from test system generation that are not being
  used by tests

* Remove some old test scripts whose behavior has been incorporated into
  new test script

* Note: 4- and 5-site water tests will be restored once it is determined
  how to generate the PSFs correctly with the virtual sites
* Use CHARMM to generate correct PSFs since ParmEd doesn't

* Update test script to update virtual site positions in OpenMM and skip
  testing virtual site forces (since CHARMM and OpenMM report them
  differently but accumulate them on the atoms the same way).
@epretti epretti marked this pull request as ready for review March 6, 2025 23:31
@epretti
Copy link
Member Author

epretti commented Mar 6, 2025

OK, there's a lot here, but this should be ready to look at now. Notes on changes and issues:

  • Some residues and patches are split out into separate files due to various issues and patching collisions (this is explained in more detail in the README)
  • Major overhaul to conversion script to allow proper splitting (see above), handling of impropers and anisotropic polarizabilities (it is necessary to bypass incorrect handling in ParmEd and do this directly via a script embedded in the force fields), and automated post-generation patching of FFXML files (needed for disulfide patch and some nucleic acid patches).
  • There are issues in ParmEd requiring a custom modified version of ParmEd to be used for the force field generation until issues (Issues in parsing CHARMM topology/parameter files ParmEd/ParmEd#1395, Integral net charge check for patching is incompatible with CHARMM36 nucleic acid FF ParmEd/ParmEd#1396, ParmEd OpenMM FFXML writer doesn't correctly handle patches that can modify residues in multiple ways ParmEd/ParmEd#1397) are fixed. The patched version can be found at epretti/ParmEd:patched-for-charmm-conversion. Note that this does not prevent the force fields from being used or tested, though!
  • Major overhaul to test script. Supports checking against locally installed or Docker containerized CHARMM, OpenMM reading CHARMM files, ParmEd reading CHARMM files, and the generated FFXML. Performs full energy and force decomposition. Test specifications are no longer hard-coded list items but are read in from YAML files.
  • Reorganized test suite to remove some extraneous files not read by CHARMM or OpenMM and to locate all YAML specifications and associated input files in dedicated directories.
  • Script to use CHARMM to regenerate water box test input files, since ParmEd doesn't support the necessary options for some water models with lone pairs or Drude atoms.
  • Testing of FFXML files against OpenMM and CHARMM is now fully automated in CI using a private (GitHub container registry) Docker container.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants