Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added warning for 1 A^3 CRYST1 Record #2645

Merged
merged 21 commits into from
Apr 16, 2020

Conversation

Oscuro-Phoenix
Copy link
Contributor

@Oscuro-Phoenix Oscuro-Phoenix commented Mar 25, 2020

Fixes #2599

Changes made in this Pull Request:

  • Added a warning for 1 A^3 CRYST1 Record

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@Oscuro-Phoenix
Copy link
Contributor Author

Hi!,
Is this what we need? Do let me know if any changes are needed.
Thanks.

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is where to start. Once we detect the box is 1,1,1 we want to reset it so size zero so that it doesn’t get used.

Then in addition to this we need a test which checks the code works

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per @richardjgowers' review, you'll probably want to skip feeding self.ts._unitcell if the CRYST record matches the cryo-em default pattern. I'd maybe consider restructuring this try/except block to include an else that validates the contents of the CRYST1 values before feeding self.ts._unitcells?

@@ -398,6 +398,8 @@ def _read_frame(self, frame):
warnings.warn("Failed to read CRYST1 record, "
"possibly invalid PDB file, got:\n{}"
"".format(line))
if (self.ts._unitcell[:3] == np.array([1,1,1],dtype=float)).all():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consideration: the Cryo-EM pattern is very specific to a 1,1,1,90,90,90 box. It might be overall safer to check for something that matches that explicitly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what we'd like?

            try:
                cell_check = [line[6:15], line[15:24],
                              line[24:33], line[33:40],
                              line[40:47], line[47:54]]
                if (np.array(cell_check,dtype=float) == 
                    np.array([1,1,1,90,90,90],dtype=float)).all():
                    warnings.warn("1 A^3 CRYST1 record," 
                                  "possibly an EM structure file, setting box-size to 0",stacklevel=2)
                    self.ts._unitcell=np.array([0,0,0,90,90,90],dtype=float)    
            except ValueError:
                warnings.warn("Failed to read CRYST1 record, "
                              "possibly invalid PDB file, got:\n{}"
                              "".format(line))
            else:
                try:
                    if not (self.ts._unitcell==np.array([0,0,0,90,90,90])).all():
                        self.ts._unitcell[:] = [line[6:15], line[15:24],
                                                line[24:33], line[33:40],
                                                line[40:47], line[47:54]]
                except ValueError:
                    warnings.warn("Failed to read CRYST1 record, "
                                  "possibly invalid PDB file, got:\n{}"
                                  "".format(line))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a general point to start with. It is usually easier for us to review if you can push larger code changes to you branch rather than place it in the comments. That way we can get a better feeling for how the code changes fit within the context of the existing code. It also makes it easier for you to read our comments afterwards 😄

Whilst you have most of the right core elements, I don't think the structure works.

try block:

What you want to do here is try to extract the lines into a temporary variable (here cell_dims might be a more understandable variable name than cell_check). There's no point testing/warning for cryo-em default values here because you are already doing this in the else block (you don't want to use up extra CPU cycles if you can avoid it).

except block:

This seems fine.

else block:

The else block is where I'm having the hardest time following what you are doing.
Since you will have already extracted the cell dimensions in the try block, there is no point trying to extract them again. Additionally, you have already tested if a ValueError will be raised when reading the appropriate parts of line. So you don't need the addtiional try/except here.

In this part of the code I would expect you to be testing the contents of the temporary variable hosting the cell dimensions against the default cryo-em CRYST1 values (i.e. [1,1,1,90,90,90]). If it matches (i.e. is a likely cryo-em structure), then raise a warning. If not then feed the cell dimensions to self.ts._unitcell.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for taking this. I will keep your points in mind when I revert later.

What I wanted to do is basically have the cell_dims variable store the box values which are checked for CRYST1 values in the first try block. If they turn out to be the same as the specific set of values [1,1,1,90,90,90] then I set the self.ts._unitcell as having dimensions of [0,0,0,90,90,90]. And in else block, if the self.ts._unitcell is not set as having box-size of 0 then only I go ahead and set it as the cell_dims (again, I should have directly set it as =cell_dims). I see that the last try/except was extraneous. I'm really sorry for that.

package/MDAnalysis/coordinates/PDB.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Mar 25, 2020

Codecov Report

Merging #2645 into develop will increase coverage by 0.15%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2645      +/-   ##
===========================================
+ Coverage    90.89%   91.04%   +0.15%     
===========================================
  Files          174      159      -15     
  Lines        23592    21583    -2009     
  Branches      3084     3085       +1     
===========================================
- Hits         21443    19651    -1792     
+ Misses        1529     1314     -215     
+ Partials       620      618       -2     
Impacted Files Coverage Δ
package/MDAnalysis/coordinates/PDB.py 90.17% <100.00%> (+0.08%) ⬆️
...age/MDAnalysis/analysis/hbonds/hbond_autocorrel.py 87.34% <0.00%> (-0.56%) ⬇️
visualization/__init__.py
topology/__init__.py
formats/__init__.py
auxiliary/__init__.py
core/__init__.py
transformations/__init__.py
datafiles.py
__init__.py
... and 17 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e01176f...bca4684. Read the comment docs.

@IAlibay
Copy link
Member

IAlibay commented Mar 26, 2020

@Oscuro-Phoenix not had a look at the code yet, but adding & parsing a 38k line (cryo-em files are indeed quite large 😅 ) file to test a CRYST1 record might a bit overkill. I'd instead recommend looking at using the much smaller 5A7U.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more changes in addition to my other #2645 (comment). Just a few more changes and we should be getting there 😃

testsuite/MDAnalysisTests/datafiles.py Outdated Show resolved Hide resolved
testsuite/MDAnalysisTests/coordinates/test_pdb.py Outdated Show resolved Hide resolved
#issue 2599
with pytest.warns(UserWarning) as record:
u = mda.Universe(CRYST1_BOX)
cur_sele = u.select_atoms('around 0.1 (resid 4 and name CA and segid A)')
Copy link
Member

@IAlibay IAlibay Mar 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need to call select_atoms to get the warning here (it should hopefully just be thrown on universe building. So the select_atoms shouldn't be necessary here. However, we definitely should be checking that selecting 0.1 A around one of the alpha carbons works now, so do please add this as a second test.

package/MDAnalysis/coordinates/PDB.py Outdated Show resolved Hide resolved
package/MDAnalysis/coordinates/PDB.py Outdated Show resolved Hide resolved
package/MDAnalysis/coordinates/PDB.py Outdated Show resolved Hide resolved
except ValueError:
warnings.warn("Failed to read CRYST1 record, "
"possibly invalid PDB file, got:\n{}"
"".format(line))

else:
if not (self.ts._unitcell==np.array([0,0,0,90,90,90])).all():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above, I would be expecting most of what is currently in try (except assigning cell_dims) to be here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do let me know if it has been done the right way in my latest commit.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few more comments @Oscuro-Phoenix

cell_dims = [line[6:15], line[15:24],
line[24:33], line[33:40],
line[40:47], line[47:54]]
if (np.array(cell_dims, dtype=float) ==
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As previously mentioned we don't want anything else but the assignment of cell_dims in the try block. Everything else needs to be in else.

np.array([1, 1, 1, 90, 90, 90], dtype=float)).all():
warnings.warn("1 A^3 CRYST1 record,"
" possibly an EM structure file, setting box-size to 0",stacklevel=2)
cell_dims = np.array([0, 0, 0, 90, 90, 90],dtype=float)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we match the CRYST1 record of a cryo-em file, we don't want to set the box at all. Setting cell_dims (and then eventually self.ts._unitcell[:] to [0, 0, 0, 90, 90, 90] technically means that some of the box was read in, which is not true here. Cryo-em CRYST1 records are just placeholder values so they don't have physicality.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few more changes, nearly there :)

package/MDAnalysis/coordinates/PDB.py Show resolved Hide resolved
package/MDAnalysis/coordinates/PDB.py Outdated Show resolved Hide resolved
testsuite/MDAnalysisTests/coordinates/test_pdb.py Outdated Show resolved Hide resolved
testsuite/MDAnalysisTests/coordinates/test_pdb.py Outdated Show resolved Hide resolved
testsuite/MDAnalysisTests/coordinates/test_pdb.py Outdated Show resolved Hide resolved
package/MDAnalysis/coordinates/PDB.py Outdated Show resolved Hide resolved
Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nearly there, you'll also need to add a .. versionchanged:: 1.0.0 entry in the docstring of the PDBReader class.

package/MDAnalysis/coordinates/PDB.py Show resolved Hide resolved
package/MDAnalysis/coordinates/PDB.py Outdated Show resolved Hide resolved
package/MDAnalysis/coordinates/PDB.py Outdated Show resolved Hide resolved
testsuite/MDAnalysisTests/coordinates/test_pdb.py Outdated Show resolved Hide resolved

def test_cryst_em_select():
#issue 2599
u=mda.Universe(PDB_CRYOEM_BOX)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP8, can you add spaces around = please? (one the lines below too).

testsuite/MDAnalysisTests/coordinates/test_pdb.py Outdated Show resolved Hide resolved
Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the one final PEP8 change I think.

testsuite/MDAnalysisTests/coordinates/test_pdb.py Outdated Show resolved Hide resolved
Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, just need to add an entry to CHANGELOG and add yourself to AUTHORS.
@richardjgowers any comments?

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somes changes for the CHANGELOG. Also you have some merge conflicts, you'll need to rebase against MDA's develop branch and solve the conflicts. If necessary, we can fix them for you on our end.

@@ -111,6 +111,7 @@ Enhancements
* Improve the distance search in water bridge analysis with capped_distance (PR #2480)
* Added weights_groupselections option in RMSD for mapping custom weights
to groupselections (PR #2610, Issue #2429)
* Added user warning for CRYST1 cryo-em structures (PR #2645, Issue #2599)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things here:

  1. Since this does more than just add a warning I'd be more explicit here, something along the lines of "Checks for cryo-em 1 A^3 default CRYST1 record, disabling setting of box dimensions if found (Issue 1 A^3 boxes from EM structures in the PDB #2599)".
  2. This probably should go under "fixes". Also as per the CHANGELOG rules, entries should be added "newest first", i.e. at the top of the list.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Oscuro-Phoenix just the one addition needed to the CHANGELOG + the merge conflicts. Let me know if you are having issues rebasing, we can fix the merge conflicts on our end if needed.

package/CHANGELOG Show resolved Hide resolved
Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming Travis comes back green, LGTM! Thanks for all your work @Oscuro-Phoenix

@IAlibay IAlibay closed this Apr 15, 2020
@IAlibay IAlibay reopened this Apr 15, 2020
@IAlibay
Copy link
Member

IAlibay commented Apr 15, 2020

Sorry about the close-open, just re-starting Travis in the hope that this MacOS build actually runs through...

@Oscuro-Phoenix
Copy link
Contributor Author

Looks like there's some issue with Travis.

@IAlibay IAlibay closed this Apr 16, 2020
@IAlibay IAlibay reopened this Apr 16, 2020
@IAlibay
Copy link
Member

IAlibay commented Apr 16, 2020

I'm going to ping @richardjgowers here, thoughts on the Travis failures? Different builds keep timing out. I guess cumulatively the whole thing came back green eventually. Just merge or try to fix Travis somehow?

@richardjgowers richardjgowers merged commit c35a6bf into MDAnalysis:develop Apr 16, 2020
@richardjgowers
Copy link
Member

@Oscuro-Phoenix Thanks for the fix!

@IAlibay yeah travis is just a helper tool, you can merge stuff if you think you know better :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 A^3 boxes from EM structures in the PDB
4 participants