-
Notifications
You must be signed in to change notification settings - Fork 667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added warning for 1 A^3 CRYST1 Record #2645
Added warning for 1 A^3 CRYST1 Record #2645
Conversation
Hi!, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is where to start. Once we detect the box is 1,1,1 we want to reset it so size zero so that it doesn’t get used.
Then in addition to this we need a test which checks the code works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per @richardjgowers' review, you'll probably want to skip feeding self.ts._unitcell
if the CRYST record matches the cryo-em default pattern. I'd maybe consider restructuring this try
/except
block to include an else
that validates the contents of the CRYST1 values before feeding self.ts._unitcells
?
@@ -398,6 +398,8 @@ def _read_frame(self, frame): | |||
warnings.warn("Failed to read CRYST1 record, " | |||
"possibly invalid PDB file, got:\n{}" | |||
"".format(line)) | |||
if (self.ts._unitcell[:3] == np.array([1,1,1],dtype=float)).all(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consideration: the Cryo-EM pattern is very specific to a 1,1,1,90,90,90 box. It might be overall safer to check for something that matches that explicitly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this what we'd like?
try:
cell_check = [line[6:15], line[15:24],
line[24:33], line[33:40],
line[40:47], line[47:54]]
if (np.array(cell_check,dtype=float) ==
np.array([1,1,1,90,90,90],dtype=float)).all():
warnings.warn("1 A^3 CRYST1 record,"
"possibly an EM structure file, setting box-size to 0",stacklevel=2)
self.ts._unitcell=np.array([0,0,0,90,90,90],dtype=float)
except ValueError:
warnings.warn("Failed to read CRYST1 record, "
"possibly invalid PDB file, got:\n{}"
"".format(line))
else:
try:
if not (self.ts._unitcell==np.array([0,0,0,90,90,90])).all():
self.ts._unitcell[:] = [line[6:15], line[15:24],
line[24:33], line[33:40],
line[40:47], line[47:54]]
except ValueError:
warnings.warn("Failed to read CRYST1 record, "
"possibly invalid PDB file, got:\n{}"
"".format(line))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a general point to start with. It is usually easier for us to review if you can push larger code changes to you branch rather than place it in the comments. That way we can get a better feeling for how the code changes fit within the context of the existing code. It also makes it easier for you to read our comments afterwards 😄
Whilst you have most of the right core elements, I don't think the structure works.
try
block:
What you want to do here is try to extract the lines into a temporary variable (here cell_dims
might be a more understandable variable name than cell_check
). There's no point testing/warning for cryo-em default values here because you are already doing this in the else
block (you don't want to use up extra CPU cycles if you can avoid it).
except
block:
This seems fine.
else
block:
The else
block is where I'm having the hardest time following what you are doing.
Since you will have already extracted the cell dimensions in the try
block, there is no point trying to extract them again. Additionally, you have already tested if a ValueError
will be raised when reading the appropriate parts of line
. So you don't need the addtiional try
/except
here.
In this part of the code I would expect you to be testing the contents of the temporary variable hosting the cell dimensions against the default cryo-em CRYST1 values (i.e. [1,1,1,90,90,90]). If it matches (i.e. is a likely cryo-em structure), then raise a warning. If not then feed the cell dimensions to self.ts._unitcell
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for taking this. I will keep your points in mind when I revert later.
What I wanted to do is basically have the cell_dims
variable store the box values which are checked for CRYST1
values in the first try block. If they turn out to be the same as the specific set of values [1,1,1,90,90,90] then I set the self.ts._unitcell
as having dimensions of [0,0,0,90,90,90]. And in else block, if the self.ts._unitcell
is not set as having box-size of 0 then only I go ahead and set it as the cell_dims (again, I should have directly set it as =cell_dims
). I see that the last try/except was extraneous. I'm really sorry for that.
Codecov Report
@@ Coverage Diff @@
## develop #2645 +/- ##
===========================================
+ Coverage 90.89% 91.04% +0.15%
===========================================
Files 174 159 -15
Lines 23592 21583 -2009
Branches 3084 3085 +1
===========================================
- Hits 21443 19651 -1792
+ Misses 1529 1314 -215
+ Partials 620 618 -2
Continue to review full report at Codecov.
|
This reverts commit c45f1ec.
@Oscuro-Phoenix not had a look at the code yet, but adding & parsing a 38k line (cryo-em files are indeed quite large 😅 ) file to test a CRYST1 record might a bit overkill. I'd instead recommend looking at using the much smaller 5A7U. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more changes in addition to my other #2645 (comment). Just a few more changes and we should be getting there 😃
#issue 2599 | ||
with pytest.warns(UserWarning) as record: | ||
u = mda.Universe(CRYST1_BOX) | ||
cur_sele = u.select_atoms('around 0.1 (resid 4 and name CA and segid A)') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need to call select_atoms
to get the warning here (it should hopefully just be thrown on universe building. So the select_atoms
shouldn't be necessary here. However, we definitely should be checking that selecting 0.1 A around one of the alpha carbons works now, so do please add this as a second test.
except ValueError: | ||
warnings.warn("Failed to read CRYST1 record, " | ||
"possibly invalid PDB file, got:\n{}" | ||
"".format(line)) | ||
|
||
else: | ||
if not (self.ts._unitcell==np.array([0,0,0,90,90,90])).all(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above, I would be expecting most of what is currently in try
(except assigning cell_dims
) to be here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do let me know if it has been done the right way in my latest commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few more comments @Oscuro-Phoenix
cell_dims = [line[6:15], line[15:24], | ||
line[24:33], line[33:40], | ||
line[40:47], line[47:54]] | ||
if (np.array(cell_dims, dtype=float) == |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As previously mentioned we don't want anything else but the assignment of cell_dims
in the try
block. Everything else needs to be in else
.
np.array([1, 1, 1, 90, 90, 90], dtype=float)).all(): | ||
warnings.warn("1 A^3 CRYST1 record," | ||
" possibly an EM structure file, setting box-size to 0",stacklevel=2) | ||
cell_dims = np.array([0, 0, 0, 90, 90, 90],dtype=float) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we match the CRYST1 record of a cryo-em file, we don't want to set the box at all. Setting cell_dims
(and then eventually self.ts._unitcell[:]
to [0, 0, 0, 90, 90, 90] technically means that some of the box was read in, which is not true here. Cryo-em CRYST1 records are just placeholder values so they don't have physicality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few more changes, nearly there :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nearly there, you'll also need to add a .. versionchanged:: 1.0.0
entry in the docstring of the PDBReader
class.
|
||
def test_cryst_em_select(): | ||
#issue 2599 | ||
u=mda.Universe(PDB_CRYOEM_BOX) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP8, can you add spaces around =
please? (one the lines below too).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just the one final PEP8 change I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, just need to add an entry to CHANGELOG
and add yourself to AUTHORS
.
@richardjgowers any comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somes changes for the CHANGELOG
. Also you have some merge conflicts, you'll need to rebase against MDA's develop branch and solve the conflicts. If necessary, we can fix them for you on our end.
package/CHANGELOG
Outdated
@@ -111,6 +111,7 @@ Enhancements | |||
* Improve the distance search in water bridge analysis with capped_distance (PR #2480) | |||
* Added weights_groupselections option in RMSD for mapping custom weights | |||
to groupselections (PR #2610, Issue #2429) | |||
* Added user warning for CRYST1 cryo-em structures (PR #2645, Issue #2599) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two things here:
- Since this does more than just add a warning I'd be more explicit here, something along the lines of "Checks for cryo-em 1 A^3 default CRYST1 record, disabling setting of box dimensions if found (Issue 1 A^3 boxes from EM structures in the PDB #2599)".
- This probably should go under "fixes". Also as per the CHANGELOG rules, entries should be added "newest first", i.e. at the top of the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Oscuro-Phoenix just the one addition needed to the CHANGELOG + the merge conflicts. Let me know if you are having issues rebasing, we can fix the merge conflicts on our end if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming Travis comes back green, LGTM! Thanks for all your work @Oscuro-Phoenix
Sorry about the close-open, just re-starting Travis in the hope that this MacOS build actually runs through... |
Looks like there's some issue with Travis. |
I'm going to ping @richardjgowers here, thoughts on the Travis failures? Different builds keep timing out. I guess cumulatively the whole thing came back green eventually. Just merge or try to fix Travis somehow? |
@Oscuro-Phoenix Thanks for the fix! @IAlibay yeah travis is just a helper tool, you can merge stuff if you think you know better :) |
Fixes #2599
Changes made in this Pull Request:
PR Checklist