-
Notifications
You must be signed in to change notification settings - Fork 667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added reading of record types #1762
Conversation
|
||
@staticmethod | ||
def _gen_initial_values(na, nr, ns): | ||
return np.array(['ATOM'] * na, dtype=object) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems (memory)-wasteful to store a string with each atom. For 1M atoms, that can be an additional len("HETATM") * 1e6 / (1024*1024)
= 5.7 MiB. Could we instead use a small integer, such as 1 = ATOM, 2 = HETATM, 0 = no idea, and then get the string back on the fly (e.g., as a dict look-up)?
Or is this more complicated than the memory consumption?
I'm just conscientious of the fact that we will be using the additional memory for any PDB file (especially simulation systems), where no-one cares about the atom records per se.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question about memory consumption
Yeah, we could do some encoding actually seeing as there's only two
options, that's a good point
…On Fri, 26 Jan 2018, 8:57 p.m. Oliver Beckstein, ***@***.***> wrote:
***@***.**** commented on this pull request.
question about memory consumption
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#1762 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AI0jB_A2jaUxmg43Ibb18-SLOdseuQR7ks5tOjwfgaJpZM4RtHC_>
.
|
Encoding would save some memory for sure, and it will not affect performances in a perceivable way (or at least I do not expect it to). But it will create a new level of indirection for the user where what he sees is not what is stored; because it is different from how the other topology attribute are saved, I am worried the surprise the encoding could cause could be worst than the memory consumption. |
I think it depends on what you call the attribute and how you use it. Ultimately, only the PDBWriter needs to know what to do with it. (Of course, one needs to document the meaning of the codes but that's not worse than storing the PSF atomtype which can be either a number or a string, depending on if this is XPLOR PSF or CHARMM PSF.) |
It all depend on if we want it to be exposed to the users. The way the PR does it, it is exposed, and I think it is a good idea. Indeed, by exposing it we make it possible for a user to change the record type of some residues to deal with some software that handle the two types of record separately (many software just ignore HETATM records). We also make it possible to add the information to a universe that does not have it to later write a PDB as wanted; finally it allows to select atoms based on the record type to, for instance, ignore the HETATM like many software. If it is accessible to the user, it is much more natural to do something like |
I think with some work involving |
Yeah so @mnmelo is right, we can have our cake and eat it. With every TopologyAttr there's a This isn't currently 100% working now as |
So this should be good to go if @jbarnoud happy with the encoding. I've left out writing record types and selecting based on them deliberately because they'd make good GSOC starters, I'll make them into issues once this is merged. |
Starts #1753
Changes made in this Pull Request:
Is this the best name for the ATOM/HETATM thing in a PDB file?
PR Checklist