mmcif, auth_xx information versus label_xx information

Yes, Chimera uses the author's numbering because that is what is used in the author's publication. It turns out that this is especially important for waters, since the PDB/RCSB refuses to number waters in the label_seq_id field, which breaks the one-to-one mapping of (label_seq_id, label_asym_id) to a residue.

In Chimera, the mapping with label_seq_id is discarded after it is used. In ChimeraX, there is no UI for label_seq_id, but, in Python code, the nth residue in a Chain instance is label_seq_id n. And residues have a mmcif_chain_id attribute, which is the label_asym_id. So it is possible to reconstruct the mapping if need be.

HTH,
Greg

On 12/6/18 4:03 AM, moocow@mindless.com wrote:
An mmCIF / numbering question... When specifying residues, Chimera seems to use "auth_seq_id" (the authors' numbering), as well as the authors' naming of chains (auth_asym_id). Is there any way to ask Chimera to find residues using "label_seq_id" and "label_asym_id" (the numbering and chain naming the Protein Data Bank gives)? Is the label_xxx information discarded when Chimera reads an mmCIF file, or is it hidden in an attribute somewhere?
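The reconstruction Greg describes for ChimeraX could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name label_map is made up here, and it assumes that a chain's residues sequence holds one entry per sequence position, with None where a residue has no coordinates. Only the "nth residue is label_seq_id n" rule and the mmcif_chain_id attribute come from the reply above. Stand-in objects are used so the snippet runs outside ChimeraX.

```python
from types import SimpleNamespace


def label_map(chains):
    """Build {(label_asym_id, label_seq_id): residue} for polymer residues.

    Hypothetical helper. Relies on two things stated in the thread:
    the nth residue of a chain has label_seq_id n, and each residue
    carries its label_asym_id in the mmcif_chain_id attribute.
    """
    mapping = {}
    for chain in chains:
        # Assumption: chain.residues has one slot per sequence position
        # and holds None where the residue is missing from the structure.
        for seq_id, residue in enumerate(chain.residues, start=1):
            if residue is None:
                continue
            mapping[(residue.mmcif_chain_id, seq_id)] = residue
    return mapping


# Stand-ins for ChimeraX residues: author chain "H" and author numbers
# 27 and 29, label chain "A", with the second position unresolved.
r1 = SimpleNamespace(mmcif_chain_id="A", chain_id="H", number=27)
r3 = SimpleNamespace(mmcif_chain_id="A", chain_id="H", number=29)
demo_chain = SimpleNamespace(residues=[r1, None, r3])

demo = label_map([demo_chain])
for (asym_id, seq_id), res in sorted(demo.items()):
    print(f"label {asym_id}/{seq_id} is author {res.chain_id}/{res.number}")
```

Inside ChimeraX the same function would presumably be called with the chains of an open structure. Waters are not covered, since they have no label_seq_id to begin with.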


So it's not so simple. The auth_.... fields are not required; they just happen to be in the files that the PDB provides. Other programs cannot be expected to provide them.

The RCSB has software that can validate mmCIF files against various dictionaries: the MMCIF Dictionary Suite, https://sw-tools.rcsb.org/apps/MMCIF-DICT-SUITE/index.html. Programs that generate mmCIF files should verify that the generated files are complete. And the dictionary used for validation should be documented in the audit_conform table.

If your data is in the mmCIF format, you are better off using ChimeraX than Chimera. The reader is orders of magnitude faster. And I am working on fixing any bugs that come up :-). I have documented the parts of mmCIF files that are important for ChimeraX. For example, the Cartesian coordinates are required for atoms. See https://www.cgl.ucsf.edu/chimerax/docs/devel/bundles/mmcif/src/mmcif.html -- in particular, read "ChimeraX Fast mmCIF Guidelines" and "Writing mmCIF Files in ChimeraX".

-- Greg

On 12/7/2018 12:47 AM, moocow@mindless.com wrote:
Thanks for the explanation. The water-numbering argument is convincing as a reason to use the auth_... fields. I think these also correspond to the fields in the old PDB format. I can see the PDB's logic in using a flexible format with room to put in every conceivable piece of information, but one can wonder whether they really did the world a favour. It is almost guaranteed that this program will use auth_... fields and that program will use label_... fields.
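As a rough illustration of the completeness point in Greg's reply, a program that writes mmCIF could at least check that the items it relies on are declared. The sketch below is not a substitute for validating against a dictionary with the RCSB suite. The function names and the REQUIRED list are assumptions chosen for this example: the thread only says that Cartesian coordinates are required and that audit_conform should document the dictionary. The scan is deliberately naive and would be fooled by multi-line text fields that contain lines starting with an underscore.

```python
def declared_items(mmcif_text):
    """Collect data names (tokens starting with '_') that begin a line."""
    items = set()
    for line in mmcif_text.splitlines():
        tokens = line.split(None, 1)
        if tokens and tokens[0].startswith("_"):
            # mmCIF data names are case-insensitive, so normalise them.
            items.add(tokens[0].lower())
    return items


# Illustrative list only; consult the ChimeraX mmCIF guidelines for
# what a given reader actually needs.
REQUIRED = (
    "_audit_conform.dict_name",
    "_atom_site.Cartn_x",
    "_atom_site.Cartn_y",
    "_atom_site.Cartn_z",
)


def missing_items(mmcif_text, required=REQUIRED):
    """Return the required data names that the text never declares."""
    present = declared_items(mmcif_text)
    return [name for name in required if name.lower() not in present]


# Made-up fragment: label_ fields only, no auth_ fields, no audit_conform.
SAMPLE = """data_demo
loop_
_atom_site.group_PDB
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM A 1 1.0 2.0 3.0
"""

missing = missing_items(SAMPLE)
print("missing:", missing)
```

A fragment like SAMPLE, which carries only label_ fields, is the kind of file a reader that depends on auth_ fields cannot be expected to handle.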
participants (2)
- Greg Couch
- moocow@mindless.com