
On 7/7/2020 3:50 PM, Marcin Wojdyr wrote:
> On Tue, 7 Jul 2020 at 22:38, Greg Couch <gregc@cgl.ucsf.edu> wrote:
>> As you know, the entity_poly_seq table says which polymeric residues are adjacent to each other (entity_poly_seq.num values are the atom_site.label_seq_id values).
>
> Yes, it's an enhanced equivalent of the SEQRES record in the PDB format. Files from the wwPDB always have it, but files from other sources will often miss it. I think currently almost all the mmCIF files in circulation originate from the wwPDB. But this is slowly changing and people are starting to use mmCIF as a working format.
>
>> The sequence information is extremely useful for eliminating ambiguities in the atom_site data. And it is frequently known by the user, even when it is missing from the mmCIF file. Without that information, ChimeraX still needs the atom_site.label_seq_id to imply the polymeric ordering.
>
> Ok. It would be useful, though, if ChimeraX could infer the ordering and connectivity in the same way as it does for PDB files.
>
> Thanks,
> Marcin
Adding an atom_site.label_seq_id is no different from supplying a residue number in a PDB file. When there are adjacent residues of the same type, does the PDB reader see a duplicate atom and generate a new residue? Merge the residues? Generate an error? I haven't tested the PDB reader, but a residue number helps it too.

Learning to use mmCIF effectively will take time. Programmers should use the PDB's mmCIF Dictionary Suite to validate the mmCIF their programs output. It won't validate completely (because it's for mmCIF files served by the PDB), but it will show where the mmCIF output can be improved. For more information about ChimeraX reading mmCIF, see https://www.rbvi.ucsf.edu/chimerax/docs/devel/bundles/mmcif/src/mmcif_guidel....

ChimeraX's mmCIF reader tries really hard to show you the data exactly as it is specified. So if there's badly specified data, you'll be able to see it. Over time, we will improve ChimeraX's ability to handle underspecified data as developers start generating mmCIF files. But we don't plan to modify the mmCIF reader if we can help it. Adding code to infer information that should be explicitly given in the mmCIF file slows down the mmCIF reader for everyone. This is especially true for large structures with over a million atoms.

-- Greg
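
To make the adjacent-residue ambiguity above concrete, here is a minimal Python sketch. It is not ChimeraX's reader or any PDB reader: the dictionaries stand in for atom_site rows, and `make_glycines`, `group_residues`, and `split_on_duplicate_atoms` are hypothetical helpers written only for this illustration. It assumes a missing label_seq_id is written as `.`, and that a naive reader groups consecutive rows by (label_asym_id, label_seq_id, label_comp_id).

```python
from itertools import groupby

ATOM_NAMES = ["N", "CA", "C", "O"]  # backbone atoms of one glycine


def make_glycines(seq_ids):
    """Build toy atom_site rows for consecutive GLY residues in chain A."""
    return [
        {"label_asym_id": "A", "label_comp_id": "GLY",
         "label_seq_id": sid, "label_atom_id": name}
        for sid in seq_ids
        for name in ATOM_NAMES
    ]


def group_residues(atoms):
    """Group consecutive rows that share (chain, seq id, residue name)."""
    def key(a):
        return (a["label_asym_id"], a["label_seq_id"], a["label_comp_id"])
    return [(k, [a["label_atom_id"] for a in rows])
            for k, rows in groupby(atoms, key=key)]


def split_on_duplicate_atoms(atoms):
    """Hypothetical fallback: start a new residue when the chain or residue
    name changes, or when an atom name repeats within the current residue."""
    residues, current, seen, prev = [], [], set(), None
    for a in atoms:
        k = (a["label_asym_id"], a["label_comp_id"])
        name = a["label_atom_id"]
        if current and (k != prev or name in seen):
            residues.append(current)
            current, seen = [], set()
        current.append(name)
        seen.add(name)
        prev = k
    if current:
        residues.append(current)
    return residues


with_ids = make_glycines(["1", "2"])     # label_seq_id given explicitly
without_ids = make_glycines([".", "."])  # label_seq_id missing

print(len(group_residues(with_ids)))               # 2 residues
print(len(group_residues(without_ids)))            # 1 residue holding 8 atoms
print(len(split_on_duplicate_atoms(without_ids)))  # 2 residues, by inference
```

With explicit label_seq_id values the two glycines stay separate. Without them, plain grouping merges the glycines into a single residue with duplicated atom names. The duplicate-atom fallback recovers the split here, but it adds a per-atom check and would still guess wrong for residues with missing atoms, which is the kind of inference cost discussed above.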