
On 7/8/2020 12:06 PM, Marcin Wojdyr wrote:
Hi Tom,
I don't want to continue it for too long, but I need to comment on this:
That documentation says it is used by the 60 mmCIF tables I copied below -- if you don't include it then none of those tables can refer to a specific residue.
They can - by using different identifiers, the ones inherited from the PDB format. Moreover, if the residue that you refer to is HOH you have to use different identifiers, because label_seq_id is null (.) for ligands and waters. It cannot be used to refer to a specific water molecule.
Marcin
Yes, that is a major failing of the mmCIF format. If the atom_site.label_seq_id were non-null for solvent, then all of the lines in the atom_site table would be uniquely keyed, with the label_* values, like a database table -- that would be fantastic for us computer science types. ChimeraX uses the atom_site.auth_seq_id to disambiguate the solvent. The PDBe is thinking of adding another column to the atom_site table for the same purpose. It appears that it would be fairly easy for the PDB to fix this. The mmCIF dictionary description for atom_site.label_seq_id would have to be updated to allow values for solvent. The current description is:
This data item is a pointer to _entity_poly_seq.num in the ENTITY_POLY_SEQ category.
That could be amended. As far as I can tell, CIF validators do not use the description when validating CIF files, so the current limitation is not a formal requirement, just a convention. The problem is all of the programs the PDB uses.
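
To make the keying problem concrete, here is a minimal Python sketch of the fallback described above: use label_seq_id when it is set, and fall back to auth_seq_id when it is null. The rows are invented for illustration, and residue_key is a hypothetical helper, not code from ChimeraX or any other program.

```python
# Illustrative atom_site rows (made-up values): one polymer residue and
# two waters that share a label_asym_id and have a null label_seq_id.
rows = [
    {"label_asym_id": "A", "label_comp_id": "GLY", "label_seq_id": "1", "auth_seq_id": "5"},
    {"label_asym_id": "C", "label_comp_id": "HOH", "label_seq_id": ".", "auth_seq_id": "201"},
    {"label_asym_id": "C", "label_comp_id": "HOH", "label_seq_id": ".", "auth_seq_id": "202"},
]


def residue_key(row):
    """Hypothetical helper: build a key that identifies the residue of a row.

    Uses label_seq_id when present; otherwise falls back to auth_seq_id,
    since label_seq_id is null ('.') for ligands and waters.
    """
    seq_id = row["label_seq_id"]
    if seq_id in (".", "?"):
        # Tag the source of the number so the two numbering schemes never collide.
        return (row["label_asym_id"], "auth", row["auth_seq_id"])
    return (row["label_asym_id"], "label", seq_id)


# Keys built from label_* values only: the two waters collapse into one key.
label_only_keys = {(r["label_asym_id"], r["label_seq_id"]) for r in rows}

# Keys built with the auth_seq_id fallback: every residue is distinct.
fallback_keys = {residue_key(r) for r in rows}

print(len(label_only_keys), len(fallback_keys))
```

With label_* values alone the two waters share a key; with the fallback each row gets its own. If label_seq_id were filled in for solvent, the fallback branch would not be needed.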