Hi Marcin,

In the mmCIF format, atom_site.label_seq_id is required, as described in the mmCIF documentation:

http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Items/_atom_site.label_seq_id.html

That documentation says it is used by the 60 mmCIF tables I copied below. If you don't include it, none of those tables can refer to a specific residue.  A minimal mmCIF file can of course omit all 60 of those tables, so in principle it could work to omit label_seq_id or specify it as ".".  But you are really asking for software not to work.  It is like leaving out the atom names because you don't happen to need them: good luck getting software to work correctly when you omit basic information.

Tom




On Jul 8, 2020, at 4:27 AM, Marcin Wojdyr <wojdyr@gmail.com> wrote:

Hi Greg,

Adding an atom_site.label_seq_id isn't different from supplying a residue
number in a PDB file.  When there are adjacent residues of the same type,
does the PDB reader see a duplicate atom and generate a new residue?
Merge the residues?  Generate an error?  I haven't tested the PDB reader,
but a residue number helps it too.

The sequence number/id from the PDB format tells which atoms are in
the same residue, but it doesn't imply connectivity between residues,
because the numbers don't need to be consecutive. In the mmCIF format
it is stored as _atom_site.auth_seq_id (+pdbx_PDB_ins_code for the
full id). So this makes conversion from PDB to mmCIF problematic. If
SEQRES is present I do sequence alignment to determine label_seq_id.
If SEQRES is missing I could ask the user to supply the full sequence,
but then the user may think that (since this was not obligatory when
working with PDB files) moving to mmCIF is a step backward. I could
infer gaps and increase label_seq_id by 2 if there is a gap, but the
resulting mmCIF file can be used for any purpose, not only by Chimera.
The apparent gap may actually be caused by misplaced atoms, and it
could turn out that the gap in numbering later causes different
problems. It's not clear to me what's better here: leaving
label_seq_id null or filling it with the best guesses (which sometimes
will be wrong).
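
For illustration only, the gap heuristic described above might look roughly
like the following Python sketch. The function name guess_label_seq_ids and
its input layout (a list of (auth_seq_id, insertion code) pairs in chain
order) are hypothetical and do not come from any existing library; it also
assumes the residues are already sorted along the chain.

```python
from typing import List, Tuple


def guess_label_seq_ids(auth_ids: List[Tuple[int, str]]) -> List[int]:
    """Guess label_seq_id values for one chain (hypothetical helper).

    auth_ids holds one (auth_seq_id, insertion_code) pair per residue,
    in chain order. Numbering starts at 1 and normally increases by 1;
    where the author numbering jumps by more than 1, one label_seq_id
    is skipped, so the step across the apparent gap is 2.
    """
    label_ids = []
    next_id = 1
    prev_num = None
    for num, _icode in auth_ids:
        # A jump in author numbering is treated as a gap. This is only a
        # guess: the true number of missing residues is unknown.
        if prev_num is not None and num - prev_num > 1:
            next_id += 1
        label_ids.append(next_id)
        next_id += 1
        prev_num = num
    return label_ids


if __name__ == "__main__":
    residues = [(10, ""), (11, ""), (14, ""), (14, "A"), (15, "")]
    print(guess_label_seq_ids(residues))
```

Residues that differ only by insertion code (14 and 14A here) are treated as
consecutive. The guessed values only mark that something seems to be missing;
they cannot say how many residues are absent, which is the weakness noted
above.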

Marcin
_______________________________________________
ChimeraX-users mailing list
ChimeraX-users@cgl.ucsf.edu
Manage subscription:
https://www.rbvi.ucsf.edu/mailman/listinfo/chimerax-users