CIF writer renumbers residues

Hi all, When I save a CIF from ChimeraX (while using ISOLDE), the warning "Not saving entity_poly_seq for non-authoritative sequences" is produced, and all the residues have been sequentially renumbered from 1. When there are missing residues, each segment has to be renumbered manually afterwards. Is there some way to avoid this with PDB or CIF inputs? Is it a bug? Best, -da

If your mmCIF input file is incomplete and missing the sequence information, then the mmCIF writer does not write it out either. There are too many ways to get it wrong and mislead the scientist. That said, I can envision adding a "best guess" option to the mmCIF writer someday.
As for the residue renumbering, there are two sets of residue numbers in a mmCIF file: the internal label_seq_id, which is used to link the atom_site table entries with other mmCIF tables, and the auth_seq_id, which is the author-assigned value. The auth_seq_id written is the same as the one read in.
In your case, you should use the auth_seq_id for matching (assuming it's present; if it isn't, you could add it with the same value as the label_seq_id to your original mmCIF file). Or, the original mmCIF file needs to supply the sequence information (the entity, entity_poly, and entity_poly_seq tables), so the gaps are known instead of guessed. Or, perhaps ISOLDE could help you by inserting "unknown" gap residues into the chain to preserve the numbering.
HTH,
Greg
On 6/23/2020 10:47 AM, Daniel Asarnow wrote:
Hi all, When I save a CIF from ChimeraX (while using ISOLDE), the warning "Not saving entity_poly_seq for non-authoritative sequences" is produced, and all the residues have been sequentially renumbered from 1. When there are missing residues, each segment has to be renumbered manually afterwards. Is there some way to avoid this with PDB or CIF inputs? Is it a bug?
Best, -da
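As a rough illustration of Greg's suggestion to match on auth_seq_id, here is a minimal Python sketch that reads both numbering columns from a tiny atom_site loop. The rows are made up for illustration, the helper name label_to_auth is hypothetical, and the parser assumes plain whitespace-separated values with no quoting, which real mmCIF files do not guarantee.

```python
from typing import Dict, List, Tuple

# Made-up atom_site rows for illustration only (not taken from a real entry).
EXAMPLE_LOOP = """\
loop_
_atom_site.group_PDB
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.auth_seq_id
ATOM CA ALA A 1 10
ATOM CA GLY A 2 11
ATOM CA SER A 3 25
ATOM CA LEU A 4 26
"""


def label_to_auth(loop_text: str) -> Dict[Tuple[str, int], int]:
    """Map (label_asym_id, label_seq_id) to auth_seq_id for a simple loop.

    Assumption: values are whitespace-separated and unquoted.
    """
    columns: List[str] = []
    mapping: Dict[Tuple[str, int], int] = {}
    for line in loop_text.splitlines():
        line = line.strip()
        if not line or line == "loop_":
            continue
        if line.startswith("_atom_site."):
            # Column names come first, one per line.
            columns.append(line[len("_atom_site."):])
            continue
        # Data row: pair each value with its column name.
        row = dict(zip(columns, line.split()))
        key = (row["label_asym_id"], int(row["label_seq_id"]))
        mapping[key] = int(row["auth_seq_id"])
    return mapping


mapping = label_to_auth(EXAMPLE_LOOP)
for (chain_id, label_id), auth_id in sorted(mapping.items()):
    print(chain_id, label_id, auth_id)
```

In this made-up example the label_seq_id column runs 1 to 4 with no gap, while the auth_seq_id column keeps the jump from 11 to 25, so matching on auth_seq_id keeps each segment's original numbering.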

I’m unclear on whether ChimeraX currently has a method to definitively assign a sequence to a chain. If it does, that would be really useful.
On 23 Jun 2020, at 19:40, Greg Couch <gregc@cgl.ucsf.edu> wrote:
If your mmCIF input file is incomplete and missing the sequence information, then the mmCIF writer does not write it out either. There are too many ways to get it wrong and mislead the scientist. That said, I can envision adding a "best guess" option to the mmCIF writer someday.
As for the residue renumbering, there are two sets of residue numbers in a mmCIF file: the internal label_seq_id, which is used to link the atom_site table entries with other mmCIF tables, and the auth_seq_id, which is the author-assigned value. The auth_seq_id written is the same as the one read in.
In your case, you should use the auth_seq_id for matching (assuming it's present; if it isn't, you could add it with the same value as the label_seq_id to your original mmCIF file). Or, the original mmCIF file needs to supply the sequence information (the entity, entity_poly, and entity_poly_seq tables), so the gaps are known instead of guessed. Or, perhaps ISOLDE could help you by inserting "unknown" gap residues into the chain to preserve the numbering.
HTH,
Greg
_______________________________________________ ChimeraX-users mailing list ChimeraX-users@cgl.ucsf.edu Manage subscription: https://www.rbvi.ucsf.edu/mailman/listinfo/chimerax-users

To set the sequence:
    chain.bulk_set(residues, seq_string)
'residues' should be the same length as seq_string, which means it may contain Nones where structure is missing. To make it so that the mmCIF writer believes that the sequence info is "authoritative", you also have to:
    chain.from_seqres = True
(which you could do without the bulk_set() if the sequence info was otherwise already correct).
--Eric
On Jun 23, 2020, at 11:49 AM, Tristan Croll <tic20@cam.ac.uk> wrote:
I’m unclear on whether ChimeraX currently has a method to definitively assign a sequence to a chain. If it does, that would be really useful.
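Eric's requirement that 'residues' be the same length as seq_string, with None where structure is missing, can be sketched as follows. This is a minimal illustration and has not been run inside ChimeraX: the helper pad_with_gaps is hypothetical, plain strings stand in for residue objects, and the residue numbers are assumed to be simple offsets into the full sequence (no insertion codes). Only chain.bulk_set() and chain.from_seqres come from the thread, and they appear here only in comments.

```python
from typing import List, Optional, Sequence, TypeVar

R = TypeVar("R")


def pad_with_gaps(
    residues: Sequence[R],
    numbers: Sequence[int],
    seq_string: str,
    first_number: int = 1,
) -> List[Optional[R]]:
    """Return a list as long as seq_string, with None at unmodelled positions.

    Assumption: residue number `first_number` corresponds to seq_string[0].
    """
    if len(residues) != len(numbers):
        raise ValueError("residues and numbers must have the same length")
    padded: List[Optional[R]] = [None] * len(seq_string)
    for residue, number in zip(residues, numbers):
        index = number - first_number
        if not 0 <= index < len(seq_string):
            raise ValueError(f"residue number {number} falls outside the sequence")
        if padded[index] is not None:
            raise ValueError(f"residue number {number} appears twice")
        padded[index] = residue
    return padded


# Stand-in data: three modelled residues out of an eight-residue sequence.
seq_string = "MKAGTLSW"
modelled = ["ALA3", "GLY4", "SER7"]
numbers = [3, 4, 7]
padded = pad_with_gaps(modelled, numbers, seq_string)
print(padded)

# Inside ChimeraX, with real residue objects, the calls from Eric's
# message would then be (not executed here):
#   chain.bulk_set(padded, seq_string)
#   chain.from_seqres = True
```

The helper raises an error on out-of-range or duplicate numbers rather than guessing, since a silently misaligned sequence would be worse than no sequence at all.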

Hi Greg,
The mmCIF writer should preserve both the author and the mmCIF (label_seq_id) numbering, since this is essential for easy comparison of different structures. You say ChimeraX preserves author numbering, but it should also preserve label_seq_id. Other software may only use label_seq_id and not author numbering, and the user will have no control over that.
Tom

It's not that simple. The label_seq_id values are sequential positive integers, and if the structure is edited, then those numbers can change. The auth_seq_id is a much better fit. Here's part of the description of atom_site.auth_seq_id:
The author may assign values to _atom_site.auth_seq_id in any desired way. For instance, the values may be used to relate this structure to a numbering scheme in a homologous structure, including sequence gaps or insertion codes. Alternatively, a scheme may be used for a truncated polymer that maintains the numbering scheme of the full length polymer. In all cases, the scheme used here must match the scheme used in the publication that describes the structure.
And the auth_seq_id is used for a consistent numbering scheme in homologous structures in the PDB for easy comparison.
-- Greg
On 6/23/2020 11:55 AM, Tom Goddard wrote:
Hi Greg,
The mmCIF writer should preserve both the author and the mmCIF (label_seq_id) numbering, since this is essential for easy comparison of different structures. You say ChimeraX preserves author numbering, but it should also preserve label_seq_id. Other software may only use label_seq_id and not author numbering, and the user will have no control over that.
Tom
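Greg's point that label_seq_id values are sequential and can change when the structure is edited can be illustrated with a small Python sketch. It assumes the situation from the original question, where no sequence information is available and the label numbers are simply counted over the modelled residues; the function name and the residue numbers are made up, and this is not ChimeraX's writer code.

```python
from typing import List, Tuple


def sequential_label_ids(auth_seq_ids: List[int]) -> List[Tuple[int, int]]:
    """Pair each auth_seq_id with a label_seq_id counted from 1.

    Assumption: no entity_poly_seq is available, so labels are counted
    over the modelled residues only.
    """
    return [(label_id, auth_id)
            for label_id, auth_id in enumerate(auth_seq_ids, start=1)]


# Two segments with a gap between author residues 11 and 25.
before = sequential_label_ids([10, 11, 25, 26])

# After building one more residue (author number 12) into the gap.
after = sequential_label_ids([10, 11, 12, 25, 26])

print("before:", before)
print("after: ", after)
```

Author residue 25 is paired with label 3 before the edit and label 4 after it, while its author number never changes, which is why the auth_seq_id is the more stable key for comparing structures.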

OK, if the mmCIF spec demands that label_seq_id must be sequential without gaps, then there is nothing that can be done to preserve the numbering of those.
Tom
On Jun 23, 2020, at 12:39 PM, Greg Couch <gregc@cgl.ucsf.edu> wrote:
It's not that simple. The label_seq_id values are sequential positive integers, and if the structure is edited, then those numbers can change. The auth_seq_id is a much better fit. Here's part of the description of atom_site.auth_seq_id:
The author may assign values to _atom_site.auth_seq_id in any desired way. For instance, the values may be used to relate this structure to a numbering scheme in a homologous structure, including sequence gaps or insertion codes. Alternatively, a scheme may be used for a truncated polymer that maintains the numbering scheme of the full length polymer. In all cases, the scheme used here must match the scheme used in the publication that describes the structure.
And the auth_seq_id is used for a consistent numbering scheme in homologous structures in the PDB for easy comparison.
-- Greg
participants (5)
- Daniel Asarnow
- Eric Pettersen
- Greg Couch
- Tom Goddard
- Tristan Croll