CIF writer renumbers residues


Daniel Asarnow

23 Jun 2020

10:47 a.m.

Hi all, When I save a CIF from ChimeraX (while using ISOLDE), the warning "Not saving entity_poly_seq for non-authoritative sequences" is produced, and all the residues have been sequentially renumbered from 1. When there are missing residues, each segment has to be renumbered manually afterwards. Is there some way to avoid this with PDB or CIF inputs? Is it a bug? Best, -da


Greg Couch

23 Jun 2020

11:40 a.m.

If your mmCIF input file is incomplete and is missing the sequence information, then the mmCIF writer does not write it out either. There are too many ways to get it wrong and mislead the scientist. That said, I can envision adding a "best guess" option to the mmCIF writer someday.

As for the residue renumbering, there are two sets of residue numbers in an mmCIF file: the internal label_seq_id, which is used to link the atom_site table entries with other mmCIF tables, and the auth_seq_id, which is the author-assigned value. The auth_seq_id written is the same as the one read in.

In your case, you should use the auth_seq_id for matching (assuming it's present; if it isn't, you could add it with the same value as the label_seq_id to your original mmCIF file). Or, the original mmCIF file needs to supply the sequence information (the entity, entity_poly, and entity_poly_seq tables), so the gaps are known instead of guessed. Or, perhaps ISOLDE could help you by inserting "unknown" gap residues into the chain to preserve the numbering.

HTH,

Greg
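
One way to read the advice about matching on auth_seq_id, as a rough sketch only: pair residues from two models by chain ID and author residue number rather than by label_seq_id. The tuples below are invented stand-ins for atom_site rows, the helper name match_by_auth is hypothetical, insertion codes are ignored, and nothing here is ChimeraX API.

```python
# Each residue is (auth_asym_id, auth_seq_id, label_seq_id, residue name).
# Invented data: one chain in which residues 13 and 14 are not modeled.
original = [("A", 10, 10, "GLY"), ("A", 11, 11, "SER"), ("A", 12, 12, "THR"),
            ("A", 15, 15, "LEU"), ("A", 16, 16, "LYS")]
# The same model after a save that renumbered label_seq_id from 1
# while leaving auth_seq_id as it was read in.
resaved = [("A", 10, 1, "GLY"), ("A", 11, 2, "SER"), ("A", 12, 3, "THR"),
           ("A", 15, 4, "LEU"), ("A", 16, 5, "LYS")]


def match_by_auth(a, b):
    """Pair residues that share a chain ID and an author residue number."""
    index = {(res[0], res[1]): res for res in b}
    return [(res, index[(res[0], res[1])]) for res in a if (res[0], res[1]) in index]


pairs = match_by_auth(original, resaved)
for old, new in pairs:
    # Author numbers agree even though the label numbers differ.
    print(old[1], old[3], "label", old[2], "->", new[2])
```

Keyed this way, the gap between 12 and 15 survives the round trip, whereas a comparison on label_seq_id would misalign everything after the gap.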

Tristan Croll

11:49 a.m.

I’m unclear on whether ChimeraX currently has a method to definitively assign a sequence to a chain. If it does, that would be really useful.

_______________________________________________ ChimeraX-users mailing list ChimeraX-users@cgl.ucsf.edu Manage subscription: https://www.rbvi.ucsf.edu/mailman/listinfo/chimerax-users

Eric Pettersen

12:03 p.m.

To set the sequence:

    chain.bulk_set(residues, seq_string)

'residues' should be the same length as seq_string, which means it may contain Nones where structure is missing. To make it so that the mmCIF writer believes that the sequence info is "authoritative", you also have to:

    chain.from_seqres = True

(which you could do without the bulk_set() if the sequence info was otherwise already correct).

--Eric
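
A minimal sketch of how those two steps might be wrapped, assuming you already have a ChimeraX chain object and a way to look up its modeled residues by number. Only bulk_set() and from_seqres come from the message above; the helper names pad_residues and set_authoritative_sequence, and the by_number and first_number inputs, are hypothetical.

```python
def pad_residues(by_number, first_number, seq_string):
    """Build a list as long as seq_string, with None where no residue is modeled.

    by_number maps a residue number to an existing residue object, and
    first_number is the number that corresponds to seq_string[0].
    Both are assumed inputs, not something ChimeraX hands you in this form.
    """
    return [by_number.get(first_number + i) for i in range(len(seq_string))]


def set_authoritative_sequence(chain, residues, seq_string):
    """Apply the two steps from the message above to a chain object."""
    if len(residues) != len(seq_string):
        raise ValueError("residues and seq_string must have the same length")
    chain.bulk_set(residues, seq_string)  # assign the full sequence, gaps as None
    chain.from_seqres = True              # mark the sequence as authoritative
```

The length check is there because the message stresses that 'residues' and seq_string must line up one to one; how by_number gets built depends on the session and is left out.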


Tom Goddard

11:55 a.m.

Hi Greg, The mmCIF writer should preserve both the author and the mmcif (label_seq_id) numbering since this is essential for easy comparison of different structures. You say ChimeraX preserves author numbering but it should also preserve label_seq_id. Other software may only use label_seq_id and not author numbering and the user will have no control over that. Tom


Greg Couch

12:39 p.m.

It's not that simple. The label_seq_id values are sequential positive integers, and if the structure is edited, then those numbers can change. The auth_seq_id is a much better fit. Here's part of the description of atom_site.auth_seq_id:

The author may assign values to _atom_site.auth_seq_id in any desired way. For instance, the values may be used to relate this structure to a numbering scheme in a homologous structure, including sequence gaps or insertion codes. Alternatively, a scheme may be used for a truncated polymer that maintains the numbering scheme of the full length polymer. In all cases, the scheme used here must match the scheme used in the publication that describes the structure.

And the auth_seq_id is used for a consistent numbering scheme in homologous structures in the PDB for easy comparison.

-- Greg
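
To illustrate why the label_seq_id values can shift, here is a small sketch. It treats label_seq_id as the 1-based position in the entity_poly_seq list, so when that list is unavailable the only thing left to count is the modeled residues. This is an illustration under the assumption that author numbers run contiguously along the sequence with no insertion codes; it is not the ChimeraX writer's actual code, and the function names are made up.

```python
def label_ids_from_sequence(observed_auth, first_auth, seq_length):
    """label_seq_id when the full sequence is known: 1-based position in it."""
    return {auth: auth - first_auth + 1
            for auth in observed_auth
            if 0 <= auth - first_auth < seq_length}


def label_ids_without_sequence(observed_auth):
    """label_seq_id when only modeled residues are known: count them from 1."""
    return {auth: i for i, auth in enumerate(sorted(observed_auth), start=1)}


# Author-numbered residues 10-16 with 13 and 14 missing from the model.
observed = [10, 11, 12, 15, 16]
with_seq = label_ids_from_sequence(observed, first_auth=10, seq_length=7)
without_seq = label_ids_without_sequence(observed)
print(with_seq)     # {10: 1, 11: 2, 12: 3, 15: 6, 16: 7}
print(without_seq)  # {10: 1, 11: 2, 12: 3, 15: 4, 16: 5}
```

In both cases the keys (the author numbers) are unchanged; only the label numbering after the gap differs, which matches the renumbering described at the start of the thread.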

Tom Goddard

1:18 p.m.

Ok, if the mmcif spec demands that label_seq_id must be sequential without gaps then there is nothing that can be done to preserve numbering of those. Tom

