Re: How to save a sequence in an mmCIF file?
To answer the general question, ChimeraX doesn't yet have a way to set the full sequence for an mmCIF entity. See https://www.wwpdb.org/deposition/preparing-pdbx-mmcif-files for how to fix the mmCIF output from various refinement packages. For example, Phenix has the "mmtbx.prepare_pdb_deposition" program to create an mmCIF file with the sequence.
In this particular case, where the starting structure is an AlphaFold prediction with atoms for every residue in the full sequence, you can get the correct sequence into the mmCIF output with the "bestGuess" option. See https://www.cgl.ucsf.edu/chimerax/docs/user/commands/save.html#mmcif. I'd also recommend, in your case, using the computedSheets option. In older ChimeraX versions, you need to run dssp before using computedSheets to get the helix information. In recent ChimeraX versions (daily build and 1.7 release candidate), computedSheets will also output the helix information if it wasn't present in the input.
Adding a sequence with bestGuess can be deceiving because of missing leading or trailing residues, or gaps of indeterminate length. But in this case, you should be fine.
-- Greg
On 11/29/23 02:45, Guillaume Gaullier wrote:
From: Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu>
Subject: [chimerax-users] How to save a sequence in an mmCIF file?
Date: November 29, 2023 at 2:45:31 AM PST
To: "chimerax-users@cgl.ucsf.edu" <chimerax-users@cgl.ucsf.edu>
Reply-To: Guillaume Gaullier <guillaume.gaullier@kemi.uu.se>
Hello,
Starting from an AlphaFold prediction, I refined a model against a map with ISOLDE. I trimmed the segments not supported by any density. The resulting mmCIF file that I saved now opens with this warning:
Unknown polymer entity '1' near line 187
Missing or incomplete entity_poly_seq table. Inferred polymer connectivity.
Displaying the sequence of this chain shows the correct numbering (with jumps in numbering according to missing segments in the structure), but the sequence of the missing structure segments is not displayed.
When I open the FASTA file containing the full-length sequence, the sequence gets automatically associated with the structure, and the sequence viewer annotates the segments with missing structure correctly. I would like to save this full-length sequence in my mmCIF file so the full-length sequence with annotated missing structure segments shows up next time I open this file. But when I try to save at this point, I get the following notice:
Not saving entity_poly_seq for non-authoritative sequences
The documentation for "save" and "sequence" didn't help. How can I make this sequence "authoritative" and save it into my mmCIF file?
Thank you in advance,
Guillaume
E-mailing Uppsala University means that we will process your personal data. For more information on how this is performed, please read here: http://www.uu.se/en/about-uu/data-protection-policy (Swedish version: http://www.uu.se/om-uu/dataskydd-personuppgifter/)
ChimeraX-users mailing list -- chimerax-users@cgl.ucsf.edu
To unsubscribe send an email to chimerax-users-leave@cgl.ucsf.edu
Archives: https://mail.cgl.ucsf.edu/mailman/archives/list/chimerax-users@cgl.ucsf.edu/
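Greg's caveat about bestGuess can be illustrated with a small sketch. It assumes (this is not stated in the thread) that a guessed sequence can only be built from the residues actually present, so jumps in residue numbering are visible while trimmed leading or trailing residues leave no trace. The function name and the example numbering are hypothetical.

```python
def numbering_gaps(residue_numbers):
    """Return (first_missing, last_missing) pairs for jumps in residue numbering."""
    nums = sorted(set(residue_numbers))
    gaps = []
    for prev, cur in zip(nums, nums[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


# Hypothetical trimmed model: only residues 5-10 and 15-20 remain.
observed = list(range(5, 11)) + list(range(15, 21))
print(numbering_gaps(observed))
```

In this made-up example the internal gap is found, but residues before 5 and any residues after 20 cannot be detected from the numbering alone, which is the situation where a guessed sequence would be misleading.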
Thank you, good to know!
I think in this case I can't get my full-length sequence with ChimeraX without losing a lot of my refinement work, because I saved my progress (without using these options...) while working in ISOLDE, re-opening the work-in-progress model every time I resumed working on it. But I will try the program in Phenix.
Cheers,
Guillaume
(Reply to Greg Couch's message sent Friday, December 1, 2023 1:22:34 AM.)
Hi Guillaume,
You can get the full sequence into your refined mmCIF file with the following cumbersome procedure. If you open your original full-length AlphaFold .pdb file and save it as .cif in ChimeraX then ChimeraX won't put the sequence info in the file, even if you haven't deleted any residues, because the original AlphaFold .pdb file does not have SEQRES records. But the ChimeraX .pdb file writer works better. If you open your original full AlphaFold .pdb model, then save it as PDB in ChimeraX it will add the SEQRES records. Then you can copy those SEQRES records (a few lines at the top of the file) using a text editor into your refined .pdb file which does not have all the residues. Since you are using mmCIF for your refined model, you would first open that and save it as .pdb format in order to add the SEQRES, then you could resave that .pdb as .cif.
Here are the ChimeraX commands, assuming fullalphafold.pdb is the original AlphaFold model, and refined.cif is your refined model which lacks the full sequence.
open refined.cif
save refined_noseq.pdb
close
open fullalphafold.pdb
save alphafold_withseq.pdb
close
# In a text editor copy the SEQRES lines from alphafold_withseq.pdb to refined_noseq.pdb to make refined_withseq.pdb
# The SEQRES lines look like:
# SEQRES 1 A 125 GLY HIS MET HIS ASP CYS HIS GLN VAL THR VAL SER ARG
# SEQRES 2 A 125 ASP VAL THR LEU GLN ASN LYS GLU ARG HIS ASP CYS ASN
# SEQRES 3 A 125 GLN VAL CYS ALA SER ILE ASP LYS GLU THR GLU ASN LYS
# ...
open refined_withseq.pdb
save refined_withseq.cif
I'm kind of surprised that the "bestGuess" option Greg described does not actually put the entity_poly_seq table in the mmCIF. If it did you could avoid all the monkey-business of converting to .pdb format, and just output the original AlphaFold model with bestGuess true as mmCIF and copy its entity_poly_seq table to your refined .cif file. But at least you can get the job done using the old .pdb format.
Tom
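The text-editor step in Tom's procedure could also be scripted. The following is only a sketch in plain Python, not part of ChimeraX: the function name, the list of record names used to find the insertion point, and the tiny example strings are all assumptions made for illustration.

```python
# Record types that come after SEQRES in a PDB file; the SEQRES lines are
# inserted before the first of these found in the target (assumed ordering).
AFTER_SEQRES = ("MODRES", "HET", "FORMUL", "HELIX", "SHEET", "SSBOND", "LINK",
                "CISPEP", "SITE", "CRYST1", "ORIGX", "SCALE", "MTRIX",
                "MODEL", "ATOM")


def copy_seqres(source_text, target_text):
    """Return target PDB text carrying the SEQRES records found in source."""
    seqres = [l for l in source_text.splitlines() if l.startswith("SEQRES")]
    # Drop any SEQRES records already in the target so they are not duplicated.
    kept = [l for l in target_text.splitlines() if not l.startswith("SEQRES")]
    insert_at = next(
        (i for i, l in enumerate(kept) if l.startswith(AFTER_SEQRES)),
        len(kept),
    )
    return "\n".join(kept[:insert_at] + seqres + kept[insert_at:]) + "\n"


# Tiny made-up stand-ins for alphafold_withseq.pdb and refined_noseq.pdb.
source = "SEQRES   1 A    3  GLY HIS MET\nATOM      1  N   GLY A   1\n"
target = "REMARK test\nATOM      1  N   HIS A   2\nEND\n"
print(copy_seqres(source, target))
```

With real files, the two inputs would be the contents of alphafold_withseq.pdb and refined_noseq.pdb, and the result would be written out as refined_withseq.pdb before the final open and save commands.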
(Tom Goddard's message above replies to Guillaume Gaullier's message of Dec 1, 2023, at 12:43 AM.)
Thanks Tom, I will try this on Monday!
Using mmtbx.prepare_pdb_deposition from Phenix seemed to work, in that the command ran and did not report any error. But opening the cif file produced by this tool in ChimeraX logged more warnings; I will report which ones exactly on Monday when I am at the lab (I don’t have these files at hand now). And displaying its sequence resulted in the same problem as I had initially (the missing residues in the atomic model are not shown at all in the sequence view). I will try to inspect the content of this cif file to check that the Phenix tool did its job (maybe I didn’t use it correctly, although it has so few options that this seems unlikely), but it is difficult because the coordinates in this file are no longer the same as in the cif file from AlphaFold-DB, so trying to diff the two files of course results in too much output and is not helpful to quickly identify differences in entity_poly_seq.
Do I understand correctly that the procedure you suggest assumes I downloaded a PDB file from AlphaFold-DB? I can do this easily. But I just wanted to clarify that what I did initially was fetch the AlphaFold prediction directly from within ChimeraX (open <uniprot-id> from alphafold), dock it in the map, trim residues not supported by density, and finally save it as mmCIF. Unfortunately without the options suggested by Greg, because at the time I didn’t know I would lose the sequence information when saving with default options. And when I noticed, I thought there would be an easy way to fix it (which prompted my first email in this thread).
This wouldn’t be a big deal if I were going to be the only person ever seeing these files, because the PDB deposition portal takes an atomic model and a full sequence and produces the correct result. But in this particular case I will share these files with other people before deposition, so I want to fix the sequence so everybody can make sense of these files unambiguously.
Thank you again, and I will report next week. Have a good weekend,
Guillaume
(Reply to Tom Goddard's message of 1 Dec 2023, at 22:27.)
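For the inspection problem Guillaume describes, where diffing whole files is swamped by coordinate changes, one option is to pull out only the entity_poly_seq rows from each file and compare those. This is a minimal sketch under stated assumptions: the table is written in loop_ form with whitespace-separated values, and the function name and example contents are made up for illustration.

```python
def read_entity_poly_seq(cif_text):
    """Return [(entity_id, num, mon_id), ...] from the entity_poly_seq loop, or [] if absent."""
    prefix = "_entity_poly_seq."
    columns, rows = [], []
    for line in cif_text.splitlines():
        s = line.strip()
        if s.startswith(prefix):
            # Header line: remember the column name (order varies between writers).
            columns.append(s.split()[0][len(prefix):])
        elif columns:
            # First line that is not a data row ends the table.
            if not s or s.startswith(("#", "_", "loop_", "data_")):
                break
            rows.append(dict(zip(columns, s.split())))
    return [(r["entity_id"], int(r["num"]), r["mon_id"]) for r in rows]


# Made-up fragments standing in for the AlphaFold-DB file and a processed file.
reference = """loop_
_entity_poly_seq.entity_id
_entity_poly_seq.hetero
_entity_poly_seq.mon_id
_entity_poly_seq.num
1 n MET 1
1 n SER 2
1 n SER 3
#
"""
processed = """loop_
_entity_poly_seq.entity_id
_entity_poly_seq.num
_entity_poly_seq.mon_id
1 2 SER
1 3 SER
#
"""
ref = read_entity_poly_seq(reference)
out = read_entity_poly_seq(processed)
print(sorted(set(ref) - set(out)))  # residues listed in reference but not in processed
```

An empty list returned for a file means no loop-form entity_poly_seq table was found, which would match the "Missing or incomplete entity_poly_seq table" warning quoted earlier in the thread.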
Hi Guillaume,
It will be useful to see what problems you found with mmtbx.prepare_pdb_deposition and the ChimeraX mmCIF reader. Thanks in advance for providing that.
My instructions about adding the sequence were for an AlphaFold prediction that you run, which produces a PDB file. But since you instead fetched the initial model from the AlphaFoldDB database, that actually gives an mmCIF file, and that mmCIF file contains the entity_poly_seq table. So it is actually much simpler to add the full sequence in that case. Just open the original AlphaFoldDB file in a text editor, search for the entity_poly_seq table and copy it to your refined mmCIF file.
Now I'm slightly puzzled because if you open the AlphaFoldDB .cif file, delete some residues, then save it as a new .cif file, then that new file does have the entity_poly_seq table with the full sequence and ChimeraX will know the full sequence when you load that new file. But maybe when you refine with ISOLDE it makes a copy of the atomic model which loses the mmCIF entity_poly_seq metadata so that if you save that then the sequence was lost.
So here's an example of how to add back the full sequence:
open P12882 from alphafold
# This file is fetched from AlphaFoldDB and the ChimeraX Log shows the file name
# "Fetching compressed AlphaFold P12882 from https://alphafold.ebi.ac.uk/files/AF-P12882-F1-model_v4.cif"
# which ChimeraX puts on your computer in ~/Downloads/ChimeraX/AlphaFold
# Now open AF-P12882-F1-model_v4.cif in a text editor, search for entity_poly_seq table and copy it into your .cif file.
# Here is what that table looks like
#
loop_
_entity_poly_seq.entity_id
_entity_poly_seq.hetero
_entity_poly_seq.mon_id
_entity_poly_seq.num
1 n MET 1
1 n SER 2
1 n SER 3
1 n ASP 4
1 n SER 5
1 n GLU 6
1 n MET 7
1 n ALA 8
1 n ILE 9
1 n PHE 10
1 n GLY 11
...
1 n GLU 1938
1 n GLU 1939
# Then opening your .cif file in ChimeraX will show the full sequence in the sequence viewer.
Tom
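The copy-and-paste step Tom describes can be sketched as a short script. This is an illustration only, with several assumptions that are not from the thread: the table is in loop_ form, the refined file has no entity_poly_seq table yet, the entity ids in both files match, and placing the table just before the atom_site loop is an acceptable position. The function names and example contents are hypothetical.

```python
def extract_loop(cif_text, category):
    """Return the lines of the loop_ table for `category` (e.g. "_entity_poly_seq."), or []."""
    lines = cif_text.splitlines()
    for start, line in enumerate(lines):
        is_loop = line.strip() == "loop_" and start + 1 < len(lines)
        if is_loop and lines[start + 1].strip().startswith(category):
            end = start + 1
            while end < len(lines) and lines[end].strip().startswith(category):
                end += 1  # column-name lines
            while end < len(lines):
                s = lines[end].strip()
                if not s or s.startswith(("#", "_", "loop_", "data_")):
                    break
                end += 1  # data rows
            return lines[start:end]
    return []


def add_table(target_text, table_lines, before_category="_atom_site."):
    """Insert table_lines before the loop for before_category, or at the end if not found."""
    lines = target_text.splitlines()
    insert_at = len(lines)
    for i, line in enumerate(lines):
        is_loop = line.strip() == "loop_" and i + 1 < len(lines)
        if is_loop and lines[i + 1].strip().startswith(before_category):
            insert_at = i
            break
    return "\n".join(lines[:insert_at] + table_lines + ["#"] + lines[insert_at:]) + "\n"


# Made-up fragments standing in for AF-P12882-F1-model_v4.cif and the refined file.
source = ("data_AF\n#\nloop_\n_entity_poly_seq.entity_id\n_entity_poly_seq.hetero\n"
          "_entity_poly_seq.mon_id\n_entity_poly_seq.num\n1 n MET 1\n1 n SER 2\n#\n"
          "loop_\n_atom_site.group_PDB\n_atom_site.id\nATOM 1\n#\n")
target = "data_refined\n#\nloop_\n_atom_site.group_PDB\n_atom_site.id\nATOM 2\n#\n"

table = extract_loop(source, "_entity_poly_seq.")
print(add_table(target, table))
```

It is worth checking the result by opening it in ChimeraX and confirming that the warning about the missing entity_poly_seq table no longer appears.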
On Dec 1, 2023, at 1:56 PM, Guillaume Gaullier <guillaume.gaullier@kemi.uu.se> wrote:
Thanks Tom, I will try this on Monday!
Using mmtbx.prepare_pdb_deposition from phenix seemed to work, in that the command ran and did not report any error. But opening the cif file produced by this tool in ChimeraX logged more warnings; I will report which ones exactly on Monday when I am at the lab (I don’t have these files at hand now). And displaying its sequence resulted in the same problem as I had initially (the missing residues in the atomic model are not shown at all in the sequence view). I will try to inspect the content of this cif file to check that the phenix tool did its job (maybe I didn’t use it correctly, although it has so few options that this seems unlikely), but it is difficult because the coordinates in this file are no longer the same as in the cif file from AlphaFold-DB, so trying to diff the two files of course results in too much output and is not helpful to quickly identify differences in entity_poly_seq.
Do I understand it correctly that the procedure you suggest assumes I downloaded a PDB file from AlphaFold-DB? I can do this easily. But I just wanted to clarify that what I did initially was fetch the AlphaFold prediction directly from within ChimeraX (open <uniprot-id> from alphafold), dock it in the map, trim residues not supported by density, and finally save it as mmCIF. Unfortunately without the options suggested by Greg, because at the time I didn’t know I would lose the sequence information when saving with default options. And when I noticed, I thought there would be an easy way to fix it (which prompted my first email in this thread).
This wouldn’t be a big deal if I was going to be the only person ever seeing these files, because the PDB deposition portal takes an atomic model and a full sequence and produces the correct result. But in this particular case I will share these files with other people before deposition, so I want to fix the sequence so everybody can make sense of these files unambiguously.
Thank you again, and I will report next week. Have a good weekend,
Guillaume
On 1 Dec 2023, at 22:27, Tom Goddard <goddard@sonic.net <mailto:goddard@sonic.net>> wrote:
Hi Guillaume,
You can get the full sequence into your refined mmCIF file with the following cumbersome procedure. If you open your original full-length AlphaFold .pdb file and save it as .cif in ChimeraX then ChimeraX won't put the sequence info in the file, even if you haven't deleted any residues, because the original AlphaFold .pdb file does not have SEQRES records. But the ChimeraX .pdb file writer works better. If you open your original full AlphaFold .pdb model, then save it as PDB in ChimeraX it will add the SEQRES records. Then you can copy those SEQRES records (a few lines at the top of the file) using a text editor into your refined .pdb file which does not have all the residues. Since you are using mmCIF for your refined model, you would first open that and save it as .pdb format in order to add the SEQRES, then you could resave that .pdb as .cif.
Here are the ChimeraX commands, assuming fullalphafold.pdb is the original AlphaFold model, and refined.cif is your refined model which lacks the full sequence.
open refined.cif save refined_noseq.pdb close
open fullalphafold.pdb save alphafold_withseq.pdb close
# In a text editor copy the SEQRES lines from alphafold_withseq.pdb to refined_noseq.pdb to make refined_withseq.pdb # The SEQRES lines look like: # SEQRES 1 A 125 GLY HIS MET HIS ASP CYS HIS GLN VAL THR VAL SER ARG # SEQRES 2 A 125 ASP VAL THR LEU GLN ASN LYS GLU ARG HIS ASP CYS ASN # SEQRES 3 A 125 GLN VAL CYS ALA SER ILE ASP LYS GLU THR GLU ASN LYS # ...
open refined_withseq.pdb save refined_withseq.cif
I'm kind of surprised that the "bestGuess" option Greg described does not actually put the entity_poly_seq table in the mmCIF. If it did you could avoid all the monkey-business of converting to .pdb format, and just output the original AlphaFold model with bestGuess true as mmCIF and copy its entity_poly_seq table to your refined .cif file. But at least you can get the job done using the old .pdb format.
Tom
On Dec 1, 2023, at 12:43 AM, Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu <mailto:chimerax-users@cgl.ucsf.edu>> wrote:
Thank you, good to know!
I think in this case I can't get my FL sequence with ChimeraX without loosing a lot of my refinement work, because I saved my progress (without using these options...) while working in ISOLDE, re-opening the work-in-progress model every time I resumed working on it. But I will try the program in phenix.
Cheers,
Guillaume
Hi Tom,
Getting back to this much later than I wanted, sorry.
I think it is as simple as this:
To save the final model from ISOLDE I use the command:
isolde write phenixRsrInput #<model-ID> <resolution> #<map-ID> modelFileName model_ISOLDE-done.cif paramFileName model_ISOLDE-done_for-phenix.eff
And it is at this step that the sequence is lost. When I save with ChimeraX's own command and the options suggested by Greg, the full sequence that existed in the original file from AlphaFold-DB is correctly retained:
save model.cif models #<model-ID> bestGuess true computedSheets true
So, I need to discipline myself to always save a file with ChimeraX's command in addition to ISOLDE's. That is, until Tristan sees this discussion and adds this to ISOLDE. 😊
Is there a reason why the default values of the options to 'save' don't include the full sequence and secondary structure assignments? Not a big deal once one knows about it, but having the defaults drop some information seems odd.
Have a nice Christmas break.
Guillaume
Perfect timing - I was *just* getting ready to upload the final version of ISOLDE 1.7. :) Will add this now.
------------------------------ *From:* Tom Goddard <goddard@sonic.net> *Sent:* Friday, December 1, 2023 11:19:39 PM *To:* Guillaume Gaullier *Cc:* Greg Couch; ChimeraX Users Help *Subject:* Re: [chimerax-users] How to save a sequence in an mmCIF file?
Hi Guillaume,
It will be useful to see what problems you found with mmtbx.prepare_pdb_deposition and the ChimeraX mmCIF reader. Thanks in advance for providing that.
My instructions about adding the sequence were for an AlphaFold prediction that you run, which produces a PDB file. But since you instead fetched the initial model from the AlphaFoldDB database, that actually gives an mmCIF file, and that mmCIF file contains the entity_poly_seq table. So it is actually much simpler to add the full sequence in that case. Just open the original AlphaFoldDB file in a text editor, search for the entity_poly_seq table and copy it to your refined mmCIF file.
Now I'm slightly puzzled because if you open the AlphaFoldDB .cif file, delete some residues, then save it as a new .cif file, then that new file does have the entity_poly_seq table with the full sequence and ChimeraX will know the full sequence when you load that new file. But maybe when you refine with ISOLDE it makes a copy of the atomic model which loses the mmCIF entity_poly_seq metadata so that if you save that then the sequence was lost.
So here's an example of how to add back the full sequence
open P12882 from alphafold
# This file is fetched from AlphaFoldDB and the ChimeraX Log shows the file name
# "Fetching compressed AlphaFold P12882 from https://alphafold.ebi.ac.uk/files/AF-P12882-F1-model_v4.cif"
# which ChimeraX puts on your computer in ~/Downloads/ChimeraX/AlphaFold
# Now open AF-P12882-F1-model_v4.cif in a text editor, search for entity_poly_seq table and copy it into your .cif file.
# Here is what that table looks like
#
loop_
_entity_poly_seq.entity_id
_entity_poly_seq.hetero
_entity_poly_seq.mon_id
_entity_poly_seq.num
1 n MET 1
1 n SER 2
1 n SER 3
1 n ASP 4
1 n SER 5
1 n GLU 6
1 n MET 7
1 n ALA 8
1 n ILE 9
1 n PHE 10
1 n GLY 11
...
1 n GLU 1938
1 n GLU 1939
#
Then opening your .cif file in ChimeraX will show the full sequence in the sequence viewer.
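The text-editor step described above could also be scripted. The following is only a minimal sketch in plain Python, not a ChimeraX or ISOLDE feature: the function names find_table and copy_entity_poly_seq are hypothetical, the parser is line-based rather than a real mmCIF parser, and it assumes the table is written in the loop_ form shown above and that the entity_id values in the source file match the entities in the refined file.

```python
CATEGORY = "_entity_poly_seq."


def find_table(lines, category=CATEGORY):
    """Return (start, end) indices of the category's table in lines, or None."""
    for i, line in enumerate(lines):
        if line.strip().startswith(category):
            # Include the "loop_" keyword when it directly precedes the tags.
            start = i - 1 if i > 0 and lines[i - 1].strip() == "loop_" else i
            end = i
            while end < len(lines):
                s = lines[end].strip()
                other_tag = s.startswith("_") and not s.startswith(category)
                if s.startswith(("#", "loop_", "data_")) or other_tag:
                    break
                end += 1
            return start, end
    return None


def copy_entity_poly_seq(source_text, target_text):
    """Return target_text with the entity_poly_seq table taken from source_text."""
    src = source_text.splitlines()
    span = find_table(src)
    if span is None:
        raise ValueError("source has no entity_poly_seq table")
    table = src[span[0]:span[1]]

    tgt = target_text.splitlines()
    old = find_table(tgt)
    if old is not None:
        # Drop an existing (possibly incomplete) table first.
        del tgt[old[0]:old[1]]

    # Assumption: the table is placed right after the first data_ line.
    # CIF syntax does not prescribe a category order.
    for i, line in enumerate(tgt):
        if line.startswith("data_"):
            tgt[i + 1:i + 1] = ["#"] + table + ["#"]
            break
    else:
        raise ValueError("target has no data_ block")
    return "\n".join(tgt) + "\n"
```

To use it, read both files into strings (for example the AF-P12882-F1-model_v4.cif file mentioned above and your refined .cif), call copy_entity_poly_seq(source_text, refined_text), and write the returned text to a new file rather than overwriting the refined model.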
Tom
On Dec 1, 2023, at 1:56 PM, Guillaume Gaullier <guillaume.gaullier@kemi.uu.se> wrote:
Thanks Tom, I will try this on Monday!
Using mmtbx.prepare_pdb_deposition from phenix seemed to work, in that the command ran and did not report any error. But opening the cif file produced by this tool in ChimeraX logged more warnings; I will report which ones exactly on Monday when I am at the lab (I don’t have these files at hand now). And displaying its sequence resulted in the same problem as I had initially (the missing residues in the atomic model are not shown at all in the sequence view). I will try to inspect the content of this cif file to check that the phenix tool did its job (maybe I didn’t use it correctly, although it has so few options that this seems unlikely), but it is difficult because the coordinates in this file are no longer the same as in the cif file from AlphaFold-DB, so trying to diff the two files of course results in too much output and is not helpful to quickly identify differences in entity_poly_seq.
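Since a whole-file diff is dominated by coordinate changes, one way to compare only the sequence tables is to extract entity_poly_seq from each file and compare the residue lists. This is a rough sketch under stated assumptions, not part of ChimeraX or Phenix: the function names are hypothetical, the table is assumed to be in loop_ form with the standard entity_id, mon_id and num columns, and the parsing is line-based.

```python
def read_entity_poly_seq(cif_text):
    """Return [(entity_id, num, mon_id), ...] from the entity_poly_seq loop.

    Returns an empty list when the file has no such table.
    """
    tags, rows = [], []
    in_table = False
    for raw in cif_text.splitlines():
        line = raw.strip()
        if line.startswith("_entity_poly_seq."):
            # Keep only the item name, e.g. "mon_id".
            tags.append(line.split()[0].split(".", 1)[1])
            in_table = True
        elif in_table:
            if not line or line.startswith(("#", "_", "loop_", "data_")):
                break  # end of the table
            rows.append(dict(zip(tags, line.split())))
    return [(r["entity_id"], int(r["num"]), r["mon_id"]) for r in rows]


def missing_residues(reference_text, other_text):
    """Residues listed in the reference table but absent from the other file."""
    other = set(read_entity_poly_seq(other_text))
    return [entry for entry in read_entity_poly_seq(reference_text)
            if entry not in other]
```

Calling missing_residues with the AlphaFold-DB file as the reference and the file under inspection as the second argument returns an empty list when the full sequence is present, and every residue of the reference when the table is absent altogether.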
Do I understand it correctly that the procedure you suggest assumes I downloaded a PDB file from AlphaFold-DB? I can do this easily. But I just wanted to clarify that what I did initially was fetch the AlphaFold prediction directly from within ChimeraX (open <uniprot-id> from alphafold), dock it in the map, trim residues not supported by density, and finally save it as mmCIF. Unfortunately without the options suggested by Greg, because at the time I didn’t know I would lose the sequence information when saving with default options. And when I noticed, I thought there would be an easy way to fix it (which prompted my first email in this thread).
This wouldn’t be a big deal if I was going to be the only person ever seeing these files, because the PDB deposition portal takes an atomic model and a full sequence and produces the correct result. But in this particular case I will share these files with other people before deposition, so I want to fix the sequence so everybody can make sense of these files unambiguously.
Thank you again, and I will report next week. Have a good weekend,
Guillaume
On 1 Dec 2023, at 22:27, Tom Goddard <goddard@sonic.net> wrote:
Hi Guillaume,
You can get the full sequence into your refined mmCIF file with the following cumbersome procedure. If you open your original full-length AlphaFold .pdb file and save it as .cif in ChimeraX then ChimeraX won't put the sequence info in the file, even if you haven't deleted any residues, because the original AlphaFold .pdb file does not have SEQRES records. But the ChimeraX .pdb file writer works better. If you open your original full AlphaFold .pdb model, then save it as PDB in ChimeraX it will add the SEQRES records. Then you can copy those SEQRES records (a few lines at the top of the file) using a text editor into your refined .pdb file which does not have all the residues. Since you are using mmCIF for your refined model, you would first open that and save it as .pdb format in order to add the SEQRES, then you could resave that .pdb as .cif.
Here are the ChimeraX commands, assuming fullalphafold.pdb is the original AlphaFold model, and refined.cif is your refined model which lacks the full sequence.
open refined.cif
save refined_noseq.pdb
close
open fullalphafold.pdb
save alphafold_withseq.pdb
close
# In a text editor copy the SEQRES lines from alphafold_withseq.pdb to refined_noseq.pdb to make refined_withseq.pdb
# The SEQRES lines look like:
# SEQRES 1 A 125 GLY HIS MET HIS ASP CYS HIS GLN VAL THR VAL SER ARG
# SEQRES 2 A 125 ASP VAL THR LEU GLN ASN LYS GLU ARG HIS ASP CYS ASN
# SEQRES 3 A 125 GLN VAL CYS ALA SER ILE ASP LYS GLU THR GLU ASN LYS
# ...
open refined_withseq.pdb
save refined_withseq.cif
I'm kind of surprised that the "bestGuess" option Greg described does not actually put the entity_poly_seq table in the mmCIF. If it did you could avoid all the monkey-business of converting to .pdb format, and just output the original AlphaFold model with bestGuess true as mmCIF and copy its entity_poly_seq table to your refined .cif file. But at least you can get the job done using the old .pdb format.
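The manual copy of SEQRES lines between the two .pdb files can be sketched as a small Python helper. This is an illustration only: copy_seqres and AFTER_SEQRES are hypothetical names, and the sketch assumes the chain IDs in the SEQRES records match those of the refined model. It places the records before the first record type that the PDB format orders after SEQRES.

```python
# Record types that the PDB format places after SEQRES. Matching is by
# prefix, so "HET" also covers HETNAM, HETSYN and HETATM.
AFTER_SEQRES = ("MODRES", "HET", "FORMUL", "HELIX", "SHEET", "SSBOND", "LINK",
                "CISPEP", "SITE", "CRYST1", "ORIGX", "SCALE", "MTRIX",
                "MODEL", "ATOM", "END")


def copy_seqres(full_pdb_text, refined_pdb_text):
    """Return refined_pdb_text with the SEQRES records of full_pdb_text."""
    seqres = [line for line in full_pdb_text.splitlines()
              if line.startswith("SEQRES")]
    if not seqres:
        raise ValueError("no SEQRES records in the full-length file")

    # Remove any SEQRES records already present in the refined file.
    refined = [line for line in refined_pdb_text.splitlines()
               if not line.startswith("SEQRES")]

    # Insert before the first record that must follow SEQRES,
    # or at the end of the file if none is found.
    insert_at = len(refined)
    for i, line in enumerate(refined):
        if line.startswith(AFTER_SEQRES):
            insert_at = i
            break
    refined[insert_at:insert_at] = seqres
    return "\n".join(refined) + "\n"
```

With the file names used above, the inputs would be the contents of alphafold_withseq.pdb and refined_noseq.pdb, and the returned text would be written to refined_withseq.pdb before opening it in ChimeraX and saving it as .cif.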
Tom
Ha, great! I was just about to open an issue on your github repo. No need then. Thanks, this is a great Christmas present! Guillaume ________________________________ From: Tristan Croll <tcroll@altoslabs.com> Sent: Thursday, December 21, 2023 2:22:43 PM To: Guillaume Gaullier Cc: Tom Goddard; ChimeraX Users Help Subject: Re: [chimerax-users] Re: How to save a sequence in an mmCIF file? Perfect timing - I was just getting ready to upload the final version of ISOLDE 1.7. :) Will add this now. On Thu, Dec 21, 2023 at 1:14 PM Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu<mailto:chimerax-users@cgl.ucsf.edu>> wrote: Hi Tom, Getting back to this much later than I wanted, sorry. I think it is as simple as this: To save the final model from ISOLDE I use the command: isolde write phenixRsrInput #<model-ID> <resolution> #<map-ID> modelFileName model_ISOLDE-done.cif paramFileName model_ISOLDE-done_for-phenix.eff And this is at this step that the sequence is lost. When I save with ChimeraX's command and adequate options suggested by Greg, the full sequence that existed in the original file from AlphaFold-DB is correctly retained: save model.cif models #<model-ID> bestGuess true computedSheets true So, I need to discipline myself to always save a file with ChimeraX's command in addition to ISOLDE's. That is, until Tristan sees this discussion and adds this to ISOLDE. 😊 Is there a reason why the default values to the options to 'save' don't include the full sequence and secondary structure assignments? Not a big deal once one knows about it, but having the defaults drop some information seems odd. Have a nice Christmas break. Guillaume ________________________________ From: Tom Goddard <goddard@sonic.net<mailto:goddard@sonic.net>> Sent: Friday, December 1, 2023 11:19:39 PM To: Guillaume Gaullier Cc: Greg Couch; ChimeraX Users Help Subject: Re: [chimerax-users] How to save a sequence in an mmCIF file? 
Hi Guillaume, It will be useful to see what problems you found with mmtbx.prepare_pdb_deposition and the ChimeraX mmCIF reader. Thanks in advance for providing that. My instructions about adding the sequence were for an AlphaFold prediction that you run, which produces a PDB file. But since you instead fetched the initial model from the AlphaFoldDB database, that actually gives an mmCIF file, and that mmCIF file contains the entity_poly_seq table. So it is actually much simpler to add the full sequence in that case. Just open the original AlphaFoldDB file in a text editor, search for the entity_poly_seq table and copy it to your refined mmCIF file. Now I'm slightly puzzled because if you open the AlphaFoldDB .cif file, delete some residues, then save it as a new .cif file, then that new file does have the entity_poly_seq table with the full sequence and ChimeraX will know the full sequence when you load that new file. But maybe when you refine with ISOLDE it makes a copy of the atomic model which loses the mmCIF entity_poly_seq metadata so that if you save that then the sequence was lost. So here's an example of how to add back the full sequence open P12882 from alphafold # This file is fetched from AlphaFoldDB and the ChimeraX Log shows the file name # "Fetching compressed AlphaFold P12882 from https://alphafold.ebi.ac.uk/files/AF-P12882-F1-model_v4.cif" # which ChimeraX puts on your computer in ~/Downloads/ChimeraX/AlphaFold # Now open AF-P12882-F1-model_v4.cif in a text editor, search for entity_poly_seq table and copy it into your .cif file. # Here is what that table looks like # loop_ _entity_poly_seq.entity_id _entity_poly_seq.hetero _entity_poly_seq.mon_id _entity_poly_seq.num 1 n MET 1 1 n SER 2 1 n SER 3 1 n ASP 4 1 n SER 5 1 n GLU 6 1 n MET 7 1 n ALA 8 1 n ILE 9 1 n PHE 10 1 n GLY 11 ... 1 n GLU 1938 1 n GLU 1939 # Then opening your .cif file in ChimeraX will show the full sequence in the sequence viewer. 
Tom

On Dec 1, 2023, at 1:56 PM, Guillaume Gaullier <guillaume.gaullier@kemi.uu.se> wrote:

Thanks Tom, I will try this on Monday!

Using mmtbx.prepare_pdb_deposition from phenix seemed to work, in that the command ran and did not report any error. But opening the cif file produced by this tool in ChimeraX logged more warnings; I will report which ones exactly on Monday when I am at the lab (I don’t have these files at hand now). And displaying its sequence resulted in the same problem as I had initially (the missing residues in the atomic model are not shown at all in the sequence view). I will try to inspect the content of this cif file to check that the phenix tool did its job (maybe I didn’t use it correctly, although it has so few options that this seems unlikely), but it is difficult because the coordinates in this file are no longer the same as in the cif file from AlphaFold-DB, so trying to diff the two files of course results in too much output and is not helpful to quickly identify differences in entity_poly_seq.

Do I understand it correctly that the procedure you suggest assumes I downloaded a PDB file from AlphaFold-DB? I can do this easily. But I just wanted to clarify that what I did initially was fetch the AlphaFold prediction directly from within ChimeraX (open <uniprot-id> from alphafold), dock it in the map, trim residues not supported by density, and finally save it as mmCIF. Unfortunately without the options suggested by Greg, because at the time I didn’t know I would lose the sequence information when saving with default options. And when I noticed, I thought there would be an easy way to fix it (which prompted my first email in this thread).

This wouldn’t be a big deal if I was going to be the only person ever seeing these files, because the PDB deposition portal takes an atomic model and a full sequence and produces the correct result.
But in this particular case I will share these files with other people before deposition, so I want to fix the sequence so everybody can make sense of these files unambiguously.

Thank you again, and I will report next week.

Have a good weekend,

Guillaume

On 1 Dec 2023, at 22:27, Tom Goddard <goddard@sonic.net> wrote:

Hi Guillaume,

You can get the full sequence into your refined mmCIF file with the following cumbersome procedure.

If you open your original full-length AlphaFold .pdb file and save it as .cif in ChimeraX then ChimeraX won't put the sequence info in the file, even if you haven't deleted any residues, because the original AlphaFold .pdb file does not have SEQRES records. But the ChimeraX .pdb file writer works better. If you open your original full AlphaFold .pdb model, then save it as PDB in ChimeraX it will add the SEQRES records. Then you can copy those SEQRES records (a few lines at the top of the file) using a text editor into your refined .pdb file which does not have all the residues. Since you are using mmCIF for your refined model, you would first open that and save it as .pdb format in order to add the SEQRES, then you could resave that .pdb as .cif.

Here are the ChimeraX commands, assuming fullalphafold.pdb is the original AlphaFold model, and refined.cif is your refined model which lacks the full sequence.

open refined.cif
save refined_noseq.pdb
close
open fullalphafold.pdb
save alphafold_withseq.pdb
close

# In a text editor copy the SEQRES lines from alphafold_withseq.pdb to refined_noseq.pdb to make refined_withseq.pdb
# The SEQRES lines look like:
# SEQRES 1 A 125 GLY HIS MET HIS ASP CYS HIS GLN VAL THR VAL SER ARG
# SEQRES 2 A 125 ASP VAL THR LEU GLN ASN LYS GLU ARG HIS ASP CYS ASN
# SEQRES 3 A 125 GLN VAL CYS ALA SER ILE ASP LYS GLU THR GLU ASN LYS
# ...

open refined_withseq.pdb
save refined_withseq.cif

I'm kind of surprised that the "bestGuess" option Greg described does not actually put the entity_poly_seq table in the mmCIF. If it did you could avoid all the monkey-business of converting to .pdb format, and just output the original AlphaFold model with bestGuess true as mmCIF and copy its entity_poly_seq table to your refined .cif file. But at least you can get the job done using the old .pdb format.

Tom

On Dec 1, 2023, at 12:43 AM, Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:

Thank you, good to know!

I think in this case I can't get my full-length sequence with ChimeraX without losing a lot of my refinement work, because I saved my progress (without using these options...) while working in ISOLDE, re-opening the work-in-progress model every time I resumed working on it. But I will try the program in phenix.

Cheers,

Guillaume

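For the SEQRES route in Tom's 22:27 message quoted above, the text-editor copy between alphafold_withseq.pdb and refined_noseq.pdb could be scripted as well. This is a rough sketch in plain Python under stated assumptions: `copy_seqres` is a hypothetical helper, the example records are abbreviated placeholders rather than real PDB lines, and placing SEQRES before the first later-section record follows the usual PDB record order (whether ChimeraX requires that placement is not stated in the thread).

```python
# Record-name prefixes that normally come after SEQRES in a PDB file.
AFTER_SEQRES = ("MODRES", "HET", "FORMUL", "HELIX", "SHEET", "SSBOND", "LINK",
                "CISPEP", "SITE", "CRYST1", "ORIGX", "SCALE", "MTRIX",
                "MODEL", "ATOM")

def copy_seqres(source_pdb, target_pdb):
    """Return target_pdb text with the SEQRES records of source_pdb inserted."""
    seqres = [l for l in source_pdb.splitlines() if l.startswith("SEQRES")]
    if not seqres:
        raise ValueError("source has no SEQRES records")
    # Drop any SEQRES already in the target so the records are not duplicated.
    target = [l for l in target_pdb.splitlines() if not l.startswith("SEQRES")]
    # Insert before the first record that belongs after SEQRES, else at the end.
    insert_at = next(
        (i for i, l in enumerate(target) if l.startswith(AFTER_SEQRES)),
        len(target),
    )
    return "\n".join(target[:insert_at] + seqres + target[insert_at:]) + "\n"

# Abbreviated placeholder contents standing in for the two files.
full_model = (
    "REMARK written by ChimeraX\n"
    "SEQRES   1 A    3  GLY HIS MET\n"
    "ATOM      1  N   GLY A   1\n"
)
refined_model = (
    "REMARK refined model\n"
    "CRYST1\n"
    "ATOM      1  N   HIS A   2\n"
    "END\n"
)

with_seq = copy_seqres(full_model, refined_model)
print(with_seq)
```

The output text would then be saved as refined_withseq.pdb and opened in ChimeraX as in the last two commands of the procedure.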
Hi everyone,

I also just struggled with the demands of CIF file deposition to the PDB for a ~50 chain RNA-protein complex and the time it takes to copy-paste sequences into different web interfaces. Would it be possible to interface the ChimeraX sequence association functionality to write the associated sequence into the mmCIF file?

At the end of building large models using many cryo-EM maps, the structures have often been cut apart and pieced together many times, so my dream scenario would be to use “alphafold match #model trim false” to directly write the full-length sequence of the AlphaFold model into the mmCIF file or PDB header for the matched chain. This would be of huge value for PDB deposition!

Many thanks and best wishes,

Matthias

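Until something like this exists in ChimeraX, one stopgap for protein chains is to generate the entity_poly_seq loop from a FASTA sequence and paste it into the mmCIF file by hand. The sketch below is plain Python and only an illustration: `first_fasta_sequence` and `entity_poly_seq_loop` are made-up names, the entity id "1" is an assumption that has to match the entity id in the coordinate file, only the 20 standard amino acids are handled (so it does not cover the RNA chains), and the example sequence is just the first 11 residues of the P12882 table quoted earlier in the thread.

```python
# One-letter to three-letter codes for the 20 standard amino acids.
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}

def first_fasta_sequence(fasta_text):
    """Return the sequence of the first record in FASTA-formatted text."""
    seq, seen_header = [], False
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                break  # a second record starts here
            seen_header = True
        elif line:
            seq.append(line.upper())
    return "".join(seq)

def entity_poly_seq_loop(sequence, entity_id="1"):
    """Format a protein sequence as an mmCIF entity_poly_seq loop."""
    lines = [
        "loop_",
        "_entity_poly_seq.entity_id",
        "_entity_poly_seq.hetero",
        "_entity_poly_seq.mon_id",
        "_entity_poly_seq.num",
    ]
    for num, letter in enumerate(sequence, start=1):
        # Raises KeyError on non-standard residue letters.
        lines.append(f"{entity_id} n {THREE_LETTER[letter]} {num}")
    return "\n".join(lines) + "\n"

fasta = ">example (first 11 residues of the table shown in the thread)\nMSSDSEM\nAIFG\n"
loop_text = entity_poly_seq_loop(first_fasta_sequence(fasta))
print(loop_text)
```

For a ~50 chain complex this would need one loop row set per entity and a check that each entity id maps to the right chains, which the sketch does not attempt.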
One other thing Tristan: maybe make sure that the isolde command includes a run of dssp on all protein chains in the designated model before saving, given that the 'computedSheets true' option to 'save' doesn't do it automatically?

Thanks again! This will be very helpful.

Guillaume

On 11/29/23 02:45, Guillaume Gaullier wrote:

From: Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu>
Subject: [chimerax-users] How to save a sequence in an mmCIF file?
Date: November 29, 2023 at 2:45:31 AM PST
To: "chimerax-users@cgl.ucsf.edu" <chimerax-users@cgl.ucsf.edu>
Reply-To: Guillaume Gaullier <guillaume.gaullier@kemi.uu.se>

Hello, Starting from an AlphaFold prediction, I refined a model against a map with ISOLDE.
I trimmed the segments not supported by any density. The resulting mmCIF file that I saved now opens with this warning: Unknown polymer entity '1' near line 187 Missing or incomplete entity_poly_seq table. Inferred polymer connectivity. Displaying the sequence of this chain shows the correct numbering (with jumps in numbering according to missing segments in the structure), but the sequence of the missing structure segments is not displayed. When I open the fasta file containing the full-length sequence, the sequence gets automatically associated to the structure, and the sequence viewer annotates the segments with missing structure correctly. I would like to save this full-length sequence in my mmCIF file so the full-length sequence with annotated missing structure segments shows up next time I open this file. But when I try to save at this point, I get the following notice: Not saving entity_poly_seq for non-authoritative sequences The documentation for "save" and "sequence" didn't help. How can I make this sequence "authoritative" and save it into my mmCIF file? Thank you in advance, Guillaume När du har kontakt med oss på Uppsala universitet med e-post så innebär det att vi behandlar dina personuppgifter. För att läsa mer om hur vi gör det kan du läsa här: http://www.uu.se/om-uu/dataskydd-personuppgifter/ E-mailing Uppsala University means that we will process your personal data. For more information on how this is performed, please read here: http://www.uu.se/en/about-uu/data-protection-policy_________________________... ChimeraX-users mailing list -- chimerax-users@cgl.ucsf.edu<mailto:chimerax-users@cgl.ucsf.edu> To unsubscribe send an email to chimerax-users-leave@cgl.ucsf.edu<mailto:chimerax-users-leave@cgl.ucsf.edu> Archives: https://mail.cgl.ucsf.edu/mailman/archives/list/chimerax-users@cgl.ucsf.edu/ VARNING: Klicka inte på länkar och öppna inte bilagor om du inte känner igen avsändaren och vet att innehållet är säkert. 
CAUTION: Do not click on links or open attachments unless you recognise the sender and know the content is safe. _______________________________________________ ChimeraX-users mailing list -- chimerax-users@cgl.ucsf.edu<mailto:chimerax-users@cgl.ucsf.edu> To unsubscribe send an email to chimerax-users-leave@cgl.ucsf.edu<mailto:chimerax-users-leave@cgl.ucsf.edu> Archives: https://mail.cgl.ucsf.edu/mailman/archives/list/chimerax-users@cgl.ucsf.edu/ VARNING: Klicka inte på länkar och öppna inte bilagor om du inte känner igen avsändaren och vet att innehållet är säkert. CAUTION: Do not click on links or open attachments unless you recognise the sender and know the content is safe. VARNING: Klicka inte på länkar och öppna inte bilagor om du inte känner igen avsändaren och vet att innehållet är säkert. CAUTION: Do not click on links or open attachments unless you recognise the sender and know the content is safe. _______________________________________________ ChimeraX-users mailing list -- chimerax-users@cgl.ucsf.edu<mailto:chimerax-users@cgl.ucsf.edu> To unsubscribe send an email to chimerax-users-leave@cgl.ucsf.edu<mailto:chimerax-users-leave@cgl.ucsf.edu> Archives: https://mail.cgl.ucsf.edu/mailman/archives/list/chimerax-users@cgl.ucsf.edu/ Altos Labs UK Limited | England | Company reg 13484917 Registered address: 3rd Floor 1 Ashley Road, Altrincham, Cheshire, United Kingdom, WA14 2DT VARNING: Klicka inte på länkar och öppna inte bilagor om du inte känner igen avsändaren och vet att innehållet är säkert. CAUTION: Do not click on links or open attachments unless you recognise the sender and know the content is safe.
Hmm... this will turn out a little more complicated than I thought (will file a bug report in a minute): the ChimeraX 1.7 release (at least in Windows) is missing the `pprintpp` library, causing a `ModuleNotFoundError` when `computedSheets` is true.
On Thu, Dec 21, 2023 at 1:30 PM Guillaume Gaullier <guillaume.gaullier@kemi.uu.se> wrote:
Ha, great! I was just about to open an issue on your github repo. No need then.
Thanks, this is a great Christmas present!
Guillaume
------------------------------ *From:* Tristan Croll <tcroll@altoslabs.com> *Sent:* Thursday, December 21, 2023 2:22:43 PM *To:* Guillaume Gaullier *Cc:* Tom Goddard; ChimeraX Users Help *Subject:* Re: [chimerax-users] Re: How to save a sequence in an mmCIF file?
Perfect timing - I was *just* getting ready to upload the final version of ISOLDE 1.7. :) Will add this now.
On Thu, Dec 21, 2023 at 1:14 PM Guillaume Gaullier via ChimeraX-users < chimerax-users@cgl.ucsf.edu> wrote:
Hi Tom,
Getting back to this much later than I wanted, sorry.
I think it is as simple as this:
To save the final model from ISOLDE I use the command:
isolde write phenixRsrInput #<model-ID> <resolution> #<map-ID> modelFileName model_ISOLDE-done.cif paramFileName model_ISOLDE-done_for-phenix.eff
And it is at this step that the sequence is lost. When I save with ChimeraX's own command and the options suggested by Greg, the full sequence that existed in the original file from AlphaFold-DB is correctly retained:
save model.cif models #<model-ID> bestGuess true computedSheets true
So, I need to discipline myself to always save a file with ChimeraX's command in addition to ISOLDE's. That is, until Tristan sees this discussion and adds this to ISOLDE. 😊
Is there a reason why the default values of the options to 'save' don't include the full sequence and secondary structure assignments? Not a big deal once one knows about it, but having the defaults drop some information seems odd.
Have a nice Christmas break.
Guillaume
------------------------------ *From:* Tom Goddard <goddard@sonic.net> *Sent:* Friday, December 1, 2023 11:19:39 PM *To:* Guillaume Gaullier *Cc:* Greg Couch; ChimeraX Users Help *Subject:* Re: [chimerax-users] How to save a sequence in an mmCIF file?
Hi Guillaume,
It will be useful to see what problems you found with mmtbx.prepare_pdb_deposition and the ChimeraX mmCIF reader. Thanks in advance for providing that.
My instructions about adding the sequence were for an AlphaFold prediction that you run, which produces a PDB file. But since you instead fetched the initial model from the AlphaFoldDB database, that actually gives an mmCIF file, and that mmCIF file contains the entity_poly_seq table. So it is actually much simpler to add the full sequence in that case. Just open the original AlphaFoldDB file in a text editor, search for the entity_poly_seq table and copy it to your refined mmCIF file.
Now I'm slightly puzzled because if you open the AlphaFoldDB .cif file, delete some residues, then save it as a new .cif file, then that new file does have the entity_poly_seq table with the full sequence and ChimeraX will know the full sequence when you load that new file. But maybe when you refine with ISOLDE it makes a copy of the atomic model that loses the mmCIF entity_poly_seq metadata, so that when you save that copy the sequence is lost.
So here's an example of how to add back the full sequence
open P12882 from alphafold
# This file is fetched from AlphaFoldDB and the ChimeraX Log shows the file name
# "Fetching compressed AlphaFold P12882 from https://alphafold.ebi.ac.uk/files/AF-P12882-F1-model_v4.cif"
# which ChimeraX puts on your computer in ~/Downloads/ChimeraX/AlphaFold
# Now open AF-P12882-F1-model_v4.cif in a text editor, search for entity_poly_seq table and copy it into your .cif file.
# Here is what that table looks like
#
loop_
_entity_poly_seq.entity_id
_entity_poly_seq.hetero
_entity_poly_seq.mon_id
_entity_poly_seq.num
1 n MET 1
1 n SER 2
1 n SER 3
1 n ASP 4
1 n SER 5
1 n GLU 6
1 n MET 7
1 n ALA 8
1 n ILE 9
1 n PHE 10
1 n GLY 11
...
1 n GLU 1938
1 n GLU 1939
#
Then opening your .cif file in ChimeraX will show the full sequence in the sequence viewer.
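The copy-and-paste step described above can also be scripted. The following is only a minimal sketch in plain Python, not a ChimeraX feature: the function names are made up for illustration, the CIF snippets are toy data, and it assumes each table row sits on one line and that the entity id in the refined file matches the one in the AlphaFoldDB file. Placing the table right after the data_ line is a choice made here so that it precedes the atom records; it is not a documented requirement.

```python
def extract_loop(cif_text, category):
    """Return the lines of the loop_ block for `category`, or None if absent."""
    lines = cif_text.splitlines()
    prefix = "_" + category + "."
    for i, line in enumerate(lines):
        starts_loop = line.strip() == "loop_"
        if starts_loop and i + 1 < len(lines) and lines[i + 1].strip().startswith(prefix):
            j = i + 1
            # Walk over tag lines and data rows until the next comment,
            # foreign tag, loop or data block.
            while j < len(lines):
                s = lines[j].strip()
                if not s.startswith(prefix) and s.startswith(("#", "_", "loop_", "data_")):
                    break
                j += 1
            block = lines[i:j]
            while block and not block[-1].strip():
                block.pop()  # drop trailing blank lines
            return block
    return None


def add_entity_poly_seq(source_text, target_text):
    """Copy the entity_poly_seq loop from source_text into target_text."""
    if extract_loop(target_text, "entity_poly_seq") is not None:
        raise ValueError("target already has an entity_poly_seq loop")
    block = extract_loop(source_text, "entity_poly_seq")
    if block is None:
        raise ValueError("source has no entity_poly_seq loop")
    out, inserted = [], False
    for line in target_text.splitlines():
        out.append(line)
        if not inserted and line.startswith("data_"):
            out.extend(["#"] + block)  # put the table before the atom records
            inserted = True
    if not inserted:
        raise ValueError("target has no data_ block")
    return "\n".join(out) + "\n"


# Toy stand-ins for the AlphaFoldDB file and the refined file.
SOURCE = """data_AF-EXAMPLE
#
loop_
_entity_poly_seq.entity_id
_entity_poly_seq.hetero
_entity_poly_seq.mon_id
_entity_poly_seq.num
1 n MET 1
1 n SER 2
1 n SER 3
#
loop_
_atom_site.group_PDB
_atom_site.id
ATOM 1
#
"""

TARGET = """data_refined
#
loop_
_atom_site.group_PDB
_atom_site.id
ATOM 1
#
"""

result = add_entity_poly_seq(SOURCE, TARGET)
print(result)
```

For real files, read both with pathlib.Path(...).read_text(), write the result to a new file name, and keep the original refined file until the new one opens in ChimeraX without the entity_poly_seq warning.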
Tom
On Dec 1, 2023, at 1:56 PM, Guillaume Gaullier < guillaume.gaullier@kemi.uu.se> wrote:
Thanks Tom, I will try this on Monday!
Using mmtbx.prepare_pdb_deposition from phenix seemed to work, in that the command ran and did not report any error. But opening the cif file produced by this tool in ChimeraX logged more warnings; I will report which ones exactly on Monday when I am at the lab (I don’t have these files at hand now). And displaying its sequence resulted in the same problem as I had initially (the missing residues in the atomic model are not shown at all in the sequence view). I will try to inspect the content of this cif file to check that the phenix tool did its job (maybe I didn’t use it correctly, although it has so few options that this seems unlikely), but it is difficult because the coordinates in this file are no longer the same as in the cif file from AlphaFold-DB, so trying to diff the two files of course results in too much output and is not helpful to quickly identify differences in entity_poly_seq.
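Since a plain diff is swamped by coordinate changes, one option is to compare only the entity_poly_seq tables of the two files. The sketch below is a hypothetical helper in plain Python, not part of ChimeraX or Phenix; it assumes one table row per line and uses toy CIF snippets in place of the real files.

```python
def read_entity_poly_seq(cif_text):
    """Return [(entity_id, num, mon_id), ...] from the entity_poly_seq loop."""
    lines = [ln.strip() for ln in cif_text.splitlines()]
    prefix = "_entity_poly_seq."
    for i, line in enumerate(lines):
        if line == "loop_" and i + 1 < len(lines) and lines[i + 1].startswith(prefix):
            j = i + 1
            tags = []
            # Column names, in the order the file declares them.
            while j < len(lines) and lines[j].startswith(prefix):
                tags.append(lines[j][len(prefix):])
                j += 1
            rows = []
            while j < len(lines):
                s = lines[j]
                if s.startswith(("#", "_", "loop_", "data_")):
                    break
                if s:
                    fields = dict(zip(tags, s.split()))
                    rows.append((fields["entity_id"], int(fields["num"]), fields["mon_id"]))
                j += 1
            return rows
    return []  # no table found


def compare_sequences(reference_text, candidate_text):
    """Return (rows only in reference, rows only in candidate)."""
    ref = set(read_entity_poly_seq(reference_text))
    cand = set(read_entity_poly_seq(candidate_text))
    return sorted(ref - cand), sorted(cand - ref)


HEADER = """loop_
_entity_poly_seq.entity_id
_entity_poly_seq.hetero
_entity_poly_seq.mon_id
_entity_poly_seq.num
"""
REFERENCE = "data_ref\n#\n" + HEADER + "1 n MET 1\n1 n SER 2\n1 n SER 3\n1 n ASP 4\n#\n"
CANDIDATE = "data_new\n#\n" + HEADER + "1 n MET 1\n1 n SER 2\n#\n"

only_in_reference, only_in_candidate = compare_sequences(REFERENCE, CANDIDATE)
print("missing from candidate:", only_in_reference)
print("extra in candidate:", only_in_candidate)
```

An empty result from read_entity_poly_seq on the Phenix output would suggest the table was never written, which is a different problem from a table that is present but incomplete.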
Do I understand it correctly that the procedure you suggest assumes I downloaded a PDB file from AlphaFold-DB? I can do this easily. But I just wanted to clarify that what I did initially was fetch the AlphaFold prediction directly from within ChimeraX (open <uniprot-id> from alphafold), dock it in the map, trim residues not supported by density, and finally save it as mmCIF. Unfortunately without the options suggested by Greg, because at the time I didn’t know I would lose the sequence information when saving with default options. And when I noticed, I thought there would be an easy way to fix it (which prompted my first email in this thread).
This wouldn’t be a big deal if I was going to be the only person ever seeing these files, because the PDB deposition portal takes an atomic model and a full sequence and produces the correct result. But in this particular case I will share these files with other people before deposition, so I want to fix the sequence so everybody can make sense of these files unambiguously.
Thank you again, and I will report next week. Have a good weekend,
Guillaume
On 1 Dec 2023, at 22:27, Tom Goddard <goddard@sonic.net> wrote:
Hi Guillaume,
You can get the full sequence into your refined mmCIF file with the following cumbersome procedure. If you open your original full-length AlphaFold .pdb file and save it as .cif in ChimeraX then ChimeraX won't put the sequence info in the file, even if you haven't deleted any residues, because the original AlphaFold .pdb file does not have SEQRES records. But the ChimeraX .pdb file writer works better. If you open your original full AlphaFold .pdb model, then save it as PDB in ChimeraX it will add the SEQRES records. Then you can copy those SEQRES records (a few lines at the top of the file) using a text editor into your refined .pdb file which does not have all the residues. Since you are using mmCIF for your refined model, you would first open that and save it as .pdb format in order to add the SEQRES, then you could resave that .pdb as .cif.
Here are the ChimeraX commands, assuming fullalphafold.pdb is the original AlphaFold model, and refined.cif is your refined model which lacks the full sequence.
open refined.cif
save refined_noseq.pdb
close
open fullalphafold.pdb
save alphafold_withseq.pdb
close
# In a text editor copy the SEQRES lines from alphafold_withseq.pdb to refined_noseq.pdb to make refined_withseq.pdb
# The SEQRES lines look like:
# SEQRES 1 A 125 GLY HIS MET HIS ASP CYS HIS GLN VAL THR VAL SER ARG
# SEQRES 2 A 125 ASP VAL THR LEU GLN ASN LYS GLU ARG HIS ASP CYS ASN
# SEQRES 3 A 125 GLN VAL CYS ALA SER ILE ASP LYS GLU THR GLU ASN LYS
# ...
open refined_withseq.pdb
save refined_withseq.cif
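The text-editor step in the middle of these commands (copying the SEQRES lines) could be done with a few lines of Python instead. This is a sketch under assumptions: the function name and the list of record types are choices made here, the PDB snippets are toy data, and the chain IDs in the two files are assumed to match.

```python
# Record types that come after SEQRES in a PDB file; the SEQRES lines are
# inserted in front of the first of these that is found.
LATER_RECORDS = (
    "MODRES", "HET", "HELIX", "SHEET", "SSBOND", "LINK", "CISPEP", "SITE",
    "CRYST1", "ORIGX", "SCALE", "MTRIX", "MODEL", "ATOM",
)


def copy_seqres(full_text, refined_text):
    """Return refined_text with the SEQRES records of full_text added."""
    seqres = [ln for ln in full_text.splitlines() if ln.startswith("SEQRES")]
    if not seqres:
        raise ValueError("no SEQRES records in the full-length file")
    # Drop any stale SEQRES records from the refined file first.
    body = [ln for ln in refined_text.splitlines() if not ln.startswith("SEQRES")]
    for i, ln in enumerate(body):
        if ln.startswith(LATER_RECORDS):
            return "\n".join(body[:i] + seqres + body[i:]) + "\n"
    raise ValueError("no crystallographic or coordinate records found")


# Toy stand-ins for alphafold_withseq.pdb and refined_noseq.pdb.
FULL = (
    "SEQRES   1 A    3  MET SER SER\n"
    "ATOM      1  N   MET A   1      10.000  10.000  10.000  1.00 50.00           N\n"
    "END\n"
)
REFINED = (
    "REMARK refined model\n"
    "CRYST1    1.000    1.000    1.000  90.00  90.00  90.00 P 1           1\n"
    "ATOM      1  N   SER A   2      11.000  12.000  13.000  1.00 50.00           N\n"
    "END\n"
)

merged = copy_seqres(FULL, REFINED)
print(merged)
```

The merged text would be written out as refined_withseq.pdb before running the last two ChimeraX commands.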
I'm kind of surprised that the "bestGuess" option Greg described does not actually put the entity_poly_seq table in the mmCIF. If it did you could avoid all the monkey-business of converting to .pdb format, and just output the original AlphaFold model with bestGuess true as mmCIF and copy its entity_poly_seq table to your refined .cif file. But at least you can get the job done using the old .pdb format.
Tom
On Dec 1, 2023, at 12:43 AM, Guillaume Gaullier via ChimeraX-users < chimerax-users@cgl.ucsf.edu> wrote:
Thank you, good to know!
I think in this case I can't get my FL sequence with ChimeraX without losing a lot of my refinement work, because I saved my progress (without using these options...) while working in ISOLDE, re-opening the work-in-progress model every time I resumed working on it. But I will try the program in phenix.
Cheers,
Guillaume
------------------------------ *From:* Greg Couch <gregc@cgl.ucsf.edu> *Sent:* Friday, December 1, 2023 1:22:34 AM *To:* ChimeraX Users Help *Cc:* Guillaume Gaullier *Subject:* Re: [chimerax-users] How to save a sequence in an mmCIF file?
To answer the general question, ChimeraX doesn't yet have a way to set the full sequence for an mmCIF entity. See https://www.wwpdb.org/deposition/preparing-pdbx-mmcif-files for how to fix the mmCIF output from various refinement packages. For example, Phenix has the "mmtbx.prepare_pdb_deposition" program to create an mmCIF file with the sequence. In this particular case, where the starting structure is an AlphaFold prediction with atoms for every residue in the full sequence, you can get the correct sequence into the mmCIF output with the "bestGuess" option. See https://www.cgl.ucsf.edu/chimerax/docs/user/commands/save.html#mmcif. I'd also recommend, in your case, using the computedSheets option. In older ChimeraX versions, you need to run dssp before using computedSheets to get the helix information. In recent ChimeraX versions (daily build and 1.7 release candidate), computedSheets will also output the helix information if it wasn't present in the input. Adding a sequence with bestGuess can be deceiving because of missing leading or trailing residues, or gaps of indeterminate length. But in this case, you should be fine.
-- Greg
On 11/29/23 02:45, Guillaume Gaullier wrote:
*From: *Guillaume Gaullier via ChimeraX-users < chimerax-users@cgl.ucsf.edu> *Subject: **[chimerax-users] How to save a sequence in an mmCIF file?* *Date: *November 29, 2023 at 2:45:31 AM PST *To: *"chimerax-users@cgl.ucsf.edu" <chimerax-users@cgl.ucsf.edu> *Reply-To: *Guillaume Gaullier <guillaume.gaullier@kemi.uu.se>
Hello,
Starting from an AlphaFold prediction, I refined a model against a map with ISOLDE. I trimmed the segments not supported by any density. The resulting mmCIF file that I saved now opens with this warning:
Unknown polymer entity '1' near line 187
Missing or incomplete entity_poly_seq table. Inferred polymer connectivity.
Displaying the sequence of this chain shows the correct numbering (with jumps in numbering according to missing segments in the structure), but the sequence of the missing structure segments is not displayed.
When I open the fasta file containing the full-length sequence, the sequence gets automatically associated with the structure, and the sequence viewer correctly annotates the segments with missing structure. I would like to save this full-length sequence in my mmCIF file so the full-length sequence with annotated missing structure segments shows up next time I open this file. But when I try to save at this point, I get the following notice:
Not saving entity_poly_seq for non-authoritative sequences
The documentation for "save" and "sequence" didn't help. How can I make this sequence "authoritative" and save it into my mmCIF file?
Thank you in advance,
Guillaume
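When only the FASTA sequence is at hand, an entity_poly_seq table in the same layout as the one Tom shows can be generated from the one-letter sequence and pasted into the mmCIF file by hand. This is a sketch of a manual workaround, not something ChimeraX does: the function name is hypothetical, only the 20 standard amino acids are handled, and the entity id "1" is an assumption that has to match the entity in the file.

```python
# One-letter to three-letter codes for the 20 standard amino acids.
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def entity_poly_seq_loop(sequence, entity_id="1"):
    """Build the text of an entity_poly_seq loop from a one-letter sequence."""
    rows = [
        "loop_",
        "_entity_poly_seq.entity_id",
        "_entity_poly_seq.hetero",
        "_entity_poly_seq.mon_id",
        "_entity_poly_seq.num",
    ]
    for num, letter in enumerate(sequence.upper(), start=1):
        if letter not in THREE_LETTER:
            raise ValueError(f"unsupported residue letter {letter!r} at position {num}")
        # 'n' in the hetero column means no sequence heterogeneity here.
        rows.append(f"{entity_id} n {THREE_LETTER[letter]} {num}")
    return "\n".join(rows)


# First 11 residues of the P12882 example quoted in this thread.
loop_text = entity_poly_seq_loop("MSSDSEMAIFG")
print(loop_text)
```

The residue numbers written this way start at 1 and must agree with the numbering of the residues present in the model, so a sequence with a different numbering offset would need adjusting first.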
Totally agree with you Matthias, a feature like this would be so helpful to prepare final models! Cheers, Guillaume ________________________________ From: Vorländer,Matthias Kopano via ChimeraX-users <chimerax-users@cgl.ucsf.edu> Sent: Thursday, December 21, 2023 14:33 To: ChimeraX Users Help Subject: [chimerax-users] Re: How to save a sequence in an mmCIF file? Hi everyone, I also just struggled with the demands of CIF file deposition to the PDB for a ~50 chain RNA-protein complex and the time it takes to copy-paste sequence into different web interfaces. Would it be possible to interface the ChimeraX sequence association functionalities to write the associated sequence into the mmCIF file? At the end of building large models using many cryo-EM maps the structures often have been cut apart and pieced together many times, so my dream scenario would be to use “alpafold match #model trim false“ to directly write the full-length sequence of the alphafold model into the mmcif file or PDB header for the matches chain. Would be of huge value for PDB deposition! Many thanks and best wishes, Matthias From: Tristan Croll via ChimeraX-users <chimerax-users@cgl.ucsf.edu> Date: Thursday, 21. December 2023 at 14:23 To: Guillaume Gaullier <guillaume.gaullier@kemi.uu.se> Cc: ChimeraX Users Help <chimerax-users@cgl.ucsf.edu> Subject: [chimerax-users] Re: How to save a sequence in an mmCIF file? Perfect timing - I was just getting ready to upload the final version of ISOLDE 1.7. :) Will add this now. On Thu, Dec 21, 2023 at 1:14 PM Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu<mailto:chimerax-users@cgl.ucsf.edu>> wrote: Hi Tom, Getting back to this much later than I wanted, sorry. I think it is as simple as this: To save the final model from ISOLDE I use the command: isolde write phenixRsrInput #<model-ID> <resolution> #<map-ID> modelFileName model_ISOLDE-done.cif paramFileName model_ISOLDE-done_for-phenix.eff And this is at this step that the sequence is lost. 
When I save with ChimeraX's command and adequate options suggested by Greg, the full sequence that existed in the original file from AlphaFold-DB is correctly retained: save model.cif models #<model-ID> bestGuess true computedSheets true So, I need to discipline myself to always save a file with ChimeraX's command in addition to ISOLDE's. That is, until Tristan sees this discussion and adds this to ISOLDE. 😊 Is there a reason why the default values to the options to 'save' don't include the full sequence and secondary structure assignments? Not a big deal once one knows about it, but having the defaults drop some information seems odd. Have a nice Christmas break. Guillaume ________________________________ From: Tom Goddard <goddard@sonic.net<mailto:goddard@sonic.net>> Sent: Friday, December 1, 2023 11:19:39 PM To: Guillaume Gaullier Cc: Greg Couch; ChimeraX Users Help Subject: Re: [chimerax-users] How to save a sequence in an mmCIF file? Hi Guillaume, It will be useful to see what problems you found with mmtbx.prepare_pdb_deposition and the ChimeraX mmCIF reader. Thanks in advance for providing that. My instructions about adding the sequence were for an AlphaFold prediction that you run, which produces a PDB file. But since you instead fetched the initial model from the AlphaFoldDB database, that actually gives an mmCIF file, and that mmCIF file contains the entity_poly_seq table. So it is actually much simpler to add the full sequence in that case. Just open the original AlphaFoldDB file in a text editor, search for the entity_poly_seq table and copy it to your refined mmCIF file. Now I'm slightly puzzled because if you open the AlphaFoldDB .cif file, delete some residues, then save it as a new .cif file, then that new file does have the entity_poly_seq table with the full sequence and ChimeraX will know the full sequence when you load that new file. 
But maybe when you refine with ISOLDE it makes a copy of the atomic model which loses the mmCIF entity_poly_seq metadata so that if you save that then the sequence was lost. So here's an example of how to add back the full sequence open P12882 from alphafold # This file is fetched from AlphaFoldDB and the ChimeraX Log shows the file name # "Fetching compressed AlphaFold P12882 from https://alphafold.ebi.ac.uk/files/AF-P12882-F1-model_v4.cif" # which ChimeraX puts on your computer in ~/Downloads/ChimeraX/AlphaFold # Now open AF-P12882-F1-model_v4.cif in a text editor, search for entity_poly_seq table and copy it into your .cif file. # Here is what that table looks like # loop_ _entity_poly_seq.entity_id _entity_poly_seq.hetero _entity_poly_seq.mon_id _entity_poly_seq.num 1 n MET 1 1 n SER 2 1 n SER 3 1 n ASP 4 1 n SER 5 1 n GLU 6 1 n MET 7 1 n ALA 8 1 n ILE 9 1 n PHE 10 1 n GLY 11 ... 1 n GLU 1938 1 n GLU 1939 # Then opening your .cif file in ChimeraX will show the full sequence in the sequence viewer. Tom On Dec 1, 2023, at 1:56 PM, Guillaume Gaullier <guillaume.gaullier@kemi.uu.se<mailto:guillaume.gaullier@kemi.uu.se>> wrote: Thanks Tom, I will try this on Monday! Using mmtbx.prepare_pdb_deposition from phenix seemed to work, in that the command ran and did not report any error. But opening the cif file produced by this tool in ChimeraX logged more warnings; I will report which ones exactly on Monday when I am at the lab (I don’t have these files at hand now). And displaying its sequence resulted in the same problem as I had initially (the missing residues in the atomic model are not shown at all in the sequence view). 
I will try to inspect the content of this cif file to check that the phenix tool did its job (maybe I didn’t use it correctly, although it has so few options that this seems unlikely), but it is difficult because the coordinates in this file are no longer the same as in the cif file from AlphaFold-DB, so trying to diff the two files of course results in too much output and is not helpful to quickly identify differences in entity_poly_seq. Do I understand it correctly that the procedure you suggest assumes I downloaded a PDB file from AlphaFold-DB? I can do this easily. But I just wanted to clarify that what I did initially was fetch the AlphaFold prediction directly from within ChimeraX (open <uniprot-id> from alphafold), dock it in the map, trim residues not supported by density, and finally save it as mmCIF. Unfortunately without the options suggested by Greg, because at the time I didn’t know I would lose the sequence information when saving with default options. And when I noticed, I thought there would be an easy way to fix it (which prompted my first email in this thread). This wouldn’t be a big deal if I was going to be the only person ever seeing these files, because the PDB deposition portal takes an atomic model and a full sequence and produces the correct result. But in this particular case I will share these files with other people before deposition, so I want to fix the sequence so everybody can make sense of these files unambiguously. Thank you again, and I will report next week. Have a good weekend, Guillaume On 1 Dec 2023, at 22:27, Tom Goddard <goddard@sonic.net<mailto:goddard@sonic.net>> wrote: Hi Guillaume, You can get the full sequence into your refined mmCIF file with the following cumbersome procedure. 
If you open your original full-length AlphaFold .pdb file and save it as .cif in ChimeraX then ChimeraX won't put the sequence info in the file, even if you haven't deleted any residues, because the original AlphaFold .pdb file does not have SEQRES records. But the ChimeraX .pdb file writer works better. If you open your original full AlphaFold .pdb model, then save it as PDB in ChimeraX it will add the SEQRES records. Then you can copy those SEQRES records (a few lines at the top of the file) using a text editor into your refined .pdb file which does not have all the residues. Since you are using mmCIF for your refined model, you would first open that and save it as .pdb format in order to add the SEQRES, then you could resave that .pdb as .cif. Here are the ChimeraX commands, assuming fullalphafold.pdb is the original AlphaFold model, and refined.cif is your refined model which lacks the full sequence. open refined.cif save refined_noseq.pdb close open fullalphafold.pdb save alphafold_withseq.pdb close # In a text editor copy the SEQRES lines from alphafold_withseq.pdb to refined_noseq.pdb to make refined_withseq.pdb # The SEQRES lines look like: # SEQRES 1 A 125 GLY HIS MET HIS ASP CYS HIS GLN VAL THR VAL SER ARG # SEQRES 2 A 125 ASP VAL THR LEU GLN ASN LYS GLU ARG HIS ASP CYS ASN # SEQRES 3 A 125 GLN VAL CYS ALA SER ILE ASP LYS GLU THR GLU ASN LYS # ... open refined_withseq.pdb save refined_withseq.cif I'm kind of surprised that the "bestGuess" option Greg described does not actually put the entity_poly_seq table in the mmCIF. If it did you could avoid all the monkey-business of converting to .pdb format, and just output the original AlphaFold model with bestGuess true as mmCIF and copy its entity_poly_seq table to your refined .cif file. But at least you can get the job done using the old .pdb format. 
Tom

On Dec 1, 2023, at 12:43 AM, Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:

Thank you, good to know!

I think in this case I can't get my FL sequence with ChimeraX without losing a lot of my refinement work, because I saved my progress (without using these options...) while working in ISOLDE, re-opening the work-in-progress model every time I resumed working on it. But I will try the program in phenix.

Cheers,

Guillaume
Hi everyone,

Since the initial question there has been a bugfix (Dec 2 I believe) in the 1.7 and 1.8 daily builds, so that using the ChimeraX "save" command to save an mmCIF file with "bestGuess true" should now write biopolymer sequences into the file as entity_poly_seq as well as entity_poly. However, it does not involve association, so it still only includes the residues that actually have coordinates.

Best,
Elaine
-----
Elaine C. Meng, Ph.D.
UCSF Chimera(X) team
Department of Pharmaceutical Chemistry
University of California, San Francisco
On Dec 21, 2023, at 5:33 AM, Vorländer,Matthias Kopano via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
Hi everyone,
I also just struggled with the demands of CIF file deposition to the PDB for a ~50 chain RNA-protein complex and the time it takes to copy-paste sequences into different web interfaces. Would it be possible to interface the ChimeraX sequence association functionalities to write the associated sequence into the mmCIF file? At the end of building large models using many cryo-EM maps, the structures often have been cut apart and pieced together many times, so my dream scenario would be to use "alphafold match #model trim false" to directly write the full-length sequence of the AlphaFold model into the mmCIF file or PDB header for the matched chain. This would be of huge value for PDB deposition!
Many thanks and best wishes, Matthias
From: Tristan Croll via ChimeraX-users <chimerax-users@cgl.ucsf.edu>
Date: Thursday, 21. December 2023 at 14:23
To: Guillaume Gaullier <guillaume.gaullier@kemi.uu.se>
Cc: ChimeraX Users Help <chimerax-users@cgl.ucsf.edu>
Subject: [chimerax-users] Re: How to save a sequence in an mmCIF file?
Perfect timing - I was just getting ready to upload the final version of ISOLDE 1.7. :) Will add this now.
On Thu, Dec 21, 2023 at 1:14 PM Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:

Hi Tom,
Getting back to this much later than I wanted, sorry.
I think it is as simple as this:
To save the final model from ISOLDE I use the command:
isolde write phenixRsrInput #<model-ID> <resolution> #<map-ID> modelFileName model_ISOLDE-done.cif paramFileName model_ISOLDE-done_for-phenix.eff
It is at this step that the sequence is lost. When I save with ChimeraX's own command and the options suggested by Greg, the full sequence that existed in the original file from AlphaFold-DB is correctly retained:
save model.cif models #<model-ID> bestGuess true computedSheets true
So, I need to discipline myself to always save a file with ChimeraX's command in addition to ISOLDE's. That is, until Tristan sees this discussion and adds this to ISOLDE. 😊
Is there a reason why the default values of the 'save' options don't include the full sequence and secondary structure assignments? Not a big deal once one knows about it, but having the defaults drop some information seems odd.
Have a nice Christmas break.
Guillaume
From: Tom Goddard <goddard@sonic.net>
Sent: Friday, December 1, 2023 11:19:39 PM
To: Guillaume Gaullier
Cc: Greg Couch; ChimeraX Users Help
Subject: Re: [chimerax-users] How to save a sequence in an mmCIF file?
Hi Guillaume,
It will be useful to see what problems you found with mmtbx.prepare_pdb_deposition and the ChimeraX mmCIF reader. Thanks in advance for providing that.
My instructions about adding the sequence were for an AlphaFold prediction that you run, which produces a PDB file. But since you instead fetched the initial model from the AlphaFoldDB database, that actually gives an mmCIF file, and that mmCIF file contains the entity_poly_seq table. So it is actually much simpler to add the full sequence in that case. Just open the original AlphaFoldDB file in a text editor, search for the entity_poly_seq table and copy it to your refined mmCIF file.
Now I'm slightly puzzled because if you open the AlphaFoldDB .cif file, delete some residues, then save it as a new .cif file, then that new file does have the entity_poly_seq table with the full sequence and ChimeraX will know the full sequence when you load that new file. But maybe when you refine with ISOLDE it makes a copy of the atomic model which loses the mmCIF entity_poly_seq metadata so that if you save that then the sequence was lost.
So here's an example of how to add back the full sequence:

open P12882 from alphafold

# This file is fetched from AlphaFoldDB and the ChimeraX Log shows the file name
# "Fetching compressed AlphaFold P12882 from https://alphafold.ebi.ac.uk/files/AF-P12882-F1-model_v4.cif"
# which ChimeraX puts on your computer in ~/Downloads/ChimeraX/AlphaFold

# Now open AF-P12882-F1-model_v4.cif in a text editor, search for entity_poly_seq table and copy it into your .cif file.
# Here is what that table looks like

#
loop_
_entity_poly_seq.entity_id
_entity_poly_seq.hetero
_entity_poly_seq.mon_id
_entity_poly_seq.num
1 n MET 1
1 n SER 2
1 n SER 3
1 n ASP 4
1 n SER 5
1 n GLU 6
1 n MET 7
1 n ALA 8
1 n ILE 9
1 n PHE 10
1 n GLY 11
...
1 n GLU 1938
1 n GLU 1939
#
Then opening your .cif file in ChimeraX will show the full sequence in the sequence viewer.
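The copy-and-paste of the entity_poly_seq table described above can also be done with a short script. This is a minimal Python sketch under stated assumptions: it treats the mmCIF files as plain text, the function names find_loop and copy_entity_poly_seq are hypothetical, the table is assumed to be a simple loop_ with one row per line (as in the example above), and the entity IDs in the source table are assumed to match those in the refined file. The table is placed just before the _atom_site loop of the target, or at the end if there is none.

```python
# Hypothetical helpers (not part of ChimeraX) that copy the
# _entity_poly_seq loop from one mmCIF text into another.

def find_loop(lines, category):
    """Return (start, end) indices of the loop_ for a category, or None."""
    for i in range(len(lines) - 1):
        if lines[i].strip() == "loop_" and lines[i + 1].strip().startswith(category):
            j = i + 1
            while j < len(lines):
                s = lines[j].strip()
                # Stop at a separator, a new loop/data block, or another category.
                if s == "" or s == "#" or s.startswith(("loop_", "data_")):
                    break
                if s.startswith("_") and not s.startswith(category):
                    break
                j += 1
            return i, j
    return None


def copy_entity_poly_seq(source_cif: str, target_cif: str) -> str:
    """Return target mmCIF text with the source's entity_poly_seq table added."""
    category = "_entity_poly_seq."
    src = source_cif.splitlines()
    span = find_loop(src, category)
    if span is None:
        raise ValueError("source has no entity_poly_seq loop")
    table = src[span[0]:span[1]]

    tgt = target_cif.splitlines()
    if find_loop(tgt, category) is not None:
        raise ValueError("target already has an entity_poly_seq loop")

    # Put the sequence table ahead of the coordinates when possible.
    atom_site = find_loop(tgt, "_atom_site.")
    insert_at = atom_site[0] if atom_site is not None else len(tgt)
    return "\n".join(tgt[:insert_at] + table + ["#"] + tgt[insert_at:]) + "\n"
```

The sketch refuses to overwrite an existing table rather than guessing which one is correct.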
Tom
On Dec 1, 2023, at 1:56 PM, Guillaume Gaullier <guillaume.gaullier@kemi.uu.se> wrote:
Thanks Tom, I will try this on Monday!
Using mmtbx.prepare_pdb_deposition from phenix seemed to work, in that the command ran and did not report any error. But opening the cif file produced by this tool in ChimeraX logged more warnings; I will report which ones exactly on Monday when I am at the lab (I don’t have these files at hand now). And displaying its sequence resulted in the same problem as I had initially (the missing residues in the atomic model are not shown at all in the sequence view). I will try to inspect the content of this cif file to check that the phenix tool did its job (maybe I didn’t use it correctly, although it has so few options that this seems unlikely), but it is difficult because the coordinates in this file are no longer the same as in the cif file from AlphaFold-DB, so trying to diff the two files of course results in too much output and is not helpful to quickly identify differences in entity_poly_seq.
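Because a plain diff of two mmCIF files is dominated by coordinate changes, one alternative is to compare only the entity_poly_seq tables. The following is a minimal Python sketch, assuming a simple loop_ table with one whitespace-separated row per line; the function names read_entity_poly_seq and differing_entities are hypothetical and this is not how ChimeraX or Phenix read the files.

```python
# Hypothetical helpers: compare only the entity_poly_seq content of two
# mmCIF texts, ignoring coordinates and column order.

def read_entity_poly_seq(cif_text: str) -> dict:
    """Map entity_id -> list of (num, mon_id); empty dict if no table."""
    category = "_entity_poly_seq."
    lines = [ln.strip() for ln in cif_text.splitlines()]
    sequences = {}
    for i in range(len(lines) - 1):
        if lines[i] == "loop_" and lines[i + 1].startswith(category):
            j = i + 1
            tags = []
            # Collect column names in the order the file declares them.
            while j < len(lines) and lines[j].startswith(category):
                tags.append(lines[j][len(category):])
                j += 1
            col = {tag: k for k, tag in enumerate(tags)}
            # Read data rows until the table ends.
            while j < len(lines):
                row = lines[j]
                if not row or row.startswith(("#", "_", "loop_", "data_")):
                    break
                fields = row.split()
                entity = fields[col["entity_id"]]
                residue = (int(fields[col["num"]]), fields[col["mon_id"]])
                sequences.setdefault(entity, []).append(residue)
                j += 1
            break
    return sequences


def differing_entities(cif_a: str, cif_b: str) -> list:
    """Return entity IDs whose sequences differ or exist in only one file."""
    a = read_entity_poly_seq(cif_a)
    b = read_entity_poly_seq(cif_b)
    return sorted(e for e in set(a) | set(b) if a.get(e) != b.get(e))
```

An empty result from read_entity_poly_seq means the table is absent, which is the situation the "Missing or incomplete entity_poly_seq table" warning describes.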
Do I understand it correctly that the procedure you suggest assumes I downloaded a PDB file from AlphaFold-DB? I can do this easily. But I just wanted to clarify that what I did initially was fetch the AlphaFold prediction directly from within ChimeraX (open <uniprot-id> from alphafold), dock it in the map, trim residues not supported by density, and finally save it as mmCIF. Unfortunately without the options suggested by Greg, because at the time I didn’t know I would lose the sequence information when saving with default options. And when I noticed, I thought there would be an easy way to fix it (which prompted my first email in this thread).
This wouldn’t be a big deal if I was going to be the only person ever seeing these files, because the PDB deposition portal takes an atomic model and a full sequence and produces the correct result. But in this particular case I will share these files with other people before deposition, so I want to fix the sequence so everybody can make sense of these files unambiguously.
Thank you again, and I will report next week. Have a good weekend,
Guillaume
Ack. pprintpp is for debugging code that should have been removed. It will be fixed in tomorrow's daily build and in 1.7.1, if we do a 1.7.1 release. To work around the problem in the current 1.7, give the command "pip install pprintpp" on the ChimeraX command line. You will get the debugging output in the log, but it will work.

-- Greg

On 12/21/23 05:38, Tristan Croll via ChimeraX-users wrote:
Hmm... this will turn out a little more complicated than I thought (will file a bug report in a minute): the ChimeraX 1.7 release (at least in Windows) is missing the `pprintpp` library, causing a `ModuleNotFoundError` when `computedSheets` is true.
On Thu, Dec 21, 2023 at 1:30 PM Guillaume Gaullier <guillaume.gaullier@kemi.uu.se> wrote:
Ha, great! I was just about to open an issue on your github repo. No need then.
Thanks, this is a great Christmas present!
Guillaume
------------------------------------------------------------------------ *From:* Tristan Croll <tcroll@altoslabs.com> *Sent:* Thursday, December 21, 2023 2:22:43 PM *To:* Guillaume Gaullier *Cc:* Tom Goddard; ChimeraX Users Help *Subject:* Re: [chimerax-users] Re: How to save a sequence in an mmCIF file? Perfect timing - I was /just/ getting ready to upload the final version of ISOLDE 1.7. :) Will add this now.
On Thu, Dec 21, 2023 at 1:14 PM Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
Hi Tom,
Getting back to this much later than I wanted, sorry.
I think it is as simple as this:
To save the final model from ISOLDE I use the command:
isolde write phenixRsrInput #<model-ID> <resolution> #<map-ID> modelFileName model_ISOLDE-done.cif paramFileName model_ISOLDE-done_for-phenix.eff
And this is at this step that the sequence is lost. When I save with ChimeraX's command and adequate options suggested by Greg, the full sequence that existed in the original file from AlphaFold-DB is correctly retained:
save model.cif models #<model-ID> bestGuess true computedSheets true
So, I need to discipline myself to always save a file with ChimeraX's command in addition to ISOLDE's. That is, until Tristan sees this discussion and adds this to ISOLDE. 😊
Is there a reason why the default values to the options to 'save' don't include the full sequence and secondary structure assignments? Not a big deal once one knows about it, but having the defaults drop some information seems odd.
Have a nice Christmas break.
Guillaume
------------------------------------------------------------------------ *From:* Tom Goddard <goddard@sonic.net> *Sent:* Friday, December 1, 2023 11:19:39 PM *To:* Guillaume Gaullier *Cc:* Greg Couch; ChimeraX Users Help *Subject:* Re: [chimerax-users] How to save a sequence in an mmCIF file? Hi Guillaume,
It will be useful to see what problems you found with mmtbx.prepare_pdb_deposition and the ChimeraX mmCIF reader. Thanks in advance for providing that.
My instructions about adding the sequence were for an AlphaFold prediction that you run, which produces a PDB file. But since you instead fetched the initial model from the AlphaFoldDB database, that actually gives an mmCIF file, and that mmCIF file contains the entity_poly_seq table. So it is actually much simpler to add the full sequence in that case. Just open the original AlphaFoldDB file in a text editor, search for the entity_poly_seq table and copy it to your refined mmCIF file.
Now I'm slightly puzzled because if you open the AlphaFoldDB .cif file, delete some residues, then save it as a new .cif file, then that new file does have the entity_poly_seq table with the full sequence and ChimeraX will know the full sequence when you load that new file. But maybe when you refine with ISOLDE it makes a copy of the atomic model which loses the mmCIF entity_poly_seq metadata so that if you save that then the sequence was lost.
So here's an example of how to add back the full sequence
open P12882 from alphafold
# This file is fetched from AlphaFoldDB and the ChimeraX Log shows the file name # "Fetching compressed AlphaFold P12882 from https://alphafold.ebi.ac.uk/files/AF-P12882-F1-model_v4.cif" # which ChimeraX puts on your computer in ~/Downloads/ChimeraX/AlphaFold
# Now open AF-P12882-F1-model_v4.cif in a text editor, search for entity_poly_seq table and copy it into your .cif file. # Here is what that table looks like
# loop_ _entity_poly_seq.entity_id _entity_poly_seq.hetero _entity_poly_seq.mon_id _entity_poly_seq.num 1 n MET 1 1 n SER 2 1 n SER 3 1 n ASP 4 1 n SER 5 1 n GLU 6 1 n MET 7 1 n ALA 8 1 n ILE 9 1 n PHE 10 1 n GLY 11 ... 1 n GLU 1938 1 n GLU 1939 #
Then opening your .cif file in ChimeraX will show the full sequence in the sequence viewer.
Tom
On Dec 1, 2023, at 1:56 PM, Guillaume Gaullier <guillaume.gaullier@kemi.uu.se> wrote:
Thanks Tom, I will try this on Monday!
Using mmtbx.prepare_pdb_deposition from phenix seemed to work, in that the command ran and did not report any error. But opening the cif file produced by this tool in ChimeraX logged more warnings; I will report which ones exactly on Monday when I am at the lab (I don’t have these files at hand now). And displaying its sequence resulted in the same problem as I had initially (the missing residues in the atomic model are not shown at all in the sequence view). I will try to inspect the content of this cif file to check that the phenix tool did its job (maybe I didn’t use it correctly, although it has so few options that this seems unlikely), but it is difficult because the coordinates in this file are no longer the same as in the cif file from AlphaFold-DB, so trying to diff the two files of course results in too much output and is not helpful to quickly identify differences in entity_poly_seq.
Do I understand it correctly that the procedure you suggest assumes I downloaded a PDB file from AlphaFold-DB? I can do this easily. But I just wanted to clarify that what I did initially was fetch the AlphaFold prediction directly from within ChimeraX (open <uniprot-id> from alphafold), dock it in the map, trim residues not supported by density, and finally save it as mmCIF. Unfortunately without the options suggested by Greg, because at the time I didn’t know I would lose the sequence information when saving with default options. And when I noticed, I thought there would be an easy way to fix it (which prompted my first email in this thread).
This wouldn’t be a big deal if I was going to be the only person ever seeing these files, because the PDB deposition portal takes an atomic model and a full sequence and produces the correct result. But in this particular case I will share these files with other people before deposition, so I want to fix the sequence so everybody can make sense of these files unambiguously.
Thank you again, and I will report next week. Have a good weekend,
Guillaume
On 1 Dec 2023, at 22:27, Tom Goddard <goddard@sonic.net> wrote:
Hi Guillaume,
You can get the full sequence into your refined mmCIF file with the following cumbersome procedure. If you open your original full-length AlphaFold .pdb file and save it as .cif in ChimeraX then ChimeraX won't put the sequence info in the file, even if you haven't deleted any residues, because the original AlphaFold .pdb file does not have SEQRES records. But the ChimeraX .pdb file writer works better. If you open your original full AlphaFold .pdb model, then save it as PDB in ChimeraX it will add the SEQRES records. Then you can copy those SEQRES records (a few lines at the top of the file) using a text editor into your refined .pdb file which does not have all the residues. Since you are using mmCIF for your refined model, you would first open that and save it as .pdb format in order to add the SEQRES, then you could resave that .pdb as .cif.
Here are the ChimeraX commands, assuming fullalphafold.pdb is the original AlphaFold model, and refined.cif is your refined model which lacks the full sequence.
open refined.cif save refined_noseq.pdb close
open fullalphafold.pdb save alphafold_withseq.pdb close
# In a text editor copy the SEQRES lines from alphafold_withseq.pdb to refined_noseq.pdb to make refined_withseq.pdb # The SEQRES lines look like: # SEQRES 1 A 125 GLY HIS MET HIS ASP CYS HIS GLN VAL THR VAL SER ARG # SEQRES 2 A 125 ASP VAL THR LEU GLN ASN LYS GLU ARG HIS ASP CYS ASN # SEQRES 3 A 125 GLN VAL CYS ALA SER ILE ASP LYS GLU THR GLU ASN LYS # ...
open refined_withseq.pdb save refined_withseq.cif
I'm kind of surprised that the "bestGuess" option Greg described does not actually put the entity_poly_seq table in the mmCIF. If it did you could avoid all the monkey-business of converting to .pdb format, and just output the original AlphaFold model with bestGuess true as mmCIF and copy its entity_poly_seq table to your refined .cif file. But at least you can get the job done using the old .pdb format.
Tom
On Dec 1, 2023, at 12:43 AM, Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
Thank you, good to know!
I think in this case I can't get my FL sequence with ChimeraX without loosing a lot of my refinement work, because I saved my progress (without using these options...) while working in ISOLDE, re-opening the work-in-progress model every time I resumed working on it. But I will try the program in phenix.
Cheers,
Guillaume
------------------------------------------------------------------------ *From:*Greg Couch <gregc@cgl.ucsf.edu> *Sent:*Friday, December 1, 2023 1:22:34 AM *To:*ChimeraX Users Help *Cc:*Guillaume Gaullier *Subject:*Re: [chimerax-users] How to save a sequence in an mmCIF file? To answer the general question, ChimeraX doesn't yet have a way to set the full sequence for a mmCIF entity. Seehttps://www.wwpdb.org/deposition/preparing-pdbx-mmcif-filesfor how to fix the mmCIF output from various refinement packages. For example, the Phenix has "mmtbx.prepare_pdb_deposition program to create a mmCIF file with the sequence". In this particular case, where the starting structure is an Alphafold prediction with atoms for every residue in the full sequence, you can get the correct sequence into the mmCIF output with the "bestGuess" option. Seehttps://www.cgl.ucsf.edu/chimerax/docs/user/commands/save.html#mmcif. I'd also recommend, in your case, using the computedSheets option. In older ChimeraX's, you need to run dssp before using computedSheets to get the helix information. In recent ChimeraX's (daily build and 1.7 release candidate), computedSheets will also output the helix information if it wasn't present in the input. Adding a sequence with bestGuess can be deceiving of because missing leading or trailing residues, or gaps of indeterminate length. But in this case, you should be fine. -- Greg On 11/29/23 02:45, Guillaume Gaullierwrote:
*From:*Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu> *Subject:**[chimerax-users] How to save a sequence in an mmCIF file?* *Date:*November 29, 2023 at 2:45:31 AM PST *To:*"chimerax-users@cgl.ucsf.edu" <chimerax-users@cgl.ucsf.edu> *Reply-To:*Guillaume Gaullier <guillaume.gaullier@kemi.uu.se>
Hello,
Starting from an AlphaFold prediction, I refined a model against a map with ISOLDE. I trimmed the segments not supported by any density. The resulting mmCIF file that I saved now opens with this warning:
Unknown polymer entity '1' near line 187 Missing or incomplete entity_poly_seq table. Inferred polymer connectivity.
Displaying the sequence of this chain shows the correct numbering (with jumps in numbering according to missing segments in the structure), but the sequence of the missing structure segments is not displayed.
When I open the fasta file containing the full-length sequence, the sequence gets automatically associated to the structure, and the sequence viewer annotates the segments with missing structure correctly. I would like to save this full-length sequence in my mmCIF file so the full-length sequence with annotated missing structure segments shows up next time I open this file. But when I try to save at this point, I get the following notice:
Not saving entity_poly_seq for non-authoritative sequences
The documentation for "save" and "sequence" didn't help. How can I make this sequence "authoritative" and save it into my mmCIF file?
Thank you in advance,
Guillaume
När du har kontakt med oss på Uppsala universitet med e-post så innebär det att vi behandlar dina personuppgifter. För att läsa mer om hur vi gör det kan du läsa här:http://www.uu.se/om-uu/dataskydd-personuppgifter/
E-mailing Uppsala University means that we will process your personal data. For more information on how this is performed, please read here:http://www.uu.se/en/about-uu/data-protection-policy_________________________... ChimeraX-users mailing list --chimerax-users@cgl.ucsf.edu To unsubscribe send an email tochimerax-users-leave@cgl.ucsf.edu Archives:https://mail.cgl.ucsf.edu/mailman/archives/list/chimerax-users@cgl.ucsf.edu/
VARNING:Klicka inte på länkar och öppna inte bilagor om du inte känner igen avsändaren och vet att innehållet är säkert. CAUTION:Do not click on links or open attachments unless you recognise the sender and know the content is safe. _______________________________________________ ChimeraX-users mailing list --chimerax-users@cgl.ucsf.edu To unsubscribe send an email tochimerax-users-leave@cgl.ucsf.edu Archives:https://mail.cgl.ucsf.edu/mailman/archives/list/chimerax-users@cgl.ucsf.edu/
Altos Labs UK Limited | England | Company reg 13484917 Registered address: 3rd Floor 1 Ashley Road, Altrincham, Cheshire, United Kingdom, WA14 2DT
Hi Matthias, Guillaume,

I agree ChimeraX needs a way to set the sequence for a chain of an atomic model including residues for which no atomic coordinates exist. I don’t think it should be tied to AlphaFold database models. It would be nice to be able to say copy the sequence of model #1 chain B to model #2 chain C which would allow you to use the full-length alphafold model sequence if that is convenient. But if you have the sequence in a fasta file or simply want to copy and paste the sequence into a ChimeraX command (e.g. sequence copy AFDS…. toChain #2/C) then that should be easy to do also.

I’ve made a ticket to add a command to allow this, and I put your email address on the ticket

https://www.rbvi.ucsf.edu/trac/ChimeraX/ticket/10392

If you have more thoughts about it reply to the email for that enhancement request ticket so we can keep track of the ideas, needs, solutions.

Regarding the other suggestion that ChimeraX should be saving this info by default, e.g. what the save command “bestGuess” option does. That can also be discussed on the ticket. But I think it is not so simple. The bestGuess option I think uses the existing residues with atomic coordinates. But that isn’t going to handle gaps where you don’t have atomic coordinates for parts of your experimental sample. It is important not to put the wrong information in the file because this causes even more difficult problems later.

Tom
On Dec 21, 2023, at 5:33 AM, Vorländer,Matthias Kopano via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
Hi everyone,
I also just struggled with the demands of CIF file deposition to the PDB for a ~50 chain RNA-protein complex and the time it takes to copy-paste sequences into different web interfaces. Would it be possible to interface the ChimeraX sequence association functionalities to write the associated sequence into the mmCIF file? At the end of building large models using many cryo-EM maps the structures often have been cut apart and pieced together many times, so my dream scenario would be to use “alphafold match #model trim false“ to directly write the full-length sequence of the alphafold model into the mmCIF file or PDB header for the matched chain. Would be of huge value for PDB deposition!
Many thanks and best wishes, Matthias
From: Tristan Croll via ChimeraX-users <chimerax-users@cgl.ucsf.edu> Date: Thursday, 21. December 2023 at 14:23 To: Guillaume Gaullier <guillaume.gaullier@kemi.uu.se> Cc: ChimeraX Users Help <chimerax-users@cgl.ucsf.edu> Subject: [chimerax-users] Re: How to save a sequence in an mmCIF file?
Perfect timing - I was just getting ready to upload the final version of ISOLDE 1.7. :) Will add this now.
On Thu, Dec 21, 2023 at 1:14 PM Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
Hi Tom,
Getting back to this much later than I wanted, sorry.
I think it is as simple as this:
To save the final model from ISOLDE I use the command:
isolde write phenixRsrInput #<model-ID> <resolution> #<map-ID> modelFileName model_ISOLDE-done.cif paramFileName model_ISOLDE-done_for-phenix.eff
And it is at this step that the sequence is lost. When I save with ChimeraX's own command and the options suggested by Greg, the full sequence that existed in the original file from AlphaFold-DB is correctly retained:
save model.cif models #<model-ID> bestGuess true computedSheets true
So, I need to discipline myself to always save a file with ChimeraX's command in addition to ISOLDE's. That is, until Tristan sees this discussion and adds this to ISOLDE. 😊
Is there a reason why the default values of the options for 'save' don't include the full sequence and secondary structure assignments? Not a big deal once one knows about it, but having the defaults drop some information seems odd.
Have a nice Christmas break.
Guillaume
From: Tom Goddard <goddard@sonic.net> Sent: Friday, December 1, 2023 11:19:39 PM To: Guillaume Gaullier Cc: Greg Couch; ChimeraX Users Help Subject: Re: [chimerax-users] How to save a sequence in an mmCIF file?
Hi Guillaume,
It will be useful to see what problems you found with mmtbx.prepare_pdb_deposition and the ChimeraX mmCIF reader. Thanks in advance for providing that.
My instructions about adding the sequence were for an AlphaFold prediction that you run, which produces a PDB file. But since you instead fetched the initial model from the AlphaFoldDB database, that actually gives an mmCIF file, and that mmCIF file contains the entity_poly_seq table. So it is actually much simpler to add the full sequence in that case. Just open the original AlphaFoldDB file in a text editor, search for the entity_poly_seq table and copy it to your refined mmCIF file.
Now I'm slightly puzzled because if you open the AlphaFoldDB .cif file, delete some residues, then save it as a new .cif file, then that new file does have the entity_poly_seq table with the full sequence and ChimeraX will know the full sequence when you load that new file. But maybe when you refine with ISOLDE it makes a copy of the atomic model which loses the mmCIF entity_poly_seq metadata, so that when you save that copy the sequence is lost.
So here's an example of how to add back the full sequence
open P12882 from alphafold
# This file is fetched from AlphaFoldDB and the ChimeraX Log shows the file name
# "Fetching compressed AlphaFold P12882 from https://alphafold.ebi.ac.uk/files/AF-P12882-F1-model_v4.cif"
# which ChimeraX puts on your computer in ~/Downloads/ChimeraX/AlphaFold

# Now open AF-P12882-F1-model_v4.cif in a text editor, search for entity_poly_seq table and copy it into your .cif file.
# Here is what that table looks like

#
loop_
_entity_poly_seq.entity_id
_entity_poly_seq.hetero
_entity_poly_seq.mon_id
_entity_poly_seq.num
1 n MET 1
1 n SER 2
1 n SER 3
1 n ASP 4
1 n SER 5
1 n GLU 6
1 n MET 7
1 n ALA 8
1 n ILE 9
1 n PHE 10
1 n GLY 11
...
1 n GLU 1938
1 n GLU 1939
#
Then opening your .cif file in ChimeraX will show the full sequence in the sequence viewer.
Tom
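For anyone who would rather script the copy than do it in a text editor, here is a minimal Python sketch of the same idea. It is not part of ChimeraX: the function names `extract_loop` and `add_sequence_table` are made up for illustration, and the parsing is plain line matching rather than a real CIF parser, so it assumes the table is a simple `loop_` with one row per line (as in the AlphaFoldDB file above), that the refined file has a single data block, and that the entity_id values in the copied table match the entities in the refined file. It appends the table at the end of the file; I have not checked how ChimeraX's reader treats an entity_poly_seq table in that position.

```python
def extract_loop(cif_text, category="_entity_poly_seq."):
    """Return the text of the first loop_ whose tags all start with
    `category`, or None if there is no such loop (simple line-based scan)."""
    lines = cif_text.splitlines()
    i = 0
    while i < len(lines):
        if lines[i].strip() != "loop_":
            i += 1
            continue
        # Collect the tag lines that follow loop_
        j = i + 1
        tags = []
        while j < len(lines) and lines[j].strip().startswith("_"):
            tags.append(lines[j].strip())
            j += 1
        if tags and all(t.startswith(category) for t in tags):
            # Data rows run until a blank line, '#', or the next category
            k = j
            while k < len(lines):
                s = lines[k].strip()
                if not s or s.startswith(("#", "_", "loop_", "data_")):
                    break
                k += 1
            return "\n".join(lines[i:k])
        i = j
    return None


def add_sequence_table(source_cif, target_cif):
    """Append the entity_poly_seq loop from source_cif to target_cif."""
    table = extract_loop(source_cif)
    if table is None:
        raise ValueError("no entity_poly_seq loop found in the source file")
    if extract_loop(target_cif) is not None:
        raise ValueError("target already has an entity_poly_seq loop")
    return target_cif.rstrip("\n") + "\n#\n" + table + "\n#\n"
```

Usage would be to read both files as text (for example with `pathlib.Path(...).read_text()`), call `add_sequence_table(alphafold_text, refined_text)`, and write the result to a new file, keeping the original refined file untouched.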
On Dec 1, 2023, at 1:56 PM, Guillaume Gaullier <guillaume.gaullier@kemi.uu.se> wrote:
Thanks Tom, I will try this on Monday!
Using mmtbx.prepare_pdb_deposition from phenix seemed to work, in that the command ran and did not report any error. But opening the cif file produced by this tool in ChimeraX logged more warnings; I will report which ones exactly on Monday when I am at the lab (I don’t have these files at hand now). And displaying its sequence resulted in the same problem as I had initially (the missing residues in the atomic model are not shown at all in the sequence view). I will try to inspect the content of this cif file to check that the phenix tool did its job (maybe I didn’t use it correctly, although it has so few options that this seems unlikely), but it is difficult because the coordinates in this file are no longer the same as in the cif file from AlphaFold-DB, so trying to diff the two files of course results in too much output and is not helpful to quickly identify differences in entity_poly_seq.
Do I understand it correctly that the procedure you suggest assumes I downloaded a PDB file from AlphaFold-DB? I can do this easily. But I just wanted to clarify that what I did initially was fetch the AlphaFold prediction directly from within ChimeraX (open <uniprot-id> from alphafold), dock it in the map, trim residues not supported by density, and finally save it as mmCIF. Unfortunately without the options suggested by Greg, because at the time I didn’t know I would lose the sequence information when saving with default options. And when I noticed, I thought there would be an easy way to fix it (which prompted my first email in this thread).
This wouldn’t be a big deal if I was going to be the only person ever seeing these files, because the PDB deposition portal takes an atomic model and a full sequence and produces the correct result. But in this particular case I will share these files with other people before deposition, so I want to fix the sequence so everybody can make sense of these files unambiguously.
Thank you again, and I will report next week. Have a good weekend,
Guillaume
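Since a plain diff of the two files is swamped by coordinate changes, one way to check only the sequence table is to pull entity_poly_seq out of each file and compare that. The following is a rough Python sketch under stated assumptions: `read_poly_seq` and `compare_poly_seq` are hypothetical helper names, the parsing is line-based rather than a full CIF parser, and it assumes the table is a simple `loop_` with one row per line and rows already in sequence order.

```python
from collections import defaultdict


def read_poly_seq(cif_text):
    """Map entity_id -> list of residue names from the entity_poly_seq loop."""
    lines = [ln.strip() for ln in cif_text.splitlines()]
    seqs = defaultdict(list)
    i = 0
    while i < len(lines):
        if lines[i] != "loop_":
            i += 1
            continue
        i += 1
        # Collect the tag lines of this loop
        tags = []
        while i < len(lines) and lines[i].startswith("_"):
            tags.append(lines[i])
            i += 1
        if not tags or not tags[0].startswith("_entity_poly_seq."):
            continue
        ent = tags.index("_entity_poly_seq.entity_id")
        mon = tags.index("_entity_poly_seq.mon_id")
        # Read data rows until a blank line, '#', or the next category
        while (i < len(lines) and lines[i]
               and not lines[i].startswith(("#", "_", "loop_", "data_"))):
            fields = lines[i].split()
            seqs[fields[ent]].append(fields[mon])
            i += 1
    return dict(seqs)


def compare_poly_seq(cif_a, cif_b):
    """Report, per entity, whether the two files list the same residues."""
    a, b = read_poly_seq(cif_a), read_poly_seq(cif_b)
    report = {}
    for ent in sorted(set(a) | set(b)):
        sa, sb = a.get(ent, []), b.get(ent, [])
        if sa == sb:
            report[ent] = "same"
        else:
            report[ent] = f"differ ({len(sa)} vs {len(sb)} residues)"
    return report
```

An entity that is missing from one file shows up with 0 residues on that side, which would match the "Missing or incomplete entity_poly_seq table" situation described at the start of the thread.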
On 1 Dec 2023, at 22:27, Tom Goddard <goddard@sonic.net> wrote:
Hi Guillaume,
You can get the full sequence into your refined mmCIF file with the following cumbersome procedure. If you open your original full-length AlphaFold .pdb file and save it as .cif in ChimeraX then ChimeraX won't put the sequence info in the file, even if you haven't deleted any residues, because the original AlphaFold .pdb file does not have SEQRES records. But the ChimeraX .pdb file writer works better. If you open your original full AlphaFold .pdb model, then save it as PDB in ChimeraX it will add the SEQRES records. Then you can copy those SEQRES records (a few lines at the top of the file) using a text editor into your refined .pdb file which does not have all the residues. Since you are using mmCIF for your refined model, you would first open that and save it as .pdb format in order to add the SEQRES, then you could resave that .pdb as .cif.
Here are the ChimeraX commands, assuming fullalphafold.pdb is the original AlphaFold model, and refined.cif is your refined model which lacks the full sequence.
open refined.cif
save refined_noseq.pdb
close

open fullalphafold.pdb
save alphafold_withseq.pdb
close

# In a text editor copy the SEQRES lines from alphafold_withseq.pdb to refined_noseq.pdb to make refined_withseq.pdb
# The SEQRES lines look like:
# SEQRES 1 A 125 GLY HIS MET HIS ASP CYS HIS GLN VAL THR VAL SER ARG
# SEQRES 2 A 125 ASP VAL THR LEU GLN ASN LYS GLU ARG HIS ASP CYS ASN
# SEQRES 3 A 125 GLN VAL CYS ALA SER ILE ASP LYS GLU THR GLU ASN LYS
# ...

open refined_withseq.pdb
save refined_withseq.cif
I'm kind of surprised that the "bestGuess" option Greg described does not actually put the entity_poly_seq table in the mmCIF. If it did you could avoid all the monkey-business of converting to .pdb format, and just output the original AlphaFold model with bestGuess true as mmCIF and copy its entity_poly_seq table to your refined .cif file. But at least you can get the job done using the old .pdb format.
Tom
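The text-editor step in this procedure can also be scripted. Below is a minimal Python sketch, not a ChimeraX feature: `copy_seqres` and `AFTER_SEQRES` are names invented for illustration. It assumes the chain IDs in the SEQRES records match the chain IDs in the refined file, and it places the records before the first record type that follows SEQRES in the PDB format's record order. Any SEQRES lines already present in the refined file are dropped, which is an assumption about what you want.

```python
# Record-name prefixes that come after SEQRES in the PDB format's record order
# ("HET" also covers HETNAM, HETSYN and HETATM; "SCALE" covers SCALE1-3, etc.)
AFTER_SEQRES = (
    "MODRES", "HET", "FORMUL", "HELIX", "SHEET", "SSBOND", "LINK",
    "CISPEP", "SITE", "CRYST1", "ORIGX", "SCALE", "MTRIX", "MODEL",
    "ATOM", "ANISOU", "TER", "END",
)


def copy_seqres(full_pdb, refined_pdb):
    """Return refined_pdb text with the SEQRES records of full_pdb inserted."""
    seqres = [ln for ln in full_pdb.splitlines() if ln.startswith("SEQRES")]
    if not seqres:
        raise ValueError("no SEQRES records in the full-length file")
    out, inserted = [], False
    for ln in refined_pdb.splitlines():
        if ln.startswith("SEQRES"):
            continue  # drop any existing SEQRES records in the refined file
        if not inserted and ln.startswith(AFTER_SEQRES):
            out.extend(seqres)  # insert just before the first later record
            inserted = True
        out.append(ln)
    if not inserted:
        out.extend(seqres)  # no later record found: append at the end
    return "\n".join(out) + "\n"
```

With the file names from the commands above, this would read alphafold_withseq.pdb and refined_noseq.pdb as text and write the returned string to refined_withseq.pdb, which can then be opened in ChimeraX and saved as .cif.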
On Dec 1, 2023, at 12:43 AM, Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
Thank you, good to know!
I think in this case I can't get my full-length sequence with ChimeraX without losing a lot of my refinement work, because I saved my progress (without using these options...) while working in ISOLDE, re-opening the work-in-progress model every time I resumed working on it. But I will try the program in phenix.
Cheers,
Guillaume
From: Greg Couch <gregc@cgl.ucsf.edu> Sent: Friday, December 1, 2023 1:22:34 AM To: ChimeraX Users Help Cc: Guillaume Gaullier Subject: Re: [chimerax-users] How to save a sequence in an mmCIF file?
To answer the general question, ChimeraX doesn't yet have a way to set the full sequence for a mmCIF entity. See https://www.wwpdb.org/deposition/preparing-pdbx-mmcif-files for how to fix the mmCIF output from various refinement packages. For example, Phenix has "mmtbx.prepare_pdb_deposition program to create a mmCIF file with the sequence".

In this particular case, where the starting structure is an AlphaFold prediction with atoms for every residue in the full sequence, you can get the correct sequence into the mmCIF output with the "bestGuess" option. See https://www.cgl.ucsf.edu/chimerax/docs/user/commands/save.html#mmcif. I'd also recommend, in your case, using the computedSheets option. In older versions of ChimeraX, you need to run dssp before using computedSheets to get the helix information. In recent versions (daily build and 1.7 release candidate), computedSheets will also output the helix information if it wasn't present in the input.

Adding a sequence with bestGuess can be deceiving because of missing leading or trailing residues, or gaps of indeterminate length. But in this case, you should be fine.

-- Greg
Thank you Tom. And I agree this functionality shouldn't be tied to alphafold models; all the example uses you describe make a lot of sense and it would be good to have all these options indeed. Regarding the defaults, I was asking out of curiosity. I understand that there are often good reasons why the defaults are what they are. In this particular case the default behavior of dropping the full-length sequence is a bit surprising (at least to me, not knowing all the other intricacies of the mmCIF format), and on top of this the option to change this default is not straightforward to find, hence my question. Cheers, Guillaume ________________________________ From: Tom Goddard <goddard@sonic.net> Sent: Friday, December 22, 2023 1:14:44 AM To: "Vorländer,Matthias Kopano"; Guillaume Gaullier Cc: ChimeraX Users Help Subject: Re: [chimerax-users] How to save a sequence in an mmCIF file? Hi Matthias, Guillaume, I agree ChimeraX needs a way to set the sequence for a chain of an atomic model including residues for which no atomic coordinates exist. I don’t think it should be tied to AlphaFold database models. It would be nice to be able to say copy the sequence of model #1 chain B to model #2 chain C which would allow you to use the full-length alphafold model sequence if that is convenient. But if you have the sequence in a fasta file or simply want to copy and paste the sequence into a ChimeraX command (e.g. sequence copy AFDS…. toChain #2/C) then that should be easy to do also. I’ve made a ticket to add a command to allow this, and I put your email address on the ticket https://www.rbvi.ucsf.edu/trac/ChimeraX/ticket/10392 If you have more thoughts about it reply to the email for that enhancement request ticket so we can keep track of the ideas, needs, solutions. Regarding the other suggestion that ChimeraX should be saving this info by default, e.g. what the save command “bestGuess” option does. That can also be discussed on the ticket. But I think it is not so simple. 
The bestGuess option I think uses the existing residues with atomic coordinates. But that isn’t going to handle gaps where you don’t have atomic coordinates for parts of your experimental sample. It is important not to put the wrong information in the file because this causes even more difficult problems later. Tom On Dec 21, 2023, at 5:33 AM, Vorländer,Matthias Kopano via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote: Hi everyone, I also just struggled with the demands of CIF file deposition to the PDB for a ~50 chain RNA-protein complex and the time it takes to copy-paste sequence into different web interfaces. Would it be possible to interface the ChimeraX sequence association functionalities to write the associated sequence into the mmCIF file? At the end of building large models using many cryo-EM maps the structures often have been cut apart and pieced together many times, so my dream scenario would be to use “alpafold match #model trim false“ to directly write the full-length sequence of the alphafold model into the mmcif file or PDB header for the matches chain. Would be of huge value for PDB deposition! Many thanks and best wishes, Matthias From: Tristan Croll via ChimeraX-users <chimerax-users@cgl.ucsf.edu> Date: Thursday, 21. December 2023 at 14:23 To: Guillaume Gaullier <guillaume.gaullier@kemi.uu.se> Cc: ChimeraX Users Help <chimerax-users@cgl.ucsf.edu> Subject: [chimerax-users] Re: How to save a sequence in an mmCIF file? Perfect timing - I was just getting ready to upload the final version of ISOLDE 1.7. :) Will add this now. On Thu, Dec 21, 2023 at 1:14 PM Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu<mailto:chimerax-users@cgl.ucsf.edu>> wrote: Hi Tom, Getting back to this much later than I wanted, sorry. 
I think it is as simple as this: To save the final model from ISOLDE I use the command: isolde write phenixRsrInput #<model-ID> <resolution> #<map-ID> modelFileName model_ISOLDE-done.cif paramFileName model_ISOLDE-done_for-phenix.eff And this is at this step that the sequence is lost. When I save with ChimeraX's command and adequate options suggested by Greg, the full sequence that existed in the original file from AlphaFold-DB is correctly retained: save model.cif models #<model-ID> bestGuess true computedSheets true So, I need to discipline myself to always save a file with ChimeraX's command in addition to ISOLDE's. That is, until Tristan sees this discussion and adds this to ISOLDE. 😊 Is there a reason why the default values to the options to 'save' don't include the full sequence and secondary structure assignments? Not a big deal once one knows about it, but having the defaults drop some information seems odd. Have a nice Christmas break. Guillaume ________________________________ From: Tom Goddard <goddard@sonic.net<mailto:goddard@sonic.net>> Sent: Friday, December 1, 2023 11:19:39 PM To: Guillaume Gaullier Cc: Greg Couch; ChimeraX Users Help Subject: Re: [chimerax-users] How to save a sequence in an mmCIF file? Hi Guillaume, It will be useful to see what problems you found with mmtbx.prepare_pdb_deposition and the ChimeraX mmCIF reader. Thanks in advance for providing that. My instructions about adding the sequence were for an AlphaFold prediction that you run, which produces a PDB file. But since you instead fetched the initial model from the AlphaFoldDB database, that actually gives an mmCIF file, and that mmCIF file contains the entity_poly_seq table. So it is actually much simpler to add the full sequence in that case. Just open the original AlphaFoldDB file in a text editor, search for the entity_poly_seq table and copy it to your refined mmCIF file. 
Now I'm slightly puzzled because if you open the AlphaFoldDB .cif file, delete some residues, then save it as a new .cif file, then that new file does have the entity_poly_seq table with the full sequence and ChimeraX will know the full sequence when you load that new file. But maybe when you refine with ISOLDE it makes a copy of the atomic model which loses the mmCIF entity_poly_seq metadata so that if you save that then the sequence was lost. So here's an example of how to add back the full sequence open P12882 from alphafold # This file is fetched from AlphaFoldDB and the ChimeraX Log shows the file name # "Fetching compressed AlphaFold P12882 from https://alphafold.ebi.ac.uk/files/AF-P12882-F1-model_v4.cif" # which ChimeraX puts on your computer in ~/Downloads/ChimeraX/AlphaFold # Now open AF-P12882-F1-model_v4.cif in a text editor, search for entity_poly_seq table and copy it into your .cif file. # Here is what that table looks like # loop_ _entity_poly_seq.entity_id _entity_poly_seq.hetero _entity_poly_seq.mon_id _entity_poly_seq.num 1 n MET 1 1 n SER 2 1 n SER 3 1 n ASP 4 1 n SER 5 1 n GLU 6 1 n MET 7 1 n ALA 8 1 n ILE 9 1 n PHE 10 1 n GLY 11 ... 1 n GLU 1938 1 n GLU 1939 # Then opening your .cif file in ChimeraX will show the full sequence in the sequence viewer. Tom On Dec 1, 2023, at 1:56 PM, Guillaume Gaullier <guillaume.gaullier@kemi.uu.se<mailto:guillaume.gaullier@kemi.uu.se>> wrote: Thanks Tom, I will try this on Monday! Using mmtbx.prepare_pdb_deposition from phenix seemed to work, in that the command ran and did not report any error. But opening the cif file produced by this tool in ChimeraX logged more warnings; I will report which ones exactly on Monday when I am at the lab (I don’t have these files at hand now). And displaying its sequence resulted in the same problem as I had initially (the missing residues in the atomic model are not shown at all in the sequence view). 
I will try to inspect the content of this cif file to check that the phenix tool did its job (maybe I didn’t use it correctly, although it has so few options that this seems unlikely), but it is difficult because the coordinates in this file are no longer the same as in the cif file from AlphaFold-DB, so trying to diff the two files of course results in too much output and is not helpful to quickly identify differences in entity_poly_seq. Do I understand it correctly that the procedure you suggest assumes I downloaded a PDB file from AlphaFold-DB? I can do this easily. But I just wanted to clarify that what I did initially was fetch the AlphaFold prediction directly from within ChimeraX (open <uniprot-id> from alphafold), dock it in the map, trim residues not supported by density, and finally save it as mmCIF. Unfortunately without the options suggested by Greg, because at the time I didn’t know I would lose the sequence information when saving with default options. And when I noticed, I thought there would be an easy way to fix it (which prompted my first email in this thread). This wouldn’t be a big deal if I was going to be the only person ever seeing these files, because the PDB deposition portal takes an atomic model and a full sequence and produces the correct result. But in this particular case I will share these files with other people before deposition, so I want to fix the sequence so everybody can make sense of these files unambiguously. Thank you again, and I will report next week. Have a good weekend, Guillaume On 1 Dec 2023, at 22:27, Tom Goddard <goddard@sonic.net<mailto:goddard@sonic.net>> wrote: Hi Guillaume, You can get the full sequence into your refined mmCIF file with the following cumbersome procedure. 
If you open your original full-length AlphaFold .pdb file and save it as .cif in ChimeraX then ChimeraX won't put the sequence info in the file, even if you haven't deleted any residues, because the original AlphaFold .pdb file does not have SEQRES records. But the ChimeraX .pdb file writer works better. If you open your original full AlphaFold .pdb model, then save it as PDB in ChimeraX it will add the SEQRES records. Then you can copy those SEQRES records (a few lines at the top of the file) using a text editor into your refined .pdb file which does not have all the residues. Since you are using mmCIF for your refined model, you would first open that and save it as .pdb format in order to add the SEQRES, then you could resave that .pdb as .cif. Here are the ChimeraX commands, assuming fullalphafold.pdb is the original AlphaFold model, and refined.cif is your refined model which lacks the full sequence. open refined.cif save refined_noseq.pdb close open fullalphafold.pdb save alphafold_withseq.pdb close # In a text editor copy the SEQRES lines from alphafold_withseq.pdb to refined_noseq.pdb to make refined_withseq.pdb # The SEQRES lines look like: # SEQRES 1 A 125 GLY HIS MET HIS ASP CYS HIS GLN VAL THR VAL SER ARG # SEQRES 2 A 125 ASP VAL THR LEU GLN ASN LYS GLU ARG HIS ASP CYS ASN # SEQRES 3 A 125 GLN VAL CYS ALA SER ILE ASP LYS GLU THR GLU ASN LYS # ... open refined_withseq.pdb save refined_withseq.cif I'm kind of surprised that the "bestGuess" option Greg described does not actually put the entity_poly_seq table in the mmCIF. If it did you could avoid all the monkey-business of converting to .pdb format, and just output the original AlphaFold model with bestGuess true as mmCIF and copy its entity_poly_seq table to your refined .cif file. But at least you can get the job done using the old .pdb format. 
Tom On Dec 1, 2023, at 12:43 AM, Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu<mailto:chimerax-users@cgl.ucsf.edu>> wrote: Thank you, good to know! I think in this case I can't get my FL sequence with ChimeraX without loosing a lot of my refinement work, because I saved my progress (without using these options...) while working in ISOLDE, re-opening the work-in-progress model every time I resumed working on it. But I will try the program in phenix. Cheers, Guillaume ________________________________ From: Greg Couch <gregc@cgl.ucsf.edu<mailto:gregc@cgl.ucsf.edu>> Sent: Friday, December 1, 2023 1:22:34 AM To: ChimeraX Users Help Cc: Guillaume Gaullier Subject: Re: [chimerax-users] How to save a sequence in an mmCIF file? To answer the general question, ChimeraX doesn't yet have a way to set the full sequence for a mmCIF entity. Seehttps://www.wwpdb.org/deposition/preparing-pdbx-mmcif-files for how to fix the mmCIF output from various refinement packages. For example, the Phenix has "mmtbx.prepare_pdb_deposition program to create a mmCIF file with the sequence". In this particular case, where the starting structure is an Alphafold prediction with atoms for every residue in the full sequence, you can get the correct sequence into the mmCIF output with the "bestGuess" option. See https://www.cgl.ucsf.edu/chimerax/docs/user/commands/save.html#mmcif. I'd also recommend, in your case, using the computedSheets option. In older ChimeraX's, you need to run dssp before using computedSheets to get the helix information. In recent ChimeraX's (daily build and 1.7 release candidate), computedSheets will also output the helix information if it wasn't present in the input. Adding a sequence with bestGuess can be deceiving of because missing leading or trailing residues, or gaps of indeterminate length. But in this case, you should be fine. 
-- Greg On 11/29/23 02:45, Guillaume Gaullier wrote: From: Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu<mailto:chimerax-users@cgl.ucsf.edu>> Subject: [chimerax-users] How to save a sequence in an mmCIF file? Date: November 29, 2023 at 2:45:31 AM PST To: "chimerax-users@cgl.ucsf.edu<mailto:chimerax-users@cgl.ucsf.edu>" <chimerax-users@cgl.ucsf.edu<mailto:chimerax-users@cgl.ucsf.edu>> Reply-To: Guillaume Gaullier <guillaume.gaullier@kemi.uu.se<mailto:guillaume.gaullier@kemi.uu.se>> Hello, Starting from an AlphaFold prediction, I refined a model against a map with ISOLDE. I trimmed the segments not supported by any density. The resulting mmCIF file that I saved now opens with this warning: Unknown polymer entity '1' near line 187 Missing or incomplete entity_poly_seq table. Inferred polymer connectivity. Displaying the sequence of this chain shows the correct numbering (with jumps in numbering according to missing segments in the structure), but the sequence of the missing structure segments is not displayed. When I open the fasta file containing the full-length sequence, the sequence gets automatically associated to the structure, and the sequence viewer annotates the segments with missing structure correctly. I would like to save this full-length sequence in my mmCIF file so the full-length sequence with annotated missing structure segments shows up next time I open this file. But when I try to save at this point, I get the following notice: Not saving entity_poly_seq for non-authoritative sequences The documentation for "save" and "sequence" didn't help. How can I make this sequence "authoritative" and save it into my mmCIF file? Thank you in advance, Guillaume När du har kontakt med oss på Uppsala universitet med e-post så innebär det att vi behandlar dina personuppgifter. 
Hi Tom and Guillaume,
Thanks a lot for making the ticket! I think the solution that Tom has outlined would be ideal to offer the greatest flexibility. ChimeraX really facilitates model building for large EM structures; as a user it's really great to witness the continued improvements. Thank you all for that!
Best wishes from Vienna, Matthias
Hi Guillaume,
As I understand it, the sequence is never dropped. If you read in an mmCIF that already has the full information, that information is kept. The conversation was regarding new structures being built that did not already have this information.
E.g. if I use
open 1www
seq chain /V
... you can see the black outline boxes around the residues in the sequence of chain V that are missing from coordinates
save test.cif
close session
open test.cif
seq chain /V
... the whole sequence is still there
Best, Elaine
-----
Elaine C. Meng, Ph.D.
UCSF Chimera(X) team
Department of Pharmaceutical Chemistry
University of California, San Francisco
On Dec 22, 2023, at 12:22 AM, Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
Thank you Tom. And I agree this functionality shouldn't be tied to alphafold models; all the example uses you describe make a lot of sense and it would be good to have all these options indeed.
Regarding the defaults, I was asking out of curiosity. I understand that there are often good reasons why the defaults are what they are. In this particular case the default behavior of dropping the full-length sequence is a bit surprising (at least to me, not knowing all the other intricacies of the mmCIF format), and on top of this the option to change this default is not straightforward to find, hence my question.
Cheers,
Guillaume
From: Tom Goddard <goddard@sonic.net> Sent: Friday, December 22, 2023 1:14:44 AM To: "Vorländer,Matthias Kopano"; Guillaume Gaullier Cc: ChimeraX Users Help Subject: Re: [chimerax-users] How to save a sequence in an mmCIF file?
Hi Matthias, Guillaume,
I agree ChimeraX needs a way to set the sequence for a chain of an atomic model including residues for which no atomic coordinates exist. I don’t think it should be tied to AlphaFold database models. It would be nice to be able to say copy the sequence of model #1 chain B to model #2 chain C which would allow you to use the full-length alphafold model sequence if that is convenient. But if you have the sequence in a fasta file or simply want to copy and paste the sequence into a ChimeraX command (e.g. sequence copy AFDS…. toChain #2/C) then that should be easy to do also.
I’ve made a ticket to add a command to allow this, and I put your email address on the ticket
https://www.rbvi.ucsf.edu/trac/ChimeraX/ticket/10392
If you have more thoughts about it reply to the email for that enhancement request ticket so we can keep track of the ideas, needs, solutions.
The other suggestion, that ChimeraX should save this info by default (e.g. what the save command “bestGuess” option does), can also be discussed on the ticket. But I think it is not so simple. The bestGuess option, I think, uses the existing residues with atomic coordinates. That isn’t going to handle gaps where you don’t have atomic coordinates for parts of your experimental sample. It is important not to put the wrong information in the file because this causes even more difficult problems later.
Tom
On Dec 21, 2023, at 5:33 AM, Vorländer,Matthias Kopano via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
Hi everyone,
I also just struggled with the demands of CIF file deposition to the PDB for a ~50 chain RNA-protein complex and the time it takes to copy-paste sequences into different web interfaces. Would it be possible to interface the ChimeraX sequence association functionalities to write the associated sequence into the mmCIF file? At the end of building large models using many cryo-EM maps the structures often have been cut apart and pieced together many times, so my dream scenario would be to use “alphafold match #model trim false” to directly write the full-length sequence of the AlphaFold model into the mmCIF file or PDB header for the matched chain. Would be of huge value for PDB deposition!
Many thanks and best wishes, Matthias
From: Tristan Croll via ChimeraX-users <chimerax-users@cgl.ucsf.edu> Date: Thursday, 21. December 2023 at 14:23 To: Guillaume Gaullier <guillaume.gaullier@kemi.uu.se> Cc: ChimeraX Users Help <chimerax-users@cgl.ucsf.edu> Subject: [chimerax-users] Re: How to save a sequence in an mmCIF file?
Perfect timing - I was just getting ready to upload the final version of ISOLDE 1.7. :) Will add this now.
On Thu, Dec 21, 2023 at 1:14 PM Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
Hi Tom,
Getting back to this much later than I wanted, sorry. I think it is as simple as this:
To save the final model from ISOLDE I use the command:
isolde write phenixRsrInput #<model-ID> <resolution> #<map-ID> modelFileName model_ISOLDE-done.cif paramFileName model_ISOLDE-done_for-phenix.eff
And it is at this step that the sequence is lost. When I save with ChimeraX's command and the options suggested by Greg, the full sequence that existed in the original file from AlphaFold-DB is correctly retained:
save model.cif models #<model-ID> bestGuess true computedSheets true
So, I need to discipline myself to always save a file with ChimeraX's command in addition to ISOLDE's. That is, until Tristan sees this discussion and adds this to ISOLDE. 😊
Is there a reason why the default values of the options to 'save' don't include the full sequence and secondary structure assignments? Not a big deal once one knows about it, but having the defaults drop some information seems odd.
Have a nice Christmas break.
Guillaume
From: Tom Goddard <goddard@sonic.net> Sent: Friday, December 1, 2023 11:19:39 PM To: Guillaume Gaullier Cc: Greg Couch; ChimeraX Users Help Subject: Re: [chimerax-users] How to save a sequence in an mmCIF file?
Hi Guillaume,
It will be useful to see what problems you found with mmtbx.prepare_pdb_deposition and the ChimeraX mmCIF reader. Thanks in advance for providing that.
My instructions about adding the sequence were for an AlphaFold prediction that you run, which produces a PDB file. But since you instead fetched the initial model from the AlphaFoldDB database, that actually gives an mmCIF file, and that mmCIF file contains the entity_poly_seq table. So it is actually much simpler to add the full sequence in that case. Just open the original AlphaFoldDB file in a text editor, search for the entity_poly_seq table and copy it to your refined mmCIF file.
Now I'm slightly puzzled because if you open the AlphaFoldDB .cif file, delete some residues, then save it as a new .cif file, then that new file does have the entity_poly_seq table with the full sequence and ChimeraX will know the full sequence when you load that new file. But maybe when you refine with ISOLDE it makes a copy of the atomic model which loses the mmCIF entity_poly_seq metadata so that if you save that then the sequence was lost.
So here's an example of how to add back the full sequence
open P12882 from alphafold
# This file is fetched from AlphaFoldDB and the ChimeraX Log shows the file name
# "Fetching compressed AlphaFold P12882 from https://alphafold.ebi.ac.uk/files/AF-P12882-F1-model_v4.cif"
# which ChimeraX puts on your computer in ~/Downloads/ChimeraX/AlphaFold
# Now open AF-P12882-F1-model_v4.cif in a text editor, search for entity_poly_seq table and copy it into your .cif file.
# Here is what that table looks like
#
loop_
_entity_poly_seq.entity_id
_entity_poly_seq.hetero
_entity_poly_seq.mon_id
_entity_poly_seq.num
1 n MET 1
1 n SER 2
1 n SER 3
1 n ASP 4
1 n SER 5
1 n GLU 6
1 n MET 7
1 n ALA 8
1 n ILE 9
1 n PHE 10
1 n GLY 11
...
1 n GLU 1938
1 n GLU 1939
#
Then opening your .cif file in ChimeraX will show the full sequence in the sequence viewer.
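For anyone who would rather script the copy step than do it in a text editor, here is a minimal Python sketch of the same idea. It is not part of ChimeraX; the function names `find_entity_poly_seq` and `copy_entity_poly_seq` are made up for illustration. It assumes the table is written as a `loop_` with one row per line (as in the AlphaFoldDB example above), that the refined file has no such table yet, and that the entity_id values in the table match the entity used in the refined file. Placing the table right after the `data_` line is also just an assumption of this sketch.

```python
def find_entity_poly_seq(lines):
    """Return (start, end) indices of the entity_poly_seq loop, or None."""
    for i, line in enumerate(lines):
        if line.strip() != "loop_":
            continue
        j = i + 1
        if j >= len(lines) or not lines[j].startswith("_entity_poly_seq."):
            continue
        # Skip the column names of the loop.
        while j < len(lines) and lines[j].startswith("_entity_poly_seq."):
            j += 1
        # Skip the data rows; stop at a blank line or the next mmCIF item.
        stops = ("#", "_", "loop_", "data_")
        while j < len(lines) and lines[j].strip() and not lines[j].startswith(stops):
            j += 1
        return i, j
    return None


def copy_entity_poly_seq(source_text, target_text):
    """Return target_text with the source's entity_poly_seq loop added."""
    src = source_text.splitlines()
    dst = target_text.splitlines()
    span = find_entity_poly_seq(src)
    if span is None:
        raise ValueError("source has no entity_poly_seq loop")
    if find_entity_poly_seq(dst) is not None:
        raise ValueError("target already has an entity_poly_seq loop")
    # Assumption of this sketch: insert right after the first data_ line.
    for k, line in enumerate(dst):
        if line.startswith("data_"):
            break
    else:
        raise ValueError("target has no data_ block header")
    table = ["#"] + src[span[0]:span[1]] + ["#"]
    return "\n".join(dst[:k + 1] + table + dst[k + 1:]) + "\n"
```

To use it on real files, read both files as text, call `copy_entity_poly_seq(alphafold_text, refined_text)`, and write the result to a new file so the refined original stays untouched.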
Tom
On Dec 1, 2023, at 1:56 PM, Guillaume Gaullier <guillaume.gaullier@kemi.uu.se> wrote:
Thanks Tom, I will try this on Monday!
Using mmtbx.prepare_pdb_deposition from phenix seemed to work, in that the command ran and did not report any error. But opening the cif file produced by this tool in ChimeraX logged more warnings; I will report which ones exactly on Monday when I am at the lab (I don’t have these files at hand now). And displaying its sequence resulted in the same problem as I had initially (the missing residues in the atomic model are not shown at all in the sequence view). I will try to inspect the content of this cif file to check that the phenix tool did its job (maybe I didn’t use it correctly, although it has so few options that this seems unlikely), but it is difficult because the coordinates in this file are no longer the same as in the cif file from AlphaFold-DB, so trying to diff the two files of course results in too much output and is not helpful to quickly identify differences in entity_poly_seq.
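Since a plain diff is swamped by the changed coordinates, one way around it is to compare only the entity_poly_seq tables of the two files. The following is a minimal Python sketch of that comparison, not a Phenix or ChimeraX tool; `read_entity_poly_seq` and `sequence_differences` are hypothetical names, and the parser assumes the table is a `loop_` with one whitespace-separated row per line.

```python
def read_entity_poly_seq(cif_text):
    """Return [(entity_id, num, mon_id), ...] from the entity_poly_seq loop.

    Returns an empty list when the file has no such loop.
    """
    lines = cif_text.splitlines()
    for i, line in enumerate(lines):
        if (line.strip() == "loop_" and i + 1 < len(lines)
                and lines[i + 1].startswith("_entity_poly_seq.")):
            break
    else:
        return []
    # Read the column names so the column order in the file does not matter.
    cols = []
    j = i + 1
    while j < len(lines) and lines[j].startswith("_entity_poly_seq."):
        cols.append(lines[j].strip().split(".", 1)[1])
        j += 1
    rows = []
    stops = ("#", "_", "loop_", "data_")
    while j < len(lines) and lines[j].strip() and not lines[j].startswith(stops):
        fields = dict(zip(cols, lines[j].split()))
        rows.append((fields["entity_id"], int(fields["num"]), fields["mon_id"]))
        j += 1
    return rows


def sequence_differences(cif_a, cif_b):
    """Return the residues present in one table but not in the other."""
    return sorted(set(read_entity_poly_seq(cif_a)) ^ set(read_entity_poly_seq(cif_b)))
```

An empty list from `read_entity_poly_seq` means the table is missing altogether, and an empty list from `sequence_differences` means the two files carry the same sequence.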
Do I understand it correctly that the procedure you suggest assumes I downloaded a PDB file from AlphaFold-DB? I can do this easily. But I just wanted to clarify that what I did initially was fetch the AlphaFold prediction directly from within ChimeraX (open <uniprot-id> from alphafold), dock it in the map, trim residues not supported by density, and finally save it as mmCIF. Unfortunately without the options suggested by Greg, because at the time I didn’t know I would lose the sequence information when saving with default options. And when I noticed, I thought there would be an easy way to fix it (which prompted my first email in this thread).
This wouldn’t be a big deal if I was going to be the only person ever seeing these files, because the PDB deposition portal takes an atomic model and a full sequence and produces the correct result. But in this particular case I will share these files with other people before deposition, so I want to fix the sequence so everybody can make sense of these files unambiguously.
Thank you again, and I will report next week. Have a good weekend,
Guillaume
On 1 Dec 2023, at 22:27, Tom Goddard <goddard@sonic.net> wrote:
Hi Guillaume,
You can get the full sequence into your refined mmCIF file with the following cumbersome procedure. If you open your original full-length AlphaFold .pdb file and save it as .cif in ChimeraX then ChimeraX won't put the sequence info in the file, even if you haven't deleted any residues, because the original AlphaFold .pdb file does not have SEQRES records. But the ChimeraX .pdb file writer works better. If you open your original full AlphaFold .pdb model, then save it as PDB in ChimeraX it will add the SEQRES records. Then you can copy those SEQRES records (a few lines at the top of the file) using a text editor into your refined .pdb file which does not have all the residues. Since you are using mmCIF for your refined model, you would first open that and save it as .pdb format in order to add the SEQRES, then you could resave that .pdb as .cif.
Here are the ChimeraX commands, assuming fullalphafold.pdb is the original AlphaFold model, and refined.cif is your refined model which lacks the full sequence.
open refined.cif
save refined_noseq.pdb
close

open fullalphafold.pdb
save alphafold_withseq.pdb
close

# In a text editor copy the SEQRES lines from alphafold_withseq.pdb to refined_noseq.pdb to make refined_withseq.pdb
# The SEQRES lines look like:
# SEQRES 1 A 125 GLY HIS MET HIS ASP CYS HIS GLN VAL THR VAL SER ARG
# SEQRES 2 A 125 ASP VAL THR LEU GLN ASN LYS GLU ARG HIS ASP CYS ASN
# SEQRES 3 A 125 GLN VAL CYS ALA SER ILE ASP LYS GLU THR GLU ASN LYS
# ...

open refined_withseq.pdb
save refined_withseq.cif
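The text-editor step in the procedure above could also be scripted. What follows is only a minimal Python sketch, not part of ChimeraX: the function names `copy_seqres` and `write_with_seqres` are hypothetical, and it assumes the chain IDs in both files match and that SEQRES belongs before the first secondary-structure, crystallographic, or coordinate record.

```python
# Hypothetical helper, not a ChimeraX function: copy SEQRES records
# from one PDB file into another that lacks them.

# Record types that come after SEQRES in a PDB file.
LATER_RECORDS = {"HET", "HETNAM", "HETSYN", "FORMUL", "HELIX", "SHEET",
                 "SSBOND", "LINK", "CISPEP", "SITE", "CRYST1", "MODEL",
                 "ATOM", "HETATM"}
LATER_PREFIXES = ("ORIGX", "SCALE", "MTRIX")


def copy_seqres(source_lines, target_lines):
    """Return target_lines with the SEQRES records of source_lines inserted."""
    seqres = [line for line in source_lines if line.startswith("SEQRES")]
    # Drop any SEQRES already in the target so records are not duplicated.
    kept = [line for line in target_lines if not line.startswith("SEQRES")]
    for i, line in enumerate(kept):
        name = line[:6].strip()
        if name in LATER_RECORDS or name.startswith(LATER_PREFIXES):
            return kept[:i] + seqres + kept[i:]
    # No later record found: append at the end.
    return kept + seqres


def write_with_seqres(source_path, target_path, output_path):
    """File-based wrapper around copy_seqres."""
    with open(source_path) as f:
        source_lines = f.readlines()
    with open(target_path) as f:
        target_lines = f.readlines()
    with open(output_path, "w") as f:
        f.writelines(copy_seqres(source_lines, target_lines))
```

With the file names used in the commands above, the call would be `write_with_seqres("alphafold_withseq.pdb", "refined_noseq.pdb", "refined_withseq.pdb")`, after which refined_withseq.pdb can be opened and resaved as .cif.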
I'm kind of surprised that the "bestGuess" option Greg described does not actually put the entity_poly_seq table in the mmCIF. If it did you could avoid all the monkey-business of converting to .pdb format, and just output the original AlphaFold model with bestGuess true as mmCIF and copy its entity_poly_seq table to your refined .cif file. But at least you can get the job done using the old .pdb format.
Tom
On Dec 1, 2023, at 12:43 AM, Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
Thank you, good to know!
I think in this case I can't get my full-length (FL) sequence with ChimeraX without losing a lot of my refinement work, because I saved my progress (without using these options...) while working in ISOLDE, re-opening the work-in-progress model every time I resumed working on it. But I will try the program in Phenix.
Cheers,
Guillaume
From: Greg Couch <gregc@cgl.ucsf.edu> Sent: Friday, December 1, 2023 1:22:34 AM To: ChimeraX Users Help Cc: Guillaume Gaullier Subject: Re: [chimerax-users] How to save a sequence in an mmCIF file?
To answer the general question, ChimeraX doesn't yet have a way to set the full sequence for an mmCIF entity. See https://www.wwpdb.org/deposition/preparing-pdbx-mmcif-files for how to fix the mmCIF output from various refinement packages. For example, Phenix has the "mmtbx.prepare_pdb_deposition program to create a mmCIF file with the sequence".
In this particular case, where the starting structure is an AlphaFold prediction with atoms for every residue in the full sequence, you can get the correct sequence into the mmCIF output with the "bestGuess" option. See https://www.cgl.ucsf.edu/chimerax/docs/user/commands/save.html#mmcif. I'd also recommend, in your case, using the computedSheets option. In older ChimeraX versions, you need to run dssp before using computedSheets to get the helix information. In recent ChimeraX versions (daily build and 1.7 release candidate), computedSheets will also output the helix information if it wasn't present in the input.
Adding a sequence with bestGuess can be deceiving because of missing leading or trailing residues, or gaps of indeterminate length. But in this case, you should be fine.
-- Greg
On 11/29/23 02:45, Guillaume Gaullier wrote:
From: Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu> Subject: [chimerax-users] How to save a sequence in an mmCIF file? Date: November 29, 2023 at 2:45:31 AM PST To: "chimerax-users@cgl.ucsf.edu" <chimerax-users@cgl.ucsf.edu> Reply-To: Guillaume Gaullier <guillaume.gaullier@kemi.uu.se>
Hello,
Starting from an AlphaFold prediction, I refined a model against a map with ISOLDE. I trimmed the segments not supported by any density. The resulting mmCIF file that I saved now opens with this warning:
Unknown polymer entity '1' near line 187 Missing or incomplete entity_poly_seq table. Inferred polymer connectivity.
Displaying the sequence of this chain shows the correct numbering (with jumps in numbering according to missing segments in the structure), but the sequence of the missing structure segments is not displayed.
When I open the FASTA file containing the full-length sequence, the sequence is automatically associated with the structure, and the sequence viewer correctly annotates the segments with missing structure. I would like to save this full-length sequence in my mmCIF file so that it shows up, with the missing structure segments annotated, the next time I open this file. But when I try to save at this point, I get the following notice:
Not saving entity_poly_seq for non-authoritative sequences
The documentation for "save" and "sequence" didn't help. How can I make this sequence "authoritative" and save it into my mmCIF file?
Thank you in advance,
Guillaume
E-mailing Uppsala University means that we will process your personal data. For more information on how this is performed, please read here: http://www.uu.se/en/about-uu/data-protection-policy
ChimeraX-users mailing list -- chimerax-users@cgl.ucsf.edu To unsubscribe send an email to chimerax-users-leave@cgl.ucsf.edu Archives: https://mail.cgl.ucsf.edu/mailman/archives/list/chimerax-users@cgl.ucsf.edu/
Altos Labs UK Limited | England | Company reg 13484917 Registered address: 3rd Floor 1 Ashley Road, Altrincham, Cheshire, United Kingdom, WA14 2DT
I think this happened for the model with which I used swapaa. But it is also very likely that I mis-saved something at some point, because I paused and resumed several times and generated several intermediate files. But good to know that under normal circumstances the sequence should be retained.
Guillaume
________________________________
From: Elaine Meng <meng@cgl.ucsf.edu> Sent: Friday, December 22, 2023 5:14:12 PM To: Guillaume Gaullier Cc: "Vorländer,Matthias Kopano"; ChimeraX Users Help Subject: Re: [chimerax-users] How to save a sequence in an mmCIF file?
Hi Guillaume,
As I understand it, the sequence is never dropped. If you read in an mmCIF that already has the full information, that information is kept. The conversation was regarding new structures being built that did not already have this information.
E.g. if I use
open 1www
seq chain /V
.... you can see the black outline boxes around the residues in the sequence of chain V that are missing from coordinates
save test.cif
close session
open test.cif
seq chain /V
... the whole sequence is still there
Best,
Elaine
-----
Elaine C. Meng, Ph.D. UCSF Chimera(X) team Department of Pharmaceutical Chemistry University of California, San Francisco
On Dec 22, 2023, at 12:22 AM, Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
Thank you Tom. And I agree this functionality shouldn't be tied to alphafold models; all the example uses you describe make a lot of sense and it would be good to have all these options indeed.
Regarding the defaults, I was asking out of curiosity. I understand that there are often good reasons why the defaults are what they are. In this particular case the default behavior of dropping the full-length sequence is a bit surprising (at least to me, not knowing all the other intricacies of the mmCIF format), and on top of this the option to change this default is not straightforward to find, hence my question.
Cheers,
Guillaume
From: Tom Goddard <goddard@sonic.net> Sent: Friday, December 22, 2023 1:14:44 AM To: "Vorländer,Matthias Kopano"; Guillaume Gaullier Cc: ChimeraX Users Help Subject: Re: [chimerax-users] How to save a sequence in an mmCIF file?
Hi Matthias, Guillaume,
I agree ChimeraX needs a way to set the sequence for a chain of an atomic model including residues for which no atomic coordinates exist. I don’t think it should be tied to AlphaFold database models. It would be nice to be able to say copy the sequence of model #1 chain B to model #2 chain C which would allow you to use the full-length alphafold model sequence if that is convenient. But if you have the sequence in a fasta file or simply want to copy and paste the sequence into a ChimeraX command (e.g. sequence copy AFDS…. toChain #2/C) then that should be easy to do also.
I’ve made a ticket to add a command to allow this, and I put your email address on the ticket
https://www.rbvi.ucsf.edu/trac/ChimeraX/ticket/10392
If you have more thoughts about it reply to the email for that enhancement request ticket so we can keep track of the ideas, needs, solutions.
The other suggestion, that ChimeraX should save this info by default (as the save command “bestGuess” option does), can also be discussed on the ticket. But I think it is not so simple. The bestGuess option, I think, uses only the existing residues with atomic coordinates, so it won't handle gaps where you don’t have atomic coordinates for parts of your experimental sample. It is important not to put the wrong information in the file because this causes even more difficult problems later.
Tom
On Dec 21, 2023, at 5:33 AM, Vorländer,Matthias Kopano via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
Hi everyone,
I also just struggled with the demands of CIF file deposition to the PDB for a ~50 chain RNA-protein complex and the time it takes to copy-paste sequences into different web interfaces. Would it be possible to interface the ChimeraX sequence association functionalities to write the associated sequence into the mmCIF file? At the end of building large models using many cryo-EM maps, the structures often have been cut apart and pieced together many times, so my dream scenario would be to use “alphafold match #model trim false” to directly write the full-length sequence of the AlphaFold model into the mmCIF file or PDB header for the matched chain. This would be of huge value for PDB deposition!
Many thanks and best wishes, Matthias
From: Tristan Croll via ChimeraX-users <chimerax-users@cgl.ucsf.edu> Date: Thursday, 21. December 2023 at 14:23 To: Guillaume Gaullier <guillaume.gaullier@kemi.uu.se> Cc: ChimeraX Users Help <chimerax-users@cgl.ucsf.edu> Subject: [chimerax-users] Re: How to save a sequence in an mmCIF file?
Perfect timing - I was just getting ready to upload the final version of ISOLDE 1.7. :) Will add this now.
On Thu, Dec 21, 2023 at 1:14 PM Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote: Hi Tom,
Getting back to this much later than I wanted, sorry. I think it is as simple as this:
To save the final model from ISOLDE I use the command:
isolde write phenixRsrInput #<model-ID> <resolution> #<map-ID> modelFileName model_ISOLDE-done.cif paramFileName model_ISOLDE-done_for-phenix.eff
And it is at this step that the sequence is lost. When I save with ChimeraX's command and the options suggested by Greg, the full sequence that existed in the original file from AlphaFold-DB is correctly retained:
save model.cif models #<model-ID> bestGuess true computedSheets true
So, I need to discipline myself to always save a file with ChimeraX's command in addition to ISOLDE's. That is, until Tristan sees this discussion and adds this to ISOLDE. 😊
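The habit described above, issuing both saves every time, can be wrapped in a small helper. This is a minimal Python sketch that only builds the two command strings quoted in this message; the name `save_commands` and its parameters are hypothetical. Inside ChimeraX, each string could be passed to `run(session, cmd)` from `chimerax.core.commands`.

```python
def save_commands(model_id, map_id, resolution, stem):
    """Build the ISOLDE and ChimeraX save commands for one model.

    model_id and map_id are ChimeraX model numbers without the leading '#';
    stem is the output file name without extension (hypothetical parameter).
    """
    isolde_cmd = (
        f"isolde write phenixRsrInput #{model_id} {resolution} #{map_id} "
        f"modelFileName {stem}_ISOLDE-done.cif "
        f"paramFileName {stem}_ISOLDE-done_for-phenix.eff"
    )
    # The extra ChimeraX save keeps the full sequence and secondary structure.
    chimerax_cmd = (
        f"save {stem}.cif models #{model_id} "
        "bestGuess true computedSheets true"
    )
    return [isolde_cmd, chimerax_cmd]
```

For example, `save_commands("1", "2", 3.2, "model")` returns the ISOLDE command followed by `save model.cif models #1 bestGuess true computedSheets true`.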
Is there a reason why the default values to the options to 'save' don't include the full sequence and secondary structure assignments? Not a big deal once one knows about it, but having the defaults drop some information seems odd.
Have a nice Christmas break.
Guillaume
From: Tom Goddard <goddard@sonic.net> Sent: Friday, December 1, 2023 11:19:39 PM To: Guillaume Gaullier Cc: Greg Couch; ChimeraX Users Help Subject: Re: [chimerax-users] How to save a sequence in an mmCIF file?
Hi Guillaume,
It will be useful to see what problems you found with mmtbx.prepare_pdb_deposition and the ChimeraX mmCIF reader. Thanks in advance for providing that.
My instructions about adding the sequence were for an AlphaFold prediction that you run, which produces a PDB file. But since you instead fetched the initial model from the AlphaFoldDB database, that actually gives an mmCIF file, and that mmCIF file contains the entity_poly_seq table. So it is actually much simpler to add the full sequence in that case. Just open the original AlphaFoldDB file in a text editor, search for the entity_poly_seq table and copy it to your refined mmCIF file.
Now I'm slightly puzzled because if you open the AlphaFoldDB .cif file, delete some residues, then save it as a new .cif file, then that new file does have the entity_poly_seq table with the full sequence and ChimeraX will know the full sequence when you load that new file. But maybe when you refine with ISOLDE it makes a copy of the atomic model which loses the mmCIF entity_poly_seq metadata so that if you save that then the sequence was lost.
So here's an example of how to add back the full sequence
open P12882 from alphafold
# This file is fetched from AlphaFoldDB and the ChimeraX Log shows the file name
# "Fetching compressed AlphaFold P12882 from https://alphafold.ebi.ac.uk/files/AF-P12882-F1-model_v4.cif"
# which ChimeraX puts on your computer in ~/Downloads/ChimeraX/AlphaFold

# Now open AF-P12882-F1-model_v4.cif in a text editor, search for entity_poly_seq table and copy it into your .cif file.
# Here is what that table looks like

#
loop_
_entity_poly_seq.entity_id
_entity_poly_seq.hetero
_entity_poly_seq.mon_id
_entity_poly_seq.num
1 n MET 1
1 n SER 2
1 n SER 3
1 n ASP 4
1 n SER 5
1 n GLU 6
1 n MET 7
1 n ALA 8
1 n ILE 9
1 n PHE 10
1 n GLY 11
...
1 n GLU 1938
1 n GLU 1939
#
Then opening your .cif file in ChimeraX will show the full sequence in the sequence viewer.
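The copy step above can be done without a text editor. The following is a rough Python sketch under stated assumptions, not a ChimeraX or mmCIF-library function: `find_loop` and `copy_loop` are hypothetical names, the table is assumed to be written as a `loop_` with one row per line (as in the example above), and the entity ID in the source file is assumed to match the one used in the refined file.

```python
def find_loop(lines, category):
    """Return (start, end) indices of the loop_ for an mmCIF category, or None."""
    prefix = "_" + category + "."
    for i, line in enumerate(lines):
        if (line.strip() == "loop_" and i + 1 < len(lines)
                and lines[i + 1].startswith(prefix)):
            j = i + 1
            # Skip the column-name lines.
            while j < len(lines) and lines[j].startswith(prefix):
                j += 1
            # Then skip data rows until the next separator or table.
            while j < len(lines):
                s = lines[j].strip()
                if not s or s.startswith(("#", "_", "loop_", "data_")):
                    break
                j += 1
            return i, j
    return None


def copy_loop(source_lines, target_lines, category="entity_poly_seq"):
    """Return target_lines containing the category table from source_lines."""
    span = find_loop(source_lines, category)
    if span is None:
        raise ValueError(f"no {category} loop in source file")
    table = source_lines[span[0]:span[1]]
    old = find_loop(target_lines, category)
    if old is not None:
        # Replace an existing (possibly incomplete) table.
        return target_lines[:old[0]] + table + target_lines[old[1]:]
    for i, line in enumerate(target_lines):
        if line.startswith("data_"):
            # Insert right after the data block header.
            return (target_lines[:i + 1] + ["#\n"] + table + ["#\n"]
                    + target_lines[i + 1:])
    raise ValueError("no data_ block in target file")
```

Reading both files with `readlines()`, passing the lists to `copy_loop`, and writing the result with `writelines()` gives a refined .cif that carries the full-length table.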
Tom
On Dec 1, 2023, at 1:56 PM, Guillaume Gaullier <guillaume.gaullier@kemi.uu.se> wrote:
Thanks Tom, I will try this on Monday!
Using mmtbx.prepare_pdb_deposition from phenix seemed to work, in that the command ran and did not report any error. But opening the cif file produced by this tool in ChimeraX logged more warnings; I will report which ones exactly on Monday when I am at the lab (I don’t have these files at hand now). And displaying its sequence resulted in the same problem as I had initially (the missing residues in the atomic model are not shown at all in the sequence view). I will try to inspect the content of this cif file to check that the phenix tool did its job (maybe I didn’t use it correctly, although it has so few options that this seems unlikely), but it is difficult because the coordinates in this file are no longer the same as in the cif file from AlphaFold-DB, so trying to diff the two files of course results in too much output and is not helpful to quickly identify differences in entity_poly_seq.
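Since a whole-file diff is swamped by coordinate changes, one option is to compare only the entity_poly_seq tables of the two files. Below is a minimal Python sketch of that idea; the function names are hypothetical, and it assumes the table is written as a `loop_` with one whitespace-separated row per line.

```python
def read_entity_poly_seq(lines):
    """Return {(entity_id, num): mon_id} from the entity_poly_seq loop.

    Returns an empty dict if the table is absent.
    """
    prefix = "_entity_poly_seq."
    columns, rows, in_table = [], {}, False
    for line in lines:
        s = line.strip()
        if s.startswith(prefix):
            # Column-name line, e.g. "_entity_poly_seq.mon_id".
            columns.append(s[len(prefix):])
            in_table = True
        elif in_table:
            if not s or s.startswith(("#", "_", "loop_", "data_")):
                break  # end of the table
            fields = dict(zip(columns, s.split()))
            key = (fields["entity_id"], int(fields["num"]))
            rows[key] = fields["mon_id"]
    return rows


def compare_sequences(a, b):
    """List (key, residue_in_a, residue_in_b) wherever the two tables differ."""
    keys = sorted(set(a) | set(b))
    return [(k, a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)]
```

An empty result from `compare_sequences` means both files carry the same sequence table; an empty dict from `read_entity_poly_seq` means the file has no such table in loop form.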
participants (6)
- Elaine Meng
- Greg Couch
- Guillaume Gaullier
- Tom Goddard
- Tristan Croll
- Vorländer,Matthias Kopano