Ha, great! I was just about to open an issue on your github repo. No need then.
Thanks, this is a great Christmas present!
Guillaume
From: Tristan Croll <tcroll@altoslabs.com>
Sent: Thursday, December 21, 2023 2:22:43 PM
To: Guillaume Gaullier
Cc: Tom Goddard; ChimeraX Users Help
Subject: Re: [chimerax-users] Re: How to save a sequence in an mmCIF file?
Perfect timing - I was just getting ready to upload the final version of ISOLDE 1.7. :) Will add this now.
On Thu, Dec 21, 2023 at 1:14 PM Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
Hi Tom,
Getting back to this much later than I wanted, sorry.
I think it is as simple as this:
To save the final model from ISOLDE I use the command:
isolde write phenixRsrInput #<model-ID> <resolution> #<map-ID> modelFileName model_ISOLDE-done.cif paramFileName model_ISOLDE-done_for-phenix.eff
And it is at this step that the sequence is lost. When I save with ChimeraX's command and the options suggested by Greg, the full sequence that existed in the original file from AlphaFold-DB is correctly retained:
save model.cif models #<model-ID> bestGuess true computedSheets true
So, I need to discipline myself to always save a file with ChimeraX's command in addition to ISOLDE's. That is, until Tristan sees this discussion and adds this to ISOLDE. 😊
Is there a reason why the default values of the 'save' options don't include the full sequence and secondary structure assignments? Not a big deal once one knows about it, but having the defaults drop some information seems odd.
Have a nice Christmas break.
Guillaume
From: Tom Goddard <goddard@sonic.net>
Sent: Friday, December 1, 2023 11:19:39 PM
To: Guillaume Gaullier
Cc: Greg Couch; ChimeraX Users Help
Subject: Re: [chimerax-users] How to save a sequence in an mmCIF file?
Hi Guillaume,
It will be useful to see what problems you found with mmtbx.prepare_pdb_deposition and the ChimeraX mmCIF reader. Thanks in advance for providing that.
My instructions about adding the sequence were for an AlphaFold prediction that you run, which produces a PDB file. But since you instead fetched the initial model from the AlphaFoldDB database, that actually gives an mmCIF file, and that mmCIF file contains the entity_poly_seq table. So it is actually much simpler to add the full sequence in that case. Just open the original AlphaFoldDB file in a text editor, search for the entity_poly_seq table and copy it to your refined mmCIF file.
Now I'm slightly puzzled because if you open the AlphaFoldDB .cif file, delete some residues, then save it as a new .cif file, that new file does have the entity_poly_seq table with the full sequence, and ChimeraX will know the full sequence when you load that new file. But maybe when you refine with ISOLDE it makes a copy of the atomic model which loses the mmCIF entity_poly_seq metadata, so that if you save that copy the sequence is lost.
So here's an example of how to add back the full sequence
open P12882 from alphafold
# This file is fetched from AlphaFoldDB and the ChimeraX Log shows the file name
# "Fetching compressed AlphaFold P12882 from https://alphafold.ebi.ac.uk/files/AF-P12882-F1-model_v4.cif"
# which ChimeraX puts on your computer in ~/Downloads/ChimeraX/AlphaFold
# Now open AF-P12882-F1-model_v4.cif in a text editor, search for entity_poly_seq table and copy it into your .cif file.
# Here is what that table looks like
#
loop_
_entity_poly_seq.entity_id
_entity_poly_seq.hetero
_entity_poly_seq.mon_id
_entity_poly_seq.num
1 n MET 1
1 n SER 2
1 n SER 3
1 n ASP 4
1 n SER 5
1 n GLU 6
1 n MET 7
1 n ALA 8
1 n ILE 9
1 n PHE 10
1 n GLY 11
...
1 n GLU 1938
1 n GLU 1939
#
Then opening your .cif file in ChimeraX will show the full sequence in the sequence viewer.
Tom
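A minimal Python sketch of the copy step described above, for anyone who would rather script it than use a text editor. The function names are made up for illustration. The parsing assumes the simple layout shown in the table above (a `loop_` line, one tag per line, one row per line, and a `#` line or the next block ending the table). It is not a general mmCIF parser, and it assumes the refined file holds a single data block.

```python
from pathlib import Path


def extract_category_loop(cif_text: str, category: str = "_entity_poly_seq") -> list[str]:
    """Return the lines of the loop_ table whose tags start with `category`.

    Returns an empty list when no such loop is found.
    """
    lines = cif_text.splitlines()
    prefix = category + "."
    for i, line in enumerate(lines):
        if line.strip() != "loop_":
            continue
        if i + 1 >= len(lines) or not lines[i + 1].strip().startswith(prefix):
            continue
        table = [lines[i]]
        for row in lines[i + 1:]:
            stripped = row.strip()
            # A '#' line or the start of another block ends the table.
            if stripped == "#" or stripped.startswith(("loop_", "data_")):
                break
            # A tag from a different category also ends the table.
            if stripped.startswith("_") and not stripped.startswith(prefix):
                break
            table.append(row)
        return table
    return []


def add_sequence_table(refined_text: str, source_text: str) -> str:
    """Append the source file's entity_poly_seq loop to the refined file's text."""
    if extract_category_loop(refined_text):
        return refined_text  # already has a table, leave it untouched
    table = extract_category_loop(source_text)
    if not table:
        raise ValueError("source file has no entity_poly_seq loop")
    body = refined_text if refined_text.endswith("\n") else refined_text + "\n"
    return body + "\n".join(table) + "\n#\n"


def copy_sequence_table(source_path: str, refined_path: str, output_path: str) -> None:
    """File-based wrapper; all three paths are supplied by the caller."""
    merged = add_sequence_table(Path(refined_path).read_text(), Path(source_path).read_text())
    Path(output_path).write_text(merged)
```

The copied rows keep the entity_id of the AlphaFoldDB file (1 in the table above), so this only makes sense if the refined file uses the same entity id for that chain. Writing to a new output file rather than overwriting the refined model keeps the original intact in case the result does not open cleanly.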
On Dec 1, 2023, at 1:56 PM, Guillaume Gaullier <guillaume.gaullier@kemi.uu.se> wrote:
Thanks Tom, I will try this on Monday!
Using mmtbx.prepare_pdb_deposition from phenix seemed to work, in that the command ran and did not report any error. But opening the cif file produced by this tool in ChimeraX logged more warnings; I will report which ones exactly on Monday when I am at the lab (I don’t have these files at hand now). And displaying its sequence resulted in the same problem as I had initially (the missing residues in the atomic model are not shown at all in the sequence view). I will try to inspect the content of this cif file to check that the phenix tool did its job (maybe I didn’t use it correctly, although it has so few options that this seems unlikely), but it is difficult because the coordinates in this file are no longer the same as in the cif file from AlphaFold-DB, so trying to diff the two files of course results in too much output and is not helpful to quickly identify differences in entity_poly_seq.
Do I understand it correctly that the procedure you suggest assumes I downloaded a PDB file from AlphaFold-DB? I can do this easily. But I just wanted to clarify that what I did initially was fetch the AlphaFold prediction directly from within ChimeraX (open <uniprot-id> from alphafold), dock it in the map, trim residues not supported by density, and finally save it as mmCIF. Unfortunately without the options suggested by Greg, because at the time I didn’t know I would lose the sequence information when saving with default options. And when I noticed, I thought there would be an easy way to fix it (which prompted my first email in this thread).
This wouldn’t be a big deal if I was going to be the only person ever seeing these files, because the PDB deposition portal takes an atomic model and a full sequence and produces the correct result. But in this particular case I will share these files with other people before deposition, so I want to fix the sequence so everybody can make sense of these files unambiguously.
Thank you again, and I will report next week.
Have a good weekend,
Guillaume
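The difficulty mentioned above, that a plain diff of two mmCIF files is swamped by coordinate changes, can be sidestepped by comparing only the entity_poly_seq tables. The following Python sketch does that. The function names are hypothetical, and it assumes the table is written as a loop with one row per line and whitespace-separated values, as in files from AlphaFoldDB. A file written differently would need a real mmCIF parser.

```python
def read_poly_seq(cif_text: str) -> dict[str, list[str]]:
    """Return {entity_id: [mon_id, ...]} read from the entity_poly_seq loop.

    Rows are taken in file order. An empty dict means no table was found.
    """
    tags: list[str] = []
    sequences: dict[str, list[str]] = {}
    in_table = False
    for line in cif_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("_entity_poly_seq."):
            tags.append(stripped.split()[0])
            in_table = True
            continue
        if in_table:
            # Blank line, '#', another tag or a new block ends the table.
            if not stripped or stripped.startswith(("#", "_", "loop_", "data_")):
                break
            row = dict(zip(tags, stripped.split()))
            entity = row.get("_entity_poly_seq.entity_id", "?")
            sequences.setdefault(entity, []).append(row.get("_entity_poly_seq.mon_id", "?"))
    return sequences


def compare_sequences(a_text: str, b_text: str) -> list[str]:
    """Summarise, per entity, whether the two files carry the same sequence."""
    a, b = read_poly_seq(a_text), read_poly_seq(b_text)
    report = []
    for entity in sorted(set(a) | set(b)):
        seq_a, seq_b = a.get(entity, []), b.get(entity, [])
        if seq_a == seq_b:
            report.append(f"entity {entity}: identical ({len(seq_a)} residues)")
        else:
            report.append(f"entity {entity}: differs ({len(seq_a)} vs {len(seq_b)} residues)")
    if not report:
        report.append("no entity_poly_seq table in either file")
    return report
```

An entity reported with 0 residues on one side means that file has no sequence rows for it, which is the situation the "Missing or incomplete entity_poly_seq table" warning points to.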
On 1 Dec 2023, at 22:27, Tom Goddard <goddard@sonic.net> wrote:
Hi Guillaume,
You can get the full sequence into your refined mmCIF file with the following cumbersome procedure. If you open your original full-length AlphaFold .pdb file and save it as .cif in ChimeraX then ChimeraX won't put the sequence info in the file, even if you haven't deleted any residues, because the original AlphaFold .pdb file does not have SEQRES records. But the ChimeraX .pdb file writer works better. If you open your original full AlphaFold .pdb model, then save it as PDB in ChimeraX it will add the SEQRES records. Then you can copy those SEQRES records (a few lines at the top of the file) using a text editor into your refined .pdb file which does not have all the residues. Since you are using mmCIF for your refined model, you would first open that and save it as .pdb format in order to add the SEQRES, then you could resave that .pdb as .cif.
Here are the ChimeraX commands, assuming fullalphafold.pdb is the original AlphaFold model, and refined.cif is your refined model which lacks the full sequence.
open refined.cif
save refined_noseq.pdb
close
open fullalphafold.pdb
save alphafold_withseq.pdb
close
# In a text editor copy the SEQRES lines from alphafold_withseq.pdb to refined_noseq.pdb to make refined_withseq.pdb
# The SEQRES lines look like:
# SEQRES 1 A 125 GLY HIS MET HIS ASP CYS HIS GLN VAL THR VAL SER ARG
# SEQRES 2 A 125 ASP VAL THR LEU GLN ASN LYS GLU ARG HIS ASP CYS ASN
# SEQRES 3 A 125 GLN VAL CYS ALA SER ILE ASP LYS GLU THR GLU ASN LYS
# ...
open refined_withseq.pdb
save refined_withseq.cif
I'm kind of surprised that the "bestGuess" option Greg described does not actually put the entity_poly_seq table in the mmCIF. If it did you could avoid all the monkey-business of converting to .pdb format, and just output the original AlphaFold model with bestGuess true as mmCIF and copy its entity_poly_seq table to your refined .cif file. But at least you can get the job done using the old .pdb format.
Tom
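The text-editor step in the procedure above (copying SEQRES lines from alphafold_withseq.pdb into refined_noseq.pdb) could also be scripted. This is a rough Python sketch with made-up function names. It places the SEQRES records ahead of the first record type that normally follows them in a PDB file, and it assumes the chain IDs in the two files already match.

```python
from pathlib import Path

# Record types that come after SEQRES in the usual PDB record order.
# "HET" also matches HETNAM, HETSYN and HETATM because of prefix matching.
LATER_RECORDS = (
    "MODRES", "HET", "FORMUL", "HELIX", "SHEET", "SSBOND", "LINK", "CISPEP",
    "SITE", "CRYST1", "ORIGX", "SCALE", "MTRIX", "MODEL", "ATOM",
)


def copy_seqres(source_pdb_text: str, target_pdb_text: str) -> str:
    """Return the target PDB text with the source file's SEQRES records inserted."""
    seqres = [l for l in source_pdb_text.splitlines() if l.startswith("SEQRES")]
    if not seqres:
        raise ValueError("source file has no SEQRES records")
    # Drop any SEQRES records already in the target so they are not duplicated.
    target = [l for l in target_pdb_text.splitlines() if not l.startswith("SEQRES")]
    insert_at = next(
        (i for i, l in enumerate(target) if l.startswith(LATER_RECORDS)),
        len(target),
    )
    merged = target[:insert_at] + seqres + target[insert_at:]
    return "\n".join(merged) + "\n"


def copy_seqres_files(source_path: str, target_path: str, output_path: str) -> None:
    """File-based wrapper, e.g. alphafold_withseq.pdb + refined_noseq.pdb -> refined_withseq.pdb."""
    merged = copy_seqres(Path(source_path).read_text(), Path(target_path).read_text())
    Path(output_path).write_text(merged)
```

The result would then be opened in ChimeraX and saved as .cif, as in the last two commands above.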
On Dec 1, 2023, at 12:43 AM, Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
Thank you, good to know!
I think in this case I can't get my FL sequence with ChimeraX without losing a lot of my refinement work, because I saved my progress (without using these options...) while working in ISOLDE, re-opening the work-in-progress model every time I resumed working on it. But I will try the program in phenix.
Cheers,
Guillaume
From: Greg Couch <gregc@cgl.ucsf.edu>
Sent: Friday, December 1, 2023 1:22:34 AM
To: ChimeraX Users Help
Cc: Guillaume Gaullier
Subject: Re: [chimerax-users] How to save a sequence in an mmCIF file?
To answer the general question, ChimeraX doesn't yet have a way to set the full sequence for a mmCIF entity. See https://www.wwpdb.org/deposition/preparing-pdbx-mmcif-files for how to fix the mmCIF output from various refinement packages. For example, Phenix has the "mmtbx.prepare_pdb_deposition program to create a mmCIF file with the sequence".
In this particular case, where the starting structure is an AlphaFold prediction with atoms for every residue in the full sequence, you can get the correct sequence into the mmCIF output with the "bestGuess" option. See https://www.cgl.ucsf.edu/chimerax/docs/user/commands/save.html#mmcif. I'd also recommend, in your case, using the computedSheets option. In older ChimeraX versions, you need to run dssp before using computedSheets to get the helix information. In recent ChimeraX versions (daily build and 1.7 release candidate), computedSheets will also output the helix information if it wasn't present in the input.
Adding a sequence with bestGuess can be deceiving because of missing leading or trailing residues, or gaps of indeterminate length. But in this case, you should be fine.
-- Greg
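To illustrate the caveat above about guessing a sequence from the residues present in a model, here is a toy Python example. It is not how ChimeraX's bestGuess option works internally. The function name and the data are invented, and it only shows what information is available when nothing but the modelled residues is known.

```python
def infer_from_observed(observed: dict[int, str]) -> tuple[list[str], list[tuple[int, int]]]:
    """Return the modelled residues in numbering order and the internal numbering gaps.

    `observed` maps residue number to residue name for modelled residues only.
    """
    numbers = sorted(observed)
    gaps = [(a + 1, b - 1) for a, b in zip(numbers, numbers[1:]) if b - a > 1]
    return [observed[n] for n in numbers], gaps


# Invented 10-residue full-length sequence, numbered from 1.
full = ["MET", "SER", "SER", "ASP", "SER", "GLU", "MET", "ALA", "ILE", "PHE"]

# After trimming, only residues 3-5 and 8 remain in the model.
model = {n: full[n - 1] for n in (3, 4, 5, 8)}

guessed, gaps = infer_from_observed(model)
# The numbering reveals an internal gap at 6-7, but not which residues fill it,
# and nothing in the model shows that residues 9-10 ever existed.
missing_from_guess = len(full) - len(guessed)
```

With a complete AlphaFold prediction every residue is modelled, so the guess and the full sequence coincide, which is why the option is safe in that case.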
On 11/29/23 02:45, Guillaume Gaullier wrote:
From: Guillaume Gaullier via ChimeraX-users <chimerax-users@cgl.ucsf.edu>
Subject: [chimerax-users] How to save a sequence in an mmCIF file?
Date: November 29, 2023 at 2:45:31 AM PST
Reply-To: Guillaume Gaullier <guillaume.gaullier@kemi.uu.se>
Hello,
Starting from an AlphaFold prediction, I refined a model against a map with ISOLDE. I trimmed the segments not supported by any density. The resulting mmCIF file that I saved now opens with this warning:
Unknown polymer entity '1' near line 187
Missing or incomplete entity_poly_seq table. Inferred polymer connectivity.
Displaying the sequence of this chain shows the correct numbering (with jumps in numbering according to missing segments in the structure), but the sequence of the missing structure segments is not displayed.
When I open the fasta file containing the full-length sequence, the sequence gets automatically associated to the structure, and the sequence viewer annotates the segments with missing structure correctly. I would like to save this full-length sequence in my mmCIF file so the full-length sequence with annotated missing structure segments shows up next time I open this file. But when I try to save at this point, I get the following notice:
Not saving entity_poly_seq for non-authoritative sequences
The documentation for "save" and "sequence" didn't help. How can I make this sequence "authoritative" and save it into my mmCIF file?
Thank you in advance,
Guillaume
E-mailing Uppsala University means that we will process your personal data. For more information on how this is performed, please read here: http://www.uu.se/en/about-uu/data-protection-policy
ChimeraX-users mailing list -- chimerax-users@cgl.ucsf.edu
To unsubscribe send an email to chimerax-users-leave@cgl.ucsf.edu
Archives: https://mail.cgl.ucsf.edu/mailman/archives/list/chimerax-users@cgl.ucsf.edu/
CAUTION: Do not click on links or open attachments unless you recognise the sender and know the content is safe.
Altos Labs UK Limited | England | Company reg 13484917
Registered address: 3rd Floor 1 Ashley Road, Altrincham, Cheshire, United Kingdom, WA14 2DT