ChimeraX generated mmCIF files not suitable for PDB deposition

Dear ChimeraX team,
Structure files in .cif format that are generated by ChimeraX are unfortunately not suitable for deposition to the PDB. This is due to the lack of the column _entity_poly.pdbx_strand_id in the header. The column is usually found in the header in this table and contains the chain identifier (e.g. 'A' for chain A):
loop_
_entity_poly.entity_id
_entity_poly.nstd_monomer
_entity_poly.type
_entity_poly.pdbx_seq_one_letter_code_can
Since ChimeraX otherwise does an excellent job of converting PDB files to mmCIF, would it be possible to have ChimeraX include this column in the header of mmCIF files? This would greatly ease the tiresome process of depositing models to the PDB.
Many thanks in advance,
Best,
Matthias
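
A quick way to see whether a given file carries the column in question is to list the data items of the entity_poly category. The following is a minimal sketch, not part of ChimeraX or any PDB tool: the function names `category_items` and `has_item` are made up for illustration, the example row (including the sequence MKT) is invented, and the scan is purely line-based, so it assumes item names start their own line and would be fooled by item-like text inside a multi-line value.

```python
def category_items(cif_text, category):
    """Return the data item names of one mmCIF category, in file order."""
    prefix = "_" + category.lower() + "."
    items = []
    for line in cif_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith(prefix):
            # The item name is the first whitespace-separated token.
            items.append(stripped.split()[0])
    return items


def has_item(cif_text, item):
    """True if the full item name (category.item) occurs in the text."""
    bare = item.lstrip("_").lower()
    category = bare.split(".")[0]
    wanted = "_" + bare
    return any(name.lower() == wanted
               for name in category_items(cif_text, category))


# Invented example mirroring the table quoted in the message above.
example = """\
loop_
_entity_poly.entity_id
_entity_poly.nstd_monomer
_entity_poly.type
_entity_poly.pdbx_seq_one_letter_code_can
1 no polypeptide(L) MKT
"""

print(category_items(example, "entity_poly"))
print(has_item(example, "_entity_poly.pdbx_strand_id"))  # False
```

For real files, a proper mmCIF parser is the safer choice; the sketch only illustrates what "the column is missing" means at the file level.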

In reply to Vorländer, Matthias Kopano via ChimeraX-users (7/11/24 12:52):
The PDB's documentation for entity_poly.pdbx_strand_id says it is not required -- https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_entity_poly.p.... And the documentation describes it as: "The PDB strand/chain id(s) corresponding to this polymer entity." That seems poorly defined. I'll ask the PDB for clarification. I am surprised that the PDB is requiring entity_poly.pdbx_strand_id, since it should be computable from the rest of the file. And if they computed it, there would be no ambiguities.
But even if that is fixed, I suspect that the PDB will still have issues with the fact that ChimeraX outputs all sheet ids as unknown. That is because ChimeraX currently does not keep track of the sheets, only the strands. To get unique sheet ids, give the "computedSheets true" option -- "save foobar.cif computedSheets true". Then the mmCIF writer will run the DSSP computation and use that to generate the sheet/strand information. But that overrides any choices that were in the input file.
HTH,
Greg
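
To illustrate the remark that the strand ids should be computable from the rest of the file, here is a rough sketch that collects, for each entity, the chain ids found in the atom_site loop. It rests on several assumptions that the thread does not confirm: that "strand id" means the author chain id (auth_asym_id), that atom_site is written as a loop with one row per line, and that no value contains spaces. The function name `strand_ids_by_entity` and the example rows are invented, and the sketch does not distinguish polymer entities from ligands or water.

```python
def strand_ids_by_entity(cif_text):
    """Map each label_entity_id to a comma-separated list of auth_asym_id."""
    columns = []          # atom_site item names, in column order
    chains = {}           # entity id -> chain ids in first-seen order
    entity_col = chain_col = None
    for line in cif_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("_atom_site."):
            columns.append(stripped.split()[0])
            continue
        if not columns:
            continue      # atom_site loop not reached yet
        if (not stripped or stripped[0] in "#_"
                or stripped.startswith(("loop_", "data_"))):
            break         # end of the atom_site rows
        if entity_col is None:
            entity_col = columns.index("_atom_site.label_entity_id")
            chain_col = columns.index("_atom_site.auth_asym_id")
        fields = stripped.split()
        if len(fields) != len(columns):
            continue      # row this simple splitter cannot handle
        seen = chains.setdefault(fields[entity_col], [])
        if fields[chain_col] not in seen:
            seen.append(fields[chain_col])
    return {entity: ",".join(ids) for entity, ids in chains.items()}


# Invented example: entity 1 occurs as chains A and B, entity 2 as chain C.
example = """\
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_entity_id
_atom_site.label_seq_id
_atom_site.auth_asym_id
ATOM 1 N MET A 1 1 A
ATOM 2 CA MET A 1 1 A
ATOM 3 N MET B 1 1 B
ATOM 4 N GLY C 2 1 C
#
"""

print(strand_ids_by_entity(example))  # {'1': 'A,B', '2': 'C'}
```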

In reply to Greg Couch (Thursday, July 11, 2024 at 22:55):
Hi Greg,
Thanks a lot for the clarification. I can only say from experience that ChimeraX mmCIFs throw an error upon upload due to the lack of entity_poly.pdbx_strand_id. If I copy it from a previous PDB-generated file into the header using a text editor, the upload works fine. Regarding the sheets, the lack of sheets does not generate an upload error for the PDB. I tried the "computeSheets true" option through the GUI and it never worked. I just found a bug: when you click the "Computed sheets" check box in the save dialog, ChimeraX passes the wrong command:
save "Models/For_Deposition/test.cif" models #2 fixedWidth false computeSheets true
Expected a keyword
If I manually edit it to "computedSheet" then the saving works.
Thanks and best wishes,
Matthias

In reply to Vorländer, Matthias Kopano (7/11/24 14:34):
Please feel free to use the "Help / Report a Bug" dialog when you encounter bugs like "computeSheets" vs "computedSheets". In this case, however, it's fixed in the next daily build.
-- Greg

I talked to the PDB about this error. The PDB's OneDep deposition system generates the entity_poly.pdbx_strand_id data item if there is associated entity_poly data in the file or if the full sequence is given as part of the deposition process. So I suspect that the real bug is that sequence information is missing for some or all of the chains in your molecule. By default, ChimeraX does not save the entity_poly table for chains if it wasn't read in, which can lead to the error. If this is not the case, then please share with me one of the problematic mmCIF files, so I can reproduce the exact problem. Or you could work with the PDB directly.
There are several ways to get the sequence information into the mmCIF output:
(A) If you are using phenix, you can post-process the mmCIF file with phenix's pdb_extract tool, using the full sequence.
(B) If you are using ChimeraX 1.9 (currently the daily build), after you open the full sequence, you can use the "sequence update" command, or use the context menu in the Sequence Viewer and the Structure / Update Chain Sequence dialog.
(C) If the sequence of the molecule is the full sequence, then you can use "save model.cif bestGuess true".
bestGuess can't tell how big gaps are, nor fill in the missing head or tail residues, so you're better off with A or B.
Consequently, at this time, there are no plans to add the entity_poly.pdbx_strand_id data item to ChimeraX's mmCIF output.
HTH,
Greg
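
The save options mentioned in this thread (computedSheets, bestGuess, and the models and fixedWidth keywords from the GUI-generated command) can be assembled into a command string from a script. The helper below is only a sketch: `build_save_command` is an invented name, it covers just the keywords quoted in the thread, and it does not escape quote characters in the file name. Inside ChimeraX the resulting string could be passed to `run(session, cmd)` from `chimerax.core.commands`; that call is left in a comment so the example runs without ChimeraX.

```python
def build_save_command(path, models=None, fixed_width=None,
                       computed_sheets=False, best_guess=False):
    """Build a ChimeraX 'save' command using keywords quoted in the thread."""
    parts = ["save", '"%s"' % path]
    if models:
        parts += ["models", models]           # e.g. "#2"
    if fixed_width is not None:
        parts += ["fixedWidth", "true" if fixed_width else "false"]
    if computed_sheets:
        parts += ["computedSheets", "true"]   # note the 'd': computedSheets
    if best_guess:
        parts += ["bestGuess", "true"]
    return " ".join(parts)


cmd = build_save_command("model.cif", best_guess=True)
print(cmd)  # save "model.cif" bestGuess true

# Inside ChimeraX (assumption: a 'session' object is available):
#   from chimerax.core.commands import run
#   run(session, cmd)
```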
Participants (2): Greg Couch; Vorländer, Matthias Kopano