
Hi, I'm trying to find missing atoms in certain residues. I know they are in the pdb header information, but is there a way I can use Chimera or python script to access this using the command line only? I have a large number of proteins to go through, so checking them all individually is not really efficient. Korbin West

Hi Korbin, I don't know if this meets your needs, but: In Chimera, the attached python script findIncomplete.py selects templated residues (e.g. standard amino acids) that are missing any nonhydrogen atoms. It can only select atoms that are actually present, so it won't directly identify which atoms are missing. You could write a list of the selection (e.g. command "writesel") but you would need to infer which atoms were missing from that list. Also, this script wouldn't report residues that are completely missing. I hope this helps, Elaine ---------- Elaine C. Meng, Ph.D. UCSF Computer Graphics Lab (Chimera team) and Babbitt Lab Department of Pharmaceutical Chemistry University of California, San Francisco On Nov 20, 2014, at 7:25 AM, Korbin West <khwest16@wabash.edu> wrote:
Hi, I'm trying to find missing atoms in certain residues. I know they are in the pdb header information, but is there a way I can use Chimera or python script to access this using the command line only? I have a large number of proteins to go through, so checking them all individually is not really efficient. Korbin West

On Nov 20, 2014, at 7:25 AM, Korbin West <khwest16@wabash.edu> wrote:
Hi,
I'm trying to find missing atoms in certain residues. I know they are in the pdb header information, but is there a way I can use Chimera or python script to access this using the command line only? I have a large number of proteins to go through, so checking them all individually is not really efficient.
Every Molecule model has a pdbHeaders attribute that is a dictionary whose keys are PDB records types (e.g. "REMARK", "HETNAM") and whose values are lists of text lines for that record type. Here's some sample code that would print out all the REMARK records of each model, prefixed by the model name: from chimera import openModels, Molecule for m in openModels.list(modelTypes=[Molecule]): try: for rec in m.pdbHeaders["REMARK"]: print m.name, rec except KeyError: print m.name, "has no REMARK records" Maybe you can adapt the above into a script that does what you need. --Eric Eric Pettersen UCSF Computer Graphics Lab http://www.cgl.ucsf.edu

This was on my ToDo list ... so here you go. If you are interested only in a list of missing atoms or residues as defined in the header, you don't need Chimera at all. These are standard strings in the PDB header. Just use grep: $ grep -H "REMARK 465 " *.pdb or $ grep -H "REMARK 470 " *.pdb Option -c gives you a count of how many REMARK records of this type are in each file, for a more concise output. (cf. http://www.wwpdb.org/documentation/format33/remarks2.html) You might also want to consider REMARKs 475 and 480 ... fantasy coordinates. On the other hand, if you don't trust the PDB header records (I myself wouldn't), and want to compare with a topology library and list all missing atoms, a small modification of Elaine's script will work. If its only "certain"(?) residues you are looking for, add a test for "r.type" after "for r in m.residues:" Cheers, B. Disclaimer: I am new to this, so if this code can be improved I am happy to learn :-) # ================================================================= # listMissingAtoms.py # # Compares the template for each residue in all models with the # actually existing atoms. Lists all atoms that are missing # from the model. Considers only existing residues and does not # consider header REMARKs. # Test with PDB files 4CH8, 3G85 and 4UMN in a local directory. # 3G85 has missing residues, but all observed residues are complete. import chimera import os from os.path import expanduser listH = False # Set to True if missing H atoms should also be listed # close all currently open models for x in chimera.openModels.list(all=True): chimera.openModels.close(x) os.chdir(expanduser("~") + "/test") # cd to a subdirectory of my home directory pdbFiles = [f for f in os.listdir('.') if f.endswith(".pdb")] out = '' for pdb in pdbFiles: chimera.openModels.open(os.getcwd() + "/" + pdb) for m in chimera.openModels.list(modelTypes=[chimera.Molecule]): missing = [] for r in m.residues: tmplRes = chimera.restmplFindResidue(r.type, False, False) atoms = '' if not tmplRes: continue # this residue type is not in the template library foundAt = r.atomNames() # atoms in pdbfile for templAt in tmplRes.atomsMap.keys(): # atoms in template if ((templAt[0] != "H" or listH) and templAt not in foundAt): # atom in template is not in pdbfile # add this atom to the string atoms += templAt + " " if atoms: # string is not empty res = ("%3s %4d%1s%1s: " % (r.type, r.id.position, r.id.insertionCode if r.id.insertionCode else " ", r.id.chainId) + atoms) # add this residue information to "missing" missing.append(res) if missing: # mising atoms were found ... # add the pdb filename to the output string out += "\n\n" + pdb + "\n" for line in missing: # add all residues to output string out += " " + line + "\n" for x in chimera.openModels.list(all=True): chimera.openModels.close(x) print(out) # === END ========================================================= On Nov 20, 2014, at 1:48 PM, Eric Pettersen <pett@cgl.ucsf.edu> wrote:
On Nov 20, 2014, at 7:25 AM, Korbin West <khwest16@wabash.edu> wrote:
Hi,
I'm trying to find missing atoms in certain residues. I know they are in the pdb header information, but is there a way I can use Chimera or python script to access this using the command line only? I have a large number of proteins to go through, so checking them all individually is not really efficient.
Every Molecule model has a pdbHeaders attribute that is a dictionary whose keys are PDB records types (e.g. "REMARK", "HETNAM") and whose values are lists of text lines for that record type. Here's some sample code that would print out all the REMARK records of each model, prefixed by the model name:
from chimera import openModels, Molecule for m in openModels.list(modelTypes=[Molecule]): try: for rec in m.pdbHeaders["REMARK"]: print m.name, rec except KeyError: print m.name, "has no REMARK records"
Maybe you can adapt the above into a script that does what you need.
--Eric
Eric Pettersen UCSF Computer Graphics Lab http://www.cgl.ucsf.edu
_______________________________________________ Chimera-users mailing list Chimera-users@cgl.ucsf.edu http://plato.cgl.ucsf.edu/mailman/listinfo/chimera-users
participants (4)
-
Boris Steipe
-
Elaine Meng
-
Eric Pettersen
-
Korbin West