Re: [Chimera-users] [chimera-dev] distance measurements

Hi Feixia!
Yes, you could use Chimera to measure all lysine CA-CA distances in many structures, but it would require some scripting to loop through the structures, find the lysines, and do the measurements.
There is some information on looping through structures and running Chimera commands here:
<http://www.rbvi.ucsf.edu/chimera/docs/ProgrammersGuide/basicPrimer.html>
Now, for each structure, you might imagine the script should first find all the lysines and then use the “distance” command on each pairwise combination, substituting in the proper residue numbers. However, there is an easier way with the “findclash” command. You can use it to make multiple distance measurements in a single command. For example:
open 2gbp
findclash :lys@ca test self overlap -1000 log true
… will measure all distances among lysine CA atoms in structure 2gbp and list results in the Reply Log (open from Favorites menu). The -1000 in the command says to measure the distance even if the atoms are “overlapping” by -1000, that is, even if they are about 1000 angstroms apart, which in practice includes every pair.
The last column in the results is the atom-atom distance, and the results are given in order of increasing distance, in this case:
464 contacts
atom1         atom2         overlap  distance
LYS 191.A CA  LYS 189.A CA   -1.802     5.562
LYS 227.A CA  LYS 223.A CA   -2.137     5.897
LYS 300.A CA  LYS 246.A CA   -2.800     6.560
LYS 169.A CA  LYS 164.A CA   -4.676     8.436
LYS 270.A CA  LYS 276.A CA   -4.677     8.437
[… several lines removed …]
LYS 276.A CA  LYS 203.A CA  -53.705    57.465
LYS 276.A CA  LYS 137.A CA  -56.756    60.516
LYS 61.A CA   LYS 137.A CA  -56.829    60.589
LYS 58.A CA   LYS 137.A CA  -59.598    63.358
464 contacts
The log contents can be saved to a text file, for example see
<http://plato.cgl.ucsf.edu/pipermail/chimera-users/2008-October/003184.html>
However, your log would have a huge amount of results for all structures and it might be hard to tell which results go with which structures. Another possibility would be to save a separate file of results for each structure, in which case your script would need to substitute in an output filename, e.g.
findclash :lys@ca test self overlap -1000 saveFile ~/Desktop/2gbp.log
See the findclash documentation for all the options. Then you can embed the command in the Python looping script as described in the first link above.
<http://www.rbvi.ucsf.edu/chimera/docs/UsersGuide/midas/findclash.html>
I hope this helps,
Elaine
-----
Elaine C. Meng, Ph.D.
UCSF Computer Graphics Lab (Chimera team) and Babbitt Lab
Department of Pharmaceutical Chemistry
University of California, San Francisco
On Aug 18, 2015, at 8:36 AM, Feixia <feixia.chu@unh.edu> wrote:
Hi there,
I am interested in retrieving distance information from a large dataset in an automated fashion. For instance, can we use Chimera to get the distances between lysine alpha-carbons in current PDB entries? Presumably, we can download all PDB structures to our local desktop and just call functions on one structure at a time. I wonder if we can do that with Chimera. Your advice will be highly appreciated.
Best,
Feixia
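Once each structure has its own results file, the distances still have to be read back in for analysis. A minimal Python sketch of that step follows. It assumes every contact line has the eight whitespace-separated fields seen in the 2gbp listing above (residue name, number.chain and atom name for each atom, then overlap and distance) and skips anything else, such as the "464 contacts" and column-header lines. The names Contact and parse_contacts are illustrative only and are not part of Chimera; the sketch is meant for a standalone Python 3 interpreter.

```python
from typing import List, NamedTuple


class Contact(NamedTuple):
    atom1: str       # e.g. "LYS 191.A CA"
    atom2: str       # e.g. "LYS 189.A CA"
    overlap: float   # findclash overlap value
    distance: float  # atom-atom distance in angstroms


def parse_contacts(text: str) -> List[Contact]:
    """Extract contact rows from findclash-style output (assumed layout)."""
    contacts = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) != 8:
            continue  # header, count line, or blank line
        try:
            overlap = float(tokens[6])
            distance = float(tokens[7])
        except ValueError:
            continue  # eight tokens but not a data row
        contacts.append(
            Contact(" ".join(tokens[:3]), " ".join(tokens[3:6]), overlap, distance)
        )
    return contacts


# Two rows copied from the 2gbp example in the message above.
SAMPLE = """464 contacts
atom1 atom2 overlap distance
LYS 191.A CA LYS 189.A CA -1.802 5.562
LYS 227.A CA LYS 223.A CA -2.137 5.897
"""

if __name__ == "__main__":
    for contact in parse_contacts(SAMPLE):
        print(contact.atom1, contact.atom2, contact.distance)
```

Keeping the parser tolerant of non-data lines means the same function can be pointed at a saved Reply Log or at a saveFile output, provided the row layout assumption holds.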

Hi Feixia,
While findclash in Chimera will do, I believe you need a more lightweight tool suitable for high-throughput jobs. Chimera has a lot of overhead parsing the structure, and the Python backend is not very efficient for the large number of calculations your case requires. The PDB has ~110,000 structures, and even if it takes 1 s to parse one structure it will still take about 30 hours to loop through all of them. Therefore, I suggest you apply the right tool for the job, which in this case is the "ncont" tool from the CCP4 crystallographic suite, available for free from http://www.ccp4.ac.uk/download/
For my test set of 20 PDB files it was about 15 times faster than Chimera.
An ncont input file for checking all :LYS@CA against all :LYS@CA would look like:
------ncont.inp-------
source //(LYS)/CA
target //(LYS)/CA
maxdist 1000
sort distance inc
----------------------
You can run this file with the command:
ncont xyzin structure.pdb < ncont.inp > ncont.out
If you want to run it on every PDB file in the folder, just run a loop:
for f in *.pdb; do ncont xyzin "$f" < ncont.inp > "$f".dist; echo "$f finished"; done
For doing the same thing in Chimera I used this script:
---------------------------dist.py---------------------------------
# The script loops over all pdb files in the current working
# directory and saves all distances between lysine C-alpha atoms
# to one file per structure.
#
# This script should be run with the following command:
# chimera --silent --nogui dist.py
import chimera
from chimera import runCommand as rc
from os import listdir
# Make sure we only get pdbs from the current folder
files = [f for f in listdir('.') if f.endswith(".pdb")]
# Loop over pdb files
for f in files:
    rc('open %s' % f)
    rc('findclash :lys@ca test self overlap -1000 bondSeparation 1 saveFile %s.dist' % f)
    rc('close 0')
    print('%s done' % f)
--------------------------------------------------------------------
Best,
Matej
------------------------------------------------------
Dr. Matej Repic
Ecole Polytechnique Fédérale de Lausanne
Laboratory of Computational Chemistry and Biochemistry
SB - ISIC LCBC
BCH 4108
CH - 1015 Lausanne
------------------------------------------------------
On 8/19/15, 19:55, Elaine Meng <meng@cgl.ucsf.edu> wrote:
_______________________________________________ Chimera-users mailing list Chimera-users@cgl.ucsf.edu http://plato.cgl.ucsf.edu/mailman/listinfo/chimera-users
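For comparison, the same lysine CA-CA distances can also be computed by reading the fixed-column ATOM records of a PDB file directly. The sketch below is neither findclash nor ncont; it is a plain Python illustration (Python 3.8 or later for math.dist) with hypothetical function names, and it makes simplifying assumptions: it ignores alternate locations and does not separate the models of multi-model files, so it is a starting point rather than a replacement for either tool.

```python
import math
from itertools import combinations


def lysine_ca_atoms(pdb_text):
    """Return (label, (x, y, z)) for each lysine CA in ATOM records.

    Uses the fixed columns of the PDB format: atom name 13-16,
    residue name 18-20, chain ID 22, residue number 23-26,
    coordinates 31-54.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA" or line[17:20] != "LYS":
            continue
        label = "LYS %s.%s" % (line[22:26].strip(), line[21])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((label, xyz))
    return atoms


def pairwise_distances(atoms):
    """Return (distance, label1, label2) tuples sorted by increasing distance."""
    pairs = [
        (math.dist(a[1], b[1]), a[0], b[0])
        for a, b in combinations(atoms, 2)
    ]
    return sorted(pairs)


def distances_for_file(path):
    """Convenience wrapper: read one PDB file and return its sorted pairs."""
    with open(path) as handle:
        return pairwise_distances(lysine_ca_atoms(handle.read()))
```

Calling distances_for_file on each file in a folder would give one sorted list per structure, comparable in spirit to the per-structure .dist files produced by the loops above.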

Thanks a lot!
Best,
Feixia
On Wed, 19 Aug 2015 21:37:11 -0400, Repic Matej <matej.repic@epfl.ch> wrote:
--
Feixia Chu, Ph.D.
Associate Professor
Molecular, Cellular & Biomedical Sciences
University of New Hampshire
Gregg Hall, Rm436
35 Colovos Rd
Durham, NH 03824
Tel 603 862 2436

Thank you, Elaine!
Best,
Feixia
On Wed, 19 Aug 2015 13:55:48 -0400, Elaine Meng <meng@cgl.ucsf.edu> wrote:
participants (3)
- Elaine Meng
- Feixia
- Repic Matej