Re: [chimera-dev] Help with "looping through PDB IDs" script
Hi Navya,
This later mail of yours provides some additional details that help. For one thing, you might also want to look at the Programmer's Examples (Examples <http://www.cgl.ucsf.edu/chimera/docs/ProgrammersGuide/Examples/index.html>), particularly the first one ("Chimera's Object Model"). At any rate, the obvious problem with the script you have so far is that you haven't defined the 'residues' variable. Let's say you are trying to work with residues 50.A, 55.A, and 70.A. Here's a little code snippet that will get those residues into the 'residues' variable once you've opened your structure:
    rc("sel :50.a:55.a:70.a")
    from chimera.selection import currentResidues
    residues = currentResidues()
Now, I don't know where you're storing your lists of residues. Are they in the same file as the IDs? A separate file? I don't know if you need more help with that or not.
So once you have the residues, printing the atomic surface areas and B-factors is:
    for r in residues:
        for a in r.atoms:
            print>>outf, a, a.areaSAS, a.bfactor
Getting the CASTp information is another whole can of worms, however, since Chimera stores that info in the CASTp dialog rather than with the atoms and residues. Nonetheless, once you've got the rest of your script working you could try to do that part. It would involve copying the processCastpID function from CASTp/__init__.py (in your Chimera installation's 'share' folder), deleting the last two lines, and returning the cavity list instead. Each CastpCavity instance has 'mouthInfo' and 'pocketInfo' attributes, which are dictionaries that have an 'atoms' key. The value for the 'atoms' key is a chimera.selection.ItemizedSelection instance. You can use the contains() method of those ItemizedSelections to see if a particular atom is in the ItemizedSelection. The pocketInfo dictionary also has 'SA volume' and 'MS volume' keys you can use to get the desired volume info.
--Eric
Eric Pettersen
UCSF Computer Graphics Lab
http://www.cgl.ucsf.edu
On Apr 16, 2014, at 12:44 PM, Navya Shilpa Josyula <njosyu2@uic.edu> wrote:
> Hi,
> I need help with Chimera scripting. I have a text file with a list of PDB IDs and the corresponding residue lists. I am trying to write a Chimera script that will scan this file, open each PDB ID, and select the corresponding residues listed in the file. After selecting, it should write out the atomic areaSAS, the atomic B-factor, and which atoms belong to CASTp-identified pockets (with pocket volume). I know I am asking a lot here, but I am new to Chimera scripting. I searched the users mailing list and found the "looping" script,
> http://www.cgl.ucsf.edu/pipermail/chimera-users/2012-February/007281.html
> but it is not working for me. Attached is the script I have so far.
> Please help me with this.
> Thank you in advance,
> Navya
> <test.py>
>
> import os
> from chimera import runCommand as rc # use 'rc' as shorthand for runCommand
> from chimera import replyobj # for emitting status messages
>
> # change to folder with data files
> os.chdir("C:/Users/Navya Shilpa/Desktop/thesis/All_Proteins/nonred")
>
> # open file of PDB IDs
> f = open("C:/Users/Navya Shilpa/Desktop/thesis/All_Proteins/nonred/nr-list.txt", 'r')
>
> # loop through the IDs, opening, processing, and closing each in turn
> for line in f:
>     pdbID = line.strip()
>     replyobj.status("Processing " + pdbID) # show what PDB we're working on
>     rc("open " + pdbID)
>     rc("surf") # surface receptor
>     outf = open(pdbID, "w")
>     for r in residues:
>         print>>outf, r, r.areaSAS
>     outf.close()
>     rc("close all")
> # uncommenting the line below will cause Chimera to exit when the script is done
> #rc("stop now")
> # note that indentation is significant in Python; the fact that
> # the above command is exdented means that it is executed after
> # the loop completes, whereas the indented commands that
> # preceded it are executed as part of the loop.
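The hard-coded selection in the reply can be generated from the list file instead. A minimal sketch, assuming each line of that file holds a PDB ID followed by whitespace-separated residue specifiers such as 50.a; the thread never says how the file is laid out, so that layout and both helper names are hypothetical. The code only builds the command string, so it can be tried in plain Python; inside Chimera the result would be handed to rc().

```python
def parse_list_line(line):
    """Split a line like '1abc 50.a 55.a 70.a' into (pdb_id, [residues]).

    The file layout is an assumption made for illustration only.
    Returns None for a blank line.
    """
    fields = line.split()
    if not fields:
        return None
    return fields[0], fields[1:]


def selection_command(residues):
    """Build a Chimera 'sel' command string from residue specifiers."""
    if not residues:
        raise ValueError("no residues given")
    # Each residue specifier is prefixed with ':' and the pieces are
    # concatenated, matching the form used in the reply above.
    return "sel " + "".join(":" + r for r in residues)
```

For the line `1abc 50.a 55.a 70.a` this yields the PDB ID `1abc` and the same `sel :50.a:55.a:70.a` command used in the reply.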
Hi Eric and Elaine,
Thank you so much for your valuable suggestions and time. I am trying out the scripts as per your advice. I have a slight change here. Instead of looking at the PDB IDs, I am looking at the ".pdb1" files. These are the biological units of the PDB files.
So I have a folder with about 400 ".pdb1" files. For each structure, I am trying to write atomic areaSAS and atomic Bfactor values. So far I have updated my script, as attached to this mail, based on your inputs. I am able to get these values for the last ".pdb1" file it reads, but not for all the proteins. Also, is there a way to write out the PDB ID in the 1st column, with the corresponding values in the next columns?
I am aiming to get all these values into a single file, which leads me to another issue: file size. Is there a way Python can be connected to SQL Server, so that the values for each ".pdb1" structure are written directly into the SQL database?
Kindly advise on this. I really appreciate your help with my issues.
Thanks in advance,
Navya
On Wed, Apr 16, 2014 at 7:41 PM, Eric Pettersen <pett@cgl.ucsf.edu> wrote:
> Hi Navya, This later mail of yours provides some additional details that help. [...]
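On the question of putting the PDB ID in the first column of a single output file: a minimal sketch, assuming the per-atom values are collected structure by structure and written through one file that is opened once, before the loop over structures. The function names and column headers are hypothetical; the atoms can be any objects with areaSAS and bfactor attributes, which is what the loop in the earlier reply reads from Chimera atoms.

```python
import csv


def atom_rows(pdb_id, atoms):
    """Yield one [pdb_id, atom, areaSAS, bfactor] row per atom.

    getattr() guards against atoms that may lack an areaSAS
    attribute; those get an empty field instead of raising.
    """
    for a in atoms:
        yield [pdb_id, str(a), getattr(a, "areaSAS", ""), a.bfactor]


def write_all(out, structures):
    """Write rows for every (pdb_id, atoms) pair to one open file.

    'out' is opened by the caller, once, so that each structure
    adds rows instead of replacing the previous structure's output.
    """
    writer = csv.writer(out)
    writer.writerow(["pdb_id", "atom", "areaSAS", "bfactor"])
    for pdb_id, atoms in structures:
        writer.writerows(atom_rows(pdb_id, atoms))
```

Opening the output once and adding rows per structure keeps earlier structures from being overwritten, which may be what is happening when only the last file's values appear.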
Hi,
Sorry to bother you again. With your help, I was able to get areaSAS and Bfactor values for the atoms in each .pdb1 file, with the corresponding PDB ID in the 1st column. Thank you very much for this!
Now I am trying to write CASTp information for each of my proteins into a separate file. You suggested in your earlier email that the processCastpID function is in __init__.py, but I found it in the gui.py file instead. I hope I am not missing anything here. As I understand it, this function fetches the 4 CASTp files, of which I need only the ".poc" and ".pocInfo" files. From these two files I want to write only the atom list, pocID, and MS_Volume data into a single file for all 400 proteins in my dataset. Is there a link or a script available for this?
Again, as mentioned in my last email, since my output files will be huge, will I be able to write them directly to a database table in SQL Server?
Kindly help me with this.
Thanks and regards,
Navya
On Wed, Apr 16, 2014 at 11:43 PM, Navya Shilpa Josyula <njosyu2@uic.edu> wrote:
> Hi Eric and Elaine, Thank you so much for your valuable suggestions and time. [...]
On Apr 17, 2014, at 2:11 PM, Navya Shilpa Josyula <njosyu2@uic.edu> wrote:
> Now I am trying to write CASTp information for each of my proteins into a separate file. As you suggested in earlier email, the processCastpID function is in the gui.py file but not in __init__.py file. Hope I am not missing anything here. As per my understanding, this function is fetching the 4 castp files of which I would require only ".poc" and ".pocInfo" files. From these two files I want to write the data of only atoms list, pocID and MS_Volume data into a single file for all 400 proteins in my dataset. Is there a link or any script available for such requirement?
There are some fine points that I missed in my answer yesterday, and the situation is complicated further by your use of .pdb1 files instead of the "normal" entries.
So for one thing, if you are going to use the .pdb1 files, then you are going to have to run CASTp yourself on each and then process the results. In that case you might as well also analyze the .poc and .pocInfo files yourself to determine what pocket each atom belongs to (the next-to-last field in the .poc file) and the volume of that pocket (listed in the .pocInfo file).
The main point I missed in my reply, which may now be moot because of the .pdb1 thing, is that processCastpID() builds its own structure and therefore you would not open the PDB first; you would instead return the structure (along with the cavities list) from that method and make the structure available in Chimera with:
    chimera.openModels.add([structure])
and then proceed with selecting the right residues, using currentResidues() to list them, etc. I guess if you didn't want to process the .pdb1 CASTp files yourself (after running CASTp on the .pdb1) you could use processCastpFiles() to get the cavity list and structure and proceed as I just outlined. processCastpFiles *is* in __init__, unlike processCastpID() as you found!
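Analyzing the two files yourself, as suggested above, could look like the following. A minimal sketch: it takes the pocket ID from the next-to-last whitespace-separated field of each atom line in the .poc file, as described, and assumes the rest of the line is a standard PDB ATOM/HETATM record. For the .pocInfo file it assumes a header row whose column names include ID and Vol_ms; those names, like the function names, are assumptions to check against a real CASTp download.

```python
def pocket_atoms(poc_lines):
    """Map pocket ID -> list of atom serial numbers from .poc lines."""
    pockets = {}
    for line in poc_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        fields = line.split()
        pocket_id = int(fields[-2])   # next-to-last field, per the reply
        serial = int(line[6:11])      # PDB atom serial number, columns 7-11
        pockets.setdefault(pocket_id, []).append(serial)
    return pockets


def pocket_volumes(pocinfo_lines, id_col="ID", vol_col="Vol_ms"):
    """Map pocket ID -> volume, locating columns by header name.

    The first non-blank line is treated as the header. The default
    column names are assumptions, not confirmed by the thread.
    """
    header = None
    volumes = {}
    for line in pocinfo_lines:
        fields = line.split()
        if not fields:
            continue
        if header is None:
            header = fields
            i_id = header.index(id_col)
            i_vol = header.index(vol_col)
            continue
        volumes[int(fields[i_id])] = float(fields[i_vol])
    return volumes
```

Joining the two dictionaries on the pocket ID gives, for each pocket, its atom list and its volume, which is the information asked for.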
> Again, as mentioned in my last email, since my output files will be huge in size, will I be able to write my files directly to a database table in SQL server?
I'm not much of an expert on this, but maybe this page would help: DatabaseInterfaces - Python Wiki <https://wiki.python.org/moin/DatabaseInterfaces>
--Eric
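Python database modules generally follow a common interface, DB-API 2.0 (PEP 249). A minimal sketch of that pattern, using the sqlite3 module from the standard library as a stand-in: SQL Server itself would need a third-party driver such as pyodbc plus a connection string, neither of which is shown or tested here, and placeholder syntax can differ between drivers. The table and column names are hypothetical.

```python
import sqlite3


def store_rows(conn, rows):
    """Insert (pdb_id, atom, area_sas, bfactor) tuples into a table.

    'conn' is any open sqlite3 connection; the table is created on
    first use. Parameter placeholders keep values out of the SQL text.
    """
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS atom_values "
        "(pdb_id TEXT, atom TEXT, area_sas REAL, bfactor REAL)"
    )
    cur.executemany("INSERT INTO atom_values VALUES (?, ?, ?, ?)", rows)
    conn.commit()
```

Calling store_rows() once per structure, with a connection opened before the loop, would add that structure's rows to the same table.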
Hi Eric,
Thank you for your reply. I understand that the CASTp Python script works only for .pdb files. But it would be really time-consuming for me to upload each .pdb1 file to CASTp and then download the .poc and .pocInfo files, as I would then have to process these two files again to extract the atom list, pocID, and MS_volume values.
Could you please elaborate on how to extract the .poc and .pocInfo values for .pdb1 files from the CASTp server using processCastpID() and processCastpFiles()? I need to loop through all the .pdb1 files and, for each one, get the atom list for each pocID along with the pocket volumes, and write these values into a .csv file.
Thank you in advance,
Navya
On Thu, Apr 17, 2014 at 5:05 PM, Eric Pettersen <pett@cgl.ucsf.edu> wrote:
> There are some fine points that I missed in my answer yesterday, and the situation is complicated further by your use of .pdb1 files instead of the "normal" entries. [...]
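If the CASTp results are downloaded by hand, the looping part reduces to pairing up the result files for each structure. A minimal sketch, assuming every structure's results sit in one folder as <name>.poc and <name>.pocInfo; that naming and the function name are assumptions, not something CASTp or Chimera guarantees.

```python
import os


def pair_castp_files(folder):
    """Return {base_name: (poc_path, pocinfo_path)} for complete pairs.

    A .poc file without a matching .pocInfo file is skipped, so the
    caller only sees structures with both files present.
    """
    names = set(os.listdir(folder))
    pairs = {}
    for name in sorted(names):
        base, ext = os.path.splitext(name)
        if ext == ".poc" and base + ".pocInfo" in names:
            pairs[base] = (
                os.path.join(folder, name),
                os.path.join(folder, base + ".pocInfo"),
            )
    return pairs
```

Each pair can then be read and its pocket IDs, atom lists, and volumes written out with the structure name in the first column.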
Hi Navya,
I'm not sure if this is your question, but it is not possible to submit structures from Chimera to the CASTp web server for a new calculation. The only way to run a new calculation is by submitting directly at their website. <http://sts-fw.bioengr.uic.edu/castp/calculation.php>
The CASTp fetch from Chimera is only getting pre-calculated results for existing PDB entries from the CASTp database.
I hope this clarifies the situation,
Elaine
----------
Elaine C. Meng, Ph.D.
UCSF Computer Graphics Lab (Chimera team) and Babbitt Lab
Department of Pharmaceutical Chemistry
University of California, San Francisco
On Apr 17, 2014, at 10:12 PM, Navya Shilpa Josyula <njosyu2@uic.edu> wrote:
> Hi Eric, Thank you for your reply. [...]
Hi Navya,
As per Elaine's remarks, Chimera has no capability to upload a file to CASTp and process the results. processCastpID() gets the files for *standard* PDB IDs and uses processCastpFiles() to process them. processCastpFiles() could be called by itself, but you would have had to obtain the files by using the CASTp web server yourself -- obviously not a great solution for you. If CASTp had offered a REST interface to its web server I would have probably implemented built-in upload, but with email notification as the only option it was just way too ugly and difficult. I don't know that there are any good options here, other than possibly using the standard PDB entries if that is compatible with the other aims of your research.
--Eric
On Apr 21, 2014, at 10:07 AM, Elaine Meng <meng@cgl.ucsf.EDU> wrote:
> Hi Navya, I'm not sure if this is your question, but it is not possible to submit structures from Chimera to the CASTp web server for a new calculation. [...]
_______________________________________________ Chimera-dev mailing list Chimera-dev@cgl.ucsf.edu http://plato.cgl.ucsf.edu/mailman/listinfo/chimera-dev
Hi Elaine and Eric,
Thank you for your replies. As per your suggestions, I tried using the processCastpFiles functions, but I could not figure out how to use them. So I downloaded the files manually from the CASTp server. For the ASA and Bfactor values, however, with your help I was able to get them from Chimera for all 400 of my proteins. I really appreciate your help with this.
Thanks a lot,
Navya
On Mon, Apr 21, 2014 at 4:46 PM, Eric Pettersen <pett@cgl.ucsf.edu> wrote:
> Hi Navya, As per Elaine's remarks, Chimera has no capability to upload a file to CASTp and process the results. [...]
participants (3)
- Elaine Meng
- Eric Pettersen
- Navya Shilpa Josyula