Converting SDF files to Mol2 Format
Hi Elaine,

I need to convert a large number of PubChem SDF files into Mol2 format. When I convert an SDF file into Mol2 format, I add partial charges and run energy minimisation on the molecule, as this is necessary for the program I am inputting the files into. So far, I have only used Chimera in this way to convert individual files. Is there a way to convert a large number of SDF files to Mol2 format at once, and also preserve the filenames?

Thanks in advance,
Nancy
On Mar 26, 2010, at 8:15 AM, Nancy wrote:
Hi Elaine,
I need to convert a large number of PubChem SDF files into Mol2 format. When I convert an SDF file into Mol2 format, I add partial charges and run energy minimisation on the molecule, as this is necessary for the program I am inputting the files into. So far, I have only used Chimera in this method to convert individual files. Is there a way to convert a large number of SDF files to Mol2 format at once, and also preserve the filenames?
Thanks in advance,
Nancy
Hi Nancy,

There are several possibilities here. Probably the most straightforward is to use either the 1.4 or 1.4.1 release in conjunction with a script to process your files. Let's assume your files are all in one directory and have names of the form moleculeName.sdf. The following csh-style script would accomplish the task:

    #!/bin/tcsh -f
    foreach sdf (*.sdf)
        echo $sdf
        chimera --nogui $sdf processSDF.cmd
        mv output.mol2 $sdf:r.mol2
    end

where the contents of "processSDF.cmd" are just:

    minimize
    write format mol2 0 output.mol2

Put the csh script in a file (e.g. processSDF.csh) and make it executable (chmod +x processSDF.csh). Run the script in the directory containing the SDF files, where the processSDF.cmd file must also be located. If you are familiar with shell scripting you can of course add paths to the various script names to change these requirements.

The above, though simple, may be a little slow because it starts one instance of Chimera per SDF file. You can get a single Chimera instance to run through your directory of SDF files, but you have to use a Python script instead of a Chimera command script, like this:

    chimera --nogui processSDF.py

with the contents of processSDF.py being:

    from os import chdir, listdir
    from chimera import runCommand

    chdir("/path/to/SDF-file-dir")  # change to the SDF file directory
    for sdf in listdir("."):
        if not sdf.endswith(".sdf"):
            continue  # skip anything that is not an SDF file
        runCommand("open " + sdf)
        runCommand("minimize")
        # strip the ".sdf" suffix so the output keeps the molecule's name
        runCommand("write format mol2 0 " + sdf[:-4] + ".mol2")
        runCommand("close all")

You could do all of the above in the 1.5 release, but the issue is that the 1.5 branch now uses the 1.3 version of AmberTools/Antechamber, which relies on the program sqm rather than mopac to compute charges. While the charges computed by sqm are theoretically more precise than those from mopac, they take considerably longer to compute.
We intend to add options for using less strict charge-convergence criteria in order to speed things up, but that work has yet to be done. So whereas moieties involving ~35 or fewer atoms (including hydrogens) don't take excessive amounts of time (30 seconds or less), the compute time scales with the cube of the number of atoms(!), so a system of 72 atoms took more than 13 minutes and a system of 84 atoms took more than 21 minutes. I've also found that sqm sometimes fails to converge. The only structure I've had this happen with is ATP (at -4 charge), but it happened for a variety of conformers of ATP.

A final possibility is to use the PubChem3D (http://pubchem.ncbi.nlm.nih.gov/release3d.html) files and convert them directly to Mol2 files. This would require the 1.5 release (the 1.4 series doesn't read the charges in the SDF file) and assumes that the included MMFF charges are sufficient for your needs and that the conformer provided is good enough for your purposes without minimization. Using minimization would defeat the purpose here, since Chimera's minimization needs to know GAFF atom types, which requires non-standard residues to be processed by Antechamber, which will add charges in addition to assigning types.

--Eric

Eric Pettersen
UCSF Computer Graphics Lab
http://www.cgl.ucsf.edu
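As a rough check on the cubic-scaling remark in Eric's message, the sketch below extrapolates from his 72-atom timing to the 84-atom case. It is plain Python 3 meant to be run outside Chimera; the function name scaled_minutes and the use of exactly 13 minutes as the reference are illustrative assumptions (the message only says "more than 13 minutes"), and real sqm run times will vary with the molecule and the machine.

```python
def scaled_minutes(ref_atoms, ref_minutes, atoms, exponent=3):
    """Extrapolate a run time, assuming time grows as atoms**exponent.

    Hypothetical helper for illustration; not part of Chimera or AmberTools.
    """
    return ref_minutes * (float(atoms) / ref_atoms) ** exponent


# Reference point taken from the message: 72 atoms, a bit over 13 minutes.
estimate_84 = scaled_minutes(72, 13.0, 84)
print("84 atoms: about %.1f minutes" % estimate_84)  # prints "84 atoms: about 20.6 minutes"
```

The estimate of roughly 20.6 minutes is in line with the reported "more than 21 minutes" for 84 atoms, so the two timings are consistent with cubic growth. The same function can give a ballpark figure for a batch of PubChem molecules before committing to the 1.5 release for charge calculation.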
On Mar 29, 2010, at 12:48 PM, Eric Pettersen wrote:
A final possibility is to use the PubChem3D (<http://pubchem.ncbi.nlm.nih.gov/release3d.html>) files and convert them directly to Mol2 files. This would require the 1.5 release (the 1.4 series doesn't read the charges in the SDF file) and assumes that the included MMFF charges are sufficient for your needs and that the conformer provided is good enough for your purposes without minimization.
Just a short note regarding PubChem3D... as discussed previously (<http://www.cgl.ucsf.edu/pipermail/chimera-users/2010-February/004766.html>), you may not want to use their charges because functional groups such as amines and acids are in their neutral forms rather than the charged ones more likely at physiological pH.

Elaine
participants (3)
- Elaine Meng
- Eric Pettersen
- Nancy