Dear Chimera Users, How can I automate/import the Ensemble Cluster (Structure Comparison, MD/Ensemble Analysis; cluster members of a conformational ensemble, determine cluster representatives) module into a python script? Thanks, Katryna
Unfortunately, the NMR clustering code is closely tied to its graphical user interface, so there are no simple calls into Chimera for scripting. However, the function that does all the work is relatively simple, so I've abstracted it into its own function, which takes a list of models (all with exactly the same set of atoms) and returns a list of pairs. The first element of each pair is a cluster representative model; the second element is a list of all models in the cluster, including the representative.

I've included the code as an attachment. There is a small example at the end of the code showing how to use the "clusterModels" function. Please let me know if this is what you were looking for. Thanks.

Conrad

Katryna Cisek wrote:
Dear Chimera Users,
How can I automate/import the Ensemble Cluster (Structure Comparison, MD/Ensemble Analysis; cluster members of a conformational ensemble, determine cluster representatives) module into a python script?
Thanks, Katryna
_______________________________________________
Chimera-users mailing list
Chimera-users@cgl.ucsf.edu
http://www.cgl.ucsf.edu/mailman/listinfo/chimera-users
def clusterModels(modelList):
    from EnsembleMatch.distmat import DistanceMatrix
    from chimera import match

    # First we need to construct a full distance matrix, including
    # possibly identical elements
    fulldm = DistanceMatrix(len(modelList))
    sameAs = {}
    atoms = [ match._coordArray(m.sortedAtoms()) for m in modelList ]
    for i in range(len(modelList)):
        mi = modelList[i]
        if mi in sameAs:
            continue
        ai = atoms[i]
        for j in range(i + 1, len(modelList)):
            aj = atoms[j]
            m = match.Match(ai, aj)
            if m.rms <= 0:
                # Detect identical elements and track them
                mj = modelList[j]
                sameAs[mj] = mi
            fulldm.set(i, j, m.rms)

    # Now we reduce the distance matrix by removing identical elements
    if not sameAs:
        # No identical elements, just use the whole thing
        dm = fulldm
        models = modelList
    else:
        models = []
        indexMap = []
        for i, mi in enumerate(modelList):
            if mi in sameAs:
                # Skip identical element
                continue
            models.append(mi)
            indexMap.append(i)
        dm = DistanceMatrix(len(models))
        for i in range(len(models)):
            im = indexMap[i]
            for j in range(i + 1, len(models)):
                jm = indexMap[j]
                dm.set(i, j, fulldm.get(im, jm))

    # Next we run the clustering algorithm
    from EnsembleMatch.nmrclust import NMRClust
    nmrc = NMRClust(dm)

    # Next reformat the results into something useful.
    # "clusterInfo" is a list whose elements are 2-tuples of
    # (cluster representative model, list of models in cluster)
    clusterInfo = []
    for cid, c in enumerate(nmrc.clusters):
        members = c.members()
        mList = []
        for member in members:
            m = models[member]
            m.clusterId = cid
            m.clusterRep = 0
            mList.append(m)
        rep = nmrc.representative(c)
        m = models[rep]
        m.clusterRep = len(members)
        clusterInfo.append((m, mList))
    # Put the identical models back into the cluster of the model they match
    for mj, mi in sameAs.iteritems():
        mj.clusterId = mi.clusterId
        mj.clusterRep = 0
        rep, members = clusterInfo[mi.clusterId]
        rep.clusterRep += 1
        members.append(mj)
    return clusterInfo

# Here's an example on how you can use the function above.
# Run this script with:
#     chimera --nogui --silent cluster.py
# where cluster.py is the name of this file
if __name__ == "chimeraOpenSandbox":
    import chimera
    # Use following line to fetch from RCSB
    models = chimera.openModels.open("1n9u", type="PDBID")
    # Use following line for your own data file
    # models = chimera.openModels.open("1N9U.pdb", type="PDB")
    ci = clusterModels(models)
    print len(models), "models ->", len(ci), "clusters"
    for rep, members in ci:
        print rep.oslIdent(), "(%d member)" % len(members)
        for m in members:
            print '\t', m.oslIdent()
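As a rough illustration of the return value described above (a list of pairs, each holding a representative and the full member list), the sketch below shows two ways the result might be post-processed. It is not part of the attachment: plain strings stand in for Chimera model objects so that it runs outside Chimera, and the names cluster_info, representative_by_member and cluster_sizes are hypothetical.

```python
# Stand-in for the list of (representative, members) pairs that
# clusterModels() returns. Plain strings replace Chimera model objects
# (an assumption made so the sketch runs outside Chimera).
cluster_info = [
    ("model_A", ["model_A", "model_B", "model_C"]),
    ("model_D", ["model_D"]),
]


def representative_by_member(cluster_info):
    """Map every member to the representative of its cluster."""
    lookup = {}
    for rep, members in cluster_info:
        for member in members:
            lookup[member] = rep
    return lookup


def cluster_sizes(cluster_info):
    """Return (representative, member count) pairs, largest cluster first."""
    sizes = [(rep, len(members)) for rep, members in cluster_info]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)


lookup = representative_by_member(cluster_info)
sizes = cluster_sizes(cluster_info)
```

With real Chimera models in place of the strings, the same loops would apply unchanged, since they rely only on the pair structure and not on any model attribute.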
Hello Conrad,

Thank you so much for your email and your code. It worked well for 20 structures, but when I tested it on 400 structures it crashed the machine and forced a reboot. I was actually looking for a way to import the Ensemble Cluster definition into a Python script and then run the script from the Chimera GUI. That approach has been working well for MatchMaker:

----------------------------------------------------------------------
#!/usr/local/MGLTools/i86Linux2/bin/python
import sys
import fileinput
import os
import string
import shutil
import glob
import chimera
from chimera import runCommand
import Midas
from MatchMaker import match, CP_BEST, defaultMatrix, \
    defaultAlignAlgorithm, defaultGapOpen, \
    defaultGapExtend, defaultIterateCutoff

temp1 = 'template1.pdb'
temp2 = 'template2.pdb'
curdir = os.getcwd()
pdblist = glob.glob('*.T*pdb')
for x in pdblist:
    mol1 = chimera.openModels.open(temp1, type='PDB')[0]
    mol2 = chimera.openModels.open(temp2, type='PDB')[0]
    mol3 = chimera.openModels.open(x, type='PDB')[0]
    mol4 = chimera.openModels.open(x, type='PDB')[0]
    match(CP_BEST, (mol1, [mol3]), defaultMatrix,
          defaultAlignAlgorithm, defaultGapOpen, defaultGapExtend,
          iterate=defaultIterateCutoff, showAlignment=True)[0]
    match(CP_BEST, (mol2, [mol4]), defaultMatrix,
          defaultAlignAlgorithm, defaultGapOpen, defaultGapExtend,
          iterate=defaultIterateCutoff, showAlignment=True)[0]
    Midas.close(0)
    Midas.close(1)
----------------------------------------------------------------------

Thanks, Katryna

On Wed, Oct 15, 2008 at 3:00 PM, Conrad Huang <conrad@cgl.ucsf.edu> wrote:
Unfortunately, the NMR clustering code is closely tied to its graphical user interface, so there are no simple calls into Chimera for scripting. However, the function that does all the work is relatively simple, so I've abstracted it into its own function, which takes a list of models (all with exactly the same set of atoms) and returns a list of pairs. The first element of each pair is a cluster representative model; the second element is a list of all models in the cluster, including the representative. I've included the code as an attachment. There is a small example at the end of the code showing how to use the "clusterModels" function. Please let me know if this is what you were looking for. Thanks.
Conrad
Katryna Cisek wrote:
Dear Chimera Users,
How can I automate/import the Ensemble Cluster (Structure Comparison, MD/Ensemble Analysis; cluster members of a conformational ensemble, determine cluster representatives) module into a python script?
Thanks, Katryna
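A note on scale, offered as arithmetic only and not as a diagnosis of the crash reported above: the nested loop in clusterModels superimposes every unordered pair of models once, so the number of superpositions grows quadratically with the ensemble size. The small sketch below counts those pairs for the two ensemble sizes mentioned in the thread; pair_count is a hypothetical helper name, and nothing in the thread establishes that the pair count was the cause of the crash.

```python
def pair_count(n_models):
    """Number of unordered model pairs (upper triangle of the distance matrix)."""
    return n_models * (n_models - 1) // 2


small = pair_count(20)    # ensemble size that worked in the thread
large = pair_count(400)   # ensemble size that crashed in the thread
growth = large // small   # how many times more superpositions are needed
```

Going from 20 to 400 models multiplies the ensemble size by 20 but the pairwise work by roughly 400, which is worth keeping in mind when estimating run time and memory for larger ensembles.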
participants (3)
- Conrad Huang
- Katryna Cisek
- Katryna Cisek