
Here are some tests I did to investigate speeding up opening large PDB
models and coloring large molecules.

  Tom

Optimizing PDB model opening and coloring
-----------------------------------------

Opening PDB file optimization
-----------------------------

I tested making Chimera open PDB files and display the molecule without
creating Python atoms, bonds and residues.  The molecule displayed about
4 times faster:

  Open 1jj2 (~100,000 atoms)
    current Chimera:                              26 seconds
    Open without Python atoms, bonds, residues:   6.5 seconds
    Open in PyMOL:                                3 seconds

These times are on Linux, P4 2.2 GHz, 1 GB memory.

The Python atoms, bonds and residues are created when they are needed,
for example, if the atoms are colored.  To avoid a freeze at that point
there could be an idle procedure that creates the Python atoms, bonds
and residues.

This optimization requires the following code changes.

1) The b-factor, occupancy and serial number from the PDB file are saved
   as Python attributes.  These would need to be C++ attributes.

2) WrapPy fetches all attributes and makes a reference to them, e.g.
   molecule.atoms, molecule.bonds, ....  I commented out those lines
   that would cause Python atoms, bonds and residues to be made.  WrapPy
   does this to make sure objects aren't deleted prematurely.  In the
   case of atoms, bonds and residues this is not a problem because the
   C++ object keeps a reference to the Python object, so the Python
   object cannot be deleted until molecule.destroy() is called.

3) chimera/__init__.py replaces bonds to metals with pseudo-bonds.  A C++
   routine to find metal atoms with bonds is needed to avoid creating all
   the Python atoms and bonds.

4) Atom, bond and residue created triggers don't happen when the molecule
   is opened with this optimization.  Some Chimera features rely on the
   triggers happening immediately, for example, the surface categorizer
   wants to recategorize atoms whenever bonds are created or deleted.
   Like the metal complex code, this would have to be done in C++ to
   avoid creating all Python atoms and bonds immediately when a file is
   opened.  Also the bond triggers should probably indicate when the
   bond is created, not just when the Python bond is created.  So
   triggers should fire on C++ bond creation.  Currently the TrackChanges
   code keeps a set of the Python objects instead of C++ objects, so this
   would cause all Python bonds to be created on file open.

Changes 1), 2) and 3) are small and easy.  Change 4) is a more serious
problem that needs more consideration.  Perhaps the solution for 4) is to
keep the trigger and TrackChanges as is, but make surface categorization
an idle task.  It creates category names for menu entries that could be
added to menus whenever the computation finishes.

Memory use for PDB models
-------------------------

About 2/3 of memory used is C++ atoms/bonds and 1/3 is Python atoms/bonds.

                                           VSZ (MB)          RSS (MB)
  Chimera start-up                         188               23
  1jj2, no Python atoms/bonds/residues     360 (grew 172)    167 (grew 144)
  1jj2 with Python atoms/bonds/residues    437 (grew 77)     243 (grew 76)

These Red Hat 8.0 memory use numbers from the unix command ps suggest
most of the memory is in the C++ data structures, rather than the Python
atoms/bonds.

Atom coloring optimization
--------------------------

Here's a summary of some tests I did coloring atoms and bonds of 2btv
(~50,000 atoms) a single color.  More details are in a chimera-dev mailing
list posting from June 2003 with subject "coloring speed".

  Color atoms/bonds using Actions menu:   1.70 sec
    Breakdown:
      Building display list:     .88 sec
      Python color setting:      .66 sec
      Clearing track changes:    .11 sec

I tested alternate versions of the Python color setting code:

Minimal for loop

    for a in m.atoms:
        a.color = color
    for b in m.bonds:
        b.color = color

took .52 seconds.  The Actions menu code is a little slower because the
color assignment is done with a function call.  Using setattr() is
slower.  Using map() is slower.
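
The pure-Python loop variants mentioned above can be compared in outline
with a stand-in atom class.  This is only a minimal sketch, not Chimera
code: FakeAtom, color_direct, color_via_function, color_setattr, color_map
and time_variants are hypothetical names, and timings on a plain Python
class will not match timings on WrapPy-wrapped atoms and bonds.

```python
import timeit


class FakeAtom:
    """Hypothetical stand-in for a wrapped atom; not the Chimera Atom class."""
    __slots__ = ("color",)

    def __init__(self):
        self.color = None


def color_direct(atoms, color):
    # Minimal for loop: plain attribute assignment.
    for a in atoms:
        a.color = color


def _set_color(a, color):
    a.color = color


def color_via_function(atoms, color):
    # One extra Python function call per atom.
    for a in atoms:
        _set_color(a, color)


def color_setattr(atoms, color):
    # Same assignment, routed through the setattr() builtin.
    for a in atoms:
        setattr(a, "color", color)


def color_map(atoms, color):
    # map() is lazy in Python 3, so list() forces every call to run.
    list(map(lambda a: setattr(a, "color", color), atoms))


def time_variants(n_atoms=50000, repeat=3):
    """Return {variant name: best wall-clock seconds} for each loop variant."""
    atoms = [FakeAtom() for _ in range(n_atoms)]
    variants = (color_direct, color_via_function, color_setattr, color_map)
    results = {}
    for f in variants:
        times = timeit.repeat(lambda: f(atoms, "red"), number=1, repeat=repeat)
        results[f.__name__] = min(times)
    return results


if __name__ == "__main__":
    for name, seconds in time_variants().items():
        print("%-20s %.4f sec" % (name, seconds))
```

Taking the minimum over a few repeats reduces noise from other activity on
the machine; the relative ordering of the variants is the interesting part,
not the absolute numbers.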

Using C++ set_atom_color() function

    for a in m.atoms:
        set_atom_color(a, color)
    for b in m.bonds:
        set_bond_color(b, color)

took .27 seconds.  So attribute setting done by WrapPy may not be as fast
as it could be.

Using C++ routines

    color_atoms(atoms, color)
    color_bonds(bonds, color)

where atoms and bonds are lists took .10 seconds.  I believe this time is
all taken adding atoms/bonds to the TrackChanges STL map of PyObject*
entries.

I did a similar test for the Python part of "color by element".

  Coloring atoms by element type:   3.5 sec
    Breakdown:
      Build display list and TrackChanges:   1 sec
      Python code to set colors:             2.44 sec

Here are times for various versions of the Python color setting code:

  Actions menu color by element:   2.44 sec
  Optimized Python coloring:       1.11 sec
  C++ routine:                      .12 sec

The optimized Python uses a Python dictionary mapping element numbers to
Chimera color objects.  The current Actions menu version is slower because
it maps element numbers to color names, and fetches the color object for
each atom.  The C++ routine color_atoms_by_element(atom_list) is a simple
translation of the Python version.

If the display list and track changes overhead is reduced by a factor of
10 (which I believe is not too hard), then moving some Python loops to C++
will make a significant improvement in response time.  Besides coloring,
this could help with atom/bond show/hide, representation changes, and
large selections like chains or selecting all atoms.
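
The difference between the two Python versions of color by element is
where the color lookup happens.  A minimal sketch, assuming hypothetical
stand-ins throughout: Color, Atom, color_by_name and the small element
table below are illustrative only, not Chimera's API or its actual
element colors.

```python
class Color:
    """Hypothetical color object."""

    def __init__(self, name):
        self.name = name


class Atom:
    """Hypothetical atom with an element number and a color slot."""

    def __init__(self, element_number):
        self.element_number = element_number
        self.color = None


# Illustrative table only; not Chimera's element colors.
ELEMENT_COLOR_NAMES = {1: "white", 6: "gray", 7: "blue", 8: "red"}
DEFAULT_COLOR_NAME = "green"


def color_by_name(name):
    # Stand-in for whatever name-to-color fetch the real code pays for.
    return Color(name)


def color_by_element_slow(atoms):
    # Element number -> color name -> color object, fetched once per atom.
    for a in atoms:
        name = ELEMENT_COLOR_NAMES.get(a.element_number, DEFAULT_COLOR_NAME)
        a.color = color_by_name(name)


def color_by_element_fast(atoms):
    # Build element number -> color object once, then do one dictionary
    # lookup per atom inside the loop.
    colors = {n: color_by_name(name) for n, name in ELEMENT_COLOR_NAMES.items()}
    default = color_by_name(DEFAULT_COLOR_NAME)
    for a in atoms:
        a.color = colors.get(a.element_number, default)


if __name__ == "__main__":
    atoms = [Atom(6), Atom(8), Atom(6), Atom(99)]
    color_by_element_fast(atoms)
    print([a.color.name for a in atoms])
```

In the fast version the number of color fetches depends on the size of the
element table rather than the number of atoms, which is why it scales
better for large models.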
Thomas Goddard