Possible to read a trajectory "lazily" or in the background?

Am I right in thinking that when reading a trajectory (e.g. .dcd file), all requested frames from that trajectory are first read in, after which the "coordset" command can be used to work with the trajectory? I am reading frames from an RMF file (using the RMF plugin, the "rmf readtraj" command) which is quite slow, perhaps a second per frame - because RMF stores local coordinates which must be transformed to make the global coordinates ChimeraX is using. So if the user asks to read 100 frames, ChimeraX becomes unresponsive for ~100s, which is not a great user experience.

One possible solution would be to read the 100 frames "lazily", e.g. just create 100 copies of the active coordset, then when the user writes "coordset #1 N" for the first time or scrolls to frame N with "coordset slider", read in frame N from the RMF file and overwrite coordset N. That way there would be a 1s delay only the first time that frame is shown. Is this possible? I see there is an "active coordset changed" trigger - perhaps I could attach to that?

Alternatively, is there an API for tools to run long-running processes in the background so that the user can work on other things in the meantime, or cancel it if it takes too long?

Ben
--
ben@salilab.org https://salilab.org/~ben/
"It is a capital mistake to theorize before one has data." - Sir Arthur Conan Doyle

Hi Ben,

I don't understand why the coordinate transform is slow. Are you doing it with a Python loop, one coordinate at a time? ChimeraX can do coordinate transforms on numpy arrays and should easily transform 100 million coordinates per second. Here's an example:

    import numpy
    xyz = numpy.zeros((100000, 3), numpy.float32)
    from chimerax.geometry import Place
    tf = Place(((1,0,0,3),(0,1,0,-5),(0,0,1,1.8)))  # Translation by 3,-5,1.8 and no rotation.
    new_xyz = tf.transform_points(xyz)
    new_xyz[:5]
    array([[ 3. , -5. ,  1.8],
           [ 3. , -5. ,  1.8],
           [ 3. , -5. ,  1.8],
           [ 3. , -5. ,  1.8],
           [ 3. , -5. ,  1.8]], dtype=float32)

The transformation is a 3 by 4 matrix with the first 3 columns being the rotation matrix and the fourth column being the translation (done after rotation).

Tom

On 3/13/23 1:00 PM, Tom Goddard wrote:
> I don't understand why the coordinate transform is slow. Are you doing it with a Python loop, one coordinate at a time?

Short answer: yes.

Longer answer: RMF stores frames in a tree. A given node in that tree might have local coordinates. It might also have a transformation. So to get the global coordinates of a particular node we may have to apply multiple transformations, from multiple parent nodes. Currently in ChimeraX (and also in Chimera) we do a tree traversal in Python and apply the coordinate transform(s) one node at a time. We might be able to speed that up (e.g. cache lists of nodes that all get the same transformation, or make tree traversal in RMF faster, because that's pretty slow by itself) and that's something I'm looking at in the meantime.

Ben
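The idea of caching lists of nodes that share a transformation could look roughly like the sketch below. It uses plain numpy 3 by 4 matrices in the same [rotation | translation] layout Tom describes; it does not use the RMF or ChimeraX APIs. The names `compose`, `global_coords`, `frame_ids` and `frame_chain` are made up for illustration, and the assumption is that each point can be tagged with the reference frame it belongs to.

```python
import numpy as np

def compose(parent, child):
    """Combine two 3x4 [R|t] transforms; the result applies child, then parent."""
    rot = parent[:, :3] @ child[:, :3]
    trans = parent[:, :3] @ child[:, 3] + parent[:, 3]
    return np.hstack([rot, trans[:, None]])

def transform_points(tf, xyz):
    """Apply a 3x4 transform to an (N, 3) array in one vectorized call."""
    return xyz @ tf[:, :3].T + tf[:, 3]

def global_coords(local_xyz, frame_ids, frame_chain):
    """Convert local coordinates to global ones, one numpy call per frame.

    local_xyz   : (N, 3) local coordinates
    frame_ids   : (N,) id of the reference frame each point belongs to
    frame_chain : hypothetical dict, frame id -> list of 3x4 transforms
                  ordered from the root of the tree down to that frame
    """
    out = np.empty_like(local_xyz)
    for fid, chain in frame_chain.items():
        tf = np.hstack([np.eye(3), np.zeros((3, 1))])  # identity
        for step in chain:
            tf = compose(tf, step)  # root first, so the leaf applies first
        mask = frame_ids == fid
        out[mask] = transform_points(tf, local_xyz[mask])
    return out
```

The Python loop now runs once per distinct reference frame rather than once per node, so the cost of the tree traversal is paid only when building `frame_chain`, which could be cached between frames if the tree topology does not change.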

Hi Ben,
On Mar 13, 2023, at 12:14 PM, Ben Webb via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
> Am I right in thinking that when reading a trajectory (e.g. .dcd file), all requested frames from that trajectory are first read in, after which the "coordset" command can be used to work with the trajectory?

Yes.

> I am reading frames from an RMF file (using the RMF plugin, the "rmf readtraj" command) which is quite slow, perhaps a second per frame - because RMF stores local coordinates which must be transformed to make the global coordinates ChimeraX is using. So if the user asks to read 100 frames, ChimeraX becomes unresponsive for ~100s, which is not a great user experience.
>
> One possible solution would be to read the 100 frames "lazily", e.g. just create 100 copies of the active coordset, then when the user writes "coordset #1 N" for the first time or scrolls to frame N with "coordset slider", read in frame N from the RMF file and overwrite coordset N. That way there would be a 1s delay only the first time that frame is shown. Is this possible? I see there is an "active coordset changed" trigger - perhaps I could attach to that?

You might be able to get away with responding to the "active coordset changed" trigger. My cursory inspection of the code didn't reveal any problems with that approach. Nonetheless, if you implement something and encounter problems, we might have to add a "coordset about to change" trigger, which would not be hard to do if necessary.
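The bookkeeping for such a lazy scheme might be sketched as follows. This is not ChimeraX code: `read_frame` and `set_coords` are hypothetical callables standing in for the RMF reader and for whatever call overwrites a coordset on the structure, and `on_active_coordset_changed` would have to be wired to the actual trigger by the plugin.

```python
class LazyTrajectory:
    """Read each frame's coordinates the first time that frame becomes active."""

    def __init__(self, read_frame, set_coords, num_frames):
        self._read_frame = read_frame   # hypothetical: frame id -> coordinates
        self._set_coords = set_coords   # hypothetical: (frame id, coords) -> None
        # Frames that still hold placeholder coordinates (ids 1..num_frames).
        self._pending = set(range(1, num_frames + 1))

    def on_active_coordset_changed(self, frame):
        """Call this from the handler that fires when the active coordset changes."""
        if frame not in self._pending:
            return  # already loaded, or not one of our frames
        self._set_coords(frame, self._read_frame(frame))
        # Only mark the frame as loaded once the read has succeeded,
        # so a failed read is retried the next time the frame is shown.
        self._pending.discard(frame)
```

Because the handler runs after the coordset has already changed, the placeholder coordinates may be what is displayed until the overwrite lands; that is the situation in which a trigger fired before the change would help.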

> Alternatively, is there an API for tools to run long-running processes in the background so that the user can work on other things in the meantime, or cancel it if it takes too long?

We have some job/task management infrastructure, though no GUI control yet. Nonetheless I don't think this is the way to go because atomic data structures are not thread safe. While you could partially protect yourself by running the final coordinate set update through session.ui.thread_safe, you'd still be in very hot water if the user closed the structure while these threads were executing, as just one problematic scenario.

--Eric

Eric Pettersen
UCSF Computer Graphics Lab
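For illustration only, the partial protection described here (do the slow read on a worker thread, but hand every structure update back to the main thread) might be sketched with the standard library as below. Everything in it is an assumption: `post_to_main` stands in for a call like session.ui.thread_safe, `apply_frame` is a hypothetical callback that must itself check the structure is still open, and `drain` only imitates a GUI event loop. It does not make atomic data structures thread safe.

```python
import queue
import threading

def read_frames_in_background(read_frame, frames, post_to_main, apply_frame, cancel):
    """Read frames on a worker thread and post each result to the main thread.

    cancel is a threading.Event that the caller sets when the user cancels
    or the structure is closed.
    """
    def worker():
        for frame in frames:
            if cancel.is_set():
                break
            coords = read_frame(frame)  # slow part, runs off the main thread
            # Never touch the structure here; only schedule the update.
            post_to_main(apply_frame, frame, coords)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread

def drain(calls):
    """Stand-in for the GUI event loop: run queued callbacks on this thread."""
    while True:
        try:
            func, args = calls.get_nowait()
        except queue.Empty:
            return
        func(*args)
```

The update callback should re-check the cancel flag (or the structure's state) when it finally runs, because the structure may have been closed between the read and the update.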