Matchmaker 'restrict to selection' not working as expected.
Hi all,

When using Matchmaker, the ‘Further restrict to matching selection’ option does not seem to improve the speed of alignment as much as expected under certain circumstances (when the selection is a subset of a larger chain, as opposed to being an individual chain in a multi-chain structure).

It seems to still restrict the selection correctly for the final alignment (it is still only aligned on the selected domain), but the “test match” step seems to be conducted on the entire chain, rather than the selection - this makes a big difference to speed for large proteins.

For example - I tried aligning two conformations of a protein complex consisting of one chain A of ~100 residues, and chain B of ~5000 residues.

If I select chain A of both structures, and then select ‘Further restrict to matching selection’ for both the reference and the structure to match, the alignment is almost instantaneous - less than 2s.

If I perform the same operation, but this time selecting the first 100 residues of chain B in each structure, the alignment takes 3min20s, regardless of whether I leave automatic chain pairing on or select the specific chains to use for alignment manually. It takes only marginally longer - 5 min - if I select the entirety of chain B, a selection that is 50x the size.

Incidentally I still get the “test match” message in the status bar even when the selection only contains one chain from each structure, or when I manually select (using the chain pairing options) which chains to align - maybe Chimera is still running test match on the whole chain when it should only do so on the selection (or not at all if the selection only includes one possible match).

This seems like a bug? Or am I missing something?

Cheers,
Oliver.
Hi Oliver,

The domain restriction capability was added well after most of MatchMaker had been implemented, and I might have done things differently if it had been part of the initial design, or I might not have. The way it works is that it doesn’t completely ignore the non-selected part of the chain; it just pretends there is no structure associated with those parts and that their sequence composition (but not length) is unknown. Therefore the sequence alignment isn’t sped up by the factor that you would anticipate given the size of the domain vs. the whole sequence.

There is some upside though. Let’s say you have a chain of two identical domains connected by a hinge and you have two conformations of that. If you select the second domain in one of the structures and turn on selection restriction, MatchMaker will always match it to the correct domain in the other structure, whereas if you were completely ignoring everything outside the restriction it would be a 50/50 shot as to whether it would match it against the correct domain.

The MatchMaker code is pretty intricate, given all its options, and I would be hesitant to change how it works in order to address this. What I would probably do is translate the underlying Needleman-Wunsch sequence alignment code from Python to C++, which would likely make this problem mostly moot. I will undoubtedly do that for ChimeraX. I’m somewhat less likely to do it for Chimera “classic”, though I might back-port the code once it’s written for ChimeraX.

--Eric

Eric Pettersen
UCSF Computer Graphics Lab
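To illustrate why the restriction described above does not shrink the alignment cost, here is a minimal Python sketch. It is not MatchMaker's code: the function names, the `?` placeholder for residues of unknown composition, and the scoring values are all assumptions made for illustration. It only shows that masking residues, as opposed to cutting them out, leaves the Needleman-Wunsch table at full size.

```python
def needleman_wunsch_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2, unknown="?"):
    """Global alignment score plus the number of table cells filled.

    Toy scoring (an assumption, not MatchMaker's matrix): a pair that
    involves an 'unknown' residue scores 0, so it neither helps nor hurts.
    """
    prev = [j * gap for j in range(len(seq_b) + 1)]
    cells = 0
    for i in range(1, len(seq_a) + 1):
        curr = [i * gap]
        for j in range(1, len(seq_b) + 1):
            a, b = seq_a[i - 1], seq_b[j - 1]
            if a == unknown or b == unknown:
                sub = 0
            else:
                sub = match if a == b else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a with b
                            prev[j] + gap,       # gap in seq_b
                            curr[j - 1] + gap))  # gap in seq_a
            cells += 1
        prev = curr
    return prev[-1], cells


def mask_outside(seq, start, end, unknown="?"):
    """Keep seq[start:end]; replace everything else with 'unknown', preserving length."""
    return unknown * start + seq[start:end] + unknown * (len(seq) - end)


# Hypothetical 300-residue chain with a 30-residue "domain" selected.
chain = "ACDEFGHIKLMNPQRSTVWY" * 15
masked = mask_outside(chain, 0, 30)

_, masked_cells = needleman_wunsch_score(masked, masked)
_, cut_cells = needleman_wunsch_score(chain[:30], chain[:30])
print(masked_cells, cut_cells)  # 90000 900
```

For the chains in the original post, the same counting gives 5000 x 5000 = 25,000,000 cells for a masked chain B against 100 x 100 = 10,000 cells for a cut-out 100-residue domain, a factor of 2,500. That is in line with the selection barely affecting the run time.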
On Nov 21, 2015, at 5:15 PM, Oliver Clarke <olibclarke@gmail.com> wrote:
[quoted text trimmed: repeats the original post in full]
_______________________________________________ Chimera-users mailing list Chimera-users@cgl.ucsf.edu http://plato.cgl.ucsf.edu/mailman/listinfo/chimera-users
Hi Eric,

Thanks for the reply! Not sure I understand the upside - in your scenario, couldn’t you just also select the appropriate domain in the other structure and restrict selection for both target and reference to achieve the same thing? In any case, it would definitely be helpful for edge cases (5000 residue chains…) if this worked differently in ChimeraX.

Also, with regard to possible enhancements to matchmaker for ChimeraX, have you considered incorporating an option to force symmetric alignment for symmetric oligomers (e.g. ion channels)? This sort of happens by default if you leave the “iterate by pruning long atom pairs” option off, but it would be nice to take advantage of this option to improve the alignment while still enforcing symmetry.

Additionally, and I don’t know if this is practical, when I use the “prune long atom pairs” option, it might be helpful for reporting purposes if Chimera still wrote the RMSD calculated with respect to the original set of atom pairs to the reply log. I am thinking specifically of the case where I select all CAs and then restrict to selection: I would like to be able to say “the R.M.S.D. over all CA pairs is x Å”, in addition to specifying the RMSD calculated with respect to the pruned set of atom pairs. Does that make sense, or am I misunderstanding something basic about the way matchmaker works that makes this idea impractical?

By the way, do you guys anticipate doing a beta release of ChimeraX at some point? Or are you doing that internally and planning to just go straight to general release?

Cheers,
Oliver.
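The two-RMSD report requested above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coordinates are taken to be already superimposed, the function names are hypothetical, and the 2.0 Å cutoff is an illustrative value. It does not refit after pruning, which the real iteration does, and it is not Chimera's implementation.

```python
import math


def rmsd(pairs):
    """Root-mean-square distance over (point, point) coordinate pairs."""
    if not pairs:
        raise ValueError("need at least one atom pair")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in pairs) / len(pairs))


def prune_long_pairs(pairs, cutoff):
    """Keep only pairs whose separation is within the cutoff (no refitting)."""
    return [(p, q) for p, q in pairs if math.dist(p, q) <= cutoff]


# Four made-up CA pairs after superposition: three 1 Å apart, one 5 Å apart.
all_pairs = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    ((4.0, 0.0, 0.0), (5.0, 0.0, 0.0)),
    ((8.0, 0.0, 0.0), (9.0, 0.0, 0.0)),
    ((12.0, 0.0, 0.0), (17.0, 0.0, 0.0)),
]
pruned = prune_long_pairs(all_pairs, cutoff=2.0)

print(f"RMSD over all {len(all_pairs)} pairs: {rmsd(all_pairs):.3f}")
print(f"RMSD over {len(pruned)} pruned pairs: {rmsd(pruned):.3f}")
```

Reporting both numbers side by side makes it clear how much of the overall deviation comes from the pairs that pruning discards.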
On Nov 23, 2015, at 2:38 PM, Eric Pettersen <pett@cgl.ucsf.edu> wrote:
[quoted text trimmed: repeats Eric's reply and the original post in full]
Hi Oliver,

I wasn’t saying that it was impossible to achieve the desired result if the selection were treated as if it were “cut out” of the larger sequence, just that it avoided possibly surprising results if you had only selected the domain on one of the structures. Convenience is an upside after all! Nonetheless, the slower matching is a definite downside. But like I said, the first thing I would do is speed up the underlying Needleman-Wunsch matching. It might make the issue moot.

I have entered your other two suggestions into our Trac database as RFEs. They are both good ideas.

As for the ChimeraX timeline, we are shooting for a beta release in the summer of next year.

--Eric

Eric Pettersen
UCSF Computer Graphics Lab
On Nov 24, 2015, at 2:41 AM, Oliver Clarke <olibclarke@gmail.com> wrote:
[quoted text trimmed: repeats the earlier messages in this thread in full]