Parameters choice for match->align and quality assessment

Dear Chimera developers,

Many thanks for the opportunity to work with your very useful program. I am new to structural alignment studies and have some questions about the parameters in Match -> Align. My set consists of a few closely related proteins with the same function and a conserved spatial structure, but with some divergence in amino acid sequence apart from the conserved functional residues.

First, what does "Residue aligned in column if within cutoff of" influence, and what would be the preferable choice in my case?

Second, "Superimpose full column": how can I decide what to select for my set, and what experiments could help me decide?

Third, how can I evaluate the quality or correctness of the resulting alignment (perhaps there is a common procedure)?

I am sorry for the long letter, and thank you for your attention to my problem.

D. Yegorov.

Dear D,

First let me clarify which tools in Chimera you might be using:

(A) MatchMaker will superimpose the structures. Also, if you choose the option "After superposition, compute structure-based multiple sequence alignment", it will call the other tool, Match -> Align.
<http://www.cgl.ucsf.edu/chimera/docs/ContributedSoftware/matchmaker/matchmak...>
There is also a command version of MatchMaker:
<http://www.cgl.ucsf.edu/chimera/docs/UsersGuide/midas/mmaker.html>

(B) Match -> Align will not superimpose the structures when they are far apart, but it will create a sequence alignment consistent with the 3D superposition that you give it. It can also improve the input 3D superposition by an iterative process, but you need to start with the proteins at least roughly superimposed already. You could make the starting superposition any way you want, but the convenient way in Chimera is with MatchMaker.
<http://www.cgl.ucsf.edu/chimera/docs/ContributedSoftware/matchalign/matchali...>

It sounds like you might be using both of those tools. If your proteins are closely related, they are probably "easy" to superimpose, and you could just use the MatchMaker default parameter values to start. We have chosen these defaults to work in a broad range of situations. It is only the "hard" cases that might require playing around with the parameters.

You would only use Match -> Align if
(A) you wanted a multiple sequence alignment that goes with the 3D superposition, or
(B) you wanted to improve the 3D superposition using the multiple sequence alignment, or
(C) you wanted to calculate a score for your structural superposition.

If purpose A, the proper cutoff (your question #1) depends on what you want this sequence alignment to mean. Match -> Align will obey your specifications. If you want only the residues with CA-CA distance <4 angstroms to be in the same column, then you would use a cutoff of 4 angstroms. If there are floppy loops that are farther apart but you think they are evolutionarily equivalent and should be allowed to be aligned in the sequence alignment, you would want to use a larger cutoff.

Otherwise (purposes B and C) you could just try using the default settings everywhere (your questions #1 and #2). However, this connects to your question #3: how do you evaluate the results so you know which parameter values are best? Often, people just evaluate the superposition visually. There is no single measure that everybody agrees is the right way to evaluate the superposition. In fact, dozens and maybe even hundreds of different measures have been published! They usually combine RMSD and number of aligned positions in some way.

However, we have recently added the calculation of two different measures by Match -> Align, SDM and Q-score. These are described in more detail along with literature references in the Match -> Align documentation:
<http://www.cgl.ucsf.edu/chimera/docs/ContributedSoftware/matchalign/matchali...>
Those scores will be reported in the Reply Log (under Favorites in the menu), but you need to have a fairly new version of Chimera (version 1.5, so a daily build).

If you are concerned that your 3D superposition results might not be the optimum, you can try adjusting parameters and seeing if these scores improve. However, it sounds like your proteins are relatively easy to superimpose, and my guess is that a wide range of settings will give you about the same results. I just wanted you to be fully informed so that in your hands, the tools will be powerful and useful.

Finally, this Chimera tutorial includes use of MatchMaker and Match -> Align:
<http://www.cgl.ucsf.edu/chimera/docs/UsersGuide/tutorials/alignments.html>

I hope this helps,
Elaine
----------
Elaine C. Meng, Ph.D.
UCSF Computer Graphics Lab (Chimera team) and Babbitt Lab
Department of Pharmaceutical Chemistry
University of California, San Francisco

On Jul 5, 2010, at 9:41 AM, compchem compchems wrote:
> Dear Chimera developers, [...]
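
To make the cutoff discussion in the reply above concrete, here is a minimal sketch of how a CA-CA distance cutoff trades the number of aligned positions against RMSD. It is not Chimera's code and does not compute SDM or Q-score; the function names and the coordinates are hypothetical, chosen only for illustration.

```python
import math


def within_cutoff(ca_a, ca_b, cutoff):
    """True if two CA positions (x, y, z in angstroms) are closer than cutoff."""
    return math.dist(ca_a, ca_b) < cutoff


def rmsd(pairs):
    """Root-mean-square deviation over a list of (ca_a, ca_b) coordinate pairs."""
    if not pairs:
        raise ValueError("need at least one pair")
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in pairs) / len(pairs))


# Hypothetical CA coordinates of three residue pairs after superposition.
pairs = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),  # 1 angstrom apart
    ((5.0, 0.0, 0.0), (5.0, 3.0, 0.0)),  # 3 angstroms apart
    ((9.0, 0.0, 0.0), (9.0, 0.0, 6.0)),  # 6 angstroms apart, e.g. a floppy loop
]

for cutoff in (4.0, 7.0):
    kept = [p for p in pairs if within_cutoff(p[0], p[1], cutoff)]
    print(f"cutoff {cutoff}: {len(kept)} pairs kept, RMSD {rmsd(kept):.3f}")
```

With these made-up numbers, the 4 angstrom cutoff keeps two pairs, while the 7 angstrom cutoff also admits the loop pair and gives a larger RMSD. This is why published measures tend to combine both quantities rather than report RMSD alone.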

Dear Elaine,

Thank you very much for your detailed response to my somewhat careless message. It is very kind and useful help. I was thinking only about the length of the letter and, as it turned out, left out the important points. I am sorry for that.

First, I use the two-stage Chimera procedure (MatchMaker and then Match -> Align) to obtain a multiple sequence alignment consistent with the spatial superposition, for detailed analysis of a variable ligand-binding (or, in some cases, non-binding) site in my enzymes. (The analysis uses the full sequence lengths available in the 3D structures, not just the sites, of course.)

Second, I think I understand the MatchMaker procedure, its parameters, and the detailed analysis of this subject in the article by you and your co-workers: "Tools for integrated sequence-structure analysis with UCSF Chimera". Meng EC, Pettersen EF, Couch GS, Huang CC, Ferrin TE. BMC Bioinformatics. 2006 Jul 12;7:339. PMID: 16836757. I mean especially the robustness of the algorithm over a broad range of parameters and even across search schemes (local SW or global NW). This article is very informative for me. So I used the default parameters there and omitted this step from my first message.

Third, my first original question was not about the cutoff value (although your answer drew my attention to the rationale on this subject, thanks!), but about how the choice between "at least one other" and "all others" in the "Residue aligned in column if within cutoff of" parameter may influence the final result for my set (how robust the algorithm is to this choice).

Fourth, about my second original question in the light of the goal described above: how may the choice for "Superimpose full column" between "entire alignment" and "in stretches of at least N consecutive columns" change the results for my set? Or, perhaps, in my case the default parameters will be OK?

Fifth, many thanks for the material about quality assessment.

With my best wishes to you,
Dmitry Yegorov.

Dear Dmitry,

You're welcome. In Match -> Align, "Residue aligned in column if within cutoff of (a) at least one other, or (b) all others", both choices are robust, but it depends what you want.

Imagine that you have three residues X,Y,Z from three different structures. It is possible that the CA-CA distance of residues X,Y is less than the cutoff, and of Y,Z is less than the cutoff, but of X,Z is greater than the cutoff. If you choose "at least one other" then X,Y,Z can all three be placed in one column of the output sequence alignment. If you choose the more stringent option, "all others," then X,Y,Z cannot all be in the same column of the sequence alignment, and the alignment will be longer and have more gaps. I usually choose "at least one other" because I prefer a less gappy alignment, but it also depends on what your cutoff is and what you want the columns of the alignment to signify.

This setting will also affect fit iteration because only the alignment columns with residues from all structures (no gaps) will be used in the next cycle of fitting. If you really only want fitting where each structure is within the cutoff of every other structure, you would choose "all others".

The reason for the "consecutive columns" setting in iteration is that there might be some places where loops from different structures cross or come close to each other, but are not really structurally equivalent. If you use only columns where adjacent columns are also full (do not contain gaps) it will exclude those isolated positions where loops just happen to cross but are not really structurally equivalent. Again I would recommend the default, fitting using only the full columns where there are at least three full columns together.

If you think the results could be improved, then you could try changing parameters and seeing if the scores improve, but definitely try the default values first. From your description, there is no reason to expect you would need different values.
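
The X, Y, Z example above can be written as a small membership test. This is only a sketch of the two criteria as described in this message, not the procedure Match -> Align uses to build columns; the function name `column_ok` and the distances are hypothetical.

```python
from itertools import combinations


def column_ok(dist, residues, cutoff, criterion="at least one other"):
    """Check whether residues may share one alignment column.

    dist maps frozenset({a, b}) to the CA-CA distance (angstroms) of a and b.
    """
    def close(a, b):
        return dist[frozenset((a, b))] < cutoff

    if criterion == "all others":
        # Every pair of residues in the column must be within the cutoff.
        return all(close(a, b) for a, b in combinations(residues, 2))
    if criterion == "at least one other":
        # Each residue needs only one partner within the cutoff.
        return all(
            any(close(a, b) for b in residues if b != a) for a in residues
        )
    raise ValueError(f"unknown criterion: {criterion!r}")


# Hypothetical distances: X-Y and Y-Z are close, X-Z is not.
dist = {
    frozenset(("X", "Y")): 3.0,
    frozenset(("Y", "Z")): 3.5,
    frozenset(("X", "Z")): 5.5,
}

print(column_ok(dist, ["X", "Y", "Z"], 4.0, "at least one other"))  # True
print(column_ok(dist, ["X", "Y", "Z"], 4.0, "all others"))          # False
```

Under "all others" the three residues have to be split across more than one column, which is where the extra gaps come from.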
I hope this helps, and best wishes to you too!
Elaine
----------
Elaine C. Meng, Ph.D.
UCSF Computer Graphics Lab (Chimera team) and Babbitt Lab
Department of Pharmaceutical Chemistry
University of California, San Francisco

On Jul 6, 2010, at 11:19 AM, compchem compchems wrote:
> Dear Elaine, thank you very much for your detailed response [...]
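
The "consecutive columns" setting described in the reply above can be illustrated with a short filter that keeps only gap-free columns lying in runs of at least N. This is a sketch under stated assumptions, not Chimera's implementation: the function names, the `-` gap character, and the toy alignment are hypothetical, and `min_run=3` simply mirrors the "at least three full columns together" default mentioned in the reply.

```python
def columns_of(rows):
    """Transpose equal-length aligned sequences into a list of columns."""
    return list(zip(*rows))


def full_column_stretches(columns, min_run=3, gap="-"):
    """Return indices of gap-free columns that sit in runs of >= min_run."""
    keep, run = [], []
    for i, col in enumerate(columns):
        if gap not in col:
            run.append(i)          # extend the current run of full columns
        else:
            if len(run) >= min_run:
                keep.extend(run)   # run was long enough, keep it
            run = []
    if len(run) >= min_run:        # handle a run that reaches the last column
        keep.extend(run)
    return keep


# Hypothetical three-sequence alignment; column 5 is an isolated full column.
rows = [
    "ACDE-FGHIK",
    "ACD-QFGHIK",
    "ACDEQF-HI-",
]

print(full_column_stretches(columns_of(rows), min_run=3))  # [0, 1, 2]
print(full_column_stretches(columns_of(rows), min_run=1))  # [0, 1, 2, 5, 7, 8]
```

With `min_run=1` every full column is used, which corresponds to fitting across the entire alignment; with `min_run=3` the isolated column 5 and the two-column run 7-8 are excluded, as would happen for positions where loops merely cross.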
_______________________________________________ Chimera-users mailing list Chimera-users@cgl.ucsf.edu http://www.cgl.ucsf.edu/mailman/listinfo/chimera-users