Re: [Chimera-users] Analysis of the multiple alignments + structures

2 Sep 2014

      Hi Elaine,

Thanks a lot for such detailed suggestions!

In my case the situation might be much simpler, because I'd like to draw
conclusions about structure-function relationships in a set of olfactory
receptor proteins which:
-) are all rhodopsin-like (class A) GPCRs sharing 40-70% sequence identity
-) have the same functionally relevant motifs
-) interact with only one type of transducer, Golf
-) !! but !! can interact with a wide range of odorants (ligands).
Given that the TM bundle of these proteins is highly conserved compared
with the extracellular part (especially the loops), I should focus on the
extracellular (less conserved) part, which is mainly involved in ligand
binding (! because I'd like to predict mutations that affect ligand
binding first of all). BTW, given the very high level of sequence
variability among GPCRs (less than 40% sequence identity within the
rhodopsin-like subfamily) but the high conservation of the structure of
all these proteins (RMSD < 4 Å), can one conclude that these proteins are
highly tolerant to mutations affecting function (the phenomenon called
buffering in evolutionary biology)?

A technical question: is it possible in Chimera (using the Multalign
Viewer tool, for instance) to reduce the number of sequences used in the
conservation diagrams as well as in the multiple alignment display (for
instance, to hide sequences from the full set that will not be used)?
E.g.: previously I had to make new alignment.fasta files with different
numbers of sequences and load them into Chimera each time I needed to
compare different numbers of sequences. What I'd really like is to do this
within one open Chimera session, loading the maximum set of sequences and
hiding the unused ones.

Thanks a lot for your help,

James

2014-09-02 0:07 GMT+04:00 Elaine Meng <meng@cgl.ucsf.edu>:
...
Hi James,
That’s a really broad question… usually that’s why people are interested
in showing conservation on some structure: to highlight the important
residues, and in combination with structural analysis and/or mutagenesis,
to suggest their roles in structure and function.  I don’t have specific
papers in mind, but there should be no shortage if you try a few keyword
searches.
There is a bit more discussion in my page on “sources of sequence
alignments” as to choosing the set of sequences, or source (web database or
web server) of a set of sequences for calculating conservation.
<http://www.cgl.ucsf.edu/home/meng/sources.html>
(That page is also linked to the “mapping sequence conservation” tutorial.)
However, your question is really about making judgement calls as they
pertain to your specific research project, which may be beyond the scope of
this Chimera list, and further, I don’t claim to be an authority on the
subject.  Logic would dictate that if you are looking for residues
important in function X, then you would want to include only sequences of
proteins that perform function X, but if you are looking for patterns of
conservation among related functions X,Y,Z in some set of homologous
proteins, you would use a broader set.    Maybe it is known what percent
IDs give the proper boundaries in your situation, but more often it is not
known.  By boundaries I mean how much variability can be in your set
without including some proteins that don’t have the function of interest.
The percent ID filtering you mention is a little bit different - that
filtering would remove sequences from the set that are similar to another
within the specified cutoff, but could be considered separately from the
issue of how broadly variable the set is in the first place (for example,
only class A GPCRs, only amine neurotransmitter-binding GPCRs, or only
Gs-stimulating GPCRs).
In other words, a data set could be big for at least two different
reasons: (1) it’s broad, (2) it’s redundant. For redundancy-filtering, I’d
personally only filter down to get an alignment that is not too big for
Chimera, and err on the side of including lots of sequences as long as the
overall range of the set is appropriate for the function of interest.
There are sequence-weighting options in the AL2CO conservation scoring in
Chimera that may help to mitigate any under- and over-representation of
subsets of the sequences.
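
To make the redundancy-filtering idea above concrete, here is a minimal
sketch in plain Python. It is an illustration only and does not reflect
how Chimera or AL2CO do anything internally. The function names, the
identity definition (identical residues divided by the number of columns
where both sequences have a residue), and the 90% default cutoff are all
assumptions chosen for the example.

```python
def percent_identity(seq_a, seq_b, gap_chars="-."):
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a in gap_chars or b in gap_chars:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a.upper() == b.upper():
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def filter_redundant(records, cutoff=90.0):
    """Greedy filter: keep a sequence only if it is below `cutoff` percent
    identity to every sequence already kept.

    `records` is a list of (name, aligned_sequence) pairs. The result
    depends on input order, so put the sequences you care most about first.
    """
    kept = []
    for name, seq in records:
        if all(percent_identity(seq, kept_seq) < cutoff
               for _, kept_seq in kept):
            kept.append((name, seq))
    return kept
```

Note that this only addresses redundancy (near-duplicates); how broad the
set should be in the first place is a separate decision, as described
above.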
Disclaimer:  this is just my opinion as someone who has dabbled in the
area, and others with more experience may have divergent views or better
ideas of how to attack the problem!
I hope this helps,
Elaine
-----
Elaine C. Meng, Ph.D.
UCSF Computer Graphics Lab (Chimera team) and Babbitt Lab
Department of Pharmaceutical Chemistry
University of California, San Francisco
On Sep 1, 2014, at 2:55 AM, James Starlight <jmsstarlight@gmail.com>
wrote:
...
Elaine,
I have not fully finished your tutorial, but I already have one question:
have you seen any interesting papers covering my problem: predicting the
functional properties of particular amino acids (motifs) based on analysis
of the 3D structure of the protein of interest together with analysis of
the sequences of closely related homologues?
In particular, I'm interested in how many sequences I should include in
the MSA and what threshold for sequence identity (against my target
protein) should be chosen. E.g., in the case of G-protein coupled
receptors (which have low sequence similarity but high structural
conservation), I obtained 2 different pictures of the conserved a.a.
motifs when I used i) only several templates with low (40%) sequence
identity vs. ii) a lot of sequences with higher identity (up to 60%). In
the latter case I obtained much higher conservation in the motifs, as seen
from the analysis of SS (which is trivial!), whereas in case i) there were
only several highly conserved motifs. Does this mean that analysis of BIG
datasets with higher sequence identity produces greater uncertainty in the
final results, because it becomes harder to conclude which conserved
elements are *really* functionally important?
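
The comparison between cases i) and ii) can also be checked numerically
outside Chimera. Below is a minimal sketch in plain Python that scores
each alignment column by Shannon entropy and lists the columns at or below
a chosen threshold. This is a deliberately simple measure and is not the
AL2CO scoring used in Chimera; the function names and the default
threshold of 0.0 (fully conserved) are assumptions for illustration.

```python
import math
from collections import Counter


def column_entropy(column, gap_chars="-."):
    """Shannon entropy (bits) of the residues in one column, ignoring gaps.

    Returns None for a column that contains only gaps.
    """
    residues = [c.upper() for c in column if c not in gap_chars]
    if not residues:
        return None
    total = len(residues)
    counts = Counter(residues)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def conserved_columns(sequences, max_entropy=0.0):
    """Return 0-based indices of columns whose entropy is <= max_entropy."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same aligned length")
    result = []
    for i in range(length):
        entropy = column_entropy([seq[i] for seq in sequences])
        if entropy is not None and entropy <= max_entropy:
            result.append(i)
    return result
```

Running conserved_columns on both the small, divergent set and the large,
more similar set gives two column lists that can be compared directly;
columns conserved only in the more similar set are the ones whose
conservation may simply reflect close relatedness.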
James