Analysis of the multiple alignments + structures

James Starlight

29 Aug 2014

3:52 a.m.

Dear Chimera Users!

I'd like to analyze a set of sequences (provided as a FASTA multiple-alignment file) together with the X-ray structures that are available for some (not all) of those sequences. The goal is to detect conserved motifs from the sequences, assess their functional importance from the X-ray data, and so draw conclusions about the relationship between sequence conservation and functional relevance.

Could you point me to some useful Chimera plugins, as well as some tutorials?

Kind regards,
James


Elaine Meng

29 Aug

10:23 a.m.

Hi James,

It happens that I just finished making a new online tutorial “mapping sequence conservation onto structures with Chimera”!!
<http://www.rbvi.ucsf.edu/chimera/data/tutorials/systems/outline.html>

The tutorial has lots of links and details. I’m not aware of plugins, but Chimera’s Multalign Viewer is very rich in features for this type of work; it includes a variety of options for calculating and displaying conservation, mostly provided via the included AL2CO program from the Grishin group. Anyway, that and much more is described in the new tutorial and links therein.

Some of the tutorials in the User’s Guide (included with Chimera download) also address this area, with a bit less detail:

Sequences and Structures:
<http://www.rbvi.ucsf.edu/chimera/docs/UsersGuide/tutorials/super.html>
Superpositions and Alignments:
<http://www.rbvi.ucsf.edu/chimera/docs/UsersGuide/tutorials/alignments.html>

Of course, besides the sequence-related features, you could still use general structure analysis (H-bonds, Rotamers, etc.) to investigate your sequence-structure hypotheses.

I hope this helps,
Elaine
-----
Elaine C. Meng, Ph.D.
UCSF Computer Graphics Lab (Chimera team) and Babbitt Lab
Department of Pharmaceutical Chemistry
University of California, San Francisco
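For readers who want to see what a per-column conservation score amounts to, here is a minimal Python sketch. It is not AL2CO and not Chimera code: it uses a plain Shannon-entropy score, and the function names (`read_fasta`, `column_conservation`) and the normalization by log2(20) are assumptions made purely for illustration.

```python
import math
from collections import Counter


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def column_conservation(seqs, gap_chars="-."):
    """Score each alignment column from 0.0 (variable) to 1.0 (invariant).

    The score is 1 - H / log2(20), where H is the Shannon entropy of the
    non-gap residues in the column. All-gap columns get None.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for i in range(length):
        residues = [s[i].upper() for s in seqs if s[i] not in gap_chars]
        if not residues:
            scores.append(None)
            continue
        total = len(residues)
        counts = Counter(residues)
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in counts.values())
        # Clamp in case ambiguity codes push a column past 20 symbols.
        scores.append(max(0.0, 1.0 - entropy / math.log2(20)))
    return scores
```

Calling `column_conservation([seq for _, seq in read_fasta(text)])` gives one score per column. Gaps are simply ignored here, which is only one of several possible conventions; a real analysis would more likely rely on the scoring options in Multalign Viewer.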

Elaine Meng

10:43 a.m.

Hi James,

These tutorials may also be worth a look, or at least a glance to see if they cover things you want to know about:

Structure Analysis and Comparison
<http://www.rbvi.ucsf.edu/chimera/docs/UsersGuide/tutorials/squalene.html>

Functional Annotation Scenario: The Structure-Function Linkage Database (SFLD) and Chimera
<http://www.rbvi.ucsf.edu/chimera/data/sfld2014/sfld2014.html>

Elaine

_______________________________________________ Chimera-users mailing list Chimera-users@cgl.ucsf.edu http://plato.cgl.ucsf.edu/mailman/listinfo/chimera-users

James Starlight

1 Sep

12:36 a.m.

Thanks a lot!!

James

James Starlight

2:55 a.m.

Elaine,

I have not fully finished your tutorial yet, but I have one question: have you seen any interesting papers covering my problem, namely predicting the functional properties of particular amino acids (motifs) from an analysis of the 3D structure of the protein of interest combined with an analysis of the sequences of closely related homologues?

In particular, I'm interested in how many sequences I should include in the MSA and what sequence-identity threshold (against my target protein) should be chosen. For example, when dealing with a set of G-protein coupled receptors (which have low sequence similarity but high structural conservation), I obtained two different pictures of the conserved amino-acid motifs depending on whether I used i) only several templates with low (40%) sequence identity, or ii) a lot of sequences with higher identity (up to 60%). In the latter case I obtained much higher conservation in the motifs seen in the analysis of SS (which is trivial!), whereas in case i) there were only several highly conserved motifs.

Does this mean that analyzing BIG datasets with higher sequence identity produces greater uncertainty in the final results, because we cannot conclude which conserved elements are *really* functionally important?

James

Elaine Meng

1:07 p.m.

Hi James,

That’s a really broad question… usually that’s why people are interested in showing conservation on some structure: to highlight the important residues, and in combination with structural analysis and/or mutagenesis, to suggest their roles in structure and function. I don’t have specific papers in mind, but there should be no shortage if you try a few keyword searches.

There is a bit more discussion in my page on “sources of sequence alignments” as to choosing the set of sequences, or source (web database or web server) of a set of sequences for calculating conservation.
<http://www.cgl.ucsf.edu/home/meng/sources.html>
(That page is also linked to the “mapping sequence conservation” tutorial.)

However, your question is really about making judgement calls as they pertain to your specific research project, which may be beyond the scope of this Chimera list, and further, I don’t claim to be an authority on the subject. Logic would dictate that if you are looking for residues important in function X, then you would want to include only sequences of proteins that perform function X, but if you are looking for patterns of conservation among related functions X,Y,Z in some set of homologous proteins, you would use a broader set. Maybe it is known what percent IDs give the proper boundaries in your situation, but more often it is not known. By boundaries I mean how much variability can be in your set without including some proteins that don’t have the function of interest.

The percent ID filtering you mention is a little bit different - that filtering would remove sequences from the set that are similar to another within the specified cutoff, but could be considered separately from the issue of how broadly variable the set is in the first place (for example, only class A GPCRs, only amine neurotransmitter-binding GPCRs, or only Gs-stimulating GPCRs). In other words, a data set could be big for at least two different reasons: (1) it’s broad, (2) it’s redundant.

For redundancy-filtering, I’d personally only filter down to get an alignment that is not too big for Chimera, and err on the side of including lots of sequences as long as the overall range of the set is appropriate for the function of interest. There are sequence-weighting options in the AL2CO conservation scoring in Chimera that may help to mitigate any under- and over-representation of subsets of the sequences.

Disclaimer: this is just my opinion as someone who has dabbled in the area, and others with more experience may have divergent views or better ideas of how to attack the problem!

I hope this helps,
Elaine
-----
Elaine C. Meng, Ph.D.
UCSF Computer Graphics Lab (Chimera team) and Babbitt Lab
Department of Pharmaceutical Chemistry
University of California, San Francisco
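To make the idea of sequence weighting concrete, here is a small Python sketch of position-based weights in the style of Henikoff and Henikoff. This is an illustration only: the thread does not say how AL2CO implements its weighting options, and the function name `position_based_weights` is made up for this example.

```python
from collections import Counter


def position_based_weights(seqs, gap_chars="-."):
    """Return one weight per aligned sequence, normalized to sum to 1.

    In each column, a residue type seen in n sequences, in a column with
    k distinct residue types, contributes 1 / (k * n) to each sequence
    carrying it. Sequences from an over-represented subgroup therefore
    end up with smaller weights. Gaps contribute nothing.
    """
    n_seqs = len(seqs)
    weights = [0.0] * n_seqs
    for i in range(len(seqs[0])):
        column = [s[i].upper() for s in seqs]
        counts = Counter(c for c in column if c not in gap_chars)
        k = len(counts)
        if k == 0:
            continue  # all-gap column
        for idx, residue in enumerate(column):
            if residue in counts:
                weights[idx] += 1.0 / (k * counts[residue])
    total = sum(weights)
    if total == 0:
        return [1.0 / n_seqs] * n_seqs  # fall back to uniform weights
    return [w / total for w in weights]
```

With two identical sequences and one divergent sequence, the divergent one receives half of the total weight and the two duplicates share the rest, which is the kind of correction for over-representation mentioned above.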

James Starlight

2 Sep

2:12 a.m.

Hi Elaine,

thanks a lot for such detailed suggestions!

In my case the situation might be much simpler, because I'd like to draw conclusions about structure-function relationships in a set of olfactory receptor proteins, which
-) are all rhodopsin-like (class A) GPCRs sharing 40-70% sequence identity
-) have the same functionally relevant motifs
-) interact with only one type of transducer, Golf
-) !! but !! can interact with a big range of odorants (ligands).

Taking into account that the TM bundle of these proteins is highly conserved compared with the extracellular part (especially the loops), I should focus on the extracellular (less conserved) part, which is mainly involved in ligand binding (because, first of all, I'd like to predict mutations that affect ligand binding).

BTW, given the very high level of sequence variability among GPCRs (less than 40% sequence identity within the rhodopsin-like subfamily) but the high structural conservation of all these proteins (RMSD < 4 Å), might one conclude that these proteins are highly tolerant of mutations affecting function (the phenomenon called buffering in evolutionary biology)?

A technical question: is it possible in Chimera (using the Multalign Viewer, for instance) to reduce the number of sequences used in the conservation diagrams and in the multiple-alignment display (for instance, to hide sequences from the full set that will not be used)? Previously I had to make new alignment.fasta files with different numbers of sequences and load each one into Chimera whenever I needed to compare different numbers of sequences. I'd like instead to do this within one open Chimera session, starting from the full set of sequences and hiding the unused ones.

Thanks a lot for the help,
James

Elaine Meng

10:53 a.m.

Hi James,

On the first question: I don't know, you would have to use your own judgement. One thought is that the overall fold structure is only part of function; for example, a point mutation might prevent binding a specific odorant ligand without disrupting the protein fold.

On the second question: sorry no, Chimera doesn't do any %ID-filtering. You would have to use another program, as in Case 3 of the "mapping sequence conservation" tutorial.
<http://www.rbvi.ucsf.edu/chimera/data/tutorials/systems/outline.html>

The only other program I've tried for this purpose is JalView, but I had some issues with the results (it seemed to count positions where both sequences have gaps as an identity position). If your alignment doesn't have many gaps, however, this wouldn't affect the results too much. Also, I'm pretty sure there are many other programs to do this; I just haven't tried them myself.

Best,
Elaine
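The gap-counting issue raised in the %ID-filtering reply can be illustrated with a short Python sketch. It computes pairwise percent identity while skipping columns where both sequences have a gap, then applies a simple greedy redundancy filter. The function names, the choice of denominator (columns where at least one sequence has a residue), and the 90% default cutoff are all assumptions for this example, not the behavior of Chimera, JalView, or any other specific program.

```python
def percent_identity(a, b, gap_chars="-."):
    """Percent identity between two aligned sequences.

    Columns where both sequences have a gap are skipped entirely, so
    they count neither as matches nor toward the denominator.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        x_gap, y_gap = x in gap_chars, y in gap_chars
        if x_gap and y_gap:
            continue  # gap-gap column: ignore
        compared += 1
        if not x_gap and not y_gap and x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def filter_redundant(records, max_identity=90.0):
    """Greedily keep sequences that are below max_identity to all kept ones.

    records is a list of (name, aligned_sequence) pairs. The result
    depends on input order, so put preferred sequences first.
    """
    kept = []
    for name, seq in records:
        if all(percent_identity(seq, kept_seq) < max_identity
               for _, kept_seq in kept):
            kept.append((name, seq))
    return kept
```

Counting gap-gap columns as identities would inflate the score for gappy alignments; skipping them, as above, is one common way to avoid that, though other denominators (for example, the shorter sequence length) are also in use.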

Eric Pettersen

11:32 a.m.

What Elaine said about there being no percent-ID filtering is correct; nonetheless, there are a couple of tips here for working with a subset of sequences. You can drag out a region on the main alignment that covers the sequences you are interested in and then directly create an alignment from them with the Edit/Region->New Window menu item. If you have trouble with that (I did for one specific alignment file and will be fixing the problem after I send this mail!) you can also use File/Save As to save the region to a file (make sure to click the "Restrict save to active region" button on the Save As dialog) and then open that file.

--Eric

Eric Pettersen
UCSF Computer Graphics Lab
http://www.cgl.ucsf.edu
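For anyone who still prefers to prepare sequence subsets outside Chimera, the manual "make a new FASTA file" step can be scripted. The following Python sketch picks sequences by name from an already-parsed alignment, drops columns that become all-gap, and formats the result as FASTA. The function names are hypothetical, and whether Chimera's own region tools remove all-gap columns is not something this thread states.

```python
def subset_alignment(records, keep_names, gap_chars="-."):
    """Return the chosen (name, sequence) pairs with all-gap columns removed.

    records is a list of (name, aligned_sequence) pairs of equal length.
    Note that removing columns changes column numbering relative to the
    full alignment.
    """
    wanted = set(keep_names)
    chosen = [(n, s) for n, s in records if n in wanted]
    if not chosen:
        return []
    length = len(chosen[0][1])
    keep_cols = [i for i in range(length)
                 if any(s[i] not in gap_chars for _, s in chosen)]
    return [(n, "".join(s[i] for i in keep_cols)) for n, s in chosen]


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text with wrapped lines."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

Writing the output of `to_fasta(subset_alignment(records, names))` to a file gives an alignment that can be opened like any other FASTA alignment file.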

James Starlight

3 Sep

5:11 a.m.

Thank you very much for suggestions! Regarding techniqal question: Indeed I've found that the JalView + Chimera Mutalign give me best results! ;) Elaine, how do you think might the *selective* analysis of the alignments of *different* branches of the phylogenetic tree of the olfactory receptors be more effective for the the prediction of the mutations affecting its specifity and affinity towards different clases of ligands (odors)? I've noticed that the same aproach have been used in your 2003 study (Evolutionary trace of GPCRs) where you made conclusions (mainly based on the selectivy analys of the Opsins) about mutations affecting i) ligand binding site, iii) constitutive activity as well as ii) coupling to the G-protein. Kind regards, James 2014-09-02 22:32 GMT+04:00 Eric Pettersen <pett@cgl.ucsf.edu>:

...


Elaine Meng

10:36 a.m.

Hi James,

I’m glad you found a solution for the technical problem.

I’m not really offering general research consulting on this list, and don’t claim to be an expert whose opinions should be followed. That said, my general opinion is that evolutionary trace can offer additional insights to just looking at the conservation in the columns of one big alignment. Potential downsides are that for evolutionary trace you need a reliable phylogenetic tree, and to be careful about the range and balance of the data set of sequences. In other words, tracing evolution is more difficult than just aligning the sequences, but then you can see correlations of specific mutations with evolutionary divergences.

Rather than asking me, better to look in the literature. I realize that is a daunting task, as there are many papers in this general area. The Lichtarge group is continuing to explore evolutionary trace, and many other groups are using similar approaches. My main involvement in the 2004 Madabushi et al. paper was to compile the list of experimentally evaluated GPCR mutations and their literature references!

Best,
Elaine

-----
Elaine C. Meng, Ph.D.
UCSF Computer Graphics Lab (Chimera team) and Babbitt Lab
Department of Pharmaceutical Chemistry
University of California, San Francisco

On Sep 3, 2014, at 5:11 AM, James Starlight <jmsstarlight@gmail.com> wrote:

...

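To make "conservation in the columns of one big alignment" concrete, here is a minimal sketch of a per-column score in plain Python. It is an illustration under stated assumptions, not the AL2CO calculation that Multalign Viewer uses: the score is a normalized Shannon entropy over the non-gap residues of each column, the function name `column_conservation` is hypothetical, and the sequences are assumed to be aligned to equal length with `-` or `.` as gap characters.

```python
import math
from collections import Counter


def column_conservation(seqs, gap_chars="-."):
    """Return one score per alignment column.

    1.0 means the column is invariant; lower values mean more variation.
    The score is 1 - H / log2(20), where H is the Shannon entropy (in bits)
    of the non-gap residues. Columns containing only gaps get None.
    """
    scores = []
    # zip(*seqs) walks the alignment column by column; it stops at the
    # shortest sequence, so unequal lengths are silently truncated.
    for column in zip(*seqs):
        residues = [c.upper() for c in column if c not in gap_chars]
        if not residues:
            scores.append(None)
            continue
        total = len(residues)
        counts = Counter(residues)
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in counts.values())
        scores.append(1.0 - entropy / math.log2(20))
    return scores
```

Because gaps are ignored, a column with a single residue and many gaps scores as fully conserved. A score like this says nothing about where on the tree a substitution occurred, which is the extra information evolutionary trace draws on.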

James Starlight

4 Sep

12:14 a.m.

Elaine, thank you very much!

I'll try to check the literature.

Best,

James

2014-09-03 19:36 GMT+02:00 Elaine Meng <meng@cgl.ucsf.edu>:

...


James Starlight

11:39 p.m.

Elaine,

One question about your 2003 GPCR study: in this paper you used the method called differential evolution tracing to detect functionally relevant amino acids for rhodopsin only (subtraction of the conserved residues found in rhodopsin from those found in the whole set of A-class GPCRs). I wonder whether there is any way to do this in Chimera's Multalign Viewer (in the case where I have a big alignment with the sequences of all olfactory receptors): (i) to subtract residues found in one phylogenetic branch (corresponding to a specified sub-class or group of those receptors) from the residues common to all olfactory receptors, or (ii) to subtract residues common to all olfactory receptors from those common to all A-class GPCRs?

BTW, do you know of open-access databases containing evolutionary (alignment) information for the GPCRs? It would be interesting to compare my alignment with some references.

Thanks for the help,

James

2014-09-04 11:14 GMT+04:00 James Starlight <jmsstarlight@gmail.com>:

...


Elaine Meng

5 Sep

9:59 a.m.

Hi James,

Sorry, Chimera does not do evolutionary trace calculations. I would definitely have mentioned it if it did! In Chimera, if you input a tree along with your alignment, you can click a node on the tree and easily extract only the sequences for that node, but there is nothing to compare the residues from one set of sequences with those from another set of sequences, other than looking at the separate alignment windows yourself. Also, Chimera cannot create the tree; it must be read in from a file.
<http://www.rbvi.ucsf.edu/chimera/docs/ContributedSoftware/multalignviewer/fr...>
… see the section on Trees:
<http://www.rbvi.ucsf.edu/chimera/docs/ContributedSoftware/multalignviewer/mu...>
… example image of tree+alignment in Chimera:
<http://www.rbvi.ucsf.edu/chimera/data/tutorials/systems/redoxin/ConSurf-1hd2...>

Although I’m still very interested in GPCRs, I have not actively worked on them for >10 years, so I’m not the best person to ask for current information. I know there is a GPCRdb, but I don’t know if it has been kept up to date. You could try googling or pubmed-searching for “gpcr alignments” or similar terms.

Two ideas for evolutionary trace:

(1) The Lichtarge group has an evolutionary trace server.
<http://mammoth.bcm.tmc.edu/ETserver.html>
At first I thought you could only put in a PDB ID and not have any control over what sequences are used (which might not be useful for your research), but then I noticed a “GPCRs” link on the left-hand side. Click that and it lets you choose specific GPCR subsets, which sounds exactly like the kind of calculation you wanted. I don’t know if it includes your specific groups of interest, however, and I don’t see a way to use your own alignments.
<http://mammoth.bcm.tmc.edu/gpcr/diff_GPCRA.html>

(2) JEvTrace, a Java implementation of evolutionary trace.
<http://compbio.berkeley.edu/people/marcin/jevtrace/>
I have one correction to that page. It says you can show results using the MSF Viewer tool in Chimera, which no longer exists. You can still use Chimera, just the Multalign Viewer tool instead (Multalign Viewer menu: “File… Load SCF/Seqsel File” will open the file from JEvTrace). I’ll email the author and tell him to update that information.

Those are all my ideas. If neither helps you, you may have to continue searching for programs or other tools and instructions on how to do the steps of the analysis yourself.

I hope this helps,
Elaine

-----
Elaine C. Meng, Ph.D.
UCSF Computer Graphics Lab (Chimera team) and Babbitt Lab
Department of Pharmaceutical Chemistry
University of California, San Francisco

On Sep 4, 2014, at 11:39 PM, James Starlight <jmsstarlight@gmail.com> wrote:

...

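Since Chimera offers nothing to compare the residues of one set of sequences with those of another, the "subtraction" James describes would have to be done outside it. The sketch below is a deliberately crude illustration in plain Python and is not the evolutionary trace method, which ranks positions using the phylogenetic tree. It only reports columns that are invariant within a chosen subgroup but not invariant across the whole alignment. The function names and the example sequence names are hypothetical, and all sequences are assumed to come from one alignment of equal length.

```python
def invariant_columns(seqs, gap_chars="-."):
    """Map column index -> residue for columns where every sequence
    carries the same non-gap residue."""
    result = {}
    for i, column in enumerate(zip(*seqs)):
        residues = {c.upper() for c in column}
        # Exactly one distinct character, and it must not be a gap.
        if len(residues) == 1 and not (residues & set(gap_chars)):
            result[i] = residues.pop()
    return result


def subgroup_specific_columns(all_seqs, subgroup_names):
    """Columns invariant within the subgroup but not across all sequences.

    all_seqs is a dict of name -> aligned sequence; subgroup_names lists
    the members of one branch or sub-class. Column indices are 0-based
    positions in the alignment, not residue numbers of any structure.
    """
    subgroup = [all_seqs[name] for name in subgroup_names]
    in_subgroup = invariant_columns(subgroup)
    in_all = invariant_columns(list(all_seqs.values()))
    return {i: res for i, res in in_subgroup.items() if i not in in_all}


# Toy example with made-up sequences and names.
alignment = {
    "or1": "MKLW",
    "or2": "MKIW",
    "rho1": "MRLW",
    "rho2": "MRVW",
}
specific = subgroup_specific_columns(alignment, ["or1", "or2"])
```

In the toy example, column 1 is K in both subgroup members but varies across the full set, so it is the only column reported. Strict invariance is a harsh criterion for real alignments, so a conservation threshold would likely be needed in practice.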

James Starlight

6 Sep

midnight

Hi Elaine,

Thank you very much! It was really useful for me! Next year I plan to look for a postdoc position, switching from modeling and theoretical chemistry to structural bioinformatics coupled with phylogenetic analysis (I don't yet know the precise name of this discipline within computational biology), so it will be very helpful!

All the best,

James

2014-09-05 20:59 GMT+04:00 Elaine Meng <meng@cgl.ucsf.edu>:

...

