
Dear ChimeraX team, I wondered if it is possible to limit a BLAST search to only a single species when querying a selected database. This would facilitate identifying homologs in your favorite model organism, rather than picking up the most closely related homolog overall (e.g., when using the AlphaFold database). Many thanks, Matthias

Hi Matthias, There is no option to restrict the search itself. However, when you get the results back, you can show the "Species" column and (optionally) sort the results by that. See screenshot attached. I hope this helps, Elaine ----- Elaine C. Meng, Ph.D. UCSF Chimera(X) team Resource for Biocomputing, Visualization, and Informatics Department of Pharmaceutical Chemistry University of California, San Francisco
On Jan 27, 2025, at 11:12 PM, Vorländer,Matthias Kopano via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
Dear ChimeraX team, I wondered if it is possible to limit a BLAST search to only a single species when querying a selected database. This would facilitate identifying homologs in your favorite model organism, rather than picking up the most closely related homolog overall (i.e when using the alphafold database). Many thanks, Matthias

Hi Elaine, Thanks a lot for the reply. I was aware of the filtering, but constraining the search would be more efficient. If this could be implemented in a future release of ChimeraX, I believe it would be useful for many :) Thanks a lot and best wishes, Matthias
From: Elaine Meng <meng@cgl.ucsf.edu> Sent: Tuesday, January 28, 2025 5:47:48 PM To: Vorländer,Matthias Kopano <matthias.vorlaender@imp.ac.at> Cc: ChimeraX-users@cgl.ucsf.edu <chimerax-users@cgl.ucsf.edu> Subject: Re: [chimerax-users] Limit blast search to species
Hi Matthias, There is no option to restrict the search itself. However, when you get the results back, you can show the "Species" column and (optionally) sort the results by that. See screenshot attached. I hope this helps, Elaine ----- Elaine C. Meng, Ph.D. UCSF Chimera(X) team Resource for Biocomputing, Visualization, and Informatics Department of Pharmaceutical Chemistry University of California, San Francisco

Hi Matthias,
The species name in the AlphaFold database sometimes includes the strain name. Would you want the search to find all species matching the initial part of the species text?
I added to ChimeraX 1.9 a tool called Similar Structures, under menu Tools / Structure Analysis, that can search for structures using Foldseek or MMseqs2. You might be interested in this when searching the AlphaFold database, since the BLAST search is very slow for the 200 million structures in that database, whereas Foldseek and MMseqs2 are quite fast (tens of seconds). The Similar Structures tool is different from the BLAST tool in that it is intended to compare the database structures to a query, so you have to have a query structure, not just a query sequence, to use it.
https://www.rbvi.ucsf.edu/chimerax/data/foldseek-jul2024/foldseek.html
I considered adding a species-specific search option for the similar structures command but didn't add that, and instead I have used Elaine's suggestion of sorting by the Species column. But I agree an option to show only the results for one species would be useful. I was looking at influenza proteins where the species usually had a strain appended (when searching PDB), so I would definitely need the matching to just match a specified prefix "influenza".
The search web services being used don't support specifying a species, so it won't speed up the search. ChimeraX will just get the results for all species but only show you the ones for the species you specified. Because of that, the limit on the maximum number of results will apply to all results, not just to those for the species you specify.
Tom
Example output: search of the AlphaFold database with MMseqs2 (uses RCSB web service, took about 10 seconds), sorted by Species column.
On Jan 28, 2025, at 10:56 AM, Vorländer,Matthias Kopano via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
Hi Elaine,
Thanks a lot for the reply. I was aware of the filtering, but constraining the search would be more efficient. If this could be implemented in a future release of ChimeraX, I believe it would be useful for many :)
Thanks a lot and best wishes, Matthias
ChimeraX-users mailing list -- chimerax-users@cgl.ucsf.edu To unsubscribe send an email to chimerax-users-leave@cgl.ucsf.edu Archives: https://mail.cgl.ucsf.edu/mailman/archives/list/chimerax-users@cgl.ucsf.edu/
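The prefix matching Tom describes, applied after the search has returned results for all species, could be sketched roughly as follows. This is only an illustration, not ChimeraX code: the `filter_by_species_prefix` function, the hit dictionaries, and the `species` key are all hypothetical names chosen for the example.

```python
def filter_by_species_prefix(hits, prefix):
    """Keep hits whose species text starts with `prefix`, ignoring case.

    `hits` is assumed to be a list of dicts with a "species" entry
    (a hypothetical layout, not the actual ChimeraX results table).
    """
    wanted = prefix.strip().casefold()
    return [
        hit for hit in hits
        if (hit.get("species") or "").strip().casefold().startswith(wanted)
    ]


# Made-up results; a real search would return these from the web service.
hits = [
    {"id": "hit1", "species": "Influenza A virus (A/Puerto Rico/8/1934(H1N1))"},
    {"id": "hit2", "species": "Homo sapiens"},
    {"id": "hit3", "species": "influenza B virus"},
]

matches = filter_by_species_prefix(hits, "influenza")
print([hit["id"] for hit in matches])
```

Because the filter runs on whatever the server already returned, a cap on the number of hits is applied before filtering, which is why a species of interest can be missing entirely from the filtered list.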

Hi Tom, thanks a lot for the detailed reply!
> Hi Matthias, The species name in the Alphafold database sometimes includes the strain name. Would you want the search to find all species matching the initial part of the species text?
Yes, I was envisioning a function similar to the UniProt or NCBI BLAST tools, where you can restrict by taxonomy. Personally I would not be interested in restricting to specific strains, but perhaps other users are. Could this be handled flexibly?
> I added to ChimeraX 1.9 a tool called Similar Structures under menu Tools / Structure Analysis that can search for structures using Foldseek or MMseqs2. You might be interested in this when searching the AlphaFold database since the BLAST search is very slow for the 200 million structures in that database where foldseek and mmseqs2 are quite fast (tens of seconds). The Similar Structures tool is different from the BLAST tool in that it is intended to compare the database structures to a query, so you have to have a query structure, not just a query sequence to use it.
> https://www.rbvi.ucsf.edu/chimerax/data/foldseek-jul2024/foldseek.html
Just checked it out, what a great tool!! In my specific case I was interested in searching with only a small domain/short linear motif present in my POI. From what I saw, Foldseek always searches the entire chain? I thought I could simply delete the residues I am not interested in from the model prior to running Foldseek, but this throws an error: Foldseek query failed: unsupported operand type(s) for /: 'str' and 'int'. Are there other ways to search only a selection of my model?
> I considered adding a species-specific search option for the similar structures command but didn't add that and instead I have used Elaine's suggestion when using it to sort by the Species column. But I agree an option to show only the results for one species would be useful. I was looking at influenza proteins where the species usually had a strain appended (when searching PDB) so I would definitely need the matching to just match a specified prefix "influenza". The search web services being used don't support specifying a species, so it won't speed up the search. ChimeraX will just get the results for all species but only show you the ones for the species you specified. Because of that the limits on the maximum number of results won't be just for the species you specify but for all results.
Hm, the limit on the maximum number of results could be an issue. In my case, searching for 1000 hits in the AlphaFold database did not reveal any hits in my species of interest, owing to high conservation between species. A shame that the web services don't support specifying species... This would be the ideal solution :)
Thanks again also for opening the ticket! Best wishes, Matthias

Hi Matthias, Regarding limiting the foldseek query to a part of a chain: The full sequence is still in the metadata of the chain even though you delete some of the atomic coordinates. To only search with part of a chain, one (somewhat tedious) way is to delete the rest of the chain, save as PDB, text-edit the PDB file to remove all the SEQRES records, open the PDB file, use foldseek search. <https://rbvi.ucsf.edu/chimerax/docs/user/tools/foldseek.html> I hope this helps, Elaine ----- Elaine C. Meng, Ph.D. UCSF Chimera(X) team Resource for Biocomputing, Visualization, and Informatics Department of Pharmaceutical Chemistry University of California, San Francisco
On Jan 28, 2025, at 1:00 PM, Vorländer,Matthias Kopano via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
Just checked It out, what a great tool!! In my specific case I was interested to search only a small domain/short linear motif present in my POI - from what I saw, foldseek always searches the entire chain? I thought I could simply delete the residues I am not interested in from the model prior to performing Foldseek, but this throws an error: Foldseek query failed: unsupported operand type(s) for /: 'str' and 'int'. Are there other ways to search only a selection of my model?
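The text-editing step in Elaine's workaround (removing all SEQRES records from the saved PDB file) could also be scripted. The following is a minimal sketch in plain Python, run outside ChimeraX; the function names and the file names in the usage note are hypothetical. Tom notes later in the thread that Foldseek itself uses only residues with coordinates, so this step may not be needed for a Foldseek search.

```python
from pathlib import Path


def strip_seqres(pdb_text):
    """Return PDB text with every SEQRES record removed.

    PDB records are identified by the record name at the start of
    each line, so a simple prefix test is enough here.
    """
    lines = pdb_text.splitlines(keepends=True)
    return "".join(line for line in lines if not line.startswith("SEQRES"))


def strip_seqres_file(src, dst):
    """Read the PDB file `src` and write a copy without SEQRES records to `dst`."""
    Path(dst).write_text(strip_seqres(Path(src).read_text()))


# Tiny made-up example: one SEQRES record and one ATOM record.
example = (
    "SEQRES   1 A    3  MET ALA GLY\n"
    "ATOM      1  N   ALA A   2      11.104  13.207   2.100  1.00 20.00           N\n"
    "END\n"
)
print(strip_seqres(example))
```

For a file saved from ChimeraX, something like `strip_seqres_file("trimmed.pdb", "trimmed_noseqres.pdb")` would write the edited copy, which can then be opened in ChimeraX for the search.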

Hi Matthias,
Foldseek is intended to find proteins with similar folds, so I don't think it can handle short linear motifs: it needs at least a domain with multiple secondary structure elements. The error you saw is fixed in the daily build. I think that error only happens when Foldseek returns no results, and in your case that might be because Foldseek has some minimum number of residues.
It might make more sense to use MMseqs2 to search for your short linear motif; you can choose MMseqs2 search in the Similar Structures tool. But then again, sequence searches of 10 or fewer residues are likely to produce far too many hits, so I am not sure how you would search for short linear motifs. You can use a small e-value cutoff to try to get nearly exact sequence matches using the MMseqs2 sequence search command in ChimeraX, "sequence search":
https://www.cgl.ucsf.edu/chimerax/docs/user/commands/sequence.html#search
The Foldseek server hosted by the Steinegger lab, which developed Foldseek, limits results to 1000 hits. The MMseqs2 search is from RCSB, and you can specify the maximum number of hits using the "sequence search" command (default is 1000). But the server imposes a ceiling of 10,000 on that maximum, as described on this RCSB page: https://search.rcsb.org/#pagination. I think the RCSB search probably allows narrowing the search to a species, but the ChimeraX interface does not handle that. So you could go directly to RCSB in a web browser and do an advanced search of the AlphaFold database, which would be equivalent to what ChimeraX can do but offers more options.
The ChimeraX BLAST search uses a UCSF server that our lab hosts. BLAST is a bit antiquated (even the PDB does not use it), and we would have to add species filtering to the server. It uses the blastp program on the server, which does allow filtering by taxonomy as described here: https://www.ncbi.nlm.nih.gov/books/NBK569846/
Tom
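For readers running BLAST+ locally, the taxonomy filtering Tom refers to could look roughly like the sketch below, which only builds a blastp command line and prints it. It assumes a BLAST+ version and a database built with taxonomy information so that the `-taxids` option is available; the query file and database name are hypothetical, and this is not how the UCSF server used by ChimeraX is configured.

```python
import shlex


def blastp_command(query_fasta, database, taxids, evalue=1e-3, max_hits=1000):
    """Build a blastp argument list restricted to the given NCBI taxonomy IDs.

    The command is only constructed here, not executed. The file and
    database names passed in are placeholders chosen by the caller.
    """
    return [
        "blastp",
        "-query", query_fasta,
        "-db", database,
        "-taxids", ",".join(str(t) for t in taxids),
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_hits),
        "-outfmt", "6",  # tabular output
    ]


# 9606 is the NCBI taxonomy ID for Homo sapiens.
cmd = blastp_command("query.fasta", "my_local_db", [9606])
print(shlex.join(cmd))
```

Filtering on the server side like this restricts the search itself, so the maximum number of hits would apply only to the chosen species, unlike filtering the results afterwards.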

Hi Matthias, Elaine, Foldseek is a structure-based search, in other words it uses the atom positions. So just deleting the structure residues will restrict what Foldseek uses to the residues with atom coordinates. As far as I know Foldseek makes no use of residues listed in the sequence that do not have coordinates, so it is not necessary to change the sequence information in the structure file to use Foldseek on a trimmed down structure. Tom
On Jan 28, 2025, at 1:15 PM, Elaine Meng via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
Hi Matthias, Regarding limiting the foldseek query to a part of a chain:
The full sequence is still in the metadata of the chain even though you delete some of the atomic coordinates.
To only search with part of a chain, one (somewhat tedious) way is to delete the rest of the chain, save as PDB, text-edit the PDB file to remove all the SEQRES records, open the PDB file, use foldseek search.
<https://rbvi.ucsf.edu/chimerax/docs/user/tools/foldseek.html>
I hope this helps, Elaine ----- Elaine C. Meng, Ph.D. UCSF Chimera(X) team Resource for Biocomputing, Visualization, and Informatics Department of Pharmaceutical Chemistry University of California, San Francisco
participants (3)
- Elaine Meng
- Tom Goddard
- Vorländer,Matthias Kopano