Hi Matthias,

  Foldseek is intended to find proteins with similar folds, so I don't think it can handle short linear motifs; it needs at least a domain with multiple secondary structure elements.  The error you saw is fixed in the daily build.  I think that error only happens when Foldseek returns no results, and in your case that might be because Foldseek requires some minimum number of residues.  It might make more sense to use mmseqs2 to search for your short linear motif; you can choose the mmseqs2 search in the Similar Structures tool.  But then again, sequence searches with 10 or fewer residues are likely to produce far too many hits, so I am not sure how you would search for short linear motifs.  You can use a small e-value cutoff to try to get nearly exact sequence matches with the ChimeraX "sequence search" command, which runs the mmseqs2 sequence search:

https://www.cgl.ucsf.edu/chimerax/docs/user/commands/sequence.html#search

The Foldseek server hosted by the Steinegger lab, which developed Foldseek, limits results to 1000 hits.  The mmseqs2 search is run by RCSB, and you can specify the maximum number of hits using the "sequence search" command (default is 1000).  The server itself caps that maximum at 10,000, as described on this RCSB page: https://search.rcsb.org/#pagination.  I think the RCSB search probably allows narrowing the search to a species, but the ChimeraX interface does not handle that.  So you could go directly to RCSB in a web browser and do an advanced search of the AlphaFold database, which would be equivalent to what ChimeraX can do but offers more options.  The ChimeraX BLAST search uses a UCSF server that our lab hosts.  BLAST is a bit antiquated (even the PDB does not use it), and we would have to add species filtering to the server.  The server runs the blastp program, which does allow filtering by taxonomy as described here: https://www.ncbi.nlm.nih.gov/books/NBK569846/
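
A rough sketch of what a direct RCSB query with a species restriction and a capped hit count might look like is below. It only builds the JSON request body and does not contact any server. The field names (`service`, `evalue_cutoff`, `results_content_type`, `paginate`, and especially the organism attribute `rcsb_entity_source_organism.ncbi_scientific_name`) are written from memory of the RCSB Search API and should be treated as assumptions to check against the RCSB documentation. The function name, the default e-value, and the placeholder sequence are hypothetical.

```python
import json

# Server-side cap on the number of rows per request, as mentioned above.
RCSB_MAX_ROWS = 10_000


def build_sequence_query(sequence, species_name=None, max_hits=1000,
                         evalue_cutoff=0.1):
    """Build a request body for an RCSB sequence search (sketch only).

    All JSON field names are assumptions based on the RCSB Search API
    documentation and are not verified against the live service.
    """
    sequence_node = {
        "type": "terminal",
        "service": "sequence",
        "parameters": {
            "sequence_type": "protein",
            "value": sequence,
            "evalue_cutoff": evalue_cutoff,
        },
    }
    query = sequence_node
    if species_name:
        # Assumed attribute name for the source organism of an entity.
        species_node = {
            "type": "terminal",
            "service": "text",
            "parameters": {
                "attribute": "rcsb_entity_source_organism.ncbi_scientific_name",
                "operator": "exact_match",
                "value": species_name,
            },
        }
        query = {
            "type": "group",
            "logical_operator": "and",
            "nodes": [sequence_node, species_node],
        }
    return {
        "query": query,
        "return_type": "polymer_entity",
        "request_options": {
            # "computed" is assumed to select predicted models
            # (e.g. AlphaFold) rather than experimental entries.
            "results_content_type": ["computed"],
            # Never ask for more rows than the server allows.
            "paginate": {"start": 0, "rows": min(max_hits, RCSB_MAX_ROWS)},
        },
    }


if __name__ == "__main__":
    # Placeholder sequence, not a real query of interest.
    body = build_sequence_query("ACDEFGHIKLMNPQRSTVWY",
                                species_name="Homo sapiens",
                                max_hits=50_000)
    print(json.dumps(body, indent=2))
```

Because the species restriction here is part of the query itself, the row limit would apply to hits from that species only, unlike filtering the results afterwards in ChimeraX.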

Tom


On Jan 28, 2025, at 1:00 PM, Vorländer,Matthias Kopano <matthias.vorlaender@imp.ac.at> wrote:

 
Hi Tom, thanks a lot for the detailed reply!


Hi Matthias,
 
  The species name in the AlphaFold database sometimes includes the strain name.  Would you want the search to find all species matching the initial part of the species text?
 
 
Yes, I was envisioning a function similar to the UniProt or NCBI BLAST tools, where you can restrict by taxonomy – personally I would not be interested in restricting to specific strains, but perhaps other users are. Could this be handled flexibly?
 
  I added to ChimeraX 1.9 a tool called Similar Structures under menu Tools / Structure Analysis that can search for structures using Foldseek or MMseqs2.  You might be interested in this when searching the AlphaFold database, since the BLAST search is very slow for the 200 million structures in that database, whereas Foldseek and MMseqs2 are quite fast (tens of seconds).  The Similar Structures tool differs from the BLAST tool in that it is intended to compare the database structures to a query, so you have to have a query structure, not just a query sequence, to use it.
 
 
Just checked it out, what a great tool!! In my specific case I was interested in searching only a small domain/short linear motif present in my POI. From what I saw, Foldseek always searches the entire chain? I thought I could simply delete the residues I am not interested in from the model prior to running Foldseek, but this throws an error: Foldseek query failed: unsupported operand type(s) for /: 'str' and 'int'. Are there other ways to search only a selection of my model?

I considered adding a species-specific search option for the similar structures command but didn't add it; instead I have followed Elaine's suggestion to sort by the Species column.  But I agree an option to show only the results for one species would be useful.  I was looking at influenza proteins where the species usually had a strain appended (when searching PDB), so I would definitely need the matching to just match a specified prefix such as "influenza".  The search web services being used don't support specifying a species, so it won't speed up the search.  ChimeraX will just get the results for all species but only show you the ones for the species you specified.  Because of that, the limit on the maximum number of results applies to all results, not just those for the species you specify.
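
The prefix matching described above could be sketched roughly as follows. This is an illustration only, not how ChimeraX implements it: the `Hit` class, its field names, and the example hits are made up, and the case-insensitive comparison is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical stand-in for one row of search results.
    database_id: str
    species: str


def filter_by_species_prefix(hits, prefix):
    """Keep hits whose species text starts with the prefix, ignoring case.

    Matching on a prefix means a species name with a strain appended
    still matches the plain species name.
    """
    wanted = prefix.strip().casefold()
    return [h for h in hits
            if h.species.strip().casefold().startswith(wanted)]


# Made-up example hits, for illustration only.
hits = [
    Hit("hit1", "Influenza A virus (A/Puerto Rico/8/1934(H1N1))"),
    Hit("hit2", "Homo sapiens"),
    Hit("hit3", "influenza B virus"),
]

influenza_hits = filter_by_species_prefix(hits, "influenza")
print([h.database_id for h in influenza_hits])  # prints ['hit1', 'hit3']
```

Since the filter runs only on results that were already returned, it cannot recover hits for a species that fell outside the maximum number of results.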
 
Hm, the limit on the maximum number of results could be an issue. In my case, searching for 1000 hits in the AlphaFold database did not reveal any hits in my species of interest, owing to high conservation between species. A shame that the web services don’t support specifying species. This would be the ideal solution : )
 
Thanks again also for opening the ticket! Best wishes,
 
Matthias