Hi Tom, thanks a lot for the detailed reply!
Hi Matthias,
The species name in the AlphaFold database sometimes includes the strain name. Would you want the search to find all species matching the initial part of the species text?
Yes, I was envisioning a function similar to the UniProt or NCBI BLAST tools, where you can restrict by taxonomy. Personally I would not be interested in restricting to specific strains, but perhaps other users are. Could this be handled flexibly?
I added to ChimeraX 1.9 a tool called Similar Structures under menu Tools / Structure Analysis that can search for structures using Foldseek or MMseqs2. You might be interested in this when searching the AlphaFold database, since the BLAST search is very slow for the 200 million structures in that database, whereas Foldseek and MMseqs2 are quite fast (tens of seconds). The Similar Structures tool is different from the BLAST tool in that it is intended to compare the database structures to a query, so you have to have a query structure, not just a query sequence, to use it.
Just checked it out, what a great tool!! In my specific case I was interested in searching with only a small domain/short linear motif present in my POI. From what I saw, Foldseek always searches with the entire chain? I thought I could simply delete the residues I am not interested in from the model before running Foldseek, but this throws an error:
Foldseek query failed: unsupported operand type(s) for /: 'str' and 'int'.
Are there other ways to search only a selection of my model?
I considered adding a species-specific search option for the similar structures command but didn't add it; instead I have used Elaine's suggestion of sorting by the Species column. But I agree an option to show only the results for one species would be useful. I was looking at influenza proteins where the species usually had a strain appended (when searching PDB), so I would definitely need the matching to just match a specified prefix such as "influenza". The search web services being used don't support specifying a species, so it won't speed up the search. ChimeraX will just get the results for all species but only show you the ones for the species you specified. Because of that, the limit on the maximum number of results won't apply just to the species you specify but to all results.
Hm, the limit on the maximum number of results could be an issue. In my case, searching for 1000 hits in the AlphaFold database did not return any hits in my species of interest, owing to high conservation between species. A shame that the web services don’t support specifying species … This would be the ideal solution : )
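Just to check that I understand the proposed behaviour, here is a rough Python sketch of the prefix matching as I picture it. This is only my illustration and not ChimeraX code: the function name, the dictionary layout and the example hits are all made up, and I am assuming the match would be case-insensitive.

```python
def filter_hits_by_species(hits, species_prefix):
    """Keep hits whose species text starts with the given prefix.

    Hypothetical helper: matching is case-insensitive and ignores
    surrounding whitespace, so a prefix like "influenza" also matches
    species entries that have a strain name appended.
    """
    prefix = species_prefix.strip().lower()
    return [h for h in hits if h["species"].strip().lower().startswith(prefix)]


# Illustrative results only. In the real case this list would already have
# been cut off at the maximum number of results, across all species.
hits = [
    {"id": "hit1", "species": "Influenza A virus (A/Puerto Rico/8/1934(H1N1))"},
    {"id": "hit2", "species": "Homo sapiens"},
    {"id": "hit3", "species": "influenza B virus (B/Lee/1940)"},
    {"id": "hit4", "species": "Mus musculus"},
]

influenza_hits = filter_hits_by_species(hits, "influenza")
print([h["id"] for h in influenza_hits])  # ['hit1', 'hit3']
```

Since the filter only runs on what the service has already returned, a species that does not make it into the first 1000 hits would still give an empty list, which is what I ran into.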
Thanks again also for opening the ticket! Best wishes,
Matthias