Hi Jim,

  You can trust that ColabFold did not use any PDB templates.  The fact that it did not output the template_domain_names.json file confirms that.  It is not at all surprising that it got the prediction right without templates: if it has hundreds or thousands of sequences aligned, and there are structures with similar folds in the PDB that the neural net was trained on, then it doesn't need any templates.  It is very surprising that your second domain rotated 90 degrees.  Very likely AlphaFold is not at all confident of either the original packing or this new one.  You should look at the PAE confidence estimates from AlphaFold to see whether it thinks the packing has any chance of being correct.

https://www.rbvi.ucsf.edu/chimerax/data/pae-apr2022/pae.html
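
  If you would rather get a number than look at the plot, here is a minimal sketch of an inter-domain PAE check.  It assumes you have already loaded the PAE values into a square matrix (a list of lists, in Angstroms); how to read them from the ColabFold JSON output is not shown because I have not verified the key names here.  The function name, the toy matrix and the domain boundaries are all made up for illustration.

```python
from statistics import mean


def mean_interdomain_pae(pae, domain_a, domain_b):
    """Average PAE over residue pairs with one residue in each domain.

    pae      -- square matrix (list of lists), pae[i][j] in Angstroms
    domain_a -- (first, last) residue numbers, 1-based and inclusive
    domain_b -- (first, last) residue numbers, 1-based and inclusive
    """
    a = range(domain_a[0] - 1, domain_a[1])
    b = range(domain_b[0] - 1, domain_b[1])
    # PAE is not symmetric, so average both the (a, b) and (b, a) blocks.
    values = [pae[i][j] for i in a for j in b]
    values += [pae[j][i] for i in a for j in b]
    return mean(values)


# Toy 4-residue matrix (invented numbers): residues 1-2 form one domain,
# residues 3-4 the other.  The off-diagonal blocks hold the inter-domain PAE.
toy_pae = [
    [0.5, 1.0, 20.0, 22.0],
    [1.0, 0.5, 21.0, 23.0],
    [19.0, 20.0, 0.5, 1.0],
    [22.0, 24.0, 1.0, 0.5],
]

print(mean_interdomain_pae(toy_pae, (1, 2), (3, 4)))
```

  Low values within each domain combined with high values between the domains would mean AlphaFold is confident of each domain's fold but not of how the two pack against each other.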

  Tom


On May 9, 2023, at 4:08 PM, James Raymond <raymond@unlv.nevada.edu> wrote:

The predicted structure for the well-known domain was essentially the same when the PDB option was not selected.
The output also didn't have the template_domain_names.json file, so it didn't appear to use existing PDB structures.
Is it possible that it did use existing PDB structures without recording it in the results?
If it didn't, it's amazing that it got the same results without using the x-ray diffraction results.

One difference was that not checking the PDB option caused the 2nd domain of the protein to rotate 90 degrees with respect to the first domain.
Thank you for your help.
Jim




On Tue, May 9, 2023 at 2:47 PM James Raymond <raymond@unlv.nevada.edu> wrote:
Hi Tom,
Very interesting, thank you. I will try it again without checking the PDB option to see if it makes a difference.
Jim


On Tue, May 9, 2023 at 2:25 PM Tom Goddard <goddard@sonic.net> wrote:
Hi Jim,

  As Elaine says, ChimeraX is using ColabFold, which is an optimized version of AlphaFold.  So you should cite ColabFold (and AlphaFold and ChimeraX).

  Also your description of the template use is a bit misleading.  First of all, AlphaFold uses at most 4 PDB templates per chain.  The templates you saw in the JSON file came from ColabFold.  It is confusing, because AlphaFold will find up to 20 templates per chain, but then it only uses the 4 best scoring.  I think that is always the first 4 of the 20.  These details are from reading the AlphaFold code on GitHub.  Unfortunately the actual workings of AlphaFold are poorly described in the AlphaFold publications.

  For instance, you will also see in the code that any template with more than 95% sequence identity is not used.  That seems wrong, since you would of course want to use the best sequence matches, but I think it is a hold-over from the code that trained AlphaFold, where they didn't want exact template matches.  This is typical of the AlphaFold code: it is shoddy work in which DeepMind seemed mostly interested in showing off machine learning, with little concern for actually producing the best biology results.

  One last comment about the AlphaFold use of templates.  In almost all cases AlphaFold more or less ignores the templates.  In other words, it produces nearly the same structure whether it uses templates or not.  From hundreds of runs on different sequences, my experience has been that if AlphaFold gives a high confidence prediction (say mostly pLDDT > 80), then even if it is given a template from an experimental structure that differs, it will always prefer its own predicted conformation over the template.  So only in cases where AlphaFold is not confident of its prediction will templates possibly help.  Basically the template is just a starting structure, and if AlphaFold is at all confident in another conformation it converges to that conformation and the template makes no difference.
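
  To make the template selection described above concrete, here is a rough sketch in Python.  It is not the AlphaFold code, just my description restated: drop anything above 95% sequence identity, keep up to 20 hits, use the 4 best scoring.  The function name, the (name, score, identity) tuple layout, the order of the steps and the example hits are all assumptions made for illustration.

```python
def pick_templates(hits, max_found=20, max_used=4, max_identity=0.95):
    """Return the template hits that would actually be used.

    hits -- list of (name, score, sequence_identity) tuples; this layout is
            hypothetical and is not AlphaFold's internal representation.
    """
    # Skip near-exact matches (more than 95% sequence identity).
    kept = [h for h in hits if h[2] <= max_identity]
    # Rank the remaining hits, best score first.
    kept.sort(key=lambda h: h[1], reverse=True)
    # Up to max_found templates are "found"; only the first max_used are used.
    found = kept[:max_found]
    return found[:max_used]


# Invented hits: names, scores and identities are placeholders.
example_hits = [
    ("tmpl_a", 90.0, 0.99),  # dropped: identity above 95%
    ("tmpl_b", 85.0, 0.60),
    ("tmpl_c", 80.0, 0.55),
    ("tmpl_d", 70.0, 0.40),
    ("tmpl_e", 65.0, 0.35),
    ("tmpl_f", 50.0, 0.30),  # found but not used: ranked fifth
]

used = pick_templates(example_hits)
print([name for name, _, _ in used])
```

  This is why a JSON file listing many more than 4 templates does not mean all of them fed into the prediction.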

        Tom
        ChimeraX AlphaFold developer



> On May 9, 2023, at 2:06 PM, Elaine Meng via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
>
> Hi Jim,
> We (ChimeraX developers) are not the AlphaFold developers and I personally haven't looked at the details of any of its output files, nor am I an expert on your particular protein of interest.  So although your inference sounds reasonable, I can't say anything authoritative about it.
>
> ....except that you should say that the calculation is actually done with ColabFold, as mentioned in the documentation:
> <https://rbvi.ucsf.edu/chimerax/docs/user/tools/alphafold.html>
>
> Maybe a careful reading of the AlphaFold and ColabFold papers would help to confirm the rest of your description.
>
> Best,
> Elaine
> -----
> Elaine C. Meng, Ph.D.                       
> UCSF Chimera(X) team
> Department of Pharmaceutical Chemistry
> University of California, San Francisco
>
>
>> On May 9, 2023, at 1:38 PM, James Raymond via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
>>
>> Hi Elaine,
>> I am writing up some results. Would you please see if I am describing our method correctly? 
>> Our protein has 2 domains, one whose structure is well known and for which many PDBs are available, and another that is novel.
>> The PDB IDs below came from your file entitled af512_template_domain_names.json
>>
>> "Protein structure was predicted with AlphaFold2 (Jumper et al., 2021) as implemented by UCSF ChimeraX v. 1.6rc (Pettersen et al., 2021) with the option to use PDB templates. In practice, this meant that the N-terminal DUF3494 domain of the protein was predicted based on several existing PDB structures (7bwy_A, 7bwx_A, 3uyu_A, 6a8k_A, 3wp9_A, 4nu3_A, 4nu3_B, 6bg8_B, 4nu2_A, 3vn3_A, 6eio_A, 4nuh_A, 7dc5_B, 5uyt_A, 5uyt_B, 5uyt_A, 5uyt_B) and the C-terminal domain was predicted de novo by AlphaFold."
>>
>> Is it OK?
>>
>> thank you
>> Jim
>