Protein-protein interactions assessed through Boltz using ChimeraX consistently scoring quite low
Greetings,

We’ve been working with largish data sets, testing for protein-protein interactions using AF2, AF3, and Boltz. For many of these tests we’ve run the prediction programs locally on our supercomputer. We’ve also done a good amount of primary testing and double-checking using the web-based interfaces available through Colab (AF2) and AF3. I just recently started using Boltz 2 through the ChimeraX interface.

What I’ve noticed so far is a pattern: the Boltz predictions through ChimeraX give much lower interaction scores, based on ipTM and PAE, than Boltz on our supercomputer, or AF2 and AF3 run either on our supercomputer or via the web. I’ve tried running multiple models and seeds, increasing recycles to 3, and checking each of the two boxes (Use steering potential vs. Use MSA cache). The commands I’ve run look like: boltz predict #1/A #1/B recycle 3 seed 4 samples 5. In any case, I’m puzzled as to why the scores are consistently lower.

I recognize that I may be doing something wrong. For example, I seem to need to start by running a first round of Boltz from the interaction window; once I have that first prediction, I can then invoke “boltz predict” as above with different parameters on the command line. (These parameters are not part of the menus in the Boltz window that I can find.) The different runs, seeds, and models give slightly different results, but they all seem to underpredict what we know to be quite strong interactions. I should add that we also see some differences between local and web-based AF3 and AF2 results, but as far as we can tell they aren’t as extreme as what we’re seeing from Boltz.

Any thoughts? Thanks greatly in advance!

David

David S. Fay, Ph.D.
Professor, Department of Molecular Biology
Director (PD/PI), National Institutes of Health Wyoming INBRE
University of Wyoming
email: davidfay@uwyo.edu
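[Editor's note: when comparing ipTM/pTM across many seeds and samples, a small script can tabulate the scores from the prediction output. This is a minimal sketch only; it assumes the confidence JSON files contain `iptm` and `ptm` fields, and the `confidence_*.json` file-name pattern is an assumption that may not match every Boltz version.]

```python
import json
from pathlib import Path

def summarize_confidence(records):
    """Return (label, iptm, ptm) tuples sorted by ipTM, highest first.

    `records` maps a run label (e.g. a seed or file name) to a parsed
    confidence dict. Assumes "iptm" and "ptm" keys exist (unverified
    assumption about the Boltz confidence JSON schema).
    """
    rows = [(label, d.get("iptm"), d.get("ptm")) for label, d in records.items()]
    return sorted(rows, key=lambda r: r[1], reverse=True)

def load_confidence_files(directory):
    """Parse every confidence_*.json under `directory` (hypothetical layout)."""
    return {p.name: json.loads(p.read_text())
            for p in Path(directory).glob("confidence_*.json")}

# Example with made-up scores for two seeds:
demo = {
    "seed4": {"iptm": 0.31, "ptm": 0.62},
    "seed5": {"iptm": 0.78, "ptm": 0.81},
}
for label, iptm, ptm in summarize_confidence(demo):
    print(f"{label}: ipTM={iptm:.2f} pTM={ptm:.2f}")
```

Tabulating all runs this way makes it easier to see whether low scores come from one outlier sample or from a consistent shift across seeds.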
Hi David,

I haven't observed much difference in Boltz 2 confidence scores across operating systems (Linux, Mac, Windows), running both the ChimeraX Boltz version (which has modifications to allow it to run on Mac and Windows) and the standard GitHub jwohlwend Boltz. So I don't have an explanation for why your supercomputer Boltz runs would be significantly different from your ChimeraX Boltz runs, except for the obvious factors you are already aware of: Boltz has various parameters (number of recycles, use of templates) that could affect predictions.

If you provide me a pair of sequences and the ipTM scores, or PAE matrix files, for good and bad predictions, I could try running the sequences on some of my machines and see how they compare to your results. Actually, it would probably be best if you zip up the full prediction input and output and send it to me, since I might see something in the logs or other files that gives a clue about why the results differ.

You are right that not all the ChimeraX Boltz options are available in the GUI. Some are only in the ChimeraX "boltz predict" command. This is typical of ChimeraX, where the command has more advanced options that we do not put in the GUI.

Tom
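[Editor's note: when comparing PAE matrices between runs, one simple summary is the mean predicted aligned error over the inter-chain blocks, which is what matters for an interface. A minimal sketch for a two-chain complex, assuming the PAE is a square residue-by-residue matrix with chain A occupying the first `len_a` rows/columns; how the matrix is stored on disk varies by program, so treat this as illustrative.]

```python
def interchain_pae(pae, len_a):
    """Mean PAE over the two off-diagonal (inter-chain) blocks of a
    two-chain complex. `pae` is a square list-of-lists; chain A is the
    first `len_a` residues, chain B the rest."""
    n = len(pae)
    vals = [pae[i][j]
            for i in range(n) for j in range(n)
            if (i < len_a) != (j < len_a)]  # one index in A, the other in B
    return sum(vals) / len(vals)

# Toy 4-residue example: chain A = residues 0-1, chain B = residues 2-3.
pae = [
    [0.0, 1.0, 8.0, 9.0],
    [1.0, 0.0, 7.0, 8.0],
    [8.0, 7.0, 0.0, 1.0],
    [9.0, 8.0, 1.0, 0.0],
]
print(interchain_pae(pae, 2))  # mean of the eight inter-chain entries -> 8.0
```

A consistently high inter-chain PAE mean alongside a low ipTM would point to the model itself being unconfident about the interface, rather than a scoring or display difference between the two front ends.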
On Apr 22, 2026, at 8:38 AM, David S. Fay via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote: