Questions about alphafold interfaces output discrepancies

Hi!

I am dabbling with the alphafold interfaces command, and I cannot make sense of the data.

In my use case, the summary table produced by the command says:

Models  Confident pairs  #Res1  #Res2
4       23               12     12

Opening the best model produces these relevant lines in the log:

200 atoms, 188 bonds, 24 residues, 1 model selected
alphafold contacts last-opened & /A toAtoms last-opened & /B distance 4.0 maxPae 5.0
Found 14 residue or atom pairs within distance 4 with pae <= 5

In the csv, the same model has these relevant columns:

distance                 4
max_pae                  5
num_res1                 246
num_res2                 246
num_interface_res1       20
num_interface_res2       20
num_interface_res_pairs  42
num_confident_pairs      23
interface_res_num1       12 values
interface_res_num2       12 values

I am struggling a bit to reconcile all these numbers.

The 24 residues selected by ChimeraX after opening do not appear anywhere else. I suppose these are #Res1 + #Res2 of the summary (and interface_res_num1 + interface_res_num2 in the csv)?

Both the log summary and the csv report 23 as the number of confident pairs, but the log message after opening the best model says 14 residue or atom pairs (I assume these are the ones within the distance/PAE constraint?).

Why is Confident pairs not #Res1 + #Res2?

I am also a little puzzled by the csv, where I read num_interface_res_pairs = 42 and num_interface_res1 = num_interface_res2 = 20 (I was expecting either 21, 21 for num_interface_res or 40 for num_interface_res_pairs).

I attach the prediction I was using for diagnosing the problem (o15305_a108e.zip).

cheers,
--
Bruno Hay Mele, PhD
2D-20, Biology Dept., University of Naples Federico II
https://github.com/bhym/ | +39 081 67 9118

Hi Bruno,

First I should say that the developer of this feature is away for a few days, so you may need to wait a while for a better answer. I will try to offer some thoughts, but I don't understand what you are doing.

Firstly, the "alphafold interfaces" command is for batch processing of multiple protein-protein dimer predictions (i.e. a bunch of different pairs of protein sequences, to help figure out which proteins might really bind each other). It seems that you are only predicting the structure of one particular pair of proteins. Are you indeed trying to figure out whether they actually bind to each other? The table you are getting may not make much sense if you are not giving it the type of data it expects (multiple different dimer predictions of different sequences in the same folder).

My original understanding was that #Res1 and #Res2 are not a quantity of residues (how many residues) but the actual residue number in the structure, i.e. residue 12, but I agree that doesn't really make sense either.

I don't see anything in the "alphafold interfaces" help about it making any selection, so I have no idea what is selecting the 24 residues. I would guess it was some other action you performed that was not part of the "alphafold interfaces" command.

<https://rbvi.ucsf.edu/chimerax/docs/user/commands/alphafold.html#batch>

These batch commands (link above) are meant for advanced users, typically with hundreds of predictions of different complexes. For a single prediction (you are only looking at a specific pair of protein chains) it may make sense to use a different command for analysis, say "alphafold contacts":

<https://rbvi.ucsf.edu/chimerax/docs/user/commands/alphafold.html#contacts>

As the help for "alphafold interfaces" says, it considers as possibly real binders the dimers with at least N pairs of residues across the interface with PAE less than some value and atoms within some distance (defaults 10, 5.0, and 4.0, respectively, but these are user-adjustable).

So you could just use "alphafold contacts" between the two chains, with whatever PAE and distance cutoffs you feel are appropriate, and see how many lines you get for your particular dimer. If I do this with your data I get 20 pseudobonds showing the pairs that meet the criteria. From the Log:

open /Users/meng/Desktop/o15305_a108e/fold_o15305_a108e_model_0.cif
open /Users/meng/Desktop/o15305_a108e/fold_o15305_a108e_full_data_0.json
alphafold contacts /A toAtoms /B distance 4 maxPae 5
Found 20 residue or atom pairs within distance 4 with pae <= 5

I hope this helps,
Elaine
-----
Elaine C. Meng, Ph.D.
UCSF Chimera(X) team
Resource for Biocomputing, Visualization, and Informatics
Department of Pharmaceutical Chemistry
University of California, San Francisco
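The dimer-level criterion described above (at least N residue pairs that are both close and low in PAE) can be sketched in a few lines. This is only an illustration of the counting rule as stated in this message, not the ChimeraX implementation; the `ResiduePair` record, the function names, and the toy values are hypothetical, and the `<=` comparisons are assumed from the wording of the Log line.

```python
from typing import List, NamedTuple


class ResiduePair(NamedTuple):
    """One inter-chain residue pair (hypothetical record, not a ChimeraX type)."""
    res1: int            # residue number in the first chain
    res2: int            # residue number in the second chain
    min_distance: float  # closest atom-atom distance, in Angstroms
    pae: float           # predicted aligned error for the pair, in Angstroms


def count_confident_pairs(pairs: List[ResiduePair],
                          distance: float = 4.0,
                          max_pae: float = 5.0) -> int:
    """Count pairs that satisfy both the distance and the PAE cutoff."""
    return sum(1 for p in pairs
               if p.min_distance <= distance and p.pae <= max_pae)


def is_candidate_binder(pairs: List[ResiduePair],
                        min_pairs: int = 10,
                        distance: float = 4.0,
                        max_pae: float = 5.0) -> bool:
    """A dimer counts as a possible binder if it has enough confident pairs."""
    return count_confident_pairs(pairs, distance, max_pae) >= min_pairs


# Toy data (made up): 12 pairs, all close enough, but only 3 with low PAE.
toy_pairs = [ResiduePair(i, 100 + i, 3.5, 2.0 if i < 3 else 9.0)
             for i in range(12)]

print(count_confident_pairs(toy_pairs))            # 3
print(is_candidate_binder(toy_pairs))              # False
print(is_candidate_binder(toy_pairs, min_pairs=3)) # True
```

With the defaults (10, 5.0, 4.0) the toy dimer is not reported as a candidate even though all 12 pairs are in contact, because the PAE cutoff removes most of them.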
On May 19, 2025, at 7:56 AM, Bruno Hay Mele via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:

Hi Bruno,

Here are some explanations to help you understand the numbers.

First, if it says 23 confident pairs between 12 residues in one chain and 12 residues in the other chain, it means just what it says. There are 23 pairs of residues which have atoms within 4 Å and PAE <= 5 Å. There are 12 residues in one chain that are in proximity to 12 in the other. How many pairs are in proximity? You can't tell unless you inspect the structure, since one residue in chain A may be near 3 residues in chain B, or near 5 residues in chain B, or near 1 residue in chain B. So of course the number of confident pairs has no relation to the sum of the two sets of residues involved from the 2 chains.

Interface residues are simply the ones that meet the distance requirement of being within 4 Å of a residue in another chain. Confident residue pairs also have PAE <= 5. So of course the interface residues that are involved in confident pairs are a subset of the interface residues, and in many cases will be far fewer than the total number of interface residues.

The one question you raise that is a bit trickier is why alphafold interfaces and alphafold contacts report different numbers of confident pairs. There are two factors. The main one is that interfaces considers the minimum of the PAE from res1 to res2 and from res2 to res1, while alphafold contacts only considers the PAE from res1 to res2 (i.e. it is asymmetrical: alphafold contacts #1/A to #1/B gives a different result than alphafold contacts #1/B to #1/A). The PAE values are not a symmetric matrix; the value from res1 to res2 is different from the value from res2 to res1. If you don't understand why, study the definition of PAE. So interfaces and contacts will give different results because one symmetrizes the PAE (using the minimum of the 2 values) while the other does not.
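A small toy example may make the two points above concrete: pair counts are not the sum of the residues involved, and a directed PAE test gives different counts than a symmetrized one. The 4x4 matrix, the contact list, and the convention that `pae[i][j]` is the error "from residue i to residue j" are assumptions made for illustration only; this is not the ChimeraX code.

```python
# Residues 0 and 1 belong to chain A, residues 2 and 3 to chain B.
# pae[i][j] is assumed to be the PAE "from residue i to residue j";
# note that the matrix is not symmetric.
pae = [
    [0, 1, 3, 6],
    [1, 0, 9, 8],
    [7, 9, 0, 1],
    [4, 9, 1, 0],
]

# Inter-chain residue pairs assumed to have atoms within the distance cutoff.
contacts = [(0, 2), (0, 3), (1, 3)]


def directed_pairs(pairs, pae, max_pae):
    """Keep pairs whose PAE in the given direction is within the cutoff."""
    return [(i, j) for i, j in pairs if pae[i][j] <= max_pae]


def symmetrized_pairs(pairs, pae, max_pae):
    """Keep pairs whose smaller PAE of the two directions is within the cutoff."""
    return [(i, j) for i, j in pairs if min(pae[i][j], pae[j][i]) <= max_pae]


a_to_b = directed_pairs(contacts, pae, 5.0)
b_to_a = directed_pairs([(j, i) for i, j in contacts], pae, 5.0)
both = symmetrized_pairs(contacts, pae, 5.0)

print(a_to_b)   # [(0, 2)]
print(b_to_a)   # [(3, 0)]
print(both)     # [(0, 2), (0, 3)]

# Residues taking part in the symmetrized confident pairs, per chain.
res_a = {i for i, _ in both}
res_b = {j for _, j in both}
print(len(both), len(res_a), len(res_b))   # 2 1 2
```

Here each directed test finds one pair (and not the same one), while the symmetrized test finds two. The two confident pairs involve one residue in chain A and two in chain B, so the pair count cannot be derived from the residue counts.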
A second reason why there are differences is that interfaces was developed to work with ColabFold, i.e. AlphaFold 2, which uses specific file names that contain the parts "rank_1", "rank_2", and so on. AlphaFold 3 doesn't have "rank" in the file name, so the interfaces code does not know the rank order. So when it Logs the table of results it is not necessarily giving the results for the highest-ranked prediction.

Another pitfall of using alphafold interfaces on AF3 results, which it was not designed for, is that AF2 only had PAE for residues, while AF3 also has PAE values for individual atoms of ligands and non-standard residues. If there are atom PAE values then the alphafold interfaces command will give completely wrong results, since it was written before AF3 came out and only knows about AF2 per-residue PAE.

It looks to me that the AF3 prediction you attached has only per-residue PAE, so the results are meaningful, and interfaces.csv contains the values for all 5 predictions, so you can sort out the rank problem.

Tom
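One way to sort out the rank problem from the CSV is to order its rows by the model index embedded in the AF3 file name (for example the "model_0" in fold_o15305_a108e_model_0.cif). The sketch below is hedged on several assumptions: the column named `path` that holds the file name is hypothetical (check the real header of interfaces.csv), the rows are made-up toy data, and the idea that a lower model index means a higher AF3 rank should be verified against your own AF3 output.

```python
import csv
import io
import re

# Toy CSV text standing in for interfaces.csv. The "path" column name and
# all values are invented for illustration.
TOY_CSV = """path,num_confident_pairs
fold_example_model_3.cif,11
fold_example_model_0.cif,7
fold_example_model_4.cif,15
fold_example_model_1.cif,9
fold_example_model_2.cif,12
"""


def model_index(file_name):
    """Extract N from a name containing '_model_N', or None if absent."""
    match = re.search(r"_model_(\d+)", file_name)
    return int(match.group(1)) if match else None


def rows_by_model(csv_text, name_column="path"):
    """Return CSV rows sorted by the model index found in the file name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [row for row in reader if model_index(row[name_column]) is not None]
    return sorted(rows, key=lambda row: model_index(row[name_column]))


for row in rows_by_model(TOY_CSV):
    print(model_index(row["path"]), row["num_confident_pairs"])
```

To use it on a real file, read the file contents into a string first (for example with `open("interfaces.csv").read()`) and pass the actual name of the column that identifies each prediction.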

Dear Tom, Dear Elaine

Thanks for your thorough explanation.

I am working with 42 structures, all produced by the af3 server. The one I sent was an MRE.

@Tom: To sum up,

1. Those outputs summarise the interface properties in terms of contacts, and the most synthetic indicator of interface confidence/stability [1] is confident pairs. When contextualising this indicator, one should remember that since proximity can be many-to-one and confidence is based on PAE, the number of confident pairs cannot be related to the sum of residues involved from the two chains. Also, when using AF3 models, one should be sure not to have nonstandard residues and/or ligands, and rely on the CSV.

[1] Maybe stability is too much...

P.S.: Regarding netiquette, should I also reply to your address or the mailing list?

--
Bruno Hay Mele, PhD
2D-20, Biology Dept., University of Naples Federico II
https://github.com/bhym/ | +39 081 67 9118

Dear Bruno and ChimeraX team,

I apologize if I am addressing my request incorrectly, as this is the first time I am using this mailing list option. I would like to follow up on the functionality of the alphafold interfaces command.

I ran the command with default parameters (distance <= 4, PAE <= 5) and got the following result in the log:

4 of 478 dimers have 10 or more confident residue interactions spanning <= 4 Angstroms with predicted aligned error <= 5 Angstroms.

The results were recorded in the csv file as expected, with the headings:

distance  max_pae
4         5

I ran the command a second time with more relaxed parameters (or so I thought) of distance <= 5 and PAE <= 6. Here is the log record of the command:

alphafold interfaces E:\2BC_and_LDs\AF_predictions\PDB_structures_true\Extracted_ALL distance 5 maxPae 6 resultsFile Interfaces_5_6.csv,

And I got the following result in the log:

4 of 478 dimers have 10 or more confident residue interactions spanning <= 5.0 Angstroms with predicted aligned error <= 6.0 Angstroms.

I.e. it seems that the command worked with the new parameters, but in fact the csv file again shows the headings:

distance  max_pae
4         5

and the results in the new file are identical to the previous run with the default parameters. I should mention that I removed the first csv result file from the directory before running the command the second time with the new parameters.

Am I doing something wrong when providing alphafold interfaces with new parameters?

Thank you.
George

On Tue, May 20, 2025 at 4:08 AM Bruno Hay Mele via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
--
George A. Belov, Ph.D.
Professor
Department of Veterinary Medicine
University of Maryland
Va-Md College of Veterinary Medicine
8075 Greenmead Dr., room 1215
College Park, MD 20742
Phone: 301-314-1259

Hi George,

Thanks for catching this bug in the alphafold interfaces command. The code failed to pass the specified distance and max_pae parameters down to the key routine that computes the dimer confidences, so that routine just used the default values. I've fixed it in the ChimeraX daily build and the 1.10 release candidates that are made every night. Versions dated May 21, 2025 on the download page will have this fix.

https://www.rbvi.ucsf.edu/chimerax/download.html

Tom
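Until a fixed build is installed, a quick sanity check is to compare the distance and max_pae values recorded in the results CSV with the ones given on the command line. The sketch below assumes the CSV has a header row with `distance` and `max_pae` columns, as described in this thread; the function names and the toy CSV text are hypothetical.

```python
import csv
import io


def recorded_parameters(csv_text):
    """Return the set of (distance, max_pae) values found in the CSV rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(float(row["distance"]), float(row["max_pae"])) for row in reader}


def matches_request(csv_text, distance, max_pae):
    """True if every row was computed with the requested cutoffs."""
    return recorded_parameters(csv_text) == {(float(distance), float(max_pae))}


# Toy CSV text (made up) imitating a results file written with the defaults.
toy_csv = "distance,max_pae,num_confident_pairs\n4,5,23\n4,5,11\n"

print(matches_request(toy_csv, 4, 5))   # True
print(matches_request(toy_csv, 5, 6))   # False
```

A `False` result for the cutoffs you actually requested would indicate the symptom reported above, where the Log message shows the new values but the CSV was computed with the defaults.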
On May 20, 2025, at 1:44 PM, George Belov via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
ChimeraX-users mailing list -- chimerax-users@cgl.ucsf.edu
To unsubscribe send an email to chimerax-users-leave@cgl.ucsf.edu
Archives: https://mail.cgl.ucsf.edu/mailman/archives/list/chimerax-users@cgl.ucsf.edu/
participants (4)
- Bruno Hay Mele
- Elaine Meng
- George Belov
- Tom Goddard