Support for _ma_qa_metric_local_pairwise to display PAE matrices stored in ModelArchive
Dear ChimeraX developers,

Following up on discussions in https://github.com/chaidiscovery/chai-lab/issues/52 and the large number of models that have been deposited in ModelArchive (MA; https://modelarchive.org/) in the meantime, I would like to propose support for loading PAE (or other local-pairwise score) values from ModelCIF files as used by MA.

Here I link some examples which should capture most use cases:

1. ma-bak-cepc-0944 <https://www.modelarchive.org/doi/10.5452/ma-bak-cepc-0944> includes 2 local-pairwise scores (PAE and "contact probability") and only stores scores for inter-chain residue pairs.
2. ma-dm-hisrep-003 <https://www.modelarchive.org/doi/10.5452/ma-dm-hisrep-003> includes a full PAE matrix.
3. In all cases in MA, entries contain a single accompanying zip file which also contains the local pairwise scores; these are extracted into a separate file to keep the main ModelCIF file at a manageable size. The linked modelcif_examples_for_chimerax.zip <https://sibcloud-my.sharepoint.com/:u:/g/personal/gerardo_tauriello_sib_swiss/EejVfLC-osVEqwt1yu8S2B8BK_8dX1osrfEDWeWOilhkDQ?e=JqYgIO> package includes pae_associated_file, where the pairwise scores are directly in a separate file instead of being in a zip file, and pae_in_main_file, which includes the pairwise scores directly in the main ModelCIF file.

ChimeraX is already able to parse the ma_qa_metric_local category in ModelCIF to do per-residue coloring for pLDDT etc., which is awesome. To make sense of ma_qa_metric_local_pairwise one needs the following (I try to be detailed to simplify the implementation):

1. Parse ma_qa_metric and keep the IDs (_ma_qa_metric.id) for rows with _ma_qa_metric.mode == "local-pairwise". To restrict this to PAE matrices, one would additionally check that _ma_qa_metric.type == "PAE".
2. Look for the _ma_qa_metric_local_pairwise category. There are 3 options on where to find it:
   * Directly in the main ModelCIF file (as in pae_in_main_file/003-Spombe_H3-H4_tetramer_DPB3.cif).
   * In a separate file via _ma_entry_associated_files (as in pae_associated_file/003-Spombe_H3-H4_tetramer_DPB3.cif). You can find the file via _ma_entry_associated_files.file_url for the _ma_entry_associated_files row with file_content == "local pairwise QA scores".
   * In a separate file via _ma_associated_archive_file_details (as in all MA entries such as ma-bak-cepc-0944 and ma-dm-hisrep-003). Procedure to get it:
     * Find the row in _ma_associated_archive_file_details with file_content == "local pairwise QA scores" and keep archive_file_id and file_path.
     * Find the zip file in _ma_entry_associated_files with id == archive_file_id and download it via file_url.
     * Fetch the file named file_path from that zip file.
3. Within the _ma_qa_metric_local_pairwise category, extract scores for all given residue pairs for metric_id matching the desired IDs from step 1. Each residue is identified with model_id, label_asym_id, label_seq_id, and (redundantly) label_comp_id. There are no guarantees on the order or completeness of the data, but each referenced residue is guaranteed to exist in the atom_site category.
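Steps 1 and 2 above can be sketched in Python roughly as follows. This is a minimal sketch, not the ChimeraX implementation: it assumes each ModelCIF category has already been parsed into a list of dicts keyed by item name, and the function names (pairwise_metric_ids, locate_pairwise_scores, read_zip_member) are hypothetical helpers introduced only for illustration. Downloading from file_url is left out.

```python
import io
import zipfile

# Value of file_content that marks the pairwise score file (from the steps above).
PAIRWISE_CONTENT = "local pairwise QA scores"


def pairwise_metric_ids(qa_metric_rows, only_type=None):
    """Step 1: IDs of ma_qa_metric rows with mode 'local-pairwise'.

    qa_metric_rows is an assumed representation: one dict per row.
    Pass only_type="PAE" to restrict the result to PAE matrices.
    """
    return [
        row["id"]
        for row in qa_metric_rows
        if row["mode"] == "local-pairwise"
        and (only_type is None or row["type"] == only_type)
    ]


def locate_pairwise_scores(categories):
    """Step 2: report where ma_qa_metric_local_pairwise is stored.

    categories is an assumed dict mapping category name to a list of row dicts.
    Returns ("main", None, None), ("file", url, None),
    ("zip", url, path_in_zip), or None if nothing is found.
    """
    if categories.get("ma_qa_metric_local_pairwise"):
        return ("main", None, None)
    files = categories.get("ma_entry_associated_files", [])
    for row in files:
        if row.get("file_content") == PAIRWISE_CONTENT:
            return ("file", row["file_url"], None)
    details = categories.get("ma_associated_archive_file_details", [])
    for det in details:
        if det.get("file_content") == PAIRWISE_CONTENT:
            for row in files:
                if row["id"] == det["archive_file_id"]:
                    return ("zip", row["file_url"], det["file_path"])
    return None


def read_zip_member(zip_bytes, file_path):
    """Return the text of file_path from an already downloaded zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.read(file_path).decode("utf-8")
```

The three return shapes of locate_pairwise_scores correspond to the three storage options listed in step 2, so a caller can decide whether it needs to fetch anything at all.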
Here is the current list of model sets in MA which include PAE matrices: ma-bak-cepc <https://modelarchive.org/doi/10.5452/ma-bak-cepc>, ma-coffe-slac <https://modelarchive.org/doi/10.5452/ma-coffe-slac>, ma-tbvar3d <https://modelarchive.org/doi/10.5452/ma-tbvar3d>, ma-ornl-sphdiv <https://modelarchive.org/doi/10.5452/ma-ornl-sphdiv>, ma-t3vr3 <https://modelarchive.org/doi/10.5452/ma-t3vr3>, ma-low-csi <https://modelarchive.org/doi/10.5452/ma-low-csi>, ma-ombbaf2 <https://modelarchive.org/doi/10.5452/ma-ombbaf2>, ma-rap-bacsu <https://modelarchive.org/doi/10.5452/ma-rap-bacsu>, ma-nmpfamsdb <https://modelarchive.org/doi/10.5452/ma-nmpfamsdb>, ma-jd-viral <https://modelarchive.org/doi/10.5452/ma-jd-viral>, ma-dm-prc <https://modelarchive.org/doi/10.5452/ma-dm-prc>, ma-dm-hisrep <https://modelarchive.org/doi/10.5452/ma-dm-hisrep>, ma-osf-ppp2r2a <https://modelarchive.org/doi/10.5452/ma-osf-ppp2r2a>.

Altogether MA has over 200'000 models with PAE matrices, all of which supplement the AFDB in some way (e.g. protein complexes, viral proteins and other proteins not covered in UniProt).

It would be great to include support for ModelArchive's ModelCIF files in ChimeraX and I am happy to provide further information and help as needed.

Regards,
Gerardo
Hi Gerardo,

I made an example ChimeraX command "modelcif pae" that reads pairwise scores included directly in the ModelCIF structure file and shows them as an AlphaFold PAE plot. Here's the code and an example using it

https://rbvi.github.io/chimerax-recipes/modelcif_pae/modelcif_pae.html

To handle all the cases you describe well would take more work, for example, where the scores are in separate files and need to be fetched. And the user should be provided a list of scores if more than one is available. And scores where you might want to show high values (e.g. probability) instead of low values won't work well with the ChimeraX "alphafold contacts" command which shows interactions between domains but is designed for predicted aligned error (PAE) values and filters to only show low values. And how to handle sparse values (where every residue pair does not have a score) needs some more thought. For that I made a defaultScore option so the full matrix of all-by-all residues is first filled with the default score.

Properly handling all the cases of more general scores than AlphaFold PAE is a good bit of work and would likely be used far less than AlphaFold PAE, so I'm not sure I can justify the effort to support all those capabilities. But maybe this start I made will encourage someone else to provide a more complete extension to ChimeraX.

Tom
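The default-score idea for sparse data can be illustrated with a short Python sketch. It is not the code from the recipe: build_score_matrix is a hypothetical helper, the rows are assumed to be dicts already parsed from ma_qa_metric_local_pairwise, and the key names (label_asym_id_1, label_seq_id_1, label_asym_id_2, label_seq_id_2, metric_id, metric_value) follow the ModelCIF item names as an assumption that should be checked against the dictionary.

```python
def build_score_matrix(residues, pair_rows, metric_id, default_score):
    """Fill an all-by-all matrix with default_score, then overwrite known pairs.

    residues: ordered list of (label_asym_id, label_seq_id) tuples, one per
              residue in the structure; the order defines the matrix axes.
    pair_rows: iterable of dicts, one per ma_qa_metric_local_pairwise row
               (assumed representation; rows may come in any order).
    """
    index = {res: i for i, res in enumerate(residues)}
    n = len(residues)
    # Every pair starts at the default, so missing pairs stay well defined.
    matrix = [[default_score] * n for _ in range(n)]
    for row in pair_rows:
        if row["metric_id"] != metric_id:
            continue  # skip scores that belong to another metric
        i = index[(row["label_asym_id_1"], int(row["label_seq_id_1"]))]
        j = index[(row["label_asym_id_2"], int(row["label_seq_id_2"]))]
        matrix[i][j] = float(row["metric_value"])
    return matrix
```

Because the matrix is pre-filled, the loop needs no assumption about row order or completeness, which matches the guarantees stated in the original request. A sensible default depends on the metric: a high value suits an error such as PAE, while 0 suits a probability.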
Dear Tom,

First of all: thank you very much for that recipe. Seems to work great.

As I was motivated to learn how to script ChimeraX, I extended the recipe (new file attached) to better support data as stored in ModelArchive (MA). It now includes the following:

1. Preference for pairwise metric of type PAE (instead of blindly picking the first one) and some info from the metadata on the chosen metric.
2. Handling of PAE files as accompanying data. It can handle additional files stored locally in the same folder as the main ModelCIF file or files stored remotely, and it can handle files stored within ZIP files (as used in all cases in MA).
3. An additional "modelarchive open" command to fetch files directly from MA given an MA ID and add PAE if available. Example usages for the two cases mentioned in my original post: "modelarchive open ma-dm-hisrep-003" and "modelarchive open ma-bak-cepc-0944"

One observation when looking at sparsely populated PAE matrices as in ma-bak-cepc-0944: the mouse-over text which displays the score does not match the display. In ma-bak-cepc-0944 only inter-chain scores are available between chains A and B and the display shows that on the top right, while the mouse-over shows values in the bottom-left. I have no reason to believe that this is specific to ModelCIF and so I think that all PAE matrices suffer from the same display issue. This is barely noticeable as PAE matrices are almost symmetric but very much visible in this case.

In terms of handling other pairwise scores beyond AlphaFold PAE: I tried to see how this looks in ma-bak-cepc-0944, which has an additional score of type contact probability (value in [0,1]). If I load it with "modelcif pae #2 metric_id 4 default_score 0" it works ok, and the only thing preventing me from using it is that the display is very PAE-centric in its titles and color scale.
Just being able to set a color scale which depends on the range of data in the matrix would already make this a useful workaround... 😉

Regards,
Gerardo
On Oct 8, 2024, at 3:45 AM, Gerardo Tauriello via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
3. An additional "modelarchive open" command to fetch files directly from MA given an MA ID and add PAE if available. Example usages for the two cases mentioned in my original post: "modelarchive open ma-dm-hisrep-003" and "modelarchive open ma-bak-cepc-0944"
Just wanted to say that it is typically better to integrate into the generic "open" command than to have your own special-purpose open command. That way, your fetch format will show up in the "open formats" command output, and your fetcher can be listed in the Fetch By ID dialog. Integrating a new network fetch into the open command is described here:

Bundle Example: Fetch from Network Database (ChimeraX 1.9 documentation) <https://www.cgl.ucsf.edu/chimerax/docs/devel/tutorials/tutorial_fetch.html>

--Eric

Eric Pettersen
UCSF Computer Graphics Lab
Hi Gerardo,

Thanks for adding the Model Archive fetch and improved PAE fetch that can get scores from associated files. This all looks useful enough that I think it should go into the ChimeraX distribution. So I added it to the daily build (October 8, 2024 and newer). Let me know if you don't want your code added to ChimeraX. I made fetching and opening the Model Archive files use the standard syntax

    open ma-bak-cepc-0944 from modelarchive

and it doesn't automatically get the PAE, but I added an option to do that

    open ma-bak-cepc-0944 from modelarchive pae true

To handle scores that are not PAE such as probability in the range of 0-1 I added the palette and range options to the "modelcif pae" command, for example,

    modelcif pae #1 metric_id 4 default_score 0 palette rainbow range 0,1

For just showing PAE scores after the Model Archive file has been opened you can just use

    modelcif pae #1

More details are in the ChimeraX ticket

https://www.rbvi.ucsf.edu/trac/ChimeraX/ticket/16088

Tom
Hi Gerardo,

And thanks for spotting that the PAE plot mouse hover was reporting the wrong PAE score because the matrix row and column were swapped. I fixed that.

Tom
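Why a swapped row and column went unnoticed on full PAE matrices but stood out on sparse ones can be shown with a toy Python example. This is only an illustration with made-up numbers; hover_value is a hypothetical function and not part of ChimeraX.

```python
def hover_value(matrix, row, col, swapped=False):
    """Value a hover tooltip would report for plot cell (row, col).

    With swapped=True the lookup uses the transposed indices, which is the
    kind of mistake described in the thread.
    """
    return matrix[col][row] if swapped else matrix[row][col]


# Nearly symmetric full matrix: the swapped lookup is off by a small amount.
full = [[0.0, 4.1],
        [4.3, 0.0]]

# Sparse matrix: only the (0, 1) pair has a score, the rest is the default 0.
sparse = [[0.0, 7.5],
          [0.0, 0.0]]

full_error = abs(hover_value(full, 0, 1) - hover_value(full, 0, 1, swapped=True))
sparse_error = abs(hover_value(sparse, 0, 1) - hover_value(sparse, 0, 1, swapped=True))
```

In the full matrix the two lookups differ only by the small asymmetry of the scores, while in the sparse matrix the swapped lookup returns the default instead of the stored score.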
On Oct 8, 2024, at 6:13 PM, Tom Goddard <goddard@sonic.net> wrote:
Hi Gerardo,
Thanks for adding the Model Archive fetch and improved PAE fetch that can get scores from associated files. This all looks useful enough that I think it should go into the ChimeraX distribution. So I added it to the daily build (October 8, 2024 and newer). Let me know if you don't want your code added to ChimeraX. I made fetching and opening the Model Archive files use the standard syntax
open ma-bak-cepc-0944 from modelarchive
and it doesn't automatically get the PAE, but I added an option to do that
open ma-bak-cepc-0944 from modelarchive pae true
To handle scores that are not PAE such as probability in the range of 0-1 I added the palette and range options to the "modelcif pae" command, for example,
modelcif pae #1 metric_id 4 default_score 0 palette rainbow range 0,1
For just showing PAE scores after the Model Archive file has been opened you can just use
modelcif pae #1
More details are in the ChimeraX ticket
https://www.rbvi.ucsf.edu/trac/ChimeraX/ticket/16088
Tom
On Oct 8, 2024, at 3:45 AM, Gerardo Tauriello via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
Dear Tom,
First of all: thank you very much for that recipe. Seems to work great.
As I was motivated to learn how to script ChimeraX, I extended the recipe (new file attached) to better support data as stored in ModelArchive (MA). It now includes the following: Preference for pairwise metric of type PAE (instead of blindly picking the first one) and some info from the metadata on the chosen metric Handling of PAE files as accompanying data. It can handle additional files stored locally in the same folder as the main ModelCIF file or files stored remotely and it can handle files stored within ZIP files (as used in all cases in MA) An additional "modelarchive open" command to fetch files directly from MA given an MA ID and add PAE if available. Example usages for the two cases mentioned in my original post: "modelarchive open ma-dm-hisrep-003" and "modelarchive open ma-bak-cepc-0944"
One observation when looking at sparsely populated PAE matrices as in ma-bak-cepc-0944: the mouse-over text which displays the score does not match the display. In ma-bak-cepc-0944 only inter-chain scores are available between chains A and B and the display shows that on the top right, while the mouse-over shows values in the bottom-left. I have no reason to believe that this is specific to ModelCIF and so I think that all PAE matrices suffer from the same display issue. This is barely noticeable as PAE matrices are almost symmetric but very much visible in this case.
In terms of handling other pairwise scores beyond AlphaFold PAE: I tried to see how this looks in ma-bak-cepc-0944, which has an additional score of type contact probability (value in [0,1]). If I load it with "modelcif pae #2 metric_id 4 default_score 0" it works OK, and the only thing preventing me from using it is that the display is very PAE-centric in its titles and color scale. Just being able to set a color scale which depends on the range of data in the matrix would already make this a useful workaround... 😉
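To make the idea of a data-dependent color scale concrete, here is a small sketch that derives the range from the scores actually present and maps each value to a fraction of that range. The helper names are hypothetical and this is not how ChimeraX implements its palettes.

```python
def data_range(values, default=(0.0, 1.0)):
    """Return (min, max) of the given scores, or `default` if there are none."""
    values = list(values)
    if not values:
        return default
    return min(values), max(values)


def normalize(value, low, high):
    """Map `value` to [0, 1] relative to the range low..high, clamping outliers."""
    if high == low:
        return 0.0
    fraction = (value - low) / (high - low)
    return min(1.0, max(0.0, fraction))


# Example: contact probabilities for a few residue pairs.
scores = [0.2, 0.9, 0.55]
low, high = data_range(scores)
fractions = [normalize(s, low, high) for s in scores]
```

The resulting fractions could then be used to look up colors in whatever palette is chosen.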
Regards, Gerardo
From: Tom Goddard <goddard@sonic.net> Sent: Wednesday, October 2, 2024 05:20 To: Gerardo Tauriello Cc: chimerax-users@cgl.ucsf.edu Subject: [EXTERN] Re: [chimerax-users] Support for _ma_qa_metric_local_pairwise to display PAE matrices stored in ModelArchive
Hi Gerardo,
I made an example ChimeraX command "modelcif pae" that reads pairwise scores included directly in the ModelCIF structure file and shows them as an AlphaFold PAE plot. Here's the code and an example using it
https://rbvi.github.io/chimerax-recipes/modelcif_pae/modelcif_pae.html
To handle all the cases you describe well would take more work, for example, where the scores are in separate files and need to be fetched. The user should also be provided a list of scores if more than one is available. Scores where you might want to show high values (e.g. probability) instead of low values won't work well with the ChimeraX "alphafold contacts" command, which shows interactions between domains but is designed for predicted aligned error (PAE) values and filters to only show low values. How to handle sparse values (where not every residue pair has a score) also needs some more thought. For that I made a defaultScore option so the full matrix of all-by-all residues is first filled with the default score.
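The default-score idea can be sketched in a few lines: start with a full all-by-all matrix holding the default, then overwrite the entries for which a score exists. This is only an illustration with hypothetical names and plain lists, not the code used in the recipe.

```python
def dense_matrix(residues, pair_scores, default_score):
    """Build an all-by-all matrix, filling missing pairs with `default_score`.

    residues:    ordered list of residue keys, e.g. ("A", 1)
    pair_scores: dict mapping (residue_1, residue_2) -> score
    """
    index = {residue: i for i, residue in enumerate(residues)}
    size = len(residues)
    # Every entry starts at the default score.
    matrix = [[default_score] * size for _ in range(size)]
    # Overwrite only the pairs that actually have a score.
    for (res1, res2), value in pair_scores.items():
        matrix[index[res1]][index[res2]] = value
    return matrix


residues = [("A", 1), ("A", 2), ("B", 1)]
sparse = {(("A", 1), ("B", 1)): 3.5}
matrix = dense_matrix(residues, sparse, default_score=0)
```

Here the first residue of each pair is used as the row index and the second as the column index; a plotting tool may expect the opposite convention.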
Properly handling all the cases of more general scores than AlphaFold PAE is a good bit of work and would likely be used far less than AlphaFold PAE, so I'm not sure I can justify the effort to support all those capabilities. But maybe this start I made will encourage someone else to provide a more complete extension to ChimeraX.
Tom
<pombe_h3_h4_dpb3.png><pombe_h3_h4_dpb3_pae.png>
On Sep 26, 2024, at 11:42 AM, Gerardo Tauriello via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
Dear ChimeraX developers,
Following up on discussions in https://github.com/chaidiscovery/chai-lab/issues/52 and the large amounts of models which have been deposited in ModelArchive (MA; https://modelarchive.org/) in the meantime, I would propose to support loading PAE (or other local-pairwise scores) values from ModelCIF files as used by MA.
Here I link some examples which should capture most use cases:
1. ma-bak-cepc-0944 <https://www.modelarchive.org/doi/10.5452/ma-bak-cepc-0944> includes 2 local-pairwise scores (PAE and "contact probability") and only stores scores for inter-chain residue-pairs.
2. ma-dm-hisrep-003 <https://www.modelarchive.org/doi/10.5452/ma-dm-hisrep-003> includes a full PAE matrix.
3. In all cases in MA, entries contain a single accompanying zip file which also contains the local pairwise scores which are extracted into a separate file to keep the main ModelCIF file at a manageable size. The linked modelcif_examples_for_chimerax.zip <https://sibcloud-my.sharepoint.com/:u:/g/personal/gerardo_tauriello_sib_swiss/EejVfLC-osVEqwt1yu8S2B8BK_8dX1osrfEDWeWOilhkDQ?e=JqYgIO> package includes pae_associated_file where the pairwise scores are directly in a separate file instead of being in a zip file and pae_in_main_file which includes the pairwise scores directly in the main ModelCIF file.
Already now ChimeraX is able to parse the ma_qa_metric_local category in ModelCIF to do per-residue coloring for pLDDT etc which is awesome. To make sense of ma_qa_metric_local_pairwise one needs the following (I try to be detailed to simplify the implementation):
1. Parse ma_qa_metric and keep the IDs (_ma_qa_metric.id) for rows with ma_qa_metric.mode == "local-pairwise". If we want to restrict this to PAE matrices, one would additionally check that _ma_qa_metric.type == "PAE".
2. Look for the _ma_qa_metric_local_pairwise category. There are 3 options on where to find it:
   * Directly in the main ModelCIF file (as in pae_in_main_file/003-Spombe_H3-H4_tetramer_DPB3.cif).
   * In a separate file via _ma_entry_associated_files (as in pae_associated_file/003-Spombe_H3-H4_tetramer_DPB3.cif). You can find the file via _ma_entry_associated_files.file_url for the _ma_entry_associated_files row with file_content == "local pairwise QA scores".
   * In a separate file via _ma_associated_archive_file_details (as in all MA entries such as ma-bak-cepc-0944 and ma-dm-hisrep-003). Procedure to get it:
     * Find row in _ma_associated_archive_file_details with file_content == "local pairwise QA scores" and keep archive_file_id and file_path.
     * Find zip file in _ma_entry_associated_files with id == archive_file_id and download via file_url.
     * Fetch file named file_path from that zip file.
3. Within the _ma_qa_metric_local_pairwise extract scores for all given residue pairs for metric_id matching the desired IDs from step 1. Each residue is identified with model_id, label_asym_id, label_seq_id, and (redundantly) label_comp_id. There are no guarantees on the order of the data or completeness of the data but each referenced residue is guaranteed to exist in the atom_site category.
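Steps 1 and 3 above can be sketched roughly as follows, assuming the category rows are already available as dictionaries keyed by item name. The function names are hypothetical, and the item names with `_1`/`_2` suffixes and `metric_value` are assumptions about the ModelCIF dictionary that should be checked against it.

```python
def pairwise_metric_ids(metric_rows, only_type=None):
    """Step 1: IDs of ma_qa_metric rows with mode 'local-pairwise'.

    If `only_type` is given (e.g. "PAE"), keep only metrics of that type.
    """
    ids = []
    for row in metric_rows:
        if row["mode"] != "local-pairwise":
            continue
        if only_type is not None and row["type"] != only_type:
            continue
        ids.append(row["id"])
    return ids


def collect_pair_scores(pair_rows, metric_id, model_id):
    """Step 3: map ((asym_1, seq_1), (asym_2, seq_2)) -> score.

    Rows may come in any order and need not cover every residue pair.
    """
    scores = {}
    for row in pair_rows:
        if row["metric_id"] != metric_id or row["model_id"] != model_id:
            continue
        key = (
            (row["label_asym_id_1"], int(row["label_seq_id_1"])),
            (row["label_asym_id_2"], int(row["label_seq_id_2"])),
        )
        scores[key] = float(row["metric_value"])
    return scores
```

Keying the result by residue identifiers rather than by row position reflects the point that neither order nor completeness of the data is guaranteed.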
Here is the current list of model sets in MA which include PAE matrices: ma-bak-cepc <https://modelarchive.org/doi/10.5452/ma-bak-cepc>, ma-coffe-slac <https://modelarchive.org/doi/10.5452/ma-coffe-slac>, ma-tbvar3d <https://modelarchive.org/doi/10.5452/ma-tbvar3d>, ma-ornl-sphdiv <https://modelarchive.org/doi/10.5452/ma-ornl-sphdiv>, ma-t3vr3 <https://modelarchive.org/doi/10.5452/ma-t3vr3>, ma-low-csi <https://modelarchive.org/doi/10.5452/ma-low-csi>, ma-ombbaf2 <https://modelarchive.org/doi/10.5452/ma-ombbaf2>, ma-rap-bacsu <https://modelarchive.org/doi/10.5452/ma-rap-bacsu>, ma-nmpfamsdb <https://modelarchive.org/doi/10.5452/ma-nmpfamsdb>, ma-jd-viral <https://modelarchive.org/doi/10.5452/ma-jd-viral>, ma-dm-prc <https://modelarchive.org/doi/10.5452/ma-dm-prc>, ma-dm-hisrep <https://modelarchive.org/doi/10.5452/ma-dm-hisrep>, ma-osf-ppp2r2a <https://modelarchive.org/doi/10.5452/ma-osf-ppp2r2a>. Altogether, MA has over 200,000 models with PAE matrices which all supplement the AFDB in some way (e.g. protein complexes, viral proteins and other proteins not covered in UniProt).
It would be great to include support for ModelArchive's ModelCIF files in ChimeraX and I am happy to provide further information and help as needed.
Regards, Gerardo
_______________________________________________ ChimeraX-users mailing list -- chimerax-users@cgl.ucsf.edu <mailto:chimerax-users@cgl.ucsf.edu> To unsubscribe send an email to chimerax-users-leave@cgl.ucsf.edu <mailto:chimerax-users-leave@cgl.ucsf.edu> Archives: https://mail.cgl.ucsf.edu/mailman/archives/list/chimerax-users@cgl.ucsf.edu/
<modelcif_pae.py.zip>
Hi Gerardo,
I decided to replace the metricId option (an integer) with metricName (e.g. "PAE" or "contact probability") since the user likely would not know which integer id to use. Also, in my previous example I gave the command option names with underscore characters, but usually the option names use camel case instead, for example,
modelcif pae #1 metricName "contact probability" defaultScore 0 palette rainbow range 0,1
Tom
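For scripted use, a command like the one above could be assembled from Python. The sketch below only builds the command string; the helper name `modelcif_pae_command` is hypothetical, while the option names (metricName, defaultScore, palette, range) are the ones given in the message.

```python
def modelcif_pae_command(model_spec, metric_name=None, default_score=None,
                         palette=None, value_range=None):
    """Build a 'modelcif pae' command string (hypothetical helper)."""
    parts = ["modelcif pae", model_spec]
    if metric_name is not None:
        # Quote the name because it may contain spaces.
        parts += ["metricName", f'"{metric_name}"']
    if default_score is not None:
        parts += ["defaultScore", str(default_score)]
    if palette is not None:
        parts += ["palette", palette]
    if value_range is not None:
        low, high = value_range
        parts += ["range", f"{low},{high}"]
    return " ".join(parts)


command = modelcif_pae_command("#1", metric_name="contact probability",
                               default_score=0, palette="rainbow",
                               value_range=(0, 1))
print(command)
```

Inside a ChimeraX Python script, such a string would typically be executed with `run(session, command)` from `chimerax.core.commands`.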
Dear Tom,
Thanks to you and your team for including this in the ChimeraX distribution. I am happy to have it in there and your changes all look very reasonable to me.
Regards, Gerardo
From: Tom Goddard <goddard@sonic.net> Sent: Thursday, October 10, 2024 03:54 To: Gerardo Tauriello Cc: chimerax-users@cgl.ucsf.edu Subject: Re: [chimerax-users] [EXTERN] Re: Support for _ma_qa_metric_local_pairwise to display PAE matrices stored in ModelArchive
Hi Gerardo,
I decided to replace the metricId option (an integer) with metricName (e.g. "PAE" or "contact probability") since the user likely would not know which integer id to use. Also in my previous example I gave the command option names with underscore characters, but usually the options names instead use camel case, for example,
modelcif pae #1 metricName "contact probability" defaultScore 0 palette rainbow range 0,1
Tom
On Oct 11, 2024, at 9:30 AM, Gerardo Tauriello <gerardo.tauriello@unibas.ch> wrote:
Dear Tom,
Checking out the daily build, I noticed that files without local-pairwise values no longer have clean error handling. E.g. "open ma-asfv-asfvg-001 from modelarchive pae true" is an example of a file without local-pairwise values and fails at "metric_id = met_ids[0] if met_ids else values[0][5]" because met_ids is empty in that case...
Also there is a caveat for files like ma-tbvar3d-15 which include a non-polymer. There I think that "alphafold pae" expects per-atom PAE values for the non-polymer and so we end up with a PAE JSON file with a wrongly sized matrix. That one is an edge case within ModelArchive and so it can be considered low priority until we properly handle AF3 PAE matrices in ModelCIF.
Regards, Gerardo
Thanks Gerardo!
I have fixed the error reporting when no ModelCIF pairwise scores are found. You are right that the ChimeraX AlphaFold PAE plotting requires the pairwise score matrix to have a row for each non-polymer atom. So ChimeraX won't currently plot pairwise scores for Model Archive structures with non-polymer residues. This could be fixed. But as you note, the most important case to fix will be AlphaFold 3 PAE values, and currently ModelCIF does not have a table that can handle the mix of per-residue and per-atom PAE values that AlphaFold 3 reports. So let's wait until ModelCIF figures out how to handle AlphaFold 3 PAE scores; then we can fix ChimeraX reading those from ModelCIF. I made a ChimeraX ticket for this issue
https://www.rbvi.ucsf.edu/trac/ChimeraX/ticket/16120
Tom
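The failing expression quoted in the report indexes into a list that may be empty. A guarded version could look like the sketch below; it is not the actual fix made in ChimeraX, and the function and exception names are hypothetical. The index 5 is taken from the quoted line.

```python
class NoPairwiseScoresError(ValueError):
    """Raised when a ModelCIF file contains no local-pairwise scores."""


def choose_metric_id(met_ids, values):
    """Pick a metric ID, failing with a clear message if there is none.

    met_ids: preferred metric IDs (e.g. those of type PAE), possibly empty
    values:  score rows, where element 5 of a row is its metric ID
    """
    if met_ids:
        return met_ids[0]
    if values:
        # Fall back to the metric of the first available score row.
        return values[0][5]
    raise NoPairwiseScoresError("No local-pairwise scores found in ModelCIF data")
```

Within ChimeraX itself, raising `UserError` from `chimerax.core.errors` instead of a custom exception would be the usual way to report such a problem to the user without a traceback.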
participants (3)
- Eric Pettersen
- Gerardo Tauriello
- Tom Goddard