Hi Gerardo,

  I decided to replace the metricId option (an integer) with metricName (e.g. "PAE" or "contact probability") since the user likely would not know which integer id to use.  Also, in my previous example I gave the command option names with underscore characters, but option names usually use camel case instead, for example,

modelcif pae #1 metricName "contact probability" defaultScore 0 palette rainbow range 0,1
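
For scripting, the same command line can be assembled in Python. This is only a minimal sketch and not part of the ChimeraX code: the helper name build_modelcif_pae_command and its parameter names are hypothetical, and it uses only the option names from the example above (metricName, defaultScore, palette, range).

```python
def build_modelcif_pae_command(model_spec, metric_name=None, default_score=None,
                               palette=None, value_range=None):
    """Assemble a "modelcif pae" command string (hypothetical helper)."""
    parts = ["modelcif", "pae", model_spec]
    if metric_name is not None:
        # Quote names that contain spaces, e.g. "contact probability".
        name = f'"{metric_name}"' if " " in metric_name else metric_name
        parts += ["metricName", name]
    if default_score is not None:
        parts += ["defaultScore", f"{default_score:g}"]
    if palette is not None:
        parts += ["palette", palette]
    if value_range is not None:
        low, high = value_range
        parts += ["range", f"{low:g},{high:g}"]
    return " ".join(parts)


cmd = build_modelcif_pae_command("#1", metric_name="contact probability",
                                 default_score=0, palette="rainbow",
                                 value_range=(0, 1))
print(cmd)
```

Inside ChimeraX such a string could then be passed to run(session, cmd) from chimerax.core.commands.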

  Tom


On Oct 8, 2024, at 6:13 PM, Tom Goddard <goddard@sonic.net> wrote:

Hi Gerardo,

  Thanks for adding the Model Archive fetch and improved PAE fetch that can get scores from associated files.  This all looks useful enough that I think it should go into the ChimeraX distribution.  So I added it to the daily build (October 8, 2024 and newer).  Let me know if you don't want your code added to ChimeraX.  I made fetching and opening the Model Archive files use the standard syntax

open ma-bak-cepc-0944 from modelarchive

and it doesn't automatically get the PAE, but I added an option to do that

open ma-bak-cepc-0944 from modelarchive pae true

To handle scores that are not PAE, such as a probability in the range 0-1, I added the palette and range options to the "modelcif pae" command, for example,

modelcif pae #1 metric_id 4 default_score 0 palette rainbow range 0,1

To show only the PAE scores after the Model Archive file has been opened, you can use

modelcif pae #1

More details are in the ChimeraX ticket


  Tom

On Oct 8, 2024, at 3:45 AM, Gerardo Tauriello via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:

Dear Tom,

First of all: thank you very much for that recipe. Seems to work great.

As I was motivated to learn how to script ChimeraX, I extended the recipe (new file attached) to better support data as stored in ModelArchive (MA). It now includes the following:
  1. Preference for pairwise metric of type PAE (instead of blindly picking the first one) and some info from the metadata on the chosen metric
  2. Handling of PAE files as accompanying data. It can handle additional files stored locally in the same folder as the main ModelCIF file or files stored remotely and it can handle files stored within ZIP files (as used in all cases in MA)
  3. An additional "modelarchive open" command to fetch files directly from MA given an MA ID and add PAE if available. Example usages for the two cases mentioned in my original post: "modelarchive open ma-dm-hisrep-003" and "modelarchive open ma-bak-cepc-0944"

One observation when looking at sparsely populated PAE matrices as in ma-bak-cepc-0944: the mouse-over text which displays the score does not match the display. In ma-bak-cepc-0944 only inter-chain scores are available between chains A and B and the display shows that on the top right, while the mouse-over shows values in the bottom-left.
I have no reason to believe that this is specific to ModelCIF and so I think that all PAE matrices suffer from the same display issue. This is barely noticeable as PAE matrices are almost symmetric but very much visible in this case.

In terms of handling other pairwise scores beyond AlphaFold PAE: I tried to see how this looks in ma-bak-cepc-0944, which has an additional score of type contact probability (value in [0,1]). If I load it with "modelcif pae #2 metric_id 4 default_score 0" it works ok, and the only thing preventing me from using it is that the display is very PAE centric in its titles and color scale. Just being able to set a color scale which depends on the range of data in the matrix would already make this a useful workaround... 😉

Regards,
Gerardo



From: Tom Goddard <goddard@sonic.net>
Sent: Wednesday, October 2, 2024 05:20
To: Gerardo Tauriello
Cc: chimerax-users@cgl.ucsf.edu
Subject: [EXTERN] Re: [chimerax-users] Support for _ma_qa_metric_local_pairwise to display PAE matrices stored in ModelArchive
 
Hi Gerardo,

  I made an example ChimeraX command "modelcif pae" that reads pairwise scores included directly in the ModelCIF structure file and shows them as an AlphaFold PAE plot.  Here's the code and an example using it


To handle all the cases you describe well would take more work, for example, where the scores are in separate files and need to be fetched.  And the user should be provided a list of scores if more than one is available.  And scores where you might want to show high values (e.g. probability) instead of low values won't work well with the ChimeraX "alphafold contacts" command which shows interactions between domains but is designed for predicted aligned error (PAE) values and filters to only show low values.  And how to handle sparse values (where every residue pair does not have a score) needs some more thought.  For that I made a defaultScore option so the full matrix of all-by-all residues is first filled with the default score.
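
The default score idea can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the code used by the "modelcif pae" command: the function name fill_score_matrix and the zero-based index pairs are hypothetical.

```python
def fill_score_matrix(num_residues, pair_scores, default_score=0.0):
    """Build a dense all-by-all matrix from sparse pairwise scores.

    pair_scores maps zero-based (row, column) residue index pairs to scores.
    Every pair that is not listed keeps the default score.
    """
    matrix = [[default_score] * num_residues for _ in range(num_residues)]
    for (i, j), score in pair_scores.items():
        matrix[i][j] = score
    return matrix


# Made-up sparse data: only two residue pairs have a score.
sparse = {(0, 2): 0.9, (1, 2): 0.4}
matrix = fill_score_matrix(3, sparse, default_score=0.0)
for row in matrix:
    print(row)
```

Only the listed direction of each pair is filled here, so a matrix built from one-directional scores stays asymmetric.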

  Properly handling all the cases of more general scores than AlphaFold PAE is a good bit of work and would likely be used far less than AlphaFold PAE, so I'm not sure I can justify the effort to support all those capabilities.  But maybe this start I made will encourage someone else to provide a more complete extension to ChimeraX.

Tom

<pombe_h3_h4_dpb3.png><pombe_h3_h4_dpb3_pae.png>

On Sep 26, 2024, at 11:42 AM, Gerardo Tauriello via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:

Dear ChimeraX developers,

Following up on discussions in https://github.com/chaidiscovery/chai-lab/issues/52 and the large number of models which have been deposited in ModelArchive (MA; https://modelarchive.org/) in the meantime, I would propose to support loading PAE (or other local-pairwise score) values from ModelCIF files as used by MA.

Here I link some examples which should capture most use cases:
  1. ma-bak-cepc-0944 includes 2 local-pairwise scores (PAE and "contact probability") and only stores scores for inter-chain residue-pairs.
  2. ma-dm-hisrep-003 includes a full PAE matrix.
  3. In all cases in MA, entries contain a single accompanying zip file which also contains the local pairwise scores; these are extracted into a separate file to keep the main ModelCIF file at a manageable size. The linked modelcif_examples_for_chimerax.zip package includes pae_associated_file, where the pairwise scores are directly in a separate file instead of being in a zip file, and pae_in_main_file, which includes the pairwise scores directly in the main ModelCIF file.

ChimeraX is already able to parse the ma_qa_metric_local category in ModelCIF to do per-residue coloring for pLDDT etc., which is awesome.
To make sense of ma_qa_metric_local_pairwise one needs the following (I try to be detailed to simplify the implementation):
  1. Parse ma_qa_metric and keep the IDs (_ma_qa_metric.id) for rows with _ma_qa_metric.mode == "local-pairwise". If we want to restrict this to PAE matrices, one would additionally check that _ma_qa_metric.type == "PAE".
  2. Look for the _ma_qa_metric_local_pairwise category. There are 3 options on where to find it:
    1. Directly in the main ModelCIF file (as in pae_in_main_file/003-Spombe_H3-H4_tetramer_DPB3.cif).
    2. In a separate file via _ma_entry_associated_files (as in pae_associated_file/003-Spombe_H3-H4_tetramer_DPB3.cif). You can find the file via _ma_entry_associated_files.file_url for the _ma_entry_associated_files row with file_content == "local pairwise QA scores".
    3. In a separate file via _ma_associated_archive_file_details (as in all MA entries such as ma-bak-cepc-0944 and ma-dm-hisrep-003). Procedure to get it:
      1. Find row in _ma_associated_archive_file_details with file_content == "local pairwise QA scores" and keep archive_file_id and file_path.
      2. Find zip file in _ma_entry_associated_files with id == archive_file_id and download via file_url.
      3. Fetch file named file_path from that zip file.
  3. Within the _ma_qa_metric_local_pairwise extract scores for all given residue pairs for metric_id matching the desired IDs from step 1. Each residue is identified with model_id, label_asym_id, label_seq_id, and (redundantly) label_comp_id. There are no guarantees on the order of the data or completeness of the data but each referenced residue is guaranteed to exist in the atom_site category.
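
The steps above can be sketched in Python as follows. This is a rough illustration under assumptions, not a reference implementation: it assumes each category has already been parsed into a list of dicts keyed by item name, all function names are hypothetical, the _1/_2 suffixed residue columns and metric_value are my recollection of the ModelCIF dictionary and should be checked against it, and the URL and file names in the demo data are made up.

```python
import io
import zipfile

PAIRWISE_CONTENT = "local pairwise QA scores"


def pairwise_metric_ids(metric_rows, pae_only=False):
    """Step 1: IDs of local-pairwise metrics, optionally only type PAE."""
    return [row["id"] for row in metric_rows
            if row["mode"] == "local-pairwise"
            and (not pae_only or row["type"] == "PAE")]


def locate_scores(associated_files, archive_details):
    """Step 2, options 2 and 3: return (file_url, path_in_zip).

    path_in_zip is None for a plain associated file; (None, None) means no
    associated file was found, so the main file should be checked instead.
    """
    for entry in associated_files:
        if entry.get("file_content") == PAIRWISE_CONTENT:
            return entry["file_url"], None
    for detail in archive_details:
        if detail.get("file_content") == PAIRWISE_CONTENT:
            for entry in associated_files:
                if entry["id"] == detail["archive_file_id"]:
                    return entry["file_url"], detail["file_path"]
    return None, None


def read_zip_member(zip_bytes, file_path):
    """Step 2, option 3: read one file from an already downloaded zip."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.read(file_path).decode("utf-8")


def extract_scores(pairwise_rows, metric_id):
    """Step 3: map ((asym, seq), (asym, seq)) residue pairs to scores.

    model_id is ignored here for brevity; rows may come in any order.
    """
    scores = {}
    for row in pairwise_rows:
        if row["metric_id"] != metric_id:
            continue
        first = (row["label_asym_id_1"], int(row["label_seq_id_1"]))
        second = (row["label_asym_id_2"], int(row["label_seq_id_2"]))
        scores[(first, second)] = float(row["metric_value"])
    return scores


# Made-up demo data shaped like the categories described above.
metrics = [
    {"id": "2", "mode": "local", "type": "pLDDT"},
    {"id": "3", "mode": "local-pairwise", "type": "PAE"},
    {"id": "4", "mode": "local-pairwise", "type": "contact probability"},
]
files = [{"id": "1", "file_url": "https://example.org/entry.zip",
          "file_content": "archive with multiple files"}]
details = [{"archive_file_id": "1", "file_path": "scores.cif",
            "file_content": PAIRWISE_CONTENT}]
rows = [
    {"metric_id": "3", "metric_value": "12.5", "label_asym_id_1": "A",
     "label_seq_id_1": "1", "label_asym_id_2": "B", "label_seq_id_2": "7"},
    {"metric_id": "4", "metric_value": "0.8", "label_asym_id_1": "A",
     "label_seq_id_1": "1", "label_asym_id_2": "B", "label_seq_id_2": "7"},
]

# Build a tiny zip in memory to stand in for the downloaded archive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("scores.cif", "data_demo\n")

pae_ids = pairwise_metric_ids(metrics, pae_only=True)
location = locate_scores(files, details)
text = read_zip_member(buffer.getvalue(), location[1])
scores = extract_scores(rows, pae_ids[0])
print(pae_ids, location, scores)
```

Downloading file_url and parsing the extracted text back into rows are left out on purpose, since they depend on the CIF reader in use.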

Here is the current list of model sets in MA which include PAE matrices: ma-bak-cepc, ma-coffe-slac, ma-tbvar3d, ma-ornl-sphdiv, ma-t3vr3, ma-low-csi, ma-ombbaf2, ma-rap-bacsu, ma-nmpfamsdb, ma-jd-viral, ma-dm-prc, ma-dm-hisrep, ma-osf-ppp2r2a.
Altogether MA has over 200'000 models with PAE matrices which all supplement the AFDB in some way (e.g. protein complexes, viral proteins and other proteins not covered in UniProt).

It would be great to include support for ModelArchive's ModelCIF files in ChimeraX and I am happy to provide further information and help as needed.

Regards,
Gerardo

_______________________________________________
ChimeraX-users mailing list -- chimerax-users@cgl.ucsf.edu
To unsubscribe send an email to chimerax-users-leave@cgl.ucsf.edu
Archives: https://mail.cgl.ucsf.edu/mailman/archives/list/chimerax-users@cgl.ucsf.edu/

<modelcif_pae.py.zip>