Dear ChimeraX developers,
Following up on the discussion in
https://github.com/chaidiscovery/chai-lab/issues/52 and given the large number of models that have been deposited in ModelArchive (MA;
https://modelarchive.org/) in the meantime, I would like to propose support for loading PAE values (or other
local-pairwise scores) from ModelCIF files as used by MA.
Here I link some examples which should capture most use cases:
- ma-bak-cepc-0944 includes 2 local-pairwise scores (PAE and "contact probability") and only stores scores for inter-chain residue pairs.
- ma-dm-hisrep-003 includes a full PAE matrix.
- In all cases in MA, entries come with a single accompanying zip file that contains the local-pairwise scores; these are extracted into a separate file to keep the main ModelCIF file at a manageable size. The linked modelcif_examples_for_chimerax.zip package additionally includes pae_associated_file, where the pairwise scores are in a separate file that is not inside a zip file, and pae_in_main_file, where the pairwise scores are stored directly in the main ModelCIF file.
ChimeraX can already parse the ma_qa_metric_local category in ModelCIF to do per-residue coloring for pLDDT etc., which is awesome.
To make sense of ma_qa_metric_local_pairwise, one needs to do the following (I have tried to be detailed to simplify the implementation):
- Parse ma_qa_metric and keep the IDs (_ma_qa_metric.id) for rows with _ma_qa_metric.mode == "local-pairwise". If we want to restrict this to PAE matrices, one would additionally check that _ma_qa_metric.type == "PAE".
- Look for the _ma_qa_metric_local_pairwise category. There are 3 options on where to find it:
  - Directly in the main ModelCIF file (as in pae_in_main_file/003-Spombe_H3-H4_tetramer_DPB3.cif).
  - In a separate file via _ma_entry_associated_files (as in pae_associated_file/003-Spombe_H3-H4_tetramer_DPB3.cif). You can find the file via _ma_entry_associated_files.file_url for the _ma_entry_associated_files row with file_content == "local pairwise QA scores".
  - In a separate file via _ma_associated_archive_file_details (as in all MA entries such as ma-bak-cepc-0944 and ma-dm-hisrep-003). Procedure to get it:
    - Find row in _ma_associated_archive_file_details with file_content == "local pairwise QA scores" and keep archive_file_id and file_path.
    - Find zip file in _ma_entry_associated_files with id == archive_file_id and download via file_url.
    - Fetch file named file_path from that zip file.
- Within the _ma_qa_metric_local_pairwise category, extract the scores for all given residue pairs whose metric_id matches the desired IDs from step 1. Each residue is identified by model_id, label_asym_id, label_seq_id, and (redundantly) label_comp_id. There are no guarantees on the order or completeness of the data, but each referenced residue is guaranteed to exist in the atom_site category.
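To make the steps above more concrete, here is a rough Python sketch. It is only an illustration under several assumptions and not a proposal for how ChimeraX should implement it: I assume an mmCIF parser has already turned each category into a list of row dictionaries keyed by item name, all function names are hypothetical, and the item names with `_1`/`_2` suffixes and `metric_value` reflect my reading of the ModelCIF dictionary and should be checked against it. Downloading `file_url` is left out; the sketch only shows how to pick the file out of an already downloaded zip.

```python
import io
import zipfile

QA_CONTENT = "local pairwise QA scores"


def pairwise_metric_ids(ma_qa_metric_rows, metric_type="PAE"):
    """Step 1: IDs of local-pairwise metrics (pass metric_type=None for all types)."""
    return {
        row["id"]
        for row in ma_qa_metric_rows
        if row["mode"] == "local-pairwise"
        and (metric_type is None or row["type"] == metric_type)
    }


def locate_pairwise_scores(categories):
    """Step 2: report where _ma_qa_metric_local_pairwise is stored.

    `categories` maps category names (without leading underscore) to row lists.
    Returns ("main", None, None), ("file", url, None), ("zip", url, file_path),
    or None if no local-pairwise scores are referenced.
    """
    if categories.get("ma_qa_metric_local_pairwise"):
        return ("main", None, None)
    files = categories.get("ma_entry_associated_files", [])
    for row in files:
        if row.get("file_content") == QA_CONTENT:
            return ("file", row["file_url"], None)
    for detail in categories.get("ma_associated_archive_file_details", []):
        if detail.get("file_content") == QA_CONTENT:
            for row in files:
                if row["id"] == detail["archive_file_id"]:
                    return ("zip", row["file_url"], detail["file_path"])
    return None


def read_zip_member(zip_bytes, file_path):
    """Return the raw bytes of `file_path` from an already downloaded zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.read(file_path)


def collect_scores(pairwise_rows, metric_ids):
    """Step 3: sparse mapping {(residue_1, residue_2): score}.

    A residue is (model_id, label_asym_id, label_seq_id). A dictionary is used
    because the rows may be unordered and need not cover all residue pairs.
    """
    scores = {}
    for row in pairwise_rows:
        if row["metric_id"] not in metric_ids:
            continue
        res1 = (row["model_id"], row["label_asym_id_1"], int(row["label_seq_id_1"]))
        res2 = (row["model_id"], row["label_asym_id_2"], int(row["label_seq_id_2"]))
        scores[(res1, res2)] = float(row["metric_value"])
    return scores
```

Keeping the scores in a sparse dictionary keyed by residue pair means that entries which only store inter-chain pairs (as in ma-bak-cepc-0944) and entries with a full matrix (as in ma-dm-hisrep-003) can be handled the same way; a dense matrix could be filled from it once the residues have been mapped to the ones in atom_site.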
Here is the current list of model sets in MA which include PAE matrices:
ma-bak-cepc,
ma-coffe-slac,
ma-tbvar3d,
ma-ornl-sphdiv,
ma-t3vr3,
ma-low-csi,
ma-ombbaf2,
ma-rap-bacsu,
ma-nmpfamsdb,
ma-jd-viral,
ma-dm-prc,
ma-dm-hisrep,
ma-osf-ppp2r2a.
Altogether, MA has over 200,000 models with PAE matrices, all of which supplement the AFDB in some way (e.g. protein complexes, viral proteins, and other proteins not covered in UniProt).
It would be great to include support for ModelArchive's ModelCIF files in ChimeraX, and I am happy to provide further information and help as needed.
Regards,
Gerardo