
Hi Ben,

It would be nice to show the templates and sequence alignments used for predicted models from AlphaFold and Modeller. We could output an HTML table in the log that lists the templates, with a link to show the sequence alignment and, if the template is from the PDB, a link to load and align it.

The AlphaFold models in the EBI AlphaFold database don't appear to say which template structures were used. For instance, I looked at AF-P12004-F1-model_v1.cif (https://alphafold.ebi.ac.uk/entry/P12004). I believe AlphaFold2 finds the 20 best-matching structures in the PDB and uses 4 of them (I'm not sure how they are selected). I've run AlphaFold many times, and the log output lists the 20 matches but does not appear to say which 4 structures it actually used, which is pretty unfortunate. The AlphaFold per-residue confidence scores are in the mmCIF table _ma_qa_metric_local:

    #
    loop_
    _ma_qa_metric_local.label_asym_id
    _ma_qa_metric_local.label_comp_id
    _ma_qa_metric_local.label_seq_id
    _ma_qa_metric_local.metric_id
    _ma_qa_metric_local.metric_value
    _ma_qa_metric_local.model_id
    _ma_qa_metric_local.ordinal_id
    A MET 1 2 91.95 1 1
    A PHE 2 2 96.89 1 1
    A GLU 3 2 98.01 1 1
    A ALA 4 2 98.08 1 1
    A ARG 5 2 97.76 1 1
    A LEU 6 2 96.16 1 1
    ..

Currently ChimeraX colors AlphaFold models by confidence using the same scores, taken from the B-factor column of the atom_site table.

The Model Archive example you gave (https://www.modelarchive.org/doi/10.5452/ma-bak-cepc-0250) has no template sequences or alignments in the mmCIF file and no per-residue scores, but it does have some global scores. Your ModBase example (https://modbase.compbio.ucsf.edu/modbase-cgi/model_search.cgi?databaseID=Q12...) does have a template sequence, an alignment, and global scores, but no per-residue scores:

    #
    loop_
    _ma_template_ref_db_details.template_id
    _ma_template_ref_db_details.db_name
    _ma_template_ref_db_details.db_accession_code
    1 PDB 3nc1
    #
    loop_
    _ma_template_poly.template_id
    _ma_template_poly.seq_one_letter_code
    _ma_template_poly.seq_one_letter_code_can
    1 DMACDTFIKIAQKCRRHFVQVQVGEVMPFIDEILNNINTIICDLQPQQVHTFYEAVGYMIGAQTDQTVQEHLIEKYMLLPNQVWDSIIQQATKNVDILKDPETVKQLGSILKTNVRACKAVGHPFVIQLGRIYLDMLNVYKCLSENISAAIQANGEMVTKQPLIRSMRTVKRETLKLISGWVSRSNDPQMVAENFVPPLLDAVLIDYQRNVPAAREPEVLSTMAIIVNKLGGHITAEIPQIFDAVFECTLNMINKDFEEYPEHRTNFFLLLQAVNSHCFPAFLAIPPAQFKLVLDSIIWAFKHTMRNVADTGLQILFTLLQNVAQEEAAAQSFYQTYFCDILQHIFSVVTDTSHTAGLTMHASILAYMFNLVEEGKISTPLNPN DMACDTFIKIAQKCRRHFVQVQVGEVMPFIDEILNNINTIICDLQPQQVHTFYEAVGYMIGAQTDQTVQEHLIEKYMLLPNQVWDSIIQQATKNVDILKDPETVKQLGSILKTNVRACKAVGHPFVIQLGRIYLDMLNVYKCLSENISAAIQANGEMVTKQPLIRSMRTVKRETLKLISGWVSRSNDPQMVAENFVPPLLDAVLIDYQRNVPAAREPEVLSTMAIIVNKLGGHITAEIPQIFDAVFECTLNMINKDFEEYPEHRTNFFLLLQAVNSHCFPAFLAIPPAQFKLVLDSIIWAFKHTMRNVADTGLQILFTLLQNVAQEEAAAQSFYQTYFCDILQHIFSVVTDTSHTAGLTMHASILAYMFNLVEEGKISTPLNPN
    #
    loop_
    _ma_alignment.ordinal_id
    _ma_alignment.alignment_id
    _ma_alignment.target_template_flag
    _ma_alignment.sequence
    1 1 2 DMACDTFIKIAQKCRRHFVQVQVGEVMPFIDEILNNINTIICDLQPQQVHTFYEAVGYMIGAQTDQTVQEHLIEKYMLLPNQVWDSIIQQATKNVDILKDPETVKQLGSILKTNVRACKAVGHPFVIQLGRIYLDMLNVYKCLSENISAAIQANGEMVTKQPLIRSMRTVKRETLKLISGWVSRSNDPQMVAENFVPPLLDAVLI---------DYQRNVPAAREPEVLSTMAIIVNKLGGHITAEIPQIFDAVFECTLNMINKDFEE---------YPEHRTNFFLLLQAVNSHCFPAFLAIPPAQ---FKLVLDSIIWAFKHTMRNVADTGLQILFTLLQNVAQEEAAAQSFYQTYFCDILQHIFSVVTDTSHTAGLTMHASILAYMFNLVEEGKISTPLNPN
    2 1 1 DSYVETLDSMIELFKDYKPGSITLENITRLCQTL-GLESFTEELSNELSR--LSTASKIIVIDVDYNKKQDRIQDVKLVLASNFDNFDYFNQRDGEHEKSNILLNSLTKYPDLKAFHNNLKFLYLLDAYSHIESDSTSHNNGSSDKSLDSSNASFNNQGKLDLFKYFTELSHYIRQCFQDNCCDFKVRTNLNDKFGIYILTQGINGKEVPLAKIYLEENKSDSQYRFYEYIYSQETKSWINESAENFSNGISLVMEIVANAKESNYTDLIWFPEDFISPELIIDKVTCSSNSSSSPPIIDLFSNNNYNSRIQLMNDFTTKLINIKKFDISNDNLDLISEILKWV------------QWSRIVLQNVFKLVSTPSSNSNSSELEPDYQAPFSTSTKDKNSSTSNTE
    #
    loop_
    _ma_qa_metric.id
    _ma_qa_metric.name
    _ma_qa_metric.description
    _ma_qa_metric.type
    _ma_qa_metric.mode
    _ma_qa_metric.other_details
    _ma_qa_metric.software_group_id
    1 MPQS 'ModPipe Quality Score' other global 'composite score, values >1.1 are considered reliable' 1
    2 zDOPE 'Normalized DOPE' zscore global . 2
    3 'TSVMod RMSD' 'TSVMod predicted RMSD (MSALL)' distance global . .
    4 'TSVMod NO35' 'TSVMod predicted native overlap (MSALL)' other global . .
    #
    loop_
    _ma_qa_metric_global.ordinal_id
    _ma_qa_metric_global.model_id
    _ma_qa_metric_global.metric_id
    _ma_qa_metric_global.metric_value
    1 1 1 0.665346
    2 1 2 -0.11
    3 1 3 14.527
    4 1 4 0.036

So it looks like only ModBase would currently benefit from ChimeraX reading template sequences and alignments. I don't think it would be too hard to implement. I've made a ChimeraX feature request for it: https://www.rbvi.ucsf.edu/trac/ChimeraX/ticket/5601

Tom
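The MA loops quoted above are regular enough that a rough reader fits in a few dozen lines of Python. This is only a minimal sketch. It is not ChimeraX's mmCIF reader or any existing API, and parse_loops, local_scores, global_scores and aligned_pairs are hypothetical helper names. The tokenizer uses Python's shlex, so quoting follows shell rules, which only approximate CIF. It does not handle multi-line ';' text fields or unquoted values containing apostrophes. It also assumes that target_template_flag 1 marks the target and 2 marks the template. That matches the ModBase example, where the flag-2 row is the 3nc1 template sequence.

```python
import shlex


def parse_loops(cif_text):
    """Collect every loop_ in an mmCIF string as {category: [row dicts]}.

    Minimal sketch (assumption): skips '#' comment lines and non-loop items,
    maps '.' and '?' to None, and does NOT handle multi-line ';' text fields
    or unquoted values containing apostrophes (e.g. atom name O5').
    """
    tokens = []
    for line in cif_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            tokens.extend(shlex.split(stripped))  # strips 'single quotes'

    tables = {}
    i = 0
    while i < len(tokens):
        if tokens[i] != "loop_":
            i += 1  # skip data_ headers and key/value pairs outside loops
            continue
        i += 1
        names = []  # e.g. _ma_qa_metric_local.label_seq_id
        while i < len(tokens) and tokens[i].startswith("_"):
            names.append(tokens[i])
            i += 1
        values = []
        while (i < len(tokens) and tokens[i] != "loop_"
               and not tokens[i].startswith(("_", "data_"))):
            values.append(None if tokens[i] in (".", "?") else tokens[i])
            i += 1
        if not names or len(values) % len(names):
            raise ValueError(f"malformed loop ending at token {i}")
        category = names[0].split(".")[0]
        columns = [name.split(".", 1)[1] for name in names]
        width = len(columns)
        tables[category] = [dict(zip(columns, values[j:j + width]))
                            for j in range(0, len(values), width)]
    return tables


def local_scores(tables, metric_id=None):
    """Map (label_asym_id, label_seq_id) -> per-residue value, optionally
    keeping only one metric_id (e.g. the AlphaFold confidence metric)."""
    scores = {}
    for row in tables.get("_ma_qa_metric_local", []):
        if metric_id is None or row["metric_id"] == str(metric_id):
            key = (row["label_asym_id"], int(row["label_seq_id"]))
            scores[key] = float(row["metric_value"])
    return scores


def global_scores(tables):
    """Map metric name -> value by joining _ma_qa_metric on metric_id."""
    names = {row["id"]: row["name"] for row in tables.get("_ma_qa_metric", [])}
    return {names.get(row["metric_id"], row["metric_id"]): float(row["metric_value"])
            for row in tables.get("_ma_qa_metric_global", [])}


def aligned_pairs(tables, alignment_id="1"):
    """Return (target_index, template_index) pairs for columns where neither
    sequence has a gap. Indices count residues from 1 within each gapped
    string; they are not mmCIF seq_id numbers."""
    seqs = {}
    for row in tables.get("_ma_alignment", []):
        if row["alignment_id"] == str(alignment_id):
            seqs[row["target_template_flag"]] = row["sequence"]
    target, template = seqs["1"], seqs["2"]  # assumed: 1 = target, 2 = template
    if len(target) != len(template):
        raise ValueError("aligned sequences differ in length")
    pairs = []
    ti = pi = 0
    for t, p in zip(target, template):
        ti += t != "-"  # advance residue counters past non-gap characters
        pi += p != "-"
        if t != "-" and p != "-":
            pairs.append((ti, pi))
    return pairs
```

Applied to the ModBase loops quoted above, global_scores should give {'MPQS': 0.665346, 'zDOPE': -0.11, 'TSVMod RMSD': 14.527, 'TSVMod NO35': 0.036}. aligned_pairs yields the residue correspondences that an alignment display or a template superposition would need. Mapping those indices onto real residue numbers would need additional MA tables that this sketch does not read.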
On Nov 12, 2021, at 10:20 AM, Ben Webb via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
Do you have any plans to extend ChimeraX's mmCIF reader to parse and display metadata on theoretical models, such as quality scores or the alignments to template structures?
The folks at PDB have recently done a lot of work to standardize this metadata in the MA mmCIF dictionary: https://mmcif.wwpdb.org/dictionaries/mmcif_ma.dic/Index/
The dictionary has already been adopted by ModelArchive (e.g. AlphaFold2 models) and by ModBase (Modeller models), and I believe that other repositories such as SwissModel are also moving in that direction. See, e.g., the mmCIF downloads at https://www.modelarchive.org/doi/10.5452/ma-bak-cepc-0250 and https://modbase.compbio.ucsf.edu/modbase-cgi/model_search.cgi?databaseID=Q12...
(My ulterior motive: we've previously built Chimera web data files to download a ModBase model and the accompanying alignment, and display them in Chimera; now that this data is embedded in the mmCIF file, in principle ChimeraX could do this itself in a less clunky and not ModBase-specific fashion.)
Ben
--
ben@salilab.org                      https://salilab.org/~ben/
"It is a capital mistake to theorize before one has data."
	- Sir Arthur Conan Doyle