Auto-associate define attribute/render by attribute

Hi Chimera team,

I’m working on a script to automatically render proteins in a complex by attributes (in this case, a number of scores obtained from different immunogenicity prediction algorithms). I currently have the script outputting attribute files as intended, but ideally I’d like to define the same attributes across all the chains of the model, which are made up of a number of different proteins.

Is there any way for me to define the attribute file, or utilise some functionality, to auto-associate an attribute file with all of the chains which have the correct sequence (similar to how the sequence alignment input works at the moment)? Currently the attribute file only uses ‘numbered’ coordinates (:1, :2, :3, etc.) rather than the actual sequence.

Since there’s no chain information in the attribute files, I’m not sure how I can connect them to the right structure, other than manually.

Many thanks,
Joe

Dr. Joseph Healey Ph.D. M.Sc. B.Sc. (Hons) MRSB
Research Fellow
Warwick Medical School
University of Warwick
Coventry CV47AL
Mob: +44 (0) 7536 042620 | Twitter: @JRJHealey <https://twitter.com/JRJHealey> | Website <http://www2.warwick.ac.uk/fac/sci/moac/people/students/2013/joseph_healey>
Email: J.Healey.1@warwick.ac.uk | ORCID: orcid.org/0000-0002-9569-6738

Hi Joe,

Not sure I understand completely, but did you know that you could include chain in the specification in the attribute file? E.g. instead of “:3” it could be “:3.A” for residue 3 in chain A. It is any command-line specifier, so it could also include the model number.
<http://www.rbvi.ucsf.edu/chimera/docs/ContributedSoftware/defineattrib/defineattrib.html#attrfile>

You would still have to figure out the numbering correspondence to write the attribute files, however.

An alternative approach is to create a custom header file for the sequence or sequence alignment in Multalign Viewer, which would automatically also become an attribute of any associated structures.
<http://www.rbvi.ucsf.edu/chimera/docs/ContributedSoftware/multalignviewer/multalignviewer.html#mavAttributes>
<http://www.rbvi.ucsf.edu/chimera/docs/ContributedSoftware/multalignviewer/defineheader.html#headerformat>

I hope this helps,
Elaine
-----
Elaine C. Meng, Ph.D.
UCSF Chimera(X) team
Department of Pharmaceutical Chemistry
University of California, San Francisco
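
As a rough illustration of the chain-qualified specifiers Elaine describes, a script could write them out itself. The following is a minimal sketch in plain Python, assuming the attribute assignment file layout documented on the Define Attribute page linked above (header lines followed by tab-indented data lines). The helper name `attribute_file_text`, the attribute name and the scores are all made up for illustration, and the residue numbering correspondence still has to be worked out separately.

```python
def attribute_file_text(attr_name, chain_scores, recipient="residues"):
    """Build the text of an attribute assignment file.

    chain_scores maps a chain ID to a list of (residue_number, value)
    pairs. All names and numbers here are illustrative.
    """
    lines = [
        "attribute: %s" % attr_name,
        "match mode: any",
        "recipient: %s" % recipient,
    ]
    for chain_id in sorted(chain_scores):
        for res_num, value in chain_scores[chain_id]:
            # Data lines start with a tab; the specifier and the value
            # are separated by another tab.
            lines.append("\t:%d.%s\t%s" % (res_num, chain_id, value))
    return "\n".join(lines) + "\n"


# Hypothetical scores for two chains that share one sequence.
scores = {"T": [(1, 0.12), (2, 0.87)], "U": [(1, 0.12), (2, 0.87)]}
text = attribute_file_text("chouFasmanImmuno", scores)
print(text)
```

Writing one block of lines per chain keeps the file readable, at the cost of repeating the same values for every chain of the same protein.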

Hi Elaine,

Thank you for clarifying. I had missed the option to use any atom-spec (I was following one of the examples, but there didn't appear to be one addressing chains). As you say, this still requires some a priori knowledge of 'what's what' in the structures though. An added complication is that the structures are grouped into a single model, and need to be split to be treated as different models (though I suppose in practice this makes little difference, as it's either a case of specifying a chain or a model).

This is why I was hoping that sequence-based auto-assignment could be possible. The sequence/alignment headers look like they have potential, but this leads me to two further questions:

- So far, it is complaining that the numeric values I'm using are less than one or greater than the alignment length. The data appears to be longer than the structure (but is shorter than the corresponding gene sequence), so I assume some residues are not present in the mature structure. However, it complains about Position 402 of the file, which should still be in the structure displayed as far as I can tell. According to the docs, numeric values that fall outside [0, 1] should be converted for the histogram but retained as attributes, so I don't think the numeric values are the issue. Any idea why Chimera is complaining? I've attached the header file I'm using. The model in question is PDB 6j0n (chains T, U, V, W, X, Y).
- Secondly, what would be the equivalent python/chimera command interface for loading header files, such that I can render by these attributes (the latter of which I'm already au fait with)?

Thanks again,
Joe

Hi Joe,

Those chains are two different sequences.

Pvc12 is chains P,Q,R,S,T,U with >900 positions in the sequence
Pvc4 is chains V,W,X,Y,Z,a with 410 positions in the sequence

So I can load the headers from your file for the former sequences (I tried chain T) but not the latter, because your file specifies up to position 436.

Python is beyond my skill set, though… somebody else would have to advise on that.

An aside: I note that it was much easier to tell which chains are the same sequence as each other in ChimeraX, which is also a lot faster on big structures like this. Unfortunately we don’t yet have it reading custom header files or making the headers into attributes, or else I would suggest using ChimeraX instead. Below is a screenshot from opening 6j0n in ChimeraX and clicking the “Pvc12” and “Pvc4” links in the Chain Description table that automatically appears when the file is opened.

Best,
Elaine
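
The mismatch Elaine describes (header positions running past the end of the sequence) is the kind of thing a script could check before handing a header file to Multalign Viewer. The sketch below is only illustrative: the helper name `out_of_range` is hypothetical, positions are taken to be 1-based as the error message quoted earlier implies, and the figures are the ones from this message (positions up to 436 against a 410-position sequence).

```python
def out_of_range(positions, seq_length):
    """Return the positions that fall outside 1..seq_length."""
    return [p for p in positions if p < 1 or p > seq_length]


# Figures from the message above: the header file runs to position 436,
# while the Pvc4 sequence has 410 positions.
header_positions = range(1, 437)
bad = out_of_range(header_positions, 410)
print("%d positions out of range, starting at %d" % (len(bad), bad[0]))
```

An empty result would suggest that the header file at least fits the sequence it is meant for.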
<ChouFasman.txt>_______________________________________________ Chimera-users mailing list: Chimera-users@cgl.ucsf.edu Manage subscription: http://plato.cgl.ucsf.edu/mailman/listinfo/chimera-users

On Nov 27, 2019, at 1:07 PM, Elaine Meng <meng@cgl.ucsf.edu> wrote:
Python is beyond my skill set, though… somebody else would have to advise on that.
The first thing I should say is that Multalign Viewer is only available as a graphical tool, so you will only be able to script this for a Chimera that is running its graphical interface, not in any headless “batch” mode.

Okay, with that out of the way, the first job is to get a Python instance of Multalign Viewer showing the sequence you want with the chains you want associated. Unlike ChimeraX, which associates on a per-chain basis, Chimera associates on a per-structure basis, so you will have to first split your structure apart by chains (with the “split” command). If you have the pertinent sequence in a file, you could do this:

    from MultAlignViewer.MAViewer import MAViewer
    mav = MAViewer("full-path-to-sequence-file")

If you don’t have a file, but you know you want to use the sequence of chain C, you could do this:

    from chimera import openModels, Molecule
    from MultAlignViewer.MAViewer import MAViewer
    for m in openModels.list(modelTypes=[Molecule]):
        try:
            seq = m.sequence('C')
        except KeyError:
            # this model has no chain C
            continue
        mav = MAViewer([seq])

You then add your custom header file with:

    mav.readHeaderFile("full-path-to-header-file")

After that, the residues will have the necessary attribute (prefixed with “mav”, so if the header attribute is ChouFasmanImmuno, the residue attribute is mavChouFasmanImmuno).

--Eric

Eric Pettersen
UCSF Computer Graphics Lab

Hi Eric,

I’ve been giving this some more thought. Is there perhaps a way to brute force this?

My thinking at the moment is that, given that I’ll know the sequence, positions, and scores I want to use, all I really need to do is somehow set the attributes of the relevant object, perhaps circumventing MAV altogether. My main question is then: how are attributes for the models currently stored? Does `mavAttributeName` become an actual attribute of the `molecule.residue[n]`?

Broadly I was thinking something like this might work, based on your earlier code suggestion:

* Split the open model to get individual models for the chains.
* Iterate over the list of models, and test whether their .sequence() matches the sequence I expect (perhaps via alignment if not perfect matches). I could read the attribute file in alongside this in base python, which perhaps means I don’t need to use the defined format either, and can therefore encode the seq/score/position all in one go.
* If it does, assign the relevant attribute name/score to the residue/model (depending on how exactly the attributes are stored at the moment). As simple as Molecule.Residue.mavAttributeName? Though of course there wouldn’t necessarily need to be a “mav” in there.
  * If not, move on to the next model.
* If I can ‘force’ the attributes on to the residues etc., I can then trigger the rendering without needing to use MAV?

Presumably there’s a way to ‘inject’ the namespace with an object equivalent to that which MAV/DefineAttribute would achieve, and I can do the association ‘manually’?

Many thanks,
Joe

As long as we are talking about exact (sub)sequence matches, I think it is reasonably simple to do in Python:

    # pairs of sequences and corresponding values you want to assign…
    seq_vals = [
        ("ADTY…whatever…PLHE", (1.97, 2.14, … same len as sequence…, 0.88, 1.22)),
        ("…next sequence…", (…next set of values…))
    ]

    from chimera import openModels, Molecule
    for m in openModels.list(modelTypes=[Molecule]):
        for seq in m.sequences():
            for target_seq, vals in seq_vals:
                try:
                    i = str(seq).index(target_seq)
                except ValueError:
                    continue
                for offset, val in enumerate(vals):
                    r = seq.residues[i+offset]
                    if r:
                        r.chouFasmanImmuno = val

A few notes:

1) If there is missing structure, there may not be a residue that corresponds to a sequence position, which is why there is an “if r:” test in the script. The full sequence is known from the SEQRES records in the PDB file.

2) Avoid assigning an attribute name starting with an upper-case letter. Symbolic constants start with upper-case letters and the render-by-attr code will screen out such attributes from its interface. There is a way around this by registering the attribute with the SimpleSession module (which will also get the attribute to save in sessions). I can provide details if needed.

3) You may want to read the sequences and values from a file rather than have them directly in the script. You might also want to set multiple attributes, in which case you would include the attribute name along with the sequence and values, change the corresponding ‘for’ loop, and change ‘r.attrName = val’ to setattr(r, attrName, val).

I don’t know how good your Python juju is, but I can provide guidance as needed.

--Eric
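
Note 3 above (sequences and values read from a file, several attribute names set with setattr) could be sketched roughly as follows. This is not Eric's code: the CSV layout, the helper names `load_targets` and `assign`, and the `FakeResidue` stand-in are assumptions made so that the logic can be run outside Chimera. Inside Chimera, the sequence string and the residue list would come from the str(seq) and seq.residues values used in the script above.

```python
import csv

# Hypothetical input layout: attribute name, target sequence, then one
# score per residue of that sequence, separated by semicolons.
CSV_TEXT = """attr_name,sequence,values
chouFasmanImmuno,REY,0.5;1.5;2.5
otherScore,MER,3;4;5
"""


def load_targets(lines):
    """Return a list of (attr_name, sequence, values) tuples."""
    targets = []
    for row in csv.DictReader(lines):
        values = [float(v) for v in row["values"].split(";")]
        if len(values) != len(row["sequence"]):
            raise ValueError("value count differs from sequence length")
        targets.append((row["attr_name"], row["sequence"], values))
    return targets


def assign(seq_string, residues, targets):
    """Set each attribute on the residues covered by an exact match.

    residues is indexed by sequence position; an entry may be None
    where the structure is missing.
    """
    for attr_name, target_seq, values in targets:
        start = seq_string.find(target_seq)
        if start < 0:
            continue  # this target does not occur in the sequence
        for offset, val in enumerate(values):
            r = residues[start + offset]
            if r is not None:
                setattr(r, attr_name, val)


class FakeResidue(object):
    """Stand-in for a Chimera residue so the sketch runs anywhere."""


# Six sequence positions, with the third one missing from the structure.
residues = [FakeResidue(), FakeResidue(), None,
            FakeResidue(), FakeResidue(), FakeResidue()]
assign("MEREYV", residues, load_targets(CSV_TEXT.splitlines()))
```

Keeping the attribute name in the data rather than in the code means that adding another scoring algorithm only requires another row in the file.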

Hi Eric,

That looks pretty similar to what I had come up with, yes (here I’m assuming an exact sequence-to-attribute correspondence over the full length of the chain). This is approximately the approach I was using:

    from chimera import openModels, Molecule
    from chimera import runCommand as rc

    # assume a list of values <= len(seq) read in from a file
    rc("split")
    for m in openModels.list(modelTypes=[Molecule]):
        # insert logic for pairing sequences/input files to chains by sequence matching
        # some heuristic like Levenshtein distance will probably suffice for now, or I can
        # replace this with pairwise alignment from BioPython or something.
        for r, val in zip(m.residues, list_of_vals):
            setattr(r, 'chouFasmanImmuno', val)
    rc("rangecol chouFasmanImmuno ….")

On your notes:

1. The various trys/ifs are a good idea so I’ll incorporate those, thanks!
2. Thanks for the tip on the capital letters. This was going to be my next question, as I couldn’t get the RBA dialogue to show the new attributes, so this probably explains it. I’m not fussy about the case of the attribute names, so I’ll just stick to camel/lowercase.
   * On a related note, is there a need to programmatically ‘refresh’ rangecol/RBA after defining the attribute, as there is with the dialogue box, or should it be happy once the attr is set?
3. This was exactly my plan! 😉 The residue attribute numbers for the various immunogenicity algorithms are coming from a separate suite of tools, so a pandas dataframe will probably be the input (perhaps as a csv). Initially my thinking was to use the Define Attribute format and encode the sequence in a #comment, but this approach frees me entirely from needing to follow that format, which I think is better.

My python is pretty good these days, so it’s only when I run up against the inner workings of chimera that I need to consult you guys really! This definitely feels like a simpler approach than messing with DefineAttribute or MAV, so I think it’s the way to go.

Thanks again for the help.
Joe
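
The placeholder comment in Joe's script (pairing chains with score sets by approximate sequence matching) could be filled in along these lines. Note the substitution: this sketch uses the standard library's difflib.SequenceMatcher similarity ratio rather than the Levenshtein distance or Biopython alignment Joe mentions, purely so that it has no dependencies. The function name `best_match`, the 0.9 cut-off and the sequences are invented for illustration.

```python
from difflib import SequenceMatcher


def best_match(chain_seq, candidates, min_ratio=0.9):
    """Return the name of the candidate most similar to chain_seq.

    candidates maps a name to a reference sequence. Returns None when
    no candidate reaches min_ratio (an arbitrary cut-off).
    """
    best_name, best_ratio = None, min_ratio
    for name, ref_seq in candidates.items():
        # autojunk=False switches off a heuristic that would otherwise
        # ignore very common letters in sequences of 200+ characters.
        matcher = SequenceMatcher(None, chain_seq, ref_seq, autojunk=False)
        ratio = matcher.ratio()
        if ratio >= best_ratio:
            best_name, best_ratio = name, ratio
    return best_name


# Made-up reference sequences, one per set of scores.
references = {"proteinA": "MEREYVKLLAGT", "proteinB": "GSHMTTPQWLNK"}
print(best_match("MEREYVKLIAGT", references))  # one substitution vs proteinA
print(best_match("AAAAAAAAAAAA", references))  # resembles neither reference
```

A similarity ratio only says that two sequences resemble each other; it does not say which positions correspond, so an inexact match would still need an alignment before scores are assigned residue by residue.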

On Dec 3, 2019, at 1:15 AM, Healey, Joseph <J.Healey.1@warwick.ac.uk> wrote:
On a related note, is there a need to programmatically ‘refresh’ rangecol/RBA after defining the attribute, as there is with the dialogue box, or should it be happy once the attr is set?
rangecolor will work without any nudging. The Render By Attribute dialog will remain blissfully unaware of the new attributes until you use its refresh menu item. You can use that dialog’s refreshAttrs() method to programmatically nudge it. Look at MultAlignViewer/MAViewer.py in your Chimera distribution for an example (search for refreshAttrs).

--Eric

Eric Pettersen
UCSF Computer Graphics Lab

Hi Chimera team,

I’ve made some good headway with the subject of this previous thread (a relatively simple Bio.pairwise2 alignment approach seems to serve to associate chains reasonably well). One further thing I’d like to clarify is how missing data is stored for a residue attribute in the case of the renderbyattribute functionality.

E.g. if I have the structure sequence, and the scores (just using integers as a simple example for now):

    model.residues = MEREYV
    Scores = 192984

I can assign those scores reasonably easily, as discussed, with something to the effect of:

    for res, score in zip(model.residues, scores):
        setattr(res, "attributeName", score)

If I have scores with missing data, however, what value for missing is acceptable/expected by Chimera ("", None etc.)? E.g.:

    Scores = 192-84

In which case I envisage doing something like:

    for res, score in zip(model.residues, scores):
        if score == "-":
            setattr(res, "attributeName", "")  # Or whatever the allowable character/value is?
        else:
            setattr(res, "attributeName", score)

Many thanks,
Joe

From: Eric Pettersen <pett@cgl.ucsf.edu>
Reply to: "chimera-users@cgl.ucsf.edu BB" <chimera-users@cgl.ucsf.edu>
Date: Wednesday, 4 December 2019 at 01:00
To: "Healey, Joseph" <J.Healey.1@warwick.ac.uk>
Cc: "chimera-users@cgl.ucsf.edu BB" <chimera-users@cgl.ucsf.edu>
Subject: Re: [Chimera-users] Auto-associate define attribute/render by attribute

On Dec 3, 2019, at 1:15 AM, Healey, Joseph <J.Healey.1@warwick.ac.uk> wrote:
On a related note, is there a need to programmatically ‘refresh’ rangecol/RBA after defining the attribute, as there is with the dialogue box, or should it be happy once the attr is set?

rangecolor will work without any nudging. The Render By Attribute dialog will remain blissfully unaware of the new attributes until you use its refresh menu item. You can use that dialog’s refreshAttrs() method to programmatically nudge it. Look at MultAlignViewer/MAViewer.py in your Chimera distribution for an example (search for refreshAttrs).

--Eric

Eric Pettersen
UCSF Computer Graphics Lab

Hi Joe,

Maybe I’m misunderstanding the issue, but can’t you just skip assigning a value to that residue if it does not have a score? That would be the same as omitting the residue from an attribute assignment file.

It depends how you want to use the attribute: do you want those residues to have the attribute but assigned as none, or do you want them simply not to have that attribute?

An analogous situation is the amino acid hydrophobicity attribute. The following command-line specifier designates residues (such as water or nucleic acids) that lack a Kyte-Doolittle hydrophobicity assignment:

:/^kdHydrophobicity

… and the Render by Attribute tool and rangecolor command both have options as to what you want to do with such “no-value” atoms or residues.

To summarize, you may not need to assign some kind of null value, but simply skip assigning anything.

Best,
Elaine
On Jan 23, 2020, at 2:51 AM, Healey, Joseph <J.Healey.1@warwick.ac.uk> wrote:
Hi Chimera team,
I’ve made some good headway with the subject of this previous thread (a relatively simple Bio.pairwise2 alignment approach seems to serve to associate chains reasonably well). One further thing I’d like to clarify is how missing data is stored for a residue attribute in the case of the renderbyattribute functionality.
E.g. if I have the structure sequence, and the scores (just using integers as a simple example for now):
    model.residues = MEREYV
    Scores = 192984
I can assign those scores reasonably easily, as discussed, with something to the effect of:
    for res, score in zip(model.residues, scores):
        setattr(res, "attributeName", score)
If I have scores with missing data, however, what value for missing is acceptable/expected by Chimera ("", None etc.)?
E.g.:
    Scores = 192-84
In which case I envisage doing something like:
    for res, score in zip(model.residues, scores):
        if score == "-":
            setattr(res, "attributeName", "")  # Or whatever the allowable character/value is?
        else:
            setattr(res, "attributeName", score)
Many thanks,
Joe

Like Elaine says, skipping the assignment entirely is perfectly acceptable. If it’s easier to assign something than skip it, assign None. —Eric Eric Pettersen UCSF Computer Graphics Lab
On Jan 23, 2020, at 10:03 AM, Elaine Meng <meng@cgl.ucsf.edu> wrote:
Hi Joe, Maybe I’m misunderstanding the issue, but can’t you just skip assigning a value to that residue if it does not have a score? That would be the same as omitting the residue from an attribute assignment file.
It depends how you want to use the attribute: do you want those residues to have the attribute but assigned as none, or do you want them simply not to have that attribute?
An analogous situation is the amino acid hydrophobicity attribute. The following command-line specifier designates residues (such as water or nucleic acids) that lack a Kyte-Doolittle hydrophobicity assignment:
:/^kdHydrophobicity
… and the Render by Attribute tool and rangecolor command both have options as to what you want to do with such “no-value” atoms or residues.
To summarize, you may not need to assign some kind of null value, but simply skip assigning anything. Best, Elaine ----- Elaine C. Meng, Ph.D. UCSF Chimera(X) team Department of Pharmaceutical Chemistry University of California, San Francisco
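The two options discussed above (skip the assignment, or assign None) can be sketched in plain Python. This is only an illustration under stated assumptions: FakeResidue is a hypothetical stand-in for a Chimera residue so the snippet runs outside Chimera, assign_scores is a made-up helper name, and "-" as the missing-data marker is taken from Joe's example.

```python
class FakeResidue(object):
    """Hypothetical stand-in for a Chimera residue (illustration only)."""
    def __init__(self, letter):
        self.letter = letter


def assign_scores(residues, scores, attr_name, missing="-", use_none=False):
    """Attach one score per residue; entries equal to `missing` get no value.

    With use_none=False the attribute is simply not set on that residue;
    with use_none=True it is set to None instead.
    Returns the number of residues that received a real score.
    """
    assigned = 0
    for res, score in zip(residues, scores):
        if score == missing:
            if use_none:
                setattr(res, attr_name, None)
            continue
        setattr(res, attr_name, int(score))
        assigned += 1
    return assigned


residues = [FakeResidue(c) for c in "MEREYV"]
n_assigned = assign_scores(residues, "192-84", "attributeName")

# Residues with no usable value, whichever of the two options was used
no_value = [r.letter for r in residues
            if getattr(r, "attributeName", None) is None]
```

Reading the attribute back with getattr(..., None) treats "never assigned" and "assigned None" the same way, so downstream code does not need to know which option was chosen.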

Hi Eric/Elaine,

That’s perfect thanks - just what I needed to know!

Joe

From: Eric Pettersen <pett@cgl.ucsf.edu>
Reply to: "chimera-users@cgl.ucsf.edu BB" <chimera-users@cgl.ucsf.edu>
Date: Thursday, 23 January 2020 at 18:46
To: "chimera-users@cgl.ucsf.edu BB" <chimera-users@cgl.ucsf.edu>
Cc: "Healey, Joseph" <J.Healey.1@warwick.ac.uk>
Subject: Re: [Chimera-users] Auto-associate define attribute/render by attribute

Like Elaine says, skipping the assignment entirely is perfectly acceptable. If it’s easier to assign something than skip it, assign None.

--Eric

Eric Pettersen
UCSF Computer Graphics Lab

Hi Elaine/Eric,

Apologies, that was an error on my part, I forgot that there are case-differences between chains with the same letter ID. They should all match PVC3: the t,u,v,w,x,y chains.

Between yours and Eric’s answer though, it sounds like I can’t quite achieve what I’m planning without doing it through a GUI. Perhaps I need to have a re-think about how I go about this altogether. I don’t suppose there’s any other way besides MAV to auto-associate sequences with structures? In an ideal world, I’d be able to provide an attribute/header type file that looked something like:

    # header info
    attribute: AttributeName
    etc…
    :Met 1.23
    :Ala 2.34
    :Ser 3.45
    :… …

And so on, and have it be able to identify the chains from the sequence of residues in the file.

I would like to use ChimeraX for this in future, but as you say, the scripting/feature set isn’t quite there yet (manipulating large structures speedily is particularly useful to me).

Many thanks for your suggestions so far,
Joe

From: Elaine Meng <meng@cgl.ucsf.edu>
Reply to: "chimera-users@cgl.ucsf.edu BB" <chimera-users@cgl.ucsf.edu>
Date: Wednesday, 27 November 2019 at 21:07
To: "Healey, Joseph" <J.Healey.1@warwick.ac.uk>
Cc: "chimera-users@cgl.ucsf.edu BB" <chimera-users@cgl.ucsf.edu>
Subject: Re: [Chimera-users] Auto-associate define attribute/render by attribute

Hi Joe,

Those chains are two different sequences.

Pvc12 is chains P,Q,R,S,T,U with >900 positions in the sequence
Pvc4 is chains V,W,X,Y,Z,a with 410 positions in the sequence

So I can load the headers from your file for the former sequences (I tried chain T) but not the latter, because your file specifies up to position 436.

Python is beyond my skill set, though… somebody else would have to advise on that.

An aside: I note that it was much easier to tell which chains are the same sequence as each other in ChimeraX, which is also a lot faster on big structures like this. Unfortunately we don’t yet have it reading custom header files or making the headers into attributes, or else I would suggest using ChimeraX instead. Below is a screenshot from opening 6j0n in ChimeraX and clicking the “Pvc12” and “Pvc4” links in the Chain Description table that automatically appears when the file is opened.

Best,
Elaine

On Nov 27, 2019, at 1:43 AM, Healey, Joseph <J.Healey.1@warwick.ac.uk> wrote:

Hi Elaine,

Thank you for clarifying, I had missed the option for any atom-spec (I was following one of the examples but there didn't appear to be one addressing chains). As you say, this still requires some a priori knowledge of 'what's what' in the structures though. An added complication is that the structures are grouped into a single model, and need to be split to be treated as different models (though I suppose in practice this makes little difference as it's either a case of specifying a chain or a model). Consequently, this is why I was hoping that sequence-based auto-assignment could be possible.

The sequence/alignment headers look like they have potential, but this leads me to another 2 questions:

- So far, it is complaining that the numeric values I'm using are less than one or greater than the alignment length. The data appears to possibly be longer than the structure (but is shorter than the corresponding gene sequence) so I assume some residues are not present in the mature structure. However, it complains about Position 402 of the file, which should still be in the structure displayed as far as I can tell. According to the docs, numeric values that fall outside [0, 1] should be converted for the histogram, but retained as attributes, so I don't think the numeric values are an issue. Any idea why Chimera is complaining? I've attached the header file I'm using. The model in question is PDB 6j0n (chains T, U, V, W, X, Y).

- Secondly, what would be the equivalent python/chimera command interface for loading header files such that I can render them by these attributes (the latter of which I'm already au fait with).

Thanks again,
Joe

On Nov 26, 2019, at 7:40 AM, Healey, Joseph <J.Healey.1@warwick.ac.uk> wrote:

Hi Chimera team,

I’m working on a script to automatically render proteins in a complex by attributes (in this case, a number of scores obtained from different immunogenicity prediction algorithms). I currently have the script outputting attribute files as intended, but ideally I’d like to define the same attributes across all the chains of the model, which are made up of a number of different proteins. Is there any way for me to define the attribute file, or utilise some functionality to auto-associate an attribute file with all of the chains which have the correct sequence (similarly to how the sequence alignment input works at the moment?). Currently the attribute file is only using ‘numbered’ coordinates (:1, :2, :3…etc), rather than the actual sequence. Since there’s no chain information in the attribute files, I’m not sure how I can go about connecting it to the right structure, other than manually at the moment?

Many thanks,
Joe

<ChouFasman.txt>
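One possible way to approximate the sequence-based association Joe asks about, without MAV, is to work out the chain and numbering correspondence in the script and write chain-qualified specifiers (the ":3.A" form Elaine mentions) into the attribute file. The sketch below is an assumption-laden illustration, not an established Chimera workflow: attribute_lines, the attribute name immunoScore, and the chain_residues layout (chain ID mapped to a list of residue number and one-letter code pairs) are all hypothetical, and the exact-substring match would have to be replaced by a real alignment (for example the Bio.pairwise2 approach mentioned in the thread) whenever a chain has gaps or mutations.

```python
def attribute_lines(chain_residues, scored_seq, scores, attr_name):
    """Build attribute-file lines for every chain matching the scored sequence.

    chain_residues: dict of chain ID -> list of (residue number, one-letter code)
    scored_seq:     sequence the scores were computed on
    scores:         one value per position of scored_seq
    A chain matches only if its sequence occurs verbatim in scored_seq
    (a simplifying assumption; gapped chains need a proper alignment).
    """
    lines = [
        "attribute: %s" % attr_name,
        "match mode: 1-to-1",
        "recipient: residues",
    ]
    for chain_id in sorted(chain_residues):
        residues = chain_residues[chain_id]
        chain_seq = "".join(letter for _, letter in residues)
        offset = scored_seq.find(chain_seq)
        if not chain_seq or offset < 0:
            continue  # a different protein; leave this chain without the attribute
        for i, (resnum, _) in enumerate(residues):
            # data lines are tab-indented: <tab>specifier<tab>value
            lines.append("\t:%d.%s\t%s" % (resnum, chain_id, scores[offset + i]))
    return lines


# Toy data: chains T and U share a sequence, chain t is a different protein
chain_residues = {
    "T": [(5, "M"), (6, "E"), (7, "R")],
    "U": [(5, "M"), (6, "E"), (7, "R")],
    "t": [(1, "G"), (2, "G")],
}
scored_seq = "AAMEREYV"
scores = [0.5, 0.6, 1.23, 2.34, 3.45, 4.0, 5.0, 6.0]
lines = attribute_lines(chain_residues, scored_seq, scores, "immunoScore")
```

Chain IDs are kept case-sensitive throughout, since the structure in question has both upper- and lower-case chains; joining the returned lines with newlines gives text that could be written to a file and read with Define Attribute.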
participants (3)
- Elaine Meng
- Eric Pettersen
- Healey, Joseph