Renaming of "Non-standard residues" upon file opening

Hello

I have a question regarding ChimeraX renaming residues upon loading pdb files. I am specifically referring to a pdb file containing the following lines:

TITLE PLACEHOLDER_TITLE
REMARK PLACEHOLDER_REMARK
CRYST1 100.000 100.000 100.000 90.00 90.00 90.00 P 1 1
MODEL 1
ATOM  1 NC3 DOPCA 1 93.725 50.146 69.265 1.00 0.00
ATOM  2 PO4 DOPCA 1 93.471 50.284 66.361 1.00 0.00
ATOM  3 GL1 DOPCA 1 93.591 50.038 63.469 1.00 0.00
ATOM  4 GL2 DOPCA 1 94.818 50.396 63.498 1.00 0.00
ATOM  5 C1A DOPCA 1 93.669 50.374 60.367 1.00 0.00
ATOM  6 D2A DOPCA 1 93.738 49.939 57.365 1.00 0.00
ATOM  7 C3A DOPCA 1 93.390 49.936 54.359 1.00 0.00
ATOM  8 C4A DOPCA 1 93.358 50.194 51.480 1.00 0.00
ATOM  9 C1B DOPCA 1 96.261 50.393 60.395 1.00 0.00
ATOM 10 D2B DOPCA 1 96.060 50.114 57.261 1.00 0.00
ATOM 11 C3B DOPCA 1 96.224 50.032 54.669 1.00 0.00
ATOM 12 C4B DOPCA 1 95.844 50.135 51.528 1.00 0.00
TER
END

When I load this file into ChimeraX using "open file.pdb", it renames the residue name "DOPC" to "DOP". A table appears in the Log regarding "Non-standard residues in file.pdb", in which it lists "DOP - (DOP)" without mentioning "DOPC". However, when I open another file that contains many different types of molecules, it does not rename "DOPC" to "DOP". The "Non-standard residues in file.pdb" table still appears, but this time with an entry containing "DOPC - (DOPC)".

I have also previously experienced that ChimeraX renames some (not all) of my cholesterol molecules to "CHO" (instead of using the actual residue name, which is "CHOL").

I had a look through the documentation for the "open" command, but I couldn't find anything useful about it.

I was wondering why this is happening and whether there is a way to prevent it. And just so I am clear, by prevent I don't mean revert (e.g. rename them back to "DOPC" after loading), but actually preventing the initial renaming done by ChimeraX.

Kind regards,
Mikkel Dahl Andreasen

Hi Mikkel

In the actual PDB standard, the “C” column in “DOPCA” (column 21) is supposed to be blank, separating the residue name, which is limited to 3 characters, from the chain ID, which is limited to 1 character. These limits are obviously constraining, so many non-standard PDB files conscript that blank column for either a 4-character residue name or a 2-character chain ID, and ChimeraX has to guess which one it is. The criterion ChimeraX has been using is that if that character is the same for all residues of a chain, then it’s part of a 2-character chain ID; otherwise it’s part of a 4-character residue name. That is why, for your file, the residue name winds up being “DOP” with a chain ID of “CA”.

I have now tweaked that criterion so that if there is exactly one residue in a chain, it assumes a 4-character residue name instead of a 2-character chain ID, which gets things working for the file you provided. That change will be in the daily build (not the 1.8 release candidate) starting tomorrow.

--Eric

Eric Pettersen
UCSF Computer Graphics Lab
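The rule Eric describes can be sketched in a few lines of Python. This is only an illustration of the heuristic as stated in his message, not ChimeraX's actual reader code. The function names (`atom_line`, `interpret`), the `single_residue_is_resname` flag, and the choice to group atoms into "chains" by the column-22 character are all assumptions made for the example.

```python
from collections import defaultdict

def atom_line(serial, atom, res4, chain, resseq):
    """Build a fixed-column ATOM record whose residue field spans columns 18-21."""
    return (f"ATOM  {serial:>5} {atom:<4} {res4:<4}{chain}{resseq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}")

def interpret(lines, single_residue_is_resname=True):
    """Return (residue_name, chain_id) for each ATOM/HETATM line.

    Assumption: a "chain" is every atom sharing the same column-22 character.
    With single_residue_is_resname=False this mimics the older rule; with
    True it mimics the tweak for chains holding exactly one residue.
    """
    atoms = [l for l in lines if l.startswith(("ATOM", "HETATM"))]
    col21_seen = defaultdict(set)     # column-22 char -> column-21 chars seen
    residues_seen = defaultdict(set)  # column-22 char -> residue numbers seen
    for l in atoms:
        col21_seen[l[21]].add(l[20])
        residues_seen[l[21]].add(l[22:26])

    result = []
    for l in atoms:
        c21, c22 = l[20], l[21]
        if c21 == " ":
            # Standard file: column 21 is blank, nothing to guess.
            result.append((l[17:20].strip(), c22))
            continue
        same_everywhere = len(col21_seen[c22]) == 1
        lone_residue = len(residues_seen[c22]) == 1
        if same_everywhere and not (lone_residue and single_residue_is_resname):
            # Column 21 is read as the first character of a 2-character chain ID.
            result.append((l[17:20].strip(), c21 + c22))
        else:
            # Column 21 is read as the fourth character of the residue name.
            result.append((l[17:21].strip(), c22))
    return result

# A single DOPC residue, as in the file from the original question.
single = [atom_line(1, "NC3", "DOPC", "A", 1),
          atom_line(2, "PO4", "DOPC", "A", 1)]
# DOPC and CHOL together: column 21 differs between residues ("C" vs "L").
mixed = [atom_line(1, "NC3", "DOPC", "A", 1),
         atom_line(2, "ROH", "CHOL", "A", 2)]

print(interpret(single, single_residue_is_resname=False))
print(interpret(single))
print(interpret(mixed))
```

Under these assumptions, the single-residue file is read as "DOP" in chain "CA" by the older rule and as "DOPC" in chain "A" by the tweaked one, while the mixed file keeps its 4-character names under both, which matches the behaviour reported in the question.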
(In reply to the message of Jun 6, 2024, 6:14 AM from Mikkel Dahl Andreasen via ChimeraX-users <chimerax-users@cgl.ucsf.edu>.)
_______________________________________________
ChimeraX-users mailing list -- chimerax-users@cgl.ucsf.edu
To unsubscribe send an email to chimerax-users-leave@cgl.ucsf.edu
Archives: https://mail.cgl.ucsf.edu/mailman/archives/list/chimerax-users@cgl.ucsf.edu/

Thanks for the help

Would a better, and more consistent, solution potentially be to include a new option for the "open" command? The option could be named something like "pdbCol21", with the accepted values being:

"automatic" (default, and the way it works right now)
"resname" (always treats column 21 as being a part of the residue name)
"chain" (always treats column 21 as being a part of the chain name)
"ignore" (just ignores whatever is in column 21)

"resname" and "chain" would of course ignore column 21 if it is blank for a given line.

I don't know how much work it would be to implement something like this, but I think it would be helpful, as users could then be certain about how their pdb files are read by the program.

Kind regards,
Mikkel Dahl Andreasen
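To make the proposal concrete, here is a small Python sketch of how the three fixed modes could treat a single ATOM line. It is purely hypothetical: "pdbCol21" is only a suggestion in this thread, not an existing ChimeraX option, and the function name `read_col21` is invented for the example. The "automatic" mode is left out because it needs to look at the whole file rather than one line.

```python
def read_col21(line, mode):
    """Return (residue_name, chain_id) for one ATOM/HETATM line.

    mode is one of the hypothetical values "resname", "chain", or "ignore".
    PDB columns are 1-based; Python slices are 0-based.
    """
    res3 = line[17:20]   # columns 18-20: standard residue name
    c21 = line[20]       # column 21: blank in the PDB standard
    c22 = line[21]       # column 22: standard chain ID
    if mode not in ("resname", "chain", "ignore"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "ignore" or c21 == " ":
        # Either the user asked to ignore column 21 or it is blank anyway.
        return res3.strip(), c22.strip()
    if mode == "resname":
        return (res3 + c21).strip(), c22.strip()
    # mode == "chain"
    return res3.strip(), (c21 + c22).strip()

# First atom of the example file, rebuilt in fixed columns up to column 26.
line = "ATOM  " + f"{1:>5}" + " " + f"{'NC3':<4}" + " " + "DOPC" + "A" + f"{1:>4}"

for mode in ("resname", "chain", "ignore"):
    print(mode, read_col21(line, mode))
```

With an explicit mode the result no longer depends on what else is in the file, which is the consistency the proposal is after.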

If there prove to be PDB files where you can't tell the column meaning easily by context, then what you propose would be a good solution.

--Eric
(In reply to the message of Jun 6, 2024, 11:49 PM from mdahlandreasen--- via ChimeraX-users <chimerax-users@cgl.ucsf.edu>.)