PDB file problems and ChimeraX
I will be using ChimeraX much more extensively this fall, and I think the answer to the question below might be useful (and not just for me).
Does there exist some list (or lists) of common PDB file errors? One probably can't cover all possibilities, but perhaps there are some common ones that one could find automatically by writing a Python program to examine the file. (Perhaps some programs already exist.) Recently there was a case of misplaced spaces and also of duplicated lines. Presumably one could also add a search for the correct number of atoms in each residue. What about the existence/presence of water or heavy ions? Perhaps searching for unexpected atom-atom-atom angles or bond lengths (or even separations)?
Perhaps an AI agent might be trained?
Thanks
Steve Brawer
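Several of the checks Steve suggests (duplicated lines, atom counts per residue, water and ions) are straightforward to script. Below is a minimal Python sketch, not an existing tool, that reads the fixed PDB columns directly. The function names, the expected heavy-atom counts, and the water/ion residue-name lists are illustrative assumptions.

```python
from collections import Counter

# Assumed reference: heavy-atom counts (no hydrogens, no terminal OXT) for the
# 20 standard amino acids; residues not listed here are not checked.
EXPECTED_HEAVY = {
    "GLY": 4, "ALA": 5, "SER": 6, "CYS": 6, "VAL": 7, "THR": 7, "PRO": 7,
    "LEU": 8, "ILE": 8, "ASN": 8, "ASP": 8, "MET": 8, "GLN": 9, "GLU": 9,
    "LYS": 9, "HIS": 10, "PHE": 11, "ARG": 11, "TYR": 12, "TRP": 14,
}
# Residue names treated as water or ions: an illustrative, incomplete list.
WATER = {"HOH", "WAT", "DOD"}
IONS = {"NA", "K", "CL", "MG", "CA", "MN", "FE", "CO", "NI", "CU", "ZN", "CD", "HG"}


def atom_records(lines):
    """Return ATOM/HETATM lines with trailing whitespace removed."""
    return [ln.rstrip() for ln in lines if ln.startswith(("ATOM", "HETATM"))]


def duplicated_lines(lines):
    """Map each ATOM/HETATM line that occurs more than once to its count."""
    counts = Counter(atom_records(lines))
    return {ln: n for ln, n in counts.items() if n > 1}


def residue_report(lines):
    """Return (residues with unexpected heavy-atom counts, water residues, ion residues).

    Residues are keyed by (chain ID, residue number + insertion code, residue name),
    read from fixed PDB columns 22, 23-27 and 18-20.
    """
    heavy = Counter()
    waters, ions = set(), set()
    for ln in atom_records(lines):
        rec = ln.ljust(80)  # pad short lines so column slicing never fails
        key = (rec[21], rec[22:27].strip(), rec[17:20].strip())
        name, altloc = rec[12:16].strip(), rec[16]
        element = rec[76:78].strip() or name[:1]  # guess from the name if column is blank
        if key[2] in WATER:
            waters.add(key)
        elif key[2] in IONS:
            ions.add(key)
        # Count heavy atoms, first alternate location only, skipping terminal OXT.
        if element not in ("H", "D") and altloc in (" ", "A") and name != "OXT":
            heavy[key] += 1
    odd = {key: (n, EXPECTED_HEAVY[key[2]]) for key, n in heavy.items()
           if key[2] in EXPECTED_HEAVY and n != EXPECTED_HEAVY[key[2]]}
    return odd, waters, ions
```

A residue reported as `(3, 4)` has three heavy atoms where four were expected. Missing side-chain atoms are often legitimate (disordered regions), so such a report is a prompt to inspect the residue rather than proof of a file error.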
Hi Steve,
Our Introduction to PDB Format <https://www.rbvi.ucsf.edu/chimerax/docs/user/formats/pdbintro.html> page covers many of the common errors, though it focuses mainly on errors from hand editing of files. For PDB files generated by programs, I find that the most common errors are:
- misaligned atom names (already covered in that page);
- "stale" CONECT records, passed through from the input file to the output file, that no longer correspond to the atom records and therefore produce many incorrect bonds;
- omission of the alternate location indicator in column 17 of the ATOM/HETATM records, resulting in duplicate atoms in the residue.
--Eric
Eric Pettersen
UCSF Computer Graphics Lab
(In reply to Steven Brawer, May 15, 2026, 7:17 AM.)
_______________________________________________
ChimeraX-users mailing list -- chimerax-users@cgl.ucsf.edu
To unsubscribe send an email to chimerax-users-leave@cgl.ucsf.edu
Archives: https://mail.cgl.ucsf.edu/mailman/archives/list/chimerax-users@cgl.ucsf.edu/
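The three program-generated errors Eric lists can be screened for with a short script. The sketch below uses hypothetical function names and an assumed 3.0 Å cutoff for a plausible CONECT bond. It flags atom names whose column placement disagrees with the element in columns 77-78, CONECT entries that point to nonexistent atoms or join atoms implausibly far apart, and repeated (residue, atom name, alternate location) combinations of the kind produced when column 17 is left blank. The name check only works when the element columns are filled in.

```python
import math
from collections import Counter

MAX_BOND = 3.0  # Å; assumed cutoff above which a CONECT bond is treated as suspicious


def parse_atoms(lines):
    """Map serial -> (name field, altloc, residue key, element, xyz) from fixed columns.

    Records whose serial or coordinates do not parse are skipped.
    """
    atoms = {}
    for ln in lines:
        if not ln.startswith(("ATOM", "HETATM")):
            continue
        rec = ln.rstrip("\n").ljust(80)
        try:
            serial = int(rec[6:11])
            xyz = tuple(float(rec[i:i + 8]) for i in (30, 38, 46))
        except ValueError:
            continue
        residue = (rec[21], rec[22:27].strip(), rec[17:20].strip())
        atoms[serial] = (rec[12:16], rec[16], residue, rec[76:78].strip(), xyz)
    return atoms


def misaligned_names(atoms):
    """Serials whose atom-name placement (columns 13-16) disagrees with the element (77-78)."""
    bad = []
    for serial, (field, _alt, _res, element, _xyz) in atoms.items():
        if len(element) == 1 and field[0] != " " and len(field.strip()) < 4:
            bad.append(serial)  # short name of a one-letter element should start in column 14
        elif len(element) == 2 and field[0] == " ":
            bad.append(serial)  # two-letter element (e.g. FE, ZN) should start in column 13
    return bad


def stale_conect(lines, atoms):
    """(atom, partner, reason) for CONECT bonds to missing atoms or longer than MAX_BOND."""
    problems = []
    for ln in lines:
        if not ln.startswith("CONECT"):
            continue
        rec = ln.rstrip("\n").ljust(31)
        fields = [rec[i:i + 5].strip() for i in range(6, 31, 5)]  # columns 7-11, 12-16, ...
        if not fields[0].isdigit():
            continue
        origin = int(fields[0])
        for field in fields[1:]:
            if not field.isdigit():
                continue
            partner = int(field)
            if origin not in atoms or partner not in atoms:
                problems.append((origin, partner, "nonexistent atom"))
            elif math.dist(atoms[origin][4], atoms[partner][4]) > MAX_BOND:
                problems.append((origin, partner, "too long"))
    return problems


def duplicate_atoms(atoms):
    """(residue, atom name, altloc) combinations that occur more than once."""
    counts = Counter((res, field.strip(), alt) for field, alt, res, _el, _xyz in atoms.values())
    return [key for key, n in counts.items() if n > 1]
```

Because the sketch keys atoms by serial number, duplicate atoms are caught only when their serials differ; duplicated serials need a separate check.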
Hi Steve,
When you open a PDB file in ChimeraX, many syntactic errors in the file are reported in the Log. For instance, opening the PDB file with data in the wrong columns, from yesterday's ChimeraX mailing list discussion, reported the problems shown below in the Log.
You also ask about data quality errors where the file format is correct, like unphysical chemistry (bad bond lengths, angles, clashes). ChimeraX has some tools, like Clashes, to help identify those problems. If your atomic structure comes from the PDB, there is a validation report for it, since every entry has one. For example, for PDB 1xyz here is the PDF validation report, which covers many data quality problems:
https://files.rcsb.org/validation/view/1xyz_full_validation.pdf
A major software program for identifying those errors is called MolProbity. We plan to include MolProbity validations in ChimeraX through a user interface that runs the Phenix structure determination software, but we have not started that project yet.
Tom
Ignored bad PDB record found on line 1384 HETATM 1326 N AKG D 146 -3.172 -48.663 -42.861 1.00 57.79 N
Ignored bad PDB record found on line 1385 HETATM 1327 C5 AKG D 146 -3.325 -49.926 -43.595 1.00 60.48 C
Ignored bad PDB record found on line 1386 HETATM 1328 C AKG D 146 -4.345 -50.910 -43.015 1.00 57.22 C
Ignored bad PDB record found on line 1387 HETATM 1329 O AKG D 146 -5.534 -50.795 -43.265 1.00 54.67 O
Ignored bad PDB record found on line 1388 HETATM 1330 C6 AKG D 146 -1.965 -50.574 -43.892 1.00 69.38 C
1 messages similar to the above omitted
Duplicate atom serial number found: 3839
Ignored bad PDB record found on line 3914 HETATM 3840 N AKG C 146 -32.244 -56.942 -44.376 1.00 61.21 N
Ignored bad PDB record found on line 3915 HETATM 3841 C5 AKG C 146 -31.982 -55.679 -45.080 1.00 62.75 C
Ignored bad PDB record found on line 3916 HETATM 3842 C AKG C 146 -31.001 -54.730 -44.393 1.00 56.75 C
Ignored bad PDB record found on line 3917 HETATM 3843 O AKG C 146 -29.799 -54.876 -44.535 1.00 56.51 O
Ignored bad PDB record found on line 3918 HETATM 3844 C6 AKG C 146 -33.295 -54.997 -45.501 1.00 57.97 C
1 messages similar to the above omitted
Duplicate atom serial number found: 3846
CONECT record to nonexistent atom: (3, 3844)
CONECT record to nonexistent atom: (17, 1330)
CONECT record for nonexistent atom: 1330
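The "Ignored bad PDB record" and "Duplicate atom serial number found" messages can be roughly imitated outside ChimeraX. This is a sketch under our own assumptions, not ChimeraX's parser: the function name is hypothetical, and a record counts as bad only when its serial, residue number, or coordinate fields fail to convert to numbers, so a column shift that still leaves parsable numbers goes unnoticed.

```python
from collections import Counter


def column_problems(lines):
    """Return (line numbers of unparsable ATOM/HETATM records, duplicated serial numbers).

    A record counts as unparsable here if its serial number (columns 7-11), residue
    number (23-26), or x/y/z coordinates (31-38, 39-46, 47-54) do not convert to numbers.
    """
    bad_lines, serials = [], Counter()
    for lineno, ln in enumerate(lines, start=1):
        if not ln.startswith(("ATOM", "HETATM")):
            continue
        rec = ln.rstrip("\n").ljust(80)
        try:
            serial = int(rec[6:11])
            int(rec[22:26])
            for start in (30, 38, 46):
                float(rec[start:start + 8])
        except ValueError:
            bad_lines.append(lineno)
            continue
        serials[serial] += 1
    duplicates = sorted(s for s, n in serials.items() if n > 1)
    return bad_lines, duplicates
```

For a file on disk, something like `column_problems(open("suspect.pdb").readlines())` (the file name is a placeholder) returns 1-based line numbers of the suspect records together with any repeated serials.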
(In reply to Eric Pettersen, May 15, 2026, 9:54 AM.)
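For the data-quality side Tom mentions (bond lengths, clashes), a crude first pass is easy to write, though it is no substitute for the Clashes tool in ChimeraX or for MolProbity. This sketch assumes approximate ideal backbone bond lengths (N-CA 1.46 Å, CA-C 1.52 Å, C-O 1.23 Å), a 0.10 Å tolerance, and a 1.0 Å overlap cutoff for non-hydrogen atoms; these values and the function names are illustrative choices.

```python
import math
from collections import defaultdict
from itertools import combinations

AMINO = {"ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
         "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL"}
# Approximate ideal backbone bond lengths in Å (rounded textbook values).
IDEAL_BACKBONE = {("N", "CA"): 1.46, ("CA", "C"): 1.52, ("C", "O"): 1.23}
TOLERANCE = 0.10  # Å; assumed allowed deviation from the ideal length
OVERLAP = 1.0     # Å; assumed: non-hydrogen atoms closer than this are almost surely an error


def read_atoms(lines):
    """List of (residue key, atom name, element, xyz) for parsable ATOM/HETATM records."""
    atoms = []
    for ln in lines:
        if not ln.startswith(("ATOM", "HETATM")):
            continue
        rec = ln.rstrip("\n").ljust(80)
        try:
            xyz = tuple(float(rec[i:i + 8]) for i in (30, 38, 46))
        except ValueError:
            continue
        name = rec[12:16].strip()
        residue = (rec[21], rec[22:27].strip(), rec[17:20].strip())
        atoms.append((residue, name, rec[76:78].strip() or name[:1], xyz))
    return atoms


def geometry_problems(lines):
    """Report off-ideal backbone bonds and overlapping non-hydrogen atoms."""
    atoms = read_atoms(lines)
    problems = []
    # 1. Backbone bond lengths within each standard amino acid (first copy of each name).
    by_residue = defaultdict(dict)
    for residue, name, _element, xyz in atoms:
        if residue[2] in AMINO:
            by_residue[residue].setdefault(name, xyz)
    for residue, coords in by_residue.items():
        for (a, b), ideal in IDEAL_BACKBONE.items():
            if a in coords and b in coords:
                d = math.dist(coords[a], coords[b])
                if abs(d - ideal) > TOLERANCE:
                    problems.append((residue, f"{a}-{b}", round(d, 2)))
    # 2. All pairs of non-hydrogen atoms: O(n^2), fine for a sketch but slow for big files.
    heavy = [a for a in atoms if a[2] not in ("H", "D")]
    for (r1, n1, _e1, p1), (r2, n2, _e2, p2) in combinations(heavy, 2):
        d = math.dist(p1, p2)
        if d < OVERLAP:
            problems.append(((r1, n1), (r2, n2), round(d, 2)))
    return problems
```

The pairwise loop is quadratic in the number of atoms; for a full structure, a spatial grid or k-d tree is the usual way to find neighboring atoms quickly.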
Thank you, Eric and Tom. That is much appreciated.
Steve
(In reply to Tom Goddard, May 15, 2026, 3:25 PM.)
participants (3)
- Eric Pettersen
- Steven Brawer
- Tom Goddard