.pdb from Protein Data Bank has more chains than expected

Hi Everyone,
I downloaded the 7FDL .pdb file from the Protein Data Bank. I was expecting just two chains (it is a heteromeric complex), but I found 12 chains in the file (A-L). I was expecting two chains because the manuscript suggested they were just modeling protein fragments of the two proteins. It appears that chains A, C, E, G, I, K are multiple models of one fragment, while chains B, D, F, H, J, L are multiple models of the second fragment.
Two questions: why are multiple structures reported, and how do I visualize just two fragments at the same time, for example A and B?
I have attached the .pdb file.
Thanks for any advice.
Phil McClean

Hi Phil,
You don't need to attach the data if it's accessible from the PDB. If you go to the RCSB page for this entry
<https://www.rcsb.org/structure/7FDL>
... and keep clicking through the different assemblies displayed in the upper left you eventually get to Asymmetric Unit which shows all the copies. So the file contains all that just because it's all in the crystallographic asymmetric unit (the copies may have slightly different conformations from one another). It is not something one would know a priori without reading the paper or looking at the data on the website.
Here is information about the asymmetric unit vs. biological assemblies at the RCSB PDB
<https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/missing-coordi...>
OK back to ChimeraX: You can hide or delete the part you don't want using basic command-line specifications, e.g.
hide ~/A,B
(hide atoms that are NOT chains A,B)
How to specify atoms, residues, chains in commands:
<https://rbvi.ucsf.edu/chimerax/docs/user/commands/atomspec.html>
...or if it is too hard to figure out or remember commands, you can do it with the menu
Actions... Atoms/Bonds... hide
Select... Chains... Transcription factor EGL1... A
Actions... Atoms/Bonds... show
Select... Chains... Transcription factor WER... B
Actions... Atoms/Bonds... show
Select... Clear
I hope this helps,
Elaine
-----
Elaine C. Meng, Ph.D.
UCSF Chimera(X) team
Department of Pharmaceutical Chemistry
University of California, San Francisco
(In reply to the message from McClean, Phillip via ChimeraX-users <chimerax-users@cgl.ucsf.edu>, sent Nov 9, 2022, at 3:14 PM.)
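For readers who would rather trim the file itself than hide chains in the viewer, the same idea can be sketched outside ChimeraX. The Python below is a minimal illustration and not part of ChimeraX: it relies only on the fixed-column layout of legacy PDB-format files, where the chain ID sits in column 22 of ATOM and HETATM records. The function names and the file names in the usage comment are assumptions made for this example.

```python
def chain_ids(pdb_lines):
    """Return chain IDs in order of first appearance in ATOM/HETATM records."""
    seen = {}
    for line in pdb_lines:
        # Column 22 (index 21) holds the one-character chain ID.
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            seen[line[21]] = None
    return list(seen)


def keep_chains(pdb_lines, wanted):
    """Keep coordinate records only for chains in `wanted`.

    Non-coordinate records (header, remarks, CONECT, END, ...) are passed
    through unchanged, so this is a rough filter rather than a full rewrite.
    Coordinate-type lines too short to carry a chain ID are dropped.
    """
    coordinate_records = ("ATOM", "HETATM", "ANISOU", "TER")
    kept = []
    for line in pdb_lines:
        if line.startswith(coordinate_records):
            if len(line) > 21 and line[21] in wanted:
                kept.append(line)
        else:
            kept.append(line)
    return kept


# Example usage (file names are hypothetical):
# with open("7fdl.pdb") as fh:
#     lines = fh.read().splitlines()
# print(chain_ids(lines))
# with open("7fdl_AB.pdb", "w") as out:
#     out.write("\n".join(keep_chains(lines, {"A", "B"})) + "\n")
```

Inside ChimeraX, `hide ~/A,B` remains the simpler route; a filter like this is mainly useful when a smaller file is needed for another program.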

Hi Phil,
The PDB file represents the asymmetric unit of the crystal, the smallest unit that can be propagated by the crystal symmetry operators to regenerate the crystal lattice. It is common for proteins to crystallize in a form that also contains noncrystallographic symmetry, i.e., symmetry operations that can't be propagated through the crystal lattice. Sometimes this is biologically significant, e.g., in the case of a homoassembly, but other times it is simply the lowest-energy way to fill a lattice. In such a case, multiple copies of the protein must be modeled to fill the asymmetric unit. Usually the copies are identical, or nearly so.
In ChimeraX, you can select copies of chains you aren't interested in and either hide or delete them.
Best wishes,
Kevin
(In reply to the message from McClean, Phillip via ChimeraX-users <chimerax-users@cgl.ucsf.edu>, dated Wednesday, November 9, 2022 at 4:47 PM.)
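To see which copies in the asymmetric unit are grouped together, one can also look in the file itself: legacy PDB-format files usually record the biological assemblies in REMARK 350 records, including the chains each assembly applies to. The sketch below is a minimal, assumption-laden reader for those records. The function name is hypothetical, and the sample lines are synthetic (they are not taken from 7FDL) and only imitate the usual layout.

```python
import re


def assembly_chains(pdb_lines):
    """Map each REMARK 350 BIOMOLECULE number to the chain IDs listed for it."""
    assemblies = {}
    current = None
    for line in pdb_lines:
        if not line.startswith("REMARK 350"):
            continue
        text = line[10:].strip()
        start = re.match(r"BIOMOLECULE:\s*(\d+)", text)
        if start:
            current = int(start.group(1))
            assemblies[current] = []
            continue
        # Matches "APPLY THE FOLLOWING TO CHAINS:" and "AND CHAINS:" lines.
        chains = re.search(r"CHAINS:\s*(.*)", text)
        if chains and current is not None:
            ids = [c.strip() for c in chains.group(1).split(",") if c.strip()]
            assemblies[current].extend(ids)
    return assemblies


# Synthetic sample in the usual REMARK 350 layout (not from 7FDL).
sample = [
    "REMARK 350 BIOMOLECULE: 1",
    "REMARK 350 AUTHOR DETERMINED BIOLOGICAL UNIT: DIMERIC",
    "REMARK 350 APPLY THE FOLLOWING TO CHAINS: P, Q",
    "REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000",
    "REMARK 350 BIOMOLECULE: 2",
    "REMARK 350 APPLY THE FOLLOWING TO CHAINS: R,",
    "REMARK 350                    AND CHAINS: S",
]
print(assembly_chains(sample))
```

The sketch ignores the BIOMT transformation matrices, so it only reports which chains belong to each assembly, not how symmetry copies would be generated.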

Hi Elaine and Kevin,
Thanks to both of you. I now better understand the information provided in the .pdb file and what it represents. It appears that chains A and B are represented as one complex, C and D as another complex, as well as E+F, G+H, I+J, and K+L. I then downloaded the first biological assembly, and it used chains K and L. Is the first biological assembly considered the "best" or "most representative" based on all of the data? Or do all of the biological assemblies have equal value? Does one of the models best represent a summary of all of the crystal data?
Here is what I am trying to do. I am analyzing my protein complex strictly based on AF2. I want to show that AF2 modeling of the heterocomplex I am working with is "accurate". I also created the AF2 model of the two proteins found in the 7FDL PDB record. Those two proteins are homologs of the proteins I am working with. I used Matchmaker in ChimeraX to compare the crystal structure to the AF2 model, and the RMSD was 0.782, which appears to imply the AF2 model is representative of the crystal model. Therefore, the AF2 model is a "trusted" representative of the complex I am working with.
As always, thanks for any comments, advice, and your patience as I go through my crash course in protein structure analysis.
Phil
(In reply to the message from Dr. Kevin M Jude <kjude@stanford.edu>, sent Wednesday, November 9, 2022 7:30 PM.)
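As a reminder of what an RMSD value measures, here is a minimal sketch of the calculation for two sets of already paired and already superimposed coordinates. It is not how Matchmaker is implemented: Matchmaker performs the sequence alignment and fitting itself, and its log normally reports the RMSD over the pruned atom pairs as well as across all pairs, so it is worth noting which of the two numbers is being quoted. The function name and the toy coordinates are assumptions for illustration only.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) points.

    Assumes the points are already matched one-to-one and superimposed;
    no alignment or fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: every point is displaced by exactly 1 unit along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
crystal = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model, crystal))
```

Because the value is an average over the chosen atom pairs, the number of pairs included matters as much as the RMSD itself when comparing a predicted model with a crystal structure.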

Hi Phil, Regarding your questions about assemblies, I don't know off the top of my head. I can only recommend reading the page linked in my previous reply:
Here is information about the asymmetric unit vs. biological assemblies at the RCSB PDB <https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/missing-coordi...>
Best, Elaine
(In reply to the message from McClean, Phillip via ChimeraX-users <chimerax-users@cgl.ucsf.edu>, sent Nov 10, 2022, at 9:25 AM.)

Hi Elaine,
Thanks for your response. I found this link at PDB since my initial post:
https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/biological-ass...
Again, thanks for your help.
Phil
(In reply to the message from Elaine Meng <meng@cgl.ucsf.edu>, sent Thursday, November 10, 2022 12:26 PM.)
Participants (3): Dr. Kevin M Jude; Elaine Meng; McClean, Phillip