.pdb from Protein Data Bank has more chains than expected
Hi Everyone,
I downloaded the 7FDL .pdb file from the Protein Data Bank. I was expecting just two chains (it is a heteromeric complex), but I found 12 chains in the file (A-L). I expected two chains because the manuscript suggested they were only modeling fragments of the two proteins. It appears that chains A, C, E, G, I, and K are multiple models of one fragment, while chains B, D, F, H, J, and L are multiple models of the second fragment.
Two questions: Why are multiple structures reported, and how do I visualize just two fragments at the same time, for example A and B?
I have attached the .pdb file.
Thanks for any advice.
Phil McClean
Hi Phil,
You don't need to attach the data if it's accessible from the PDB. If you go to the RCSB page for this entry, https://www.rcsb.org/structure/7FDL, and keep clicking through the different assemblies displayed in the upper left, you eventually get to the Asymmetric Unit, which shows all the copies. So the file contains all of these chains simply because they are all in the crystallographic asymmetric unit (the copies may have slightly different conformations from one another). It is not something one would know a priori without reading the paper or looking at the data on the website.
Here is information about the asymmetric unit vs. biological assemblies at the RCSB PDB https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/missing-coordinates-and-biological-assemblies
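A quick way to see the pattern described in the question, outside ChimeraX, is to count residues per chain in the downloaded file and group chains of equal length. This is a minimal standard-library sketch, not a ChimeraX feature; the function names are hypothetical, and it assumes the legacy fixed-column PDB format (chain ID in column 22, residue number and insertion code in columns 23-27).

```python
from collections import defaultdict


def residues_per_chain(pdb_lines):
    """Count distinct residues per chain ID in the first model of a PDB file.

    Assumes the legacy fixed-column PDB format: the chain ID is column 22 and
    the residue number plus insertion code occupy columns 23-27 (1-based).
    Only ATOM records are counted, so waters and ligands (HETATM) are ignored.
    """
    residues = defaultdict(set)
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # stop after the first model if the file has several
        if line.startswith("ATOM  "):
            chain_id = line[21]
            residues[chain_id].add(line[22:27])  # residue number + insertion code
    return {chain_id: len(res) for chain_id, res in sorted(residues.items())}


def group_by_length(counts):
    """Group chain IDs that share a residue count (candidate copies of one fragment)."""
    groups = defaultdict(list)
    for chain_id, n_residues in counts.items():
        groups[n_residues].append(chain_id)
    return dict(groups)
```

Passing the lines of a locally saved copy of the entry (file name up to you) and then grouping the result should show whether the chains fall into two sets. Copies can differ by a few residues when some termini or loops are disordered, so matching counts are a hint rather than proof.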
OK, back to ChimeraX: you can hide or delete the parts you don't want using basic command-line specifications, e.g.:
hide ~/A,B
(hide atoms that are NOT chains A,B)
How to specify atoms, residues, chains in commands: https://rbvi.ucsf.edu/chimerax/docs/user/commands/atomspec.html
...or, if the commands are too hard to figure out or remember, you can do it with the menus:
Actions... Atoms/Bonds... hide
Select... Chains... Transcription factor EGL1... A
Actions... Atoms/Bonds... show
Select... Chains... Transcription factor WER... B
Actions... Atoms/Bonds... show
Select... Clear
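The `hide ~/A,B` command and the menu steps only change what is displayed. If the goal is a separate file containing just one copy of the complex, deleting the other chains in ChimeraX and saving is one route; as an alternative outside ChimeraX, here is a minimal standard-library sketch (the function name `keep_chains` is hypothetical) that keeps only the coordinate records of the chosen chains. It assumes the legacy fixed-column PDB format and drops all header records (SEQRES, HELIX, REMARK, and so on), so the output is a bare coordinate file.

```python
# Record types that carry a chain ID in column 22 and are kept if the chain matches.
COORD_RECORDS = ("ATOM  ", "HETATM", "ANISOU", "TER   ")


def keep_chains(pdb_lines, chains):
    """Return the coordinate records of the given chain IDs, followed by END.

    Header records are dropped entirely, bare "TER" lines without a chain ID
    are dropped, and only the first model is kept if the file has several.
    """
    wanted = set(chains)
    kept = []
    for line in pdb_lines:
        line = line.rstrip("\n")
        if line.startswith("ENDMDL"):
            break  # first model only
        if line.startswith(COORD_RECORDS) and len(line) > 21 and line[21] in wanted:
            kept.append(line)
    kept.append("END")
    return kept
```

Writing the returned lines, joined with newlines, to a new file would give a file with only chains A and B when called with `"AB"`.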
I hope this helps,
Elaine
-----
Elaine C. Meng, Ph.D.
UCSF Chimera(X) team
Department of Pharmaceutical Chemistry
University of California, San Francisco
Hi Phil,
The .pdb file represents the asymmetric unit of the crystal, the smallest unit that can be propagated by the crystal symmetry operators to regenerate the crystal lattice. It is common for proteins to crystallize in a form that also contains noncrystallographic symmetry, i.e., symmetry operations that can't be propagated through the crystal lattice. Sometimes this is biologically significant, e.g., in the case of a homoassembly, but other times it is simply the lowest-energy way to fill a lattice. In such a case, multiple copies of the protein must be modeled to fill the asymmetric unit. Usually the copies are identical, or nearly so.
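Whether the copies are "identical, or nearly so" can be checked numerically. Matchmaker superimposes structures before computing RMSD; the sketch below swaps in a different, superposition-free measure, the distance-matrix RMSD (dRMSD), which compares the internal pairwise distances of two copies. It assumes you have already extracted matched coordinates (for example, the Cα atoms of the same residues, in the same order) for two chains such as A and C; the function name is hypothetical.

```python
import math
from itertools import combinations


def distance_rmsd(coords_a, coords_b):
    """Distance-matrix RMSD between two equal-length lists of (x, y, z) points.

    Every pairwise distance within copy A is compared with the same distance
    within copy B. Only internal distances are used, so no superposition is
    needed: a copy that is merely rotated and translated scores 0.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two equal-length lists of at least 2 points")
    sq_sum = 0.0
    n_pairs = 0
    for i, j in combinations(range(len(coords_a)), 2):
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[i], coords_b[j])
        sq_sum += (d_a - d_b) ** 2
        n_pairs += 1
    return math.sqrt(sq_sum / n_pairs)
```

The loop is quadratic in the number of atoms, which is fine for Cα traces of fragment-sized chains.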
In ChimeraX, you can select the copies of chains you aren't interested in and either hide or delete them.
Best wishes,
Kevin
Hi Elaine and Kevin,
Thanks to both of you. I now better understand the information in the .pdb file and what it represents. It appears that chains A and B form one complex, C and D another, and likewise E+F, G+H, I+J, and K+L. I then downloaded the first biological assembly, and it used chains K and L. Is the first biological assembly considered the "best" or "most representative" based on all of the data, or do all of the biological assemblies have equal value? Does one of the models best summarize all of the crystal data?
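The A+B, C+D, ... pairing can be checked by counting close atom pairs between chains: chains that form a complex should share many more contacts than chains that only touch through crystal packing. This is a brute-force sketch, not how the PDB defines biological assemblies; the 4.0 Å cutoff and the function names are assumptions, and packing contacts with symmetry mates that are not in the file are not seen at all. The input is a dict mapping each chain ID to its atom coordinates.

```python
import math
from itertools import combinations


def interface_contacts(chains, cutoff=4.0):
    """Count atom pairs closer than `cutoff` (Å) for every pair of chains.

    `chains` maps a chain ID to a list of (x, y, z) atom coordinates.
    Brute force over all atom pairs; returns {(chain1, chain2): count}
    only for chain pairs with at least one contact.
    """
    counts = {}
    for c1, c2 in combinations(sorted(chains), 2):
        n = sum(1 for p in chains[c1] for q in chains[c2] if math.dist(p, q) < cutoff)
        if n:
            counts[(c1, c2)] = n
    return counts


def strongest_partner(contacts):
    """For each chain, return the chain it shares the most contacts with."""
    best = {}
    for (c1, c2), n in contacts.items():
        for me, other in ((c1, c2), (c2, c1)):
            if me not in best or n > best[me][1]:
                best[me] = (other, n)
    return {me: other for me, (other, _) in best.items()}
```

A large interface is a hint that two chains belong together, not proof of the biological assembly.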
Here is what I am trying to do. I am analyzing my protein complex strictly with AF2. I want to show that the AF2 model of the heterocomplex I am working with is "accurate". I also built an AF2 model of the two proteins in the 7FDL PDB entry; those two proteins are homologs of the proteins I am working with. I used matchmaker in ChimeraX to compare the crystal structure to the AF2 model, and the RMSD was 0.782 Å, which appears to imply that the AF2 model is representative of the crystal structure. Therefore, the AF2 model is a "trusted" representative of the complex I am working with.
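For reference, the RMSD over N matched atom pairs p_i, q_i after superposition is sqrt((1/N) Σ |p_i − q_i|²). ChimeraX matchmaker by default iteratively prunes the most distant pairs and refits until no pair is farther apart than 2.0 Å, and its log reports both the RMSD over the pruned pairs and the RMSD across all pairs, so it is worth checking which of the two a given value is. The sketch below is a simplification (a single cutoff pass with no refitting; the names are hypothetical) that only shows why the pruned value can be much lower than the all-pairs value.

```python
import math


def rmsd(pairs):
    """RMSD over already-superimposed (p, q) coordinate pairs, in the input units (Å)."""
    if not pairs:
        raise ValueError("need at least one pair")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in pairs) / len(pairs))


def close_pairs(pairs, cutoff=2.0):
    """Keep pairs closer than `cutoff`: one pass, no refitting (unlike matchmaker)."""
    return [(p, q) for p, q in pairs if math.dist(p, q) < cutoff]
```

With two well-matched pairs 0.5 Å apart and one outlier 5 Å apart, `rmsd` over all three pairs is about 2.9 Å, while over the pairs kept by `close_pairs` it is 0.5 Å.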
As always, thanks for any comments, advice, and your patience as I go through my crash course in protein structure analysis.
Phil
Hi Phil, Regarding your questions about assemblies, I don't know off the top of my head. I can only recommend reading the page linked in my previous reply:
Here is information about the asymmetric unit vs. biological assemblies at the RCSB PDB https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/missing-coordinates-and-biological-assemblies
Best, Elaine
Hi Elaine,
Thanks for your response. Since my initial post, I found this page at the PDB:
https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/biological-ass...
Again, thanks for your help.
Phil
participants (3)
- Dr. Kevin M Jude
- Elaine Meng
- McClean, Phillip