
Hi all,

I am currently working on a project that needs to rebuild the statistics using modified protein structures, i.e., a different set of proteins.

I have a quick question about interpreting the BBDep library. What is the exact definition of the fourth column, "N", in the library file format? I assumed that, for each residue type and (phi, psi) bin, it is the total number of side chains found in all 850 proteins. However, I used Chimera to get the (phi, psi) dihedral angles of every ASN and GLN residue in the 850 proteins (Chimera returned no angles for a few dozen residues, which I simply ignored), and then counted the side chains in each 10-by-10-degree (phi, psi) bin. My counts do not agree with the N values in BBDep. For example, for ASN in the bin (phi = -70, psi = -40), Chimera gives 372 side chains across the 850 proteins, while BBDep gives 2348. Discrepancies this large occur in almost every bin. Perhaps I have misunderstood the meaning of N; if so, could you point out my mistake? Or did the Dunbrack group use a different set of proteins from those 850?

Best regards and thanks,
Yang Lei
Electrical & Computer Engineering, University of Massachusetts, Amherst
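For anyone repeating this recount, the binning step can be sketched in Python. This is a minimal sketch, not the BBDep authors' code: it assumes the angles have already been exported (for example from Chimera) as (residue name, phi, psi) tuples, with None where no angle was returned. It also assumes bins are centred on multiples of 10 degrees, so the bin labelled -70 covers [-75, -65) and 180 is folded into -180; whether BBDep uses exactly these bin edges should be checked against the format page. `bin_center` and `count_bins` are hypothetical helper names.

```python
import math
from collections import Counter


def bin_center(angle: float, width: int = 10) -> int:
    """Map a dihedral angle in degrees to the centre of its bin.

    Assumption: bins are centred on multiples of `width` and half-open,
    e.g. the -70 bin covers [-75, -65). 180 and -180 share one bin.
    """
    center = width * math.floor((angle + width / 2) / width)
    return -180 if center == 180 else center


def count_bins(residues):
    """Count residues per (resname, phi_bin, psi_bin).

    `residues` is an iterable of (resname, phi, psi) tuples; entries with a
    missing phi or psi (None) are skipped and counted separately.
    """
    counts = Counter()
    skipped = 0
    for resname, phi, psi in residues:
        if phi is None or psi is None:
            skipped += 1
            continue
        counts[(resname, bin_center(phi), bin_center(psi))] += 1
    return counts, skipped
```

Using `math.floor` rather than Python's built-in `round` keeps the bin edges consistent: `round` rounds halves to even, so -75 and -65 would not be treated alike.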

Hi Yang,

The details of the library format can be found here:
http://dunbrack.fccc.edu/bbdep/bbdepformat.php

That page includes the list of the 850 proteins used and the filtering criteria. According to it, N is "the number of ... side chains in this bin in the data set". I ran a script across the PDB files in the set and got a number very close to yours: 379. So either N is wrong, the description of N is wrong, or the description of the data set is wrong. The chains in the data set contain only 8919 ASNs in total, so there is basically no way that 2348 of them (over a quarter) could fall in a single 10-by-10-degree bin.

Chimera doesn't really care about N; it only uses the p values, so as long as those are right, Chimera is unaffected. Of course, wildly wrong N values are alarming! I'll send mail to Roland Dunbrack to see if he has any insight into this issue, and I'll cc you.

--Eric

Eric Pettersen
UCSF Computer Graphics Lab
http://www.cgl.ucsf.edu

In reply to ylei@ecs.umass.edu, Sep 16, 2009, 2:09 PM.
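The sanity check described in the reply (comparing each bin's N with an independent recount and with the total number of residues of that type) can be sketched as below. This is a hedged illustration, not a BBDep or Chimera tool. Assumptions: library lines are whitespace-separated with residue name, phi, psi and N as the first four fields (the thread only states that N is the fourth column), every rotamer row of a bin repeats the same N, and lines starting with "#" or with non-numeric fields are headers. `library_counts` and `compare` are hypothetical names.

```python
def library_counts(lines):
    """Collect N per (resname, phi, psi) bin from BBDep-style text lines.

    Assumed layout: resname, phi, psi, N as the first four fields.
    Only the first row seen for each bin is kept.
    """
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or line.lstrip().startswith("#"):
            continue
        try:
            key = (fields[0], int(float(fields[1])), int(float(fields[2])))
            n = int(fields[3])
        except ValueError:
            continue  # header or other non-data line
        table.setdefault(key, n)
    return table


def compare(table, recount, totals):
    """Return {bin: (library N, recounted N, library N / total of that type)}.

    `recount` maps bins to independently counted residues; `totals` maps a
    residue name to the number of such residues in the whole data set.
    """
    report = {}
    for key, lib_n in table.items():
        total = totals.get(key[0])
        share = lib_n / total if total else None
        report[key] = (lib_n, recount.get(key, 0), share)
    return report
```

With the numbers from this thread (N = 2348, recount = 379, 8919 ASNs in total), the ASN (-70, -40) entry would come out as roughly (2348, 379, 0.263): the library N is about six times the recount and would account for over a quarter of all ASNs in the set.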
_______________________________________________
Chimera-dev mailing list
Chimera-dev@cgl.ucsf.edu
http://www.cgl.ucsf.edu/mailman/listinfo/chimera-dev