Seeking guidance on Volume-volume fitting
data:image/s3,"s3://crabby-images/bdf7c/bdf7c0781b792b46a907fab235c9e1abf435301b" alt=""
Dear Chimera,

I am seeking guidance on best practice for volume-volume fitting, and on how many trial conformations I should expect to need. For instance, would it be beneficial to resample one of the volumes onto the same grid as the other? Do volumes with different resolutions require any special handling? In the case below it would probably be more efficient to just align the principal moments of inertia of the envelopes above the contour level and then search from there, though this will only work well if the volumes are very similar.

In an initial study I used 3 EM volumes from EMDB (EMD-5500, EMD-5501, and EMD-2017). They are all E. coli 30S ribosomal subunits of similar resolution (12.9, 14.0, and 13.5 Å). I thought this would be a relatively easy starting point, with the intent to then go on to fit the 30S subunit into some of the many E. coli 70S EM volumes that are available. EMD-5500 and EMD-5501 are in similar orientations, while EMD-2017 is in a completely different orientation. The grids are similar but not identical (125^3 vs 128^3). I adjusted the contour levels from the EMDB recommended values to make the enclosed volumes more similar in size.

Contour levels used:
EMD-5500: -2.8 -> -2.5
EMD-5501: -2.8
EMD-2017: 39 -> 32

I am using scripts like the one shown here:

    from chimera import runCommand as rc
    from chimera.tkgui import saveReplyLog

    # Reference map (#0) and map to be fitted (#1) at the adjusted contour levels
    rc("open data/EMD-5500.map")
    rc("volume #0 level -2.5 transparency 0.5")
    rc("open data/EMD-5501.map")
    rc("volume #1 level -2.8 transparency 0.5")
    # Fit #1 into #0 from 50000 random starting placements, scoring with
    # correlation about mean over the fit-map points inside its contour
    rc("fitmap #1 #0 search 50000 metric cam envelope true inside 0.2")
    # Save the Reply Log, where fitmap reports its results
    saveReplyLog(r'/Users/ingvar/chimera/log/fit5500_5501_loc.txt')

It seems that correlation about mean is the best metric for volume-volume fitting, and that it is best to use only points inside the envelope.

The maps EMD-5500 and EMD-5501 have the unusual feature that the density range is shifted downwards, so that the average is well below 0. I was concerned about that and shifted the density range, but with the fitmap parameters above this did not seem to have any significant impact (with other sets of parameters the density range does appear to be an issue).

In this case it is relatively easy to see when you have found the "good" fit: all other fits have clearly worse statistics. What surprised me was the number of trial conformations needed to be reasonably certain of finding the "good" fit:

5500 in 5501: found 7 times in 5000
5501 in 5500: found 2 times in 5000
5500 in 2017: found 0 times in 5000
5500 in 2017: found 8 times in 50000
2017 in 5501: found 0 times in 5000

Many Thanks,
Ingvar Lagerstedt
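One way to read success counts like these: if each random starting placement is assumed to converge to the good fit independently, with probability p estimated as hits/trials, then the chance of seeing it at least once in n placements is 1 - (1 - p)^n. The short sketch below inverts that relation to estimate how many placements a chosen confidence would need. The independence assumption and the 95% target are illustrative choices, and `trials_for_confidence` is a hypothetical helper, not part of Chimera.

```python
import math


def trials_for_confidence(hits, trials, confidence=0.95):
    """Estimate how many random starts are needed to hit the good fit at least once.

    Assumes each start succeeds independently with probability p = hits / trials.
    """
    if hits <= 0:
        raise ValueError("no hits observed; the success rate cannot be estimated")
    p = hits / trials
    if p >= 1.0:
        return 1
    # Smallest integer n with 1 - (1 - p)**n >= confidence.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


# Success counts reported above; zero-hit cases are skipped.
observed = {
    "5500 in 5501": (7, 5000),
    "5501 in 5500": (2, 5000),
    "5500 in 2017": (8, 50000),
}

for case, (hits, trials) in observed.items():
    n = trials_for_confidence(hits, trials)
    print(f"{case}: p = {hits / trials:.5f}, about {n} starts for 95% confidence")
```

The zero-hit cases are left out because no rate can be estimated from them.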
data:image/s3,"s3://crabby-images/2656e/2656e7e3cced57f8861c05fdcf4651bdaf7ac24b" alt=""
Hi Ingvar,

It sounds like you want to automate the fitting of one EMDB ribosome map into another, and probably other structures too. Presumably you want it to find the best fit as often as possible without taking too long. It is not easy to guarantee you have the best fit; even an exhaustive search can miss it unless you take very fine rotational and translational steps. Here are answers to some of your questions.

Should one map be resampled before fitting in the other map? No, I don't see any advantage to doing that. The fit is done on the grid points of the first map within the displayed contour level, and the second map is interpolated at those grid points.

Could aligning principal inertia axes help get the best fit? Yes, in some cases that will work. And if you only optimize from that initial position it will be very fast (typically less than a second). But unless you search many other possible orientations you will miss the best fit in some percentage of cases. So it doesn't seem particularly useful if reliability of automated fitting is your goal.

Does it matter if grid sizes are different? No. Trilinear interpolation is used, so there is no advantage to having the grid sizes or spacings match.

Should any correction be made when fitting maps of different resolution to each other? I don't think there would be much advantage to, say, smoothing the higher-resolution map to match the lower-resolution map. I don't think that will improve convergence to the best fit, and I don't think it will change the best fit, although that may be theoretically possible.

If you are using the correlation about mean metric, then shifting the density values will have no effect, because the mean is subtracted from the density of both maps. But if you are using correlation or overlap, a shift may be needed.

I am unpleasantly surprised by your low success rate in finding the best fit in 5000 or 50000 searched positions. My experience has been that it is found within a few hundred positions. I will need to try your examples, and I'll report what I find in a later email.

Tom

On Apr 22, 2013, at 3:37 AM, ingvar wrote:
_______________________________________________ Chimera-users mailing list Chimera-users@cgl.ucsf.edu http://plato.cgl.ucsf.edu/mailman/listinfo/chimera-users
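The principal-axes alignment discussed in this exchange can be sketched with NumPy: take the grid points above each map's contour level, compute their centroid and the covariance of their coordinates, and use the eigenvectors as a frame. This is a generic sketch under stated assumptions (unweighted points, isotropic voxel spacing, map origins ignored), not Chimera's implementation, and the function names are hypothetical. Because eigenvector signs are arbitrary, it yields four candidate rotations, each meant only as a starting placement for a local fit.

```python
import numpy as np


def principal_frame(density, level, spacing=1.0):
    """Centroid and principal axes of the grid points with density above `level`.

    Returns (centroid, axes); the columns of `axes` are orthonormal principal
    directions ordered from largest to smallest spread, forming a right-handed
    frame. Coordinates are grid index times `spacing` (origin ignored).
    """
    coords = np.argwhere(density > level).astype(float) * spacing
    if len(coords) < 4:
        raise ValueError("too few grid points above the contour level")
    centroid = coords.mean(axis=0)
    cov = np.cov((coords - centroid).T)      # 3x3 second-moment matrix
    _, eigvecs = np.linalg.eigh(cov)         # eigenvalues come back ascending
    axes = eigvecs[:, ::-1].copy()           # largest spread first
    if np.linalg.det(axes) < 0:              # make it a proper rotation
        axes[:, 2] *= -1
    return centroid, axes


def candidate_alignments(fixed, moving, level_fixed, level_moving, spacing=1.0):
    """Yield (rotation, translation) with x_fixed = rotation @ x_moving + translation.

    The four proper sign combinations of the axes are all returned; each is
    only a starting placement for a local optimization, not a final fit.
    """
    c_f, a_f = principal_frame(fixed, level_fixed, spacing)
    c_m, a_m = principal_frame(moving, level_moving, spacing)
    for s1, s2 in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
        flip = np.diag([s1, s2, s1 * s2])    # determinant +1
        rotation = a_f @ flip @ a_m.T
        translation = c_f - rotation @ c_m
        yield rotation, translation


# Synthetic example: the same ellipsoid with its long axis along grid axis 0 or 1.
g = np.indices((40, 40, 40)).astype(float) - 19.5
fixed = ((g[0] / 12) ** 2 + (g[1] / 8) ** 2 + (g[2] / 5) ** 2 <= 1).astype(float)
moving = np.swapaxes(fixed, 0, 1)
for rotation, translation in candidate_alignments(fixed, moving, 0.5, 0.5):
    print(np.round(rotation, 3))
```

For nearly spherical particles the three eigenvalues are close, so the axes are poorly defined and the candidates are close to arbitrary, consistent with the caveat about viruses in this thread.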
data:image/s3,"s3://crabby-images/bdf7c/bdf7c0781b792b46a907fab235c9e1abf435301b" alt=""
Hi Tom,

Thank you for explaining the inner workings of volume-volume fitting in Chimera.

The long-term goal is to set up a searchable database of which volumes fit in each other, not necessarily limited to EM, and also to provide a service where a user can upload a volume and see if it fits in any existing volume in the archive.

It sounds like I used a good set of options. I will try some other examples, maybe a set of 70S ribosomes, to see if I have better luck finding the best fit there.

/Ingvar

On 2013-04-24 06:09, Tom Goddard wrote:
Hi Ingvar,

I experimented with fitting the two very similar 30S ribosomal subunit structures EMD-5500 and EMD-5501 to understand why you needed 5000 search positions to locate the best fit, while in most cases I've tried in the past it takes fewer than 100 search positions. Here's what I found.

The basic trouble is that the two maps have background density levels that are not at zero. They are at about -4.1 and -3.9, while the interesting contour level is around 2.5. Shifting the background level to zero and using the default "overlap" fitting metric lets the fit search find the best fit in about 20 trial positions. (I did the shifts with Chimera Volume Filter, type Scale.)

Shifts in the density values have a strong effect on fitting with the overlap and correlation (not about mean) metrics. It is a bad idea to use either of those two metrics if the background density is not zero in both maps. Test fits with those two metrics produced horrible fits with hardly any overlap between the maps, and also seemed to have hundreds of different locally optimal fits. The correlation about mean metric subtracts the mean density values, so shifts have no effect. But that metric also produces a huge number of locally optimal solutions, which is why you had to try 5000 starting positions. The reason "overlap" is the default metric is that it tends to have many fewer local best fits, so you can be quite far from the best solution and still find it as the local optimum.

So my advice is to make sure the background density is 0, and to use the default "overlap" fitting metric if you want the widest radius of convergence and the fewest local best fits. How do you make the background density 0? One idea is to subtract the mean of all density values. That may be reasonable if the structure occupies only a small part of the map box, but it becomes bad if the structure occupies a large fraction of the box; in that case trickier methods would have to be applied. If you trusted the "suggested contour level", you might average only the values below that level. The real solution here is for EMDB to insist that deposited single-particle maps state the background level (i.e., the level outside the particles), or to insist that it is 0. If the maps can have any unspecified shift, then automated processing is not going to work. The same problem existed a few years ago, when EMDB didn't have the "suggested contour level": it isn't possible to reliably guess a good contour level.

If you are considering searches of a user-supplied map against all EMDB maps, I think you need them to be fast, and will need to sacrifice reliability to some extent. So your earlier suggestion of aligning principal axes and optimizing that fit might be good, except for nearly spherical particles like viruses. Even with viruses it would give a fairly random initial configuration, but that might suffice, since the fit would be likely to align symmetry axes. The main problem is that if the user-supplied map is a subpart of an EMDB map (half a ribosome, half a virus, etc.), just aligning principal axes won't find the good fit. For that to work, the maps have to represent the same object.

Tom

-------- Original Message --------
Subject: Re: [Chimera-users] Seeking guidance on Volume-volume fitting
From: ingvar
To: Tom Goddard
Date: 4/24/13 5:10 AM
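Tom's observations about background offsets and the three fitmap metrics can be checked on synthetic data. The sketch below assumes the metric definitions given in the Chimera fitmap documentation (overlap: the sum of fit-map value times interpolated reference value; correlation: that sum normalized by the two vector lengths; correlation about mean: the same after subtracting each vector's mean) and evaluates them at the fit map's grid points above its contour level, with trilinear interpolation in the reference map. It is plain NumPy with translations only, not Chimera's code; `background_level` is a hypothetical helper implementing the idea of averaging only the values below a trusted contour level, and all map values are made up.

```python
import numpy as np


def trilinear(volume, points):
    """Sample `volume` at fractional (i, j, k) grid-index coordinates.

    Corners that fall outside the grid contribute 0 (an assumption of this sketch).
    """
    shape = np.array(volume.shape)
    base = np.floor(points).astype(int)
    frac = points - base
    values = np.zeros(len(points))
    for corner in np.ndindex(2, 2, 2):           # the 8 surrounding grid points
        offset = np.array(corner)
        idx = base + offset
        weight = np.prod(np.where(offset == 1, frac, 1.0 - frac), axis=1)
        inside = np.all((idx >= 0) & (idx < shape), axis=1)
        values[inside] += weight[inside] * volume[tuple(idx[inside].T)]
    return values


def fit_metrics(fit_map, ref_map, fit_level, shift=(0.0, 0.0, 0.0)):
    """Overlap, correlation and correlation about mean (cam).

    Evaluated at the fit map's grid points above `fit_level`; the reference map
    is interpolated at those points after a pure translation `shift` in grid
    units (rotations are left out to keep the sketch short).
    """
    mask = fit_map > fit_level
    u = fit_map[mask]
    v = trilinear(ref_map, np.argwhere(mask) + np.asarray(shift, dtype=float))
    du, dv = u - u.mean(), v - v.mean()
    return {
        "overlap": float(u @ v),
        "correlation": float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v))),
        "cam": float(du @ dv / (np.linalg.norm(du) * np.linalg.norm(dv))),
    }


def background_level(density, trusted_level):
    """Hypothetical helper: mean of the values below a trusted contour level."""
    return float(density[density < trusted_level].mean())


# Synthetic maps with a background of about -4.
rng = np.random.default_rng(0)
g = np.indices((32, 32, 32)).astype(float) - 15.5
particle = 6.0 * np.exp(-((g[0] / 6) ** 2 + (g[1] / 4) ** 2 + (g[2] / 3) ** 2))
fit_map = particle - 4.0 + rng.normal(0.0, 0.1, particle.shape)
ref_map = particle - 4.0 + rng.normal(0.0, 0.1, particle.shape)
level = -1.5

bg_fit, bg_ref = background_level(fit_map, level), background_level(ref_map, level)
fit0, ref0 = fit_map - bg_fit, ref_map - bg_ref
level0 = level - bg_fit  # selects the same grid points in the shifted fit map

for name, f, r, lev in [("offset", fit_map, ref_map, level), ("zeroed", fit0, ref0, level0)]:
    aligned = fit_metrics(f, r, lev)
    displaced = fit_metrics(f, r, lev, shift=(16, 0, 0))
    print(name, "aligned:", aligned, "displaced:", displaced)
```

Comparing the printed "aligned" and "displaced" values for the offset and zeroed maps shows which metrics change when only the background is shifted; correlation about mean is identical for the aligned placement in both cases, since constant offsets cancel when the means are subtracted.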
Hi Tom,

Thank you for the suggestions. I shifted the density range and changed to the "overlap" metric. This worked well: I found the best fit 28 times in 500 for the 5500/5501 case and 25 times in 200 for the 5500/2017 case. This is much more in line with what I had hoped for.

It used to be that almost all single-particle EM maps had the background scattering centred at 0, but more recently a number of entries have the background peak away from zero. I think you are right that EMDB ought to either enforce a background at zero or at least collect the value if it is non-zero, for non-tomogram volumes. I will pass this suggestion on. Some maps in the EMDB are masked; in those cases there is no background scattering.

Selecting a suitable contour level is challenging. The recommended levels provided by the authors in EMDB are often useful, but in at least some cases clearly incorrect, to the point that values outside the density range have been suggested. In theory it should be possible to select a level that corresponds to the expected volume of a sample: if the weight and density are known, calculating the volume is trivial. In practice this is not always so straightforward. I have added a graph of the estimated enclosed volume as a function of density level to PDBe's EM analysis pages, e.g. pdbe.org/EMD-5500/analysis, which can be used as additional guidance alongside the recommended contour levels. The analysis pages also have a density distribution plot, like the one in Chimera's Volume Viewer. I think I use more bins, which makes masking and/or background scattering stand out more.

With regard to aligning the two volumes along the principal moments of inertia, that would have to be a user-specified option, to be used only when the two objects are known to be more or less the same sample.

Many Thanks,
Ingvar

On 2013-04-24 20:33, Tom Goddard wrote:
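Ingvar's idea of choosing a contour level from the expected volume can be sketched as follows: convert a molecular weight to a volume using an assumed average density, compute the enclosed volume as a function of threshold, and take the threshold whose volume is closest to the target. The 1.35 g/cm³ default is a commonly used protein value and is an assumption here; RNA is denser than protein, so an RNA-rich particle such as a ribosome would need a higher value. Enclosed volume is approximated by counting voxels above the threshold rather than measuring a contour surface, and all function names are hypothetical.

```python
import numpy as np

DALTON_IN_GRAMS = 1.66054e-24  # mass of 1 Da in grams
CM3_IN_A3 = 1e24               # 1 cm^3 = 1e24 cubic angstroms


def expected_volume_a3(mw_da, density_g_per_cm3=1.35):
    """Volume in cubic angstroms of a particle of mass `mw_da` daltons."""
    return mw_da * DALTON_IN_GRAMS / density_g_per_cm3 * CM3_IN_A3


def enclosed_volume_curve(density, voxel_size_a, n_levels=200):
    """Return (levels, enclosed volumes in A^3), counting voxels above each level."""
    levels = np.linspace(density.min(), density.max(), n_levels)
    sorted_vals = np.sort(density.ravel())
    # Voxels strictly above each level, found by binary search on sorted values.
    counts = sorted_vals.size - np.searchsorted(sorted_vals, levels, side="right")
    return levels, counts * voxel_size_a ** 3


def level_for_volume(density, voxel_size_a, target_a3, n_levels=200):
    """Threshold whose voxel-counted enclosed volume is closest to `target_a3`."""
    levels, volumes = enclosed_volume_curve(density, voxel_size_a, n_levels)
    return float(levels[np.argmin(np.abs(volumes - target_a3))])


# Synthetic check: a smooth blob equal to 0.5 on a sphere of radius 10 voxels.
g = np.indices((48, 48, 48)).astype(float) - 23.5
r = np.sqrt((g ** 2).sum(axis=0))
blob = 1.0 / (1.0 + np.exp(r - 10.0))
target = 4.0 / 3.0 * np.pi * 10.0 ** 3  # A^3, assuming 1 A voxels
print(level_for_volume(blob, 1.0, target))
```

With real maps the same curve is sensitive to the background offset discussed in this thread, so it is most meaningful after the background has been brought to zero.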
participants (2)
- ingvar
- Tom Goddard