## Definition of Region to be Tested

The resolution criteria detailed below all make use of the discrete Fourier transform of the images to be analyzed. For a meaningful application of these measures, it is crucial that no correlations are unwittingly introduced when the images are prepared for the resolution test. It is tempting to use a mask to narrowly define the region in the image where the signal (the averaged molecule image) resides, since the inclusion of surrounding material with larger inconsistency might lead to overly pessimistic results. However, imposing a binary mask on both images being compared would produce correlation extending to the highest resolution. The reason is the sharp boundary of a binary mask, with a 1-pixel falloff, whose representation requires Fourier terms out to the Nyquist limit. Hence, the resolution found in any of the tests described below would be falsely reported as the highest possible, that is, as corresponding to the Nyquist spatial frequency. To avoid this effect, one has to use a "soft" mask whose falloff at the edges is so slow that it introduces correlations at low spatial frequencies only. Gaussian-shaped masks are optimal for this purpose since a Gaussian is infinitely differentiable, with all derivatives continuous, and its Fourier transform is again a Gaussian.

Fourier-based resolution criteria are, therefore, governed by a type of uncertainty relationship: precise localization of features for which resolution is determined makes the resolution indeterminate; and, on the other hand, precise measurement of resolution is possible only when the notion of localizing the features is entirely abandoned.
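The difference between a binary and a soft mask can be made concrete with a small numerical sketch. The function names, the image size, the mask radius, the Gaussian width, and the cutoff of 0.25 cycles per pixel are all illustrative assumptions, not values taken from the text. The sketch measures the fraction of each mask's spectral power that lies above the cutoff.

```python
import numpy as np

def radial_distance(n):
    """Distance of every pixel from the center of an n x n image."""
    y, x = np.indices((n, n))
    c = n // 2
    return np.hypot(x - c, y - c)

def binary_mask(n, radius):
    """Hard-edged disk: 1 inside the radius, 0 outside (1-pixel falloff)."""
    return (radial_distance(n) <= radius).astype(float)

def gaussian_mask(n, sigma):
    """Soft mask with a Gaussian falloff; sigma is given in pixels."""
    r = radial_distance(n)
    return np.exp(-0.5 * (r / sigma) ** 2)

def high_frequency_fraction(mask, cutoff=0.25):
    """Fraction of the mask's spectral power above `cutoff` (cycles/pixel).

    Assumes a square mask; 0.5 cycles/pixel is the Nyquist frequency.
    """
    power = np.abs(np.fft.fft2(mask)) ** 2
    f = np.fft.fftfreq(mask.shape[0])
    k = np.hypot(*np.meshgrid(f, f, indexing="ij"))
    return power[k > cutoff].sum() / power.sum()

n = 128  # illustrative image size
hard = high_frequency_fraction(binary_mask(n, radius=32))
soft = high_frequency_fraction(gaussian_mask(n, sigma=16))
print(f"binary mask:   {hard:.3e}")
print(f"Gaussian mask: {soft:.3e}")
```

Under these assumptions, the binary mask keeps a noticeable share of its power in the upper half of the frequency range, whereas the Gaussian mask keeps almost none there. Since multiplying an image by a mask convolves its transform with the transform of the mask, this is the property that confines mask-induced correlations to low spatial frequencies.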

## 5.2.2. Comparison of Two Subsets Versus Analysis of the Whole Data Set

5.2.2.1. Criteria based on two equally large subsets. Many of the criteria introduced in the following test the reproducibility of a map obtained by averaging (or, as we will later see, by 3D reconstruction) when it is based on two randomly drawn subsets of equal size. For example, the use of even- and odd-numbered images of the image set normally avoids any systematic trends, such as those related to the origin of the images in different micrographs or different areas of the specimen. Each subset is averaged, leading to the average images $p_1(\mathbf{r})$ and $p_2(\mathbf{r})$ ("subset averages").
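The even/odd split can be sketched in a few lines. This assumes the images are already aligned and stored as a NumPy array of shape (N, ny, nx); the function name `subset_averages` is hypothetical. When N is odd, the last image is dropped so that both subsets have the same size.

```python
import numpy as np

def subset_averages(images):
    """Average the even- and odd-numbered images of an aligned stack.

    `images` has shape (N, ny, nx). Returns the two subset averages
    (p1, p2), each of shape (ny, nx).
    """
    images = np.asarray(images, dtype=float)
    n_used = (images.shape[0] // 2) * 2      # drop the last image if N is odd
    if n_used < 2:
        raise ValueError("need at least two images")
    p1 = images[0:n_used:2].mean(axis=0)     # images 0, 2, 4, ...
    p2 = images[1:n_used:2].mean(axis=0)     # images 1, 3, 5, ...
    return p1, p2
```

A random permutation of the image order before the split would serve the same purpose if the acquisition order itself is suspected of carrying a systematic trend.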

Let $F_1(\mathbf{k})$ and $F_2(\mathbf{k})$ be the discrete Fourier transforms of the two subset averages, with the spatial frequency $\mathbf{k}$ assuming all values on the regular Fourier grid $(k_x, k_y)$ within the Nyquist range. The Fourier transforms are now compared, and a measure of discrepancy is computed, which is averaged over rings of width $\Delta k$ and radius $k = |\mathbf{k}| = [k_x^2 + k_y^2]^{1/2}$. The result is then plotted as a function of the ring radius. This curve characterizes the discrepancy between the subset averages over the entire spatial frequency range.
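The ring-averaging step can be sketched as follows, assuming square images and radii measured in Fourier pixels. The function names and the particular discrepancy (the ring-summed squared difference divided by the ring-summed power of both transforms) are illustrative assumptions; they are not one of the specific criteria introduced later.

```python
import numpy as np

def ring_average(values, ring_width=1.0):
    """Average a square 2D map on the FFT grid over rings of constant |k|.

    Radii are in Fourier pixels; rings centered beyond the Nyquist
    radius (n // 2) are discarded.
    """
    n = values.shape[0]
    if values.shape != (n, n):
        raise ValueError("this sketch assumes square images")
    idx = np.fft.fftfreq(n) * n                    # integer frequency indices
    k = np.hypot(*np.meshgrid(idx, idx, indexing="ij"))
    ring = np.floor(k / ring_width + 0.5).astype(int)   # ring index per pixel
    n_rings = int((n // 2) / ring_width) + 1
    inside = ring < n_rings
    sums = np.bincount(ring[inside], weights=values[inside], minlength=n_rings)
    counts = np.bincount(ring[inside], minlength=n_rings)
    radii = np.arange(n_rings) * ring_width
    return radii, sums / np.maximum(counts, 1)     # empty rings yield 0

def ring_discrepancy(p1, p2, ring_width=1.0):
    """Illustrative discrepancy curve between two subset averages."""
    F1 = np.fft.fft2(p1)
    F2 = np.fft.fft2(p2)
    radii, num = ring_average(np.abs(F1 - F2) ** 2, ring_width)
    _, den = ring_average(np.abs(F1) ** 2 + np.abs(F2) ** 2, ring_width)
    safe = np.where(den > 0, den, 1.0)             # avoid division by zero
    return radii, np.where(den > 0, num / safe, 0.0)
```

With this normalization, the curve is 0 for identical averages and 2 when one average is the negative of the other. Plotting the returned values against `radii` gives a discrepancy curve of the kind described above.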

In principle, a normalized version of the generalized Euclidean distance, $|F_1(\mathbf{k}) - F_2(\mathbf{k})|$, could be used to serve as a measure of discrepancy, but two other measures, the differential phase residual and the Fourier ring correlation, have gained practical importance. These will be introduced in sections 5.2.3 and 5.2.4. Closely related is the criterion based on Young's fringes (section 5.2.5), which will be covered mainly because of its historical role and the insights it provides.

Even though it is clear that the information given by a curve cannot be condensed into a single number, it is nevertheless common practice to derive a single "resolution figure" for expediency.

5.2.2.2. Truly independent versus partially dependent subsets. The iterative refinement procedures that are part of the reference-based alignment (section 3.4) as well as the reference-free alignment (section 3.5) complicate the resolution estimation by half-set criteria, since they inevitably introduce a statistical interdependency between the two half-set averages being compared. This problem, which is closely related to the problem of model bias, will resurface in the treatment of angular refinement of reconstructions by 3D projection matching (chapter 5, section 7.2). In the 3D case, the problem has been extensively discussed (Grigorieff, 2000; Penczek, 2002a; Yang et al., 2003). Grigorieff's (2000) suggestion, if applied to the case of 2D averaging, would call for the use of two markedly different 2D references at the outset, and for a total separation of the two randomly selected image subsets throughout the averaging procedure. The idea is that if the data are solid and self-consistent, the two subsets can be expected to converge to closely reproducible averages. If, on the other hand, the noise is so large that the reference leaves a strong imprint on the average, the resulting deviation between the subset averages will be reported as a lack of reproducible resolution. We come back to the idea of using independent subset averages in the discussion of the spectral signal-to-noise ratio (SSNR), since it leads to the formulation of a straightforward relationship between the SSNR and the Fourier ring correlation.

5.2.2.3. Criteria based on an evaluation of the whole data set. The criteria of the first kind have the disadvantage of large statistical uncertainty, and this is why criteria based on a statistical evaluation of the total set are, in principle, superior. The advantage diminishes, however, as the number of particles increases, which is now often in the tens of thousands. The Q-factor and the SSNR are treated in sections 5.2.7 and 5.2.8, respectively.