$\sum_{\text{cases}} \|F - F_k\|$ is distributed as the statistician's $F_{2k-4,\,(2k-4)(N_1+N_2-2)}$. Here $N_1$ and $N_2$ are the two group sample sizes, $F_1$ and $F_2$ are the group average shapes in the common (pooled) Procrustes registration, and $\|\cdot\|$ is squared Procrustes distance. The numerator of the ratio at the right is the squared distance between mean shapes, that is, the sum of the squared separations between the paired points in Figure 4.4. The denominator is the sum of all squared Procrustes residuals (which have been assumed sufficiently small) from the group means over landmarks and cases.

Although this formula is interesting, the assumptions underlying the claimed F-distribution are quite unrealistic in many settings. They are violated, for instance, whenever variation in points close together is correlated (which is certainly the case for adjacent points along smooth curves), whenever deviations at a distance are correlated (which is certainly the case for symmetrical or nearly symmetrical forms), and whenever some landmarks are noisier than others or more variable in certain directions than others, no matter how small the underlying shape variation. Nevertheless, the statistic itself, that squared Procrustes distance $\|F_1 - F_2\|$ in the numerator, is a very useful summary of all the difference between the mean shapes. It can be exploited whenever one has decided that all the measurable a priori causes of these correlations have been adjusted out. The denominator is a fine summary of all the within-group shape variation under a similar caveat.

Regardless of Goodall's assumptions, then, one can use the Procrustes statistic $\|F_1 - F_2\|$ as a very useful summary measure of shape effect size as long as one computes its distribution realistically rather than by recourse to tables. For this purpose it is most useful to construct the reference distribution by permutation test (Good, 2000). In this approach, the actual Procrustes distance between group mean shapes, such as the two in Figure 4.4, is referred not to a table of F-ratios but to the actual distribution of such distances computed after the assignment of groups to specimens is randomly permuted. For instance, in the present data set, the first 12 specimens are normals (doctors), and the last 13 are patients; one permutation might declare the doctors to be cases 3, 5, 7, 9, 10, 11, 15, 18, 19, 20, 23, and 25 and the patients to be the other cases. The exact significance level of a Procrustes distance between group means like these is the fraction of permutations that produce a difference at least as large while wholly ignoring the actual information about grouping. In practice, one need not take all the combinations (there are more than five million for this sample of 13 cases versus 12) but only a suitably extensive random sample. In this paper, every permutation test involved 1000 permutations. (It sounds as if we are ignoring the denominator of the equation above. Actually, as explained by Bookstein (1997b), the numerator and denominator here sum to a constant, the total variation of the full data set, and so the permutation test ends up a function of the numerator, the pseudogroup mean Procrustes distance, alone.)

Figure 4.5. Aspects of the uniform adjustment. Left: Procrustes optimal uniform component; there are no differences in central tendency by group. Right: Uniform-free Procrustes shape coordinates. The variance here is less than half of that of the analogous unadjusted plot in Figure 4.3.
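The permutation test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used for the analyses reported here: every function name is hypothetical, each specimen is assumed to be already in the pooled Procrustes registration and stored as a flat list of shape coordinates, and squared Procrustes distance is approximated by the sum of squared coordinate differences (the small-variation assumption mentioned earlier). The helper `within_ss` is included only so that the constant-sum relation between numerator and denominator can be checked numerically; for fixed group sizes $N_1$ and $N_2$, the within-group sum of squares plus $N_1 N_2/(N_1+N_2)$ times the squared distance between group means equals the total sum of squares about the grand mean.

```python
import math
import random

def mean_shape(shapes):
    """Coordinate-wise average of equal-length coordinate lists."""
    n = len(shapes)
    return [sum(col) / n for col in zip(*shapes)]

def sq_dist(a, b):
    """Sum of squared coordinate differences between two registered shapes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def group_distance(shapes, labels):
    """Squared distance between the mean shapes of the two groups.

    labels is a list of booleans: True for one group, False for the other.
    """
    g1 = [s for s, lab in zip(shapes, labels) if lab]
    g2 = [s for s, lab in zip(shapes, labels) if not lab]
    return sq_dist(mean_shape(g1), mean_shape(g2))

def within_ss(shapes, labels):
    """Sum of squared residuals of every specimen from its own group mean."""
    total = 0.0
    for flag in (True, False):
        grp = [s for s, lab in zip(shapes, labels) if lab == flag]
        m = mean_shape(grp)
        total += sum(sq_dist(s, m) for s in grp)
    return total

def permutation_p_value(shapes, labels, n_perm=1000, seed=0):
    """Fraction of random relabelings whose between-mean distance is at
    least as large as the observed one (group sizes are preserved)."""
    rng = random.Random(seed)
    observed = group_distance(shapes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random reassignment of groups to specimens
        if group_distance(shapes, shuffled) >= observed:
            hits += 1
    return hits / n_perm

# Number of distinct ways to split 25 specimens into groups of 12 and 13.
N_SPLITS = math.comb(25, 12)  # 5200300
```

With labels such as `[True] * 12 + [False] * 13`, a call like `permutation_p_value(shapes, labels, n_perm=1000)` returns the estimated significance level as a fraction; 12 hits out of 1000 would be reported as 0.012.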

To apply this test to a data set such as this one, it is important to partial out all factors that are known in advance to contribute large-scale variability that is unrelated to the hypothesis at hand. In modern morphometrics there are a good many rigorous ways of partialing out such factors in advance. One method, which is useful in the present context, is removal of the so-called uniform term, the extent to which specimens differ by deformations that take square graph paper into a uniform grid of identical parallelograms. Changes like these, while they can be quite interesting, account for correlated displacements of all the landmarks at once, which can easily overwhelm the signal from any local shape difference encoded in a small arc of callosal boundary.

In the Procrustes toolkit, this uniform term is computed by a fixed formula (see Bookstein, 1996, 1997b) that is determined once and for all by the Procrustes geometry around the mean form. Here this term accounts for more than half of all the shape variation in the data. Figure 4.5 shows the estimate of this component (think of it as height of the arch; notice that it does not discriminate the groups) and also, at the right, the variation remaining after this adjustment. It is clear by comparison with Figure 4.3 that this Procrustes-optimal adjustment for uniform variation greatly sharpened the precision with which other comparisons can be assessed.

A permutation test applied to the unadjusted data in Figure 4.3 results in a significance level of about 25% for the group mean shape difference in Figure 4.4. Applied to the residuals in Figure 4.5, the same test finds that out of 1000 permutations, only 12 produced a Procrustes distance between pseudogroup means as large as the difference shown in Figure 4.4. That is, the two mean shapes in question, the doctors' and the patients', differ significantly in the nonuniform component of shape at about the 1.2% significance level. This degree of implausibility on the null hypothesis (no difference) is sufficient to justify a search for further substantive features.

Figure 4.6. Simulation of the approach by sector areas: two typical sector schemes for the Davatzikos data. Left: For the mean shape in Figure 4.3. Right: For the mean shape from Figure 4.10. No schemes of this type show statistically significant group differences in area ratios after an appropriate correction for the multiple comparisons involved.

It is instructive to compare this overall approach to significance testing with the conventional counterpart, which would typically reduce the comparison of groups to a matter of areas of various sectors of the callosum (see, e.g., Semrud-Clikeman et al., 1994; Riley et al., 1995). Diverse sets of sectors, such as the two sketched in Figure 4.6, never manage to recover a set of sectors for which the difference of relative areas, doctors versus patients, is significant even at the 5% level when tested by the corresponding omnibus $T^2$ test (or, if you like, when tested by ordinary t-ratio but then corrected Bonferroni-style for the number of sectors). In the Procrustes method, by contrast, there is only one statistic to be tested, and hence there is no associated "correction": the summary statistic is always summed all the way around the outline, even though our findings will be limited to much smaller regions when we discuss them. The true p-value of the group difference here is arithmetically the same 1.2% that is reported by the permutation test.
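The contrast between the two testing strategies can be made concrete with a small sketch of the Bonferroni-style correction mentioned above. The per-sector p-values below are invented purely for illustration and are not results from the Davatzikos data; the function name `bonferroni` is likewise hypothetical. The point is only that a p-value must be multiplied by the number of sectors tested, whereas a single summary statistic is multiplied by one and so keeps its nominal value.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical p-values for five sector-area comparisons (illustration only).
sector_p = [0.04, 0.20, 0.35, 0.60, 0.81]
adjusted = bonferroni(sector_p)
any_sector_significant = any(p < 0.05 for p in adjusted)

# A single statistic (one test) is left unchanged by the same correction.
single_test = bonferroni([0.012])
```

In this made-up example the smallest sector p-value, 0.04, no longer clears the 5% level once it is multiplied by the five sectors tested, while the single permutation-test value of 0.012 is untouched.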
