## M

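It follows that the recombination fraction for an interval of length $t = \Delta$ is

$$\exp(-\lambda\Delta)\sum_{m=0}^{\infty}\frac{(\lambda\Delta)^{2m+1}}{(2m+1)!} = \exp(-\lambda\Delta)\,\frac{\exp(\lambda\Delta) - \exp(-\lambda\Delta)}{2} .$$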
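The identity above can be checked numerically. The following is a minimal Python sketch, not taken from the text: it assumes the number of crossovers in the interval is Poisson with mean `lam * delta` (both names are illustrative), sums the probabilities of an odd number of crossovers, and compares the result with the closed form.

```python
import math

def recomb_fraction_series(lam, delta, terms=50):
    """Sum the Poisson probabilities of an odd number of crossovers."""
    mean = lam * delta
    term = mean            # mean**1 / 1!
    total = 0.0
    for m in range(terms):
        total += term
        k = 2 * m + 1
        # Move from mean**k / k! to mean**(k+2) / (k+2)!
        term *= mean * mean / ((k + 1) * (k + 2))
    return math.exp(-mean) * total

def recomb_fraction_closed(lam, delta):
    """Closed form: exp(-m) * (exp(m) - exp(-m)) / 2 with m = lam * delta."""
    mean = lam * delta
    return math.exp(-mean) * (math.exp(mean) - math.exp(-mean)) / 2.0

if __name__ == "__main__":
    lam = 0.01  # assumed crossover rate per cM, for illustration only
    for delta in (1.0, 10.0, 50.0):
        print(delta, recomb_fraction_series(lam, delta),
              recomb_fraction_closed(lam, delta))
```

The closed form simplifies to `(1 - exp(-2 * lam * delta)) / 2`, which stays below 1/2 for every interval length.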

To evaluate the covariance function of $Z_t$ for a backcross, write $x(t)$ to denote the number of A alleles at the marker locus $t$. Then, for two markers $t$ and $s$ at a recombination distance $\theta$, we obtain that

$$E[x(t)x(s)] = \Pr(x(t) = 1)\Pr(x(s) = 1 \mid x(t) = 1) = (1/2)(1 - \theta) .$$

It follows that

$$\operatorname{cov}(x(t),x(s)) = E[x(t)x(s)] - E[x(t)]E[x(s)] = (1 - \theta)/2 - 1/4 = (1 - 2\theta)/4 = \exp(-\beta|t - s|)/4 ,$$

where the last equality follows from (5.1) with $\Delta = |t - s|$. If we use this result in the approximation (4.4) and recall that $\sigma_y^2 = E[(y - m)^2]$, we obtain (5.2). Note that this argument already appeared in Sect. 4.2.3, except for the final substitution of (5.1).
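As an informal check of this covariance, the sketch below simulates backcross genotypes at two markers and compares the empirical covariance with the exponential form. It is a Python illustration under stated assumptions, not the authors' code: the function names are hypothetical, the recombination parameter is taken to be 0.02 per cM (the backcross value used in the problems of this chapter), and recombination between the two markers is modeled as a single coin flip with probability `theta`.

```python
import math
import random

def theta_from_distance(distance_cm, beta=0.02):
    """Recombination fraction implied by 1 - 2*theta = exp(-beta * distance)."""
    return (1.0 - math.exp(-beta * abs(distance_cm))) / 2.0

def simulate_cov(theta, n=200_000, seed=1):
    """Empirical covariance of x(t) and x(s) over n simulated backcross animals."""
    rng = random.Random(seed)
    sum_t = sum_s = sum_ts = 0
    for _ in range(n):
        x_t = 1 if rng.random() < 0.5 else 0   # Pr(x(t) = 1) = 1/2
        recombined = rng.random() < theta      # allele switches with prob. theta
        x_s = 1 - x_t if recombined else x_t
        sum_t += x_t
        sum_s += x_s
        sum_ts += x_t * x_s
    return sum_ts / n - (sum_t / n) * (sum_s / n)

def theoretical_cov(distance_cm, beta=0.02):
    """Covariance exp(-beta * |t - s|) / 4 from the derivation."""
    return math.exp(-beta * abs(distance_cm)) / 4.0

if __name__ == "__main__":
    d = 10.0  # illustrative marker distance in cM
    print(simulate_cov(theta_from_distance(d)), theoretical_cov(d))
```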

### 5.7 Bibliographical Comments

The issue of multiple testing in genome scans and the relevance of the Ornstein-Uhlenbeck process for evaluating (genome-wide) significance levels were recognized by Lander and Botstein [47], who proposed the approximation based on infinitely dense markers. The approximation for equally spaced markers discussed in this chapter is due to Feingold, Brown, and Siegmund [28]. See [73] for additional references. Lander and Kruglyak [48] have been influential in emphasizing the importance of the issue of multiple testing. The permutation-based method mentioned in Sect. 5.4 is used in a variety of statistical problems; its implementation and popularity in gene mapping are due to Churchill and Doerge [11].

### Problems

5.1. Get a more accurate assessment of the Bonferroni inequality and the approximation formula by: (a) increasing the number of iterations to 100,000; (b) using the R function "uniroot" in order to find the exact location where the curves cross the level 0.05.

5.2. (This exercise illustrates the effect of the recombination parameter $\beta$ on the significance thresholds.) In an RI design, unlike the backcross design, both types of homozygote are observable. Suppose we consider a test statistic based on comparing the difference in expression level of the phenotype between AA homozygous and aa homozygous animals. That will lead to a one-dimensional test statistic similar to the one used for the backcross. The main difference between the two cases is that a large number of meiotic events are relevant, compared to only one in the backcross. By using the recombination fraction for the RI found at the end of Chap. 3, it may be shown that the parameter $\beta$ is 0.08.

(a) Redo for this statistic the evaluation of the Bonferroni inequality and the approximation (5.3).

(b) What is the relation between the thresholds for the backcross design and the thresholds you get for the RI design? Explain!

5.3. For an intercross we introduced the test statistic $U(t) = Z_\alpha^2(t) + Z_\delta^2(t)$, computed at each marker for the intercross design. $Z_\alpha(\cdot)$ is a realization of an Ornstein-Uhlenbeck process with a covariance structure given by $\operatorname{cor}(Z_\alpha(t), Z_\alpha(s)) = \exp\{-0.02\,|t - s|\}$, and $Z_\delta(\cdot)$ is an independent realization of an Ornstein-Uhlenbeck process with a covariance structure given by $\operatorname{cor}(Z_\delta(t), Z_\delta(s)) = \exp\{-0.04\,|t - s|\}$.

(a) Apply the function "OU.sim" in order to simulate the process $U(\cdot)$ over a collection of markers.

(b) Compare the distribution under the null hypothesis of $U(t)$, the distribution of $Z_\alpha^2(t)$, the distribution of $\max_{t \in T} U(t)$, and the distribution of $\max_{t \in T} Z_\alpha^2(t)$.

(c) Explain any difference you find between the thresholds for a backcross design and for an intercross design.
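For readers working outside R, the simulation in Prob. 5.3 can be sketched in Python as follows. This is not the "OU.sim" function referred to above; `ou_path`, `simulate_u`, and `max_u_quantile` are hypothetical names. The sketch relies on the fact that a stationary Ornstein-Uhlenbeck process sampled at equally spaced markers is a Gaussian AR(1) sequence with lag-one correlation `exp(-beta * spacing)`. The chromosome layout (9 markers, 10 cM apart) is an arbitrary illustration.

```python
import math
import random

def ou_path(beta, spacing_cm, n_markers, rng):
    """Stationary OU process at equally spaced markers (Gaussian AR(1))."""
    rho = math.exp(-beta * spacing_cm)       # correlation of adjacent markers
    innovation_sd = math.sqrt(1.0 - rho * rho)
    z = [rng.gauss(0.0, 1.0)]                # start in the stationary N(0, 1) law
    for _ in range(n_markers - 1):
        z.append(rho * z[-1] + innovation_sd * rng.gauss(0.0, 1.0))
    return z

def simulate_u(spacing_cm, n_markers, rng):
    """U(t) = Z_alpha(t)^2 + Z_delta(t)^2 with independent components."""
    z_alpha = ou_path(0.02, spacing_cm, n_markers, rng)
    z_delta = ou_path(0.04, spacing_cm, n_markers, rng)
    return [a * a + d * d for a, d in zip(z_alpha, z_delta)]

def max_u_quantile(spacing_cm, n_markers, reps=2000, prob=0.95, seed=0):
    """Empirical quantile of the maximum of U over the markers."""
    rng = random.Random(seed)
    maxima = sorted(max(simulate_u(spacing_cm, n_markers, rng))
                    for _ in range(reps))
    return maxima[int(prob * reps) - 1]

if __name__ == "__main__":
    # Hypothetical single chromosome: 9 markers, 10 cM apart.
    print(max_u_quantile(spacing_cm=10.0, n_markers=9))
```

Under the null hypothesis each $U(t)$ is marginally chi-square with two degrees of freedom, so the simulated maximum can be compared directly with the pointwise quantile.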

5.4. The formula corresponding to (5.3) for the statistic $U(t) = Z_\alpha^2(t) + Z_\delta^2(t)$ for an intercross is

$$\Pr\left(\max_k U_{k\Delta} > u\right) \approx 1 - \exp\left\{-C\exp(-u/2) - \beta L u \exp(-u/2)\,\nu\left(\{2\beta\Delta u\}^{1/2}\right)\right\},$$

where $\beta = 0.03$ is the average of the values $\beta_1 = 0.02$ for $Z_\alpha$ and $\beta_2 = 0.04$ for $Z_\delta$. Find thresholds corresponding to markers spaced at $\Delta = 20$, 10, 5, and 1 cM based on this formula and compare them to the Bonferroni bounds and the simulations of Prob. 5.3.
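One way to set up the threshold computation in Prob. 5.4 is to solve for the value of u at which the approximation equals 0.05. The Python sketch below uses bisection in place of R's "uniroot". Several ingredients are assumptions that do not appear in this excerpt: the number of chromosomes `n_chrom = 20` and total genetic length `total_cm = 1600` are placeholder values, and `nu` uses a commonly quoted closed-form approximation to the function $\nu$, which should be replaced by whatever definition accompanies (5.3).

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def nu(y):
    """Assumed approximation to nu(y); equals 1 in the limit y -> 0."""
    if y < 1e-8:
        return 1.0
    numerator = (2.0 / y) * (std_normal_cdf(y / 2.0) - 0.5)
    denominator = (y / 2.0) * std_normal_cdf(y / 2.0) + std_normal_pdf(y / 2.0)
    return numerator / denominator

def exceedance_prob(u, spacing_cm, n_chrom=20, total_cm=1600.0, beta=0.03):
    """Approximate Pr(max U > u); n_chrom and total_cm are placeholders."""
    tail = math.exp(-u / 2.0)
    correction = nu(math.sqrt(2.0 * beta * spacing_cm * u))
    rate = n_chrom * tail + beta * total_cm * u * tail * correction
    return 1.0 - math.exp(-rate)

def threshold(spacing_cm, level=0.05, lo=2.0, hi=60.0, **kwargs):
    """Bisection on [lo, hi], where the approximation is decreasing in u."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if exceedance_prob(mid, spacing_cm, **kwargs) > level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for spacing in (20.0, 10.0, 5.0, 1.0):
        print(spacing, threshold(spacing))
```

The bracket starts at u = 2 because $u\exp(-u/2)$ is decreasing only beyond that point, which keeps the root unique.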

5.5. For RI, the approximation given in (5.3) is valid, but the value of $\beta$ is different, to reflect the larger rate of recombination found in RI. In this case the parameter $\beta$ is found by (i) expressing the correlation $1 - 2\theta_{RI}$ as a function of $\theta$ and hence as a function of $\exp(-0.02 s)$ for markers at a genetic distance $s$ from each other, then (ii) expanding the exponential for small $s$ using the approximation $\exp(-x) \approx 1 - x$. The appropriate value of $\beta$ is the coefficient of $s$ in this expansion. You must also use for small $x$ the formula $1/(1 - x) = 1 + x + x^2 + \cdots$ and neglect terms like $x^2$, $x^3$, etc., which are smaller than $x$ when $x$ is small. Verify the value of $\beta$ given in Prob. 5.2 for RI obtained by repeated sib mating. What is the value for RI obtained by selfing?
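The expansion in steps (i) and (ii) can be cross-checked numerically: the coefficient of $s$ is the limit of (1 - correlation)/s as s shrinks. The Python sketch below does this for an arbitrary map from the single-meiosis recombination fraction to the RI recombination fraction. The sib-mating map used as the example, 4θ/(1 + 6θ), is the classical Haldane-Waddington formula and is an assumption here, since the excerpt only points to Chap. 3 for it. All function names are hypothetical.

```python
import math

def theta_single_meiosis(s_cm, beta=0.02):
    """Backcross recombination fraction: 1 - 2*theta = exp(-beta * s)."""
    return (1.0 - math.exp(-beta * s_cm)) / 2.0

def theta_ri_sib_mating(theta):
    """Assumed Haldane-Waddington map for RI lines from repeated sib mating."""
    return 4.0 * theta / (1.0 + 6.0 * theta)

def beta_from_map(ri_map, s_cm=1e-4):
    """Coefficient of s in 1 - 2*theta_RI, estimated at a small distance s."""
    correlation = 1.0 - 2.0 * ri_map(theta_single_meiosis(s_cm))
    return (1.0 - correlation) / s_cm

if __name__ == "__main__":
    print(beta_from_map(theta_ri_sib_mating))
```

A map for selfing can be passed to `beta_from_map` in the same way to check the answer to the last question.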
