
The function "power.marker" implements Formula (6.2):

> power.marker <- function(z,beta,Delta,xi)
+ {
+ nu <- Nu(z*sqrt(2*beta*Delta))
+ return(1-pnorm(z-xi) +

Applying this approximation we get:

> z <- 3.56; beta <- 0.02; Delta <- 10; xi <- 4

> power.marker(z,beta,Delta,xi)
[1] 0.7194996

Compare this to the probability of 0.7044, which was obtained via simulation.
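Both power functions in this section call `Nu`, which is defined earlier in the book and not reproduced here. If you want to run these lines in isolation, you can substitute the minimal sketch below. It assumes the commonly used closed-form approximation to Siegmund's overshoot function ν and may differ slightly from the book's own definition.

```r
# Assumed stand-in for the book's Nu(): the standard closed-form
# approximation
#   nu(x) ~ (2/x)(Phi(x/2) - 0.5) / ((x/2) Phi(x/2) + phi(x/2)).
# This is not necessarily the book's code.
Nu <- function(x) {
  y <- x / 2
  (2 / x) * (pnorm(y) - 0.5) / (y * pnorm(y) + dnorm(y))
}

# The argument used by power.marker for z = 3.56, beta = 0.02, Delta = 10
Nu(3.56 * sqrt(2 * 0.02 * 10))
```

The function tends to 1 as its argument tends to 0 (dense markers) and decreases as the spacing grows.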

The worst-case scenario is a QTL midway between markers. The formula corresponding to (6.2) is much more complex in this case, since it involves conditioning on the values of the process Z at both flanking markers. We omit the expression, but use the function "power.midway" to approximate the power in this case.
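To show what conditioning on both flanking markers involves, here is a minimal sketch (ours, not the book's) of the simplest ingredient: the one-sided probability that at least one of the two flanking markers reaches z. Under the model in "power.midway", each flanking marker has mean ξ exp(−βΔ/2) and the two marker statistics have correlation exp(−βΔ). Conditioning on the left marker reduces the bivariate probability to a one-dimensional integral. This is the quantity 1 − term1 in "power.midway", where the book truncates the integral at ul = 5. The helper name `flank.exceed` is hypothetical.

```r
# Hypothetical helper: P(at least one of the two markers flanking a midway
# QTL has Z >= z). Crossings at other markers are ignored.
flank.exceed <- function(z, beta, Delta, xi) {
  m   <- xi * exp(-beta * Delta / 2)  # mean of Z at each flanking marker
  rho <- exp(-beta * Delta)           # correlation between the two markers
  s   <- sqrt(1 - rho^2)              # conditional SD of right given left
  # Left marker: Z1 = m + u with u ~ N(0,1). Given u, the right marker is
  # N(m + rho*u, s^2), so P(Z2 < z | u) = pnorm((z - m - rho*u)/s).
  integrand <- function(u) dnorm(u) * pnorm((z - m - rho * u) / s)
  both.below <- integrate(integrand, -Inf, z - m)$value
  1 - both.below
}

# Monte Carlo check with the parameters used in the text
set.seed(1)
n   <- 10^5
m   <- 4 * exp(-0.02 * 10 / 2)
rho <- exp(-0.02 * 10)
u1  <- rnorm(n)
u2  <- rho * u1 + sqrt(1 - rho^2) * rnorm(n)
mc  <- mean(pmax(m + u1, m + u2) >= 3.56)
c(integral = flank.exceed(3.56, 0.02, 10, 4), simulated = mc)
```

The remaining terms of "power.midway" add the chance of crossing z at markers beyond the flanking pair. They are weighted by ν through `nu`.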

> power.midway <- function(z,beta,Delta,xi)
+ {
+ ul <- 5 # upper limit for the numerical integrals
+ nu <- Nu(z*sqrt(2*beta*Delta))
+ zz <- z - xi*exp(-beta*Delta/2) # threshold minus mean at a flanking marker
+ cc <- sqrt(1 - exp(-2*beta*Delta)) # conditional SD between the markers
+ # term1: probability that both flanking markers stay below z
+ fun1 <- function(x,beta,Delta,zz,cc) dnorm(zz-x)*
+ pnorm((zz-exp(-beta*Delta)*(zz-x))/cc)
+ term1 <- integrate(fun1,0,ul,beta=beta,
+ Delta=Delta,zz=zz,cc=cc)
+ fun2 <- function(x,z,beta,Delta,zz,cc) exp(-z*x)*
+ dnorm(zz-x)*pnorm((zz-exp(-beta*Delta)*(zz-x))/cc)
+ term2 <- integrate(fun2,0,ul,z=z,beta=beta,
+ Delta=Delta,zz=zz,cc=cc)
+ fun3 <- function(x,z,beta,Delta,zz,cc) dnorm(zz-x)*
+ exp(-z*x-z*(zz-exp(-beta*Delta)*(zz-x))+z^2*cc/2)*
+ pnorm((zz-exp(-beta*Delta)*(zz-x))/cc-z*cc)
+ term3 <- integrate(fun3,0,ul,z=z,beta=beta,
+ Delta=Delta,zz=zz,cc=cc)
+ return(1-term1$value+2*nu*term2$value-nu^2*term3$value)
+ }

The analytical expression involves integrals. The function "integrate" numerically integrates a function with respect to its first argument. Additional named arguments are passed on to the integrand. The output is a list whose component "value" contains the result of the integration.
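The following small example isolates the two features of "integrate" that "power.midway" relies on. It integrates over the first argument of the integrand while passing extra named arguments through, and it reads the result from the "value" component. The integrand is the N(1,1) density, so the exact answer is known.

```r
# Integrate the N(1,1) density over (0, Inf). The extra argument m is passed
# through integrate() to the integrand, just as power.midway passes beta,
# Delta, zz and cc.
f <- function(x, m) dnorm(x - m)
res <- integrate(f, 0, Inf, m = 1)
res$value                 # numerical result
res$value - pnorm(1)      # deviation from the exact value P(N(1,1) > 0)
```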

In the simulations we obtained a power of 0.6482 when ξ = 4, Δ = 10, and the QTL is located halfway between markers. Compare this probability with the analytical approximation:

> power.midway(z,beta,Delta,xi)
[1] 0.6498196

The power function is the statistical power evaluated over the range of parameter values under the alternative distribution. For a whole-genome scan using the backcross and a given set of markers, these parameters are the location of the QTL and the strength of the signal, i.e., the noncentrality parameter ξ. Let us evaluate the analytical approximations over this range and compare them with simulations.

We start with a QTL located next to a marker in the middle of a chromosome. We consider the backcross design (β = 0.02) and an inter-marker spacing of 10 cM:

> ap.marker <- p.marker <- vector(mode="numeric")
> for(i in 1:length(xi))
+ {
+ Z1 <- add.qtl(Z0[,chr1],beta,markers,q,xi[i])
+ p.marker[i] <- mean(apply(abs(Z1),1,max)>=z)
+ ap.marker[i] <- power.marker(z,beta,Delta,xi[i])
+ }
> plot(c(0,6),c(0,1),type="n",xlab="xi",ylab="Power")
> lines(xi,ap.marker,lty=2)


Fig. 6.3. The power function when the QTL is next to a marker and when it is midway between markers.
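The simulated curves depend on `Z0` and `add.qtl`, which were set up earlier in the book and are not reproduced here. The self-contained sketch below implements the same idea under stated assumptions, which are ours and not the book's:

- The null statistics at equally spaced markers on one chromosome form a stationary Gaussian AR(1) sequence with lag-one correlation exp(−βΔ). This is the Ornstein-Uhlenbeck process sampled every Δ cM.
- A QTL at position q shifts the mean at marker t by ξ exp(−β|t − q|).

The chromosome length L = 100 cM, the function name `sim.power`, and the number of iterations are assumptions as well. The results therefore need not match the book's simulated values of 0.7044 and 0.6482 exactly.

```r
# Hypothetical stand-in for the book's Z0/add.qtl simulation (assumptions:
# one chromosome of length L cM, equally spaced markers, AR(1) null process).
sim.power <- function(z, beta, Delta, xi, q, L = 100, n.iter = 10^4) {
  markers <- seq(0, L, by = Delta)         # marker positions in cM
  k <- length(markers)
  rho <- exp(-beta * Delta)                # correlation of adjacent markers
  # Null statistics: stationary Gaussian AR(1) across markers, i.e. the
  # OU process with covariance exp(-beta*|t-s|) sampled every Delta cM
  Z0 <- matrix(0, n.iter, k)
  Z0[, 1] <- rnorm(n.iter)
  for (j in 2:k)
    Z0[, j] <- rho * Z0[, j - 1] + sqrt(1 - rho^2) * rnorm(n.iter)
  # The QTL at position q shifts the mean at marker t by xi*exp(-beta*|t-q|)
  mu <- xi * exp(-beta * abs(markers - q))
  Z1 <- sweep(Z0, 2, mu, "+")              # add mu[j] to column j
  # Power: fraction of scans whose maximal |Z| reaches the threshold z
  mean(apply(abs(Z1), 1, max) >= z)
}

set.seed(2)
p.on  <- sim.power(3.56, 0.02, 10, 4, q = 50)  # QTL at the marker at 50 cM
p.mid <- sim.power(3.56, 0.02, 10, 4, q = 55)  # QTL midway between 50 and 60 cM
c(on.marker = p.on, midway = p.mid)
```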

Next, let us consider the case of a QTL midway between markers:

> ap.midway <- p.midway <- vector(mode="numeric")
> for(i in 1:length(xi))
+ {
+ Z1 <- add.qtl(Z0[,chr1],beta,markers,q,xi[i])
+ p.midway[i] <- mean(apply(abs(Z1),1,max)>=z)
+ ap.midway[i] <- power.midway(z,beta,Delta,xi[i])
+ }

> legend(0,1,legend=c("on marker,simulate",
+ "on marker,approx.","midway,simulate","midway,approx."),
+ lty=c(1,2,1,2),col=gray(c(0,0,0.5,0.5)))

The resulting plot is displayed in Fig. 6.3. Note the reduction in power when the QTL is not perfectly linked to a marker. The analytical approximation agrees well with the simulated values. The exception is a QTL midway between markers with ξ less than one, where the agreement breaks down. Luckily, the exact evaluation of the power for such low values of the noncentrality parameter is of little practical interest.

### 6.3 Designing an Experiment

Experiments aimed at dissecting the genetic component of traits in mice require substantial investment. It is therefore inadvisable to start such an effort unless one is likely to obtain a successful outcome. Careful planning of the experiment is key in this regard. On the one hand, it ensures that sufficient resources are devoted to the task. On the other hand, the optimal distribution of these resources lowers the chance of wasting both time and money.

The role of statistical experimental design is to identify the minimal requirements for extracting scientifically significant signals in the presence of background noise. It helps to use separate terminology to distinguish between statistical significance and scientific significance.

Statistical significance is a formal term associated with the properties of the random mechanism underlying the background noise. It weighs the strength of the observed signal against what could have been produced in a scenario where no real signal is present. Statistical significance is computed in the context of a null hypothesis, which assumes the absence of any signal. An observed signal can turn out to be statistically significant even if the underlying true signal is very weak. This can occur when the background noise is small relative to the amount of data gathered.

Scientific significance, on the other hand, is not determined by statistical considerations. It reflects the specifics of the particular scientific discipline and is expressed in terms of the strength of the underlying signal. Thus, in experimental genetics we may aim at detecting QTLs that have a strong enough effect on the phenotypic variance. This corresponds to large enough values of the locus-specific heritability. The experiment is designed to discover genetic terms whose effect exceeds a given threshold.

To be more specific, let us consider an experiment using the backcross design. The strength of the genetic effect is given in terms of the locus specific heritability (see Chap. 2):

To design the experiment, we may set a minimal level for this quantity. Based on this level, the specifications of the trial can be determined. In this section we describe the computations that identify the required density of genotyped markers and the required sample size. Together, these ensure a reasonably large chance of detecting this minimal level of signal. We carry out these computations in reverse order. First, for each inter-marker spacing, we determine the noncentrality parameter that ensures the minimal statistical power. Second, we determine the sample size associated with this noncentrality parameter. Finally, we select the design that minimizes the overall cost.

### Determining the Noncentrality Parameter

Thousands of polymorphic markers, scattered throughout the mouse genome, are available for use. Hundreds of thousands, and even millions, more are expected as additional SNP markers are identified. Although not all markers are polymorphic between a given pair of inbred strains, the availability of genetic markers is typically not a limiting factor. Consequently, in principle, we can consider any density of markers. However, to simplify the computations, we analyze only four inter-marker spacings here: Δ = 20, 10, 5, or 1 cM.

For a fair comparison, we require all designs to have the same significance level, 5%. Consequently, the thresholds vary with the spacing. Our earlier computations of the significance level showed that these thresholds are approximately z = 3.46, 3.56, 3.65, and 3.78, respectively.

Let us use the root finder "uniroot" and apply it to the function "power.midway" in order to identify the value of the noncentrality parameter that produces a power of 85% for each one of the designs. Note that the power is computed for a QTL between markers. This makes the conditions less favorable for designs with larger inter-marker spacings:

> ap <- function(xi,z,beta,Delta,p)
+ power.midway(z,beta,Delta,xi)-p

+ xi[i] <- uniroot(ap,interval=c(4,6),z=z[i],beta=beta,

The last row gives target values for the noncentrality parameter for each of the indicated inter-marker spacings.
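The root-finding step can be checked in isolation. The sketch below replaces "power.midway" with a hypothetical stand-in: the leading term 1 − Φ(z − ξ) of the marker approximation, which is the probability that the statistic at a single fully linked marker exceeds z. For this stand-in the target is known in closed form, ξ = z + Φ⁻¹(p). This lets us verify that "uniroot", with extra arguments passed through, finds the correct root. The resulting numbers are therefore not the targets reported in the text, which use the full midway approximation.

```r
# Hypothetical stand-in for power.midway: only the leading term 1 - pnorm(z - xi)
lead.power <- function(xi, z) 1 - pnorm(z - xi)
# uniroot() searches over the first argument; z and p are passed through
gap <- function(xi, z, p) lead.power(xi, z) - p
z <- c(3.46, 3.56, 3.65, 3.78)   # 5% thresholds for Delta = 20, 10, 5, 1 cM
xi.target <- sapply(z, function(zi)
  uniroot(gap, interval = c(4, 6), z = zi, p = 0.85, tol = 1e-8)$root)
rbind(z = z, xi = xi.target, closed.form = z + qnorm(0.85))
```

The interval c(4, 6) used in the text brackets the root here as well, because the stand-in power is below 0.85 at ξ = 4 and above it at ξ = 6 for all four thresholds.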

### Determining the Sample Size

Next we turn to the determination of the sample sizes. Recall the definition of the noncentrality parameter:
