## Statistical Power

The formulation of the hypotheses in hypothesis testing is typically based on the scientific characteristics of the phenomena under investigation. In the specification of the test, however, the statistician has the freedom to choose which test statistic to use and how to set the rejection region. In some cases, if statistical considerations are introduced in the design of the trial, the statistician may also have input into the selection of design parameters such as sample size and measurement methods.

Fig. 1.3. Power functions

Statistical tests, regardless of the sample size, the type of the collected measurements, and the form of the test statistic, are usually chosen to have a given significance level. Therefore, the discrimination between different testing procedures cannot be based on the properties of the test statistic under the null distribution alone. Instead, the features of the test statistic under the alternative hypothesis are used in the evaluation of statistical tests. A central feature is the statistical power of the test, which is the probability of rejecting the null hypothesis. This probability can be computed for any distribution permitted by the model. When the distribution is one of those specified by $H_1$, rejection is the correct decision, so high power is preferred in the comparison between alternative tests.
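In symbols, the two requirements can be written side by side. The notation here is an assumption introduced for illustration and is not taken from the text: $T$ is the test statistic, $R$ the rejection region, $\theta$ indexes the distributions permitted by the model, and $\alpha$ is the significance level.

```latex
\[
  \beta(\theta) = P_\theta(T \in R), \qquad
  \beta(\theta) \le \alpha \ \text{ for } \theta \in H_0 .
\]
```

Under this notation the power of the test at an alternative $\theta \in H_1$ is $\beta(\theta)$, and two tests with the same $\alpha$ are compared by their values of $\beta(\theta)$ over $H_1$.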

Consider again the binomial example. We have assumed that the probability $p$ of identity by descent (IBD) under the alternative hypothesis is larger than one-half and that it reflects the strength of the genetic effect. (See Chap. 9 for a detailed discussion.) Hence, the power is equal to the probability that a binomial random variable with probability of success $p > 1/2$ exceeds the threshold. The threshold is set with respect to $p = 1/2$ for a sample of size 100. A question of potential interest is how much power we would lose if we collected a smaller sample, say of size 36.
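The power in the binomial example can be computed directly in R. The following is a minimal sketch, assuming a target significance level of 0.05 and a threshold taken as the exact binomial quantile under a success probability of 1/2. The text does not state the thresholding rule at this point, so a different rule (for instance, one based on the normal approximation) would give slightly different numbers. The function name `power_binom` is hypothetical.

```r
# Power of the one-sided binomial test that rejects when the count of
# successes exceeds a threshold chosen under success probability 1/2.
# Assumption: the threshold is the exact (1 - alpha) binomial quantile.
power_binom <- function(p, n, alpha = 0.05) {
  threshold <- qbinom(1 - alpha, size = n, prob = 0.5)
  # P(X > threshold) when X ~ Binomial(n, p)
  pbinom(threshold, size = n, prob = p, lower.tail = FALSE)
}

# Threshold and attained significance level for a sample of size 100.
threshold_100 <- qbinom(0.95, size = 100, prob = 0.5)
size_100 <- power_binom(0.5, n = 100)

# Power at an illustrative alternative (0.6 is an arbitrary choice)
# for the two sample sizes discussed in the text.
power_100 <- power_binom(0.6, n = 100)
power_36 <- power_binom(0.6, n = 36)
print(c(n100 = power_100, n36 = power_36))
```

Because the binomial distribution is discrete, the attained significance level `size_100` falls somewhat below the target of 0.05 under this thresholding rule.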

An answer can be found with the aid of the power function. The plot of the power function for a sample of size 100 (solid line) and a sample of size 36 (broken line) can be found in Fig. 1.3. This figure is generated by simple R code, which we will present below. The power function represents the statistical power as a function of the parameters that determine the distribution of the test statistic under the alternative hypothesis. In the case we consider, the parameter is the probability of success.

Observe that when the parameter approaches the value of 0.5, its value under the null hypothesis, the power function converges to the significance level of the test. (The target significance of 0.05 is indicated in the figure by the lower dashed line.) On the other hand, the larger the parameter p, the closer the power is to its maximal value of one. Typically, the larger the sample size, the higher the power. In general, this can also be observed in the current plot of the power functions. (Can you give an explanation for the fact that the order is reversed in this figure for smaller values of the success probability?)

An assessment of the loss in power can be carried out, for example, by identifying the minimal IBD probability that can be detected with at least a given power, say 85%. Refer to the upper dashed line. If the sample size is 100, then that minimal probability is about 0.6. If the sample size is 36, then the probability is closer to 0.7. The decision as to which sample size to prefer would ideally depend on the relative importance of detecting smaller genetic effects compared to the increase in cost and workload associated with the recruitment and assessment of a larger sample.
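The minimal detectable IBD probability can also be found numerically rather than read off the plot. The sketch below makes the same assumptions as before (significance level 0.05, exact binomial quantile as threshold) and uses `uniroot` to solve for the success probability at which the power reaches 85%. The helper names `power_binom` and `min_detectable_p` are hypothetical, and the results depend on the thresholding rule, so they need not match the figure exactly.

```r
# Power of the one-sided binomial test (assumed thresholding rule:
# exact (1 - alpha) quantile under success probability 1/2).
power_binom <- function(p, n, alpha = 0.05) {
  threshold <- qbinom(1 - alpha, size = n, prob = 0.5)
  pbinom(threshold, size = n, prob = p, lower.tail = FALSE)
}

# Smallest success probability detected with power of at least `target`.
# The power is continuous and increasing in p, so a root search suffices.
min_detectable_p <- function(n, target = 0.85, alpha = 0.05) {
  uniroot(function(p) power_binom(p, n, alpha) - target,
          interval = c(0.5, 0.999), tol = 1e-8)$root
}

p_min_100 <- min_detectable_p(100)
p_min_36 <- min_detectable_p(36)
print(round(c(n100 = p_min_100, n36 = p_min_36), 3))
```

The two printed values can be compared with the probabilities read off the upper dashed line in Fig. 1.3.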

Let us consider the code that generated Fig. 1.3:

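The listing itself is not reproduced in this copy. As a stand-in, the following is a minimal sketch of R code that draws a comparable plot. It is not the authors' listing. It assumes a significance level of 0.05 and a threshold equal to the exact binomial quantile under a success probability of 1/2, and the grid of success probabilities, the line types, and the name `power_binom` are illustrative choices. With a different thresholding rule the curves may differ near 0.5, including in their relative order.

```r
# Power of the one-sided binomial test (assumed thresholding rule:
# exact (1 - alpha) quantile under success probability 1/2).
power_binom <- function(p, n, alpha = 0.05) {
  threshold <- qbinom(1 - alpha, size = n, prob = 0.5)
  pbinom(threshold, size = n, prob = p, lower.tail = FALSE)
}

# Grid of success probabilities under the alternative.
p <- seq(0.5, 1, by = 0.01)
power_100 <- power_binom(p, n = 100)
power_36 <- power_binom(p, n = 36)

# Solid line: n = 100. Broken line: n = 36.
plot(p, power_100, type = "l", lty = 1, ylim = c(0, 1),
     xlab = "probability of success", ylab = "power")
lines(p, power_36, lty = 2)

# Reference lines: target significance (0.05) and target power (0.85).
abline(h = c(0.05, 0.85), lty = 3)
legend("bottomright", legend = c("n = 100", "n = 36"), lty = c(1, 2))
```

Computing the two power vectors first and plotting them afterwards keeps the numerical part reusable, for example for tabulating the power at selected values of `p`.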