## Problems

In the following exercises you are asked to simulate test statistics in various settings.

1.1. The first problem deals with the situation described in Sect. 1.5. Consider two binomial random variables $X_1$ and $X_2$ such that $X_1 \sim B(n_1, p_1)$ and $X_2 \sim B(n_2, p_2)$. One is interested in testing the null hypothesis $H_0: p_1 = p_2$ versus the alternative $H_1: p_1 \neq p_2$. A standard test considers the difference between the sample proportions $\hat{p}_1 = X_1/n_1$ and $\hat{p}_2 = X_2/n_2$, standardized with the aid of the pooled sample proportion $\hat{p} = (X_1 + X_2)/(n_1 + n_2)$. The resulting test statistic is given in the form

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}},$$

which has approximately a standard normal distribution under the null hypothesis when $n_1$ and $n_2$ are large.
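As a small worked example, the statistic can be computed for a single pair of counts. The sketch below assumes the usual pooled form, in which the difference of the sample proportions is divided by $\sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}$. The function names `pooled_z` and `two_sided_p_value` and the counts in the example call are hypothetical, chosen only for illustration.

```python
import math


def pooled_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic for H0: p1 = p2."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_hat = (x1 + x2) / (n1 + n2)  # pooled sample proportion
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    if se == 0.0:
        # All successes or all failures: the statistic is 0/0, hence undefined.
        return float("nan")
    return (p1_hat - p2_hat) / se


def two_sided_p_value(z):
    """P(|Z| > |z|) under the standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical data: 30 successes out of 100 versus 45 out of 100.
z = pooled_z(30, 100, 45, 100)
print(z, two_sided_p_value(z))
```

Returning `nan` for degenerate samples is one possible convention; in a simulation such samples would need to be handled explicitly, for example by counting them as non-rejections.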

(a) Investigate, using simulations, the distribution of this test statistic under the null hypothesis. Consider various values of the parameters $n_1$, $n_2$, and $p = p_1 = p_2$. A famous rule of thumb to determine what it means for $n_1$ and $n_2$ to be large requires that $\min\{n_1 p,\, n_1(1 - p),\, n_2 p,\, n_2(1 - p)\} > 5$. Evaluate this rule of thumb.

(b) Investigate numerically the power properties of this test as a function of the difference $p_1 - p_2$.

(c) Investigate numerically, for fixed values of $p_1 \neq p_2$, the power of the test as a function of the sample size. Specifically, for a given $n$, let $n_1 = \beta n$ and $n_2 = (1 - \beta)n$, $0 < \beta < 1$. What value of $\beta$ would maximize the statistical power?
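One possible starting point for the simulations in this problem is sketched below. It estimates the rejection probability of the two-sided test by Monte Carlo, both under the null hypothesis (part a) and under an alternative with the total sample size split according to a fraction `beta` (part c). Several choices are assumptions of the sketch rather than part of the exercise: the 5% two-sided level, the illustrative values $p_1 = 0.3$ and $p_2 = 0.5$, the number of replications, the helper names, and the convention of counting degenerate samples as non-rejections.

```python
import math
import random

Z_CRIT = 1.959964  # two-sided 5% critical value of the standard normal


def rbinom(rng, n, p):
    """Draw one B(n, p) variate as a sum of n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


def pooled_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic for H0: p1 = p2."""
    p_hat = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    # Assumption: degenerate samples (se == 0) are treated as non-rejections.
    return 0.0 if se == 0.0 else (x1 / n1 - x2 / n2) / se


def rejection_rate(n1, p1, n2, p2, reps=2000, seed=1):
    """Monte Carlo estimate of P(|Z| > Z_CRIT)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        z = pooled_z(rbinom(rng, n1, p1), n1, rbinom(rng, n2, p2), n2)
        hits += abs(z) > Z_CRIT
    return hits / reps


if __name__ == "__main__":
    # Part (a): estimated significance level under H0, next to the rule of thumb.
    for n1, n2, p in [(10, 10, 0.1), (30, 30, 0.2), (100, 100, 0.5)]:
        rule_ok = min(n1 * p, n1 * (1 - p), n2 * p, n2 * (1 - p)) > 5
        print(n1, n2, p, rule_ok, rejection_rate(n1, p, n2, p))

    # Part (c): power as a function of the split beta, for fixed n.
    n = 200
    for beta in (0.1, 0.3, 0.5, 0.7, 0.9):
        n1 = round(beta * n)
        print(beta, rejection_rate(n1, 0.3, n - n1, 0.5))
```

Comparing the estimated level with the nominal 5% across the parameter grid gives a direct check of the rule of thumb. Part (b) can reuse `rejection_rate` by holding the sample sizes fixed and varying `p1 - p2`.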

1.2. Consider the following generalization of the regression model that was introduced in Sect. 1.7:

$$y = \mu + \alpha_1 x_1 + \alpha_2 x_2 + \varepsilon.$$

(This model would be appropriate for simultaneously testing for a relation between a pair of genes on different chromosomes and the phenotype $y$. See Chap. 2.) Again, assume that each gene has two alleles and denote by $x_1$ and $x_2$ the count of variant alleles in the first gene and in the second gene, respectively. Assume that $x_1 \sim B(2, p_1)$ and $x_2 \sim B(2, p_2)$ and that the two are independent. A reasonable test statistic may take the form

$U = n\hat{\rho}_1^2 + n\hat{\rho}_2^2$, where $\hat{\rho}_i$ is the sample correlation coefficient between $y$ and $x_i$, for $i = 1, 2$. Under the null hypothesis of no relation with either gene, the asymptotic distribution of the given statistic is chi-square on two degrees of freedom.
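To make the definition concrete, the sketch below computes the statistic for one small data set, assuming it equals $n$ times the sum of the two squared sample (Pearson) correlations between the phenotype and each allele count. The helper names and the toy data are hypothetical. The p-value uses the fact that the upper tail of a chi-square distribution on two degrees of freedom is $e^{-u/2}$.

```python
import math


def pearson(u, v):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    if suu == 0.0 or svv == 0.0:
        # Assumption: a constant sequence is given correlation 0.
        return 0.0
    return suv / math.sqrt(suu * svv)


def u_statistic(y, x1, x2):
    """n * r(y, x1)^2 + n * r(y, x2)^2."""
    n = len(y)
    return n * pearson(y, x1) ** 2 + n * pearson(y, x2) ** 2


def chi2_2df_p_value(u):
    """Upper tail probability of the chi-square distribution on 2 df."""
    return math.exp(-u / 2.0)


# Hypothetical toy data: allele counts in {0, 1, 2} and a numeric phenotype.
x1 = [0, 1, 2, 1, 0, 2, 1, 1]
x2 = [1, 1, 0, 2, 1, 0, 2, 1]
y = [0.3, 1.1, 2.4, 0.9, -0.2, 1.8, 1.5, 0.7]
u = u_statistic(y, x1, x2)
print(u, chi2_2df_p_value(u))
```

With only eight observations the asymptotic approximation is not expected to be accurate; the example is meant to show the arithmetic, not a realistic analysis.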

(a) Investigate, using simulations, the distribution of this test statistic under the null hypothesis and compare it to the theoretical asymptotic distribution. Consider various values of the sample size $n$ and of the allele frequencies $p_1$ and $p_2$.

(b) Investigate numerically the power of this test for $p_1 = p_2 = 1/2$, $n = 200$, and different values of $\alpha_1$ and $\alpha_2$.

(c) Compare the statistical properties, i.e., rejection threshold and power function, of the statistic considered above with the statistic for a single gene that is described in Sect. 1.7. Describe conditions under which you would prefer one or the other of the tests.
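A possible skeleton for the simulations in parts (a) and (b) follows. It is a sketch under stated assumptions: phenotypes are generated as $y = \alpha_1 x_1 + \alpha_2 x_2 + e$ with a standard normal error $e$ and zero intercept (the correlations do not depend on the intercept), the test is run at the 5% level against the chi-square threshold on two degrees of freedom, and the effect sizes, replication counts, and helper names are illustrative only.

```python
import math
import random

# Upper 5% point of the chi-square distribution on 2 df (about 5.99),
# obtained from its tail probability exp(-u / 2).
CHI2_2DF_CRIT = -2.0 * math.log(0.05)


def pearson(u, v):
    """Sample Pearson correlation; constant input is given correlation 0."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    if suu == 0.0 or svv == 0.0:
        return 0.0
    return suv / math.sqrt(suu * svv)


def allele_counts(rng, n, p):
    """n independent B(2, p) draws, each the sum of two Bernoulli(p) trials."""
    return [(rng.random() < p) + (rng.random() < p) for _ in range(n)]


def simulate_u(rng, n, p1, p2, a1, a2):
    """Generate one data set under the assumed model and return U."""
    x1 = allele_counts(rng, n, p1)
    x2 = allele_counts(rng, n, p2)
    y = [a1 * g1 + a2 * g2 + rng.gauss(0.0, 1.0) for g1, g2 in zip(x1, x2)]
    return n * pearson(y, x1) ** 2 + n * pearson(y, x2) ** 2


def rejection_rate(n, p1, p2, a1, a2, reps=500, seed=1):
    """Monte Carlo estimate of P(U > CHI2_2DF_CRIT)."""
    rng = random.Random(seed)
    hits = sum(
        simulate_u(rng, n, p1, p2, a1, a2) > CHI2_2DF_CRIT for _ in range(reps)
    )
    return hits / reps


if __name__ == "__main__":
    # Part (a): estimated level under the null hypothesis (a1 = a2 = 0).
    for n, p1, p2 in [(50, 0.5, 0.5), (200, 0.5, 0.5), (200, 0.1, 0.1)]:
        print(n, p1, p2, rejection_rate(n, p1, p2, 0.0, 0.0))

    # Part (b): power at p1 = p2 = 1/2, n = 200, for a few effect sizes.
    for a1, a2 in [(0.1, 0.1), (0.2, 0.0), (0.2, 0.2), (0.3, 0.3)]:
        print(a1, a2, rejection_rate(200, 0.5, 0.5, a1, a2))
```

For part (c), the same structure can be reused with a single-gene statistic and its own threshold, so that the two power functions are estimated from comparable simulated data.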
