Correlated test statistics

Andrey Ziyatdinov

2017-06-15

1 About

2 Packages

library(devtools)
load_all("~/git/variani/matlm/")

load_all("~/git/variani/qq/")
library(pander)
library(ggplot2)

theme_set(theme_light())

3 Simulation parameters

N <- 2e3
M <- 2e3

seed <- 1

rho <- 0.9

4 Independent predictors

simpred_uncor <- matlm_sim_randpred(seed = seed, N = N, M = M)
assoc_uncor <- with(simpred_uncor, matlm(form, dat, pred = pred))
qq_plot(assoc_uncor$tab$pval)

5 Correlated predictors

simpred_cor <- matlm_sim_randpred(seed = seed, N = N, M = M, rho = rho)
assoc_cor <- with(simpred_cor, matlm(form, dat, pred = pred))
qq_plot(assoc_cor$tab$pval)

5.1 Covariance matrix among predictors

The covariance matrix is predefined in the matlm_sim_randpred function and has a simple form, with ones on the diagonal and rho everywhere else:

cmat <- matlm_sim_randpred(seed = seed, N = N, M = M, rho = rho, ret = "mat")
# number of predictors
M
[1] 2000
# dimensions of the matrix
dim(cmat)
[1] 2000 2000
# a sub-matrix
cmat[1:5, 1:5]
     [,1] [,2] [,3] [,4] [,5]
[1,]  1.0  0.9  0.9  0.9  0.9
[2,]  0.9  1.0  0.9  0.9  0.9
[3,]  0.9  0.9  1.0  0.9  0.9
[4,]  0.9  0.9  0.9  1.0  0.9
[5,]  0.9  0.9  0.9  0.9  1.0
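As an illustration of what such a covariance matrix implies, the sketch below draws predictors with this structure in base R via a Cholesky factor. It is not the matlm_sim_randpred implementation; the names n_obs, n_pred, rho_demo, C_demo, and X_demo, as well as the sample sizes, are assumptions made for the example.

```r
# Hypothetical illustration; not the matlm_sim_randpred implementation.
set.seed(1)
n_obs <- 5000     # number of observations (assumed value)
n_pred <- 5       # number of predictors (kept small for display)
rho_demo <- 0.9   # common pairwise correlation

# Matrix with 1 on the diagonal and rho_demo elsewhere
C_demo <- matrix(rho_demo, n_pred, n_pred)
diag(C_demo) <- 1

# chol() returns the upper-triangular factor R with t(R) %*% R == C_demo,
# so rows of Z %*% R have covariance C_demo when Z has iid N(0, 1) entries
Z <- matrix(rnorm(n_obs * n_pred), n_obs, n_pred)
X_demo <- Z %*% chol(C_demo)

# Sample correlations should be close to rho_demo off the diagonal
cor_demo <- cor(X_demo)
round(cor_demo, 2)
```

With a few thousand observations the off-diagonal sample correlations should fall close to rho_demo.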

5.2 Covariance matrix among test statistics

It can be shown (Joo et al. 2016) that the covariance between the test statistics (t-tests) \(s_i\) and \(s_j\) equals the correlation between the corresponding predictors \(x_i\) and \(x_j\):

\(cov(s_i, s_j) = cor(x_i, x_j)\)

This basic relationship is true for the simplest linear regression model:

\(y = \mu + \beta x_i + e\)

\(e \sim \mathcal{N}(0, \sigma_e^2)\)

In other cases, e.g. related observations (Joo et al. 2016), some modifications are required.
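The relationship can be checked empirically with a small simulation. The following is a minimal sketch in base R, assuming a null trait (beta = 0) and two fixed predictors; the helper t_stat and all variable names and sizes are hypothetical and chosen only for this example.

```r
# Hypothetical check of cov(s_i, s_j) = cor(x_i, x_j) under the null model.
set.seed(1)
n_obs <- 200      # observations per data set (assumed value)
n_rep <- 2000     # number of simulated null traits (assumed value)
rho_demo <- 0.9

# Two predictors with population correlation rho_demo, held fixed across replicates
x1 <- rnorm(n_obs)
x2 <- rho_demo * x1 + sqrt(1 - rho_demo^2) * rnorm(n_obs)

# t statistic of the slope in y = mu + beta * x + e
t_stat <- function(y, x) summary(lm(y ~ x))$coefficients["x", "t value"]

# Under the null, y is pure noise and independent of both predictors
stats <- t(replicate(n_rep, {
  y <- rnorm(n_obs)
  c(s1 = t_stat(y, x1), s2 = t_stat(y, x2))
}))

cov_stats <- cov(stats)[1, 2]   # empirical covariance of the two statistics
cor_pred <- cor(x1, x2)         # sample correlation of the two predictors
c(cov_stats = cov_stats, cor_pred = cor_pred)
```

The two printed values should agree up to simulation error, which shrinks as n_rep grows.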

5.3 Dummy correction of qq-plot

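# expected correlation matrix among test statistics (see 5.2)
C <- matrix(rho, M, M)
diag(C) <- 1

# upper-triangular Cholesky factor: t(ch) %*% ch equals C
ch <- chol(C)
ch_inv <- solve(ch)
s_assoc <- assoc_cor$tab$zscore
# decorrelate the z-scores (treated as a row vector)
s_corrected <- as.numeric(s_assoc %*% ch_inv)

# two-sided p-values from the decorrelated z-scores
pvals_corrected <- pchisq(s_corrected^2, 1, lower.tail = FALSE)
qq_plot(pvals_corrected)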
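The reason the multiplication by the inverse Cholesky factor removes the correlation can be seen on a small example. The sketch below assumes the statistics are multivariate normal with a known covariance matrix; m_demo, n_draw, C_demo, R, S, and the other names are hypothetical and used only here.

```r
# Hypothetical illustration of Cholesky whitening on a small covariance matrix.
set.seed(1)
m_demo <- 4       # number of statistics (assumed value)
n_draw <- 20000   # number of simulated statistic vectors (assumed value)
rho_demo <- 0.9

C_demo <- matrix(rho_demo, m_demo, m_demo)
diag(C_demo) <- 1
R <- chol(C_demo)   # upper triangular, t(R) %*% R == C_demo

# Each row is one draw of a correlated statistic vector s ~ N(0, C_demo)
S <- matrix(rnorm(n_draw * m_demo), n_draw, m_demo) %*% R

# Whitening: S %*% solve(R) has covariance
# solve(t(R)) %*% C_demo %*% solve(R), which is the identity matrix
S_white <- S %*% solve(R)
round(cov(S_white), 2)

# backsolve() gives the same result without forming the inverse:
# with transpose = TRUE it solves t(R) %*% x = s for each row s of S
S_white2 <- t(backsolve(R, t(S), transpose = TRUE))
```

For a large matrix, the backsolve() variant may be preferable because it avoids computing an explicit inverse, although both give the same values here.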

6 Permutation tests for correlated predictors

Conneely and Boehnke (2007) introduced \(P_{act}\), a p-value adjusted for correlated tests, and compared it with the permutation-based \(P_{perm}\), which they describe as follows:

To calculate \(P_{perm}\), we first created 1,000 permutations of the original data by randomly shuffling individual genotype vectors while leaving the trait data and any covariates intact. In this way, the permuted samples simulated the null hypothesis of no association but maintained the original correlation between genotypes, between traits, and between traits and covariates. We tested each of these 1,000 samples for association and estimated \(P_{perm}\) as the proportion of samples with a \(P_{min}\) value as low as that observed in the original data.

rho    pact  N_effective    N  pval_BF  alpha
  0  0.1637          500  500    1e-04   0.05
0.3  0.1816          451  500    1e-04   0.05
0.6  0.2116          387  500    1e-04   0.05
0.9  0.2395          342  500    1e-04   0.05

In our simulations, we computed \(P_{act}\) as an alternative to the Bonferroni correction, while varying the value of rho.
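The permutation procedure quoted above can be sketched in base R as follows. This is a minimal illustration under stated assumptions rather than the code used for the table: the predictors are simulated with a common correlation, the trait is null, each predictor is tested marginally, and the helper min_pval and all variable names and sizes are hypothetical.

```r
# Hypothetical sketch of a permutation-based P_perm for correlated predictors.
set.seed(1)
n_obs <- 200      # observations (assumed value)
n_pred <- 20      # number of tests (assumed value)
n_perm <- 200     # the quoted procedure used 1,000; fewer here to keep it fast
rho_demo <- 0.6

# Correlated predictors and a null trait
C_demo <- matrix(rho_demo, n_pred, n_pred)
diag(C_demo) <- 1
X <- matrix(rnorm(n_obs * n_pred), n_obs, n_pred) %*% chol(C_demo)
y <- rnorm(n_obs)

# Smallest marginal p-value across predictors, each tested as y ~ x_j;
# for simple regression the slope t-test equals the correlation test
min_pval <- function(y, X) {
  r <- as.numeric(cor(y, X))
  df <- length(y) - 2
  t_stat <- r * sqrt(df / (1 - r^2))
  min(2 * pt(abs(t_stat), df, lower.tail = FALSE))
}

p_min_obs <- min_pval(y, X)

# Shuffle whole rows of X: keeps the correlation among predictors
# but breaks any link between predictors and the trait
p_min_perm <- replicate(n_perm, min_pval(y, X[sample(n_obs), , drop = FALSE]))

# Proportion of permuted data sets with a minimum p-value as low as observed
p_perm <- mean(p_min_perm <= p_min_obs)

# Bonferroni-adjusted minimum p-value, for comparison
p_bonf <- min(1, n_pred * p_min_obs)
c(p_min = p_min_obs, p_perm = p_perm, p_bonf = p_bonf)
```

Permuting entire rows, rather than each column separately, is what preserves the correlation structure among predictors in the permuted data sets.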

References

Conneely, Karen N, and Michael Boehnke. 2007. “So Many Correlated Tests, so Little Time! Rapid Adjustment of P Values for Multiple Correlated Tests.” The American Journal of Human Genetics 81 (6). Elsevier: 1158–68.

Joo, Jong Wha J, Farhad Hormozdiari, Buhm Han, and Eleazar Eskin. 2016. “Multiple Testing Correction in Linear Mixed Models.” Genome Biology 17 (1). BioMed Central: 62.