1 About

1.1 Links

R script demo/correlared-predictors.R

2 Packages

library(devtools)
load_all("~/git/variani/matlm/")

load_all("~/git/variani/qq/")

library(pander)
library(ggplot2)

theme_set(theme_light())

3 Simulation parameters

N <- 2e3
M <- 2e3

seed <- 1

rho <- 0.9

4 Independent predictors

simpred_uncor <- matlm_sim_randpred(seed = seed, N = N, M = M)

assoc_uncor <- with(simpred_uncor, matlm(form, dat, pred = pred))

qq_plot(assoc_uncor$tab$pval)

5 Correlated predictors

simpred_cor <- matlm_sim_randpred(seed = seed, N = N, M = M, rho = rho)

assoc_cor <- with(simpred_cor, matlm(form, dat, pred = pred))

qq_plot(assoc_cor$tab$pval)

5.1 Covariance matrix among predictors

The covariance matrix is pre-defined in the matlm_sim_randpred function and has a simple form:

diagonal entries are 1;
off-diagonal entries are rho (equal here to 0.9).

cmat <- matlm_sim_randpred(seed = seed, N = N, M = M, rho = rho, ret = "mat")

# number of predictors
M

[1] 2000

# dimensions of the matrix
dim(cmat)

[1] 2000 2000

# a sub-matrix
cmat[1:5, 1:5]

     [,1] [,2] [,3] [,4] [,5]
[1,]  1.0  0.9  0.9  0.9  0.9
[2,]  0.9  1.0  0.9  0.9  0.9
[3,]  0.9  0.9  1.0  0.9  0.9
[4,]  0.9  0.9  0.9  1.0  0.9
[5,]  0.9  0.9  0.9  0.9  1.0
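The internals of matlm_sim_randpred are not shown here. As a minimal sketch, assuming the predictors are drawn from a multivariate normal distribution with this covariance matrix, they could be simulated in base R as follows. The names n, m, C_small and X are illustrative only, and the sizes are kept small.

```r
set.seed(1)

n <- 1000   # number of observations (illustrative)
m <- 5      # number of predictors (illustrative)
rho <- 0.9

# covariance matrix: 1 on the diagonal, rho elsewhere
C_small <- matrix(rho, m, m)
diag(C_small) <- 1

# rows of X are draws from N(0, C_small):
# independent normals multiplied by the upper-triangular Cholesky factor
X <- matrix(rnorm(n * m), n, m) %*% chol(C_small)

# the sample correlations should be close to rho
round(cor(X), 2)
```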

5.2 Covariance matrix among test statistics

It can be shown (Joo et al. 2016) that the covariance between the test statistics (t-statistics) \(s_i\) and \(s_j\) equals the correlation between the corresponding predictors \(x_i\) and \(x_j\):

\(cov(s_i, s_j) = cor(x_i, x_j)\)

This basic relationship is true for the simplest linear regression model:

\(y = \mu + \beta x_i + e\)

\(e \sim \mathcal{N}(0, \sigma_e^2)\)

In other cases, e.g. related observations (Joo et al. 2016), some modifications are required.
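The relationship for the simple regression model can be checked empirically. The sketch below is not part of the matlm package: it fixes two correlated predictors, repeatedly simulates a null outcome, and compares the covariance of the two slope t-statistics with the predictor correlation. All names and sizes are illustrative assumptions.

```r
set.seed(1)

n <- 200        # number of observations
rho <- 0.9      # target correlation between the two predictors
n_rep <- 2000   # number of simulated null outcomes

# two predictors with correlation close to rho, fixed across replicates
x1 <- rnorm(n)
x2 <- rho * x1 + sqrt(1 - rho^2) * rnorm(n)

# t-statistic of the slope in y ~ x, in closed form from the correlation
tstat <- function(x, y) {
  r <- cor(x, y)
  r * sqrt((length(y) - 2) / (1 - r^2))
}

# one row per replicate, one column per predictor
stats <- t(replicate(n_rep, {
  y <- rnorm(n)  # null model: y is independent of both predictors
  c(tstat(x1, y), tstat(x2, y))
}))

cor_x <- cor(x1, x2)       # correlation among predictors
cov_s <- cov(stats)[1, 2]  # covariance among test statistics

c(cor_x = cor_x, cov_s = cov_s)
```

The two numbers should agree up to Monte Carlo error, which shrinks as n_rep grows.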

5.3 Dummy correction of qq-plot

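# correlation matrix among predictors, and hence among test statistics
C <- matrix(rho, M, M)
diag(C) <- 1

# upper-triangular Cholesky factor: t(ch) %*% ch equals C
ch <- chol(C)
ch_inv <- solve(ch)

# decorrelate the z-scores: s_assoc %*% ch_inv has identity covariance
s_assoc <- assoc_cor$tab$zscore
s_corrected <- as.numeric(s_assoc %*% ch_inv)

# p-values from the squared, decorrelated z-scores (chi-squared, 1 df)
pvals_corrected <- pchisq(s_corrected^2, 1, lower.tail = FALSE)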

qq_plot(pvals_corrected)
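Why multiplying by the inverse Cholesky factor decorrelates the statistics can be verified on a small matrix. If a row vector s has covariance C and t(R) %*% R equals C, then s %*% solve(R) has covariance t(solve(R)) %*% C %*% solve(R), which is the identity. A minimal check, with illustrative names and a 5 x 5 matrix:

```r
m <- 5
rho <- 0.9

C_small <- matrix(rho, m, m)
diag(C_small) <- 1

R <- chol(C_small)               # upper triangular, t(R) %*% R == C_small
R_inv <- backsolve(R, diag(m))   # inverse of R, exploiting its triangular form

# covariance of the transformed statistics s %*% R_inv
V <- t(R_inv) %*% C_small %*% R_inv

# largest absolute deviation from the identity matrix
max_dev <- max(abs(V - diag(m)))
max_dev
```

For a triangular factor, backsolve gives the same result as solve and avoids a general-purpose inversion, which may matter when the matrix is 2000 x 2000.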

6 Permutation tests for correlated predictors

Conneely and Boehnke (2007) introduced \(P_{act}\) and compared it with a permutation-based p-value, \(P_{perm}\), which they describe as follows:

To calculate \(P_{perm}\), we first created 1,000 permutations of the original data by randomly shuffling individual genotype vectors while leaving the trait data and any covariates intact. In this way, the permuted samples simulated the null hypothesis of no association but maintained the original correlation between genotypes, between traits, and between traits and covariates. We tested each of these 1,000 samples for association and estimated \(P_{perm}\) as the proportion of samples with a \(P_{min}\) value as low as that observed in the original data
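The permutation scheme quoted above can be sketched in base R. This is an illustration under stated assumptions, not the code behind the results in this document: the predictors are simulated with a common pairwise correlation, the trait is simulated under the null, and the sizes are deliberately small. Rows of the predictor matrix are shuffled jointly, so the correlation among predictors is preserved while any link to the trait is broken.

```r
set.seed(1)

n <- 100        # sample size (illustrative)
m <- 20         # number of predictors (illustrative)
rho <- 0.9
n_perm <- 200   # number of permutations (illustrative)

# predictors with pairwise correlation rho, built from a shared component
shared <- rnorm(n)
X <- sqrt(rho) * shared + sqrt(1 - rho) * matrix(rnorm(n * m), n, m)

y <- rnorm(n)   # trait simulated under the null

# smallest marginal p-value over all predictors (simple regression t-tests)
min_pval <- function(X, y) {
  r <- as.numeric(cor(X, y))
  tstat <- r * sqrt((length(y) - 2) / (1 - r^2))
  min(2 * pt(abs(tstat), df = length(y) - 2, lower.tail = FALSE))
}

p_min_obs <- min_pval(X, y)

# shuffle whole rows of X, leaving y intact
p_min_perm <- replicate(n_perm, min_pval(X[sample(n), , drop = FALSE], y))

# proportion of permutations with a minimum p-value as low as the observed one
p_perm <- mean(p_min_perm <= p_min_obs)

# Bonferroni-adjusted minimum p-value, for comparison
p_bonf <- min(1, m * p_min_obs)

c(p_min_obs = p_min_obs, p_perm = p_perm, p_bonf = p_bonf)
```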

rho	pact	N_effective	N	pval_BF	alpha
0	0.1637	500	500	1e-04	0.05
0.3	0.1816	451	500	1e-04	0.05
0.6	0.2116	387	500	1e-04	0.05
0.9	0.2395	342	500	1e-04	0.05

In our simulations, we computed \(P_{act}\) as an alternative to the Bonferroni correction while varying the value of rho, with the following settings:

the sample size is 500;
the number of predictors is 500;
the number of permutations is 500.
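The pval_BF column in the table is consistent with a Bonferroni threshold, alpha divided by the number of tests. This reading is an assumption, since the source does not define the column. A quick arithmetic check:

```r
alpha <- 0.05    # family-wise significance level (alpha column)
n_tests <- 500   # number of predictors tested (N column)

# Bonferroni per-test threshold
pval_bf <- alpha / n_tests
pval_bf
```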

References

Conneely, Karen N, and Michael Boehnke. 2007. “So Many Correlated Tests, so Little Time! Rapid Adjustment of P Values for Multiple Correlated Tests.” The American Journal of Human Genetics 81 (6). Elsevier: 1158–68.

Joo, Jong Wha J, Farhad Hormozdiari, Buhm Han, and Eleazar Eskin. 2016. “Multiple Testing Correction in Linear Mixed Models.” Genome Biology 17 (1). BioMed Central: 62.

Correlated test statistics

Andrey Ziyatdinov

2017-06-15