This brief post presents a function to implement the free step-down re-sampling p-value adjustment for multiple-testing for regression models. It is an adaptation of the R code presented in Foulkes (2009, pp. 114-119), but it implements the minP method in addition to the maxT method.

# Function

## Required Packages

• `lmtest`: for `coeftest()` function
• `sandwich`: for `vcovHC()` function

## Arguments

The function take four arguments:

• `model`: Regression model (`lm` class object)
• `B`: Number of bootstrapped resamples (default = `10,000`)
• `seed`: Seed for reproducibility (default = `1`)
• `rob`: Use HC3 heteroskedasticity-consistent standard errors (default = `FALSE`)

## Code

```SDadjP.lm = function(model, B = 10000, seed = 1, rob = FALSE) {

# Stop if model not 'lm' object
stopifnot(class(model) == "lm")

# Set seed for reproducibility
set.seed(seed)

# Set robust estimator
est = ifelse(rob == TRUE, "HC3", "const")

# Set covariance matrix
covMat = vcovHC(model, est)

# Get observed t-values
tObs = as.vector(abs(coeftest(model, vcov. = covMat)[-1, 3]))

# Sort observed t-values in decreasing order
tObsSort = as.vector(sort(tObs, decreasing = TRUE))

# Save order of the observed t-values
tOrder = order(tObs)

# Get observed p-values
pObs = as.vector(coeftest(model, vcov. = covMat)[-1, 4])

# Sort observed p-values in increasing order
pObsSort = as.vector(sort(pObs))

# Save order of the observed p-values
pOrder = order(pObs)

# Number of terms
nPred = length(pObs)

# Create blank matrices
tResample = matrix(nrow = B,
ncol = nPred)

pResample = matrix(nrow = B,
ncol = nPred)

# Number of observations
nObs = nrow(model\$model)

# Resampling loop
for (i in 1:B) {

# Resample residuals
Ystar = sample(model\$residuals, size = nObs, replace = TRUE)

# Run model with resampled residuals as DV
ModResample = lm(as.formula(paste0("Ystar~", paste0(labels(terms(model)), collapse = "+"))),
data = model\$model[, -1])

# Covariance matrix of resampled data
covResample = vcovHC(ModResample, est)

# Get resampled t-values based on order of the observed
# t-values and store in matrix
tResample[i, ] = as.vector(abs(coeftest(ModResample,
vcov. = covResample)[-1, 3])[tOrder])

# Get resampled p-values based on order of the observed
# p-values and store in matrix
pResample[i, ] = as.vector(coeftest(ModResample,
vcov. = covResample)[-1, 4][pOrder])

}

# Get the cumulative maxima
Qtmat = t(apply(tResample, 1, cummax))

# Get the cumulative minima
QPmat = t(apply(pResample, 1, cummin))

# Compute adjusted p-values:
# maxT
p.maxT = apply(t(matrix(rep(tObsSort, B), nPred))  QPmat, 2, mean)

# Turn off scientific notation
options(scipen = 999)

# Sort the p-values
p.order.maxT = match(tObs, tObsSort)
p.order.minP = match(pObs, pObsSort)

# Get predictor names
varnames = labels(terms(model))

# Outputs the raw and adjusted p-values
# Create data.frame to output
adjP = data.frame(p.raw = pObs,
p.maxT = p.maxT[p.order.maxT],
p.minP = p.minP[p.order.minP],
row.names = varnames)

# Print data.frame
cat("\n", "Step-down residuals-based resampling p-value adjustment (Westfall & Young, 1993)",
"\n",
"\n", "Number of bootstrapped resamples: ", B,
"\n",
"\n", sep = "")

# Turn back on scientific notation
options(scipen = 0)
}
```

## Example

Let’s recreate the example in Foulkes (2009, pp. 114-119). I’m only running 1,000 bootstrapped re-samples (`B = 1000`) to save processing time, but you should run the default 10,000.

```# Load data from Foulkes (2009) book website
header = T, sep = "\t")
# Attach data
attach(fms)

# Subset data
Actn3Bin = data.frame(actn3_r577x!="TT",
actn3_rs540874!="AA",
actn3_rs1815739!="TT",actn3_1671064!="GG")

# Run linear regression
Mod = lm(NDRM.CH ~ ., data = Actn3Bin)

# Get model summary
summary(Mod)

# Get adjusted p-values
SDadjP.lm(Mod, B = 1000)
```

## Output

```Step-down residuals-based resampling p-value adjustment (Westfall & Young, 1993)

Number of bootstrapped resamples: 1000

p.raw p.maxT p.minP
actn3_r577x.....TT.     0.005204379  0.017  0.017
actn3_rs540874.....AA.  0.356225590  0.329  0.350
actn3_rs1815739.....TT. 0.122159680  0.206  0.220
actn3_1671064.....GG.   0.096425660  0.215  0.219
```

As you can see above, the output displays the:

• model terms
• unadjusted p-values (`p.raw`)
• maxT adjusted p-values (`p.maxT`)
• minP adjusted p-values (`p.minP`).

# Reference

Foulkes, A. S. (2009). Applied Statistical Genetics with R. Springer-Verlag: New York.

This site uses Akismet to reduce spam. Learn how your comment data is processed.