UPDATE: Please see the following post for an all-in-one solution: RePROCESS Model 1

This post was was inspired by Nicholas Michalak’s Novum R-ganum blog posts on reproducing Hayes’ PROCESS Model 1 in R. He has two posts where he presents the R code for examining a continuous × continuous moderation and a dichotomous × continuous moderation.

However, a quick Google search suggests a paucity of information for conducting moderation analyses using a multicategorical moderating variables.

Therefore, this post will outline how to run the PROCESS Model 1 with a multicategorical moderator (M) in R. We will examine how to code the M variable,  simulate some data, run the PROCESS analyses in both SPSS and R, and compare the results from both software packages.

## Coding the multicategorical variable

Before conducting the analyses, the reader should decide how the contrasts for the moderating variable should be coded. Briefly, you can choose from indicator (or dummy), effects, sequential, and Helmert coding. A detailed discussion of the coding methods is beyond the scope of this post, so the reader is referred to Hayes and Montoya (2017) for further information.

Please note that we will be using indicator (or dummy) coding.

### Indicator coding

For a given variable with k categories, we need to construct k – 1 variables, D1, …, Dk-1. The pattern of values for D1, …, Dk-1 represents the group in which a case resides. For a variable with k = 3 categories, we would need to construct two variables D1  and D(see Table 1). Note that the reference group is coded 0 across all of the indicator variables, which is the first category in the table below.

#### Table 1: Indicator coding of k categories.

 Category D1 D2 D3 … Dk-1 1 0 0 0 … 0 2 1 0 0 … 0 3 0 1 0 … 0 4 0 0 1 … 0 … … … … … 0 k 0 0 0 … 1

### Interaction Terms

While it is common practice to center the continuous predictor variable X prior to generating the interaction terms, Hayes (2013) suggests that this may not be necessary. Nonetheless, we will center the X variable in this example.

Once we have generated the indicator variables, we then center the X variable and multiply it with each of the indicator variables. We will end up with k – 1 interaction terms: XD1, XD2, XD3, …, XDk-1.

## Generate data

We will generate some data for the analyses by adapting  the code from Nicholas Michalak’s blog.

# Load required packages --------------------------------------------------
if (!require(pacman)) {install.packages("pacman")}

pacman::p_load(MASS,   # For the 'mvrnorm function
ggplot, # For plotting interaction effects
ggsci,  # For ggplot2 scientific journal themes
lavaan) # For structural equation modeling

# Generate Data -----------------------------------------------------------
# Set seed to reproduce results
set.seed(12345)

# 900 observations
# 300 state schools, 300 independent schools, 300 catholic schools
# Positive relationship in State schools (r = .25); mean = 1
# Negative relationship in Independent schools (r = -.5); mean = 0
# Negtaive relationship in Catholic schools (r = -.25); mean = 1

dataSim = data.frame(id = 1:900,
school = rep(x = c("State", "Independent", "Catholic"), each = 300),
rbind(mvrnorm(n = 300, mu = c(1,1), Sigma = diag(x = .5, nrow = 2, ncol = 2) + .25),
mvrnorm(n = 300, mu = c(0,0), Sigma = diag(x = 1.5, nrow = 2, ncol = 2) - .5),
mvrnorm(n = 300, mu = c(0,0), Sigma = diag(x = 1.5, nrow = 2, ncol = 2) - .25)))

dataSim = within(dataSim, {

# Centre X1
X1.c = X1 - mean(X1, na.rm = TRUE)

# Indicator (dummy) code the school variable
D1 = ifelse(school == "Independent", 1, 0)
D2 = ifelse(school == "Catholic", 1, 0)

# Create Interaction Terms
X1D1 = X1.c*D1
X1D2 = X1.c*D2

})



## Plot interactions

Now we will plot the interactions using ggplot2.

ggplot(data = dataSim, aes(x = X1.c, y = X2, color = school)) +
geom_point() +
geom_smooth(method = "lm", fullrange = TRUE) +
scale_y_continuous(breaks = seq(-4, 4, 1), limits = c(-4, 4)) +
labs(x = "X1 (Centered)", y = "X2", color = "School") +
theme_bw() +
scale_color_npg()


## Conditional effect of X on Y

We now specify the simple slopes to test the slope at the difference values of the moderator.

Without the interaction terms, we have a regression model of the form:

$\large \breve{Y} = b_{0} + b_{1}X + b_{2}D_{1} + b_{3}D_{2} + \cdots + b_{k+1}D_{k-1}$

With the interaction terms:

$\large \breve{Y} = b_{0} + b_{1}X + b_{2}D_{1} + b_{3}D_{2} + \cdots + b_{k}D_{k-1} + b_{k+1}D_{1}X + b_{k+2}D_{2}X+\cdots +b_{2k-1}D_{k-1}X$

Based on Hayes (2013), the conditional effect of on Y (denoted by $\large \theta _{X\rightarrow Y}$) at each level of the moderator is given by:

$\large \theta _{X\rightarrow Y} = b_{1}+b_{4}D_{1} + b_{5}D_{2} + \cdots + b_{2k-1}D_{2k-1}$

For our three-level moderator, this simplifies to:

$\large \theta _{X\rightarrow Y} = b_{1}+b_{4}D_{1} + b_{5}D_{2}$

To determine the conditional effect for each level of our moderator variable, we simply plug in its D1 and D2 values into the above formula.

1. $\large State = b_{1}+b_{4}(0) + b_{5}(0) = b_{1}$
2. $\large Independent = b_{1}+b_{4}(1) + b_{5}(0) = b_{1} + b_{4}$
3. $\large Catholic = b_{1}+b_{4}(0) + b_{5}(1) = b_{1} + b_{5}$

## Run the model

We now fit the model using the lavaan package.

# Define model
modModel =
' # Regressions
X2 ~ b1*X1.c
X2 ~ b2*D1
X2 ~ b3*D2
X2 ~ b4*X1D1
X2 ~ b5*X1D2

# Covariate(s) - not used in current example
# X2 ~ covariate1 + covariate2 + ... covariateN

# Define simple slopes
state := b1
independent := b1 + b4
catholic := b1 + b5
'
# Use 5000 boostrap resamples (may take a few mintues)
modModelFit = sem(model = modModel,
data = dataSim,
se = "bootstrap",
bootstrap = 5000)

# Get summary results
summary(modModelFit,
fit.measures = FALSE,
standardize = FALSE,
rsquare = TRUE)

# Get boostrapped confidence intervals
parameterEstimates(modModelFit,
boot.ci.type = "bca.simple",
level = .95,
ci = TRUE,
standardized = FALSE)


We get the following bootstrapped estimates:

           lhs op   rhs       label    est    se       z pvalue ci.lower ci.upper
1           X2  ~  X1.c          b1  0.375 0.056   6.691  0.000    0.266    0.486
2           X2  ~    D1          b2 -0.937 0.079 -11.859  0.000   -1.087   -0.774
3           X2  ~    D2          b3 -0.915 0.089 -10.287  0.000   -1.091   -0.741
4           X2  ~  X1D1          b4 -0.890 0.074 -12.036  0.000   -1.027   -0.742
5           X2  ~  X1D2          b5 -0.554 0.080  -6.954  0.000   -0.713   -0.399
6           X2 ~~    X2              0.869 0.045  19.124  0.000    0.788    0.968
7         X1.c ~~  X1.c              1.241 0.000      NA     NA    1.241    1.241
8         X1.c ~~    D1             -0.122 0.000      NA     NA   -0.122   -0.122
9         X1.c ~~    D2             -0.098 0.000      NA     NA   -0.098   -0.098
10        X1.c ~~  X1D1              0.370 0.000      NA     NA    0.370    0.370
11        X1.c ~~  X1D2              0.475 0.000      NA     NA    0.475    0.475
12          D1 ~~    D1              0.222 0.000      NA     NA    0.222    0.222
13          D1 ~~    D2             -0.111 0.000      NA     NA   -0.111   -0.111
14          D1 ~~  X1D1             -0.081 0.000      NA     NA   -0.081   -0.081
15          D1 ~~  X1D2              0.033 0.000      NA     NA    0.033    0.033
16          D2 ~~    D2              0.222 0.000      NA     NA    0.222    0.222
17          D2 ~~  X1D1              0.041 0.000      NA     NA    0.041    0.041
18          D2 ~~  X1D2             -0.066 0.000      NA     NA   -0.066   -0.066
19        X1D1 ~~  X1D1              0.355 0.000      NA     NA    0.355    0.355
20        X1D1 ~~  X1D2             -0.012 0.000      NA     NA   -0.012   -0.012
21        X1D2 ~~  X1D2              0.466 0.000      NA     NA    0.466    0.466
22       state :=    b1       state  0.375 0.056   6.690  0.000    0.266    0.486
23 independent := b1+b4 independent -0.515 0.048 -10.692  0.000   -0.610   -0.418
24    catholic := b1+b5    catholic -0.179 0.057  -3.161  0.002   -0.293   -0.072


Now we will compare lines 22 to 24 of the above output to that from the SPSS PROCESS.

Conditional Effect of Focal Predictor in Groups Defined by the Moderator Variable:
school      coeff         se          t          p       LLCI       ULCI
1.0000      .3749      .0623     6.0143      .0000      .2525      .4972
2.0000     -.5152      .0546    -9.4347      .0000     -.6223     -.4080
3.0000     -.1793      .0467    -3.8427      .0001     -.2709     -.0877


Note that while both R and SPSS give the same coefficient estimates, the confidence intervals differ somewhat. This is because lavaan uses the bootstrapped Bias Corrected (BC) confidence intervals, whereas SPSS PROCESS calculates the standard error and confidence intervals based on the covariance matrix.

## References

Andrew F. Hayes & Amanda K. Montoya (2017) A Tutorial on
Testing, Visualizing, and Probing an Interaction Involving a Multicategorical Variable in
Linear Regression Analysis, Communication Methods and Measures, 11:1, 1-30, DOI:
10.1080/19312458.2016.1271116

Andrew F. Hayes (2013) Introduction to Mediation, Moderation, and Conditional Process Analysis A Regression-Based Approach.

## One thought on “Moderation with a multicategorical moderating variable”

This site uses Akismet to reduce spam. Learn how your comment data is processed.