**UPDATE:** Please see the following post for an all-in-one solution: RePROCESS Model 1

This post was was inspired by Nicholas Michalak’s Novum R-ganum blog posts on reproducing Hayes’ PROCESS Model 1 in *R*. He has two posts where he presents the *R* code for examining a continuous × continuous moderation and a dichotomous × continuous moderation.

However, a quick Google search suggests a paucity of information for conducting moderation analyses using a multicategorical moderating variables.

Therefore, this post will outline how to run the PROCESS Model 1 with a multicategorical moderator (*M*) in *R*. We will examine how to code the *M* variable, simulate some data, run the PROCESS analyses in both *SPSS* and *R,* and compare the results from both software packages.

## Coding the multicategorical variable

Before conducting the analyses, the reader should decide how the contrasts for the moderating variable should be coded. Briefly, you can choose from *indicator* (or *dummy*), *effects*, *sequential*, and *Helmert* coding. A detailed discussion of the coding methods is beyond the scope of this post, so the reader is referred to Hayes and Montoya (2017) for further information.

Please note that we will be using *indicator* (or *dummy*) coding.

### Indicator coding

For a given variable with *k* categories, we need to construct *k* – 1 variables, *D*_{1}, …, *D*_{k-1}. The pattern of values for *D*_{1}, …, *D*_{k-1 }represents the group in which a case resides. For a variable with *k* = 3 categories, we would need to construct two variables *D*_{1 }and *D*_{2 }(see Table 1). Note that the reference group is coded 0 across all of the indicator variables, which is the first category in the table below.

#### Table 1: Indicator coding of *k* categories.

Category |
D_{1} |
D_{2} |
D_{3} |
… | D_{k}_{-1} |

1 | 0 | 0 | 0 | … | 0 |

2 | 1 | 0 | 0 | … | 0 |

3 | 0 | 1 | 0 | … | 0 |

4 | 0 | 0 | 1 | … | 0 |

… | … | … | … | … | 0 |

k |
0 | 0 | 0 | … | 1 |

### Interaction Terms

While it is common practice to center the continuous predictor variable *X* prior to generating the interaction terms, Hayes (2013) suggests that this may not be necessary. Nonetheless, we will center the *X* variable in this example.

Once we have generated the indicator variables, we then center the *X* variable and multiply it with each of the indicator variables. We will end up with *k* – 1 interaction terms: *X**D*_{1}, *X**D*_{2}, *X**D*_{3}, …, *X**D*_{k-1}.

## Generate data

We will generate some data for the analyses by adapting the code from Nicholas Michalak’s blog.

# Load required packages -------------------------------------------------- if (!require(pacman)) {install.packages("pacman")} pacman::p_load(MASS, # For the 'mvrnorm function ggplot, # For plotting interaction effects ggsci, # For ggplot2 scientific journal themes lavaan) # For structural equation modeling # Generate Data ----------------------------------------------------------- # Set seed to reproduce results set.seed(12345) # 900 observations # 300 state schools, 300 independent schools, 300 catholic schools # Positive relationship in State schools (r = .25); mean = 1 # Negative relationship in Independent schools (r = -.5); mean = 0 # Negtaive relationship in Catholic schools (r = -.25); mean = 1 dataSim = data.frame(id = 1:900, school = rep(x = c("State", "Independent", "Catholic"), each = 300), rbind(mvrnorm(n = 300, mu = c(1,1), Sigma = diag(x = .5, nrow = 2, ncol = 2) + .25), mvrnorm(n = 300, mu = c(0,0), Sigma = diag(x = 1.5, nrow = 2, ncol = 2) - .5), mvrnorm(n = 300, mu = c(0,0), Sigma = diag(x = 1.5, nrow = 2, ncol = 2) - .25))) dataSim = within(dataSim, { # Centre X1 X1.c = X1 - mean(X1, na.rm = TRUE) # Indicator (dummy) code the school variable D1 = ifelse(school == "Independent", 1, 0) D2 = ifelse(school == "Catholic", 1, 0) # Create Interaction Terms X1D1 = X1.c*D1 X1D2 = X1.c*D2 })

## Plot interactions

Now we will plot the interactions using `ggplot2`

.

ggplot(data = dataSim, aes(x = X1.c, y = X2, color = school)) + geom_point() + geom_smooth(method = "lm", fullrange = TRUE) + scale_y_continuous(breaks = seq(-4, 4, 1), limits = c(-4, 4)) + labs(x = "X1 (Centered)", y = "X2", color = "School") + theme_bw() + scale_color_npg()

## Conditional effect of *X* on *Y*

We now specify the simple slopes to test the slope at the difference values of the moderator.

Without the interaction terms, we have a regression model of the form:

With the interaction terms:

Based on Hayes (2013), the conditional effect of *X *on *Y *(denoted by* *)* *at each level of the moderator is given by:

For our three-level moderator, this simplifies to:

To determine the conditional effect for each level of our moderator variable, we simply plug in its D1 and D2 values into the above formula.

## Run the model

We now fit the model using the `lavaan`

package.

# Define model modModel = ' # Regressions X2 ~ b1*X1.c X2 ~ b2*D1 X2 ~ b3*D2 X2 ~ b4*X1D1 X2 ~ b5*X1D2 # Covariate(s) - not used in current example # X2 ~ covariate1 + covariate2 + ... covariateN # Define simple slopes state := b1 independent := b1 + b4 catholic := b1 + b5 ' # Use 5000 boostrap resamples (may take a few mintues) modModelFit = sem(model = modModel, data = dataSim, se = "bootstrap", bootstrap = 5000) # Get summary results summary(modModelFit, fit.measures = FALSE, standardize = FALSE, rsquare = TRUE) # Get boostrapped confidence intervals parameterEstimates(modModelFit, boot.ci.type = "bca.simple", level = .95, ci = TRUE, standardized = FALSE)

We get the following bootstrapped estimates:

lhs op rhs label est se z pvalue ci.lower ci.upper 1 X2 ~ X1.c b1 0.375 0.056 6.691 0.000 0.266 0.486 2 X2 ~ D1 b2 -0.937 0.079 -11.859 0.000 -1.087 -0.774 3 X2 ~ D2 b3 -0.915 0.089 -10.287 0.000 -1.091 -0.741 4 X2 ~ X1D1 b4 -0.890 0.074 -12.036 0.000 -1.027 -0.742 5 X2 ~ X1D2 b5 -0.554 0.080 -6.954 0.000 -0.713 -0.399 6 X2 ~~ X2 0.869 0.045 19.124 0.000 0.788 0.968 7 X1.c ~~ X1.c 1.241 0.000 NA NA 1.241 1.241 8 X1.c ~~ D1 -0.122 0.000 NA NA -0.122 -0.122 9 X1.c ~~ D2 -0.098 0.000 NA NA -0.098 -0.098 10 X1.c ~~ X1D1 0.370 0.000 NA NA 0.370 0.370 11 X1.c ~~ X1D2 0.475 0.000 NA NA 0.475 0.475 12 D1 ~~ D1 0.222 0.000 NA NA 0.222 0.222 13 D1 ~~ D2 -0.111 0.000 NA NA -0.111 -0.111 14 D1 ~~ X1D1 -0.081 0.000 NA NA -0.081 -0.081 15 D1 ~~ X1D2 0.033 0.000 NA NA 0.033 0.033 16 D2 ~~ D2 0.222 0.000 NA NA 0.222 0.222 17 D2 ~~ X1D1 0.041 0.000 NA NA 0.041 0.041 18 D2 ~~ X1D2 -0.066 0.000 NA NA -0.066 -0.066 19 X1D1 ~~ X1D1 0.355 0.000 NA NA 0.355 0.355 20 X1D1 ~~ X1D2 -0.012 0.000 NA NA -0.012 -0.012 21 X1D2 ~~ X1D2 0.466 0.000 NA NA 0.466 0.466 22 state := b1 state 0.375 0.056 6.690 0.000 0.266 0.486 23 independent := b1+b4 independent -0.515 0.048 -10.692 0.000 -0.610 -0.418 24 catholic := b1+b5 catholic -0.179 0.057 -3.161 0.002 -0.293 -0.072

Now we will compare lines 22 to 24 of the above output to that from the SPSS PROCESS.

Conditional Effect of Focal Predictor in Groups Defined by the Moderator Variable: school coeff se t p LLCI ULCI 1.0000 .3749 .0623 6.0143 .0000 .2525 .4972 2.0000 -.5152 .0546 -9.4347 .0000 -.6223 -.4080 3.0000 -.1793 .0467 -3.8427 .0001 -.2709 -.0877

Note that while both *R* and SPSS give the same coefficient estimates, the confidence intervals differ somewhat. This is because `lavaan`

uses the bootstrapped Bias Corrected (BC) confidence intervals, whereas SPSS PROCESS calculates the standard error and confidence intervals based on the covariance matrix.

## References

Andrew F. Hayes & Amanda K. Montoya (2017) A Tutorial on

Testing, Visualizing, and Probing an Interaction Involving a Multicategorical Variable in

Linear Regression Analysis, Communication Methods and Measures, 11:1, 1-30, DOI:

10.1080/19312458.2016.1271116

Andrew F. Hayes (2013) Introduction to Mediation, Moderation, and Conditional Process Analysis A Regression-Based Approach.

## One thought on “Moderation with a multicategorical moderating variable”