- Mediation vs. Moderation

Most often in statistical research the focus is on analysis with two variables: the impact that an independent variable has on the dependent variable, or the possible relationship between two variables. Mediating and moderating variables are examples of third variables (a family that also includes confounders), which help you expand your research beyond simple two-variable relationships and develop a rounder, more in-depth picture of the real relationship between your chosen variables.

But first... how do you decide if a variable is a mediator or a moderator? It is possible for a variable to be either, but the two have very different roles and purposes. Mediating variables are variables which are part of the cause-and-effect relationship: that is, the independent variable influences the mediating variable, which in turn influences the dependent variable. Moderating variables, on the other hand, influence the relationship between the independent and dependent variables itself, and can change its direction, magnitude and significance.

The tabs of this guide will support you in your mediation and moderation analyses. The sections are organised as follows:

- **Mediation Analysis** - "How does the intervention work?"
- **Mediation Scenarios** - Some examples of mediation in action
- **Moderation Analysis** - "For which groups does the intervention work?"
- **Moderation Scenarios** - Some examples of moderation in action
- **Conditional Process Analysis**

Mediation analysis is used in statistics to test the effect an independent variable *X* has on a dependent variable *Y* through one (or more) mediator variables. In other words, we use mediation analysis to investigate *how* *X* has an impact on *Y*.

In statistics, a mediator variable is a variable which helps the researcher to understand how two variables are related. You can think of this mediator as an 'intervening' variable, because a variable is a mediator when it is affected by the independent variable and in turn affects the dependent variable. In this way, using mediating variables in your analyses can help you assess how and why an effect is taking place between your independent and dependent variables. The mediating variable is often denoted as *M*.

The path model for mediation analysis is as follows:

You can see that the first diagram is the typical regression path model, where independent variable *X* has a path *c* onto dependent variable *Y*. In mediation analysis, this is called the total effect. In the diagram on the right, however, there are two routes from *X* to *Y*: the path *c'*, and the paths *a* and *b* that run through the mediator *M*. *c'* is called the direct effect of *X* on *Y*, and the product *a* × *b* is called the indirect effect.

The mediator in a mediation analysis is determined by you, the researcher. What is important to remember is that a mediator is chosen and included in the model when the statistical correlation between it and the independent variable, and between it and the dependent variable, is higher than the correlation between the independent and dependent variables. We assume that there is little to no measurement error in the mediating variable, and that the dependent variable did not cause the mediator.
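The relationship between the total, direct and indirect effects can be checked numerically. The sketch below is an illustration only and is not taken from any particular statistics package: it simulates data with invented path values (*a* = 0.6, *b* = 0.5, *c'* = 0.3), fits the three regressions with a small hypothetical helper called `ols`, and confirms that the total effect equals the direct effect plus the indirect effect when every model is fitted by ordinary least squares on the same data.

```python
import random


def ols(y, columns):
    """Ordinary least squares with an intercept.

    Returns [intercept, slope_1, slope_2, ...] for the predictors in `columns`.
    """
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Build the normal equations (X'X) beta = X'y as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Simulated data; the path values below are invented for illustration.
random.seed(1)
n = 500
X = [random.gauss(0, 1) for _ in range(n)]
M = [0.6 * x + random.gauss(0, 1) for x in X]                       # a = 0.6
Y = [0.3 * x + 0.5 * m + random.gauss(0, 1) for x, m in zip(X, M)]  # c' = 0.3, b = 0.5

c = ols(Y, [X])[1]                # total effect: Y regressed on X
a = ols(M, [X])[1]                # path a: M regressed on X
_, c_prime, b = ols(Y, [X, M])    # direct effect c' and path b: Y on X and M
indirect = a * b

print(f"total effect c      = {c:.3f}")
print(f"direct effect c'    = {c_prime:.3f}")
print(f"indirect effect a*b = {indirect:.3f}")
print(f"c' + a*b            = {c_prime + indirect:.3f}")
```

The last two printed lines should agree with the first up to rounding. This exact equality is a property of linear models fitted on the same sample; it does not hold in general when a logistic model is used for one of the paths.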

There are three things to consider when designing a mediation analysis and determining if mediation has occurred:

- The independent variable *X* predicts the dependent variable *Y*
- The independent variable *X* predicts the mediator *M*
- The mediator *M* predicts the dependent variable *Y*

For example, you may wish to investigate the effect homework burden has on sleep quality, but suspect that homework burden actually impacts sleep hygiene, which in turn affects sleep quality. A mediation analysis would be the correct statistical method for this investigation.

The assumptions for mediation analysis are very similar to those for linear regression:

- Linearity of the relationships between *X*, *Y* and *M*
- Independence of observations
- Homoscedasticity of residuals across levels of *X* and *M*
- Normality of residuals
- No multicollinearity between *X* and *M*
- Temporal ordering

There are things you can do to still perform a mediation analysis if any of the assumptions are not met:

- Non-linearity - you can use a polynomial regression or a non-linear model instead of the usual regression methods
- No independence of observations - if your data is hierarchical you can use a mixed-effects model
- No homoscedasticity of residuals - you can use robust standard errors or weighted least squares regression
- Non-normality of residuals - you can either transform *Y* or *M* (for example, with a log transformation), or use non-parametric methods such as bootstrapping to perform the regression
- Multicollinearity - you can either remove highly correlated independent variables or use principal component analysis (PCA) to reduce multicollinearity

Mediation can either be partial or complete (or 'full'). The difference between the two is what happens to the relationship between the independent and dependent variables once the mediator is accounted for: in partial mediation a relationship would still exist, whereas in complete mediation there would no longer be any relationship at all. In other words, in partial mediation, the mediator only partially explains the relationship between independent and dependent variables, whereas in complete mediation the mediator explains the relationship fully.

That is, in partial mediation, the direct effect *c'* of *X* on *Y* remains significant after accounting for the mediator *M*, whereas it becomes non-significant in complete mediation. The indirect effect *a* × *b* is present in both types of mediation; it is only the direct effect which differs.

In order to interpret the results of a partial mediation, you will need to sensibly interpret the following:

- The direct effect *c'*: in partial mediation this will need to be statistically significant.
- The indirect effect *a* × *b*: this is what shows the mediating effect of *M* in the model.
- The total effect *c*: this is the overall effect of *X* on *Y*, which combines direct and indirect effects.
- The bootstrap confidence interval: like with other confidence interval interpretations, if this interval does not include 0 it shows statistical significance.
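As an illustration of the bootstrap confidence interval mentioned in the list above, the following sketch resamples cases with replacement and recomputes the indirect effect each time. It assumes a percentile bootstrap (other variants exist), uses simulated data with invented path values, and the function names `indirect_effect` and `bootstrap_ci` are hypothetical helpers written for this example.

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)


def indirect_effect(X, M, Y):
    """a*b from OLS: a is the slope of M on X, b the slope of Y on M controlling for X."""
    sxx, smm, sxm = cov(X, X), cov(M, M), cov(X, M)
    sxy, smy = cov(X, Y), cov(M, Y)
    a = sxm / sxx
    b = (smy * sxx - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_ci(X, M, Y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(X)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample cases with replacement
        estimates.append(indirect_effect([X[i] for i in idx],
                                         [M[i] for i in idx],
                                         [Y[i] for i in idx]))
    estimates.sort()
    low = estimates[round((alpha / 2) * n_boot)]
    high = estimates[round((1 - alpha / 2) * n_boot) - 1]
    return low, high


# Simulated data; the path values (a = 0.6, b = 0.5, c' = 0.3) are invented.
random.seed(42)
n = 200
X = [random.gauss(0, 1) for _ in range(n)]
M = [0.6 * x + random.gauss(0, 1) for x in X]
Y = [0.3 * x + 0.5 * m + random.gauss(0, 1) for x, m in zip(X, M)]

ci_low, ci_high = bootstrap_ci(X, M, Y)
print(f"indirect effect: {indirect_effect(X, M, Y):.3f}")
print(f"95% bootstrap CI: [{ci_low:.3f}, {ci_high:.3f}]")
print("interval excludes 0:", ci_low > 0 or ci_high < 0)
```

Bootstrapping is commonly used here because the product *a* × *b* is generally not normally distributed, so an interval built from resampled estimates avoids relying on that assumption.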

When reporting a mediation analysis, it is important to include the coefficients and significance of the direct, indirect and total effects. Sensible interpretation of these will help you determine whether a mediation effect exists between the variables.

Always pay attention to the measurement error when reporting outcomes, as measurement error can bias path estimates and produce results which may appear significant when they are in fact not, and vice versa.

Listed here are various scenarios in which mediation analysis can be performed. In all cases we assume that the dependent variable is continuous and that the sample size is large enough to detect the effects of interest.

To test the effect of the mediating variable, first investigate the overall effect the independent variable has on the dependent variable, so that you can then see whether the effect introduced by the mediator is significant. Remember that binary mediating variables should be dummy-coded as 0/1 in these analyses.

Because the variables involved are binary, you can treat them as continuous and continue with either a structural equation model (SEM) or a regression model for the *X* → *Y* pathway, a logistic (logit) model for the *X* → *M* path and finally another SEM or regression model for *M* → *Y*.

For example, we may wish to see whether the effect that having a university degree has on starting salary is mediated by work experience.

- The independent variable is university degree (yes/no)
- The dependent variable is starting salary
- The mediator variable is work experience (yes/no).

This scenario is very similar to scenario 1, and the model requires only slight adjusting in order to be successfully implemented.

As an example, we can investigate how smoking leads to a premature death via lung disease.

- The independent variable is the number of packets of cigarettes smoked per month
- The dependent variable is life expectancy
- The mediator variable is presence of lung disease (yes/no).

For instances where the independent variable is continuous but the mediator is categorical with more than two categories, use a structural equation model (SEM) or a regression model for the *X* → *Y* pathway, a logistic model (multinomial, since the mediator has more than two categories) for the *X* → *M* path and finally another SEM or regression model for *M* → *Y*.

For example, we may wish to see if the effect of phytoplankton biomass on the abundance of marine species variety is mediated by region.

- The independent variable is phytoplankton biomass
- The dependent variable is number of marine species varieties.
- The mediator variable is regions of the sea.

This is the typical scenario in which mediation analysis takes place, where either a SEM or regression model is used for each pathway *X* → *Y*, *X* → *M* and *M* → *Y*.

An example of this is testing whether the length of time a couple has been in a relationship mediates the relationship between annual income and the amount spent on an engagement ring.

- The independent variable is annual income
- The dependent variable is price of engagement ring
- The mediator variable is length of time of relationship prior to engagement.

When the independent variable is categorical in a mediation analysis, the usual equations for effects cannot be used, as there is no single pathway to represent *X*'s effects on *M*. Instead, an ANOVA needs to be used: the groups in *X* require dummy coding with *k* - 1 dummy variables, where *k* is the total number of groups in the categorical independent variable. For the dummy variable representing group *i*, participants in group *i* are coded 1 and the rest 0. The resulting path diagram looks like this:

Note that there will be no one parameter estimate of the total effect of *X* on *Y*, so interpret your results sensibly!
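The *k* - 1 dummy coding described above can be sketched in a few lines of Python. The helper `dummy_code` and the three sleep-quality levels are hypothetical and chosen only for illustration; the group left out of the coding acts as the reference group.

```python
def dummy_code(groups, reference):
    """Return k - 1 indicator columns for a categorical variable with k levels.

    Each column is keyed by a non-reference level and holds 1 for
    participants in that group and 0 for everyone else.
    """
    levels = sorted(set(groups))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found in data")
    return {level: [1 if g == level else 0 for g in groups]
            for level in levels if level != reference}


# Hypothetical sleep-quality ratings for six participants (k = 3 groups).
sleep_quality = ["poor", "fair", "good", "fair", "poor", "good"]
dummies = dummy_code(sleep_quality, reference="poor")

for level, column in dummies.items():
    print(level, column)
# fair [0, 1, 0, 1, 0, 0]
# good [0, 0, 1, 0, 0, 1]
```

Participants in the reference group ("poor" here) are coded 0 on every dummy variable, so each dummy's coefficient is read as a comparison against that group.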

As an example, we can investigate whether the effect that individuals' sleep quality (measured on a Likert scale) has on their productivity at work is mediated by their job satisfaction scores.

- The independent variable is quality of sleep
- The dependent variable is work productivity
- The mediator variable is job satisfaction.

One thing to bear in mind is that the more groups your categorical independent variable has, the greater the number of tests you perform in order to do the mediation analysis, and therefore the potential for Type 1 errors occurring increases.

Moderation analysis is used when you wish to investigate an existing relationship between an independent and a dependent variable, and how this relationship changes in the presence of a third variable. This third variable is the moderator *W*, and it can be either quantitative or categorical.

A moderating variable can influence the relationship in a number of ways: the relationship can be strengthened, weakened, or even reversed so that it becomes negative. Specifically, the moderator impacts the zero-order correlation between the independent and dependent variables. Unlike mediation analysis, in which the mediating variable acts as both a dependent and an independent variable, the moderator in a moderation analysis always functions as an independent variable. Moderation can be determined by looking for significant interactions between the moderating and independent variables. As with the mediator in mediation analysis, we assume that there is no (or at least very little) measurement error in the moderator, and that the dependent variable did not cause the moderator.

The path model for moderation is as follows:

In this diagram, there is the direct effect of *X* on *Y*, the direct effect of *W* on *Y* and the interaction effect of *X* × *W* on *Y*.

To see if your analysis requires moderation, you can simply include an interaction term between the independent and moderator variables in your model and test whether the interaction is significant. You could also plot the independent and dependent variables on a graph or chart at different levels of your moderator variable and see how the moderator affects their relationship.
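Including an interaction term, as described above, can be sketched as follows. This is a minimal illustration with simulated data and invented coefficients; `ols` is a hypothetical helper, and a real analysis would normally use statistical software that also reports standard errors and *p* values for the interaction.

```python
import random


def ols(y, columns):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Normal equations (X'X) beta = X'y as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Simulated data with a binary moderator W dummy-coded 0/1.
# Invented values: the slope of X is 0.4 when W = 0 and 0.4 + 0.8 when W = 1.
random.seed(7)
n = 300
W = [random.randint(0, 1) for _ in range(n)]
X = [random.gauss(0, 1) for _ in range(n)]
Y = [1.0 + 0.4 * x + 0.5 * w + 0.8 * x * w + random.gauss(0, 1)
     for x, w in zip(X, W)]

XW = [x * w for x, w in zip(X, W)]          # the interaction term
b0, b1, b2, b3 = ols(Y, [X, W, XW])         # Y = b0 + b1*X + b2*W + b3*X*W

slope_w0 = b1          # effect of X on Y when W = 0
slope_w1 = b1 + b3     # effect of X on Y when W = 1
print(f"interaction coefficient b3 = {b3:.3f}")
print(f"slope of X when W = 0: {slope_w0:.3f}")
print(f"slope of X when W = 1: {slope_w1:.3f}")
```

The coefficient `b3` is the difference between the two slopes, which is why a significant interaction term is read as evidence of moderation. Plotting the two fitted lines gives the visual check mentioned above.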

Moderator variables can affect the strength, direction and magnitude of a relationship between independent and dependent variables. Moderation analysis involves measuring the effect a moderator variable has on a simple regression test: significant interaction terms indicate a successful moderation.

Be aware that a significant effect may be detected even though the moderator may only affect the relationship across some levels but not all.

The assumptions for moderation analysis are very similar to those for linear regression:

- Linearity between *X*, *Y* and *W*
- Independence of observations
- Homoscedasticity of residuals across all levels of *X* and *W*
- Normality of residuals
- No multicollinearity between *X* and *W*

When these assumptions are not met, there are things you can do to your data in order to still be able to perform moderation:

- Non-linearity - you can use a polynomial regression or generalised additive model (GAM)
- No independence of observations - if your data is hierarchical you can use a mixed-effects model
- No homoscedasticity of residuals - you can use robust standard errors or weighted least squares regression
- Non-normality of residuals - you can use non-parametric methods to perform regression, like bootstrapping
- Multicollinearity - you can either remove highly correlated independent variables or use principal component analysis (PCA) to reduce multicollinearity

When using a moderation analysis in regression, the usual things to report with regard to regression still apply: the R² value, the F statistic and the *p* value. You still need to report whether your results were statistically significant and interpret the results appropriately, linking back to real-life consequences and answering the research question.

When using a moderation analysis within an ANOVA, again the usual things to report for ANOVA still apply.

It is important to remember to draw a real-life conclusion when interpreting your statistical results. Remember, you have performed this analysis for a reason.

Below are some scenarios in which moderation analysis can be performed. Note that in all cases the dependent variable is continuous and the sample size is sufficient to detect interaction effects.

To test the moderating effects of the moderating variable, first investigate the overall effect the independent variable has on the dependent variable, so that you can then see whether the interaction effect introduced by the moderator is significant. Remember that binary moderating variables should be dummy-coded as 0/1 in these analyses.

This is the simplest case in which a moderation analysis takes place, as it can be performed with a 2 x 2 ANOVA. This is because the interaction in this test reflects the relationship between the independent variable and the moderator. The usual assumptions for performing a 2 x 2 ANOVA will apply here.

An example of this scenario involves the effect a visit to a petting zoo has on stress levels between undergraduate and postgraduate students pre-exam season.

- The independent variable is the zoo visit (pre- and post-visit)
- The dependent variable is the stress levels
- The moderator is the students' level of study (undergraduate or postgraduate).

In other words, we are seeing if students' level of study changes the effect a visit to a petting zoo has on students' stress levels.

In this instance, since our independent variable is continuous, it is beneficial to perform a regression analysis. With this type of analysis, you need to take care that your variables are approximately normally distributed, there is no multicollinearity present between the independent and moderator variables, and there needs to be a linear relationship between all three of your variables. These are in addition to the usual assumptions of regression analysis.

An example for this scenario is investigating if having already passed a previous exam impacts how well revision influences a student's exam scores.

- The independent variable is the amount of time spent revising prior to the exam
- The dependent variable is the exam score
- The moderator is the result of the previous exam (pass or fail)

Scenario 3 is very similar to Scenario 2 and operates in the same way: with a regression analysis under the above assumptions. However, it is useful to perform a Potthoff analysis to see if the regression between the independent and dependent variables changes across the groups of the categorical moderator.

An example of this is investigating whether the impact that the dosage of a new medicine has on the quality-of-life scores of patients suffering from a disease depends on the treatment plan they receive.

- The independent variable is medicine dosage
- The dependent variable is the quality of life scores
- The moderator is the treatment plan (group therapy, physiotherapy, group therapy + physio, nothing (control group)).
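One part of the comparison described for Scenario 3, checking whether the regression slope differs across the groups of the moderator, can be sketched as a nested-model F test. This is an assumption about how such a check might be set up rather than a full Potthoff analysis: a model with one common slope is compared against a model with a separate slope per treatment plan. The data, the slopes and the helper `ols_sse` are all invented for illustration.

```python
import random


def ols_sse(y, columns):
    """Fit OLS with an intercept; return (coefficients, residual sum of squares)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    return beta, sse


# Simulated data: 50 patients per treatment plan, with invented slopes.
random.seed(3)
plans = ["control", "group therapy", "physiotherapy", "both"]
true_slopes = {"control": 0.5, "group therapy": 0.5,
               "physiotherapy": 1.0, "both": 1.5}
dosage, plan, qol = [], [], []
for p in plans:
    for _ in range(50):
        x = random.uniform(0, 10)
        dosage.append(x)
        plan.append(p)
        qol.append(20 + true_slopes[p] * x + random.gauss(0, 2))

# k - 1 = 3 dummy variables (control is the reference) and their products with X.
dummies = [[1.0 if p == level else 0.0 for p in plan] for level in plans[1:]]
interactions = [[d * x for d, x in zip(col, dosage)] for col in dummies]

_, sse_reduced = ols_sse(qol, [dosage] + dummies)               # one common slope
_, sse_full = ols_sse(qol, [dosage] + dummies + interactions)   # slope per group

n = len(qol)
df_num = len(interactions)               # extra parameters in the full model
df_den = n - (2 + 2 * len(dummies))      # n minus parameters in the full model
F = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
print(f"F({df_num}, {df_den}) = {F:.2f}")
```

A large F statistic suggests that the slopes are not equal across the treatment plans, that is, that the moderator changes the relationship. To turn the statistic into a *p* value you would compare it with an F distribution with `df_num` and `df_den` degrees of freedom, for example with `scipy.stats.f.sf(F, df_num, df_den)` if SciPy is available.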

Continuous moderators should be converted to a categorical, dummy-coded variable.

An example of this scenario is testing whether the relationship between a film's ratings from critics and its ratings from the audience is moderated by the number of years since the film came out. The moderator can be dummy coded as more than or less than ten years old.

- The independent variable is the critics' scores
- The dependent variable is the audience's scores
- The moderator is if the film is more or less than ten years old (0/1).

In other words, we are investigating how critics' opinions influence audience's opinions over time.

Conditional process analysis (CPA) takes mediation and moderation analyses further by investigating not only how and why a process occurs but also the contingencies of the variables involved. It combines mediation and moderation in a single model to give a more in-depth understanding of the complex relationship between independent and dependent variables.

To conduct CPA, researchers need to use path analysis and regression analysis using software such as SPSS to estimate the direct, indirect and conditional effects of the variables involved. To perform CPA in SPSS, the PROCESS macro needs to be installed first.

CPA is used to measure associations between variables; cause-and-effect hypotheses can be supported or rejected given sufficient extra research on the variables involved (independent and dependent, as well as the choice of mediator(s) and moderator(s)). It can answer questions such as:

- Is a mediation effect present?
- Is a moderation effect present?
- If a mediation effect exists, does this change with the presence of the moderator?
- If a moderation effect exists, can this be explained with the presence of the mediator?
- At what levels of the moderator is the mediation significant?
- How do different levels of a moderator affect the effect of the mediator?

CPA can also be explained with the terms 'moderated mediation' and 'mediated moderation': two similar but distinct concepts.

In moderated mediation analysis, you have a usual mediation model, but one of the mediation pathways is moderated by another variable (the moderator *W*). This kind of analysis can come about when a mediation analysis requires further evaluation to explain the strength or direction of the relationship between the variables involved. In other words, the moderator influences the mediation model.

There are several different ways in which a moderator has an influence in a mediated relationship, and careful action must be taken to input the correct moderator into the correct location.

To interpret the results of a moderated mediation analysis, you will need to look at three things:

- The path coefficients: you will need to see the coefficients and their significance for the paths *X* → *M*, *M* → *Y*, *X* → *Y* and wherever the path involving *W* exists.
- The conditional indirect effects: you will need to see how the indirect effect of *X* on *Y* through *M* varies at different levels of *W*.
- The index of moderated mediation: this will tell you whether the mediation effect varies significantly at different levels of *W*.
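To make the conditional indirect effect and the index of moderated mediation more concrete, here is a small sketch for one possible placement of the moderator. It assumes that *W* moderates the *X* → *M* path only, so that *M* is modelled as depending on *X*, *W* and *X* × *W*, and *Y* as depending on *X* and *M*. The coefficient names (`a1`, `a3`, `b`) and their values are hypothetical; in practice they would come from your fitted models.

```python
def conditional_indirect_effect(a1, a3, b, w):
    """Indirect effect of X on Y through M at moderator value w.

    Assumes W moderates only the X -> M path:
        M = i_M + a1*X + a2*W + a3*X*W
        Y = i_Y + c'*X + b*M
    so the effect of X on M at W = w is (a1 + a3*w).
    """
    return (a1 + a3 * w) * b


def index_of_moderated_mediation(a3, b):
    """Change in the indirect effect for a one-unit increase in W."""
    return a3 * b


# Hypothetical coefficients, for illustration only (not estimates from real data).
a1, a3, b = 0.40, 0.25, 0.50

for w in (-1.0, 0.0, 1.0):
    effect = conditional_indirect_effect(a1, a3, b, w)
    print(f"W = {w:+.1f}: conditional indirect effect = {effect:.3f}")

print(f"index of moderated mediation = {index_of_moderated_mediation(a3, b):.3f}")
```

Under this assumed model the conditional indirect effect is a straight-line function of *W*, and the index is its slope. If the moderator sits on a different path, or on more than one path, the formulas change accordingly.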

By appropriately interpreting the above, you will be able to determine if the moderated mediation model is the best one to explain your variables' relationships.

Mediated moderation is similar to moderated mediation but this time the moderation pathway is being mediated by another variable - the mediator *M*. With this type of analysis, the interaction effect of the independent and moderator variables on the dependent variable is explained through the mediator.

Interpreting the results of a mediated moderation analysis involves your interpretations of the following:

- The path coefficients: these are the paths from *X* → *M*, *M* → *Y* and *X* × *W* → *Y*.
- The indirect effect: to see if the mediation of the interaction effect is significant, you will need to see if the indirect effect of the interaction term on *Y* through *M* is significant.
- The conditional effects: this will tell you how the indirect effect of *X* on *Y* through *M* varies at different levels of *W*.
- The moderated mediation index: this will tell you whether the mediation effect varies significantly at different levels of *W*.