Dummy variables are important, but they also cause a lot of frustration in intro-stat courses. Below, I demonstrate the concept using a linear regression model.
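The basic idea is that a factor $f$ with $k$ levels can be replaced by $k-1$ dummy variables that act as switches to select different levels. When all switches are turned off, the reference level is chosen. Mathematically, let $f$ be the factor with levels $l_0, l_1, \ldots, l_{k-1}$, i.e. $f \in \{l_0, l_1, \ldots, l_{k-1}\}$. By convention, let $l_0$ be the reference level chosen by the user. Now introduce the $k-1$ dummy variables $d_1, d_2, \ldots, d_{k-1}$ defined by $d_i = 1$ if $f = l_i$ and $d_i = 0$ otherwise, for $i = 1, 2, \ldots, k-1$. Note that at most one $d_i$ equals 1, and that $f = l_0$ exactly when $d_1 = d_2 = \cdots = d_{k-1} = 0$.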
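A minimal sketch of this definition in R, assuming a small made-up factor x with the levels l0, l1, l2 and taking l0 as the reference: each dummy is computed directly from $d_i = 1$ if $f = l_i$ and $d_i = 0$ otherwise. The names x, D, d1 and d2 are illustrative only.

```r
# Made-up factor with k = 3 levels; the first level, l0, is the reference
x <- factor(c("l0", "l1", "l2", "l1", "l0"))
ref <- levels(x)[1]

# One dummy per non-reference level: d_i = 1 if x equals l_i, else 0
others <- setdiff(levels(x), ref)
D <- sapply(others, function(l) as.integer(x == l))
colnames(D) <- paste0("d", seq_along(others))
D

# At most one switch is on per observation, and the reference level
# corresponds to all switches being off
stopifnot(all(rowSums(D) <= 1))
stopifnot(identical(rowSums(D) == 0, x == ref))
```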
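Assume that $f$ has the $k = 3$ levels $l_0, l_1, l_2$ and that we are interested in the one-way ANOVA model specified by the R formula y ~ f (e.g. lm(y ~ f)), that is, $y = \mu_f + \varepsilon$, where $\mu_{l_j}$ is the mean of $y$ at level $l_j$.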
Then R automatically translates this into the model $y = \beta_0 + \beta_1 d_1 + \beta_2 d_2 + \varepsilon$ with the dummy variables $d_1$ and $d_2$ as defined above.
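One way to check this translation is to fit the model both ways: once with the factor and once with hand-made dummies. This is a sketch on made-up data (two observations per level); the names d1, d2, fit_factor and fit_dummy are illustrative.

```r
# Made-up data: three levels, two observations each
f <- factor(rep(c("l0", "l1", "l2"), each = 2))
y <- c(1.0, 1.4, 2.9, 3.1, 5.2, 4.8)

# Dummies for the non-reference levels l1 and l2
d1 <- as.integer(f == "l1")
d2 <- as.integer(f == "l2")

fit_factor <- lm(y ~ f)        # R constructs the dummies itself
fit_dummy  <- lm(y ~ d1 + d2)  # the same model written out by hand

# Same estimates; only the coefficient names differ
cbind(factor = coef(fit_factor), dummy = coef(fit_dummy))
stopifnot(isTRUE(all.equal(unname(coef(fit_factor)), unname(coef(fit_dummy)))))
```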
The dummy variables that R constructs can be inspected with model.matrix():
f <- factor(c("l0", "l1", "l2"))
as.data.frame(model.matrix(~ f))
## (Intercept) fl1 fl2
## 1 1 0 0
## 2 1 1 0
## 3 1 0 1
So the first row is $l_0$ ($d_1 = d_2 = 0$), the second is $l_1$ ($d_1 = 1$, $d_2 = 0$), and the third is $l_2$ ($d_1 = 0$, $d_2 = 1$).
Here we see that the (Intercept) column is the constant (“silent”) 1 in front of $\beta_0$, so $\beta_0$ is always included.
The parameter $\beta_0$ is the mean of the $y$’s for $f = l_0$.
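Notice the column name fl1; it refers to the difference $\beta_1$ in the mean of $y$ between $f = l_1$ and $f = l_0$. This can be seen by inspecting row two above: for $f = l_1$ the model gives the mean $\beta_0 + \beta_1$, so $\beta_1$ is the mean for $l_1$ minus the mean $\beta_0$ for $l_0$.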
The convention in R is to concatenate the factor (variable) name, here f, with the level, here l1.
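To check this interpretation of the coefficients numerically, the estimates can be compared with the group means. This is a sketch on made-up data; b and group_means are illustrative names.

```r
# Made-up data: two observations per level
f <- factor(rep(c("l0", "l1", "l2"), each = 2))
y <- c(1.0, 1.4, 2.9, 3.1, 5.2, 4.8)

b <- coef(lm(y ~ f))                        # (Intercept), fl1, fl2
group_means <- sapply(split(y, f), mean)    # mean of y within each level

# (Intercept) should match the mean at the reference level l0 ...
b[["(Intercept)"]]
group_means[["l0"]]

# ... and fl1, fl2 the differences between the l1/l2 means and the l0 mean
b[c("fl1", "fl2")]
group_means[c("l1", "l2")] - group_means[["l0"]]
```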
As seen, R silently took the first level as the reference level.
This is the convention: the first level of the factor is the reference level, so reordering the levels changes the reference level:
f <- factor(c("l0", "l1", "l2"), levels = c("l1", "l0", "l2"))  # l1 is now the first level
as.data.frame(model.matrix(~ f))
## (Intercept) fl0 fl2
## 1 1 1 0
## 2 1 0 0
## 3 1 0 1
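Changing the reference level only reparametrizes the model. A sketch on made-up data, assuming the same one-way layout as above: the coefficients change meaning, but the fitted values (the group means) stay the same. The names fa, fb, fit_a and fit_b are illustrative.

```r
# Made-up data
y  <- c(1.0, 1.4, 2.9, 3.1, 5.2, 4.8)
fa <- factor(rep(c("l0", "l1", "l2"), each = 2))                                 # reference l0
fb <- factor(rep(c("l0", "l1", "l2"), each = 2), levels = c("l1", "l0", "l2"))  # reference l1

fit_a <- lm(y ~ fa)
fit_b <- lm(y ~ fb)

coef(fit_a)  # intercept = mean of l0; fal1, fal2 are differences from l0
coef(fit_b)  # intercept = mean of l1; fbl0, fbl2 are differences from l1

# The fitted values do not depend on the choice of reference level
stopifnot(isTRUE(all.equal(fitted(fit_a), fitted(fit_b))))
```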
Alternatively, the relevel() function moves a chosen level to the front, making it the reference level:
f <- factor(c("l0", "l1", "l2"))
f <- relevel(f, ref = "l2") # ref: the reference level
as.data.frame(model.matrix(~ f))
## (Intercept) fl0 fl1
## 1 1 1 0
## 2 1 0 1
## 3 1 0 0
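As a quick check of what relevel() does, the sketch below compares it with spelling out the level order by hand. f_relevel and f_manual are illustrative names.

```r
f <- factor(c("l0", "l1", "l2"))
f_relevel <- relevel(f, ref = "l2")

# relevel() puts "l2" first and keeps the remaining levels in their old order,
# which matches giving the levels explicitly
f_manual <- factor(c("l0", "l1", "l2"), levels = c("l2", "l0", "l1"))

levels(f_relevel)
stopifnot(identical(levels(f_relevel), levels(f_manual)))
stopifnot(identical(as.integer(f_relevel), as.integer(f_manual)))
```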
Contrasts
The coding above, called treatment contrasts (contr.treatment(), R's default for unordered factors), is one particular way of creating so-called contrasts. There are many other ways to do it; see for example https://stats.idre.ucla.edu/r/library/r-library-contrast-coding-systems-for-categorical-variables/.
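The contrast matrix behind a factor can be inspected, and changed, directly. This sketch shows the treatment contrasts for three levels, then switches to sum-to-zero contrasts (contr.sum()) purely as an example of another coding. The names ct and mm are illustrative.

```r
# Treatment contrasts: rows are levels, columns are the dummy variables
ct <- contr.treatment(c("l0", "l1", "l2"))
ct

# The contrasts R will use for a given factor
f <- factor(c("l0", "l1", "l2"))
contrasts(f)

# Another coding: sum-to-zero contrasts; the last level gets -1 in every column
contrasts(f) <- contr.sum(3)
mm <- model.matrix(~ f)
as.data.frame(mm)
```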