We use the following packages in this practical:
library(dplyr)
library(magrittr)
library(ggplot2)
In this practical you will perform regression analysis and create plots with ggplot2. I give you some examples and ask you to apply the techniques I demonstrate. For some exercises I give you the solution (e.g. the resulting graph) and the interpretation. The exercise is then to provide the code that generates the solution, and to give the interpretation for the exercises where it is omitted.
Feel free to ask me if you have questions.
All the best,
Gerko
y1 predicted by x1 - stored in object fit1
y2 predicted by x2 - stored in object fit2
y3 predicted by x3 - stored in object fit3
y4 predicted by x4 - stored in object fit4
I give you the code for the first regression model. You need to fit the other three models yourself.
fit1 <- anscombe %$%
  lm(y1 ~ x1)
fit2 <- anscombe %$%
  lm(y2 ~ x2)
fit3 <- anscombe %$%
  lm(y3 ~ x3)
fit4 <- anscombe %$%
  lm(y4 ~ x4)
Use the following code to mark up your output in a nice format:
output <- data.frame(fit1 = coef(fit1),
                     fit2 = coef(fit2),
                     fit3 = coef(fit3),
                     fit4 = coef(fit4))
row.names(output) <- names(coef(fit1))
output
##                  fit1     fit2      fit3      fit4
## (Intercept) 3.0000909 3.000909 3.0024545 3.0017273
## x1          0.5000909 0.500000 0.4997273 0.4999091
Inspect the estimates in the output object. What do you conclude?
# These estimates are very similar.
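The similarity can also be checked numerically rather than by eye. The following is a minimal sketch in base R, not part of the practical's own solution; the names `fits` and `output_alt` are illustrative. It refits the four models in a loop and reports how far apart the estimates are across models.

```r
# Fit y_i ~ x_i for i = 1, ..., 4 on the built-in anscombe data.
# `fits` and `output_alt` are hypothetical names used for illustration only.
fits <- lapply(1:4, function(i) {
  # reformulate() builds the formulas y1 ~ x1, y2 ~ x2, ... from the index
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe)
})

# Collect the coefficients column-wise: one column per model
output_alt <- sapply(fits, coef)
dimnames(output_alt) <- list(c("(Intercept)", "slope"), paste0("fit", 1:4))

# Spread (max minus min) of each estimate across the four models
apply(output_alt, 1, function(v) diff(range(v)))
```

Small spreads confirm that the four models have nearly identical intercepts and slopes, which is what makes the plots in the following exercises informative.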
Plot the pair (x1, y1) such that y1 is on the Y-axis, and make the color of the points blue. This is quite simple to do with ggplot2:
anscombe %>%
  ggplot(aes(x = x1, y = y1)) +
  geom_point(color = "blue")
In the above code we put the aesthetics aes(x = x1, y = y1) in the ggplot() function. This way, the aesthetics hold for the whole graph (i.e. all geoms we specify), unless otherwise specified. Alternatively, we could specify aesthetics for individual geoms, such as in
anscombe %>%
  ggplot() +
  geom_point(aes(x = x1, y = y1), color = "blue")
We can also override the aes(x = x1, y = y1) specified in ggplot() by specifying a different aes(x = x2, y = y2) under geom_point().
anscombe %>%
  ggplot(aes(x = x1, y = y1)) +
  geom_point(aes(x = x2, y = y2), color = "blue")
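To make the inheritance rule concrete, here is a small sketch with two layers per plot. It assumes only ggplot2 and the built-in anscombe data; the object names `p_inherit` and `p_own` are illustrative. In the first plot both layers pick up the aesthetics from ggplot(); in the second, the extra layer opts out with `inherit.aes = FALSE` and supplies its own mapping.

```r
library(ggplot2)

# Aesthetics set in ggplot() are inherited by every layer:
# both the points and the fitted line use x = x1, y = y1.
p_inherit <- ggplot(anscombe, aes(x = x1, y = y1)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "blue")

# inherit.aes = FALSE makes a layer ignore the ggplot() aesthetics,
# so that layer has to provide a complete mapping of its own.
p_own <- ggplot(anscombe, aes(x = x1, y = y1)) +
  geom_point(color = "blue") +
  geom_point(aes(x = x2, y = y2), color = "gray", inherit.aes = FALSE)

p_inherit
p_own
```

Setting aesthetics once in ggplot() keeps the code short when all layers share the same variables; per-layer aesthetics are more convenient when, as in the next exercise, each layer plots a different pair of columns.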
Use a single plot to show all four pairs (x1, y1), (x2, y2), (x3, y3) and (x4, y4), with the points colored blue, gray, orange and purple, respectively. In other words, create the following plot:
gg <- anscombe %>%
  ggplot() +
  geom_point(aes(x = x1, y = y1), color = "blue") +
  geom_point(aes(x = x2, y = y2), color = "gray") +
  geom_point(aes(x = x3, y = y3), color = "orange") +
  geom_point(aes(x = x4, y = y4), color = "purple") +
  ylab("Y") + xlab("X")
gg
Add a regression line to the plot for the pairs (y3, x3) and (y4, x4), where the line inherits the colour from the respective points. Hint: use geom_smooth().
gg + # take the plot under #5 as the starting point
  geom_smooth(aes(x = x3, y = y3), method = "lm", se = FALSE, color = "orange") +
  geom_smooth(aes(x = x4, y = y4), method = "lm", se = FALSE, color = "purple")
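In the solution above the line colours are repeated by hand. An alternative, sketched here under the assumption that reshaping the data is acceptable, is to stack the four pairs into long format and map colour inside aes(), so that lines inherit the colour of their points automatically. The names `long`, `set` and `p_long` are illustrative and do not appear in the practical.

```r
library(ggplot2)

# Stack the four (x, y) pairs into long format.
# `long`, `set` and `p_long` are hypothetical names used for illustration.
long <- data.frame(
  set = factor(rep(1:4, each = nrow(anscombe))),
  x   = unlist(anscombe[paste0("x", 1:4)], use.names = FALSE),
  y   = unlist(anscombe[paste0("y", 1:4)], use.names = FALSE)
)

# colour is mapped in ggplot(), so points and lines share it per set;
# the regression lines are drawn for sets 3 and 4 only.
p_long <- ggplot(long, aes(x = x, y = y, colour = set)) +
  geom_point() +
  geom_smooth(data = subset(long, set %in% c("3", "4")),
              method = "lm", formula = y ~ x, se = FALSE) +
  scale_colour_manual(values = c("1" = "blue", "2" = "gray",
                                 "3" = "orange", "4" = "purple")) +
  ylab("Y") + xlab("X")
p_long
```

The trade-off is an extra reshaping step in exchange for a single colour specification that both geoms share.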
Add a loess line to the plot from Exercise 5 for all pairs but (y4, x4), where the line inherits the colour from the respective points.
gg + # take the plot under #5 as the starting point
  geom_smooth(aes(x = x1, y = y1), method = "loess", se = FALSE, color = "blue") +
  geom_smooth(aes(x = x2, y = y2), method = "loess", se = FALSE, color = "gray") +
  geom_smooth(aes(x = x3, y = y3), method = "loess", se = FALSE, color = "orange")
Inspect the diagnostic plots for fit1. HINT: use plot() and use the plots you've created in exercises 5-7.
plot(fit1)
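By default plot() on an lm object draws several diagnostic plots one after another. A minimal base R sketch, refitting the model so it runs on its own, shows two ways to control this: a 2 x 2 grid via par(), and the `which` argument of plot() for a single diagnostic. The name `old_par` is illustrative.

```r
# Refit the first model with base R so the sketch is self-contained
fit1 <- lm(y1 ~ x1, data = anscombe)

# Arrange the four default diagnostic plots in a 2 x 2 grid
old_par <- par(mfrow = c(2, 2))
plot(fit1)
par(old_par)  # restore the previous graphics settings

# Draw a single diagnostic: which = 2 selects the Normal Q-Q plot
# (1 = Residuals vs Fitted, 3 = Scale-Location)
plot(fit1, which = 2)
```

Viewing the plots side by side makes it easier to compare them with the scatterplots from the earlier exercises.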
Normal Q-Q plot.
Residuals vs. Fitted plot and the Scale-Location plot. Again, the dip in the Scale-Location plot can easily be explained by the small sample size, and the deviation should be taken with a grain of salt.
Inspect the diagnostic plots for fit2. What do you think?
Inspect the diagnostic plots for fit3. What do you think?
Inspect the diagnostic plots for fit4. What do you think?