futures
This is an exercise in Monte Carlo simulation, that also details a reproducible workflow. But we’ll consider more about reproducibility in the next meeting.
We load the necessary packages
library(future)
library(furrr)
library(dplyr)
library(magrittr)
Next, we define the simulation as a multisession future
.
We need to fix the seed here as the parallel sequence should be
replicable.
nsim = 100
plan(multisession)
# start future mapping
SIM <- future_map(1:nsim, function(x){
x <- rnorm(5000, mean = 0, sd = 1)
M <- mean(x)
DF <- length(x) - 1
SE <- 1 / sqrt(length(x))
INT <- qt(.975, DF) * SE
return(c(mean = M,
bias = M - 0,
std.err = SE,
lower = M - INT,
upper = M + INT,
cov = M - INT < 0 & 0 < M + INT))
},
.options = furrr_options(seed = 123),
.progress = TRUE) %>%
do.call("rbind", args = .) %>%
as_tibble
SIM %>% colMeans
## mean bias std.err lower upper cov
## 0.000675346 0.000675346 0.014142136 -0.027049443 0.028400135 0.960000000
We can see that 96 out of the 100 samples cover the population value.
To identify the samples for which the population value is not
covered, we can use column cov
as it is already a logical
evaluation.
SIM %>% filter(!cov)
## # A tibble: 4 × 6
## mean bias std.err lower upper cov
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.0309 0.0309 0.0141 0.00319 0.0586 0
## 2 0.0358 0.0358 0.0141 0.00812 0.0636 0
## 3 0.0370 0.0370 0.0141 0.00925 0.0647 0
## 4 0.0288 0.0288 0.0141 0.00107 0.0565 0
To present this info as a table, package DT
is a
wonderful extension to use with rmarkdown
library(DT)
SIM %>%
round(4) %>%
datatable()
To create a graph that would serve the purpose of the exercise, one could think about the following graph:
library(ggplot2)
limits <- aes(ymax = SIM$upper, ymin = SIM$lower)
SIM %>% mutate(covered = as.factor(cov)) %>%
ggplot(aes(y=mean, x=1:100, colour = covered)) +
geom_hline(aes(yintercept = 0), color = "dark grey", size = 2) +
geom_pointrange(limits) +
xlab("Simulations 1-100") +
ylab("Means and 95% Confidence Intervals")
End of solution.