Solution to Exercise 1 - using futures

This is an exercise in Monte Carlo simulation, that also details a reproducible workflow. But we’ll consider more about reproducibility in the next meeting.

We load the necessary packages

library(future)
library(furrr)
library(dplyr)
library(magrittr)

Next, we define the simulation as a multisession future. We need to fix the seed here as the parallel sequence should be replicable.

nsim = 100
plan(multisession)
# start future mapping
SIM <- future_map(1:nsim, function(x){
  x <- rnorm(5000, mean = 0, sd = 1)
  M <- mean(x)
  DF <- length(x) - 1
  SE <- 1 / sqrt(length(x))
  INT <- qt(.975, DF) * SE
  return(c(mean = M, 
           bias = M - 0, 
           std.err = SE, 
           lower = M - INT, 
           upper = M + INT, 
           cov = M - INT < 0 & 0 < M + INT))
}, 
.options = furrr_options(seed = 123), 
.progress = TRUE) %>% 
  do.call("rbind", args = .) %>% 
  as_tibble

SIM %>% colMeans

##         mean         bias      std.err        lower        upper          cov 
##  0.000675346  0.000675346  0.014142136 -0.027049443  0.028400135  0.960000000

We can see that 96 out of the 100 samples cover the population value.

To identify the samples for which the population value is not covered, we can use column cov as it is already a logical evaluation.

SIM %>% filter(!cov)

## # A tibble: 4 × 6
##     mean   bias std.err   lower  upper   cov
##    <dbl>  <dbl>   <dbl>   <dbl>  <dbl> <dbl>
## 1 0.0309 0.0309  0.0141 0.00319 0.0586     0
## 2 0.0358 0.0358  0.0141 0.00812 0.0636     0
## 3 0.0370 0.0370  0.0141 0.00925 0.0647     0
## 4 0.0288 0.0288  0.0141 0.00107 0.0565     0

To present this info as a table, package DT is a wonderful extension to use with rmarkdown

library(DT)
SIM %>% 
  round(4) %>% 
  datatable()

Show entries

Search:

	mean	bias	std.err	lower	upper	cov
1	-0.0074	-0.0074	0.0141	-0.0351	0.0204	1
2	-0.0056	-0.0056	0.0141	-0.0333	0.0222	1
3	0.0251	0.0251	0.0141	-0.0026	0.0529	1
4	-0.0112	-0.0112	0.0141	-0.0389	0.0165	1
5	-0.0069	-0.0069	0.0141	-0.0346	0.0209	1
6	-0.0206	-0.0206	0.0141	-0.0483	0.0072	1
7	-0.0125	-0.0125	0.0141	-0.0402	0.0152	1
8	0.0309	0.0309	0.0141	0.0032	0.0586	0
9	0.0043	0.0043	0.0141	-0.0234	0.0321	1
10	-0.0043	-0.0043	0.0141	-0.032	0.0235	1

Showing 1 to 10 of 100 entries

Previous1 2 3 4 5…10Next

To create a graph that would serve the purpose of the exercise, one could think about the following graph:

library(ggplot2)
limits <- aes(ymax = SIM$upper, ymin = SIM$lower)
SIM %>% mutate(covered = as.factor(cov)) %>% 
ggplot(aes(y=mean, x=1:100, colour = covered)) + 
  geom_hline(aes(yintercept = 0), color = "dark grey", size = 2) + 
  geom_pointrange(limits) + 
  xlab("Simulations 1-100") +
  ylab("Means and 95% Confidence Intervals")

End of solution.

Solution to Exercise 1 - using `futures`

Gerko Vink

Markup Languages and Reproducible Programming in Statistics