About me

  • Psychology –> Statistics –> ADS
  • Associate prof
  • Developer / Teacher
  • Missing Data / Programming
  • Bayesian by default
  • Frequentist through convenience
  • Father / Married / 2 kids
  • LCE afficionado

About the course

Format

  • Some Wednesdays till the end of the year:

    • 9 am - noon
    • Ruppert 011
    • The first part of every meeting will be a lecture.
    • Every second part will be hands-on / Q&A

If you cannot make a meeting; I’d like to know beforehand.

I’d strongly suggest to participate in every meeting

Schedule

When? Where? What?
14-Sep 9 am Ruppert 011 Monte Carlo simulation and Git
28-Sep 9 am Ruppert 011 Reproducible workflows and replication
26-Oct 9 am Ruppert 011 Typesetting equations with LaTeX
02-Nov 9 am Ruppert 011 Version control and Github in depth
23-Nov 9 am Ruppert 011 Presentations with rMarkdown
07-Dec 9 am Ruppert 011 Github pages and Shiny

Exercises

To develop the necessary skills for completing this course, 6 exercises must be made and submitted:

  • These exercises are not graded, but students must fulfill them to pass the course.
  • Exercises have to be handed in before the next meeting.
  • Answers will be posted after the next meeting.

Grading

The final grade is computed as follows:

Graded part Weight
Markup manuscript 50 %
Research repository 40 %
Personal repository 10 %

Good to know

Grading considers concepts like visual appearance, readability, usefulness, efficiency of the code. Grading does not consider the theoretical or quantitative properties or scientific quality of the content!

Contact

If you need any help during this course, do one of the following

  1. Google seems to have a lot of answers if you present it with the right question
  2. Open a Github Issue in the course GitHub page. The holy grail in discussion about collaborative development. I’d prefer plenary discussions for problems, because your unique problem usually is less unique than you might think.
  3. If you really have to: send an e-mail at G.Vink@uu.nl containing your question or issue.

Questions that are beyond the scope of the course are also welcome!

  • Always post your coding/development issue with a MWE or reprex.

Goal of this course

Learn the skills and tools to present yourself and your work.

Useful for: a phd, career in data science, being at the state-of-the-art in markup programming.

What to do (not in any definitive order)

  1. Perform a (monte carlo) simulation study
  2. Create a scientific manuscript
  3. Program your simulation such that it creates a data archive
  4. Publish this archive to GitHub
  5. Tell people about yourself on GitHub
  6. Showcase your findings through a shiny web-app
  7. Make absolutely sure that others can retrace your steps
  8. Be able to replicate it all

Some useful info

GitHub offers a GitHub Student Developer Pack, including a free Github PRO account for as long as you are a student

Get your pack here

Git

Why Git

At this point in the lecture I will ramble on about how awesome Git and GitHub is.

Monte Carlo methods

Monte Carlo simulation

  • repeated sampling
  • use randomness
  • solve problems

Throwing a fair die

set.seed(123)
1/6
## [1] 0.1666667
sample(1:6, 1000000, replace = TRUE) %>% 
  table %>% prop.table
## .
##        1        2        3        4        5        6 
## 0.166897 0.166489 0.166511 0.167181 0.166725 0.166197
replicate(1000, 
          sample(1:6, 1000, replace = TRUE)) %>% 
  table %>% prop.table
## .
##        1        2        3        4        5        6 
## 0.166418 0.166835 0.166688 0.166097 0.167155 0.166807

Throwing an unfair die

sample(1:6, 1000000, replace = TRUE, prob = c(.1, .1, .1, .4, .2, .1)) %>% 
  table %>% 
  prop.table
## .
##        1        2        3        4        5        6 
## 0.100152 0.100189 0.099937 0.400395 0.199524 0.099803
replicate(1000, 
          sample(1:6, 1000, replace = TRUE, 
                 prob = c(.1, .1, .1, .4, .2, .1))) %>% 
  table %>% 
  prop.table
## .
##        1        2        3        4        5        6 
## 0.100173 0.099818 0.100338 0.400375 0.199319 0.099977

Sampling distribution

replicate(1000, rnorm(10), simplify = FALSE) %>%
  lapply(var) %>% unlist %>%
  density %>% plot