This course gives an overview of the state of the art in statistical markup, reproducible programming and scientific digital representation. Students will get to know the professional field of statistical markup and its innovations and challenges. It consists of 6 meetings in which students will learn about markup languages (LaTeX and Markdown), learn efficient and reproducible programming with rMarkdown, experience developing Shiny web apps, get to know version control with Git and will create and maintain their own data archive repository and personal (business card) page through GitHub. Combining these lectures, the students get acquainted with different viewpoints on marking up statistical manuscripts, areas of innovation, and challenges that people face when working with, analysing and reporting (simulated) data. Knowledge obtained from this course will help students face multidimensional problems during their professional career.
The final grade is computed as follows:
Graded part | Weight |
---|---|
Markup manuscript | 50 % |
Research repository | 40 % |
Personal repository | 10 % |
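As a worked illustration of the weights in the table, the final grade can be written as a weighted sum. The symbols \(G\) (final grade), \(M\) (markup manuscript), \(R\) (research repository) and \(P\) (personal repository), as well as the example grades 7.0, 6.0 and 8.0, are assumptions made for illustration only; they do not appear in the course materials.

```latex
\[
\begin{aligned}
G &= 0.50\,M + 0.40\,R + 0.10\,P \\
  &= 0.50 \cdot 7.0 + 0.40 \cdot 6.0 + 0.10 \cdot 8.0 \\
  &= 3.5 + 2.4 + 0.8 = 6.7
\end{aligned}
\]
```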
To develop the necessary skills for completing the assignment and the presentation, 6 exercises must be completed and submitted. These exercises are not graded, but students must complete them to pass the course.
In order to pass the course, the final grade must be 5.5 or higher, your contribution to the course should be sufficient and all assignments and practical assignments should be handed in and/or passed. Otherwise, additional work is required concerning the assignments and/or exercises you have failed.
When? | Time | Where? | What? |
---|---|---|---|
14-Sep | 9 am | Ruppert 011 | Monte Carlo simulation and Git |
28-Sep | 9 am | Ruppert 011 | Reproducible workflows and replication |
02-Nov | 9 am | Ruppert 011 | Typesetting equations with LaTeX |
09-Nov | 9 am | Ruppert 011 | Version control and GitHub in depth |
23-Nov | 9 am | Ruppert 011 | Presentations with rMarkdown |
07-Dec | 9 am | Ruppert 011 | GitHub Pages and Shiny |
Expand \((a+b)^n\):

\[
\begin{gather*}
(a + b)^n\\
(a\ + \ b)^n\\
(a\quad + \quad b)^n\\
(a\qquad + \qquad b)^n
\end{gather*}
\]
Students will individually choose one statistical topic and write a manuscript about it. They will need to perform calculations and write code for this manuscript. All of the student’s work needs to be combined in an easily understandable and insightful data archive and posted in a personal GitHub repository. The end result will be graded on 1) quality of the markup language skills, 2) quality of the data archive and 3) quality of the online repository.
Students will be evaluated on the following aspects:
After taking this course students can understand innovations in statistical markup, statistical simulation and reproducible research. Students are also able to approach challenges from different professional viewpoints. They have gained experience in marking up a professional manuscript and designing a state-of-the-art statistical archive in an open source repository.
Relevant documents that are used in the course and/or may provide useful additional reading will be placed on or referred to on the course website.
The research repository has to be prepared as a supplementary archive that can serve as an extensive documentation of the research (e.g. as a supplement to be submitted to a journal). The archive has to be published in a public or private GitHub repository.
This course takes place in the first semester. The course will run over 6 meetings (Wednesdays, 9 am to 12 noon) during the semester, with the first meeting on September 14, 2022.
Students will need their own laptop computer. Students should have experience in programming with R and should be familiar with the IDE RStudio.
We start the hard way with Git, GitHub and Monte Carlo simulation. Just as with any new skill aimed at programming and/or scripting: practice makes perfect. Follow the two exercises for this week and you’ll have a head start on the rest of the course of your career.
All the best,
Gerko
The following links are very useful:

- An old video walkthrough about Git and RStudio
- GitHub Glossary for all terminology
- This online Git book is a very good resource
- This book covers pretty much everything you need to marry git and R.
I highly appreciate the clip on MC simulation by Ben Lambert (LEFT) and the quite comprehensive exploration of the origin and concepts of probability that govern Monte Carlo simulations by John Guttag (RIGHT).
Git
to manage his other projectR
The exercise for this week:
This week we’ll cover reproducible workflows with rmarkdown in RStudio. You can find the lecture here.
All the best,
Please watch the videos below.
The following links are very useful:
rmarkdown
This week’s documents:
Rmd
html
This week we’ll cover equations in LaTeX; I’m sure you’ll love it. In order to put it to the test, we will also use LaTeX to design slide show presentations. Later on in this course, we’ll focus on creating presentations with Markdown, which is much easier, but also less flexible when it comes to perfectly detailed typesetting. For now, getting to know the basics of typesetting and equations in LaTeX will pay off in the future.
All the best,
Mathematicians and physicists have been using LaTeX typesetting language to craft equations in manuscripts, but now other scientists are using it too. Here’s how to get started. https://t.co/AzFaehdNXf
— Nature (@nature) October 7, 2019
This week’s exercise:
The following links are very useful for this week’s exercise:
git in more detail.

This week we’ll cover git commands and procedures for when the proverbial shit hits the fan.
All the best,
This week’s exercise is straightforward. Follow my lead and you’ll learn a great deal about git.
The info in these links will take you beyond the exercise
These links cover vital information about placement, sizing, etc. of figures and tables, and about referencing. I’ll dive into this in more detail next meeting, but please study the linked documents to prepare yourself.
markdown. All relevant materials can be found in the 2019 Archive for Week 1.

rmarkdown
This week we’ll cover presentations with rmarkdown in RStudio. You can find the slides here.
All the best,
The following links are very useful:
This week we’ll cover shiny web apps and GitHub Pages. shiny is a wonderful means to showcase your work and offer online services; Hanne Oberman will share her expertise with you. GitHub Pages is the way for developers and professionals to introduce themselves to the world and host a personal webpage right from GitHub. And all this is free!
All the best,
Definitely look at the book Mastering Shiny by Hadley Wickham, which was published in 2021 and can also be read online.
I suggest that you watch this video before you start on the exercise:
Please find a sufficiently different past iteration of this course here
Your markup manuscript may be about anything, but be aware that it must be gradeable. We do not grade your manuscript on content and theoretical soundness, but will assess its visual and organizational aspects. Your markup manuscript must prove that you can produce a publication up to the typesetting standards of international peer-reviewed journals. So include equations, tables, figures, references, etcetera.
You should develop and publish a research archive that demonstrates a reproducible workflow. The archive should contain code, data and (if applicable) the typeset markup manuscript detailed above. Examples of research archives are:
You should showcase yourself in a personal website, a well-covered quarto or rmarkdown document, or a thoroughly designed CV. Examples are:
Send us (Hanne & Gerko) an e-mail with either
devtools package. For example, the following lines of code would install the mice package from my GitHub repo, from the main branch and from the branch estimice, respectively.

    devtools::install_github("gerkovink/mice")
    devtools::install_github("gerkovink/mice@estimice")
If it is not a package, but rather a series of scripts that is needed for your research archive, you can simply load() [workspaces] or source() [scripts] the url() in R. For example,

    load(url("https://www.gerkovink.com/yourdata.RData"))
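A minimal sketch of the source() and load() pattern described above, assuming a Unix-like system. To keep it runnable offline, it first writes a small script and a small workspace to temporary files and then reads them back through file:// URLs; for a real research archive these would be https:// addresses. The names `center` and `example_data` are hypothetical and only serve the illustration.

```r
# Hypothetical stand-ins for files that would normally live on a web server.
script_path <- tempfile(fileext = ".R")
data_path   <- tempfile(fileext = ".RData")

# A tiny script that defines one helper function (hypothetical name).
writeLines("center <- function(x) x - mean(x)", script_path)

# A tiny workspace that holds one object (hypothetical name).
example_data <- data.frame(id = 1:3, score = c(2, 4, 6))
save(example_data, file = data_path)
rm(example_data)  # remove it, so that load() has to restore it

# Build file:// URLs (Unix-like paths assumed); a remote archive
# would use https:// addresses here instead.
script_url <- paste0("file://", normalizePath(script_path, winslash = "/"))
data_url   <- paste0("file://", normalizePath(data_path, winslash = "/"))

source(url(script_url))  # runs the script, which defines center()
load(url(data_url))      # restores example_data into the workspace

# Use the sourced function on the loaded data.
centered <- center(example_data$score)
```

Recording the exact URLs, and ideally the date of retrieval, in the archive's readme keeps this step traceable for anyone who reruns the analysis.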
Alternatively, given that the author has allowed you to do so, you can also download the code and add it to your archive with e.g. a header and readme that indicate where you’ve obtained the source code. If you alter any code that is licensed under the GNU GPL v3, your source needs to be open too if you intend to share it.