Dear all,
Last week we learned how to use rmarkdown to create a reproducible workflow. With rmarkdown, any independent researcher can quickly reproduce, evaluate and/or add to your work. Your collaborators can do the same, and new collaborators can be brought up to speed quickly.
This week we’ll add another toolset for reproducibility, one that is primarily designed for you: Git. You can view Git as the ability to go back in time, all the way back to the very beginning of your project.
Git integrates nicely with RStudio. In this exercise we will learn to use Git within our projects and to work with GitHub repositories.
All the best,
Gerko
Git
Git is a free and open source version control system for text files. It can handle extensive change logging for you, no matter the size of the project. Git is fast and efficient, but its effectiveness also depends on how frequently you instruct it to log your project’s changes.
You can see Git as a blank canvas that starts at a certain point in time. Every time you (or others) instruct Git to log the changes that have been made, Git adds those changes to this canvas. We call the changes to the canvas commits. With every commit, an extensive log entry is created.
The differences between two commits, that is, the changes between them, are called diffs.
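To make commits and diffs concrete, here is a minimal command-line sketch. It assumes Git is installed and runs in a throwaway directory; the file name `notes.txt`, the commit messages and the demo identity are made up for illustration only.

```shell
#!/bin/sh
set -eu

# Work in a temporary directory so no existing project is touched.
repo="$(mktemp -d)"
cd "$repo"

git init -q
# Placeholder identity, stored only in this demo repository.
git config user.name "Demo User"
git config user.email "demo@example.com"

# First commit: the starting point of the canvas.
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Second commit: a change on top of the first.
echo "second line" >> notes.txt
git add notes.txt
git commit -q -m "Extend notes"

# The log lists every commit with its hash, author, date and message.
git --no-pager log

# The diff shows what changed between the previous commit and the current one.
git --no-pager diff HEAD~1 HEAD
```

The more often you commit, the smaller each diff becomes, which makes it easier to find the point where something changed.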
If you’d like to know much more about Git, this online book is a very good resource. If you’d like to practice with the command line interface, use this webpage for a quick course.
GitHub
GitHub is the social and user interface to Git that allows you to work in repositories. These repositories can be seen as project folders in which you publish your work, but you can also use them as test sites for development, testing, and so on. There is a distinction between private repositories (only for you and those you grant access) and public repositories (visible to everyone).
Your public repositories can be viewed and forked by everyone. Forking is when other people create a copy of your repository on their own account. This allows them to work on a repository without affecting the master. You can also do this yourself within your own repository, but then the process is called branching instead of forking. If you create an offline copy of a repository on your own machine, the process is called cloning.
GitHub’s ability to branch, fork and clone is very useful, as it allows other people and yourself to experiment on (the code in) a repository before any definitive changes are merged with the master. If you’re working in a forked repository, you can submit a pull request to the repository collaborators, who can then accept (or reject) the suggested changes.
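Branching, merging and cloning can be tried out locally with plain Git. The sketch below assumes Git is installed; the directory names `project` and `project-copy`, the branch name `experiment` and the file contents are hypothetical. Forking and pull requests are GitHub features and are therefore not part of this local example.

```shell
#!/bin/sh
set -eu

# Everything happens inside a temporary directory.
work="$(mktemp -d)"
cd "$work"

git init -q project
cd project
# Placeholder identity, stored only in this demo repository.
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "analysis v1" > analysis.txt
git add analysis.txt
git commit -q -m "Initial analysis"

# The default branch may be called master or main, depending on your Git setup.
main_branch="$(git symbolic-ref --short HEAD)"

# Branch: experiment without affecting the default branch.
git checkout -q -b experiment
echo "analysis v2" > analysis.txt
git commit -q -a -m "Try a new approach"

# Back on the default branch the file still holds the old content.
git checkout -q "$main_branch"
cat analysis.txt

# Merge: bring the experiment into the default branch.
git merge -q experiment
cat analysis.txt

# Clone: make a full copy of the repository, including its history.
cd "$work"
git clone -q project project-copy
```

Here the clone is made from a local folder to keep the example offline; cloning from GitHub works the same way with the repository URL in place of `project`.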
For now, this may be confusing, but I hope you recognize the benefits GitHub can have for the process of development and bug-fixing. For example, the most up-to-date version of the mice package in R can be installed directly from the mice repository with the following code:
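# devtools provides install_github(); install it once from CRAN
install.packages("devtools")
# install the development version of mice straight from its GitHub repository
devtools::install_github(repo = "stefvanbuuren/mice")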
You can see that this process requires the devtools package, which expands the R functionality with essential development tools. Installing packages in R directly from their respective GitHub repositories allows you to obtain the latest (often improved and less buggy) iteration of that software even before it is published on CRAN.
Git
I suggest you install Git by downloading and installing GitHub Desktop. GitHub’s desktop application is a nice GUI and, naturally, integrates well into the repository workflow on GitHub.
Once it is installed, go to GitHub Desktop > Install Command Line Tool. After a reboot, all should be set.
On Windows, download and install Git for Windows, then download and install GitHub Desktop. After a reboot, all should be set.
Ultimately, you’ll want to learn how to use Git through the command line, as it offers more functionality than the GUI. Again, take this 15-minute course to get a gentle introduction.
Git and RStudio
To link Git and RStudio, follow this document.
To learn more about maintaining a package as a GitHub repository within RStudio, have a look at this guide by Hadley Wickham.
Git sees every file in your repository as one of the following three: tracked, untracked or ignored.
It may be wise to instruct Git to ignore changes in some files. For example, compiled files (think about .com, .exe, .o, .so, etc.), archives (e.g. .zip, .tar, .rar), logs (.log) and files generated at runtime (.temp) do not have to be tracked by Git. The same holds for hidden system files (e.g. .DS_Store or Thumbs.db). Adding such file types to a file named .gitignore and placing that file in the root of your repository will focus Git’s energy on useful files only. For common .gitignore examples, see this repository. There are many examples inside, such as this .gitignore example for R.
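As a rough illustration of the file types mentioned above, the following sketch writes a small .gitignore and checks its effect. It assumes Git is installed and uses a throwaway repository; the file names `analysis.R` and `run.log` are made up, and the pattern list is only a starting point, not a recommended standard.

```shell
#!/bin/sh
set -eu

# Throwaway repository for the demonstration.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# One pattern per line; lines starting with # are comments.
cat > .gitignore <<'EOF'
# compiled files
*.exe
*.o
*.so
# archives
*.zip
*.tar
*.rar
# logs and files generated at runtime
*.log
*.temp
# hidden system files
.DS_Store
Thumbs.db
EOF

# Create one file worth tracking and two that should be ignored.
touch analysis.R run.log .DS_Store

# Only files that are not ignored show up as untracked.
git status --short

# check-ignore reports which pattern causes a file to be ignored.
git check-ignore -v run.log
```

Note that a .gitignore only affects files that are not yet tracked; files that were committed earlier stay tracked until they are removed from the repository.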
Follow this tutorial to add an SSH key to your GitHub account. With an SSH key you can identify yourself to an online server (in this case the GitHub server) without having to log in every time. It is like your machine gaining access to an online server through a unique biometric security measure, but instead of biometric data your machine proves its identity with a cryptographic key pair. You will need an SSH key to link RStudio to your GitHub repository.
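For those who prefer the command line, generating a key pair could look roughly like the sketch below. It assumes OpenSSH's `ssh-keygen` is available; the e-mail address and the file name `id_ed25519_demo` are placeholders, and the key is written to a temporary directory with an empty passphrase purely for demonstration. For a real key, use your own address, the default location and a passphrase.

```shell
#!/bin/sh
set -eu

# Demo location; a real key normally lives in ~/.ssh.
keydir="$(mktemp -d)"

# -t: key type, -C: comment (usually your e-mail address),
# -f: output file, -N "": empty passphrase (demo only), -q: quiet.
ssh-keygen -q -t ed25519 -C "you@example.com" -f "$keydir/id_ed25519_demo" -N ""

# The private key stays on your machine.
# The public key (.pub) is the part you paste into your GitHub account.
cat "$keydir/id_ed25519_demo.pub"
```

Once the public key has been added to your account, `ssh -T git@github.com` can be used to check that GitHub recognizes your machine.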
If you use GitHub’s 2FA functionality (you should!), your username and password are not sufficient to push commits to GitHub through RStudio. To solve this, follow these steps on github.com:
1. Go to Settings.
2. Go to Developer settings.
3. Click Personal access tokens in the left sidebar.
4. Click Generate new token.
5. Select the repo scope; you’ll need these permissions to access repositories.
6. Copy the token. The token will not be displayed again, so make a note of it, or save it somewhere.
In RStudio, paste the generated token in the password field when RStudio asks for your credentials. The token will now serve as the unique authenticated link instead of your password.
Add Git functionality to the project of last week’s exercise and publish that project on GitHub.
README.md
upstream/master branch.
End of exercise