--- Dear all, Last week we've learned about using `rmarkdown` to create a reproducible workflow. With `rmarkdown` any independent researcher would be able to quickly reproduce, evaluate and/or add to your work. Also, your collaborators can do the same and new collaborators can be quickly brought up to speed. This week we'll add another toolset to reproducibility; one that is primarily designed for you: [Git](https://git-scm.com). You can view `Git` as the ability to go back in time. Back to the very beginning of your project. `Git` integrates nicely with `RStudio`. In this exercise we will learn 1. How to integrate `Git` within our projects. 2. How to publish our projects as `GitHub` repositories All the best, Gerko --- # `Git` `Git` is a free and open source version control system for text files. It can handle extensive change logging for you, no matter the size of the project. `Git` is fast and efficient, but its effectiveness depends also on the frequency you instruct it to log your project's changes. You can see `Git` as a blank canvas that starts at a certain point in time. Every time you (or others) instruct `Git` to log any changes that have been made, `Git` adds the changes that are made to this canvas. We call the changes to the canvas [`commits`](https://help.github.com/articles/github-glossary/#commit). With every `commit` an extensive log is created that includes at least the following information: - the changes made - who made the changes - metadata - a small piece of text that describe the changes made The difference between two commits - or the changes between them - are called [`diffs`](https://help.github.com/articles/github-glossary/#diff). If you'd like to know much more about `Git`, [this online book](https://git-scm.com/book/en/v2) is a very good resource. If you'd like to practice with the command line interface [use this webpage](https://learngitbranching.js.org) for a quick course. --- # `GitHub` `GitHub` is the social and user interface to `Git` that allows you to work in [repositories](https://help.github.com/articles/github-glossary/#repository). These repositories can be seen as project folders in which you publish your work, but you can also use them as test sites for development, testing, etcetera. There is a distinction between [private repositories](https://help.github.com/articles/github-glossary/#private-repository) (only for you and those you grant access) and public repositories (visible for everyone). Your public repositories can be viewed and [forked](https://help.github.com/articles/github-glossary/#fork) by everyone. `Forking` is when other people create a copy of your repository on their own account. This allows them to work on a repository without affecting the `master`. You can also do this yourself, but then the process is called [`branching`](https://help.github.com/articles/github-glossary/#branch) instead of forking. If you create a copy of a repository that is offline, the process is called [`cloning`](https://help.github.com/articles/github-glossary/#clone). `GitHub`'s ability to branch, fork and clone is very useful as it allows other people and yourself to experiment on (the code in) a repository before any definitive changes are [`merged`](https://help.github.com/articles/github-glossary/#merge) with the `master`. If you're working in a forked repository, you can submit a [`pull request`](https://help.github.com/articles/github-glossary/#pull-request) to the repository collaborators to accept (or reject) any suggested changes. For now, this may be confusing, but I hope you recognize the benefits `GitHub` can have on the process of development and bug-fixing. For example, the most up-to-date version of the `mice` package in `R` can be directly installed from the `mice` repository with the following code: ```{r eval=FALSE} install.packages("devtools") devtools::install_github(repo = "stefvanbuuren/mice") ``` You can see that this process requires package `devtools` that expands the `R` functionality with essential development tools. Loading packages in `R` directly from their respective `GitHub` repositories, allows you to obtain the latest - often improved and less buggy - iteration of that software even before it is published on [`CRAN`](https://cran.r-project.org). --- # Installing `Git` ## Installing on Mac I suggest you install `Git` by downloading and installing [`GitHub Desktop`](https://desktop.github.com). `GitHub`'s desktop application is a nice GUI and, naturally, integrates well into the repository workflow on `GitHub`. When installed, you can go to `GitHub Desktop > Install Command Line Tool` After a reboot, all should be set. ## Installing on Windows Download and install [`Git for Windows`](https://git-for-windows.github.io), Then download and install [`GitHub Desktop`](https://desktop.github.com). `GitHub`'s desktop application is a nice GUI and, naturally, integrates well into the repository workflow on `GitHub`. After a reboot, all should be set. --- # Command line interface vs. GUI Ultimately, you'll want to learn how to use `Git` through the command line. It offers better functionality. Again, [take this 15-minute course](https://learngitbranching.js.org) to get a gentle introduction, --- # `Git` and `RStudio` To link `Git` and `RStudio`, follow [this document](https://support.rstudio.com/hc/en-us/articles/200532077-Version-Control-with-Git-and-SVN) To learn more about maintaining a package as `GitHub` repository within `RStudio`, have a look at [this guide](http://r-pkgs.had.co.nz/git.html) by Hadley Wickham. --- # .gitignore GitHub sees every file in your repository as one of the following three - **tracked files** that have been (previously) staged and committed - **untracked files** that have not been staged or committed - **ignored files** that have been explicitly ignored It may be wise to instruct `Git` to ignore changes in some files. For example, compiled files (think about `.com`, `.exe`, `.o`, `.so`, etc), archives (e.g. `.zip`, `.tar`, `.rar`), logs (`.log`) and files generated in runtime (`.temp`) do not have to be tracked by `Git`. The same holds for hidden system files (e.g. `.DS_Store` or `Thumbs.db`). Adding such filetypes to a file named `.gitignore` and placing that file in the root of your repository will take care of focusing `Git`'s energy on useful files only. For common `.gitignore` examples, see [this repository](https://github.com/github/gitignore). There are many examples inside, such as [this `.gitignore` example for `R`](https://github.com/github/gitignore/blob/master/R.gitignore) --- # SSH keys Follow [this tutorial](https://help.github.com/articles/adding-a-new-ssh-key-to-your-github-account) to add an SSH key to your GitHub account. With an SSH key you can identify yourself to an online server (in this case the `GitHub` server) without having to log in every time. It is like your machine having access to an online server through a unique biometric security measure, but instead of biometric data a bits-and-bytes hash code is communicated every time. You will need an SSH key to link `RStudio` to your `GitHub` repository. --- # 2-Factor Authentication If you use `GitHub`'s 2FA functionality - you should! - your username and password are not sufficient to `push` `commits` to `GitHub` through `RStudio`. To solve this follow these steps on [github.com](https://github.com): 1. Log in to your account 2. Click on your profile photo (upper right corner) and select `Settings` 3. Go to `Developer settings` 4. Select `Personal access tokens` in the left sidebar 5. Click `Generate new token` 6. Give the token a name 7. Select at least the `repo` scope; you'll need these permissions to access repositories 8. Click Generate token Copy the token. The token will not be displayed again, so make a note of it, or save it somewhere. In `RStudio`, paste the generated token in the password field when `RStudio` asks for your credentials. The token will now serve as the unique authenticated link instead of your password. --- # Exercise 1. Add `Git` functionality to the project of last week's exercise and publish that project on GitHub. 2. Fork [this repository](https://github.com/gerkovink/MarkupLanguages2019) - add a new branch with your name to your fork, - add your name to the file `README.md` - send a pull request to incorporate your changes into the `upstream/master` branch. --- End of exercise