Week 1: Markup Languages

Introduction to Markup Languages in general and LaTeX in particular

Important

This week’s slides can be found here


Introduction

Dear all,

Welcome to MLaRPiS. In this course you will learn a great deal about presenting and structuring your research. We start with \(\LaTeX\), an environment for typesetting documents that is particularly useful for complex content, such as graphics, tables and equations. Compiling a LaTeX document produces a cleanly typeset PDF that adheres to a set of predefined rules. Many journals and preprint services in our field require LaTeX documents. This is why you have to learn it. The flexibility and customizability that LaTeX offers is why you should learn it.

I will take you through the basics of the scientific LaTeX family in a couple of walkthroughs and optional exercises. In these exercises we will treat:

  1. the introduction to LaTeX
  2. managing references with BibTeX
  3. including equations
  4. creating tables and displaying figures

Mastering a new scripting or programming environment is not done by simple exercises, but requires practice and repetition. Don’t worry, the documents you will have to produce during this course will require you to repeat the LaTeX process (Wk1) and the processes and skills in other weeks over and over. You should also work together with others and share your insights, findings and documents. We’ll cover those bases in other weeks.

Of course you will run into problems and/or difficulties. In general: the internet is your friend. Chances are that someone else has encountered the same issue and, most likely, a solution has been posted by an expert user. If, however, you are stuck, please post an issue on GitHub. Collectively, we can then troubleshoot your problem and help you reach a solution. Just remember that outside of class hours, answers may take a bit longer.

LaTeX is extremely flexible and allows you to typeset documents with ‘surgical precision’. There are, however, languages that allow you to create basic documents much quicker, but without the level of detail that LaTeX offers. One such language is Markdown. This document is created with Quarto, an implementation that goes far beyond Markdown and can be compiled from within RStudio. Of course it is required that you learn LaTeX, but since Markdown and Quarto allow for direct integration of LaTeX and HTML5, Quarto can be a very valuable tool. If you’d like to see the Quarto files (.qmd), feel free to browse around the course materials repository at github.com/gerkovink/markup.

Enough general intro. Let’s start,

Gerko and Hanne


The TeX framework

TeX’s structure bears a close resemblance to that of the R-project. The core functionality can easily be expanded by users by means of packages. These packages are stored in a centralized location called the Comprehensive TeX Archive Network (CTAN). The TeX Live and MacTeX distributions contain an image of many of the available packages, meaning that you will have most functionality available at all times (even offline).

Just like R, TeX is a software implementation that requires an editor to work with. LaTeX is a set of macros that makes TeX easier for users. From this moment on, if I speak about TeX, I am most likely referring to LaTeX.

TeX is widely used by publishers as it gives the user full control over the appearance of the document. It is designed so that the user can focus on writing and needs to pay only minimal attention to typesetting the document (as opposed to e.g. typesetting hell MS Word).


Installing a TeX framework

If you are on Windows or Linux, I suggest you install the easy-to-install ProTeXt distribution or the TeX Live distribution. If you are on a Mac, the MacTeX distribution will give you everything you need. If you use the default installation parameters, everything you might ever need is included in these distributions. The distributions come with an excellent (but basic) editor (TeXworks in TeX Live and TeXShop in MacTeX), but if you want to go the fancy way: there are very good alternatives. See this page for a comprehensive overview of all TeX editors and code editors with TeX capabilities.

Alternatively, you can use the online TeX editor Overleaf. Please pay attention to any data privacy regulations. It may not be an option for all your research endeavors.


The structure of a LaTeX document

Open the file LaTeX_template.tex from the course page. This file is a template that you may use for your documents in this course. The file LaTeX_template.pdf is the typeset version of this document (the output result). We will go through the document line-by-line:

    \documentclass[10pt, fullpage, a4paper, titlepage]{article}

This is the line that tells TeX what the class of the document is and how it should be interpreted dimension-wise. In this situation, we use a 10 point font size (options are 10, 11 and 12 point), use the full page, use A4 paper size (as opposed to e.g. US letter format) and require a titlepage. The class of the document is set to be article. Many other options and classes exist if you want to deviate from these defaults. A simple online search usually gets you the option you desire.
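As a small illustration of how the options combine, here is a sketch of some alternative first lines. These are not part of LaTeX_template.tex; they only use options of the standard classes, and a document can contain just one \documentclass line, so the alternatives are commented out.

    % 12 point font on US letter paper, title on the first page of the text
    \documentclass[12pt, letterpaper, notitlepage]{article}

    % Alternatives (uncomment one and remove the line above):
    % \documentclass[11pt, a4paper, twocolumn]{article}  % two-column layout
    % \documentclass[10pt, a5paper, twoside]{book}       % small two-sided book

The options between the square brackets are passed to the class named between the curly braces, so changing the class while keeping the options is usually all it takes to try a different layout.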

    \usepackage{graphicx, latexsym}
    \usepackage{setspace}
    \usepackage{apalike}
    \usepackage{amssymb, amsmath, amsthm}
    \usepackage{bm}
    \usepackage{epstopdf}
    \usepackage[]{hyperref}

We load these packages by default, because they sum up pretty much everything needed to begin working with LaTeX as a statistician. They govern devices such as graphics, mathematical notation (normal and bold face), and so on. Package hyperref is particularly interesting, because it allows you to set the meta-info for your document and it allows you to specify the way links and references in your document are treated. Meta-info is needed to make your documents indexable and, hence, more visible to you and everyone on the internet (if your document is on the internet).
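To give an impression of what these packages buy you, here is a minimal sketch of a numbered equation that uses amsmath (for \eqref), bm (for bold symbols) and amssymb (for \mathbb). The label name eq:linmod is just one I made up for this example.

    % Goes in the body of the document, with the packages above loaded
    \begin{equation}
      \bm{y} = \bm{X}\bm{\beta} + \bm{\varepsilon},
      \qquad \bm{\beta} \in \mathbb{R}^{p}
      \label{eq:linmod} % hypothetical label name
    \end{equation}

    Equation~\eqref{eq:linmod} shows the linear model in matrix notation.

Because hyperref is loaded, the equation number printed by \eqref becomes a clickable link in the PDF.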

    \hypersetup{
    pdftitle={title of the pdf},
    pdfauthor={your name},
    pdfsubject={cool stuff},
    pdfkeywords={koala, chuck norris},
    bookmarksnumbered=true,     
    bookmarksopen=true,         
    bookmarksopenlevel=1,       
    colorlinks=true,            
    pdfstartview=Fit,           
    pdfpagemode=UseOutlines,      
    pdfpagelayout=TwoPageRight
    }

These are the options set for package hyperref. You can specify the document and author information and add keywords. The other options are also relevant, but we will not discuss them now.
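If you are curious what colorlinks=true does in practice, the following sketch may help. The colour choices and the label sec:method are my own assumptions for the example; they are not in the template.

    % In the preamble, after \usepackage[]{hyperref}
    \hypersetup{
      colorlinks=true,  % colour the link text instead of drawing boxes
      linkcolor=blue,   % internal links, e.g. \ref
      citecolor=blue,   % citations
      urlcolor=blue     % \url and \href
    }

    % In the body
    \section{Method}\label{sec:method} % hypothetical label name
    Section~\ref{sec:method} is an internal link. The course materials are at
    \url{https://github.com/gerkovink/markup}, or, with custom link text,
    \href{https://github.com/gerkovink/markup}{in the course repository}.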

    %\singlespacing
    %\onehalfspacing
    \doublespacing

These commands from package setspace set the line spacing of the document. A line that starts with % is a comment and is not executed. Here, \singlespacing and \onehalfspacing are commented out, so only \doublespacing is active and the document is double-spaced.
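The spacing commands work on everything that follows them. If you only want to change the spacing for a part of the document, setspace also provides environments. A minimal sketch, assuming setspace is loaded as in the template:

    \doublespacing  % double spacing from here on

    This paragraph is double-spaced.

    \begin{singlespace}
    This paragraph is single-spaced, for example for a long quotation.
    \end{singlespace}

    \begin{spacing}{1.25}
    This paragraph uses a custom stretch factor of 1.25.
    \end{spacing}

After each environment ends, the document returns to the spacing that was active before it.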

    \title{title of your paper\\ \small subtitle of your paper}
    \author{name}
    %\date{\today}
    \date{}

Give titlepage information. We have set a title and a smaller subtitle. A line-break in TeX is denoted by \\, although most of the time you won’t need it, as TeX takes care of line-breaks for you in most situations. The \small command tells TeX that the remainder of this textbox is to be printed in a smaller font. All functions (commands) in TeX are preceded by \. So, \author{} is the function for the author, \date{} the function for the date and \today the function that prints today’s date.
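A sketch of a slightly richer title block, should you need one. The author names and the footnote text are placeholders; \and and \thanks are standard LaTeX commands.

    \title{Title of your paper\\ \small Subtitle of your paper}
    % \and separates authors, \thanks adds a footnote on the title page
    \author{First Author \and Second Author\thanks{Corresponding author.}}
    \date{\today}  % prints the compilation date; \date{} prints no date

If you leave out \date altogether, LaTeX prints today’s date by default, which is why the template uses the empty \date{} to suppress it.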

Up until now, nothing has been typeset. Everything so far is the preamble of a TeX document: it only sets things up. To start the document itself, we use \begin{document}.

    \begin{document}

We can then tell TeX to print the titlepage information that we assigned (title, author, date)

    \maketitle

and continue the document on a new page:

    \newpage

We start the first section, titled Abstract. The starred form \section* leaves it unnumbered:

    \section*{Abstract}
    text of abstract

Then a section called Introduction:

    \section{Introduction}
    text introduction

With a subsection called sub introduction:

    \subsection{sub introduction}
    text text text

And we end the document with the \end{document} command.

    \end{document}
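Put together, the pieces above form a complete document. The following is a trimmed-down sketch of the template (it keeps only setspace and leaves out the other packages and the hyperref setup), which should compile on its own:

    \documentclass[10pt, fullpage, a4paper, titlepage]{article}
    \usepackage{setspace}
    \doublespacing

    \title{Title of your paper\\ \small Subtitle of your paper}
    \author{Name}
    \date{}

    \begin{document}
    \maketitle
    \newpage

    \section*{Abstract}
    Text of abstract.

    \section{Introduction}
    Text introduction.

    \subsection{Sub introduction}
    Text text text.

    \end{document}

Everything between \documentclass and \begin{document} is the preamble; only what sits between \begin{document} and \end{document} ends up in the PDF.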

Exercise on TeX classes - optional

In this optional exercise you can make yourself familiar with some of the TeX classes. A class is a clear definition of the type of document. There are classes for books, articles, letters, resumes, and so on. There are even classes for journals and book series by publishers, such as Springer, Elsevier, CRC and Sage.

Classes are usually linked to style files, the files that define the looks of a document. Have a look at the LaTeX background archive if you’d like to know more about how classes and styles make documents look different. It is important to realize that not every class by default accommodates every type of content. Some classes make some content look ridiculous. Imagine printing an A5 document on an A1 canvas, or submitting a job application as a statistical journal manuscript. Choosing a proper class is therefore essential. The good thing about LaTeX is that different classes can easily be applied to the same content.


Exercise

Use the text in the Virgil - Aeneid.txt file as the content for the following five documents:

  1. An article with 12 point font size on US letter paper.
  2. A book with 10 point font size on A4 paper.
  3. A book with 10 point font size on A5 paper.
  4. A minimal document with 12 point font size on A5 paper.
  5. A letter with 12 point font size on A5 paper.

You just need to specify the paragraphs correctly. The solution to this exercise can be found here
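Most of the classes above only need a different \documentclass line, but the letter class has a structure of its own. As a hint, here is a minimal sketch of a letter; all names and addresses are placeholders, and this is not the official solution.

    \documentclass[12pt, a5paper]{letter}
    \signature{Your Name}                  % placeholder
    \address{Your Street 1 \\ Your City}   % placeholder

    \begin{document}
    \begin{letter}{Recipient Name \\ Recipient Address} % placeholder

    \opening{Dear reader,}

    First paragraph of the text.

    Second paragraph of the text. A blank line starts a new paragraph.

    \closing{Kind regards,}

    \end{letter}
    \end{document}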


Getting jumpstarted with LaTeX, bibliographies, citing and compiling

I have created many documents and walkthroughs about TeX and LaTeX, most of which are already outdated.

The folks at Overleaf have provided a much more up-to-date series. I suggest that you go through the following chapters:

  1. Learn LaTeX in 30 minutes
  2. Because most of you are new to LaTeX, I suggest you learn biblatex: Bibliography management with biblatex
  3. Choosing a LaTeX compiler
  4. Mathematical notation in LaTeX
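To give you an idea of what to expect from the biblatex chapter, here is a minimal sketch. The file name references.bib and the citation key examplekey are hypothetical: the key has to match an entry in your own .bib file. The authoryear style is just one choice of many.

    \documentclass{article}
    \usepackage[backend=biber, style=authoryear]{biblatex}
    \addbibresource{references.bib} % hypothetical file name

    \begin{document}

    \textcite{examplekey} is a citation in running text, and a citation in
    parentheses looks like this \parencite{examplekey}.

    \printbibliography

    \end{document}

    % Typical compilation sequence with the biber backend:
    %   pdflatex, biber, pdflatex, pdflatex

Most editors, and Overleaf, run this sequence for you.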

If - like me (Gerko) - you still come across new notation conventions and would like to know how to write expressions with it in LaTeX: use detexify to draw the symbol and obtain the LaTeX code.


Further Reading

Chances are that you will struggle at some point with a problem or a compilation error that will lead to much frustration. The internet is your friend: search for the error or problem and you will most likely find the solution. Additionally, these are good resources to study:

Even though the internet may hold most answers to your problem, please don’t struggle for too long. Post an issue on GitHub.


Exercise for this week

Create a document that contains [1-5] and then change the document with [6]. Submit both documents as well as the LaTeXdiff.

  1. A titlepage
  2. An equation
  3. A section and a subsection
  4. The resulting figure and table, with captions, from this file, to be generated by the code below:

    # load package lattice
    library(lattice)
    library(xtable) # generate the LaTeX code for tables
    # fix the random generator seed
    set.seed(123)
    # create data
    data <- rnorm(1000)
    # plot histogram
    histogram(data)
    # plot density
    densityplot(data^12 / data^10, xlab = expression(data^12/data^10))
    # plot stripplot
    stripplot(data^2, xlab = expression(data^2))
    # plot boxplot
    bwplot(exp(data))
    # matrix with all data used
    data.all <- cbind(data = data,
                      squared1 = data^12 / data^10,
                      squared2 = data^2,
                      exponent = exp(data))

  5. The text from this AI-generated fairytale
  6. A second document, together with its LaTeXdiff version, in which the fairytale text is replaced by this slight variation on the same fairytale
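For the figure and the table with captions, floats are the usual route. Below is a rough sketch, assuming graphicx is loaded as in the template. The file name histogram.pdf, the labels and the caption texts are made up for the example; the column names are those of data.all in the R code.

    \begin{figure}[ht]
      \centering
      % hypothetical file name: a plot saved from R
      \includegraphics[width=0.7\textwidth]{histogram.pdf}
      \caption{Histogram of 1000 draws from a standard normal distribution.}
      \label{fig:hist}
    \end{figure}

    \begin{table}[ht]
      \centering
      \caption{The variables in \texttt{data.all}.}
      \label{tab:dataall}
      \begin{tabular}{rrrr}
        \hline
        data & squared1 & squared2 & exponent \\
        \hline
        % the rows of numbers generated with xtable go here
        \hline
      \end{tabular}
    \end{table}

    Figure~\ref{fig:hist} and Table~\ref{tab:dataall} can then be referenced
    from the text.

The xtable package prints a complete tabular environment, so in practice you can paste its output in place of the hand-written tabular above.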

Handing in the exercise

Create a zipped archive of all necessary files (e.g. the .bib, .tex and .pdf files) and upload it to the correct folder (Wk1) via Add File > Upload Files in the course collaboration repository. Name the file yourname.zip.


