- Covid and the course
- Course pages
- Course overview
- Introduction to SLV
- Some examples
- Data Wrangling
Supervised Learning and Visualization
Parts of this week’s slides may be based on materials from previous iterations of Data Analysis and Visualization courses. The authors of these materials include, but may not be limited to: Erik-Jan van Kesteren, Daniel Oberski and Peter van der Heijden.
When figures and other external sources are shown, the references are included when the origin is known.
With the exception of the first lecture, all lectures are on location. There are some rules by which we abide:
The first lecture will be recorded because of schedule clashes.
The on-location lectures will not be recorded.
If you feel that you are stuck, and the wait for the Q&A session is too long: open a GitHub issue here.
Use a reprex to detail your issue when code is involved.
If you expect that you are going to miss some part(s) of the course, please notify me via a private MS-Teams message.
You can find all materials at the following location:
All course materials should be submitted through a pull-request from your Fork of
The structure of your submissions should follow the corresponding repo’s README. To make it simple, I will add an example for the first of each submission type.
If you are unfamiliar with GitHub, forking and/or pull requests, please study this exercise from one of my other courses. There you can find video walkthroughs that detail the process.
All three have a PhD in statistics and a ton of experience in development, data analysis and visualization.
| Week | Topic | Lecturer | Reading |
|---|---|---|---|
| 1 | Data wrangling with | | |
| 2 | The grammar of graphics | GV | R4DS |
| 3 | Exploratory data analysis | GV | R4DS FIMD |
| 4 | Statistical learning: regression | MC | ISLR, TBD |
| 5 | Regression model evaluation | MC | ISLR, TBD |
| 6 | Statistical learning: classification | EJvK | ISLR, TBD |
| 7 | Classification model evaluation | EJvK | ISLR, TBD |
| 8 | Nonlinear models | MC | ISLR, TBD |
| 9 | Bagging, boosting, random forest and support vector machines | MC | ISLR, TBD |
Each week we have the following:
Twice we have:
Once we have:
We will make groups on Monday Sept 13!
|Description||EDA; unsupervised learning||One-sample t-test|
|Explanation||Visual mining||Causal inference|
|Prescription||Personalised medicine||A/B testing|
Exploratory Data Analysis:
Describing interesting patterns: use graphs and summaries to understand subgroups, detect anomalies, and understand the data.
Examples: boxplot, five-number summary, histograms, missing data plots, …
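To make the five-number summary concrete, here is a minimal sketch in Python using only the standard library. The function names `five_number_summary` and `iqr_outliers` and the data values are made up for illustration; the 1.5 × IQR rule is the common boxplot convention for flagging anomalies, assumed here rather than taken from the course materials.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    ordered = sorted(values)
    # method="inclusive" interpolates within the observed data, so the
    # quartiles never fall outside the range of the sample.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


def iqr_outliers(values):
    """Flag points further than 1.5 * IQR beyond the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    spread = q3 - q1
    lower, upper = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements, chosen so that one value stands out.
data = [2, 4, 4, 5, 7, 9, 30]
summary = five_number_summary(data)
outliers = iqr_outliers(data)
print(summary)
print(outliers)
```

The five numbers are exactly what a boxplot draws, and the flagged points are the ones it would show as separate dots beyond the whiskers.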
Regression: predict continuous labels from other values.
Examples: linear regression, support vector machines, regression trees, …
Classification: predict discrete labels from other values.
Examples: logistic regression, discriminant analysis, classification trees, …
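The difference between the two tasks can be sketched in a few lines of plain Python. The regression part is ordinary least squares with one predictor. The classification part is a nearest-class-mean rule, a deliberately simple stand-in and not one of the methods listed above. All function names and data values are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_class_means(xs, labels):
    """Compute the mean of x within each class label."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: sum(v) / len(v) for label, v in groups.items()}


def predict_class(class_means, x):
    """Assign x to the class whose mean is closest."""
    return min(class_means, key=lambda label: abs(class_means[label] - x))


# Regression: the label is a continuous number.
intercept, slope = fit_line([1, 2, 3, 4], [2, 4, 6, 8])
predicted_value = intercept + slope * 5

# Classification: the label is one of a fixed set of categories.
means = fit_class_means([1.0, 1.2, 0.8, 3.0, 3.2, 2.8],
                        ["a", "a", "a", "b", "b", "b"])
predicted_label = predict_class(means, 2.9)

print(predicted_value, predicted_label)
```

In both cases a model is fitted on labelled examples and then applied to a new value; only the type of label being predicted differs.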
How do you think that data analysis relates to:
People from different fields (such as statistics, computer science, information science, industry) have different goals and different standard approaches.
In this course we emphasize drawing insights that help us understand the data.