- Course overview
- Introduction to data science
- Some examples
- Pipes in R
- Visualization with ggplot2
Data Science and Predictive Machine Learning
I owe a debt of gratitude to many people, as the thoughts and teachings in my slides are the product of years-long development cycles and discussions with my team, friends, colleagues and peers. When someone has contributed to the content of the slides, I have credited their authorship.
When figures and other external sources are shown, the references are included when the origin is known.
Opinions are my own.
You can find all materials at the following location:
The MS Teams environment can be found here. Participants can join the meeting online here.
Dark data scientist
Expertise: Missing data theory, statistical programming, computational evaluation
Introduction to it all, modeling, least-squares estimation, linear regression, assumptions, model fit and complexity, intervals.
Test/training splits, cross-validation, curse of dimensionality, underfitting and overfitting, bias/variance trade-off, logistic regression, classification and prediction, prediction evaluation, marginal likelihood, model interpretability, k-nearest neighbours and a peek into unsupervised clustering.
The curse of dimensionality on steroids, how to avoid overfitting, problems with least squares estimation, ridge regression, the lasso, elastic net regularization, model interpretability.
Support vector machines and non-linear predictions. BYOD.
How do you think that data analysis relates to:
People from different fields (such as statistics, computer science, information science, industry) have different goals and different standard approaches.
Data analysis: in this course we emphasize drawing insights that help us understand the data.
Source: Wikimedia Commons and MIMP summerschool slide 28
Challenger space shuttle - 28 Jan 1986 - 7 deaths
When high-risk decisions are at hand, it is paramount to analyze the correct data.
When thinking about important topics, such as whether to stay in school, it helps to know that more highly educated people tend to earn more, but also that there is no difference for top earners.
Before John Snow, people thought “miasma” caused cholera and they fought it by airing out the house. It was not clear whether this helped or not, but people thought it must because “miasma” theory said so.
If we know flu is coming two weeks earlier than usual, that’s just enough time to buy shots for very weak people.
The above examples have in common that data analysis and the accompanying visualizations have yielded insights and solved problems that could not be solved without them.