DSCI 100
Attribution: images in these slides that are not accompanied by code mostly come from
The Fundamentals of Data Visualization by Claus O. Wilke
Artwork by @allison_horst
image source: R for Data Science by Grolemund & Wickham
Ask a question, then answer it
The purpose of a visualization is to answer a question about a dataset of interest.
A good visualization answers the question clearly. A great visualization also hints at the question itself.
Visualizations alone help us answer two types of questions:
(we need more tools + visualizations to answer the others)
There are three key aspects of plots in ggplot2:
ggplot2 is loaded with the tidyverse package in R, or it can be loaded on its own. We need a number of functions from various tidyverse packages (including dplyr), so we’ll load tidyverse:
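Loading the package is a single call. A minimal sketch, assuming the tidyverse is already installed:

```r
# Attach the tidyverse, which includes ggplot2 and dplyr
library(tidyverse)
```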
A variable refers to a characteristic of interest and can be:
The types of variables we have (along with the question we wish to answer or explore) may dictate the type of data visualization we should use.
Scatterplots are used to visualize the relationship between two quantitative variables
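As a rough illustration, a scatterplot might be built as follows. This sketch uses R's built-in faithful dataset as a stand-in (it is not the data used in these slides), and the axis labels are our own wording:

```r
library(tidyverse)

# `faithful` is a built-in R dataset, used here only as a stand-in:
# waiting = minutes until the next eruption, eruptions = eruption duration in minutes
scatter_plot <- ggplot(faithful, aes(x = waiting, y = eruptions)) +
  geom_point() +
  labs(x = "Waiting time (minutes)", y = "Eruption duration (minutes)")

scatter_plot
```

Both variables are quantitative, so each observation becomes one point positioned by its two values.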
Line plots are used to visualize trends with respect to an independent quantity
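A line plot could be sketched like this. We do not know the exact columns of co2_df, so the code builds a stand-in data frame (co2_sketch, a hypothetical name) from R's built-in co2 time series; the column names are assumptions:

```r
library(tidyverse)

# Stand-in for co2_df, built from R's built-in `co2` time series
# (monthly atmospheric CO2 at Mauna Loa, in ppm). Column names are assumptions.
co2_sketch <- tibble(
  date_measured = as.numeric(time(co2)),  # decimal year of each measurement
  concentration = as.numeric(co2)
)

co2_line <- ggplot(co2_sketch, aes(x = date_measured, y = concentration)) +
  geom_line() +
  labs(x = "Year", y = "Atmospheric CO2 (ppm)")

co2_line
```

Time sits on the x-axis as the independent quantity, and geom_line() connects the measurements in order.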
Not coding in these slides? You can find co2_df as a csv file here
Barplots are used to visualize the comparison of amounts
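A bar plot comparing amounts might look like the sketch below. The data frame islands_sketch is a hypothetical stand-in built from R's built-in islands vector; the column names and the choice to keep the 12 largest landmasses are assumptions made for illustration:

```r
library(tidyverse)

# Stand-in for islands_df, built from R's built-in `islands` vector
# (landmass areas in thousands of square miles). Column names are assumptions.
islands_sketch <- tibble(
  landmass = names(islands),
  size = as.numeric(islands)
) %>%
  slice_max(size, n = 12, with_ties = FALSE)  # keep the 12 largest landmasses

islands_bar <- ggplot(islands_sketch, aes(x = size, y = reorder(landmass, size))) +
  geom_col() +  # bar length is taken directly from `size`
  labs(x = "Area (thousands of square miles)", y = "Landmass")

islands_bar
```

Placing the names on the y-axis and ordering the bars by size keeps the labels readable and the comparison easy to scan.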
Not coding in these slides? You can find islands_df as a csv file here
Histograms are used to visualize the distribution of a single quantitative variable
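A histogram could be sketched as follows. The built-in faithful dataset stands in for the slide data here, and the choice of 20 bins is arbitrary:

```r
library(tidyverse)

# `faithful` is a built-in R dataset, used here only as a stand-in
hist_plot <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(bins = 20) +  # the number of bins is an arbitrary choice
  labs(x = "Waiting time between eruptions (minutes)", y = "Count")

hist_plot
```

Only one variable is mapped; ggplot2 computes the counts in each bin for the y-axis.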
Not coding in these slides? You can find gapminder_2016 data as a csv file here
1) No tables / pie charts

Which one is easier to interpret?
2) No 3D visualizations
Notes:
The third dimension does not improve the reading of the data.
These plots are difficult to interpret because perspective distorts the sizes and positions of elements drawn in the third dimension.
3) Use simple, colourblind-friendly colour palettes
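One possible way to apply such a palette in ggplot2 is through a ColorBrewer scale. This sketch assumes the "Dark2" palette, which RColorBrewer lists among its colourblind-friendly options, and uses the built-in iris dataset purely as a stand-in:

```r
library(tidyverse)

# `iris` is a built-in R dataset, used here only as a stand-in
palette_plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
  geom_point() +
  scale_colour_brewer(palette = "Dark2") +  # assumed palette choice
  labs(x = "Sepal length (cm)", y = "Sepal width (cm)", colour = "Species")

palette_plot
```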
4) Include labels and legends, make them legible
Remember: a great visualization tells its own story without needing you to be there explaining things
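A sketch of how labels and text size might be set in ggplot2. The dataset (built-in iris), the label wording, and the font size of 16 are all assumptions for illustration:

```r
library(tidyverse)

# `iris` is a built-in R dataset, used here only as a stand-in
labelled_plot <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width, colour = Species)) +
  geom_point() +
  labs(
    x = "Petal length (cm)",   # axis labels state the quantity and its unit
    y = "Petal width (cm)",
    colour = "Iris species"    # legend title
  ) +
  theme(text = element_text(size = 16))  # larger text for legibility

labelled_plot
```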
5) Avoid overplotting
Generally, we need to use an alternative geometric object
Add alpha = 0.2 to geom_point()
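The alpha setting can be sketched on simulated data. The data frame dense_df and its columns are hypothetical, generated only to produce many overlapping points:

```r
library(tidyverse)

# Simulated data (hypothetical), dense enough that points overlap heavily
set.seed(1)
dense_df <- tibble(x = rnorm(5000), y = rnorm(5000))

alpha_plot <- ggplot(dense_df, aes(x = x, y = y)) +
  geom_point(alpha = 0.2)  # partly transparent points: dense regions appear darker

alpha_plot
```

Because alpha is set outside aes(), it applies the same transparency to every point rather than mapping it to a variable.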
Time to work on our worksheet!
Before Next Class: Please register for a free GitHub account (this will help you follow along!) https://github.com/signup
Need a data-viz refresh? Check out this optional video.