+ - 0:00:00
Notes for current slide
Notes for next slide

Data Visualization

Lesser known intro

Amy Tzu-Yu Chen

2020-06-27

1 / 41

About Me

  • UCLA'16, Statistics

  • Data Scientist at System1 & Computational Linguistics MS Student at University of Washington

  • Happy R user since STAT 20

  • Find me at amy17519 @ Twitter, Github, and LinkedIn

2 / 41

Why is this a "lesser known" intro

You can easily find tutorials if you google "how to use "blah blah" visualization library"

(🤭 and just copy and paste code! It works!)

3 / 41

Why is this a "lesser known" intro

You can easily find tutorials if you google "how to use "blah blah" visualization library"

(🤭 and just copy and paste code! It works!)

You can even build amazing graphs without coding using Tableau etc

(🤭 and they look nice! Nicer than my ggplots sometimes!)

4 / 41

Why is this a "lesser known" intro

You can easily find tutorials if you google "how to use "blah blah" visualization library"

(🤭 and just copy and paste code! It works!)

You can even build amazing graphs without coding using Tableau etc

(🤭 and they look nice! Nicer than my ggplots sometimes!)




🤔... However, having a deeper understanding on visualization tools and process is a great asset for data practitioners

5 / 41

Why is this a "lesser known" intro

You can easily find tutorials if you google "how to use "blah blah" visualization library"

(🤭 and just copy and paste code! It works!)

You can even build amazing graphs without coding using Tableau etc

(🤭 and they look nice! Nicer than my ggplots sometimes!)




🤔... However, having a deeper understanding on visualization tools and process is a great asset for data practitioners

🤩 Know behind-the-library design philosophy ➡️ helps you understand a diverse range of graphics and powerful tools faster

6 / 41

Why is this a "lesser known" intro

You can easily find tutorials if you google "how to use "blah blah" visualization library"

(🤭 and just copy and paste code! It works!)

You can even build amazing graphs without coding using Tableau etc

(🤭 and they look nice! Nicer than my ggplots sometimes!)




🤔... However, having a deeper understanding on visualization tools and process is a great asset for data practitioners

🤩 Know behind-the-library design philosophy ➡️ helps you understand a diverse range of graphics and powerful tools faster

🤩 Practice visualization process ➡️ inform yourself, then educate your audience

7 / 41

Agenda

  • Grammar of Graphics

  • Strategy

  • Visualization Process

    • Making Exploratory Graphs
    • Making Confirmatory Graphs
  • Toolbox

  • Resources

8 / 41

Grammar of Graphics

9 / 41

The Beauty of Grammar of Graphics

In languages, grammar keeps things in order.


10 / 41

The Beauty of Grammar of Graphics

In languages, grammar keeps things in order.


If you know some grammar, you don't need to know all the vocabularies to speak.


11 / 41

The Beauty of Grammar of Graphics

In languages, grammar keeps things in order.


If you know some grammar, you don't need to know all the vocabularies to speak.


If you know some grammar of graphics, you don't need to know all coding syntax or graph types to make an informative graph

12 / 41

History - Grammar of Graphics

13 / 41

Grammar of Graphics - Components of Grammar

14 / 41

Example Data

# df_measles comes from dataset dslabs::us_contagious_diseases
str(df_measles)
## Classes 'data.table' and 'data.frame': 3825 obs. of 6 variables:
## $ disease : Factor w/ 7 levels "Hepatitis A",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ state : Factor w/ 51 levels "Alabama","Alaska",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : num 1928 1929 1930 1931 1932 ...
## $ weeks_reporting: num 52 49 52 49 41 51 52 49 40 49 ...
## $ count : num 8843 2959 4156 8934 270 ...
## $ population : num 2589923 2619131 2646248 2670818 2693027 ...
## - attr(*, ".internal.selfref")=<externalptr>
head(df_measles)
## disease state year weeks_reporting count population
## 1: Measles Alabama 1928 52 8843 2589923
## 2: Measles Alabama 1929 49 2959 2619131
## 3: Measles Alabama 1930 52 4156 2646248
## 4: Measles Alabama 1931 49 8934 2670818
## 5: Measles Alabama 1932 41 270 2693027
## 6: Measles Alabama 1933 51 1735 2713243
15 / 41

Grammar of Graphics

  • Data
  • Aesthestics
  • Geometry
  • Stats
  • Facets
  • Coordinate
  • Theme
# Annual reported Measles cases in California
ggplot(data = CA_Measles, aes(x = year, y = count)) +
geom_line()

16 / 41

Grammar of Graphics

  • Data
  • Aesthestics
  • Geometry
  • Stats
  • Facets
  • Coordinate
  • Theme
# Annual reported Measles cases in California
ggplot(data = CA_Measles, aes(x = year, y = count)) +
geom_line()

17 / 41

Grammar of Graphics

  • Data
  • Aesthestics
  • Geometry
  • Stats
  • Facets
  • Coordinate
  • Theme

Data, Aesthestics(for input data), and Geometry are required to make a minimal graph

ggplot(aes(x = year, y = count)) +
geom_line()
## Error: `data` must be a data frame.... 🤔
ggplot(data = CA_Measles) +
geom_line()
## Error in order(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, :
## argument 3 is not a vector
ggplot(data = CA_Measles, aes(x = year, y = count))
# No error, but you will get an empty ggplot canvas
18 / 41

Grammar of Graphics

  • Data
  • Aesthestics
  • Geometry
  • Stats
  • Facets
  • Coordinate
  • Theme

Data, Aesthestics(for input data), and Geometry are required to make a minimal graph

ggplot(aes(x = year, y = count)) +
geom_line()
## Error: `data` must be a data frame.... 🤔
ggplot(data = CA_Measles) +
geom_line()
## Error in order(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, :
## argument 3 is not a vector
ggplot(data = CA_Measles, aes(x = year, y = count))
# No error, but you will get an empty ggplot canvas
19 / 41

Grammar of Graphics

  • Data
  • Aesthestics
  • Geometry
  • Stats
  • Facets
  • Coordinate
  • Theme

Data, Aesthestics(for input data), and Geometry are required to make a minimal graph

ggplot(aes(x = year, y = count)) +
geom_line()
## Error: `data` must be a data frame.... 🤔
ggplot(data = CA_Measles) +
geom_line()
## Error in order(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, :
## argument 3 is not a vector
ggplot(data = CA_Measles, aes(x = year, y = count))
# No error, but you will get an empty ggplot canvas
20 / 41

Different Library, Similar Syntax, Same Basic Components

library(plotly)
plot_ly(CA_Measles, x = ~year, y = ~count, type = 'scatter', mode = 'lines')
library(highcharter)
hchart(CA_Measles, 'line', hcaes(x = year, y = count)
21 / 41

Strategy

22 / 41

23 / 41

Bottomline

  • Focus on showing data patterns using an appropriate fancy graph

  • Informativeness >> Clarity >> Aesthestics

  • Data visualization could be subjective, but

    The greatest value of a picture is when it forces us to notice what we never expected to see -- John W. Tukey
    -- as opposed to what we wanted to confirm.
24 / 41

Visualization Process

25 / 41

Making Exploratory Graphs

to be able to say that we looked one layer deeper, and found nothing, is a definite step forward -- though not as far as to be able to say that we looked deeper and found thus-and-suck -- John W. Tukey
  • Make LOTS of exploratory graphs, and only present those that can convince yourself and guide the audience through your data analysis

  • In this stage, we only care about informativeness! We will worry about Clarity and Aesthestics in next stage.

26 / 41

Making Exploratory Graphs - Measles

  • Data
  • Aesthestics
  • Geometry
  • Stats
  • Facets
  • Coordinate
  • Theme
ggplot(data = CA_Measles, aes(x = year, y = count)) +
geom_line()

27 / 41

Making Exploratory Graphs - Measles

  • Data
  • Aesthestics
  • Geometry
  • Stats
  • Facets
  • Coordinate
  • Theme
ggplot(data = df_measles, aes(x = year, y = count)) +
geom_line() +
facet_wrap(~state)

28 / 41

Making Exploratory Graphs - Measles

  • Data
  • Aesthestics
  • Geometry
  • Stats
  • Facets
  • Coordinate
  • Theme
df_avg_pop <- df_measles[, .(mean_pop = mean(population, na.rm = TRUE)), state]
ggplot(data = df_avg_pop, aes(x = state, y = mean_pop)) +
geom_col()

29 / 41

Making Exploratory Graphs - Measles

  • Data
  • Aesthestics
  • Geometry
  • Stats
  • Facets
  • Coordinate
  • Theme
ca_wy_noweight <- ggplot(df_measles[state %in% c("California", "Wyoming")],
aes(x = year, y = count)) +
geom_line() +
facet_wrap(~state)
ca_wy_weighted <- ggplot(df_measles[state %in% c("California", "Wyoming")],
aes(x = year, y = count / (population / 1000000))) +
geom_line() +
facet_wrap(~state)
30 / 41

Making Exploratory Graphs - Measles

  • Data
  • Aesthestics
  • Geometry
  • Stats
  • Facets
  • Coordinate
  • Theme
library(patchwork)
ca_wy_noweight / ca_wy_weighted

31 / 41

Making Exploratory Graphs - Measles

  • Data
  • Aesthestics
  • Geometry
  • Stats
  • Facets
  • Coordinate
  • Theme
ggplot(data = df_measles, aes(x = year, y = count / (population / 1000000))) +
geom_line() +
facet_wrap(~state)

32 / 41

Making Confirmatory Graphs

  • Few exploratory graphs need to become confirmatory graphs

  • Key findings or evidence in your data analysis that help draw conclusions or inform modeling decisions

  • Now we have the information we want to share, we can work on Clarity and Aesthestics

33 / 41

Making Confirmatory Graphs -- Measles

  • Data
  • Aesthestics
  • Geometry
  • Stats
  • Facets
  • Coordinate
  • Theme
ggplot(data = df_avg_pop, aes(x = reorder(state, mean_pop), y = mean_pop)) +
geom_col() + coord_flip() +
ggtitle("Average US State Population, 1928-2002") +
scale_y_continuous(labels = scales::comma) +
xlab("Avg Population") + ylab("State") +
theme_bw()

34 / 41

Making Confirmatory Graphs -- Measles

  • Data
  • Aesthestics
  • Geometry
  • Stats
  • Facets
  • Coordinate
  • Theme
ggplot(data = df_measles, aes(x = year, y = count / (population / 1000000))) +
geom_line() + facet_wrap(~state) +
ggtitle("Measle Cases per Million People by State, 1928-2002") +
scale_y_continuous(labels = scales::comma) +
xlab("State") + ylab("Cases/1m Population") +
theme_bw()

35 / 41

Toolbox

36 / 41

Visualization Toolbox

37 / 41

Visualization Toolbox

38 / 41

Resources

39 / 41

Tutorials, Videos, Books, and Paper

40 / 41

Thanks!

Slides created using the R package xaringan.

41 / 41

About Me

  • UCLA'16, Statistics

  • Data Scientist at System1 & Computational Linguistics MS Student at University of Washington

  • Happy R user since STAT 20

  • Find me at amy17519 @ Twitter, Github, and LinkedIn

2 / 41
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow