Statistics on Street Corners: Conducting Inference for Data Plots

Di Cook
Monash University

NCB2021
June 22, 2021



https://dicook.org/files/NCB2021/slides.html
Image credit: Di Cook, 2018

1 / 48

Hello 👋🏻

Professor, Monash University, Melbourne, Australia
2 / 48

Data plots are often used to make decisions, especially when deciding whether a model fits. They can and should be integrated into the classical statistics infrastructure.

3 / 48

I'm going to talk about

inference for data plots

examples: deer sightings in the wild, gene expression, disease maps, and RNA-seq

and how you can do this too.

4 / 48

Inference for data plots requires

  1. the plot is a statistic
  2. the type of plot (specified by a grammar) implicitly defines the null hypothesis
  3. a null generating mechanism provides draws from the sampling distribution, among which to embed the data plot
  4. human (computer) observers are engaged to conduct a lineup test
  5. statistical significance and power can be computed based on the proportion of observers choosing the data plot from the lineup
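
Step 5 can be sketched numerically. Under the null hypothesis, an observer shown a lineup of m plots picks the data plot with probability 1/m, so with K independent observers the number of correct picks can be treated as Binomial(K, 1/m). This simple model follows Majumder et al (2013) and is an assumption here; Vanderplas et al (2021) discuss refinements. The function name `visual_pvalue` and the example numbers are hypothetical.

```r
# Visual p-value under a simple binomial model (an assumption):
# each of K independent observers picks the data plot with
# probability 1/m when the null hypothesis is true.
visual_pvalue <- function(x, K, m = 20) {
  # P(X >= x) for X ~ Binomial(K, 1/m)
  pbinom(x - 1, size = K, prob = 1 / m, lower.tail = FALSE)
}

# Hypothetical example: 4 of 10 observers pick the data plot
# from a lineup of 20 plots.
p <- visual_pvalue(x = 4, K = 10, m = 20)
print(p)
```

A small value says that so many observers picking the data plot would be unlikely if it were just another null plot.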
5 / 48

Why is a plot a statistic?

Many of you (hopefully) use ggplot2 to make your plots with a grammar of graphics.

ggplot(data = DATA) +
  geom_something(
    mapping = aes(x = VAR1, y = VAR2, colour = VAR3)
  )
# + extra nice styling

A statistic is a function of one or more random variables. This is how the mapping can be interpreted.

6 / 48

Adding data gives a visual statistic

# Get some data
library(amt)
library(dplyr)
library(ggplot2)
data("deer")
data("sh_forest")
rsf1 <- deer %>%
  random_points(n = 1500) %>%
  extract_covariates(sh_forest) %>%
  mutate(forest = sh.forest == 1) %>%
  rename(x = x_, y = y_, sighted = case_)
# Plot it
ggplot(data = rsf1) +
  geom_point(
    aes(x = x, y = y, colour = sighted),
    alpha = 0.7)
# + extra nice styling

Observed value of the statistic

7 / 48
ggplot(rsf1) +
  geom_bar(
    aes(x = sighted, fill = forest),
    position = "fill")
# + extra nice styling

For sighted vs forest habitat, the mapping requires a call to stat = "count":

## # A tibble: 4 x 3
## # Groups:   sighted [2]
##   sighted forest count
##   <lgl>   <lgl>  <int>
## 1 FALSE   FALSE   1208
## 2 FALSE   TRUE     292
## 3 TRUE    FALSE    560
## 4 TRUE    TRUE     266

Observed value of statistic
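
With position = "fill", the bar heights are within-group proportions of these counts. A small sketch of that arithmetic, using only the counts printed above (the data frame `counts` is re-typed here for illustration and is not the rsf1 object):

```r
# Counts copied from the tibble above.
counts <- data.frame(
  sighted = c(FALSE, FALSE, TRUE, TRUE),
  forest  = c(FALSE, TRUE, FALSE, TRUE),
  count   = c(1208, 292, 560, 266)
)

# Total number of points in each sighted group.
totals <- tapply(counts$count, counts$sighted, sum)

# Share of each sighted group falling in each forest class;
# these are the bar segment heights when position = "fill".
counts$prop <- counts$count /
  as.numeric(totals[as.character(counts$sighted)])
print(counts)
```

The forest share is 292/1500, about 0.19, among the random points and 266/826, about 0.32, among the sighted points.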

8 / 48

Null generating mechanism: Example 1

What's the null? What would be uninteresting?

ggplot(DATA) +
  geom_POINT(
    aes(x = x, y = y, colour = sighted),
    alpha = 0.7)
# + extra nice styling

Ho: Sightings are uniformly distributed in space

Ha: Sightings are NOT uniformly distributed in space

The null generating mechanism could be to permute the labels of the sighted variable. (Or one could simulate a second, uniform set of points.)
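
A minimal sketch of the permutation idea, in base R and on simulated data. The data frame `d`, the helper `make_lineup`, and the way the position is stored are illustrative assumptions, not the nullabor internals. Permuting `sighted` breaks any association with location while keeping the number of sightings fixed, and the data is hidden at a random position among the permuted copies.

```r
set.seed(1)
# Simulated stand-in for the deer data (hypothetical values).
d <- data.frame(
  x = runif(200),
  y = runif(200),
  sighted = rep(c(TRUE, FALSE), times = c(60, 140))
)

# Build a lineup of m panels: m - 1 null panels with permuted
# labels, plus the data at a randomly chosen position.
make_lineup <- function(data, var, m = 6) {
  pos <- sample(m, 1)
  panels <- lapply(seq_len(m), function(i) {
    panel <- data
    if (i != pos) panel[[var]] <- sample(panel[[var]])  # permute labels
    panel$.sample <- i
    panel
  })
  out <- do.call(rbind, panels)
  attr(out, "pos") <- pos  # remember where the data plot is
  out
}

l <- make_lineup(d, "sighted", m = 6)
# Every panel keeps the same number of sightings.
table(l$.sample, l$sighted)
```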

9 / 48

Null generating mechanism: Example 2

What's the null? What would be uninteresting?

ggplot(DATA) +
  geom_BAR(
    aes(x = sighted, fill = forest),
    position = "fill")
# + extra nice styling

Ho: No relationship between sighted and forest habitat

Ha: Sightings are more likely in forest habitat

The null generating mechanism could again be to permute the labels of the sighted (or forest) variable. (Or one could simulate from a binomial.)
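
The same null can also be examined numerically. The sketch below rebuilds one row per point from the counts on the earlier slide (an assumption: only the counts are used, not the original rsf1 object), permutes `sighted`, and records the proportion of forest among sighted points for each permutation. The number of permutations, 1000, is an arbitrary choice.

```r
set.seed(20210622)
# One row per point, rebuilt from the counts table.
sighted <- rep(c(FALSE, FALSE, TRUE, TRUE), times = c(1208, 292, 560, 266))
forest  <- rep(c(FALSE, TRUE, FALSE, TRUE), times = c(1208, 292, 560, 266))

# Statistic: proportion of sighted points that are in forest.
prop_forest <- function(s, f) mean(f[s])

observed <- prop_forest(sighted, forest)

# Null distribution: permute the sighted labels and recompute.
null_props <- replicate(1000, prop_forest(sample(sighted), forest))

# Share of permutations at least as extreme as the data
# (one-sided, matching Ha).
p_perm <- mean(null_props >= observed)
print(c(observed = observed, null_mean = mean(null_props), p = p_perm))
```

The observed proportion is 266/826, about 0.32, while the permuted values centre on the overall forest share, 558/2326, about 0.24.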

10 / 48

Pretend you haven't seen the data plot

11 / 48

Which plot is different from the rest?

set.seed(20210622)
library(nullabor)
l <- lineup(null_permute("sighted"),
            rsf1, n = 6)
ggplot(l) +
  geom_point(
    aes(x = x, y = y, colour = sighted),
    alpha = 0.3) +
  facet_wrap(~.sample, ncol = 2)
# + extra nice styling

You say 1? Oh, that is the data plot.

12 / 48
set.seed(20210622)
l <- lineup(null_permute("sighted"),
            rsf1, n = 9)
ggplot(l) +
  geom_bar(
    aes(x = sighted, fill = forest),
    position = "fill") +
  facet_wrap(~.sample, ncol = 3)
# + extra nice styling

In which plot is the light brown bar on the right the tallest?

Did you say 5? You're good!

13 / 48

In each case, the data plot was identifiable, and the null hypothesis would be rejected

14 / 48

Inference for graphics infrastructure

15 / 48
16 / 48

Visual inference broadens the scope of statistics

17 / 48

Let's do an actual lineup test

18 / 48

Lineup protocol

I'm going to show you a page of plots

Each has a number above it; this is its id

Choose the plot that you think exhibits the most separation between groups

If you really need to choose more than one, or even not choose any, that is ok, too

Ready?

19 / 48

20 / 48

The data plot is





My guess is that you didn't pick this one?

21 / 48

LDA resulted in ... that gynes had the most divergent expression patterns

Toth et al (2010) Proc. of the Royal Society

... show that foundress and worker brain profiles are more similar to each other than to the other groups.

Toth et al (2007) Science

22 / 48

True data

Null data

23 / 48

Space is big, and with few data points, classes can easily be separated

spuriously.



The lineup protocol can help people understand the problem.
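
A small simulation illustrates the point. It is a sketch under stated assumptions: pure noise data, arbitrary sizes (12 observations, 40 variables), and a minimum-norm least-squares direction standing in for the discriminant direction, not the LDA used in the papers. When there are more variables than observations, a direction that separates two arbitrary groups perfectly almost always exists.

```r
set.seed(2021)
n <- 12   # observations (hypothetical size)
p <- 40   # variables, more than observations

# Pure noise: no real difference between the groups.
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- rep(c(-1, 1), each = n / 2)  # arbitrary group labels

# Minimum-norm solution of X w = y. With p > n and noise data,
# X %*% t(X) is invertible, so the fit is exact.
w <- t(X) %*% solve(X %*% t(X), y)

# Project onto the direction w: the two groups land on -1 and 1
# (up to rounding), even though the data are noise.
proj <- as.numeric(X %*% w)
print(round(proj, 6))
```

Embedding such a projection in a lineup of projections of label-permuted data shows observers that the apparent separation is no different from what noise produces.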

24 / 48

If you first do dimension reduction (e.g. PCA), and then LDA, the problem goes away. LDA into three dimensions shown below.

All data

Top 12 PCs

25 / 48

Let's do another actual lineup test

26 / 48

Lineup protocol

I'm going to show you a page of plots

Each has a number above it; this is its id

Choose the plot that you think exhibits the most separation between groups

If you really need to choose more than one, or even not choose any, that is ok, too

Ready?

27 / 48
28 / 48

Try another one

29 / 48
30 / 48

Answers

  • In both of these lineups the data plot is in position 8.
  • The first display is a choropleth map, where big spatial areas dominate.
  • The second display is a hexagon tile map, a new display that we have designed for Australian disease communication.
31 / 48

Thyroid cancer incidence

The work is motivated by the Australian Cancer Atlas. Our experiment, using lineups, showed that the hexagon tile map communicates the distribution of cancer incidence more effectively.

Kobakian and Cook (unpublished) https://github.com/srkobakian/experiment

32 / 48

What's that you say? That people can't look at so many plots?

Crowd-sourcing can help here.

33 / 48

Validation experiment

Majumder et al (2013) conducted a validation study comparing the performance of the lineup protocol, assessed by human evaluators, with that of the classical test. Subjects were recruited through Amazon's Mechanical Turk.

34 / 48

Explanation of experiment

Read about it at http://datascience.unomaha.edu/turk/exp2/index.html

$H_o: \beta_k = 0$  vs  $H_a: \beta_k \neq 0$

  • 70 lineups of size 20 plots:
    • $n = 100, 300$
    • $\beta \in [-6, 4.5]$
    • $\sigma = 5, 12$
  • 351 evaluations by human subjects

35 / 48

Power analysis of human evaluation relative to classical test.

Effect $= \sqrt{n} \times |\beta| / \sigma$



Pooling the results from multiple people produces results that mirror the power of the classical test.

36 / 48

High-throughput analysis

😓

The wasps example made us worried about our own RNA-Seq analyses!

37 / 48

Lineup of our own data

I'm going to show you a page of plots

Each has a number above it; this is its id.

Choose the plot that you think exhibits the

  • steepest green line
  • with relatively small spread of the green points

Ready?

38 / 48
39 / 48

Experimental design 2x2 factorial:

  • Two genotypes (EV, RPA)
  • Two growing conditions (I, S)
  • Three reps for each treatment
  • Approx 60,000 genes

Two different procedures, edgeR and DESeq, gave conflicting numbers of significant genes, though both were on the order of 300.

One of the top genes was selected for the lineup study, and independent observers were engaged through Amazon's Mechanical Turk.

40 / 48

How does a discrepancy happen?

41 / 48

Turk results

Is there any significant structure in our data?

  • 24 lineups were made, only one shown to an observer
  • 5 different positions of the data plot
  • 5 different sets of null plots

Pooling results gave a detection rate of 0.65, which is high. There is some structure to our data.

42 / 48

Two aspects of massive multiple testing

  • ruler on which to measure difference === empirical Bayes
  • false positives === False Discovery Rate


Even with these, mistakes can happen, and visualising the data remains valuable.

43 / 48
44 / 48

How to do this yourself

Get a copy of the nullabor package

install.packages("nullabor")

or

# install.packages("remotes")
remotes::install_github("dicook/nullabor")

Look at the "Get started" documentation at http://dicook.github.io/nullabor/index.html

45 / 48

Thanks for listening!

Here's what I hope you heard:

  • Plots can be embedded into an inferential framework
  • This extends the applicability of statistics to more complex problems
  • Crowd-sourcing can help manage plot evaluation
46 / 48

Additional reading

  • Buja et al (2009) Statistical Inference for Exploratory Data Analysis and Model Diagnostics, RSPT A
  • Wickham et al (2010) Graphical Inference for Infovis, TVCG
  • Hofmann et al (2012) Graphical Tests for Power Comparison of Competing Design, TVCG
  • Majumder et al (2013) Validation of Visual Statistical Inference, Applied to Linear Models, JASA
  • Yin et al (2013) Visual Mining Methods for RNA-Seq data: Examining Data structure, Understanding Dispersion estimation and Significance Testing, JDMGP
  • Zhao et al (2014) Mind Reading: Using An Eye-tracker To See How People Are Looking At Lineups, IJITA
  • Lin et al (2015) Does host-plant diversity explain species richness in insects? Ecological Entomology
  • Roy Chowdhury et al (2015) Using Visual Statistical Inference to Better Understand Random Class Separations in High Dimension, Low Sample Size Data, CS
  • Loy et al (2017) Model Choice and Diagnostics for Linear Mixed-Effects Models Using Statistics on Street Corners, JCGS
  • Roy Chowdhury et al (2018) Measuring Lineup Difficulty By Matching Distance Metrics with Subject Choices in Crowd-Sourced Data, JCGS
  • Vanderplas et al (2020) Testing Statistical Charts: What Makes a Good Graph? ARSIA
  • Vanderplas et al (2021) Statistical significance calculations for scenarios in visual inference. Stat.

47 / 48

Acknowledgements

Slides created via the R package xaringan, with wattle theme created from xaringanthemer.

The chakra comes from remark.js, knitr, and R Markdown.

Slides are available at https://dicook.org/files/NCB2021/slides.html and supporting files at https://github.com/dicook/NCB2021.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Image credit: Di Cook, 2018

48 / 48
