class: inverse middle
background-image: url(images/people2.png)
background-position: 99% 98%
background-size: 55%

# *The Paradox of the Positive*

## Exploratory tools for visualising the individuals in (multivariate) longitudinal data

### Di Cook, Monash University

<br>

.small[Joint with Nick Tierney and Tania Prvan]
<br>
International Biometrics Conference
<br>
.tiny[Virtual Learning Series, July 20, 2020]

.tiny[
<i class="fas fa-arrow-right faa-passing animated "></i>
[https://dicook.org/files/IBC2020/slides.html](https://dicook.org/files/IBC2020/slides.html)]

<br>

.footnote[Image credit: 2020 Australian Open spectators by Di Cook]

---
background-image: url(images/singer_willett.png)
background-size: 50%

.huge[ ☀️ ]

.footnote[Example from Singer and Willett (2003) Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence]

---
class: inverse

<video width="750" height="563" controls="controls" name="Exploring longitudinal data" src="http://ggobi.org/book/chap-misc/Longitudinal.mov"> </video>

.footnote[[Exploring Longitudinal Data with GGobi](http://ggobi.org/book/chap-misc/Longitudinal.mov) by Di Cook on the [GGobi website](http://ggobi.org)]

---
class: inverse middle

.huge[ 🌧 ]

# Shiver.

--

The variation from individual to individual is much greater than the overall trend. While there may be an overall trend that matches our common belief, many individuals have a different experience.

---
background-image: url(https://suziegruber.com/wp-content/uploads/2018/06/Frayed-Rope-2-Deposit-web.jpg)
background-size: cover
class: inverse center

# A divergence of purpose

<br><br><br><br><br><br><br><br><br><br>

.pull-left[
Statistics<br>
for policy
]
.pull-right[
Statistics <br>
for the public
]

.footnote[Image source: [Suzie Gruber](https://suziegruber.com/wp-content/uploads/2018/06/Frayed-Rope-2-Deposit-web.jpg)]

<!-- <img src="slides_files/figure-html/dichotomy-1.png" width="40%" /> -->

---
background-image: url(https://upload.wikimedia.org/wikipedia/commons/2/21/Frederick_Douglass_by_Samuel_J_Miller%2C_1847-52.png)
background-size: 15%
background-position: 100% 0%

# Paradox of the positive

> *Douglass orates that positive statements about American values, such as liberty, citizenship, and freedom, were an offense to the enslaved population of the United States because of their lack of freedom, liberty, and citizenship.
As well, Douglass referred not only to the captivity of enslaved people, but to the merciless exploitation and the cruelty and torture that slaves were subjected to in the United States. Rhetoricians R.L. Heath and D. Waymer called this topic the "paradox of the positive" because it highlights how something positive and meant to be positive can also exclude individuals.*

.footnote[[Wikipedia: What to the Slave Is the Fourth of July?](https://en.wikipedia.org/wiki/What_to_the_Slave_Is_the_Fourth_of_July%3F)]

---

.pull-left[
Aside: Should race even be a variable used in analyses?
]
.pull-right[
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">"First, if racism is a principal factor organizing social life, why not study racism rather than race? Second, why use an unscientific system of classification in scientific research?" AJPH 22 years ago, loud and clear, in plain sight, <a href="https://twitter.com/mindphul?ref_src=twsrc%5Etfw">@mindphul</a> <a href="https://t.co/aLQ5BqquIS">https://t.co/aLQ5BqquIS</a></p>— Melanie Wall (@mwallbiostat) <a href="https://twitter.com/mwallbiostat/status/1282418693750894594?ref_src=twsrc%5Etfw">July 12, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
]

---

.large[I'm going to talk about]

--
<i class="fas fa-hand-pointer fa-2x faa-float animated faa-slow " style=" color:#75A34D;"></i>
.large[.purple[new tools for longitudinal data]]

--
<i class="fas fa-hand-spock fa-2x faa-wrench animated faa-slow " style=" color:#75A34D;"></i>
.large[.green[to explore the individuals]]

--
<i class="fas fa-hand-peace fa-2x faa-vertical animated faa-slow " style=" color:#75A34D;"></i>
.large[.orange[in the R package `brolgar`.]]

---
# What is the data structure?

`brolgar` builds on `tsibble`, by Earo Wang.

```
*## # A tsibble: 6,402 x 9 [!]
*## # Key:       id [888]
##       id ln_wages    xp   ged xp_since_ged black hispanic high_grade
##    <int>    <dbl> <dbl> <int>        <dbl> <int>    <int>      <int>
##  1    31     1.49 0.015     1        0.015     0        1          8
##  2    31     1.43 0.715     1        0.715     0        1          8
##  3    31     1.47 1.73      1        1.73      0        1          8
##  4    31     1.75 2.77      1        2.77      0        1          8
##  5    31     1.93 3.93      1        3.93      0        1          8
##  6    31     1.71 4.95      1        4.95      0        1          8
##  7    31     2.09 5.96      1        5.96      0        1          8
##  8    31     2.13 6.98      1        6.98      0        1          8
##  9    36     1.98 0.315     1        0.315     0        0          9
## 10    36     1.80 0.983     1        0.983     0        0          9
## # … with 6,392 more rows, and 1 more variable: unemploy_rate <dbl>
```

---
# Making spaghetti

.pull-left[
```r
wages %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) +
  geom_line(alpha = 0.3) +
  invthm
```
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-2-1.svg" width="100%" />
]

---
class: inverse middle center

# from a spaghetti mess

<img src="spaghetti_mess.gif" width="640" height="480">

.footnote[Source: giphy]

---
class: inverse middle center

# to controlled spaghetti handling

<img src="spaghetti_clean.gif" width="640" height="480">

.footnote[Source: giphy]

---
class: inverse middle center

# to perfection

<img src="spaghetti_perfect.gif" width="640" height="480">

.footnote[Source: giphy]

---
# It's not regular

.pull-left[
Using features, compute the number of measurements for each subject

```r
wages %>%
*  features(ln_wages, n_obs) %>%
  ggplot(aes(x = n_obs)) +
  geom_bar() +
  xlab("Number of observations") +
  invthm
```
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-3-1.svg" width="100%" />
]

---
# We could filter on this

.pull-left[
```r
*wages <- wages %>% add_n_obs()
wages %>%
*  filter(n_obs > 3) %>%
  select(id, ln_wages, xp, n_obs)
```
]
.pull-right[
```
## # A tsibble: 6,145 x 4 [!]
*## # Key:       id [764]
##       id ln_wages    xp n_obs
##    <int>    <dbl> <dbl> <int>
##  1    31     1.49 0.015     8
##  2    31     1.43 0.715     8
##  3    31     1.47 1.73      8
##  4    31     1.75 2.77      8
##  5    31     1.93 3.93      8
##  6    31     1.71 4.95      8
##  7    31     2.09 5.96      8
##  8    31     2.13 6.98      8
##  9    36     1.98 0.315    10
## 10    36     1.80 0.983    10
## # … with 6,135 more rows
```
]

---
# Subjects don't all start at the same time

.pull-left[
Using features to extract minimum time

```r
wages %>%
*  features(xp, list(min = min)) %>%
  ggplot(aes(x = min)) +
  geom_histogram(binwidth = 0.5) +
  xlim(c(0, 13)) +
  xlab("First time in study") +
  invthm
```
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-5-1.svg" width="100%" />
]

---
# There's a range of experience

.pull-left[
Using features to extract range of time index

```r
wages_xp_range <- wages %>%
*  features(xp, feat_ranges)

ggplot(wages_xp_range,
       aes(x = range_diff)) +
  geom_histogram() +
  xlab("Range of experience") +
  invthm
```
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-6-1.svg" width="100%" />
]

---
# Small spoonfuls of spaghetti

.pull-left[
Sample some individuals

```r
set.seed(20200720)
wages %>%
*  sample_n_keys(size = 10) %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) +
  geom_line() +
  xlim(c(0, 13)) + ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```
.tiny[Wages conversion 0.5 = $1.65; 4.5 = $90]
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-7-1.svg" width="100%" />
]

---
count: false
# Small spoonfuls of spaghetti

.pull-left[
Sample some individuals

```r
wages %>%
*  sample_n_keys(size = 10) %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) +
  geom_line() +
  xlim(c(0, 13)) + ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```
.tiny[Wages conversion 0.5 = $1.65; 4.5 = $90]
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-8-1.svg" width="100%" />
]

---
count: false
# Small spoonfuls of spaghetti

.pull-left[
Sample some individuals

```r
wages %>%
*  sample_n_keys(size = 10) %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) +
  geom_line() +
  xlim(c(0, 13)) + ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```
.tiny[Wages conversion 0.5 = $1.65; 4.5 = $90]
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-9-1.svg" width="100%" />
]

---
# Take a spoonful of different lengths

.pull-left[
Sample experienced individuals

```r
wages %>%
  add_n_obs() %>%
*  filter(n_obs > 7) %>%
  sample_n_keys(size = 10) %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) +
  geom_line() +
  xlim(c(0, 13)) + ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```
.tiny[Wages conversion 0.5 = $1.65; 4.5 = $90]
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-10-1.svg" width="100%" />
]

---
# Take a spoonful of different lengths

.pull-left[
Sample INexperienced individuals

```r
wages %>%
  add_n_obs() %>%
*  filter(n_obs < 5) %>%
  sample_n_keys(size = 10) %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) +
  geom_line() +
  xlim(c(0, 13)) + ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```
.tiny[Wages conversion 0.5 = $1.65; 4.5 = $90]
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-11-1.svg" width="100%" />
]

---
# Take a spoonful of different lengths

.pull-left[
Sample average experience

```r
wages %>%
  add_n_obs() %>%
*  filter(n_obs > 4, n_obs < 8) %>%
  sample_n_keys(size = 10) %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) +
  geom_line() +
  xlim(c(0, 13)) + ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```
.tiny[Wages conversion 0.5 = $1.65; 4.5 = $90]
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-12-1.svg" width="100%" />
]

---
# Also we can

- `facet_strata`: show the whole pot, neatly separated into equal portions
- `facet_sample`: show most of the pot, in neatly separated portions

---
<img src="slides_files/figure-html/unnamed-chunk-13-1.svg" width="100%" />

---
# Special features

Compute longnostics for each
subject

- Slope, intercept from simple linear model
- Variance, standard deviation
- Jumps, differences

---
# Increasing

.pull-left[
```r
wages_slope <- wages %>%
  add_n_obs() %>%
  filter(n_obs > 4) %>%
*  add_key_slope(ln_wages ~ xp) %>%
  as_tsibble(key = id, index = xp)

wages_slope %>%
*  filter(.slope_xp > 0.4) %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) +
  geom_line() +
  ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-14-1.svg" width="100%" />
]

---
# Decreasing

.pull-left[
```r
wages_slope %>%
*  filter(.slope_xp < (-0.7)) %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) +
  geom_line() +
  ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-15-1.svg" width="100%" />
]

---
background-image: url(https://cdn.mos.cms.futurecdn.net/xVDtHe3txNCijnegF8y4d6-970-80.jpg)
background-size: 30%
background-position: 50% 99%

# A different style of five-number summary

Who is average? Who is different? Find the individuals who are representative of the minimum, median, maximum, etc., of growth, using `keys_near()`

.footnote[Image credit: Flickr/paul dynamik]

---
.pull-left[
```r
wages_threenum <- wages %>%
  add_n_obs() %>%
  filter(n_obs > 4) %>%
  key_slope(ln_wages ~ xp) %>%
*  keys_near(key = id,
*            var = .slope_xp,
*            funs = l_three_num) %>%
  left_join(wages, by = "id") %>%
  as_tsibble(key = id, index = xp)
```
]
.pull-right[
<img src="slides_files/figure-html/three_number_plot-1.svg" width="100%" />
]

---
<img src="slides_files/figure-html/five_number-1.svg" width="100%" />

---
# Sculpting spaghetti

.pull-left[
Mixed effects model: years of experience and education as fixed effects, with a random intercept and slope for each subject.
```r
library(lme4)    # lmer()
library(modelr)  # add_predictions(), add_residuals()

wages_fit_int <-
*  lmer(ln_wages ~ xp + high_grade +
*         (xp | id), data = wages)

wages_aug <- wages %>%
  add_predictions(wages_fit_int,
                  var = "pred_int") %>%
  add_residuals(wages_fit_int,
                var = "res_int")
```
]
.pull-right[
<img src="slides_files/figure-html/model_plot-1.svg" width="100%" />
]

---
# Sample and show the data, too

```r
set.seed(1)
wages_aug %>%
  add_n_obs() %>%
  filter(n_obs > 4) %>%
  sample_n_keys(size = 12) %>%
*  ggplot(aes(x = xp,
*             y = pred_int,
*             group = id,
*             colour = factor(id))) +
  geom_line() +
*  geom_point(aes(x = xp,
*                 y = ln_wages,
*                 colour = factor(id))) +
  scale_colour_ochre(palette = "emu_woman_paired") +
  facet_wrap(~id, ncol = 4) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm +
  theme(legend.position = "none")
```

---
<img src="slides_files/figure-html/unnamed-chunk-16-1.svg" width="100%" />

---
.pull-left[
# Multivariate

Multiple response variables, expecting some association between them.

- ln_wages
- expens (simulated)
- savings (simulated)
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-17-1.svg" width="100%" />
]

---
# Multivariate

```r
library(tourr)
wages_mv_df <- wages_mv %>%
  select(ln_wages, expens, savings, id) %>%
  as.data.frame()
# Edge matrices linking successive observations of two subjects
wages_12227 <- data.frame(from = 6137:6146, to = 6138:6147) %>% as.matrix()
wages_735 <- data.frame(from = 408:418, to = 409:419) %>% as.matrix()
edges <- rbind(wages_12227, wages_735)
animate_xy(wages_mv_df[, 1:3], axes = "bottomleft",
           col = "grey90", edges = wages_12227)
animate_xy(wages_mv_df[, 1:3], axes = "bottomleft",
           col = "grey90", edges = wages_735)
render_gif(wages_mv_df[, 1:3], grand_tour(),
           display_xy(axes = "bottomleft", col = "grey90",
                      edges = wages_12227),
           gif_file = "./tour1.gif", apf = 1/15, frames = 200)
render_gif(wages_mv_df[, 1:3], grand_tour(),
           display_xy(axes = "bottomleft", col = "grey90",
                      edges = wages_735),
           gif_file = "./tour2.gif", apf = 1/15, frames = 200)
```

---
.pull-left[
Subject 12227

<img src="tour1.gif" width="90%">
]

--

.pull-right[
Subject 735

<img src="tour2.gif"
width="90%">
]

---
background-image: url(images/mouldy-spaghetti.jpg)
background-size: cover
class: inverse center

# I'm sorry about the wages data

.footnote[Image source: [https://www.stayathomemum.com.au](https://www.stayathomemum.com.au/my-lifestyle/mouldy-food-how-far-is-too-far-gone/)]

---
.pull-left[
Even though it's messy, it's too clean

.small[
- Original cohort included 12,686 individuals, but this data has only 888 individuals
- Process not transparent
- Demographic groups have very different numbers
]
]
.pull-right[
.small[
https://www.nlsinfo.org/content/cohorts/nlsy79/get-data

- I have been trying to refresh the wages data
- can extract approximately 6,000 individuals
- many odd values, like wages above $10,000 per hour
- different coding of variables from one year to another
]
]

---
Example of data issues

<img src="slides_files/figure-html/bad-data-1.svg" width="100%" />

---
class: inverse middle center

# Individual approach in use

---
background-image: url(images/NYTimes1.png)
background-size: 80%

.footnote[[NYTimes Coronavirus coverage](https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html)]

---
background-image: url(images/NYTimes2.png)
background-size: 100%

.footnote[[NYTimes Coronavirus coverage](https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html)]

---
# What I hope you have heard today

- We need .orange[more research], and .orange[acceptance], on methods for communicating the individual experience.
- When the variation is large, summarising the variation is the honest thing to do.
- .orange[Statistics needs to address the individual experience], so we can better engage with the public.

*I feel this is important for evolving democracy, for literate citizenship. As more tools become readily available, people will do their own analysis*.
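---
# Appendix: `facet_strata` and `facet_sample`

A minimal sketch of the two facet helpers mentioned earlier, assuming the `wages` tsibble from the earlier slides; the panel counts here are illustrative choices, not values taken from this talk.

```r
library(brolgar)   # wages data, facet helpers
library(ggplot2)

# facet_sample(): most of the pot, a few randomly
# sampled series per panel
ggplot(wages, aes(x = xp, y = ln_wages, group = id)) +
  geom_line() +
  facet_sample(n_per_facet = 3, n_facets = 12)

# facet_strata(): the whole pot, split into equal-sized
# strata, ordered along wages so panels run low to high
ggplot(wages, aes(x = xp, y = ln_wages, group = id)) +
  geom_line() +
  facet_strata(n_strata = 12, along = ln_wages)
```

Both helpers keep every series intact within its panel, so individual trajectories stay readable while the full key set is still covered.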
---
background-image: url(https://upload.wikimedia.org/wikipedia/commons/9/99/Brolga_%2835984742503%29.jpg)
background-size: 40%
background-position: 99% 99%
class: inverse

### Read more about the `brolgar` package at
# http://brolgar.njtierney.com

<br>

**br**owse **o**ver **l**ongitudinal data <br> **g**raphically and **a**nalytically in **R**

.footnote[Image source: [wikicommons](https://upload.wikimedia.org/wikipedia/commons/9/99/Brolga_%2835984742503%29.jpg)]

---
class: inverse middle
background-image: url(images/people2.png)
background-position: 99% 1%
background-size: 35%

# Acknowledgements

Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan), with **iris theme** created from [xaringanthemer](https://github.com/gadenbuie/xaringanthemer). The chakra comes from [remark.js](https://remarkjs.com), [**knitr**](http://yihui.name/knitr), and [R Markdown](https://rmarkdown.rstudio.com).

Slides are available at [https://dicook.org/files/IBC2020/slides.html](https://dicook.org/files/IBC2020/slides.html) and supporting files at [https://github.com/dicook/IBC2020](https://github.com/dicook/IBC2020).

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.

.footnote[Image credit: 2020 Australian Open spectators by Di Cook]