class: inverse middle
background-image: url(images/people2.png)
background-position: 99% 98%
background-size: 55%

# *The Paradox of the Positive*

## Exploratory tools for visualising the individuals in (multivariate) longitudinal data

### Di Cook, Monash University

<br>

.small[Joint with Nick Tierney and Tania Prvan]
<br>
International Biometrics Conference
<br>
.tiny[Virtual Learning Series, July 20, 2020]

.tiny[
<i class="fas fa-arrow-right faa-passing animated "></i>
[https://dicook.org/files/IBC2020/slides.html](https://dicook.org/files/IBC2020/slides.html)]

<br>

.footnote[Image credit: 2020 Australian Open spectators by Di Cook]

---
background-image: url(images/singer_willett.png)
background-size: 50%

.huge[ ☀️ ]

.footnote[Example from Singer and Willett (2003) Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence]

---
class: inverse

<video width="750" height="563" controls="controls" name="Exploring longitudinal data" src="http://ggobi.org/book/chap-misc/Longitudinal.mov"> </video>

.footnote[[Exploring Longitudinal Data with GGobi](http://ggobi.org/book/chap-misc/Longitudinal.mov) by Di Cook on the [GGobi website](http://ggobi.org)]

---
class: inverse middle

.huge[ 🌧 ]

# Shiver.

--

The variation from individual to individual is much greater than the overall trend. While there may be an overall trend that matches our common belief, many individuals have a different experience.

---
background-image: url(https://suziegruber.com/wp-content/uploads/2018/06/Frayed-Rope-2-Deposit-web.jpg)
background-size: cover
class: inverse center

# A divergence of purpose

<br><br><br><br><br><br><br><br><br><br>

.pull-left[
Statistics<br>
for policy
]
.pull-right[
Statistics <br>
for the public
]

.footnote[Image source: [Suzie Gruber](https://suziegruber.com/wp-content/uploads/2018/06/Frayed-Rope-2-Deposit-web.jpg)]

<!-- <img src="slides_files/figure-html/dichotomy-1.png" width="40%" /> -->

---
background-image: url(https://upload.wikimedia.org/wikipedia/commons/2/21/Frederick_Douglass_by_Samuel_J_Miller%2C_1847-52.png)
background-size: 15%
background-position: 100% 0%

# Paradox of the positive

> *Douglass orates that positive statements about American values, such as liberty, citizenship, and freedom, were an offense to the enslaved population of the United States because of their lack of freedom, liberty, and citizenship.
As well, Douglass referred not only to the captivity of enslaved people, but to the merciless exploitation and the cruelty and torture that slaves were subjected to in the United States. Rhetoricians R.L. Heath and D. Waymer called this topic the "paradox of the positive" because it highlights how something positive and meant to be positive can also exclude individuals.*

.footnote[[Wikipedia: What to the Slave Is the Fourth of July?](https://en.wikipedia.org/wiki/What_to_the_Slave_Is_the_Fourth_of_July%3F)]

---

.pull-left[
Aside: Should race even be a variable used in analyses?
]
.pull-right[
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">"First, if racism is a principal factor organizing social life, why not study racism rather than race? Second, why use an unscientific system of classification in scientific research?" AJPH 22 years ago, loud and clear, in plain sight, <a href="https://twitter.com/mindphul?ref_src=twsrc%5Etfw">@mindphul</a> <a href="https://t.co/aLQ5BqquIS">https://t.co/aLQ5BqquIS</a></p>— Melanie Wall (@mwallbiostat) <a href="https://twitter.com/mwallbiostat/status/1282418693750894594?ref_src=twsrc%5Etfw">July 12, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
]

---

.large[I'm going to talk about]

--
<i class="fas fa-hand-pointer fa-2x faa-float animated faa-slow " style=" color:#75A34D;"></i>
.large[.purple[new tools for longitudinal data]]

--
<i class="fas fa-hand-spock fa-2x faa-wrench animated faa-slow " style=" color:#75A34D;"></i>
.large[.green[to explore the individuals]]

--
<i class="fas fa-hand-peace fa-2x faa-vertical animated faa-slow " style=" color:#75A34D;"></i>
.large[.orange[in the R package `brolgar`.]]

---
# What is the data structure?

`brolgar` builds on `tsibble`, by Earo Wang.

```
*## # A tsibble: 6,402 x 9 [!]
*## # Key:       id [888]
##       id ln_wages    xp   ged xp_since_ged black hispanic high_grade
##    <int>    <dbl> <dbl> <int>        <dbl> <int>    <int>      <int>
##  1    31     1.49 0.015     1        0.015     0        1          8
##  2    31     1.43 0.715     1        0.715     0        1          8
##  3    31     1.47 1.73      1        1.73      0        1          8
##  4    31     1.75 2.77      1        2.77      0        1          8
##  5    31     1.93 3.93      1        3.93      0        1          8
##  6    31     1.71 4.95      1        4.95      0        1          8
##  7    31     2.09 5.96      1        5.96      0        1          8
##  8    31     2.13 6.98      1        6.98      0        1          8
##  9    36     1.98 0.315     1        0.315     0        0          9
## 10    36     1.80 0.983     1        0.983     0        0          9
## # … with 6,392 more rows, and 1 more variable: unemploy_rate <dbl>
```

---
# Making spaghetti

.pull-left[
```r
wages %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) +
  geom_line(alpha = 0.3) +
  invthm
```
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-2-1.svg" width="100%" />
]

---
class: inverse middle center

# from a spaghetti mess

<img src="spaghetti_mess.gif" width="640" height="480">

.footnote[Source: giphy]

---
class: inverse middle center

# to controlled spaghetti handling

<img src="spaghetti_clean.gif" width="640" height="480">

.footnote[Source: giphy]

---
class: inverse middle center

# to perfection

<img src="spaghetti_perfect.gif" width="640" height="480">

.footnote[Source: giphy]

---
# It's not regular

.pull-left[
Using features, compute the number of measurements for each subject

```r
wages %>%
*  features(ln_wages, n_obs) %>%
  ggplot(aes(x = n_obs)) +
  geom_bar() +
  xlab("Number of observations") +
  invthm
```
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-3-1.svg" width="100%" />
]

---
# We could filter on this

.pull-left[
```r
*wages <- wages %>% add_n_obs()
wages %>%
*  filter(n_obs > 3) %>%
  select(id, ln_wages, xp, n_obs)
```
]
.pull-right[
```
## # A tsibble: 6,145 x 4 [!]
*## # Key:       id [764]
##       id ln_wages    xp n_obs
##    <int>    <dbl> <dbl> <int>
##  1    31     1.49 0.015     8
##  2    31     1.43 0.715     8
##  3    31     1.47 1.73      8
##  4    31     1.75 2.77      8
##  5    31     1.93 3.93      8
##  6    31     1.71 4.95      8
##  7    31     2.09 5.96      8
##  8    31     2.13 6.98      8
##  9    36     1.98 0.315    10
## 10    36     1.80 0.983    10
## # … with 6,135 more rows
```
]

---
# Subjects don't all start at the same time

.pull-left[
Using features to extract minimum time

```r
wages %>%
*  features(xp, list(min = min)) %>%
  ggplot(aes(x = min)) +
  geom_histogram(binwidth = 0.5) +
  xlim(c(0, 13)) +
  xlab("First time in study") +
  invthm
```
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-5-1.svg" width="100%" />
]

---
# There's a range of experience

.pull-left[
Using features to extract range of time index

```r
wages_xp_range <- wages %>%
*  features(xp, feat_ranges)

ggplot(wages_xp_range,
       aes(x = range_diff)) +
  geom_histogram() +
  xlab("Range of experience") +
  invthm
```
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-6-1.svg" width="100%" />
]

---
# Small spoonfuls of spaghetti

.pull-left[
Sample some individuals

```r
set.seed(20200720)
wages %>%
*  sample_n_keys(size = 10) %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) +
  geom_line() +
  xlim(c(0, 13)) + ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```
.tiny[Wages conversion 0.5 = $1.65; 4.5 = $90]
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-7-1.svg" width="100%" />
]

---
count: false
# Small spoonfuls of spaghetti

.pull-left[
Sample some individuals

```r
wages %>%
*  sample_n_keys(size = 10) %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) +
  geom_line() +
  xlim(c(0, 13)) + ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```
.tiny[Wages conversion 0.5 = $1.65; 4.5 = $90]
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-8-1.svg" width="100%" />
]

---
count: false
# Small spoonfuls of spaghetti

.pull-left[
Sample some individuals

```r
wages %>%
*  sample_n_keys(size = 10) %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) +
  geom_line() +
  xlim(c(0, 13)) + ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```
.tiny[Wages conversion 0.5 = $1.65; 4.5 = $90]
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-9-1.svg" width="100%" />
]

---
# Take a spoonful of different lengths

.pull-left[
Sample experienced individuals

```r
wages %>%
  add_n_obs() %>%
*  filter(n_obs > 7) %>%
  sample_n_keys(size = 10) %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) +
  geom_line() +
  xlim(c(0, 13)) + ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```
.tiny[Wages conversion 0.5 = $1.65; 4.5 = $90]
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-10-1.svg" width="100%" />
]

---
# Take a spoonful of different lengths

.pull-left[
Sample INexperienced individuals

```r
wages %>%
  add_n_obs() %>%
*  filter(n_obs < 5) %>%
  sample_n_keys(size = 10) %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) +
  geom_line() +
  xlim(c(0, 13)) + ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```
.tiny[Wages conversion 0.5 = $1.65; 4.5 = $90]
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-11-1.svg" width="100%" />
]

---
# Take a spoonful of different lengths

.pull-left[
Sample average experience

```r
wages %>%
  add_n_obs() %>%
*  filter(n_obs > 4, n_obs < 8) %>%
  sample_n_keys(size = 10) %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) +
  geom_line() +
  xlim(c(0, 13)) + ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```
.tiny[Wages conversion 0.5 = $1.65; 4.5 = $90]
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-12-1.svg" width="100%" />
]

---
# Also we can

- `facet_strata`: show the whole pot, neatly separated into equal portions
- `facet_sample`: show most of the pot, in neatly separated portions

---
<img src="slides_files/figure-html/unnamed-chunk-13-1.svg" width="100%" />

---
# Special features

Compute longnostics for each
subject

- Slope, intercept from simple linear model
- Variance, standard deviation
- Jumps, differences

---
# Increasing

.pull-left[
```r
wages_slope <- wages %>%
  add_n_obs() %>%
  filter(n_obs > 4) %>%
*  add_key_slope(ln_wages ~ xp) %>%
  as_tsibble(key = id, index = xp)

wages_slope %>%
*  filter(.slope_xp > 0.4) %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) +
  geom_line() +
  ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-14-1.svg" width="100%" />
]

---
# Decreasing

.pull-left[
```r
wages_slope %>%
*  filter(.slope_xp < (-0.7)) %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) +
  geom_line() +
  ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-15-1.svg" width="100%" />
]

---
background-image: url(https://cdn.mos.cms.futurecdn.net/xVDtHe3txNCijnegF8y4d6-970-80.jpg)
background-size: 30%
background-position: 50% 99%

# A different style of five-number summary

Who is average? Who is different? Find the individuals who are representative of the minimum, median, maximum, etc., of growth, using `keys_near()`

.footnote[Image credit: Flickr/paul dynamik]

---
.pull-left[
```r
wages_threenum <- wages %>%
  add_n_obs() %>%
  filter(n_obs > 4) %>%
  key_slope(ln_wages ~ xp) %>%
*  keys_near(key = id,
*            var = .slope_xp,
*            funs = l_three_num) %>%
  left_join(wages, by = "id") %>%
  as_tsibble(key = id, index = xp)
```
]
.pull-right[
<img src="slides_files/figure-html/three_number_plot-1.svg" width="100%" />
]

---
<img src="slides_files/figure-html/five_number-1.svg" width="100%" />

---
# Sculpting spaghetti

.pull-left[
Mixed effects model: years of experience and education as fixed effects, with a random intercept and slope for each subject.
```r
library(lme4)    # lmer()
library(modelr)  # add_predictions(), add_residuals()

wages_fit_int <-
*  lmer(ln_wages ~ xp + high_grade +
*         (xp | id), data = wages)

wages_aug <- wages %>%
  add_predictions(wages_fit_int,
                  var = "pred_int") %>%
  add_residuals(wages_fit_int,
                var = "res_int")
```
]
.pull-right[
<img src="slides_files/figure-html/model_plot-1.svg" width="100%" />
]

---
# Sample and show the data, too

```r
set.seed(1)
wages_aug %>%
  add_n_obs() %>%
  filter(n_obs > 4) %>%
  sample_n_keys(size = 12) %>%
*  ggplot(aes(x = xp,
*             y = pred_int,
*             group = id,
*             colour = factor(id))) +
  geom_line() +
*  geom_point(aes(x = xp,
*                 y = ln_wages,
*                 colour = factor(id))) +
  scale_colour_ochre(palette = "emu_woman_paired") +
  facet_wrap(~id, ncol = 4) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm +
  theme(legend.position = "none")
```

---
<img src="slides_files/figure-html/unnamed-chunk-16-1.svg" width="100%" />

---
.pull-left[
# Multivariate

Multiple response variables, expecting some association between them.

- ln_wages
- expens (simulated)
- savings (simulated)
]
.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-17-1.svg" width="100%" />
]

---
# Multivariate

```r
library(tourr)
wages_mv_df <- wages_mv %>%
  select(ln_wages, expens, savings, id) %>%
  as.data.frame()
# Edge matrices linking successive observations of two subjects
wages_12227 <- data.frame(from = 6137:6146, to = 6138:6147) %>% as.matrix()
wages_735 <- data.frame(from = 408:418, to = 409:419) %>% as.matrix()
edges <- rbind(wages_12227, wages_735)
animate_xy(wages_mv_df[, 1:3], axes = "bottomleft",
           col = "grey90", edges = wages_12227)
animate_xy(wages_mv_df[, 1:3], axes = "bottomleft",
           col = "grey90", edges = wages_735)
render_gif(wages_mv_df[, 1:3], grand_tour(),
           display_xy(axes = "bottomleft", col = "grey90",
                      edges = wages_12227),
           gif_file = "./tour1.gif", apf = 1/15, frames = 200)
render_gif(wages_mv_df[, 1:3], grand_tour(),
           display_xy(axes = "bottomleft", col = "grey90",
                      edges = wages_735),
           gif_file = "./tour2.gif", apf = 1/15, frames = 200)
```

---
.pull-left[
Subject 12227

<img src="tour1.gif" width="90%">
]

--

.pull-right[
Subject 735

<img src="tour2.gif"
width="90%">
]

---
background-image: url(images/mouldy-spaghetti.jpg)
background-size: cover
class: inverse center

# I'm sorry about the wages data

.footnote[Image source: [https://www.stayathomemum.com.au](https://www.stayathomemum.com.au/my-lifestyle/mouldy-food-how-far-is-too-far-gone/)]

---
.pull-left[
Even though it's messy, it's too clean

.small[
- Original cohort included 12,686 individuals, but this data has only 888 individuals
- Process not transparent
- Demographic groups have very different numbers
]
]
.pull-right[
.small[
https://www.nlsinfo.org/content/cohorts/nlsy79/get-data

- I have been trying to refresh the wages data
- can extract approximately 6,000 individuals
- many odd values, like wages above $10,000 per hour
- different coding of variables from one year to another
]
]

---
Example of data issues

<img src="slides_files/figure-html/bad-data-1.svg" width="100%" />

---
class: inverse middle center

# Individual approach in use

---
background-image: url(images/NYTimes1.png)
background-size: 80%

.footnote[[NYTimes Coronavirus coverage](https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html)]

---
background-image: url(images/NYTimes2.png)
background-size: 100%

.footnote[[NYTimes Coronavirus coverage](https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html)]

---
# What I hope you have heard today

- We need .orange[more research], and .orange[acceptance], on methods for communicating the individual experience.
- When the variation is large, summarising the variation is the honest thing to do.
- .orange[Statistics needs to address the individual experience], so we can better engage with the public.

*I feel this is important for evolving democracy, for literate citizenship. As more tools become readily available, people will do their own analysis*.
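---
# Appendix: `facet_strata` and `facet_sample`

A minimal sketch of the two facet helpers mentioned earlier, assuming the `wages` tsibble from the earlier slides; the panel counts here are illustrative choices, not values taken from this talk.

```r
library(brolgar)   # wages data, facet helpers
library(ggplot2)

# facet_sample(): most of the pot, a few randomly
# sampled series per panel
ggplot(wages, aes(x = xp, y = ln_wages, group = id)) +
  geom_line() +
  facet_sample(n_per_facet = 3, n_facets = 12)

# facet_strata(): the whole pot, split into equal-sized
# strata, ordered along wages so panels run low to high
ggplot(wages, aes(x = xp, y = ln_wages, group = id)) +
  geom_line() +
  facet_strata(n_strata = 12, along = ln_wages)
```

Both helpers keep every series intact within its panel, so individual trajectories stay readable while the full key set is still covered.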
---
background-image: url(https://upload.wikimedia.org/wikipedia/commons/9/99/Brolga_%2835984742503%29.jpg)
background-size: 40%
background-position: 99% 99%
class: inverse

### Read more about the `brolgar` package at
# http://brolgar.njtierney.com

<br>

**br**owse **o**ver **l**ongitudinal data <br> **g**raphically and **a**nalytically in **R**

.footnote[Image source: [wikicommons](https://upload.wikimedia.org/wikipedia/commons/9/99/Brolga_%2835984742503%29.jpg)]

---
class: inverse middle
background-image: url(images/people2.png)
background-position: 99% 1%
background-size: 35%

# Acknowledgements

Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan), with **iris theme** created from [xaringanthemer](https://github.com/gadenbuie/xaringanthemer). The chakra comes from [remark.js](https://remarkjs.com), [**knitr**](http://yihui.name/knitr), and [R Markdown](https://rmarkdown.rstudio.com).

Slides are available at [https://dicook.org/files/IBC2020/slides.html](https://dicook.org/files/IBC2020/slides.html) and supporting files at [https://github.com/dicook/IBC2020](https://github.com/dicook/IBC2020).

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.

.footnote[Image credit: 2020 Australian Open spectators by Di Cook]