class: inverse middle
background-image: url(images/wasps.gif)
background-position: 90% 90%
background-size: 30%

# Building plots to explore data
## *Beyond histograms, barcharts and scatterplots to high-dimensions and statistical inference*

### Professor Di Cook <br> Monash University

### WSC 2019, Kuala Lumpur <br>Aug 23, 2019

Slides can be viewed at <br> [http://dicook.org/files/WSC2019/slides#1](http://dicook.org/files/WSC2019/slides#1)

---

# How many clusters do you see?

.pull-left[
<img src="images/toth_PRSB_2010.png" style="width: 100%; align: center" />
]

--

.pull-right[
<br> <br> <br> <br> <br> <br> <br> <br>
.large[.green[... Hold onto this thought ...]]
]

---

.large[I'm going to talk about]

--
<i class="fas fa-hand-pointer fa-2x faa-float animated faa-slow " style=" color:#75A34D;"></i>
.large[.purple[inference for data plots]]

--
<i class="fas fa-hand-peace fa-2x faa-vertical animated faa-slow " style=" color:#75A34D;"></i>
.large[.orange[with a focus on high-dimensional data,]]

--
<i class="fas fa-hand-spock fa-2x faa-wrench animated faa-slow " style=" color:#75A34D;"></i>
.large[.green[and the use of tours]]

---
background-image: url(images/vis_inf.png)
background-size: contain

---

# Visual inference

1. the plot is a statistic

--

2. the type of plot (specified by a grammar) implicitly defines the null hypothesis

--

3. a null generating mechanism provides draws from the sampling distribution, among which to embed the data plot

--

4. human observers are engaged to examine a lineup

--

5. statistical significance and power can be computed based on the proportion of observers choosing the data plot from the lineup

---

# Lineup protocol

I'm going to show you a page of plots

--

Each has a number above it; this is its id

--

Choose the plot that you think exhibits the most separation between groups

--

*If you really need to choose more than one, or even not choose any, that is ok, too*

--

Ready?

---

<img src="slides_files/figure-html/lineup of the wasps-1.png" width="100%" />
---

.pull-left[
The data plot is

<img src="slides_files/figure-html/true wasp data plot-1.png" width="90%" />
]

.pull-right[
<img src="images/toth_PRSB_2010.png" style="width: 90%; align: center" />

It is the same data as shown before, but I think that no one noticed this.
]

---

.pull-left[
<img src="images/toth_PRSB_2010.png" style="width: 80%; align: center" />

> LDA resulted in ... that gynes had the most divergent expression patterns

.footnote[Toth et al (2010) Proc. of the Royal Society]
]

--

.pull-right[
<img src="images/toth_Science_2007.png" style="width: 75%; align: center" />

> ... show that foundress and worker brain profiles are more similar to each other than to the other groups.

.footnote[Toth et al (2007) Science]
]

---

.pull-left[
True data

<img src="slides_files/figure-html/true wasp data plot again-1.png" width="90%" />
]

.pull-right[
Null data

<img src="slides_files/figure-html/null wasp data plot-1.png" width="90%" />
]

---
class: inverse middle center

Space is big, and with few data points, classes can easily be separated

--

spuriously

--

<br>
<br>
The lineup protocol can help people understand the problem
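---

A minimal sketch of why spurious separation happens, not the analysis behind the wasp plots. Everything here is an assumption made for illustration: the data are simulated Gaussian noise, the group labels are assigned at random, the sizes (12 points in 200 dimensions versus 200 points in 2 dimensions) are made up, and a simple perceptron stands in for LDA because it needs no matrix algebra. The function names are hypothetical.

```python
import random


def perceptron_separates(points, labels, max_epochs=1000):
    """Return True if a perceptron finds a hyperplane classifying every point correctly."""
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:
                # Misclassified (or on the boundary): nudge the hyperplane towards x.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return True
    return False


def random_data(n, p, rng):
    """n points of pure noise in p dimensions, with labels that carry no signal."""
    points = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    labels = [rng.choice([-1, 1]) for _ in range(n)]
    return points, labels


rng = random.Random(2019)

# Few points, many dimensions: the meaningless labels can still be separated.
hi_points, hi_labels = random_data(n=12, p=200, rng=rng)
separable_high_dim = perceptron_separates(hi_points, hi_labels)

# Many points, few dimensions: the same kind of labels cannot be separated.
lo_points, lo_labels = random_data(n=200, p=2, rng=rng)
separable_low_dim = perceptron_separates(lo_points, lo_labels)

print(separable_high_dim, separable_low_dim)
```

With these settings the first check should succeed and the second should fail, even though neither data set contains any group structure. Perfect separation in a high-dimensional, small-sample plot is therefore weak evidence on its own.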
---

If you first reduce the dimension (e.g. with PCA) and then apply LDA, the problem goes away. The LDA projection into three dimensions is shown below.

.pull-left[
All data

<img src="images/wasps_true.gif" style="width: 75%; align: center" />
]

.pull-right[
Top 12 PCs

<img src="images/wasps_pca_true.gif" style="width: 75%; align: center" />
]

---
class: inverse middle center

.large[😓]

Now we were worried about our own RNA-Seq analyses!

---

# Lineup of our own data

I'm going to show you a page of plots

--

Each has a number above it; this is its id

--

Choose the plot that you think exhibits the

- steepest green line
- with relatively small spread of the green points

--

Ready?

---
background-image: url(images/plot_turk9_interaction_2_1.png)
background-position: 50% 85%
background-size: 75%

---
background-image: url(images/RNASeq_disagreement.png)
background-position: 90% 15%
background-size: 40%

Experimental design

2x2 factorial:

- Two genotypes (EV, RPA)
- Two growing conditions (I, S)
- Three reps for each treatment
- Approx 60,000 genes

Two different procedures, edgeR and DESeq, gave conflicting numbers of significant genes, though both on the order of 300. One of the top genes was selected for the lineup study, and independent observers were engaged through Amazon's Mechanical Turk.

---

# Turk results
Is there any significant structure in our data?
--

- 24 lineups were made, with only one shown to each observer
- 5 different positions of the data plot
- 5 different sets of null plots

A small number of observers were engaged to evaluate the lineups

---

.pull-left[
<table class="table" style="margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> Lineup name </th> <th style="text-align:right;"> No. detects </th> <th style="text-align:right;"> No. evals </th> <th style="text-align:right;"> Prop. detect </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_1_3.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 5 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 5 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 1.00 </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_2_3.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 3 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 3 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 1.00 </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_4_2.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 3 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 3 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 1.00 </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_4_5.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 1 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 1 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 1.00 </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_5_1.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 1 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 1 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 1.00 </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_5_3.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 4 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 4 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 1.00 </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_5_4.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 4 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 4 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 1.00 </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_5_5.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 1 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 1 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 1.00 </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_1_2.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 6 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 8 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 0.75 </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_1_4.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 3 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 4 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 0.75 </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_2_2.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 3 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 4 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 0.75 </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_2_1.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 2 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 3 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 0.67 </td> </tr>
</tbody>
</table>
]

--

.pull-right[
<table class="table" style="margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> Lineup name </th> <th style="text-align:right;"> No. detects </th> <th style="text-align:right;"> No. evals </th> <th style="text-align:right;"> Prop. detect </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_2_4.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 4 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 6 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 0.67 </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_2_5.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 2 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 3 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 0.67 </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_4_3.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 2 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 3 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 0.67 </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_3_5.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 5 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 8 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 0.62 </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_1_5.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 3 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 5 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 0.60 </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_3_2.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 2 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 4 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 0.50 </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_3_4.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 3 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 6 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 0.50 </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_4_4.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 1 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 3 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 0.33 </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_5_2.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 2 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 6 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 0.33 </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_3_1.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 0 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 4 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 0.00 </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_3_3.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 0 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 2 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 0.00 </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;font-size: 14px;"> plot_turk9_interaction_4_1.svg </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 0 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 1 </td> <td style="text-align:right;background-color: white !important;font-size: 14px;"> 0.00 </td> </tr>
</tbody>
</table>
]

<br>
<br>

The overall detection rate of 0.65 (60 of 92 evaluations) is high. There is some structure to our data.
<i class="fas fa-bolt faa-wrench animated faa-slow " style=" color:#75A34D;"></i>
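---

One way to turn detection counts like these into a p-value is sketched below. It is a simplified illustration, not necessarily the calculation used in the study. It assumes each observer picks independently and, under the null, chooses the data plot with probability 1/m. The lineup size m = 20 is an assumption, since the slides do not state it, and the function name is hypothetical.

```python
from math import comb


def lineup_p_value(detections, evaluations, lineup_size=20):
    """P(X >= detections) for X ~ Binomial(evaluations, 1 / lineup_size).

    lineup_size=20 is an assumed value, not one given in the slides.
    """
    p = 1.0 / lineup_size
    return sum(
        comb(evaluations, k) * p**k * (1 - p) ** (evaluations - k)
        for k in range(detections, evaluations + 1)
    )


# Counts taken from the tables above.
p_single = lineup_p_value(6, 8)    # plot_turk9_interaction_1_2: 6 of 8
p_none = lineup_p_value(0, 4)      # plot_turk9_interaction_3_1: 0 of 4
p_pooled = lineup_p_value(60, 92)  # all 24 lineups pooled: 60 of 92

print(p_single, p_none, p_pooled)
```

Pooling all 92 evaluations treats them as independent draws, which ignores the fact that the lineups share one data plot and differ in their null plots, so the pooled value is only a rough guide.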
---
background-image: url(images/RNASeq_explanation.png)
background-position: 90% 60%
background-size: 60%

## How does a <br> discrepancy <br> happen?

---
class: inverse middle

Two aspects of massive multiple testing

- ruler on which to measure difference === .yellow[empirical Bayes]
- false positives === .yellow[False Discovery Rate]

--

<br>
Even with these, mistakes can happen, and visualising the data remains valuable
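---

For the false-positive side, the slides name the False Discovery Rate but not a specific procedure. The sketch below uses the Benjamini-Hochberg step-up procedure, one widely used way to control it, as an assumed stand-in. The p-values are made up for illustration and the function name is hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the indices rejected by the Benjamini-Hochberg procedure at level q."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value sits under the line rank * q / m.
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Made-up p-values for ten hypothetical genes.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(toy_p, q=0.05)
print(rejected)
```

Here five of the ten p-values fall below 0.05, but only the two smallest pass the step-up thresholds. With around 60,000 genes the same logic applies at a much larger scale, which is why the choice of threshold matters so much.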
---
background-image: url(images/RNASeq_top25.png)
background-position: 50% 50%
background-size: contain

---

# Someone else's data

Data from "Sex-specific and lineage-specific alternative splicing in primates", Blekhman, Marioni, Zumbo, Stephens, Gilad, Genome Research, 2010, 20: 180-189, http://genome.cshlp.org/content/suppl/2009/12/16/gr.099226.109.DC1.html

Yields 3630 differentially expressed genes, at `\(FDR<0.01\)`

Lineups were created for the top 10, 95th-104th, 995th-1004th and 1995th-2004th most significant genes.

Pick the plot that shows the most difference between the two groups.

---
background-image: url(images/human_chimp1.png)
background-position: 50% 50%
background-size: contain

---
background-image: url(images/human_chimp2.png)
background-position: 50% 50%
background-size: contain

---
background-image: url(images/human_chimp3.png)
background-position: 50% 50%
background-size: contain

---
background-image: url(images/human_chimp_results.png)
background-position: 80% 50%
background-size: 50%

The data plots are in positions 8, 5 and 17

.green[*p-values*] from <br> human observer <br> study shown, <br> indicate only about <br> .green[*100 important genes*].

---

# Summary

I've shown you some examples of working with .pink[high-dimensional, small sample size data].

--

We've worked through some graphics that were useful in .pink[diagnosing the modeling, and testing].

--

You've learned how to fit .pink[data visualisation] into a .pink[hypothesis testing] framework.

--

Hopefully, you are taking away the message that .pink[visualisation] remains valuable, and is .pink[increasingly important], for the large, complex data we are working with today.

---
class: inverse center
background-image: url(images/wasps.gif)
background-position: 50% 80%
background-size: 30%

<br>
<br>

Visualising data goes way beyond barcharts, pie charts, line plots and scatterplots

---

# References

- Toth et al (2007) [Wasp Gene Expression ... Science 318](https://www.ncbi.nlm.nih.gov/pubmed/17901299)
- Toth et al (2010) [Brain transcriptomic analysis ... Proc Roy Soc B](https://royalsocietypublishing.org/doi/full/10.1098/rspb.2010.0090)
- Roy Chowdhury et al (2015) [Using Visual Statistical Inference ... in High Dimension, Low Sample Size Data, Comp. Stat., 30(2):293-316](http://rd.springer.com/article/10.1007/s00180-014-0534-x)
- Yin et al (2013) [Visual Mining Methods for RNA-Seq data ..., J. Data Mining in Genomics & Proteomics, 4(139)](https://www.longdom.org/open-access/visual-mining-methods-for-rnaseq-data-data-structure-dispersion-estimation-and-significance-testing-2153-0602.1000139.pdf)
- R packages: [tourr](https://github.com/ggobi/tourr), [nullabor](https://github.com/dicook/nullabor)

---
class: middle

# Thanks for listening

Joint work with Niladri Roy Chowdhury, Mahbub Majumder, Tengfei Yin, Heike Hofmann.

Slides created with [R Markdown](https://rmarkdown.rstudio.com) using the R package [**xaringan**](https://github.com/yihui/xaringan), with **iris theme** created from [xaringanthemer](https://github.com/gadenbuie/xaringanthemer). Animated icons were created using the [**anicon**](https://github.com/emitanaka/anicon) package. GIF renderer for tourr by Stuart Lee.

Slides are available at [https://dicook.org/files/WSC2019/slides.html](https://dicook.org/files/WSC2019/slides.html) and supporting files at [https://github.com/dicook/WSC2019](https://github.com/dicook/WSC2019).

---
class: middle center

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.