class: inverse middle background-image: \url(images/penguins.jpg) background-position: 100% 65% background-size: 55% # Going beyond 2D and 3D to visualise higher dimensions, for ordination, clustering and other models ## Di Cook <br> Monash University ### vISEC <br> June 22, 2020 [https://dicook.org/files/vISEC2020/slides_tourr.html](https://dicook.org/files/vISEC2020/slides_tourr.html) <br> <br> <br> <br> <br> .footnote[Image credit: [Gentoo Penguins, Wikimedia Commons](https://upload.wikimedia.org/wikipedia/commons/thumb/0/04/Pygoscelis_papua_-Jougla_Point%2C_Wiencke_Island%2C_Palmer_Archipelago_-adults_and_chicks-8.jpg/273px-Pygoscelis_papua_-Jougla_Point%2C_Wiencke_Island%2C_Palmer_Archipelago_-adults_and_chicks-8.jpg)] --- # Outline - Getting started: tourr, spinifex, geozoo - What is a tour? - Different types of tours - Interpreting what you see - Saving your tour plot --- class: inverse middle center # Getting set up --- # `tourr` .left-code[ ```r install.packages("tourr") help(package="tourr") ``` ] .right-plot[ ```r library("tourr") ``` Implements geodesic interpolation and basis generation functions that allow you to create new tour methods from R. ] --- # `spinifex` .left-code[ ```r install.packages("spinifex") help(package="spinifex") ``` ] .right-plot[ ```r library("spinifex") ``` Implements manual control, where the contribution of a selected variable can be adjusted between -1 and 1, to examine the sensitivity of structure in the data to that variable. The result is an animation where the variable is toured into and out of the projection completely. ] --- # `geozoo` .left-code[ ```r install.packages("geozoo") help(package="geozoo") ``` ] .right-plot[ ```r library("geozoo") ``` Geometric objects defined in 'geozoo' can be simulated or displayed in the R package 'tourr'. 
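A minimal sketch of that workflow, assuming `sphere.hollow()` returns a list whose `points` element holds the simulated coordinates (check `help(package="geozoo")` for your version):

```r
library(tourr)
library(geozoo)

set.seed(2020)
# Simulate 500 points on the surface of a 4D sphere.
# Assumption: the coordinates live in the `points` element of the result.
sphere <- sphere.hollow(p = 4, n = 500)$points

# One row per simulated point, one column per dimension
dim(sphere)

# View the shape with a grand tour; this opens a graphics device,
# so it is guarded to run only in an interactive session
if (interactive()) {
  animate_xy(sphere, axes = "bottomleft")
}
```

Swapping in another geozoo generator (for example a solid sphere or a cube) only changes the line that creates the points.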
] --- class: inverse-nopad .tiny[ ``` ## R version 4.0.1 (2020-06-06) ## Platform: x86_64-apple-darwin17.0 (64-bit) ## Running under: macOS Mojave 10.14.6 ## ## Matrix products: default ## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib ## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib ## ## locale: ## [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8 ## ## attached base packages: ## [1] stats graphics grDevices utils datasets methods base ## ## other attached packages: ## [1] geozoo_0.5.1 spinifex_0.2.0 tourr_0.5.6 ## [4] xaringanthemer_0.3.0 ## ## loaded via a namespace (and not attached): ## [1] sysfonts_0.8.1 digest_0.6.25 showtextdb_3.0 bitops_1.0-6 ## [5] magrittr_1.5 evaluate_0.14 xaringan_0.16 rlang_0.4.6 ## [9] stringi_1.4.6 rmarkdown_2.3 tools_4.0.1 stringr_1.4.0 ## [13] showtext_0.8-1 xfun_0.14 yaml_2.2.1 compiler_4.0.1 ## [17] htmltools_0.5.0 knitr_1.28 ``` ] --- Grab the `runthis.R` file from [https://github.com/dicook/vISEC2020](https://github.com/dicook/vISEC2020) in the `skills_showcase` folder. (Or the `slides_tour.Rmd` for everything!) --- class: inverse middle center # Get some new data --- ```r remotes::install_github("allisonhorst/palmerpenguins") ``` .small[ ```r library(tidyverse) library(palmerpenguins) penguins <- penguins %>% filter(!is.na(bill_length_mm)) ``` ] .tiny[
] .footnote[See https://allisonhorst.github.io/palmerpenguins/ for more details.] --- class: middle <table> <tr> <td width="40%"> <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dc/Adélie_Penguin.jpg/320px-Adélie_Penguin.jpg" width="100%" /> </td> <td width="30%"> <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/0/04/Pygoscelis_papua_-Jougla_Point%2C_Wiencke_Island%2C_Palmer_Archipelago_-adults_and_chicks-8.jpg/273px-Pygoscelis_papua_-Jougla_Point%2C_Wiencke_Island%2C_Palmer_Archipelago_-adults_and_chicks-8.jpg" width="100%" /> </td> <td width="30%"> <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/0/09/A_chinstrap_penguin_%28Pygoscelis_antarcticus%29_on_Deception_Island_in_Antarctica.jpg/201px-A_chinstrap_penguin_%28Pygoscelis_antarcticus%29_on_Deception_Island_in_Antarctica.jpg" width="90%" /> </td> </tr> <tr> <td> Adélie .footnote[[Wikimedia Commons](https://upload.wikimedia.org/wikipedia/commons/thumb/d/dc/Adélie_Penguin.jpg/320px-Adélie_Penguin.jpg)] </td> <td> Gentoo .footnote[[Wikimedia Commons](https://upload.wikimedia.org/wikipedia/commons/thumb/0/04/Pygoscelis_papua_-Jougla_Point%2C_Wiencke_Island%2C_Palmer_Archipelago_-adults_and_chicks-8.jpg/273px-Pygoscelis_papua_-Jougla_Point%2C_Wiencke_Island%2C_Palmer_Archipelago_-adults_and_chicks-8.jpg)] </td> <td> Chinstrap .footnote[[Wikimedia Commons](https://upload.wikimedia.org/wikipedia/commons/thumb/0/09/A_chinstrap_penguin_%28Pygoscelis_antarcticus%29_on_Deception_Island_in_Antarctica.jpg/201px-A_chinstrap_penguin_%28Pygoscelis_antarcticus%29_on_Deception_Island_in_Antarctica.jpg)]</td> </tr> </table> --- .left-plot[ .small[ ```r library(ochRe) ggplot(penguins, aes(x=flipper_length_mm, y=body_mass_g, colour=species, shape=species)) + geom_point(alpha=0.7, size=2) + scale_colour_ochre( palette="nolan_ned") + theme(aspect.ratio=1, legend.position="bottom") ``` ] ] .right-plot[ <img src="slides_tourr_files/figure-html/runthis5-1.png" width="100%" /> ] --- class: 
inverse middle center # Our first tour --- .left-code[ ```r clrs <- ochre_pal( palette="nolan_ned")(3) col <- clrs[ as.numeric( penguins$species)] animate_xy(penguins[,3:6], col=col, axes="off", fps=15) ``` ] .right-plot[ <img src="penguins2d.gif" width="100%"> ] --- class: inverse middle # What did you see?
--- class: inverse middle - clusters ✅ -- - outliers ✅ -- - linear dependence ✅ -- - elliptical clusters with slightly different shapes ✅ -- - separated elliptical clusters with slightly different shapes ✅ -- --- .left-code[ # What is a tour? A grand tour is by definition a movie of low-dimensional projections constructed in such a way that it comes arbitrarily close to showing all possible low-dimensional projections; in other words, a grand tour is a space-filling curve in the manifold of low-dimensional projections of high-dimensional data spaces. ] .right-plot[ `\({\mathbf x}_i \in \mathcal{R}^p\)` is the `\(i^{th}\)` data vector. `\(F\)` is a `\(p\times d\)` orthonormal basis, `\(F'F=I_d\)`, where `\(d\)` is the projection dimension. The projection of `\({\mathbf x}_i\)` onto `\(F\)` is `\({\mathbf y}_i=F'{\mathbf x}_i\)`. The tour is indexed by time, `\(F(t)\)`, where `\(t\in [a, z]\)`. The starting and target frames are denoted `\(F_a = F(a), F_z=F(z)\)`. The animation of the projected data is given by a path `\({\mathbf y}_i(t)=F'(t){\mathbf x}_i\)`. ] --- # Geodesic interpolation between planes .left-code[ The tour is indexed by time, `\(F(t)\)`, where `\(t\in [a, z]\)`. The starting and target frames are denoted `\(F_a = F(a), F_z=F(z)\)`. The animation of the projected data is given by a path `\({\mathbf y}_i(t)=F'(t){\mathbf x}_i\)`. ] .right-plot[ <img src="images/geodesic.png" width="120%"> ] --- .left-code[ A .orange[grand tour] is like a random walk (with interpolation) through the space of all possible planes. 
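A rough base-R sketch of the idea, not the tourr implementation: it draws a few random target planes and projects simulated data onto one of them, leaving out the geodesic interpolation between targets. The helper `random_basis()` and the simulated matrix `X` are made up for illustration; in practice `tourr::basis_random()` generates the bases.

```r
# Hypothetical helper: a random p x d orthonormal basis, obtained from
# the QR decomposition of a Gaussian matrix
random_basis <- function(p, d = 2) {
  qr.Q(qr(matrix(rnorm(p * d), nrow = p, ncol = d)))
}

set.seed(2020)
p <- 4
X <- matrix(rnorm(50 * p), ncol = p)  # stand-in for n = 50 data vectors in p dimensions

# A short sequence of target planes: the "random walk" without interpolation
targets <- lapply(1:3, function(i) random_basis(p, d = 2))

# Check orthonormality of the first target: F'F should equal the 2 x 2 identity
F1 <- targets[[1]]
ortho_error <- max(abs(crossprod(F1) - diag(2)))

# Project the data: row i of Y is y_i = F' x_i
Y <- X %*% F1
dim(Y)
```

A tour animates `Y` as the basis moves smoothly from one target to the next, rather than jumping between them as this sketch would.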
] .right-plot[ <img src="tour_path.gif" width="100%"> ] --- class: inverse middle center # Let's take a look at some common high-d shapes with a grand tour --- # 4D spheres .left-plot[ Hollow <img src="sphere4d_1.gif" width="70%"> ] .right-plot[ Solid <img src="sphere4d_2.gif" width="70%"> ] --- # 4D cubes .left-plot[ Hollow <img src="cube4d_1.gif" width="70%"> ] .right-plot[ Solid <img src="cube4d_2.gif" width="70%"> ] --- # Others .left-plot[ Torus <img src="torus4d.gif" width="70%"> ] .right-plot[ Möbius <img src="mobius.gif" width="70%"> ] --- class: inverse middle center # Reading axes - interpretation Length and direction of axes relative to the pattern of interest --- <img src="images/reading_axes.001.png" width="100%"> --- <img src="images/reading_axes.002.png" width="100%"> --- # Reading axes - interpretation <iframe src="penguins.html" width="800" height="500" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe> --- .left-plot[ <img src="slides_tourr_files/figure-html/runthis13-1.png" width="90%" /> Gentoo from others in contrast of fl, bd ] .right-plot[ <img src="slides_tourr_files/figure-html/runthis14-1.png" width="90%" /> Chinstrap from others in contrast of bl, bm ] --- class: inverse middle left There may be multiple and different combinations of variables that reveal similar structure. ☹️ The tour can help to discover these, too. 😂 --- # Other tour types - .orange[guided]: follows the optimisation path for a projection pursuit index. - .orange[little]: interpolates between all axis-parallel projections, i.e. between pairs of variables in 2D. - .orange[local]: rocks back and forth from a given projection, so shows all possible projections within a radius. - .orange[dependence]: two independent 1D tours. - .orange[frozen]: fixes some variable coefficients, others vary freely. - .orange[manual]: control the coefficient of one variable, to examine the sensitivity of structure to this variable. (In the .orange[spinifex] package) - .orange[slice]: use a section instead of a projection. 
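In `tourr`, several of these tour types are tour path generators that can be passed to the `tour_path` argument of `animate_xy()`. A minimal sketch with the penguins data; the object names (`peng`, `species`, `paths`) and the random starting basis for the local tour are choices made for this example only:

```r
library(tourr)
library(palmerpenguins)

# Keep complete cases of the four measurements and rescale each column to [0, 1]
keep <- complete.cases(penguins[, 3:6])
peng <- tourr::rescale(as.matrix(penguins[keep, 3:6]))
species <- penguins$species[keep]

set.seed(2020)
# One tour path generator per tour type
paths <- list(
  grand  = grand_tour(d = 2),
  guided = guided_tour(lda_pp(species)),           # optimise the LDA index
  little = little_tour(d = 2),
  local  = local_tour(start = basis_random(4, 2))  # rock around a random start
)

# Animations open a graphics device, so only run them interactively
if (interactive()) {
  animate_xy(peng, tour_path = paths$guided, axes = "off")
}
```

Changing `paths$guided` to another element of the list switches the tour type without touching the rest of the call.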
--- class: inverse middle center # guided tour new target bases are chosen using a projection pursuit index function --- `$$\mathop{\text{maximize}}_{F} g(F'x) ~~~\text{ subject to } F \text{ being orthonormal}$$` .font_small[ - `holes`: This is an inverse Gaussian filter, which is optimised when there is not much data in the centre of the projection, i.e. a "hole" or donut shape in 2D. - `central mass`: The opposite of holes, high density in the centre of the projection, and often "outliers" on the edges. - `LDA`/`PDA`: An index based on linear discriminant analysis (and its penalised version), optimised by projections where the named classes are most separated. ] --- .left-plot[ Grand <img src="penguins2d.gif" width="80%"> .small[ Might accidentally see best separation ] ] .right-plot[ Guided, using LDA index <img src="penguins2d_guided.gif" width="80%"> .small[ Moves to the best separation ] ] --- class: inverse middle center # manual tour control the coefficient of one variable, reduce it to zero, increase it to 1, maintaining orthonormality --- # Manual tour .left-plot[ - start from best projection, given by projection pursuit - bl contribution controlled - if bl is removed from the projection, Adélie and Chinstrap are mixed - bl is important for Adélie ] .right-plot[ <img src="penguins_manual_bl.gif" width="90%"> ] --- # Manual tour .left-code[ - start from best projection, given by projection pursuit - fl contribution controlled - clusters are less separated when fl is fully contributing - fl is important, in small amounts, for Gentoo ] .right-plot[ <img src="penguins_manual_fl.gif" width="90%"> ] --- # Local tour .left-code[ Rocks from and to a given projection, in order to observe the neighbourhood ] .right-plot[ <img src="penguins2d_local.gif" width="90%"> ] --- # Projection dimension and displays .left-plot[ <img src="penguins1d.gif" width="90%"> ] .right-plot[ <img src="penguins2d_dens.gif" width="90%"> ] --- class: inverse middle center # How do I use tours? 
--- - Classification - check assumptions of models - examine separations between groups - determine variable importance - examine boundaries - random forest diagnostics using the vote matrix - Dimension reduction - go beyond 2 PCs - work with much higher dimensional data - check for nonlinear dependencies --- - Clustering - examine shape of clusters - examine separation between clusters - compare cluster solutions - view the dendrogram in data space - Compositional data - shapes and clusters in a simplex --- # Saving for publication Method 1, using plotly (see `reading axes` code chunk): 1. Generate each frame and index it, stacking all frames into one big array 2. Make one big ggplot, with all frames overplotted, and an otherwise unused argument `frame` pointing to your index 3. Pass to `ggplotly` 4. Save to html using `htmltools::save_html()` or try using ``` spinifex::play_tour_path() ``` --- # Saving for publication Method 2, using `gifski` and `tourr::render_gif()`. See lots of code chunks! --- # Summary We can learn a little more about the data if we have a tour in the toolbox. It can help us to - understand dependencies between multiple variables - examine the shapes of clusters - detect outliers --- # If you want to read more - [Visualizing statistical models: Removing the blindfold (2015)](https://onlinelibrary.wiley.com/doi/abs/10.1002/sam.11271) - [tourr: An R Package for Exploring Multivariate Data with Projections]() --- class: middle # Thanks Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan), with **iris theme** created from [xaringanthemer](https://github.com/gadenbuie/xaringanthemer). The chakra comes from [remark.js](https://remarkjs.com), [**knitr**](http://yihui.name/knitr), and [R Markdown](https://rmarkdown.rstudio.com). Slides are available at [https://dicook.org/files/vISEC20/slides_tourr.html](https://dicook.org/files/vISEC20/slides_tourr.html) and supporting files at [https://github.com/dicook/vISEC2020](https://github.com/dicook/vISEC2020). 
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.