class: center, middle, inverse, title-slide

# Visualising high-dimensional spaces with application to particle physics models

### Di Cook
Monash University

### bit.ly/IMS-Singapore-Cook
Feb 13, 2018

---
class: inverse middle center
class: middle

# Outline

- Visualisation of high dimensions using tours: the [tourr](https://cran.r-project.org/web/packages/tourr/index.html) package
- A library of high-dimensional shapes: the [geozoo](https://cran.r-project.org/web/packages/geozoo/index.html) package
- Model in the data space vs data in the model space
- Physics data visualisation

---

.left-column[
# Tours
### Definition
]
.right-column[
A .red[grand tour] is by definition a movie of low-dimensional projections constructed in such a way that it comes arbitrarily close to any low-dimensional projection; in other words, a grand tour is a space-filling curve in the manifold of low-dimensional projections of high-dimensional data spaces.
]

---

.left-column[
# Tours
### Definition
]
.right-column[
![](img/tour_path.png)
]

---

.left-column[
# Tours
### Definition
### Notation
]
.right-column[
- `\({\mathbf x}_i \in \Re^p\)`, `\(i^{th}\)` data vector
- `\(d\)` projection dimension
- `\(F\)` is a `\(p\times d\)` orthonormal frame, `\(F'F=I_d\)`
- The projection of `\({\mathbf x}_i\)` onto `\(F\)` is `\({\mathbf y}_i=F'{\mathbf x}_i\)`.
- Paths of projections are given by *continuous one-parameter* families `\(F(t)\)`, where `\(t\in [a, z]\)`. The starting and target frames are denoted `\(F_a = F(a), F_z=F(z)\)`.
- The animation of the projected data is given by a path `\({\mathbf y}_i(t)=F'(t){\mathbf x}_i\)`.
]

---

.left-column[
# Tours
### Definition
### Notation
### Algorithm
]
.right-column[
- Given a starting frame `\(F_a\)`, create a new target frame `\(F_z\)`.
- Initialize interpolation. Generate planar rotations, `\(R({\mathbf \tau}) = R_m(\tau_m)...R_1(\tau_1), ~~~ {\mathbf \tau}=(\tau_1, ..., \tau_m)\)` such that `\(F_z=R({\mathbf \tau})F_a\)`.
- Execute interpolation
    - `\(t \leftarrow \min(1, t)\)`
    - `\(F(t)=R({\mathbf \tau}t)F_a\)` (gives frame)
    - `\({\mathbf y}_i(t)=F(t)'{\mathbf x}_i\)`
    - If `\(t=1\)` break iteration, else `\(t\leftarrow t+\delta\)`
- Set `\(F_a=F_z\)`, start again
]

---

.left-column[
# Tours
### Definition
### Notation
### Algorithm
### Avoiding whip-spin
]
.right-column[
![](img/tour_shortest_path.png)

Rotation out of the projection frame is defined by the principal bases in `\(F_a\)` and `\(F_z\)`, which give the shortest path between the two planes. They are computed from the singular value decomposition `\(F_a'F_z=V_a\Lambda V_z'\)`:

`$$G_a=F_aV_a~~~,~~~ G_z=F_zV_z$$`
]

---

.left-column[
# Tours
### Definition
### Notation
### Algorithm
### Avoiding whip-spin
### Choosing targets
]
.right-column[
- *Grand:* Randomly choose target
- *Little:* Basis of `\(d\)` of the `\(p\)` variables
- *Local:* Randomly within a small radius
- *Guided:* Define structure of interest in projection, and optimise function
- *Manual:* Control the contribution of a single variable, and move along this axis
]

---

.left-column[
# Tours
### Definition
### Notation
### Algorithm
### Avoiding whip-spin
### Choosing targets
### PP Guidance
]
.right-column[
- *Holes:* finds projections with hollow centres `\(I(F)= \frac{1-\frac{1}{n}\sum_{i=1}^{n}\exp(-\frac{1}{2}{\mathbf y}_i'{\mathbf y}_i)}{1-\exp(-\frac{p}{2})}\)`
- *LDA:* finds separations between classes, classically `\(I(F) = 1- \frac{|F'WF|}{|F'(W+B)F|}\)`, where `\(B=\sum_{i=1}^gn_i(\bar{{\mathbf x}}_{i.}-\bar{{\mathbf x}}_{..})(\bar{{\mathbf x}}_{i.}-\bar{{\mathbf x}}_{..})'\)`, `\(W=\sum_{i=1}^g\sum_{j=1}^{n_i}({\mathbf x}_{ij}-\bar{{\mathbf x}}_{i.})({\mathbf x}_{ij}-\bar{{\mathbf x}}_{i.})'\)`
- *PDA:* finds separations between classes, when there are many variables and few points `\(I(F, \lambda) = 1-\frac{|F'((1-\lambda)W+n\lambda I_p)F|}{|F'((1-\lambda)(B+W)+n\lambda I_p)F|}\)`
]

---

.left-column[
# Tours
### Definition
### Notation
### Algorithm
### Avoiding whip-spin
### Choosing targets
### PP Guidance
### R package: tourr
]
.right-column[
- implements all of the tours except for manual
- display projection dimension `\(d=1, ..., p\)`, using density plots, scatterplots, parallel coordinates, stereo 3D, scatterplot matrix, Chernoff faces, stars, Andrews curves, and images
- guided tour using projection pursuit indices: holes, cmass, lda, pda
- possible to generate a path and play it back
]

---

.pull-left[
<iframe src="https://player.vimeo.com/video/255464691" width="300" height="285" frameborder="10" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>

<iframe src="https://player.vimeo.com/video/255464393" width="300" height="285" frameborder="10" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>
]
.pull-right[
# Tourr package

Self-reflection: Example path of a tour for `\(d=1, p=3\)`, and `\(d=1, p=4\)`.

```r
library(tourr)
# s3_tp, col, pch and edges are assumed to be defined beforehand
animate(s3_tp, grand_tour(),
  display_xy(axes = "bottomleft", col=col, pch=pch, edges=edges))
```
]

---

# Tourr package

Different types of displays, and projection dimension.

```r
animate_dist(flea[, 1:6])
animate_scatmat(flea[, 1:6], grand_tour(6))
animate_pcp(flea[, 1:6], grand_tour(3))
animate_faces(flea[sort(sample(1:74, 4)), 1:6], grand_tour(4))
animate_stars(flea[sort(sample(1:74, 16)), 1:6], grand_tour(5))
```

<iframe src="https://player.vimeo.com/video/255466661" width="300" height="290" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>

---

.pull-left[
<iframe src="https://player.vimeo.com/video/255467472" width="300" height="290" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>
]
.pull-right[
# Tourr package

Guided tour, LDA index.

```r
animate_xy(flea[, 1:6],
  guided_tour(lda_pp(flea$species)),
  sphere = TRUE, col=col, axes = "bottomleft")
```
]

---

.left-column[
# Library
### Overview
]
.right-column[
The `geozoo` package is a library of high-dimensional shapes, and code to generate them.
This includes cubes, spheres, simplices, Möbius strips, tori, Boy's surface, Enneper's surface, Dini's surface, Klein bottles, cones, ...

Web site: [http://schloerke.com/geozoo/all/](http://schloerke.com/geozoo/all/)
]

---

.left-column[
# Library
### Overview
### Cubes
]
.right-column[
```r
library(tourr)
library(geozoo)
c3 <- cube.iterate(p = 3)
animate(c3$points, grand_tour(),
  display_xy(axes = "bottomleft", edges=c3$edges))
c5 <- cube.iterate(p = 5)
animate(c5$points, grand_tour(),
  display_xy(axes = "bottomleft", edges=c5$edges))
c5_face <- cube.face(p = 5)
animate(c5_face$points, grand_tour(),
  display_xy(axes = "bottomleft", edges=c5_face$edges))
```

<img src="img/cube5.png" width="200px">
<img src="img/cube5b.png" width="200px">
]

---

.left-column[
# Library
### Overview
### Cubes
### Spheres
]
.right-column[
```r
s4h <- sphere.hollow(p = 4, n = 4 * 500)
colnames(s4h$points) <- paste0("V", 1:4)
animate(s4h$points, grand_tour(),
  display_xy(axes = "bottomleft"))
s4s <- sphere.solid.random(p = 4, n = 4 * 500)
colnames(s4s$points) <- paste0("V", 1:4)
animate(s4s$points, grand_tour(),
  display_xy(axes = "bottomleft"))
```

<img src="img/sphere_h.png" width="200px">
<img src="img/sphere_s.png" width="200px">
]

---

.left-column[
# Library
### Overview
### Cubes
### Spheres
### Simplices
]
.right-column[
```r
sp3 <- simplex(p = 3)
colnames(sp3$points) <- paste0("V", 1:3)
sp3$edges <- as.matrix(sp3$edges)
animate(sp3$points, grand_tour(),
  display_xy(axes = "bottomleft", edges=sp3$edges))
sp5 <- simplex(p = 5)
colnames(sp5$points) <- paste0("V", 1:5)
sp5$edges <- as.matrix(sp5$edges)
animate(sp5$points, grand_tour(),
  display_xy(axes = "bottomleft", edges=sp5$edges))
```

<img src="img/simplex3.png" width="200px">
<img src="img/simplex5.png" width="200px">
]

---

.left-column[
# Library
### Overview
### Cubes
### Spheres
### Simplices
### Why simplices
]
.right-column[
```r
library(randomForest)
# olive_sub and col are assumed to be defined beforehand
olive_rf <- randomForest(area~., data=olive_sub)
votes <- f_composition(olive_rf$votes)
animate(votes[,-4], grand_tour(),
  display_xy(axes = "bottomleft", col=col, edges=sp3$edges))
```

<img src="img/votes.png" width="400px">
]

---
class: inverse middle center
Any requests? What would you like to look at? A torus, a Klein bottle, ... ?

---

.left-column[
# Library
### Overview
### Cubes
### Spheres
### Simplices
### Why simplices
### Generation
]
.right-column[
- Cube:
    - vertices: vectors of length `\(p\)`, with all combinations of `\(0,1\)`
    - edges: connect all pairs of vertices that are a distance 1 apart
- Sphere hollow: draw `\({\mathbf x}_i\sim N_p(0,I_p)\)`, then normalise to `\(\frac{{\mathbf x}_i}{||{\mathbf x}_i||}\)`
]

.footnote[Schloerke et al (2016) "Escape from Boxland" The R Journal]

---

# Model in the data space

- It is common to show the data in the model space, for example, predicted vs observed plots for regression, linear discriminant plots, and principal components.
- By displaying the model in the high-d data space, rather than low-d summaries of the data produced by the model, we expect to better understand the fit.

.footnote[Wickham et al (2015) Visualizing statistical models: Removing the blindfold, SAM]

---

# Hierarchical clustering

Dendrogram: data in the model space

<img src="figure/unnamed-chunk-17-1.svg" style="display: block; margin: auto;" /><img src="figure/unnamed-chunk-17-2.svg" style="display: block; margin: auto;" />

---

# Hierarchical clustering

Model in the data space

<iframe src="https://player.vimeo.com/video/255344610" width="320" height="300" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>
<iframe src="https://player.vimeo.com/video/255344258" width="320" height="300" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>

---

# Multidimensional physics

- Need to interpret and compare models with multiple parameters
- Predictions vs measurements, theorist A model vs theorist B model
- The average theorist resorts to dropping all but 1 or 2 parameters (variables)
- Potentially misses multivariate associations and differences

---

# Higgs boson

- Data from a Kaggle challenge.
- Two-parameter view of physicists, and multiparameter view in the tour.
<img src="img/higgs.png" width="800px">

---

# How dark matter interacts

.pull-left[
<iframe src="https://player.vimeo.com/video/255469149" width="450" height="430" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>
]
.pull-right[
<img src="figure/unnamed-chunk-21-1.svg" style="display: block; margin: auto;" />
]

---

# Scagnostics shape differences

- [Graph-Theoretic Scagnostics, Leland Wilkinson, Anushka Anand, Robert Grossman](http://www.ncdm.uic.edu/publications/files/proc-094.pdf)
- Describe shapes of scatterplots: Outlying, Skewed, Clumpy, Sparse, Striated, Convex, Skinny, Stringy, Monotonic.
- Designed to find the most interesting pairs of variables in high-dimensional data

---

<img src="figure/unnamed-chunk-22-1.svg" />

---

<img src="figure/unnamed-chunk-23-1.svg" style="display: block; margin: auto;" />

---

# Discreteness of scagnostics

<img src="img/scagnostics.png" width="800px">

---

# Summary

- The tourr package is available for you to look beyond 2D
- High-dimensional shapes are interesting: how they are defined, what they look like, and how they differ
- Think about ways to look at the model in the data space
- .red[Challenge: new ideas for defining shape differences]

---
class: inverse

# Joint work!

- *Tours:* Andreas Buja, Debby Swayne, Heike Hofmann, Hadley Wickham
- *Library of high-d shapes:* Barret Schloerke
- *Physics application:* Ursula Laa, Michael Kipp, German Valencia

Contact: [www.dicook.org](http://www.dicook.org), dicook@monash.edu, Twitter [visnut](https://twitter.com/visnut), GitHub [dicook](https://github.com/dicook)

<img src="img/lorikeets.png" width="200px">

.footnote[Slides made with Rmarkdown, xaringan package by Yihui Xie, and lorikeet theme using the [ochRe package](https://github.com/ropenscilabs/ochRe). Available at [https://github.com/dicook/IMS-Singapore-talk](https://github.com/dicook/IMS-Singapore-talk).]

---

# Further reading

- Buja et al (2004) [Computational Methods for High-Dimensional Rotations in Data Visualization](http://stat.wharton.upenn.edu/~buja/PAPERS/paper-dyn-proj-algs.pdf)
- Cook, D., and Swayne, D. [Interactive and Dynamic Graphics for Data Analysis with examples using R and GGobi](http://www.ggobi.org)
- Wickham et al (2011) [tourr: An R Package for Exploring Multivariate Data with Projections](http://www.jstatsoft.org/v40)
- Wickham et al (2015) Visualising Statistical Models: Removing the Blindfold (with Discussion), Statistical Analysis and Data Mining.
- Schloerke et al (2016) [Escape from Boxland](https://journal.r-project.org/archive/2016/RJ-2016-044/index.html)

---
class: middle center

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
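
---

# Sketch: hollow sphere and a projection

The hollow-sphere recipe from the Generation slide and the projection `\({\mathbf y}_i=F'{\mathbf x}_i\)` from the Notation slide can be sketched in base R. This is a minimal sketch, not the geozoo or tourr implementation: `sphere_hollow_sketch()` and `random_frame()` are hypothetical helper names, and taking the frame from the QR decomposition of a Gaussian matrix is an assumption made for illustration.

```r
set.seed(2018)

# Hypothetical helper: n points on the surface of the unit sphere in p dimensions.
# Draw x_i ~ N_p(0, I_p), then divide each row by its Euclidean norm.
sphere_hollow_sketch <- function(p, n) {
  x <- matrix(rnorm(n * p), nrow = n, ncol = p)
  x / sqrt(rowSums(x^2))
}

# Hypothetical helper: a random p x d orthonormal frame, F'F = I_d,
# taken as the Q factor of the QR decomposition of a Gaussian matrix.
random_frame <- function(p, d) {
  qr.Q(qr(matrix(rnorm(p * d), nrow = p, ncol = d)))
}

x <- sphere_hollow_sketch(p = 4, n = 500)
frame <- random_frame(p = 4, d = 2)

# Row i of y holds the projection F'x_i of the i-th data vector.
y <- x %*% frame
```

Calling `plot(y, asp = 1)` draws one 2D view of the 4D hollow sphere; a tour animates a smooth sequence of such frames.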