class: center, middle, inverse, title-slide

# Visualisation of high-dimensional spaces with application to econometric data and models

### Di Cook, Monash University

### bit.ly/Melb_Econ
Sep 12, 2018

---
background-image: url(img/You_can't_see_beyond_3D.png)
background-size: contain

.left-column[
# High-dimensions
### You can't see beyond 3D
]
.right-column[

]

---

.left-column[
# High-dimensions
### You can't see beyond 3D
### A universe of 10 dimensions
]
.right-column[
Source: https://ultraculture.org/blog/2014/12/16/heres-visual-guide-10-dimensions-reality/
]

---
class: inverse middle center

[It's not like that at all!](https://media1.tenor.com/images/d4c398703092842d4d17020b5414b0ee/tenor.gif?itemid=4995777)

---
.left-column[
# High-dimensions
### You can't see beyond 3D
### A universe of 10 dimensions
### It's more like...

[Flatland: A Romance of Many Dimensions (1884) Edwin Abbott Abbott](https://en.wikipedia.org/wiki/Flatland)
]
.pull-right[
<p>
<img src="img/Houghton_EC85_Ab264_884f_-_Flatland,_cover.jpg" width="80%">
The story describes a two-dimensional world occupied by geometric figures, where women are simple line segments and men are polygons with various numbers of sides.
]

---
# How we see high-dimensions in statistics...

<img src="img/cubes.png" width="100%">

Each increase in dimension adds one more orthogonal axis.

<p>If you want more high-dimensional shapes, the R package [geozoo](http://schloerke.com/geozoo/all/) will generate cubes, spheres, simplices, Möbius strips, tori, Boy's surface, Enneper's surface, Dini's surface, Klein bottles, cones, various polytopes, ...

---

.left-column[
# High-dimensions
### You can't see beyond 3D
### A universe of 10 dimensions
### It's more like...
### And in statistics it is everywhere
]
.pull-right[
- Principal component analysis
- Multidimensional scaling
- Factor analysis
- Projection pursuit
- Regression
- Linear discriminant analysis
- Cluster analysis
- Multivariate distributions
- Posterior distributions in Bayesian models
]

---
class: center

<iframe src="s5.html" width="800" height="500" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe>

Can you tell the difference between these 5D objects?

--

<p> .red[ Yep? You can see beyond 3D!]

---
class: center

<iframe src="multiDA.html" width="700" height="400" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe>

Can you tell the difference between these 10D objects?

--

<p> .red[ Yep? You really can see beyond 3D!]

--

<p> Set A are genes identified by Sarah Romanes' [multiDA](https://github.com/sarahromanes/multiDA) procedure; set B are a random sample of genes. Sarah's selection shows much more distinct differences than the random sample.
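---
# Sketch: one 2D shadow of a 5D cube

Assuming each frame of animations like the ones above is a 2D linear projection of the points, the sketch below (plain Python rather than R, to stay dependency-free) projects the 32 corners of a 5D cube onto a random plane. The helper names `cube_vertices`, `random_frame` and `project` are hypothetical and are not part of geozoo or tourr; Gram-Schmidt is just one illustrative way to obtain orthonormal axes.

```python
import itertools
import math
import random


def cube_vertices(p):
    """All 2^p corners of the unit cube in p dimensions."""
    return [list(v) for v in itertools.product([0.0, 1.0], repeat=p)]


def random_frame(p, d, rng):
    """Random orthonormal frame, stored as d columns of length p.

    Built by Gram-Schmidt on Gaussian vectors (an illustrative choice).
    """
    columns = []
    while len(columns) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # Remove the components along the columns already chosen.
        for c in columns:
            dot = sum(a * b for a, b in zip(v, c))
            v = [a - dot * b for a, b in zip(v, c)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:  # skip the (unlikely) degenerate draw
            columns.append([a / norm for a in v])
    return columns


def project(x, frame):
    """Coordinates of x in the frame: one dot product per column."""
    return [sum(a * b for a, b in zip(x, col)) for col in frame]


rng = random.Random(2018)  # fixed seed so the sketch is repeatable
frame = random_frame(5, 2, rng)
shadow = [project(x, frame) for x in cube_vertices(5)]
print(len(shadow), "projected corners; last one:", shadow[-1])
```

Rerunning with a different seed gives a different shadow of the same cube; a tour strings many such shadows together smoothly.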
---
# Outline

- The tour algorithm: grand, guided, little, local, manual
- R packages: tourr, geozoo, spinifex
- Philosophy: model in the data space
- Applications: multiple time series, cluster analysis, exploring posterior distributions

---
# Tour algorithm

<img src="img/tour_path.png" width="80%">

---

.left-column[
# Tours
### Definition
]
.right-column[
A .red[grand tour] is by definition a movie of low-dimensional projections constructed in such a way that it comes arbitrarily close to any low-dimensional projection; in other words, a grand tour is a space-filling curve in the manifold of low-dimensional projections of high-dimensional data spaces.

<img src="img/bilby.jpeg" width="75%">
]

---

Allen Morris [A 3D object 2D shadows](https://www.youtube.com/watch?v=aetj-Q4FuWY)

<iframe width="560" height="315" src="https://www.youtube.com/embed/aetj-Q4FuWY" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>

---

.left-column[
# Tours
### Definition
### Notation
]
.right-column[
- `\({\mathbf x}_i \in \Re^p\)`, `\(i^{th}\)` data vector
- `\(d\)` projection dimension
- `\(F\)` is a `\(p\times d\)` orthonormal frame, `\(F'F=I_d\)`
- The projection of `\({\mathbf x}_i\)` onto `\(F\)` is `\({\mathbf y}_i=F'{\mathbf x}_i\)`.
- Paths of projections are given by *continuous one-parameter* families `\(F(t)\)`, where `\(t\in [a, z]\)`. The starting and target frames are denoted `\(F_a = F(a), F_z=F(z)\)`.
- The animation of the projected data is given by a path `\({\mathbf y}_i(t)=F'(t){\mathbf x}_i\)`.
]

---

.left-column[
# Tours
### Definition
### Notation
### Algorithm
]
.right-column[
- Given a starting frame `\(F_a\)`, create a new target frame `\(F_z\)`.
- Initialize interpolation. Generate planar rotations, `\(R({\mathbf \tau}) = R_m(\tau_m)...R_1(\tau_1), ~~~ {\mathbf \tau}=(\tau_1, ..., \tau_m)\)` such that `\(F_z=R({\mathbf \tau})F_a\)`.
- Execute interpolation
    - `\(t \leftarrow \min(1, t)\)`
    - `\(F(t)=R({\mathbf \tau}t)F_a\)` (gives frame)
    - `\({\mathbf y}_i(t)=F(t)'{\mathbf x}_i\)`
    - If `\(t=1\)` break iteration, else `\(t\leftarrow t+\delta\)`
- Set `\(F_a=F_z\)`, start again
]

---

.left-column[
# Tours
### Definition
### Notation
### Algorithm
### Avoiding whip-spin
]
.right-column[

Rotation out of the projection frame is defined by the principal bases of `\(F_a\)` and `\(F_z\)`, which define the shortest distance between the planes and are computed using the singular value decomposition `\(F_a'F_z=V_a\Lambda V_z'\)`:
`$$G_a=F_aV_a~~~,~~~ G_z=F_zV_z$$`
]

---

.left-column[
# Tours
### Definition
### Notation
### Algorithm
### Avoiding whip-spin
### Choosing targets
]
.right-column[
- *Grand:* Randomly choose target
- *Little:* Basis of `\(d\)` of the `\(p\)` variables
- *Local:* Randomly within a small radius
- *Guided:* Define structure of interest in projection, and optimise function
- *Manual:* Control the contribution of a single variable, and move along this axis
]

---

.left-column[
# Tours
### Definition
### Notation
### Algorithm
### Avoiding whip-spin
### Choosing targets
### PP Guidance
]
.right-column[
- *Holes:* finds projections with hollow centres `\(I(F)= \frac{1-\frac{1}{n}\sum_{i=1}^{n}\exp(-\frac{1}{2}{\mathbf y}_i'{\mathbf y}_i)}{1-\exp(-\frac{p}{2})}\)`
- *LDA:* finds separations between classes, classically `\(I(F) = 1- \frac{|F'WF|}{|F'(W+B)F|}\)`, where `\(B=\sum_{i=1}^gn_i(\bar{{\mathbf y}}_{i.}-\bar{{\mathbf y}}_{..})(\bar{{\mathbf y}}_{i.}-\bar{{\mathbf y}}_{..})'\)`, `\(W=\sum_{i=1}^g\sum_{j=1}^{n_i}({\mathbf y}_{ij}-\bar{{\mathbf y}}_{i.})({\mathbf y}_{ij}-\bar{{\mathbf y}}_{i.})'\)`
- *PDA:* finds separations between classes, when there are many variables and few points `\(I(F, \lambda) = 1-\frac{|F'((1-\lambda)W+n\lambda I_p)F|}{|F'((1-\lambda)(B+W)+n\lambda I_p)F|}\)`
]

---

.left-column[
# Packages
<img src="img/tour_path.png" width="300px">
]
.right-column[
- Visualisation
of high-dimensions using tours: the [tourr](https://cran.r-project.org/web/packages/tourr/index.html) package
    - *Grand:* Randomly choose target
    - *Little:* Basis of *d* of the *p* variables
    - *Local:* Randomly within a small radius
    - *Guided:* Define structure of interest in projection, and optimise function
    - *Manual:* Control the contribution of a single variable, and move along this axis (coming soon in the R package `spinifex`)
- A library of high-dimensional shapes: the [geozoo](https://cran.r-project.org/web/packages/geozoo/index.html) package, and paper [Escape from Boxland](https://journal.r-project.org/archive/2016/RJ-2016-044/index.html)
]

---
# Philosophy

- It is common to show the data in the model space, for example, predicted vs observed plots for regression, linear discriminant plots, and principal components.
- By displaying the model in the high-d data space, rather than low-d summaries of the data produced by the model, we expect to better understand the fit.

.footnote[Wickham et al. (2015) Visualizing statistical models: Removing the blindfold, SAM]

---

.left-column[
# Example: Last 4 months of currency USD cross-rates
]
.right-column[
- Data extracted from http://openexchangerates.org/api/historical
- R packages: `jsonlite` for extraction, `tidyverse` and `lubridate` for processing
- We are going to cluster the currencies

<img src="figure/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" />
]

---
# Hierarchical clustering

Dendrogram: .red[data] in the .red[model space]

.pull-left[
Ward's linkage
<img src="figure/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" />
]
.pull-right[
Average linkage
<img src="figure/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" />
]

---

Tour: .red[Model] in the .red[data space]

.pull-left[
<iframe src="cluster_ward.html" width="400" height="400" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe>
]
.pull-right[
<iframe src="cluster_average.html" width="400" height="400" scrolling="yes"
seamless="seamless" frameBorder="0"> </iframe> ] --- # Clusters of currencies Ward's linkage .pull-left[
] .pull-right[