
Visualising high-dimensional spaces with application to particle physics models



Di Cook

Monash University

bit.ly/IMS-Singapore-Cook



Feb 13, 2018

1 / 34

Outline

  • Visualisation of high dimensions using tours: the tourr package
  • A library of high-dimensional shapes: the geozoo package
  • Model in the data-space vs data in the model-space
  • Physics data visualisation
2 / 34

Tours

Definition

A grand tour is by definition a movie of low-dimensional projections constructed in such a way that it comes arbitrarily close to any low-dimensional projection; in other words, a grand tour is a space-filling curve in the manifold of low-dimensional projections of high-dimensional data spaces.

3 / 34

Tours

Definition

4 / 34

Tours

Definition

Notation

  • $x_i \in \mathbb{R}^p$, the $i$th data vector
  • $d$, the projection dimension
  • $F$ is a $p \times d$ orthonormal frame, $F^\top F = I_d$
  • The projection of $x_i$ onto $F$ is $y_i = F^\top x_i$.
  • Paths of projections are given by continuous one-parameter families $F(t)$, where $t \in [a, z]$. The starting and target frames are denoted $F_a = F(a)$ and $F_z = F(z)$.
  • The animation of the projected data is given by the path $y_i(t) = F(t)^\top x_i$.
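
The notation can be made concrete with a minimal base R sketch. The simulated data, the helper name random_frame and the variable name Fr are assumptions made for illustration; they are not part of the tourr package.

```r
# Sketch: a random orthonormal frame F (p x d) and the projection y_i = F' x_i.
set.seed(2018)
n <- 100; p <- 5; d <- 2
X <- matrix(rnorm(n * p), nrow = n, ncol = p)  # row i holds the data vector x_i

# Hypothetical helper: orthonormalise a random p x d Gaussian matrix via QR.
random_frame <- function(p, d) {
  qr.Q(qr(matrix(rnorm(p * d), nrow = p, ncol = d)))
}
Fr <- random_frame(p, d)  # called Fr because F is shorthand for FALSE in R

# Orthonormality check: F'F should equal the d x d identity.
stopifnot(isTRUE(all.equal(crossprod(Fr), diag(d))))

# Stacking the projections row-wise gives Y = X F, an n x d matrix.
Y <- X %*% Fr
dim(Y)
```

Each row of Y is one projected point, so a tour amounts to redrawing Y as the frame changes.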
5 / 34

Tours

Definition

Notation

Algorithm

  • Given a starting frame Fa, create a new target frame Fz.
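  • Initialize interpolation. Generate planar rotations, $R(\tau) = R_m(\tau_m) \cdots R_1(\tau_1)$, $\tau = (\tau_1, \ldots, \tau_m)$, such that $F_z = R(\tau) F_a$.
  • Execute interpolation
    • $t \leftarrow \min(1, t)$
    • $F(t) = R(\tau t) F_a$ (gives frame)
    • $y_i(t) = F(t)^\top x_i$
    • If $t = 1$ break iteration, else $t \leftarrow t + \delta$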
  • Set Fa=Fz, start again
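
The loop above can be sketched for the simplest case, d = 1, where each frame is a unit vector and a single planar rotation carries $F_a$ to $F_z$. This is an illustration under that assumption, not the general algorithm and not the tourr implementation; the step size delta and the helper names are hypothetical.

```r
# Sketch for d = 1 only: interpolate between two unit vectors by rotating
# in the plane they span.
set.seed(1)
p <- 4
normalise <- function(v) v / sqrt(sum(v^2))
Fa <- normalise(rnorm(p))  # starting frame
Fz <- normalise(rnorm(p))  # target frame

tau <- acos(sum(Fa * Fz))                 # rotation angle between Fa and Fz
Gz  <- normalise(Fz - sum(Fa * Fz) * Fa)  # unit vector orthogonal to Fa in that plane

# F(t) = R(tau * t) Fa, written out for a single planar rotation.
frame_at <- function(t) cos(tau * t) * Fa + sin(tau * t) * Gz

X <- matrix(rnorm(50 * p), ncol = p)  # simulated data, rows are x_i
delta <- 0.1                          # step size (hypothetical value)
t <- 0
repeat {
  t  <- min(1, t)
  Ft <- frame_at(t)
  y  <- X %*% Ft      # y_i(t) = F(t)' x_i, the values that would be drawn
  if (t == 1) break
  t <- t + delta
}

# At t = 1 the interpolated frame has reached the target.
stopifnot(isTRUE(all.equal(Ft, Fz)))
```

For d > 1 the same idea applies within each pair of principal directions, which is what the sequence of planar rotations expresses.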
6 / 34

Tours

Definition

Notation

Algorithm

Avoiding whip-spin

The rotation out of the projection plane is defined by the principal bases of $F_a$ and $F_z$, which give the shortest path between the two planes. They are computed from the singular value decomposition $F_a^\top F_z = V_a \Lambda V_z^\top$:

$G_a = F_a V_a$,   $G_z = F_z V_z$.
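
A small base R check of this construction, using random frames as stand-ins for $F_a$ and $F_z$ (the helper random_frame is hypothetical):

```r
# Sketch: principal bases of two frames from the SVD of Fa' Fz.
set.seed(7)
p <- 6; d <- 2
random_frame <- function(p, d) qr.Q(qr(matrix(rnorm(p * d), p, d)))
Fa <- random_frame(p, d)
Fz <- random_frame(p, d)

sv <- svd(crossprod(Fa, Fz))  # Fa' Fz = Va Lambda Vz'
Ga <- Fa %*% sv$u             # Ga = Fa Va
Gz <- Fz %*% sv$v             # Gz = Fz Vz

# Ga and Gz span the same planes as Fa and Fz, but their columns are paired:
# Ga' Gz is diagonal and holds the cosines of the principal angles.
round(crossprod(Ga, Gz), 10)
principal_angles <- acos(pmin(1, sv$d))
```

Because the cross-product of the principal bases is diagonal, each column of Ga only needs to rotate towards its partner in Gz, so no rotation is spent spinning within the plane.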

7 / 34

Tours

Definition

Notation

Algorithm

Avoiding whip-spin

Choosing targets

  • Grand: Randomly choose target
  • Little: Basis of d of the p variables
  • Local: Randomly within a small radius
  • Guided: Define structure of interest in projection, and optimise function
  • Manual: Control the contribution of a single variable, and move along this axis
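
Three of these target choices can be sketched in base R. The function names and the perturbation size eps are assumptions for illustration and do not reproduce the tourr internals; guided and manual targets are left out.

```r
# Sketch of target-frame generators for grand, little and local tours.
set.seed(11)
p <- 5; d <- 2
orthonormalise <- function(M) qr.Q(qr(M))

# Grand: a random d-dimensional frame.
grand_target <- function(p, d) orthonormalise(matrix(rnorm(p * d), p, d))

# Little: pick d of the p variables; the columns are standard basis vectors.
little_target <- function(p, d) diag(p)[, sample(p, d), drop = FALSE]

# Local: perturb the current frame slightly and re-orthonormalise.
# `eps` is a hypothetical knob for the size of the neighbourhood.
# Column signs may flip in the QR step; the spanned plane stays close.
local_target <- function(current, eps = 0.1) {
  noise <- matrix(rnorm(length(current)), nrow(current))
  orthonormalise(current + eps * noise)
}

Fg   <- grand_target(p, d)
Fl   <- little_target(p, d)
Floc <- local_target(Fg, eps = 0.01)

# Cosines of the principal angles between Fg and Floc; values near 1 mean
# the new plane is close to the current one.
svd(crossprod(Fg, Floc))$d
```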
8 / 34

Tours

Definition

Notation

Algorithm

Avoiding whip-spin

Choosing targets

PP Guidance

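  • Holes: finds projections with hollow centres, $I(F) = \dfrac{1 - \frac{1}{n}\sum_{i=1}^{n} \exp\left(-\frac{1}{2} y_i^\top y_i\right)}{1 - \exp\left(-\frac{p}{2}\right)}$
  • LDA: finds separations between classes, classically $I(F) = 1 - \dfrac{|F^\top W F|}{|F^\top (W + B) F|}$, where $B = \sum_{i=1}^{g} n_i (\bar{y}_{i.} - \bar{y}_{..})(\bar{y}_{i.} - \bar{y}_{..})^\top$ and $W = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})(y_{ij} - \bar{y}_{i.})^\top$
  • PDA: finds separations between classes, when there are many variables and few points, $I(F, \lambda) = 1 - \dfrac{|F^\top ((1-\lambda) W + n\lambda I_p) F|}{|F^\top ((1-\lambda)(B + W) + n\lambda I_p) F|}$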
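
The three indices can be written out directly in base R. This is a sketch of the formulas as stated on the slide, not the tourr functions; the function names are hypothetical, W and B are computed here from the p-dimensional data so that $F^\top W F$ is $d \times d$, and the iris data stand in for a real example.

```r
# Holes index for projected data Y (n x d); p is passed explicitly,
# as it appears in the formula.
holes_index <- function(Y, p) {
  (1 - mean(exp(-0.5 * rowSums(Y^2)))) / (1 - exp(-p / 2))
}

# Within-group (W) and between-group (B) sums of squares and cross-products.
scatter_matrices <- function(X, cl) {
  cl <- as.factor(cl)
  p <- ncol(X)
  grand <- colMeans(X)
  W <- B <- matrix(0, p, p)
  for (g in levels(cl)) {
    Xg <- X[cl == g, , drop = FALSE]
    mg <- colMeans(Xg)
    W <- W + crossprod(sweep(Xg, 2, mg))        # deviations from group mean
    B <- B + nrow(Xg) * tcrossprod(mg - grand)  # group mean vs grand mean
  }
  list(W = W, B = B)
}

# PDA index; lambda = 0 reduces to the LDA index.
pda_index <- function(X, cl, Fr, lambda = 0) {
  s <- scatter_matrices(X, cl)
  n <- nrow(X); p <- ncol(X)
  num <- (1 - lambda) * s$W + n * lambda * diag(p)
  den <- (1 - lambda) * (s$B + s$W) + n * lambda * diag(p)
  1 - det(t(Fr) %*% num %*% Fr) / det(t(Fr) %*% den %*% Fr)
}
lda_index <- function(X, cl, Fr) pda_index(X, cl, Fr, lambda = 0)

# Usage on a stand-in data set with a random 2D frame.
set.seed(5)
X  <- as.matrix(iris[, 1:4])
Fr <- qr.Q(qr(matrix(rnorm(8), 4, 2)))
lda_index(X, iris$Species, Fr)
pda_index(X, iris$Species, Fr, lambda = 0.5)
holes_index(scale(X) %*% Fr, p = 4)
```

A guided tour would evaluate one of these functions on candidate frames and move towards frames with larger values.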
9 / 34

Tours

Definition

Notation

Algorithm

Avoiding whip-spin

Choosing targets

PP Guidance

R package: tourr

  • implements all of the tours except for manual
  • displays projections of dimension $d = 1, \ldots, p$, using density plots, scatterplots, parallel coordinates, stereo 3D, scatterplot matrices, Chernoff faces, stars, Andrews curves, and images
  • guided tour using projection pursuit indices: holes, cmass, lda, pda
  • possible to generate a path and play it back
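
A short sketch of generating a path and playing it back, assuming the tourr package is installed and that save_history() and planned_tour() behave as in its documentation; the number of bases is an arbitrary choice for illustration.

```r
library(tourr)

# Record a short grand tour path: a sequence of target bases.
set.seed(2018)
path <- save_history(flea[, 1:6], grand_tour(d = 2), max_bases = 5)
dim(path)  # variables x projection dimension x number of bases

# Play the recorded path back; only run when a graphics device is available.
if (interactive()) {
  animate_xy(flea[, 1:6], planned_tour(path))
}
```

Saving the path separates the random choice of targets from the rendering, so the same sequence of views can be replayed or shown with a different display.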
10 / 34

Tourr package

Self-reflection: Example path of a tour for d=1,p=3, and d=1,p=4.

library(tourr)
animate(s3_tp,
        grand_tour(),
        display_xy(
          axes = "bottomleft",
          col = col, pch = pch,
          edges = edges))
11 / 34

Tourr package

Different types of displays, and projection dimension.

animate_dist(flea[, 1:6])
animate_scatmat(flea[, 1:6], grand_tour(6))
animate_pcp(flea[, 1:6], grand_tour(3))
animate_faces(flea[sort(sample(1:74, 4)), 1:6], grand_tour(4))
animate_stars(flea[sort(sample(1:74, 16)), 1:6], grand_tour(5))
12 / 34

Tourr package

Guided tour, LDA index.

animate_xy(flea[, 1:6],
           guided_tour(lda_pp(flea$species)),
           sphere = TRUE,
           col = col,
           axes = "bottomleft")
13 / 34

Library

Overview

The geozoo package is a library of high-dimensional shapes, and code to generate them. This includes cubes, spheres, simplices, Möbius strips, tori, Boy's surface, the Enneper surface, Dini's surface, Klein bottles, cones, ...

Web site: http://schloerke.com/geozoo/all/

14 / 34

Library

Overview

Cubes

library(geozoo)
# 3D cube, vertices and edges
c3 <- cube.iterate(p = 3)
animate(c3$points, grand_tour(),
        display_xy(axes = "bottomleft",
                   edges = c3$edges))
# 5D cube, vertices and edges
c5 <- cube.iterate(p = 5)
animate(c5$points, grand_tour(),
        display_xy(axes = "bottomleft",
                   edges = c5$edges))
# 5D cube with points on the faces
c5_face <- cube.face(p = 5)
animate(c5_face$points, grand_tour(),
        display_xy(axes = "bottomleft",
                   edges = c5_face$edges))

15 / 34

Library

Overview

Cubes

Spheres

s4h <- sphere.hollow(p = 4, n = 4 * 500)
colnames(s4h$points) <- paste0("V", 1:4)
animate(s4h$points, grand_tour(),
        display_xy(axes = "bottomleft"))
s4s <- sphere.solid.random(p = 4, n = 4 * 500)
colnames(s4s$points) <- paste0("V", 1:4)
animate(s4s$points, grand_tour(),
        display_xy(axes = "bottomleft"))

16 / 34

Library

Overview

Cubes

Spheres

Simplices

sp3 <- simplex(p = 3)
colnames(sp3$points) <- paste0("V", 1:3)
sp3$edges <- as.matrix(sp3$edges)
animate(sp3$points, grand_tour(),
        display_xy(axes = "bottomleft", edges = sp3$edges))
sp5 <- simplex(p = 5)
colnames(sp5$points) <- paste0("V", 1:5)
sp5$edges <- as.matrix(sp5$edges)
animate(sp5$points, grand_tour(),
        display_xy(axes = "bottomleft", edges = sp5$edges))

17 / 34

Library

Overview

Cubes

Spheres

Simplices

Why simplices

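library(randomForest)
# Vote proportions from a random forest are compositional: each row sums to 1
olive_rf <- randomForest(area ~ ., data = olive_sub)
votes <- f_composition(olive_rf$votes)
animate(votes[, -4], grand_tour(),
        display_xy(axes = "bottomleft",
                   col = col, edges = sp3$edges))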
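
The reason vote proportions live on a simplex can be seen in a small base R sketch. It uses simulated three-class proportions and a Helmert-based basis as an assumption for illustration; it is not the geozoo f_composition() code.

```r
# Sketch: three-class vote proportions lie on a 2D simplex (a triangle).
set.seed(3)
raw   <- matrix(rexp(30), ncol = 3)
votes <- raw / rowSums(raw)  # each row is non-negative and sums to 1

# Orthonormal basis for the plane orthogonal to (1, 1, 1),
# built from Helmert contrasts.
H <- contr.helmert(3)
H <- sweep(H, 2, sqrt(colSums(H^2)), "/")

# Centre on the barycentre (1/3, 1/3, 1/3), then drop to 2D coordinates.
coords  <- (votes - 1 / 3) %*% H
corners <- (diag(3) - 1 / 3) %*% H  # the three "all votes for one class" vertices

dist(corners)  # all three sides have the same length: an equilateral triangle
```

With g classes the same argument gives a (g - 1)-dimensional simplex, which is why the simplex edges make a natural frame of reference for the votes.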

18 / 34

Any requests? What would you like to look at? A torus, a Klein bottle, ... ?

19 / 34

Library

Overview

Cubes

Spheres

Simplices

Why simplices

Generation

  • Cube:
    • vertices: vectors of length p, with all combinations of 0,1
    • edges: connect all pairs of vertices that are distance 1 apart
  • Sphere hollow: $x_i \sim N_p(0, I_p)$, then scale to unit length, $x_i \leftarrow x_i / \|x_i\|$
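
Both recipes are short enough to write out in base R. The function names below are hypothetical and this is a sketch of the definitions, not the geozoo source.

```r
# Cube: vertices are all 0/1 vectors of length p.
cube_vertices <- function(p) {
  unname(as.matrix(expand.grid(rep(list(0:1), p))))
}

# Edges join pairs of vertices exactly distance 1 apart,
# that is, vertices differing in a single coordinate.
cube_edges <- function(V) {
  D <- as.matrix(dist(V))
  unname(which(abs(D - 1) < 1e-8 & upper.tri(D), arr.ind = TRUE))
}

# Hollow sphere: draw from N_p(0, I_p) and scale each point to unit length.
sphere_hollow <- function(p, n) {
  X <- matrix(rnorm(n * p), n, p)
  X / sqrt(rowSums(X^2))  # divides row i by the norm of row i
}

V5 <- cube_vertices(5)
E5 <- cube_edges(V5)
c(vertices = nrow(V5), edges = nrow(E5))  # 2^p vertices and p * 2^(p - 1) edges

set.seed(4)
S4 <- sphere_hollow(p = 4, n = 500)
range(rowSums(S4^2))  # every point sits on the unit sphere
```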

Schloerke et al (2016) "Escape from Boxland" The R Journal

20 / 34

Model in the data space

  • It is common to show the data in the model space, for example, predicted vs observed plots for regression, linear discriminant plots, and principal components.
  • By displaying the model in the high-d data space, rather than low-d summaries of the data produced by the model, we expect to better understand the fit.

Wickham et al (2015) Visualizing statistical models: Removing the blindfold, SAM

21 / 34

Hierarchical clustering

Dendrogram: data in the model space

22 / 34

Hierarchical clustering

Model in the data space
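
One way to put a hierarchical clustering into the data space is to add a node for every merge and connect it to the two clusters it joins. The base R sketch below does this on simulated data; placing each node at the mean of its members is an assumption made for illustration, and the resulting points and edges could then be shown in a tour.

```r
# Sketch: turn an hclust dendrogram into nodes and edges in the data space.
set.seed(1)
X  <- matrix(rnorm(20 * 4), ncol = 4)
hc <- hclust(dist(X), method = "average")
n  <- nrow(X)

members <- as.list(seq_len(n))  # observations belonging to each node
nodes   <- X                    # rows 1..n are the data, merge nodes are appended
edges   <- matrix(numeric(0), ncol = 2)

for (i in seq_len(nrow(hc$merge))) {
  m <- hc$merge[i, ]
  # Negative entries are single observations, positive entries earlier merges.
  kids <- ifelse(m < 0, -m, n + m)
  members[[n + i]] <- unlist(members[kids])
  # Place the new node at the mean of the observations it contains.
  nodes <- rbind(nodes, colMeans(X[members[[n + i]], , drop = FALSE]))
  edges <- rbind(edges, unname(cbind(kids, n + i)))
}

dim(nodes)  # n data points plus n - 1 merge nodes
dim(edges)  # two edges per merge
```

Viewed in a tour, long edges then reveal merges between clusters that are far apart in the data, something the dendrogram alone only shows as a height.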

23 / 34

Multidimensional physics

  • Need to interpret and compare models with multiple parameters
  • Predictions vs measurements, theorist A model vs theorist B model
  • The average theorist resorts to dropping all but 1 or 2 parameters (variables)
  • Potentially misses multivariate associations and differences
24 / 34

Higgs boson

  • Data from the Kaggle challenge.
  • The two-parameter view used by physicists, and the multi-parameter view in the tour.

25 / 34

How dark matter interacts

26 / 34

Scagnostics shape differences

27 / 34

28 / 34

29 / 34

Discreteness of scagnostics

30 / 34

Summary

  • The tourr package is available for you to look beyond 2D
  • High-dimensional shapes are interesting: how they are defined, what they look like, and how they differ
  • Think about ways to look at the model in the data space
  • Challenge: new ideas for defining shape differences
31 / 34

Joint work!

  • Tours: Andreas Buja, Debby Swayne, Heike Hofmann, Hadley Wickham
  • Library of high-d shapes: Barret Schloerke
  • Physics application: Ursula Laa, Michael Kipp, German Valencia

Contact: dicook@monash.edu, visnut, dicook

Slides made with Rmarkdown, xaringan package by Yihui Xie, and lorikeet theme using the ochRe package. Available at https://github.com/dicook/IMS-Singapore-talk

32 / 34

Further reading

33 / 34
