Human vs computer: when visualising data, who wins?


Can computers relieve data analysts of the arduous task of graphically diagnosing models? Computer vision has come a long way in recent years. It primarily addresses reading and analysing images, and models have advanced to the point where they can automatically inspect the quality of items on production lines, identify objects in photos, and even navigate an autonomous vehicle. Although visualisation plays a weighty role in data analysis, for both exploration and model diagnosis, the use and interpretation of graphics by data scientists and statisticians is subjective. They rely almost entirely on their own judgement, years of experience, and an implicit calculation of uncertainty when interpreting graphics.

Considering data plots as a type of statistic allows data analysts to move away from subjectivity towards an inferential approach to reading data plots. Defining data plots as statistics is made explicit by the tidyverse and the grammar of graphics, and, in conjunction with a null generating mechanism, data plots can be measured against null plots. In this visual inference context, we are also better placed to build computer vision models that automatically read data plots: the null generating mechanism provides the framework for creating a large volume of null plots on which to train a computer vision model.

This talk will discuss these ideas, along with our results from comparing a database of human-evaluated residual plots with the performance of a computer vision model on the same task. Who do you think wins? This is joint work with Shuofan Zhang, and builds on joint work with Heike Hofmann, Mahbub Majumder, Andreas Buja, Hadley Wickham, Deborah Swayne, Eun-kyung Lee, Niladri Roy Chowdhury, Lendie Follett, Susan Vanderplas, Adam Loy, Yifan Zhao, and Nathaniel Tomasetti.
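To illustrate the idea of a null generating mechanism, here is a minimal sketch in Python. It uses a simple permutation null (shuffling observed residuals against their fitted values), which is only one of several possible mechanisms; the function and variable names are hypothetical, not from the talk's software.

```python
import random

def null_residuals(residuals, n_null=19, seed=1):
    """Generate null sets of residuals by permuting the observed
    residuals. Under the null of no structure, residual order carries
    no information, so each shuffle yields a plausible null plot.
    (A permutation null is one simple choice; simulating from the
    fitted model is another common mechanism.)"""
    rng = random.Random(seed)
    nulls = []
    for _ in range(n_null):
        shuffled = residuals[:]      # copy, then permute in place
        rng.shuffle(shuffled)
        nulls.append(shuffled)
    return nulls

# In a "lineup", the observed residual plot is hidden among the
# null plots; here, 19 nulls plus the data plot give 20 panels.
observed = [0.5, -1.2, 0.3, 2.1, -0.7, 0.9]
lineup = null_residuals(observed) + [observed]
```

The same mechanism, run at scale, produces arbitrarily many labelled null plots, which is what makes training a computer vision model feasible.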

Stats Society of Victoria
Melbourne, Australia