Download your data
You can get access to your own electricity and gas usage data from https://www.citipower.com.au/our-services/myenergy. You will need a copy of your power bill, which has your smart meter number and meter ID, to register for an account.
Reading the data
The data structure is described here.
The data is not especially nicely formatted (surprise). The main components are:
The time resolution is half-hourly, and the values for each day are spread across the columns.
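To make the "values spread across the columns" layout concrete, here is a minimal sketch of turning one-row-per-day data into one-row-per-half-hour data. The input shape (a `YYYYMMDD` date string followed by 48 readings), the function name `wide_to_long`, and the choice to label each reading by the start of its half-hour interval are all assumptions for illustration, not the documented file format.

```python
from datetime import datetime, timedelta

def wide_to_long(rows):
    """Reshape (date_string, [48 half-hourly values]) rows into (timestamp, value) records.

    Assumed, hypothetical input: dates as 'YYYYMMDD' strings and 48 readings per day,
    with each reading labelled by the start of its half-hour interval.
    """
    long_records = []
    for date_str, values in rows:
        day_start = datetime.strptime(date_str, "%Y%m%d")
        for slot, value in enumerate(values):
            # Slot 0 is 00:00, slot 1 is 00:30, ..., slot 47 is 23:30.
            long_records.append((day_start + timedelta(minutes=30 * slot), value))
    return long_records

# One made-up day of readings, purely for illustration.
example = [("20180101", [0.1 * i for i in range(48)])]
records = wide_to_long(example)
```

Once the data is in this long form, each record carries its own timestamp, which makes it much easier to plot usage against time of day or to aggregate by day of the week.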
In this assignment, the focus was to practice data cleaning. Students suggested questions to build a class survey, to get to know the interests of other class members, and then completed the composed survey. After cleaning the data, a few summary plots of interesting aspects of the data were made. There are some common mistakes that rookies often make when constructing data plots: packing too much into a single graphic, leaving categorical variables unordered, reversing norms for response and explanatory variables, conditioning in the wrong order, plotting counts when proportions should be the focus, not normalizing by counts, and using a boxplot for a small sample size.
In the days since the death of Justice Antonin Scalia, there has been a lot of discussion about what is going to happen now: whether President Obama should or should not nominate a candidate to fill the vacancy on the Supreme Court. As I write this, Fox News reports that Americans are almost 2:1 in favor of a nomination by President Obama, and PolitiFact has rated the claim from the Republican rumour mill of an ‘80 year old tradition to not nominate a supreme court candidate during an election’ as half right (which could also be read as half wrong, just to indicate my side of things).
I’m sitting watching cricket tonight, the first day of the Australia vs West Indies Boxing Day test. Just now video of retired batsman Chris Rogers being honored was played, along with a plot of his batting record, shown on screen similar to this one below:
Howzat? What are they trying to show? What’s the data in this plot? Is it a bar chart? A histogram? What does color mean?
This week I have been visiting the Department of Statistical Sciences at Cornell University. This is the home of many venerable statisticians. At first sight it appears that statisticians are spread all over the university, and technically they are because funding comes from many directions, but almost all are actually located in a suite in Comstock Hall. Professor Paul Velleman is one of the pioneers of data-centrist thinking about statistics. He produced the software called DataDesk in the early 90s that some saw as rivaling LispStat and particularly JMP for introductory statistics classes.
This week I have been visiting the new Center for Statistics and Applications in Forensic Evidence. The center involves four universities, CMU, ISU, UC-Irvine and U. Virginia, and is a NIST Center of Excellence. The kickoff event took place over Oct 26-27 at ISU, organized by the Center Director, Professor Alicia Carriquiry. The speaker list included Barry Scheck (Co-Founder, The Innocence Project), Jo Handelsman (The White House Office of Science & Technology Policy), Philip Dawid (Emeritus Professor of Statistics, University of Cambridge), Anil Jain (Michigan State University) and Stephen Fienberg (CMU).
It's exciting to report on the graduations from the working group this year.
Niladri Roy Chowdhury defended his PhD thesis in Aug 2014, titled “Explorations of the lineup protocol for visual inference: application to high dimension, low sample size problems and metrics to assess the quality”, under my direction. He is a scientist at Novartis, Boston, MA. Susan Vanderplas defended her PhD in May, titled “Perception in Statistical Graphics”, under the direction of Professor Heike Hofmann.
On Nov 10 I was part of a celebration of John W. Tukey at the United Nations. This event kicked off a new UN initiative called Unite Ideas. Details of the event and the initiative can be found here. There were five talks relayed live to an audience of several thousand, using Google Hangouts and a YouTube channel, and listeners could post questions using the Q/A tool.
My talk was titled “An Exploratory Data Analysis of OECD’s 2012 PISA Survey” and I delivered it by computer from my office in Iowa.
The new version of nullabor contains numerical measures that quantify how close the plot of the data is to the null plots in a lineup. It is very difficult to quantify all patterns that might be read from plots, so these should be taken in the spirit of a Herculean task. The goal is to get some sense of what people are reacting to in a plot, which could then be associated with the text descriptions from people, or with data from an eyetracker.
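To give a flavour of what such a measure might look like, here is a rough sketch of a binned distance between a scatterplot of the data and a set of null scatterplots generated by permuting one variable. This is an illustration of the general idea only, not the implementation used in nullabor; the function names, the 4 x 4 grid, the unit-square scaling and the permutation nulls are all assumptions made for the example.

```python
import math
import random

def binned_counts(points, bins=4):
    """Count points in each cell of a bins x bins grid over the unit square."""
    counts = [[0] * bins for _ in range(bins)]
    for x, y in points:
        i = min(int(x * bins), bins - 1)
        j = min(int(y * bins), bins - 1)
        counts[i][j] += 1
    return counts

def binned_distance(a, b, bins=4):
    """Euclidean distance between the binned counts of two scatterplots."""
    ca, cb = binned_counts(a, bins), binned_counts(b, bins)
    return math.sqrt(sum((ca[i][j] - cb[i][j]) ** 2
                         for i in range(bins) for j in range(bins)))

def mean_distance_to_nulls(data, n_nulls=19, bins=4, seed=1):
    """Average binned distance from the data plot to permutation null plots."""
    rng = random.Random(seed)
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    total = 0.0
    for _ in range(n_nulls):
        shuffled = ys[:]
        rng.shuffle(shuffled)  # permuting y breaks any association with x
        total += binned_distance(data, list(zip(xs, shuffled)), bins)
    return total / n_nulls

# Made-up data with a strong association, purely for illustration.
rng = random.Random(0)
xs = [rng.random() for _ in range(200)]
data = [(x, x) for x in xs]
score = mean_distance_to_nulls(data)
```

A larger score suggests the data plot sits further from its nulls, at least for the kind of structure that binned counts can pick up. Other patterns, such as outliers or clustering, would need different measures, which is exactly why quantifying everything a reader might see is so hard.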