Q: What is Anscombe’s quartet?
A: Anscombe’s quartet is a set of four datasets, each with 11 observations on two variables x and y, that all have (nearly) identical descriptive statistics but appear very different when graphed. The datasets all have approximately the same means (for both x and y), variances (for both x and y), correlations between x and y, linear regression lines, and coefficients of determination. But they look like this when graphed:
The idea behind the quartet of datasets, developed by Francis Anscombe in 1973, was to demonstrate the importance of graphing/visualizing your data in addition to just looking at its summary values.
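The "same summary values" claim is easy to check yourself. Here is a minimal sketch in plain Python that computes the mean, sample variance, correlation, and least-squares line for each of the four datasets. The data values are the ones published by Anscombe; the helper name `summarize` and the dictionary layout are just illustrative choices.

```python
from statistics import mean, variance

# Anscombe's quartet (values as published in Anscombe, 1973).
# Datasets I-III share the same x values; dataset IV has its own.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return means, sample variances, Pearson r, and the least-squares line."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "var_x": variance(x),
        "mean_y": my,
        "var_y": variance(y),
        "r": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


summaries = {name: summarize(x, y) for name, (x, y) in quartet.items()}
for name, s in summaries.items():
    # Rounded to two decimals, the four rows should be nearly indistinguishable.
    print(name, {k: round(v, 2) for k, v in s.items()})
```

The coefficient of determination for a simple linear regression is just `r ** 2`, so it matches across the four datasets whenever the correlation does.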
One thing that’s not known about these datasets is exactly how Anscombe created them. But a 2017 paper by Justin Matejka and George Fitzmaurice, “Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing”, shows a method, based on simulated annealing, for creating differently “shaped” datasets that all have the same summary values. The paper uses this method to produce the “Datasaurus Dozen”: a set of 12 datasets that look very different when graphed but have the same summary stats.
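To give a feel for how simulated annealing can do this, here is a rough sketch of the general idea. It is not the paper's actual code. It nudges one point at a time, prefers nudges that move the point closer to a target shape, and throws away any nudge that changes the summary stats at two decimal places. The circle target, the step size, the linear cooling schedule, and all function names (`stats`, `dist_to_circle`, `anneal`) are assumptions made up for this illustration.

```python
import math
import random


def stats(points):
    """Mean x, mean y, sd x, sd y, and Pearson r, each rounded to 2 decimals."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    values = (
        mx,
        my,
        math.sqrt(sxx / (n - 1)),
        math.sqrt(syy / (n - 1)),
        sxy / math.sqrt(sxx * syy),
    )
    return tuple(round(v, 2) for v in values)


def dist_to_circle(p, cx=50.0, cy=50.0, radius=25.0):
    """Distance from point p to an (assumed) target circle."""
    return abs(math.hypot(p[0] - cx, p[1] - cy) - radius)


def anneal(points, iterations=5000, step=0.1, seed=0):
    """Move points toward the circle while keeping the rounded stats fixed."""
    rng = random.Random(seed)
    points = list(points)
    target = stats(points)
    for i in range(iterations):
        # Assumed linear cooling: worse moves become less likely over time.
        temperature = 0.4 * (1 - i / iterations)
        k = rng.randrange(len(points))
        old = points[k]
        new = (old[0] + rng.gauss(0, step), old[1] + rng.gauss(0, step))
        improves = dist_to_circle(new) < dist_to_circle(old)
        if not (improves or rng.random() < temperature):
            continue
        points[k] = new
        if stats(points) != target:
            points[k] = old  # reject: the summary stats would change
    return points


# Usage: start from a random cloud and compare before and after.
rng = random.Random(1)
start = [(rng.uniform(20, 80), rng.uniform(20, 80)) for _ in range(60)]
result = anneal(start)

print("stats before:", stats(start))
print("stats after: ", stats(result))
print("mean distance to circle before:", sum(map(dist_to_circle, start)) / len(start))
print("mean distance to circle after: ", sum(map(dist_to_circle, result)) / len(result))
```

Because every accepted move is checked against the rounded stats, the two `stats` lines always print the same values. How far the cloud gets toward the circle depends on the number of iterations and on how compatible the target shape is with the starting stats.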
It’s super cool and very interesting. Check it out here!