Tag Archives: graphing

Graphin’ It

I think my favorite topic we covered in ASHA this past winter semester was the “good visualizations, bad visualizations” topic, in which I showed the students some examples of some really good visualizations of data/information as well as some really bad ones.

Visualization of information is, in my opinion, overlooked in introductory (and advanced) statistics courses. It’s important to not only understand the analyses and the data behind the analyses, but it’s just as important to be able to convey the meanings and interpretations of the data in ways that make said meanings/interpretations easy to understand. This can be done very effectively and efficiently with a good visualization.

And it can be done very misleadingly with a poor visualization.

I think I’m going to incorporate a little unit (maybe a lab or two?) focusing on the importance of good visualization next time I teach a higher-level stats course. Even if I just give similar examples to the ones I gave in ASHA (as well as, of course, general guidelines for different types of common visualizations), I can at least get students thinking about the visual presentation of information and how important it can be.

Anyway.

Geometry is for Squares

(Note: this has nothing to do with geometry.)

God I love stats.

A lot of the time it seems like the visual representation of data is sacrificed for the actual numerical analyses—be they summary statistics, ANOVAs, factor analyses, whatever. We seem to overlook the importance of “pretty pictures” when it comes to interpreting our data.

This is bad.

One of the first statisticians to recognize this issue and bring it into the spotlight was Francis Anscombe, an Englishman working in the early 1900s. Anscombe was especially interested in regression—particularly in the idea of how outliers can have a nasty effect on an overall regression analysis.

In fact, Anscombe was so interested in the idea of outliers and of differently-shaped data in general, he created what is known today as Anscombe’s quartet.

No, it’s not a vocal quartet who sings about stats (note to self: make this happen). It is in fact a set of four different datasets, each with the same mean, the same variance, the same correlation between the x and y points, and the same regression line equation.

From Wiki:

6666666

So what’s different between these datasets? Take a look at these plots:

55656

See how nutso crazy different all those datasets look? They all have the same freaking means/variances/correlations/regression lines.

If this doesn’t emphasize the importance of graphing your data, I don’t know what does.

I mean seriously. What if your x and y variables were “amount of reinforced carbon used in the space shuttle heat shield” and “maximum temperature the heat shield can withstand,” respectively Plots 2 and 3 would mean TOTALLY DIFFERENT THINGS for the amount of carbon that would work best.

So yeah. Graph your data, you spazmeisters.

GOD I LOVE STATS.