Are You There, God? It’s Me, Non-Normality


I’m here to talk to you today about nonparametric statistics. What are nonparametric statistics, you ask? Well, they’re a collection of statistical tests/procedures that make few or no assumptions about the shape of the underlying distribution, so we can use them when data do not satisfy the assumptions that conventional tests/procedures require. For example, suppose Test A requires that the data be normally distributed. You gather data of interest and find that they are not normally distributed. Then Test A should not be used, because its results might be inaccurate or even uninterpretable with these non-normal data. Instead, you should use a test that does not require normality. If such a test exists (say, Test B), then Test B may be considered a nonparametric test: it can be used in place of Test A when Test A’s assumptions are not met. Cool, huh?

Let’s look at a few examples, because I haven’t pressed any statistics on you guys in a while.

Example 1: Comparing Different Treatments
Scenario: You are a plant scientist. You have a specific type of plant and you want to see to what extent lighting affects this plant’s growth. You have three lighting conditions: sunlight, fluorescent light, and red light. Basically, you want to see if there is a significant difference in the amount of growth for plants grown in these three different conditions.

Parametric test: Analysis of variance (ANOVA) seems appropriate here; you can basically assess the differences in growth by comparing the mean growths for each of the three conditions.
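For reference, here is the standard one-way ANOVA F statistic. The notation is mine rather than anything from this post: k groups (here k = 3 lighting conditions), n_i plants in group i, N plants in total, y_ij the growth of plant j in group i, ȳ_i the mean growth in group i, and ȳ the overall mean. A large F says the condition means differ by more than the within-condition scatter would lead you to expect; the p-value comes from an F distribution with k − 1 and N − k degrees of freedom, and that is where the normality assumption comes in.

```latex
F = \frac{\dfrac{1}{k-1}\displaystyle\sum_{i=1}^{k} n_i\,(\bar{y}_i - \bar{y})^2}
         {\dfrac{1}{N-k}\displaystyle\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}
```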

Nonparametric test: For ANOVA’s results to be trustworthy, the data within each group (strictly speaking, the residuals) should be approximately normally distributed. Suppose your data aren’t! What do you do instead? A Kruskal-Wallis test! A Kruskal-Wallis test is basically an ANOVA performed on the ranks of the data rather than on the raw data, and it lets you compare the groups without needing to meet the assumption of normality.
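Here’s a minimal pure-Python sketch of the “ANOVA on ranks” idea. The helper names (average_ranks, kruskal_wallis_h, chi2_sf_even_df) are my own, and the growth numbers are made up, not real measurements. It pools all the plants, ranks them, and computes H = (N − 1) · SS_between / SS_total on those ranks, which works out to the textbook Kruskal-Wallis statistic, tie correction included. The p-value uses the usual chi-square approximation with k − 1 degrees of freedom; the closed form used here only handles an even number of degrees of freedom, which covers the three-group case (df = 2).

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H as (N - 1) * SS_between / SS_total on the pooled ranks.

    Written this way, H already includes the usual correction for ties.
    """
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    grand_mean = sum(ranks) / n_total
    ss_total = sum((r - grand_mean) ** 2 for r in ranks)
    if ss_total == 0:
        raise ValueError("all observations are tied; H is undefined")
    ss_between = 0.0
    start = 0
    for g in groups:
        group_ranks = ranks[start:start + len(g)]
        start += len(g)
        group_mean = sum(group_ranks) / len(group_ranks)
        ss_between += len(group_ranks) * (group_mean - grand_mean) ** 2
    return (n_total - 1) * ss_between / ss_total

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form needs an even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Hypothetical growth measurements (cm), made up for illustration only.
sunlight = [12.1, 14.3, 13.8, 15.0, 12.9]
fluorescent = [10.2, 11.5, 9.8, 12.0, 10.9]
red = [8.7, 9.9, 7.5, 10.1, 8.2]

h = kruskal_wallis_h(sunlight, fluorescent, red)
p = chi2_sf_even_df(h, df=3 - 1)  # chi-square approximation, k - 1 df
print(f"H = {h:.3f}, approximate p = {p:.4f}")
```

With only five plants per condition, the chi-square approximation is rough, so treat the p-value as illustrative. SciPy’s scipy.stats.kruskal computes the same tie-corrected H and chi-square p-value if you’d rather not roll your own.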

Example 2: Correlation
Correlation, as I’m sure you’re all aware, is a measure of association. Basically, if you’ve got variables X and Y, correlation measures how consistently X changes along with changes in Y (or vice versa). Correlations range from -1 to 1: a correlation of 1 indicates a perfect increasing relationship, a correlation of -1 indicates a perfect decreasing relationship, and a correlation near 0 indicates no such association.

Scenario: You have a bunch of measurements on two variables. Drug measures the amount of a medicine in a patient’s system. Response measures the amount of some disease marker in the same patient. You want to see if there’s a relationship between the amount of medicine in a patient’s system and the amount of the disease marker present in a patient’s system.

Parametric test: the “usual” correlation, the Pearson Product-Moment Correlation, seems appropriate.
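For reference, this is the standard formula for Pearson’s r on n paired observations (x_i, y_i), with x̄ and ȳ the sample means. The numerator only grows when points sit consistently on one side of a straight-line trend, which is why r is a measure of linear association; it equals exactly 1 or -1 only when every point lies on a single line.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```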

Nonparametric test: The catch with the “usual” measure of correlation is that it measures only the degree of linear association between your variables. If you suspect, for whatever reason, that the relationship between Drug and Response is not linear, it’s a good idea to use the Spearman Rank Correlation Coefficient, which measures monotonic association: it captures relationships where Response consistently increases (or consistently decreases) as Drug increases, even when the points don’t follow a straight line. (If the relationship goes up and then back down, Spearman’s won’t capture it either.)
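For reference, Spearman’s coefficient is simply Pearson’s r computed on the ranks of the data instead of the raw values; when there are no ties, it reduces to the familiar shortcut below, where d_i is the difference between the two ranks of observation i. Because any strictly increasing transformation leaves ranks unchanged, a perfectly increasing relationship makes every d_i zero, so ρ_s = 1 no matter how curved the relationship is.

```latex
\rho_s = r\bigl(\operatorname{rank}(x), \operatorname{rank}(y)\bigr)
       = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} \quad \text{(no ties)},
\qquad d_i = \operatorname{rank}(x_i) - \operatorname{rank}(y_i)
```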

Here, I wanna give you an example of this last one, ’cause it’s cool. Just as a dumb example, let your sample size be obscenely small (n = 10). Here are your data:

drug  response
   1         1
   2        16
   3        81
   4       256
   5       625
   6      1296
   7      2401
   8      4096
   9      6561
  10     10000

Notice two things: first, if we plot these two variables, their relationship is clearly nonlinear.

[Figure: plot of response against drug]

Second, notice that there is a perfect relationship between drug and response; it’s just not linear. Specifically, each Response value is the corresponding Drug value raised to the 4th power, and since raising a positive number to the 4th power is strictly increasing, Response is a perfect monotonic function of Drug! We can easily calculate both Pearson’s and Spearman’s correlations here to see what they say:

Pearson correlation: 0.882
Spearman correlation: 1
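If you want to reproduce these two numbers yourself, here’s a minimal pure-Python sketch. The helper names (pearson, average_ranks, spearman) are my own; in practice you’d reach for scipy.stats.pearsonr and scipy.stats.spearmanr, but writing it out shows that Spearman’s coefficient is just Pearson’s formula applied to ranks.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# The drug/response data from the table: response is drug to the 4th power.
drug = list(range(1, 11))
response = [1, 16, 81, 256, 625, 1296, 2401, 4096, 6561, 10000]
assert response == [d ** 4 for d in drug]

print(f"Pearson correlation:  {pearson(drug, response):.3f}")   # 0.882
print(f"Spearman correlation: {spearman(drug, response):.3f}")  # 1.000
```

Ranking turns 1, 16, 81, ... into 1, 2, 3, ..., exactly matching drug’s ranks, so the rank-based correlation is a perfect 1. The raw response values curve sharply upward, so the straight-line measure falls short.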

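Spearman’s picks up on the perfect relationship, but Pearson’s does not! Why? Because the relationship isn’t linear: the points don’t fall on a straight line, so Pearson’s lands short of 1 (0.882 is high, but not perfect), while the ranks of drug and response match exactly. Pretty cool, huh?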

THIS IS WHY YOU ALWAYS PLOT YOUR DATA, DAMMIT.

Side note: Pearson feuded with Spearman over Spearman’s “adaptation” of Pearson’s beloved correlation coefficient and actually brought the issue before the Royal Society for consideration. Oh, you statisticians.
