Week 35: The Pearson Product-Moment Correlation Coefficient

Today we’re going to talk about our first measure of association: the Pearson product-moment correlation coefficient!

When Would You Use It?
The Pearson product-moment correlation coefficient is used as a parametric test to determine whether, in the population, the correlation between values on two variables is some value other than zero. More specifically, it is used to determine whether there is a significant linear relationship between the two variables.

What Type of Data?
The Pearson product-moment correlation coefficient requires interval or ratio data.

Test Assumptions

  • The sample has been randomly selected from the population it represents.
  • The variables are interval or ratio in nature.
  • The two variables have a bivariate normal distribution.
  • The assumption of homoscedasticity is met.
  • The residuals are independent of one another.
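A few of these assumptions can be eyeballed before running the test. The following is only a rough sketch on simulated data (the variables x and y and their parameters are made up for illustration, not taken from my music data). Note that marginal normality checks like shapiro.test are necessary but not sufficient for bivariate normality, so treat them as a screening step rather than proof.

```r
set.seed(1)  # simulated data, purely for illustration
x <- rnorm(100, mean = 240, sd = 45)      # hypothetical song lengths (seconds)
y <- 20 - 0.02 * x + rnorm(100, sd = 5)   # hypothetical play counts

# Scatterplot: look for a roughly elliptical cloud with even spread
plot(x, y, xlab = "x", ylab = "y")

# Marginal normality checks (necessary, not sufficient, for bivariate normality)
sw_x <- shapiro.test(x)
sw_y <- shapiro.test(y)
print(c(p_x = sw_x$p.value, p_y = sw_y$p.value))

# Residuals vs. fitted values from a simple linear fit:
# a funnel shape would suggest the homoscedasticity assumption is violated
fit <- lm(y ~ x)
plot(fitted(fit), resid(fit), xlab = "Fitted", ylab = "Residual")
abline(h = 0, lty = 2)
```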

Test Process
Step 1: Formulate the null and alternative hypotheses. The null hypothesis claims that in the population, the correlation between the scores on variable X and variable Y is equal to zero. The alternative hypothesis claims otherwise (that the correlation is less than, greater than, or simply not equal to zero).

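Step 2: Compute the test statistic, a t-value. To do so, the actual correlation coefficient, r, must be calculated first. For n paired observations (X_i, Y_i) with sample means \bar{X} and \bar{Y}, this calculation is as follows:

\[ r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \]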
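To make the calculation of r concrete, here is a minimal sketch that computes it as the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations, then compares the result with R's built-in cor(). The toy vectors x and y are hypothetical values chosen for illustration only.

```r
# Hypothetical toy data (not the music data from this post)
x <- c(2.1, 3.4, 3.9, 4.4, 5.0, 6.2)
y <- c(30, 22, 25, 14, 18, 9)

# Deviations from the sample means
dx <- x - mean(x)
dy <- y - mean(y)

# Pearson r: sum of cross-products over the root of the product of sums of squares
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in function for comparison
r_builtin <- cor(x, y)

print(c(manual = r_manual, builtin = r_builtin))
```

The two values should agree up to floating-point error, which is a quick way to confirm the hand calculation.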


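To compute the t-statistic, the following equation is used, where n is the sample size and the statistic has n - 2 degrees of freedom:

\[ t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} \]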
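As a sketch of the t-statistic for testing a zero population correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2), the helper below (cor_t_stat is a name I made up for illustration) computes it from r and n and checks it against the statistic reported by cor.test(). The toy data are again hypothetical.

```r
# t statistic for H0: rho = 0, given sample correlation r and sample size n
cor_t_stat <- function(r, n) {
  r * sqrt(n - 2) / sqrt(1 - r^2)
}

# Hypothetical toy data (not the music data from this post)
x <- c(2.1, 3.4, 3.9, 4.4, 5.0, 6.2)
y <- c(30, 22, 25, 14, 18, 9)

r <- cor(x, y)
n <- length(x)

t_manual  <- cor_t_stat(r, n)
t_builtin <- unname(cor.test(x, y)$statistic)  # drop the "t" name for comparison

print(c(manual = t_manual, builtin = t_builtin))
```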


Step 3: Obtain the p-value associated with the calculated t-score. The p-value indicates the probability of observing a correlation as extreme or more extreme than the observed sample correlation, under the assumption that the null hypothesis is true.
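How the p-value is read off the t distribution depends on the alternative hypothesis. Here is a small sketch, assuming the t-score follows a t distribution with n - 2 degrees of freedom under the null; the function name cor_p_value is hypothetical. It is applied to the t and df values reported in the R output in this post.

```r
# p-value for a correlation t statistic under each type of alternative
cor_p_value <- function(t, df, alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  switch(alternative,
    less      = pt(t, df),                                   # lower tail
    greater   = pt(t, df, lower.tail = FALSE),               # upper tail
    two.sided = 2 * pt(abs(t), df, lower.tail = FALSE)       # both tails
  )
}

# Values reported by cor.test in this post: t = -1.0232, df = 98
p_less  <- cor_p_value(-1.0232, df = 98, alternative = "less")
p_two   <- cor_p_value(-1.0232, df = 98, alternative = "two.sided")
p_great <- cor_p_value(-1.0232, df = 98, alternative = "greater")

print(round(c(less = p_less, two.sided = p_two, greater = p_great), 4))
```

The "less" result should agree with the p-value of 0.1544 that cor.test reports for the one-sided test, and the two-sided p-value is simply twice that.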

Step 4: Determine the conclusion. If the p-value is larger than the prespecified α-level, fail to reject the null hypothesis (that is, retain the claim that the correlation in the population is zero). If the p-value is smaller than the prespecified α-level, reject the null hypothesis in favor of the alternative.

I’m going to look at my music data again! I want to see if there is a significant correlation between the length of a song and the number of times I’ve played it. I suspect that I play longer songs less often than shorter ones (I just have a preference for slightly shorter songs, not sure why), so I’m going to guess that there’s a negative correlation. I took a sample of n = 100 songs and let α = 0.05.

H0: ρ = 0
Ha: ρ < 0



Since our calculated p-value (0.1544) is larger than our α-level (0.05), we fail to reject H0: there is not sufficient evidence to conclude that the correlation in the population is less than zero.
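The decision rule from Step 4 can be written out for this example using the numbers that cor.test reports in this post (t = -1.0232, df = 98, p-value = 0.1544). As an extra, the sketch also backs out the correlation implied by t and df by inverting the t formula, r = t / sqrt(t^2 + df). That value is derived here, not read from the output, so treat it as approximate.

```r
alpha   <- 0.05
t_obs   <- -1.0232   # reported by cor.test in this post
df      <- 98        # n - 2 with n = 100
p_value <- 0.1544    # reported one-sided p-value (Ha: rho < 0)

# Decision rule from Step 4
decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"

# Correlation implied by t and df (inverting the t formula); derived, not reported
r_implied <- t_obs / sqrt(t_obs^2 + df)

print(decision)
print(round(r_implied, 3))
```

The implied correlation comes out at roughly -0.10: negative, as I guessed, but too weak to distinguish from zero with 100 songs.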

Example in R
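# Read the copied data; assumes columns named length and playcount
x <- read.table('clipboard', header = TRUE)
# Evaluate inside x so the columns are found (bare `length` is otherwise the base function)
with(x, cor.test(length, playcount, alternative = "less"))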
        Pearson's product-moment correlation
data:  length and playcount
t = -1.0232, df = 98, p-value = 0.1544
alternative hypothesis: true correlation is less than 0
95 percent confidence interval:
 -1.00000000  0.06374622
sample estimates:
