Today we’re going to talk about our first measure of association: the Pearson product-moment correlation coefficient!
When Would You Use It?
The Pearson product-moment correlation coefficient is a parametric test used to determine whether, in the population, the correlation between values on two variables is some value other than zero. More specifically, it is used to determine whether there is a significant linear relationship between the two variables.
What Type of Data?
The Pearson product-moment correlation coefficient requires interval or ratio data.
Test Assumptions
- The sample has been randomly selected from the population it represents.
- The variables are interval or ratio in nature.
- The two variables have a bivariate normal distribution.
- The assumption of homoscedasticity is met.
- The residuals are independent of one another.
Steps
Step 1: Formulate the null and alternative hypotheses. The null hypothesis claims that in the population, the correlation between the scores on variable X and variable Y is equal to zero. The alternative hypothesis claims otherwise (that the correlation is less than, greater than, or simply not equal to zero).
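Step 2: Compute the test statistic, a t-value. To do so, the actual correlation coefficient, r, must be calculated first. This calculation is as follows:

$$ r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}} $$

where n is the number of (X, Y) pairs in the sample.
To compute the t-statistic, the following equation is used:

$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} $$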
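Here's a minimal sketch of Step 2 in R, in case it helps to see the arithmetic spelled out. The data are simulated purely for illustration (they are not my music data), and the variable names are my own. The sketch computes r as the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations, then turns r into a t-statistic with n - 2 degrees of freedom.

```r
# Hypothetical data: simulated for illustration only.
set.seed(1)
x <- rnorm(30)
y <- 0.5 * x + rnorm(30)
n <- length(x)

# Pearson's r: sum of cross-products of deviations divided by the square
# root of the product of the two sums of squared deviations.
dx <- x - mean(x)
dy <- y - mean(y)
r <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# t-statistic for H0: rho = 0, with n - 2 degrees of freedom.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Both values should agree with R's built-in functions.
all.equal(r, cor(x, y))
all.equal(t_stat, unname(cor.test(x, y)$statistic))
```

In practice you would just call cor() and cor.test(); the hand calculation is only there to show where the numbers come from.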
Step 3: Obtain the p-value associated with the calculated t-score, using a t-distribution with n − 2 degrees of freedom. The p-value indicates the probability of observing a correlation as extreme or more extreme than the observed sample correlation, under the assumption that the null hypothesis is true.
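As a rough sketch of Step 3, the p-value can be pulled from the t-distribution with pt(). The helper p_from_t() below is a hypothetical function I made up for illustration (it is not part of base R), and the t-score and sample size in the usage lines are invented. Which tail you use depends on the alternative hypothesis chosen in Step 1.

```r
# Hypothetical helper: converts a t-score into a p-value for the
# chosen alternative hypothesis, using n - 2 degrees of freedom.
p_from_t <- function(t_stat, n,
                     alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  df <- n - 2
  switch(alternative,
         less      = pt(t_stat, df),                      # left tail
         greater   = pt(t_stat, df, lower.tail = FALSE),  # right tail
         two.sided = 2 * pt(abs(t_stat), df, lower.tail = FALSE))
}

# Invented values, just to show the three alternatives side by side.
p_from_t(2.1, n = 25, alternative = "less")
p_from_t(2.1, n = 25, alternative = "greater")
p_from_t(2.1, n = 25, alternative = "two.sided")
```

The "less" and "greater" p-values always sum to 1, and the two-sided p-value is twice the smaller of the two.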
Step 4: Determine the conclusion. If the p-value is larger than the prespecified α-level, fail to reject the null hypothesis (that is, retain the claim that the correlation in the population is zero). If the p-value is smaller than the prespecified α-level, reject the null hypothesis in favor of the alternative.
Example
I’m going to look at my music data again! I want to see if there is a significant correlation between the length of a song and the number of times I’ve played it. I suspect that I play longer songs less often than shorter ones (I just have a preference for slightly shorter songs, not sure why), so I’m going to guess that there’s a negative correlation. I took a sample of n = 100 songs and let α = 0.05.
H0: ρ = 0
Ha: ρ < 0
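The sample correlation is r = -0.1028, which gives t = -1.0232 on 98 degrees of freedom and a p-value of 0.1544. Since our calculated p-value is larger than our α-level of 0.05, we fail to reject H0: we do not have sufficient evidence to conclude that the correlation in the population is less than zero.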
Example in R
x <- read.table('clipboard', header = TRUE)  # read the copied data; first row holds column names
attach(x)                                    # make the columns available by name
cor.test(length, playcount, alternative = "less")

        Pearson's product-moment correlation

data:  length and playcount
t = -1.0232, df = 98, p-value = 0.1544
alternative hypothesis: true correlation is less than 0
95 percent confidence interval:
 -1.00000000  0.06374622
sample estimates:
      cor
-0.102812
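As a sanity check, the t-score and p-value that cor.test reports can be recovered by hand from the reported correlation and the sample size. This is just a sketch using the rounded r printed in the output, so tiny rounding differences are possible; the variable names are my own.

```r
# Values taken from the cor.test output in this post.
r  <- -0.102812
n  <- 100
df <- n - 2

# t-statistic from r.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Left-tailed p-value, because the alternative is Ha: rho < 0.
p_value <- pt(t_stat, df)

round(t_stat, 4)   # -1.0232, matching the output
p_value            # should agree with the reported 0.1544 up to rounding
p_value > 0.05     # TRUE, so we fail to reject H0
```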