Today we’re going to talk about yet another measure of association: the biserial correlation coefficient!
When Would You Use It?
The biserial correlation coefficient is a parametric test used to determine if, in the population, the correlation between values on two variables is some value other than zero. More specifically, it is used to determine if there is a significant linear relationship between the two variables.
What Type of Data?
The biserial correlation coefficient requires both variables to be measured on an interval or ratio scale, with one of the two variables then transformed into a dichotomous nominal or ordinal scale.
Test Assumptions
- The sample has been randomly selected from the population it represents.
- The underlying distributions for both of the variables involved are assumed to be continuous and normal.
Step 1: Formulate the null and alternative hypotheses. The null hypothesis claims that in the population, the correlation between the scores on variable X and variable Y is equal to zero. The alternative hypothesis claims otherwise (that the correlation is less than, greater than, or simply not equal to zero).
Step 2: Compute the test statistic, a z-value. To do so, the actual correlation coefficient, rb, must be calculated first. This calculation is as follows:
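In its usual textbook form, the biserial coefficient can be written as below. The symbol names are my own assumptions for this sketch: X is the continuous variable, the bars denote the mean of X within the group coded 1 and the group coded 0, s_X is the standard deviation of all n scores on X (taken here with the n − 1 denominator, though some texts use n), and p0 and p1 are the proportions of cases coded 0 and 1.

```latex
r_b = \left( \frac{\bar{X}_1 - \bar{X}_0}{s_X} \right) \left( \frac{p_0 \, p_1}{h} \right)
```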
The value h is the ordinate (that is, the height) of the standard normal curve at the point that divides the distribution into the proportions p0 and p1. To obtain h, first find the z-value for which the smaller of p0 and p1 falls above that point and the larger of the two falls below it; h is the height of the curve at that z-value. Tables of the standard normal distribution list the ordinates for specific z-scores.
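If you don't have a table of ordinates handy, h can also be computed directly. Here is a minimal sketch using only Python's standard library; the function name `normal_ordinate` is hypothetical, and the proportions passed in are the ones from the exam example later in this post.

```python
from statistics import NormalDist


def normal_ordinate(p0, p1):
    """Return (z, h): the z-value splitting p0 and p1, and the curve height there."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    # The larger proportion falls below z, the smaller one above it.
    z = std_normal.inv_cdf(max(p0, p1))
    # The ordinate h is the height of the normal density at z.
    h = std_normal.pdf(z)
    return z, h


z, h = normal_ordinate(0.28, 0.72)
print(round(z, 3), round(h, 3))  # z is roughly 0.583 and h roughly 0.337
```

A table forces you to round z to two decimals first (0.58 here), so a table-based h can differ from the computed one in the fourth decimal place.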
To compute the z-statistic, the following equation is used:
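One common large-sample form, which I am assuming here, divides rb by its approximate standard error under the null hypothesis. With n the sample size and h, p0, and p1 as described above:

```latex
z = \frac{r_b}{\dfrac{1}{h} \sqrt{\dfrac{p_0 \, p_1}{n}}} = \frac{r_b \, h \, \sqrt{n}}{\sqrt{p_0 \, p_1}}
```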
Step 3: Obtain the p-value associated with the calculated z-score. The p-value indicates the probability of observing a correlation as extreme or more extreme than the observed sample correlation, under the assumption that the null hypothesis is true.
Step 4: Determine the conclusion. If the p-value is larger than the prespecified α-level, fail to reject the null hypothesis (that is, retain the claim that the correlation in the population is zero). If the p-value is smaller than the prespecified α-level, reject the null hypothesis in favor of the alternative.
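The four steps can be strung together in a few lines of Python. This is only a sketch under stated assumptions: it uses the usual textbook formula for rb (with the n − 1 standard deviation) and the large-sample z approximation, the function name `biserial_test` is hypothetical, and the ten scores below are made-up illustration data, not the STAT 213 grades.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def biserial_test(x, codes, alternative="two-sided"):
    """Return (r_b, z, p) for continuous scores x and 0/1 codes."""
    n = len(x)
    x1 = [xi for xi, c in zip(x, codes) if c == 1]
    x0 = [xi for xi, c in zip(x, codes) if c == 0]
    if not x1 or not x0:
        raise ValueError("both categories need at least one case")
    p1, p0 = len(x1) / n, len(x0) / n

    std_normal = NormalDist()
    # Ordinate of the standard normal curve at the point splitting p0 and p1.
    h = std_normal.pdf(std_normal.inv_cdf(max(p0, p1)))

    # Assumed textbook form of r_b; stdev() uses the n - 1 denominator.
    r_b = (mean(x1) - mean(x0)) / stdev(x) * (p0 * p1 / h)
    # Assumed large-sample z: r_b over its standard error under H0.
    z = r_b * h * sqrt(n) / sqrt(p0 * p1)

    if alternative == "greater":
        p = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p = std_normal.cdf(z)
    else:
        p = 2 * (1 - std_normal.cdf(abs(z)))
    return r_b, z, p


# Made-up data purely for illustration.
hypothetical_scores = [55, 60, 62, 70, 71, 75, 80, 85, 88, 90]
hypothetical_codes = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

alpha = 0.05
r_b, z, p_value = biserial_test(hypothetical_scores, hypothetical_codes, "greater")
reject_h0 = p_value < alpha  # Step 4: compare the p-value with alpha
print(r_b, z, p_value, reject_h0)
```

Passing "greater" or "less" matches a one-sided alternative hypothesis, while the default gives the two-sided p-value.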
Let’s look at the exam grades for one of the old STAT 213 classes. I want to see if there is a significant correlation between the average grade of students’ two midterm tests and whether or not they got a grade higher than a C+ on the final. I will code a grade higher than a C+ as 1 and a grade equal to or lower than a C+ as a 0. I suspect a positive correlation. Here, n = 107 and let α = 0.05.
H0: ρb = 0
Ha: ρb > 0
First, let’s find h. In the sample, p0 = 0.28 and p1 = 0.72. The z-score for which 0.72 of the distribution falls below and 0.28 of the distribution falls above is 0.58. The ordinate, h, of this value is 0.3372 according to the table. So,
Since our calculated p-value is larger than our α-level, we fail to reject H0 and conclude that the correlation in the population is not significantly greater than zero.