Today we’re going to talk about variances. Specifically, the F test for two population variances!
When Would You Use It?
The F test for two population variances is a parametric test used to determine if two independent samples represent two populations with homogeneous (similar) variances.
What Type of Data?
The F test for two population variances requires interval or ratio data.
Step 1: Formulate the null and alternative hypotheses. The null hypothesis claims that the two population variances are equal. The alternative hypothesis claims otherwise: that one population variance is greater than the other, that it is less than the other, or simply that the two variances are not equal.
Step 2: Compute the test statistic, an F value. The test statistic is the ratio of the two sample variances; in the R example at the end of this post, it is the 2015 sample variance divided by the 2013 sample variance.
Step 3: Obtain the p-value associated with the calculated F value. The p-value indicates the probability of a difference in the two sample variances that is equal to or more extreme than the observed difference between the sample variances, under the assumption that the null hypothesis is true. The two degrees of freedom associated with the F value are df1 = n1-1 and df2 = n2-1, where n1 and n2 are the respective sample sizes of the first and second sample.
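Steps 2 and 3 can be sketched in a few lines of R. This is only an illustration under stated assumptions: the vectors `x` and `y` are made-up numbers (not the walking data), and the names `f_stat`, `p_lower`, `p_upper`, and `p_two_sided` are hypothetical. It shows how the F value, its two degrees of freedom, and the tail probabilities fit together.

```r
# Hypothetical samples (made-up numbers, not the walking data)
x <- c(2.1, 3.4, 1.9, 4.2, 2.8, 3.1)
y <- c(1.0, 5.2, 2.7, 6.1, 0.8, 4.4, 3.9)

f_stat <- var(x) / var(y)   # Step 2: ratio of the two sample variances
df1 <- length(x) - 1        # numerator degrees of freedom (n1 - 1)
df2 <- length(y) - 1        # denominator degrees of freedom (n2 - 1)

# Step 3: tail probabilities under H0 from the F distribution
p_lower <- pf(f_stat, df1, df2)                      # P(F <= f_stat)
p_upper <- pf(f_stat, df1, df2, lower.tail = FALSE)  # P(F >= f_stat)
p_two_sided <- 2 * min(p_lower, p_upper)             # for a "not equal" alternative
```

Which of the three probabilities to report depends on the alternative hypothesis chosen in Step 1: one tail for a directional alternative, the doubled smaller tail for a "not equal" alternative.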
Step 4: Determine the conclusion. If the p-value is larger than the prespecified α-level, fail to reject the null hypothesis (that is, retain the claim that the population variances are equal). If the p-value is smaller than the prespecified α-level, reject the null hypothesis in favor of the alternative.
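The decision rule in Step 4 is simple enough to write as a tiny R function. The helper `decide` below is hypothetical (it is not part of base R), and treating a p-value exactly equal to α as "fail to reject" is an assumption of this sketch, since the rule above only covers "larger" and "smaller".

```r
# Hypothetical helper: turn a p-value and an alpha-level into the Step 4 decision
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) {
    "reject H0"           # p-value smaller than alpha
  } else {
    "fail to reject H0"   # p-value larger than (or, by assumption, equal to) alpha
  }
}

decide(0.01)   # a p-value below the default alpha of 0.05
decide(0.20)   # a p-value above the default alpha of 0.05
```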
The data for this example come from my walking data from 2013 and 2015. I want to see if there is a significant difference in the mileage variance for these two years (in other words, was I less consistent with the length of my walks in one year versus the other?). Set α = 0.05.
H0: σ₁² = σ₂² (or σ₁² - σ₂² = 0)
Ha: σ₁² ≠ σ₂² (or σ₁² - σ₂² ≠ 0)
Since our p-value is smaller than our alpha-level, we reject H0 and claim that the population variances are significantly different (with evidence in favor of the variance being higher for 2015).
Example in R
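# Copy the 2013 mileage column to the clipboard, then run this line
year2013 = read.table('clipboard', header=FALSE)$V1 #data from 2013
# Copy the 2015 mileage column to the clipboard, then run this line
year2015 = read.table('clipboard', header=FALSE)$V1 #data from 2015
s1 = var(year2013) #sample variance for 2013
s2 = var(year2015) #sample variance for 2015
df1 = length(year2013)-1 #number of 2013 walks minus 1
df2 = length(year2015)-1 #number of 2015 walks minus 1
Fstat = s2/s1 #test statistic (named Fstat because F is shorthand for FALSE in R)
pval = 1-pf(Fstat, df2, df1) #upper-tail p-value (is the 2015 variance larger?)
pval2 = 2*min(pval, 1-pval) #two-sided p-value for Ha: variances not equal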
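As a cross-check on the hand calculation, base R also ships `var.test()` in the stats package, which runs the same F test in one call. The sketch below uses simulated data, since the walking data are not reproduced here: `a` and `b` are hypothetical stand-ins generated with `rnorm()`, and the sample sizes and standard deviations are arbitrary choices.

```r
set.seed(1)                        # make the made-up data reproducible
a <- rnorm(30, mean = 3, sd = 1)   # hypothetical stand-in for one year
b <- rnorm(40, mean = 3, sd = 2)   # hypothetical stand-in for the other year

# First argument is the numerator sample, so F = var(b) / var(a)
res <- var.test(b, a, alternative = "two.sided")

res$statistic   # the F value
res$parameter   # numerator and denominator degrees of freedom
res$p.value     # two-sided p-value
```

Passing `alternative = "greater"` or `alternative = "less"` instead gives the one-sided versions of the test.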