Last week we did a test for population variance, which represents the second moment about the mean. Today we’re going to go one moment further and do a single-sample test to evaluate population skewness (which represents the third moment about the mean)!
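To make "moment about the mean" concrete, here is a minimal R sketch on toy data of my own, not the song data. `central_moment()` is a hypothetical helper that uses the plain divide-by-n definition. The third moment is scaled by the second moment raised to the 3/2 power, which gives a unitless skewness. A longer right tail pushes it positive, and symmetric data give 0.

```r
# Hypothetical helper: k-th sample moment about the mean (divisor n).
central_moment = function(x, k) mean((x - mean(x))^k)

x = c(2, 4, 4, 4, 5, 5, 7, 9)   # toy data with a longer right tail
m2 = central_moment(x, 2)        # second moment about the mean: 4
m3 = central_moment(x, 3)        # third moment about the mean: 5.25
skew = m3 / m2^(3/2)             # standardized third moment: 0.65625

y = c(1, 2, 3, 4, 5)             # symmetric toy data
central_moment(y, 3)             # third moment is 0 for symmetric data
```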
When Would You Use It?
The test of population skewness is a parametric test used in a single-sample situation to determine whether a sample comes from a population that is symmetric (that is, not skewed).
What Type of Data?
The test for skewness requires interval or ratio data.
Step 1: Formulate the null and alternative hypotheses. The null hypothesis claims that the population skewness parameter γ equals 0, which corresponds to symmetry. The alternative hypothesis claims that there is some skew: γ is greater than 0 (right skew), less than 0 (left skew), or not equal to 0 (skew in either direction).
Step 2: Compute the test statistic, a z-score. Obtaining it takes several intermediate calculations, which are carried out one line at a time in the R code at the end of this post.
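Written out as formulas, the Step 2 calculations are as follows. I've transcribed them from the R code in the example. The letters A through F follow that code's variable names, x̄ is the sample mean, and s is the sample standard deviation. Together these steps are D'Agostino's transformation of sample skewness into an approximately standard normal z. For n ≤ 7 the quantity C is at most 1, so ln D is not positive and E is not finite. The transformation therefore needs n ≥ 8.

```latex
\[
\begin{aligned}
m_3 &= \frac{n\sum_{i=1}^{n}(x_i-\bar{x})^3}{(n-1)(n-2)}, \qquad
s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2/n}{n-1}}\\
g_1 &= \frac{m_3}{s^3}, \qquad
\sqrt{b_1} = \frac{(n-2)\,g_1}{\sqrt{n(n-1)}}\\
A &= \sqrt{b_1}\,\sqrt{\frac{(n+1)(n+3)}{6(n-2)}}\\
B &= \frac{3\,(n^2+27n-70)(n+1)(n+3)}{(n-2)(n+5)(n+7)(n+9)}\\
C &= \sqrt{2(B-1)}-1, \qquad D = \sqrt{C}, \qquad E = \frac{1}{\sqrt{\ln D}}\\
F &= \frac{A}{\sqrt{2/(C-1)}}\\
z &= E\,\ln\!\left(F+\sqrt{F^2+1}\right)
\end{aligned}
\]
```

The last line can also be written as z = E·asinh(F), because ln(F + √(F² + 1)) = asinh(F). In the R code, the variable `b1` holds √b₁.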
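Step 3: Obtain the p-value associated with the calculated z-score. The p-value is the probability of observing a sample skew as extreme as, or more extreme than, the one observed, assuming the null hypothesis is true. For a right-tailed alternative (γ > 0) it is the standard normal area above z. For a left-tailed alternative (γ < 0) it is the area below z, and for a two-tailed alternative it is twice the smaller of the two tail areas.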
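A minimal R sketch of how the p-value depends on the direction of the alternative. `skew_pvalue()` is a hypothetical helper of my own, and `z` is assumed to be the statistic from Step 2.

```r
# Hypothetical helper: p-value for the skewness z statistic under each alternative.
skew_pvalue = function(z, alternative = c("greater", "less", "two.sided")) {
  alternative = match.arg(alternative)
  switch(alternative,
         greater   = pnorm(z, lower.tail = FALSE),  # Ha: gamma > 0 (right skew)
         less      = pnorm(z),                      # Ha: gamma < 0 (left skew)
         two.sided = 2 * pnorm(-abs(z)))            # Ha: gamma != 0
}

skew_pvalue(1.96, "two.sided")   # roughly 0.05
```

Using `lower.tail = FALSE` instead of `1 - pnorm(z)` keeps the upper-tail area from rounding to exactly 0 when z is large. This matters when the p-value is "basically zero", as in the example.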
Step 4: Determine the conclusion. If the p-value is larger than the prespecified α-level, fail to reject the null hypothesis, because the data do not provide enough evidence to conclude that the population is skewed. If the p-value is smaller than α, reject the null hypothesis in favor of the alternative.
As in the last test, the data for this example come from my n = 365 song downloads from 2010. I want to create a hypothesis test regarding the skew of the distribution of song lengths (in seconds). Based on the following histogram, I’m going to say that this distribution has a right skew.
H0: γ = 0
Ha: γ > 0
Set α = 0.05.
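Because Ha is right-tailed, the same decision can also be framed with a critical value: reject H0 when z exceeds the upper 5% point of the standard normal. Here is a small R sketch. `reject_h0()` is a hypothetical helper, and `z` is assumed to come from the Step 2 calculations.

```r
alpha  = 0.05
z_crit = qnorm(alpha, lower.tail = FALSE)  # upper-tail critical value, about 1.645

# Hypothetical helper: right-tailed decision rule for Ha: gamma > 0.
# z > z_crit holds exactly when the upper-tail p-value is below alpha.
reject_h0 = function(z) z > z_crit
```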
Since our p-value is essentially zero, it is smaller than α = 0.05, so we reject H0 and conclude that the population of song lengths is indeed positively skewed (γ > 0).
Example in R
dat = read.table('clipboard', header = TRUE)[[1]]  # first column as a numeric vector ('clipboard' works on Windows; on macOS use pipe('pbpaste'))
hist(dat)                                          # histogram of the data
n = length(dat)                                    # sample size (365 here)
m3 = (n*sum((dat-mean(dat))^3))/((n-1)*(n-2))      # third-moment estimate
s3 = sqrt((sum(dat^2)-((sum(dat))^2)/n)/(n-1))     # sample standard deviation
g1 = m3/(s3)^3                                     # sample skewness g1
b1 = ((n-2)*g1)/(sqrt(n*(n-1)))                    # sqrt(b1), the form the transformation uses
A = b1*sqrt(((n+1)*(n+3))/(6*(n-2)))
B = (3*((n^2)+(27*n)-70)*(n+1)*(n+3))/((n-2)*(n+5)*(n+7)*(n+9))
C = sqrt(2*(B-1))-1
D = sqrt(C)
E = 1/sqrt(log(D))
F = A/(sqrt(2/(C-1)))
z = E*log(F+sqrt((F^2)+1))                         # test statistic
pval = pnorm(z, lower.tail = FALSE)                # right-tailed p-value (Ha: gamma > 0)
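If you want to reuse the calculation, the script can be wrapped in a function. This is a sketch under my own naming: `skew_z_test()` is hypothetical and does not come from any package. It uses the same formulas as the script, with two equivalent substitutions: `sd()` replaces the hand-written standard deviation, and `asinh()` replaces log(F + sqrt(F^2 + 1)). It also adds a guard for n < 8, where the transformation breaks down.

```r
# Hypothetical wrapper around the script above; same formulas, different names.
skew_z_test = function(x, alternative = c("greater", "less", "two.sided")) {
  alternative = match.arg(alternative)
  x = x[!is.na(x)]                      # drop missing values
  n = length(x)
  if (n < 8) stop("need at least 8 non-missing observations")

  m3 = n * sum((x - mean(x))^3) / ((n - 1) * (n - 2))   # third-moment estimate
  g1 = m3 / sd(x)^3                                      # sd() uses the n - 1 divisor, like s3
  sqrt_b1 = (n - 2) * g1 / sqrt(n * (n - 1))

  A = sqrt_b1 * sqrt((n + 1) * (n + 3) / (6 * (n - 2)))
  B = 3 * (n^2 + 27 * n - 70) * (n + 1) * (n + 3) /
      ((n - 2) * (n + 5) * (n + 7) * (n + 9))
  C = sqrt(2 * (B - 1)) - 1
  E = 1 / sqrt(log(sqrt(C)))
  Fq = A / sqrt(2 / (C - 1))            # 'Fq' avoids masking R's F (FALSE) shorthand
  z = E * asinh(Fq)                     # asinh(Fq) equals log(Fq + sqrt(Fq^2 + 1))

  p = switch(alternative,
             greater   = pnorm(z, lower.tail = FALSE),  # Ha: gamma > 0
             less      = pnorm(z),                      # Ha: gamma < 0
             two.sided = 2 * pnorm(-abs(z)))            # Ha: gamma != 0
  list(sqrt_b1 = sqrt_b1, z = z, p.value = p)
}
```

Applied to the song lengths as a numeric vector, `skew_z_test(dat)` should reproduce the script's z and p-value. For an independent cross-check, the CRAN package moments provides `agostino.test()`, which implements D'Agostino's skewness test.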