Last week we did a test for population skew, which represents the third moment about the mean. Now we’re going to move onto the fourth moment by doing a single-sample test to evaluate population kurtosis!
When Would You Use It?
The test of population kurtosis test is a parametric test used in a single sample situation to assess if a sample originates from a population that is mesokurtic (as opposed to leptokurtic or platykurtic).
What Type of Data?
The test for kurtosis requires interval or ratio data.
Step 1: Formulate the null and alternative hypotheses. The null hypothesis claims that the kurtosis parameter γ2 in the population is equal to 0, which corresponds to a mesokurtic distribution; the alternative hypothesis claims otherwise (the population kurtosis parameter is greater than, less than, or not equal to the value specified in the null hypothesis, suggesting a leptokurtic or platykurtic distribution).
Step 2: Compute the test statistic value, a z-score. The test statistic requires several calculations to be obtained. The calculations are as follows:
Step 3: Obtain the p-value associated with the calculated chi-square. The p-value indicates the probability of observing a test statistic as extreme or more extreme than the observed test statistic, under the assumption that the null hypothesis is true.
Step 4: Determine the conclusion. If the p-value is larger than the prespecified α-level, fail to reject the null hypothesis (that is, retain the claim that the population distribution is mesokurtic). If the p-value is smaller than the prespecified α-level, reject the null hypothesis in favor of the alternative.
The data for this example come from my n = 365 song downloads from 2010. I want to create a hypothesis test regarding the kurtosis of the distribution of song lengths (in seconds).
H0: γ2 = 0
Ha: γ2 0
Set α = 0.05.
Since our p-value is basically zero, it is smaller than our alpha-level, and we reject H0 and claim that the population is not mesokurtic (γ2 ≠ 0).
Example in R
dat = read.table('clipboard',header=T) #'dat' is the name of the imported raw data n = length(dat) xbar = mean(dat) k = ((((sum((dat-xbar)^4))*n*(n+1))/(n-1))-(3*((sum((dat-xbar)^2))^2)))/((n-2)*(n-3)) s = sqrt(((sum(dat^2))-(((sum(dat))^2)/n))/(n-1)) g2 = k/(s^4) A = (24*n*(n-2)*(n-3))/(((n+1)^2)*(n+3)*(n+5)) B = (g2*(n-2)*(n-3))/((n-1)*(n+1)*sqrt(A)) C = ((6*((n^2)-(5*n)+2))/((n+7)*(n+9)))*sqrt((6*(n+3)*(n+5))/(n*(n-2)*(n-3))) D = 6+(8/C)*((2/C)+sqrt(1+(4/(C^2)))) E = (1-(2/D))/(1+(B*sqrt(2/(D-4)))) z = (1-(2/(9*D))-((E)^(1/3)))/(sqrt(2/(9*D))) #test statistic pval = (1-pnorm(z))*2 #p-value