# Week 20: The Chi-Square Test of Independence

Hello, people! Today we’re going to talk about another chi-square test: the chi-square test of independence!

When Would You Use It?
The chi-square test of independence is a nonparametric test used to determine if the two variables represented in a contingency table are independent of one another.

What Type of Data?
The chi-square test of independence requires categorical or nominal data.

Test Assumptions

• The data represent a random sample of independent observations.
• The expected frequency of each cell in the contingency table is at least 5.

Test Process
Step 1: Formulate the null and alternative hypotheses. The data appropriate for this type of test are usually summarized in an r x c table, where r is the number of rows of the table and c is the number of columns of the table (see the example below to get a better understanding of this). The null hypothesis claims that, in the population from which the sample was drawn, the observed frequency of each cell in the table is equal to the expected frequency of that cell. The alternative hypothesis claims that for at least one cell, the observed and expected frequencies are different.

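Step 2: Compute the test statistic. The test statistic here is, unsurprisingly, a chi-square value. To compute this value, use the following equation:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where Oij is the observed count in the ijth cell (row i, column j).

Eij, the expected cell count for the ijth cell, is calculated as follows:

$$E_{ij} = \frac{(\text{total of row } i)(\text{total of column } j)}{n}$$

where n is the total sample size.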
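To make Step 2 concrete, here is a minimal Python sketch of the calculation. It assumes the standard formulas: each expected count is (row total × column total) / n, and the chi-square value is the sum of (observed − expected)² / expected over all cells. The function names and the 2 x 2 table are made up purely for illustration; they are not the Nobel Prize data.

```python
def expected_counts(observed):
    """Expected count for each cell: (row total * column total) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 2 table of observed counts (illustration only).
observed = [[10, 20],
            [30, 40]]

expected = expected_counts(observed)
chi_square = chi_square_statistic(observed)

print(expected)     # [[12.0, 18.0], [28.0, 42.0]]
print(chi_square)   # 50/63, roughly 0.794
```

The expected-count table is also handy for checking the assumption that every expected frequency is at least 5.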

Step 3: Obtain the critical value. The critical value can be obtained using a chi-square table (such as this one here). Find the column corresponding to your specified alpha-level, then find the row corresponding to your degrees of freedom. The degrees of freedom is calculated as df = (r – 1)(c – 1), where r is the number of rows in the table and c is the number of columns in the table. Compare your obtained chi-square value to the value at the intersection of your selected alpha-level and degrees of freedom.
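If you don't have a table handy, the critical value can also be approximated numerically. The sketch below is one possible way to do it using only the standard library: it evaluates the chi-square upper-tail probability with a series for the incomplete gamma function, then searches by bisection for the value whose upper-tail area equals alpha. The function names are hypothetical, and this is an illustration rather than a replacement for a proper statistics library.

```python
import math


def degrees_of_freedom(r, c):
    """df = (r - 1)(c - 1) for an r x c contingency table."""
    return (r - 1) * (c - 1)


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower


def chi2_critical(alpha, df):
    """Value whose upper-tail area is alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:   # widen the bracket until it contains the answer
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


df = degrees_of_freedom(5, 2)              # 4
print(round(chi2_critical(0.05, df), 3))   # 9.488
```

For alpha = 0.05 and df = 4, this agrees with the usual table value to three decimal places.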

Step 4: Determine the conclusion. If your test statistic is equal to or greater than the table value, reject the null hypothesis. If your test statistic is smaller than the table value, fail to reject the null (that is, there is not enough evidence to conclude that the observed cell frequencies differ from the expected cell frequencies).

Example
The example I’ll use today involves looking at some Nobel Prize data. Specifically, I want to see if the category of Nobel Prize (chemistry, physics, etc.) is independent of gender. The data come from here. The sample size I used was n = 761; I omitted organizations that had won the award and just looked at individuals. I also chose to omit the “Economics” category, as that had been the most recently added and did not have a lot of observations for either gender yet. Set α = 0.05.

H0: Nobel Prize category is independent of gender
Ha: Nobel Prize category is not independent of gender

Observed counts are in the following table:

The expected cell counts, as calculated by the Eij formula above, are displayed in the following table:

Calculating the chi-square value gives us χ² = 32.894.

The degrees of freedom for this test is df = (5 – 1)(2 – 1) = 4, which gives us a critical chi-square value of 9.488 by the table. Since our calculated chi-square value, 32.894, is larger than the table value, we reject the null and conclude that prize category and gender are not independent.
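As a quick check on that conclusion, the short Python sketch below redoes the final comparison using only the numbers reported above (the statistic 32.894, the table value 9.488, and the 5 x 2 table shape). It also computes a p-value, which the post itself does not report; that part relies on the fact that for df = 4 the chi-square upper-tail probability has the closed form exp(−x/2)(1 + x/2). The variable names are just for illustration.

```python
import math

chi_square = 32.894    # test statistic reported in the text
critical = 9.488       # table value for alpha = 0.05 and df = 4
alpha = 0.05
rows, cols = 5, 2      # five prize categories, two genders

df = (rows - 1) * (cols - 1)

# Closed form that holds only for df = 4:
# P(X >= x) = exp(-x / 2) * (1 + x / 2)
p_value = math.exp(-chi_square / 2) * (1 + chi_square / 2)

reject_null = chi_square >= critical

print(df)            # 4
print(reject_null)   # True
print(p_value)       # far below alpha = 0.05
```

Both routes point the same way: the statistic exceeds the critical value, and the p-value is well under 0.05.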