Key Concepts About Chi-Square Test

The chi-square test is used to test the independence of two variables cross-classified in a two-way table. (A chi-square random variable with n degrees of freedom is the sum of the squares of n independent normally distributed random variables, each with mean 0 and unit variance.)
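The parenthetical definition can be illustrated with a small simulation. This is only a sketch: the function name, the seed, the choice n = 3, and the number of draws are all illustrative assumptions, not part of the original tutorial. It relies on the standard fact that a chi-square variable with n degrees of freedom has mean n.

```python
import random


def sum_of_squared_normals(n, rng):
    """Draw n independent standard normal values and return the sum of their squares."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))


rng = random.Random(0)   # fixed seed so the run is repeatable (arbitrary choice)
n = 3                    # degrees of freedom (illustrative)
draws = [sum_of_squared_normals(n, rng) for _ in range(20000)]

# The average of many such draws should be close to n.
sample_mean = sum(draws) / len(draws)
print(f"sample mean with n={n}: {sample_mean:.2f}")
```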

For example, suppose we wish to test the hypothesis that blood pressure cuff size is independent of gender, and that cross-classifying blood pressure cuff size by gender gives the following observed frequencies.

 

                      Blood pressure cuff size
                   1       2       3       4     Total
Men               63    1387    2409     453      4312
Women            222    2065    2002     493      4782
Both genders     285    3452    4411     946      9094

In a simple random sample setting (unweighted data), the expected cell frequencies under the null hypothesis that blood pressure cuff size and gender are independent are obtained by multiplying the marginal total for the jth column by the proportion of individuals in the ith row (the ith row total divided by the overall total).

 

For example, the expected frequency of blood pressure cuff size 1 for men would be 285 × (4312/9094) ≈ 135; the expected frequency of blood pressure cuff size 4 for women would be 946 × (4782/9094) ≈ 497.
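The same arithmetic can be applied to every cell of the table at once. The sketch below is illustrative only: the dictionary layout and the function name `expected_frequencies` are assumptions made for this example, and it applies to the unweighted (simple random sample) case described above.

```python
# Observed frequencies by gender for blood pressure cuff sizes 1-4 (from the table above).
observed = {
    "Men": [63, 1387, 2409, 453],
    "Women": [222, 2065, 2002, 493],
}


def expected_frequencies(table):
    """Expected count for each cell: column total * (row total / grand total)."""
    rows = list(table)
    row_totals = {r: sum(table[r]) for r in rows}
    n_cols = len(table[rows[0]])
    col_totals = [sum(table[r][j] for r in rows) for j in range(n_cols)]
    grand_total = sum(row_totals.values())
    return {
        r: [col_totals[j] * row_totals[r] / grand_total for j in range(n_cols)]
        for r in rows
    }


expected = expected_frequencies(observed)
print(round(expected["Men"][0]))    # cuff size 1, men   -> 135
print(round(expected["Women"][3]))  # cuff size 4, women -> 497
```

Within each column the expected counts add up to the observed column total, which is a quick check that the calculation was set up correctly.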

 

Thus, if Oij = the observed frequency in the ith row and jth column, where i = 1, 2, …, r and j = 1, 2, …, c for a table with r rows and c columns, and

Eij = the expected frequency in the ith row and jth column,

then the formula to test the null hypothesis of independence, using the chi-square statistic, would be


 

\chi^2 = \sum_{i} \sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \qquad (1)

 

This statistic has degrees of freedom equal to the number of rows minus 1, multiplied by the number of columns minus 1.
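As a rough illustration, the statistic and its degrees of freedom can be computed for the cuff size table. This sketch assumes equation (1) is the usual Pearson form, the sum over all cells of (O − E)² / E; the function name `pearson_chi_square` is made up for this example.

```python
# Observed frequencies: rows are men and women, columns are cuff sizes 1-4.
observed = [
    [63, 1387, 2409, 453],    # men
    [222, 2065, 2002, 493],   # women
]


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for an unweighted two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, o_ij in enumerate(row):
            # Expected count under independence.
            e_ij = col_totals[j] * row_totals[i] / grand_total
            statistic += (o_ij - e_ij) ** 2 / e_ij

    # (number of rows - 1) * (number of columns - 1)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


statistic, df = pearson_chi_square(observed)
print(f"chi-square = {statistic:.1f}, degrees of freedom = {df}")
```

For this table of 2 rows and 4 columns the degrees of freedom are (2 − 1) × (4 − 1) = 3. The calculation treats the counts as a simple random sample, so it does not account for survey design.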

In a complex sample setting, you would use a statistic similar to equation (1) above, modified to account for the survey design, with degrees of freedom equal to the number of primary sampling units (PSUs) minus the number of strata containing observations. This statistic can be obtained through SAS proc surveyfreq (the CHISQ option, which is based on the Rao-Scott chi-square with an adjusted F statistic). The analogous procedure in SUDAAN version 9.0 (proc crosstab) provides limited chi-square statistics based on the Wald chi-square and does not provide an F-adjusted p-value. However, SUDAAN regression models do provide F-adjusted chi-square statistics, which are recommended for analyzing NHANES data.

The Cochran-Mantel-Haenszel test, an extension of the Pearson chi-square test, can be applied to stratified two-way tables to test for homogeneity or independence in a non-survey setting. For a complex sample, its analogue can be obtained in SUDAAN proc crosstab (cmh).

 

References:

Agresti A. An Introduction to Categorical Data Analysis. Wiley Series in Probability and Statistics. New York: Wiley; 1996.

 
