Task 3: Key Concepts about Using the Chi-Square Test

The chi-square test is used to test the independence of two variables cross-classified in a two-way table. (A chi-square statistic with n degrees of freedom is based on the sum of the squares of n independent normally distributed random variables, each with mean 0 and unit variance.)

For example, suppose we wished to test the hypothesis that osteoporosis treatment status is independent of gender and that we have the following observed frequencies:

 

Frequency of Osteoporosis Treatment Status
          Treated = Yes   Treated = No   Total
Males                30          2,241   2,271
Females             212          2,244   2,456
Total               242          4,485   4,727

In a simple random sample setting (unweighted data), the expected cell frequencies under the null hypothesis that osteoporosis treatment status and gender are independent can be obtained by multiplying the marginal total for the jth column by the proportion of individuals in the ith row (the ith row total divided by the grand total).

For example, the expected number of males being treated for osteoporosis would be 242*(2,271/4,727)=116.3; the expected number of females not being treated for osteoporosis would be 4,485*(2,456/4,727)=2,330.3.
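The calculation above can be checked with a short script. This is a minimal sketch in plain Python for the unweighted case only; the names `observed` and `expected_counts` are illustrative and do not come from the tutorial.

```python
# Observed counts from the table above.
# Rows: Males, Females. Columns: Treated = Yes, Treated = No.
observed = [[30, 2241], [212, 2244]]


def expected_counts(table):
    """Expected cell counts under independence (unweighted data).

    Each cell is column total * (row total / grand total).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [
        [col_total * row_total / grand_total for col_total in col_totals]
        for row_total in row_totals
    ]


expected = expected_counts(observed)
print(round(expected[0][0], 1))  # males, treated: 116.3
print(round(expected[1][1], 1))  # females, not treated: 2330.3
```

The expected counts in each row and each column add up to the same marginal totals as the observed counts, which is a quick way to confirm the arithmetic.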

Thus, if Oij = the observed frequency in the ith row and jth column, where i = 1, 2, …, r and j = 1, 2, …, c (r is the number of rows and c is the number of columns), and

Eij = the expected frequency in the ith row and jth column,

then the formula to test the null hypothesis of independence, using the chi-square statistic, would be:

 

Equation to Test the Null Hypothesis

$$\chi^2 = \sum_{i} \sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

 

This statistic has degrees of freedom equal to the number of rows minus 1, multiplied by the number of columns minus 1.
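As an illustration, the statistic and its degrees of freedom can be computed for the osteoporosis table as follows. This is a sketch for the simple random sample (unweighted) case only and does not account for a complex survey design; the function name `pearson_chi_square` is illustrative.

```python
# Observed counts. Rows: Males, Females. Columns: Treated = Yes, Treated = No.
observed = [[30, 2241], [212, 2244]]


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a two-way table.

    Unweighted Pearson chi-square: the sum over all cells of
    (observed - expected)^2 / expected, with
    expected = row total * column total / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - exp) ** 2 / exp

    # (number of rows - 1) * (number of columns - 1)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


chi2, df = pearson_chi_square(observed)
print(f"chi-square = {chi2:.1f}, df = {df}")
```

For this table the statistic is far above 3.84, the 5% critical value of the chi-square distribution with 1 degree of freedom, so the unweighted test would reject independence of treatment status and gender.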

In a complex sample setting, you would use a statistic similar to the equation above, modified to account for the survey design, with degrees of freedom equal to the number of primary sampling units (PSUs) minus the number of strata containing observations. There are several ways to calculate this statistic using SAS and SUDAAN. In SAS, the surveyfreq procedure is used, which is based on the Rao-Scott chi-square with an adjusted F statistic. In SUDAAN, the crosstab procedure is used, which provides limited chi-square statistics based on the Wald chi-square and does not provide an F-adjusted p-value.

The Cochran-Mantel-Haenszel test, an extension of the Pearson chi-square test, can be applied to stratified two-way tables to test for homogeneity or independence in a non-survey setting. For a complex sample, its analogue can be obtained with the SUDAAN crosstab procedure.

 
