Continuous NHANES Web Tutorial: Variance Estimation: Degrees of Freedom and Confidence Limits

Key Concepts About Degrees of Freedom for Performing Statistical Tests and Calculating Confidence Limits

Degrees of Freedom and NHANES Subgroups

Estimates are often calculated for various subgroups of interest within the total NHANES population. When the number of first-stage sampling units (primary sampling units, or PSUs) is small, the z-statistic should be replaced by a value from a t-distribution when computing confidence limits for these estimates (see SUDAAN 1995, as cited in the NHANES III analytic guidelines).

To obtain the correct critical value from the t-distribution for a selected level of significance, you must first calculate the proper degrees of freedom for the estimate.
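To show why the degrees of freedom matter, here is a minimal Python sketch that uses only the standard library. It finds the two-sided t critical value for a given number of degrees of freedom and uses it to form a Wald-type interval, estimate ± t × SE. The quantile is found by integrating the Student's t density with Simpson's rule and then bisecting. That technique was chosen here only to avoid a SciPy dependency; a statistics library's t quantile function would normally be used instead. The function names and the example estimate and standard error are hypothetical.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with `df` degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps_per_unit: int = 200) -> float:
    """P(T <= x) for x >= 0, integrating the density from 0 to x with Simpson's rule."""
    n = max(2, int(x * steps_per_unit))
    if n % 2:  # Simpson's rule needs an even number of intervals
        n += 1
    h = x / n
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3  # symmetric density, so P(T <= 0) = 0.5


def t_critical(df: int, level: float = 0.95) -> float:
    """Two-sided critical value: the (1 + level) / 2 quantile of t with `df` df."""
    p = (1 + level) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the quantile
        lo, hi = hi, 2 * hi
    for _ in range(50):  # bisection on the monotone CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def wald_ci(estimate: float, se: float, df: int, level: float = 0.95):
    """Wald-type interval: estimate +/- t * SE with the supplied degrees of freedom."""
    t = t_critical(df, level)
    return estimate - t * se, estimate + t * se


if __name__ == "__main__":
    # Hypothetical estimate and standard error; only the degrees of freedom change.
    for df in (2, 12, 30):
        low, high = wald_ci(estimate=25.0, se=1.5, df=df)
        print(f"df={df:>2}  t={t_critical(df):.3f}  95% CI=({low:.2f}, {high:.2f})")
    print(f"z for comparison: {NormalDist().inv_cdf(0.975):.3f}")
```

With fewer degrees of freedom the critical value grows well above the normal-theory 1.96, so the interval widens. This is the reason the z-statistic is replaced by a t value when few PSUs contribute to an estimate.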

In addition, it is important to examine the number of degrees of freedom on which a standard error estimate is based. Continuing research on the stability of variance estimates in NHANES subdomains has been published. It shows that standard error estimates based on small numbers of paired PSUs (i.e., few degrees of freedom) are prone to instability.

The reliability of the estimated standard error depends on its degrees of freedom. Reliability here is measured by the relative standard error of the standard error, that is, (standard error of the standard error of the estimate / standard error of the estimate) × 100. As the number of degrees of freedom increases, this relative standard error decreases and the reliability of the estimate increases. The NHANES guidelines recommend a relative standard error of at most 30%, which corresponds to at least 12 degrees of freedom.

Degrees of freedom are calculated by subtracting the number of strata from the number of primary sampling units (PSUs) represented in each subgroup you are analyzing, as shown in the equation below.

Equation for Degrees of Freedom

degrees of freedom = (number of PSUs) - (number of strata)
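The following minimal Python sketch applies the equation separately to each subgroup. It also flags subgroups that fall below the 12-degrees-of-freedom guideline mentioned earlier. The record layout (subgroup label, stratum, PSU), the function names, and the toy data are hypothetical. PSU codes are assumed to be numbered within strata, so each PSU is identified by its (stratum, PSU) pair.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (subgroup label, stratum id, PSU id). Pass only
# records with a valid value for the analysis variable. PSU ids are assumed to
# be numbered within strata, so a PSU is identified by (stratum, psu).
Record = Tuple[str, int, int]

MIN_DF = 12  # guideline cited above: relative SE <= 30% needs at least 12 df


def degrees_of_freedom(records: Iterable[Record]) -> Dict[str, int]:
    """Number of PSUs minus number of strata, counted separately per subgroup."""
    psus: Dict[str, set] = defaultdict(set)    # subgroup -> {(stratum, psu)}
    strata: Dict[str, set] = defaultdict(set)  # subgroup -> {stratum}
    for group, stratum, psu in records:
        psus[group].add((stratum, psu))
        strata[group].add(stratum)
    return {group: len(psus[group]) - len(strata[group]) for group in psus}


def below_min_df(df_by_group: Dict[str, int], min_df: int = MIN_DF) -> Dict[str, bool]:
    """True for subgroups whose standard error would rest on fewer than min_df df."""
    return {group: df < min_df for group, df in df_by_group.items()}


if __name__ == "__main__":
    # Toy design: 3 strata x 2 PSUs. Group "B" has nobody in PSU 2 of stratum 2
    # and nobody at all in stratum 3.
    example = [
        ("A", 1, 1), ("A", 1, 2), ("A", 2, 1), ("A", 2, 2), ("A", 3, 1), ("A", 3, 2),
        ("B", 1, 1), ("B", 1, 2), ("B", 2, 1),
    ]
    df = degrees_of_freedom(example)
    print(df, below_min_df(df))
```

In the toy data, group A is present in all 6 PSUs of 3 strata, giving 6 - 3 = 3 degrees of freedom. Group B reaches only 3 PSUs in 2 strata, giving 3 - 2 = 1. Both are far below the guideline because the design is deliberately tiny.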

Differences in Degrees of Freedom for Subgroups in SUDAAN and SAS Survey Procedures

For both SUDAAN and SAS Survey procedures, the degrees of freedom are calculated in the same way when looking at the entire sample population or in subgroups where all strata and PSUs are represented.

However, when you analyze data on a subgroup of sample persons who may not be represented in all strata and PSUs (e.g., Mexican Americans), the degrees of freedom reported in the output may differ. SUDAAN correctly counts the number of PSUs and strata with at least one valid observation for each cell of the requested table. In contrast, SAS 9.1 Survey procedures, such as proc surveymeans, compute the degrees of freedom as the number of clusters (PSUs) in the non-empty strata minus the number of non-empty strata. This means that if your subgroup has empty strata (no persons in the subgroup in either PSU), the reported number of degrees of freedom will be too large. SAS is currently working on correcting this problem. For more information on correctly calculating degrees of freedom with SAS 9.1 Survey procedures, see the following two SAS 9.1 macros.
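This minimal Python sketch contrasts the two counting rules described above on a hypothetical design of 4 strata with 2 PSUs each. The SAS-style function reads "non-empty strata" as strata that are non-empty in the full input data set. That reading is consistent with the statement at the end of this section that these intervals use the degrees of freedom of the entire NHANES sample. The reading, the record layout, and the function names are all assumptions for illustration. This is not SAS or SUDAAN code.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (stratum id, PSU id, in_subgroup), where
# in_subgroup is True when the person belongs to the subgroup and has a valid
# value for the analysis variable. PSU ids are assumed numbered within strata.
Record = Tuple[int, int, bool]


def _psus_minus_strata(pairs: List[Tuple[int, int]]) -> int:
    """Distinct (stratum, PSU) pairs minus distinct strata."""
    return len(set(pairs)) - len({stratum for stratum, _ in pairs})


def df_subgroup_cells(records: Iterable[Record]) -> int:
    """SUDAAN-style count described above: only strata and PSUs with at least
    one valid subgroup observation contribute."""
    return _psus_minus_strata([(s, p) for s, p, in_sub in records if in_sub])


def df_full_design(records: Iterable[Record]) -> int:
    """SAS 9.1-style count under the reading assumed here: every PSU in a stratum
    that is non-empty in the full input data counts, even if the subgroup has
    no one in it."""
    return _psus_minus_strata([(s, p) for s, p, _ in records])


# Toy design: 4 strata x 2 PSUs, one record per PSU. The subgroup appears in
# both PSUs of stratum 1, only PSU 1 of stratum 2, and not at all in strata 3-4.
example_records = [
    (s, p, s == 1 or (s == 2 and p == 1)) for s in range(1, 5) for p in (1, 2)
]

if __name__ == "__main__":
    print("subgroup count:", df_subgroup_cells(example_records))
    print("full-design count:", df_full_design(example_records))
```

On the toy data, the subgroup-based count gives 3 PSUs - 2 strata = 1 degree of freedom. The full-design count gives 8 - 4 = 4. When the subgroup is present in every stratum and PSU, the two counts agree, matching the earlier point about the entire sample.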

%SMSUB macro provides additional capabilities for SAS 9.1 proc surveymeans

http://support.sas.com/ctx/samples/index.jsp?sid=541

PURPOSE:

Provides additional subgroup capabilities beyond those provided by the DOMAIN statement in proc surveymeans. These include:

presenting subgroup and overall estimates in one table (TABLES=),

computing ratio estimates for subgroups (RATIO=),

computing contrasts for means, totals, and ratios (CONTRAST=),

restricting table requests to a subpopulation (SUBPOP=), and

incorporating missing values into the variance computations.

%SREGSUB macro provides additional capabilities for SAS 9.1 proc surveyreg

http://support.sas.com/ctx/samples/index.jsp?sid=483

PURPOSE:

Provides linear regression capabilities currently not available in proc surveyreg. These include:

restricting the regression analysis to a subpopulation (SUBPOP=), and

incorporating missing values into the variance computations.

NOMCAR option provides additional capabilities in SAS 9.2

NOMCAR requests that the procedure treat missing values in the variance computation as not missing completely at random (NOMCAR) for Taylor series variance estimation. When you specify the NOMCAR option, PROC SURVEYREG computes variance estimates by analyzing the nonmissing values as a domain or subpopulation, where the entire population includes both nonmissing and missing domains. See the Missing Values section of the PROC SURVEYREG documentation for more details.

By default, PROC SURVEYREG completely excludes an observation from analysis if that observation has a missing value, unless you specify the MISSING option. Note that the NOMCAR option has no effect on a classification variable when you specify the MISSING option, which treats missing values as a valid nonmissing level.

The NOMCAR option applies only to Taylor series variance estimation. The replication methods, which you request with the VARMETHOD=BRR and VARMETHOD=JACKKNIFE options, do not use the NOMCAR option.

VADJUST option in the MODEL statement provides additional capabilities in SAS 9.2

VADJUST=DF | NONE specifies whether to use a degrees-of-freedom adjustment in the computation of the matrix for the variance estimation. If you do not specify the VADJUST= option, PROC SURVEYREG uses the degrees-of-freedom adjustment by default, which is equivalent to the VADJUST=DF option. If you do not want to use this variance adjustment, specify the VADJUST=NONE option.

SOURCE: SAS 9.2 Documentation SAS/STAT(R) 9.2 User's Guide

Both SAS Survey procedures (proc surveymeans) and SUDAAN version 9.1 (proc descript) produce 95% confidence intervals (CIs). These CIs are calculated using the Wald method, which is based on a t-statistic for the number of degrees of freedom in the entire NHANES sample. However, they do not correct for the reduction in degrees of freedom in subdomains where not all strata and PSUs are represented. How to produce correct 95% CIs is discussed in the next task, How to Perform Statistical Tests and Calculate Confidence Limits with Degrees of Freedom.