NHANES Dietary Web Tutorial: Calculate Variance, Analyze Subgroups, and Calculate Degrees of Freedom: Analyze Subgroups

Task 2: Key Concepts about Analyzing Subgroups

Sometimes you may wish to analyze only a certain demographic subgroup of interest, such as a particular age range or gender, or only those survey participants who were tested for a particular diet-related lab analyte, such as serum carotenoids.

As a general rule, when working in any survey analysis software package, such as SUDAAN or SAS, the dataset used as input to all procedures should contain all individuals in the sample with non-missing, non-zero values of the appropriate sample weight variable. That is, you should use the entire dataset (instead of creating a smaller subset of the data) and then use coding statements to select the subpopulation of interest. Although estimates of descriptive statistics might be the same if you used a subset of the entire file, the estimated standard errors would not be calculated appropriately, because the variance calculation relies on every stratum and primary sampling unit in the design, including those that contain no members of the subgroup. This is particularly true if the subset is based on a characteristic measured in the survey. For example, it would not be appropriate to create a smaller data file containing only those who are diabetic or those who are hypertensive.
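The effect described above can be illustrated with a small Python sketch. It is not NHANES code and does not reproduce what SUDAAN or SAS do internally; it assumes a with-replacement Taylor linearization variance for a weighted domain mean, and the toy records, the "diabetic" flag, and the function name domain_mean_and_se are all hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy sample: (stratum, psu, weight, intake, is_diabetic).
# PSU 3 in each stratum happens to contain no diabetic participants.
SAMPLE = [
    (1, 1, 1.0, 10.0, True), (1, 1, 1.0, 12.0, False),
    (1, 2, 1.0, 14.0, True), (1, 2, 1.0, 9.0, False),
    (1, 3, 1.0, 11.0, False), (1, 3, 1.0, 13.0, False),
    (2, 1, 1.0, 20.0, True), (2, 1, 1.0, 15.0, False),
    (2, 2, 1.0, 16.0, True), (2, 2, 1.0, 18.0, False),
    (2, 3, 1.0, 17.0, False), (2, 3, 1.0, 19.0, False),
]


def domain_mean_and_se(rows):
    """Weighted mean of intake among diabetics, with a linearized SE.

    Assumption: PSUs are treated as sampled with replacement within strata.
    Every PSU present in `rows` counts toward its stratum, even when it
    contributes no domain members.
    """
    members = [(w, y) for _, _, w, y, d in rows if d]
    weight_sum = sum(w for w, _ in members)
    mean = sum(w * y for w, y in members) / weight_sum

    # Sum the linearized values within each PSU; non-members contribute 0,
    # but adding 0 still registers their PSU in the stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y, d in rows:
        z = w * (y - mean) / weight_sum if d else 0.0
        psu_totals[stratum][psu] += z

    # Between-PSU variance within each stratum, summed over strata.
    variance = 0.0
    for totals in psu_totals.values():
        n = len(totals)
        z_bar = sum(totals.values()) / n
        variance += n / (n - 1) * sum((t - z_bar) ** 2 for t in totals.values())
    return mean, sqrt(variance)


# Recommended: pass the full file and let the domain flag select the subgroup.
mean_full, se_full = domain_mean_and_se(SAMPLE)

# Not recommended: physically subset first, which silently drops PSU 3.
subset = [row for row in SAMPLE if row[4]]
mean_sub, se_sub = domain_mean_and_se(subset)

print(f"full file: mean={mean_full:.2f}  se={se_full:.4f}")
print(f"subset   : mean={mean_sub:.2f}  se={se_sub:.4f}")
```

In this toy example both approaches give the same mean (15.0), but the standard error is about 1.62 from the full file and about 1.41 from the subset, because the subset no longer contains the PSUs that have no diabetic participants. The direction and size of the difference depend on the data; the sketch only shows that the two calculations are not equivalent.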

The only time that you can create separate datasets for smaller subgroups is when those subgroups are based on specific values of the variables used in constructing the sample weight (e.g., gender, race/ethnicity, age). Note that if a smaller dataset is created based on these demographic characteristics, the standard errors may not differ greatly from those obtained from the full dataset. However, as a general rule, the full dataset should be used, with the subgroups defined in the following manner:

In SUDAAN, it is safest to define a subset of your sample population using the SUBPOPN statement in the procedure itself.
In SAS, the SURVEYMEANS and SURVEYFREQ procedures have special syntax that can be used to conduct domain analyses (for example, the DOMAIN statement in SURVEYMEANS). With other SAS survey procedures, special SAS-provided macros may be used to perform subgroup analyses, but these analyses are beyond the scope of this course. SAS does not have a SUBPOPN statement like the one used in SUDAAN.
