NHANES III Web Tutorial: Keep & Merge Datasets: Merging Datasets

WARNING

The master dataset prepared in the Keep & Merge Data Module contains all sample persons who completed the household interview. In other words, it includes participants who were interviewed only (not examined in the Mobile Examination Center, or MEC) as well as those who were both interviewed and examined. The master dataset also includes both the Youth and Adult Questionnaire datasets, even though you may need only the Adult Questionnaire variables for your analyses.

The reasons for including all sample persons are listed below:

For SUDAAN procedures, do not subset the data in the SAS data step on any group of interest that is not weight-related (e.g., one defined by demographic, laboratory, or examination variables) before executing the SUDAAN procedure. Instead, it is highly recommended that you define the subpopulation with the subpopn statement in the SUDAAN procedure itself. SUDAAN procedures require that all observations in the dataset being read into a procedure share the same sample weight variable.
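As a minimal sketch of the subpopn approach, assuming SAS-callable SUDAAN, the example below keeps every interviewed sample person in the input and restricts the analysis inside the procedure. The dataset name master and the variable analysis_var are hypothetical placeholders. The design and demographic variables (SDPSTRA6, SDPPSU6, WTPFQX6, HSSEX, HSAGEIR) are assumed to be the standard NHANES III names and should be checked against the codebook.

```sas
/* Sketch only: MASTER and ANALYSIS_VAR are hypothetical names.        */
/* SUDAAN expects the input sorted by the NEST variables.              */
proc sort data=master;
  by sdpstra6 sdppsu6;
run;

proc descript data=master design=wr;
  nest sdpstra6 sdppsu6;           /* assumed pseudo-stratum and pseudo-PSU */
  weight wtpfqx6;                  /* assumed interview sample weight       */
  /* Restrict the analysis here, not in a DATA step, so the full     */
  /* sample design is still available for variance estimation.      */
  subpopn hssex = 1 and hsageir >= 25 / name = "Males aged 25 and over";
  var analysis_var;                /* hypothetical interview item           */
  print nsum mean semean;
run;
```

The interview weight is used here because every record in the master dataset has one. An analysis of examination items would use the corresponding examination weight instead.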

For SAS Survey Procedures, there is no subpopn statement. Instead, most SAS Survey procedures use a domain statement for domain analysis, also known as subgroup analysis or subpopulation analysis.
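A comparable sketch for the SAS Survey procedures first flags the subgroup in a data step without deleting any records, then passes the flag to the domain statement. As before, master, master_flagged, male25plus, and analysis_var are hypothetical names, and the NHANES III variable names are assumptions to verify against the codebook.

```sas
/* Sketch only: flag the subgroup, but keep every observation. */
data master_flagged;
  set master;
  /* 1 = in the subgroup of interest, 0 = everyone else */
  male25plus = (hssex = 1 and hsageir >= 25);
run;

proc surveymeans data=master_flagged mean stderr nobs;
  strata  sdpstra6;        /* assumed pseudo-stratum        */
  cluster sdppsu6;         /* assumed pseudo-PSU            */
  weight  wtpfqx6;         /* assumed interview weight      */
  domain  male25plus;      /* subgroup (domain) analysis    */
  var     analysis_var;    /* hypothetical interview item   */
run;
```

The output reports estimates for each level of the flag. The rows for male25plus = 1 are the subgroup results, computed with the full sample design still available to the procedure.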

One important reason not to pre-select your study population (e.g., males aged 25 and over) in the SAS data step is that software such as SUDAAN will then over-estimate the variance associated with any statistical tests you calculate, because the full sample size of the survey is no longer taken into account. This may yield p-values that are larger than they should be.

It is worth pointing out that some analysts select the study population in the SAS data step and save a SAS data file containing only the observations that meet the selection criteria, for instance, only those who completed MEC exams. Besides the reasons stated above, pre-selection has a further disadvantage: you can no longer analyze the household interview items using interview weights, such as examining interview questions on blood pressure by demographic variables. You also cannot examine non-response rates for the MEC items (i.e., the proportion of sample persons who were interviewed but not examined).
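To illustrate what the full master dataset makes possible, the sketch below tabulates examination status by sex using interview weights. It assumes that DMPSTAT is the NHANES III interview/examination status variable (with one code for interviewed only and others for examined); confirm the coding in the codebook before use. The dataset name master is hypothetical.

```sas
/* Sketch only: weighted distribution of examination status by sex. */
/* This requires the interviewed-only records to still be present.  */
proc surveyfreq data=master;
  strata  sdpstra6;              /* assumed pseudo-stratum     */
  cluster sdppsu6;               /* assumed pseudo-PSU         */
  weight  wtpfqx6;               /* assumed interview weight   */
  tables  hssex * dmpstat / row; /* row percentages within sex */
run;
```

Had the file been limited to MEC-examined persons in the data step, the interviewed-only category would be missing and this comparison could not be made.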

For these reasons, in this tutorial example you were instructed to include all sample persons in the dataset at the data step.

Key Concepts About Merging Data in NHANES III