Key Concepts About Appending Data in NHANES

Because most of the environmental chemical data are collected from subsamples of the total examined sample within specific age groups, the sample size for a single two-year cycle may be too small to produce statistically reliable estimates. The NHANES sample design makes it possible to combine data from multiple survey cycles to increase the sample size for an analysis. Increasing the sample size improves the statistical power, reliability, and stability of estimates for population subdomains, such as race-ethnic groups.

The process of combining data from multiple survey cycles or years is called appending. It is similar to adding rows to a table.

Always check the contents of each data file before appending, because some environmental chemicals were not measured in every survey cycle. If the variables that were added or dropped between cycles are not relevant to your analysis, you can simply append the data files as described and use only the variables of interest. The extra variables will not affect your analysis if you do not include them in your dataset.
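The check-then-append step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an NHANES tool: the records are made up, CHEM_A, CHEM_B, and CHEM_C are hypothetical chemical variables, and the `cycle` field is a bookkeeping column added for the example.

```python
# Made-up records standing in for the data files of two survey cycles.
cycle_a = [
    {"SEQN": 1001, "CHEM_A": 1.2, "CHEM_B": 0.4},
    {"SEQN": 1002, "CHEM_A": 0.8, "CHEM_B": 0.9},
]
cycle_b = [
    {"SEQN": 2001, "CHEM_A": 1.5, "CHEM_C": 3.1},
    {"SEQN": 2002, "CHEM_A": 0.7, "CHEM_C": 2.2},
]


def variables(records):
    """Return the set of variable names found in a list of records."""
    names = set()
    for row in records:
        names.update(row)
    return names


def append_cycles(labelled_files, keep):
    """Stack the rows of several cycles, keeping only the variables in `keep`."""
    combined = []
    for label, records in labelled_files:
        for row in records:
            new_row = {name: row.get(name) for name in keep}
            new_row["cycle"] = label  # bookkeeping column, not an NHANES variable
            combined.append(new_row)
    return combined


# Step 1: check the contents of each file before appending.
only_in_a = variables(cycle_a) - variables(cycle_b)  # not measured in cycle B
only_in_b = variables(cycle_b) - variables(cycle_a)  # not measured in cycle A

# Step 2: append, keeping SEQN and the variables of interest.
combined = append_cycles([("A", cycle_a), ("B", cycle_b)], keep=["SEQN", "CHEM_A"])
```

Here CHEM_B and CHEM_C each appear in only one cycle. Because neither is in `keep`, they are simply left out of the combined dataset.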

 

Warning: When extracting variables from an NHANES data file or appending NHANES data, you should always include the SEQN variable, which is the unique identifier for each participant in NHANES. Failing to include this variable in your dataset will lead to problems when you sort or merge your data files at a later time.
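One way to guard against dropping SEQN is to build it into the extraction step itself. The sketch below assumes records held as Python dictionaries. The `extract` helper, the CHEM_A, CHEM_B, and AGE variables, and all values are hypothetical.

```python
def extract(records, wanted, id_var="SEQN"):
    """Keep only the `wanted` variables, always including the identifier."""
    names = [id_var] + [name for name in wanted if name != id_var]
    return [{name: row[name] for name in names} for row in records]


# Made-up laboratory and demographic records for the same participants.
lab = [
    {"SEQN": 1002, "CHEM_A": 0.8, "CHEM_B": 0.9},
    {"SEQN": 1001, "CHEM_A": 1.2, "CHEM_B": 0.4},
]
demo = [
    {"SEQN": 1001, "AGE": 61},
    {"SEQN": 1002, "AGE": 34},
]

# SEQN is kept even though only CHEM_A was requested.
lab_subset = extract(lab, ["CHEM_A"])

# Because SEQN was kept, the subset can later be sorted and merged by it.
demo_by_id = {row["SEQN"]: row for row in demo}
merged = [
    {**row, **demo_by_id.get(row["SEQN"], {})}
    for row in sorted(lab_subset, key=lambda r: r["SEQN"])
]
```

Without SEQN in `lab_subset`, neither the sort nor the lookup in `demo_by_id` would be possible.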

 

When appending the data, it is convenient to create a new variable for the multi-year sample weight. This is done after combining the data from the different survey cycles, by rescaling the existing weight variable from each cycle (for further detail, see Module 11 in the Continuous NHANES Tutorial). The new weight variable simply rescales the values of the weight variables from the separate cycles so that the sum of the new weights matches the survey population size at the midpoint of the combined survey period. Estimates made with the new weight variable will therefore be representative of the population at that midpoint.
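As a rough sketch of the rescaling idea, the example below divides each record's two-year weight by the number of cycles combined, so that the new weights sum to about one population size rather than several. The divide-by-number-of-cycles rule is an assumption made for illustration. The rule that applies to a particular set of cycles is given in Module 11 of the Continuous NHANES Tutorial. WT2YR, WTMULTI, and the weight values are hypothetical.

```python
def add_multiyear_weight(combined, n_cycles, weight_var="WT2YR", new_var="WTMULTI"):
    """Return copies of the rows with a rescaled multi-year weight added.

    Assumption for illustration: every cycle covers the same number of
    years, so each two-year weight is divided by the number of cycles.
    """
    rescaled = []
    for row in combined:
        new_row = dict(row)
        new_row[new_var] = row[weight_var] / n_cycles
        rescaled.append(new_row)
    return rescaled


# Made-up appended data: two cycles, two participants each.
combined = [
    {"SEQN": 1001, "cycle": "A", "WT2YR": 30000.0},
    {"SEQN": 1002, "cycle": "A", "WT2YR": 50000.0},
    {"SEQN": 2001, "cycle": "B", "WT2YR": 44000.0},
    {"SEQN": 2002, "cycle": "B", "WT2YR": 40000.0},
]

weighted = add_multiyear_weight(combined, n_cycles=2)

# The two-year weights sum to 80000 in cycle A and 84000 in cycle B.
# The rescaled weights sum to the average of the two, a midpoint-style total.
total = sum(row["WTMULTI"] for row in weighted)
```

In this made-up example `total` is 82000.0, halfway between the two single-cycle totals.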

When combining two-year data cycles, it is extremely important to:

  1. Verify that the data items collected in all of the combined years were measured with comparable methods by checking the documentation in the Laboratory Procedure Manuals, and
  2. Select the same type of sample weight (e.g., the subsample weight) from each cycle when constructing the new weight variable in the combined dataset.

After appending the data, you will need to check the results. Make sure that all of your variables of interest were included and that any variables that were renamed or recoded between cycles have been made consistent across all the years of data.
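A few of these checks can be automated. The sketch below is one possible approach, assuming the appended data carry a `cycle` bookkeeping column. The `check_appended` helper, the CHEM_A and CHEM_B variables, and all values are hypothetical.

```python
def check_appended(combined, cycle_sizes, required):
    """Return a list of problems found in an appended dataset."""
    problems = []

    # Each cycle should contribute exactly as many rows as its source file.
    counts = {}
    for row in combined:
        counts[row["cycle"]] = counts.get(row["cycle"], 0) + 1
    for label, expected in cycle_sizes.items():
        if counts.get(label, 0) != expected:
            problems.append(f"cycle {label}: expected {expected} rows, found {counts.get(label, 0)}")

    # A variable that is empty for a whole cycle was probably renamed or dropped.
    for name in required:
        for label in cycle_sizes:
            values = [row.get(name) for row in combined if row["cycle"] == label]
            if all(value is None for value in values):
                problems.append(f"{name} is entirely missing in cycle {label}")

    # SEQN identifies a participant, so it should not repeat.
    ids = [row["SEQN"] for row in combined]
    if len(ids) != len(set(ids)):
        problems.append("duplicate SEQN values")

    return problems


# Made-up appended data in which CHEM_B was not carried over for cycle B.
combined = [
    {"SEQN": 1001, "cycle": "A", "CHEM_A": 1.2, "CHEM_B": 0.4},
    {"SEQN": 1002, "cycle": "A", "CHEM_A": 0.8, "CHEM_B": 0.9},
    {"SEQN": 2001, "cycle": "B", "CHEM_A": 1.5, "CHEM_B": None},
    {"SEQN": 2002, "cycle": "B", "CHEM_A": 0.7, "CHEM_B": None},
]

problems = check_appended(combined, {"A": 2, "B": 2}, required=["CHEM_A", "CHEM_B"])
```

An empty list means none of these particular checks failed. It does not replace a review of the documentation for each cycle.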

 
