Task 2: How to Append Data in NHANES

The steps below assume that you are already familiar with the SAS code used to append NHANES datasets. If you need more detailed instructions, please review the Append & Merge Datasets module in the Continuous NHANES Web Tutorial before continuing.

Step 1: Compare Variable Names and Labels

Before appending data, examine the contents of each data file. The PROC CONTENTS procedure lists the variables and their attributes for each data file selected. While reviewing the PROC CONTENTS output, compare variable names and labels to see whether anything changed from cycle to cycle.

Phthalate Data (1999–2004):

The example below uses the sample "Phthalate" program. Notice that the SAS label for variable URXMHP is “Mono-(2-ethyl)-hexyl phthalate (ng/mL)” in 1999–2000 and 2001–2002, but “Mono-(2-ethyl)-hexyl phthalate” in 2003–2004. The Documentation for the 2003–2004 survey confirms that the unit of measure for this metabolite is also ng/mL in 2003–2004, so you can append the data directly.

Note: It is important to check whether the variable names and labels are consistent between datasets before appending.

 

Program to Check Data Contents & Compare Variable Names and Labels

/* List the variables of each survey cycle in creation order (VARNUM) */
proc contents data=DEMOPHT_A varnum;
run;
proc contents data=DEMOPHT_B varnum;
run;
proc contents data=DEMOPHT_C varnum;
run;
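Comparing three printed listings by eye can miss a small label change such as the dropped "(ng/mL)" described above. The sketch below is one possible way to flag such differences automatically. It is not part of the original tutorial program. The dataset names cont_a, cont_b, cont_c, and label_check and the variables label_a, label_b, and label_c are hypothetical, and the code assumes that variable names use the same case in all three cycles.

```sas
/* Write each cycle's variable names and labels to a dataset (no printed output) */
proc contents data=DEMOPHT_A out=cont_a(keep=name label) noprint; run;
proc contents data=DEMOPHT_B out=cont_b(keep=name label) noprint; run;
proc contents data=DEMOPHT_C out=cont_c(keep=name label) noprint; run;

/* Sort by variable name so the three lists can be merged */
proc sort data=cont_a; by name; run;
proc sort data=cont_b; by name; run;
proc sort data=cont_c; by name; run;

/* Keep only variables whose label differs, or that are missing from a cycle */
data label_check;
    merge cont_a(rename=(label=label_a))
          cont_b(rename=(label=label_b))
          cont_c(rename=(label=label_c));
    by name;
    if label_a ne label_b or label_b ne label_c;
run;

proc print data=label_check;
run;
```

A variable that is absent from one cycle appears in the listing with a blank label for that cycle. Any row printed here still needs to be checked against the survey documentation before you append.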

 


 

Step 2: Append Directly, if Variables are Identical

A careful review shows that the variables of interest remained the same across the three survey cycles. Therefore, you can append the datasets directly without any further changes.

 

When appending NHANES data, you should always include the sequence number (SEQN). Failing to do so will lead to problems if you want to sort or merge your data files at a later time.

 

No output is associated with this procedure, so you will need to check the SAS log file to make sure that the procedure was completed successfully. Additionally, you can use SAS Explorer to see that the new six-year dataset (Phthalate) is in your WORK library, which is the default temporary library created for each SAS session.  This library is deleted when the SAS session is complete. (To find out how to save the dataset to a SAS-accessible library, see the Save a Dataset module of the NHANES Dietary Tutorial.)

 

Program to Directly Append Datasets

/* Stack the three two-year cycles into one six-year dataset */
data Phthalate;
      set DEMOPHT_A
          DEMOPHT_B
          DEMOPHT_C;
run;

 


 

 

Step 3: Construct Subsample Weights for Data Obtained by Combining Survey Cycles

In general, when combining multiple survey cycles, the basic sample weight variable for each cycle should be divided by the number of cycles in the combined dataset. These rescaled weights are then combined into a single new weight variable for the combined survey cycles. When an analysis requires sample weights that cover more than one two-year cycle, several options exist. Four-year weights are often available on the environmental chemical data file. For weights that cover more than four years, refer to Module 11 of the Continuous NHANES Web Tutorial to learn how to construct various sample weights. A specific example follows, which describes how to construct weights for three survey cycles (1999–2000, 2001–2002, and 2003–2004).

 

Computing Phthalate six-year subsample weight 

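To construct a subsample weight for six years of data, use the four-year subsample weight variable for the 1999–2002 surveys and the two-year subsample weight variable for the 2003–2004 survey. The four-year weight covers two of the three cycles, so it is multiplied by 2/3; the two-year weight covers one of the three cycles, so it is multiplied by 1/3: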

if sddsrvyr in (1, 2) then WTSPH6YR = WTSPH4YR * 2/3;
else if sddsrvyr = 3 then WTSPH6YR = WTSB2YR * 1/3;
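The two statements above must run inside a DATA step. The sketch below shows one way to place them in context and then confirm that every record received a weight. It assumes the appended dataset is named Phthalate, as in Step 2, and that it contains SDDSRVYR, WTSPH4YR, and WTSB2YR. The label text is illustrative only.

```sas
/* Add the six-year subsample weight to the appended dataset */
data Phthalate;
    set Phthalate;
    /* 1999-2000 and 2001-2002: four-year weight covers 2 of the 3 cycles */
    if sddsrvyr in (1, 2) then WTSPH6YR = WTSPH4YR * 2/3;
    /* 2003-2004: two-year weight covers 1 of the 3 cycles */
    else if sddsrvyr = 3 then WTSPH6YR = WTSB2YR * 1/3;
    label WTSPH6YR = 'Phthalate six-year subsample weight'; /* hypothetical label */
run;

/* Count weighted and unweighted records in each cycle */
proc means data=Phthalate n nmiss sum;
    class sddsrvyr;
    var WTSPH6YR;
run;
```

An unexpected value in the NMISS column for a cycle would suggest that the wrong source weight variable was used for that cycle.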

 

A subsample weight variable is usually included in its corresponding environmental chemical dataset if the laboratory measurements are performed only in a subsample of the total examined survey cycle sample. Some subsamples are mutually exclusive. When combining data files for two or more survey cycles, you need to select appropriate subsample weight variables. Refer to the table in Task 3 of the Locate Variables module for further information.

 

Step 4: Check Results

After appending the data files, check the contents again to make sure that the files were appended correctly. Use the PROC CONTENTS procedure, as demonstrated in Step 1, to check the combined file. Consult the Program to Check Data Contents & Compare Variable Names and Labels, above, for further instruction, if necessary.

Double-check variable names and labels, and make sure that any renamed variables were renamed correctly. Pay special attention to the number of observations in the combined dataset. You can use the IN= dataset option together with an IF statement to keep only the observations that come from the input environmental chemical data. Otherwise, the total number of observations in the final appended dataset should equal the sum of the observations in the three data files.
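The observation check described above can be sketched as follows. This is an illustration under stated assumptions, not the tutorial's own program. The dataset name Phthalate_check, the IN= flags in_a, in_b, and in_c, and the variable source are hypothetical names.

```sas
/* Re-append with IN= flags so each record carries its source cycle */
data Phthalate_check;
    set DEMOPHT_A(in=in_a)
        DEMOPHT_B(in=in_b)
        DEMOPHT_C(in=in_c);
    length source $9;
    if in_a then source = '1999-2000';
    else if in_b then source = '2001-2002';
    else if in_c then source = '2003-2004';
run;

/* Counts per source file should match each input file's observation count */
proc freq data=Phthalate_check;
    tables source * sddsrvyr / missing norow nocol nopercent;
run;
```

Each value of source should line up with a single value of SDDSRVYR, and the three counts should add up to the total number of observations in the appended dataset.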

 


 
