Task 1: How to Merge Data in NHANES

Warning iconThe steps below assume that you are already familar with the code to merge NHANES datasets. If you need more detailed instructions, please review the Append & Merge Datasets module in the Continuous NHANES Web Tutorial before continuing.

Here are the steps for merging NHANES environmental chemical data with demographics data:

Phthalate Data (1999-2004):

Step 1: Sort Data Files by Unique Identifier

The first step in merging data is to sort each data file by a unique identifier.  In NHANES data, this unique identifier is known as the respondent sequence number (SEQN). You need to sort each data file by the SEQN variable to ensure that all observations are ordered in the same way in each data file that you are going to merge.

 

Program to sort data

proc sort data =NH.DEMO out =DEMO;

    by SEQN;

proc sort data =NH.PHPYPA out =PHPYPA;

    by SEQN;

 

proc sort data =NH.DEMO_B out =DEMO_B;

    by SEQN;

proc sort data =NH.PHPYPA_B out =PHPYPA_B;

    by SEQN;

 

proc sort data =NH.DEMO_C out =DEMO_C;

    by SEQN;

proc sort data =NH.L24PH_C out =L24PH_C;

    by SEQN;

run ;

 

Step 2: Merge Data by a Unique Identifier

After sorting the data files, you can perform a one-to-one match-merging. To accomplish this, use a BY statement immediately following the MERGE statement. When preparing an environmental chemical analytic dataset, you will always use one-to-one match-merging by a unique identifier, in this case, the SEQN variable, since nearly all NHANES environmental chemical and demographics files are individual-level data. You will have only one record per sample person.

One-to-one match-merging combines observations from two or more SAS datasets in a new dataset according to the values of a unique identifier, SEQN. The number of observations in the output dataset is the sum of the largest number of observation in each BY group in all input datasets, in this case, total sample persons in demographics data. However, sample persons in the demographics file who are not selected for environmental chemical measurements will not contribute to the environmental chemical data analysis. You can use the option IN=Variable (in this example, the variable name is called “PH”) and an IF statement (“PH=1”) to retain only the matched records with the phthalate data. Also, you can use a KEEP statement to retain selected variables of interest in the output dataset. For the limited number of environmental chemicals which are measured in the full sample (such as lead), it would be useful to retain all the records when merging those laboratory results with other NHANES variables.

 

 

Program to merge data

data DEMOPHT;

    merge DEMO PHPYPA( in =PH);

    by SEQN;

    if PH= 1;

    keep SEQN RIAGENDR RIDAGEYR RIDRETH1 SDMVSTRA SDMVPSU URXMHP URXUCR WTSPH4YR;

 

data DEMOPHT_B;

    merge DEMO_B PHPYPA_B( in =PH);

    by SEQN;

    if PH= 1;

    keep SEQN RIAGENDR RIDAGEYR RIDRETH1 SDMVSTRA SDMVPSU URXMHP URXUCR WTSPH4YR;H4YR;

 

data DEMOPHT_C;

    merge DEMO_C L24PH_C( in =PH);

    by SEQN;

    if PH= 1;

    keep SEQN RIAGENDR RIDAGEYR RIDRETH1 SDMVSTRA SDMVPSU URXMHP URDMHPLC URXUCR WTSB2YR;

run ;

 

Step 3: Check the Results

After you have merged the data files, it is advisable that you check the contents again to make sure that the files merged correctly.  Use the PROC CONTENTS procedure to list all variables and their attributes. Use the PROC MEANS procedure to check the number of observations, as well as missing, minimum, and maximum values, for each variable.

 

Program to check contents of new datasets

proc contents data =DEMOPHT_A varnum ;

 

proc contents data =DEMOPHT_B varnum ;

 

proc contents data =DEMOPHT_C varnum ;

 

proc means data =DEMOPHT_A N Nmiss min max maxdec = 2;

 

proc means data =DEMOPHT_B N Nmiss min max maxdec = 2;

 

proc means data =DEMOPHT_C N Nmiss min max maxdec = 2;

 

run ;

 

Additional Resources

 

 

close window icon Close Window