Task 3: How to Merge and Append NHANES Data for CVX Analyses

To merge and append NHANES data for the CVX analysis, you will need to:

Sort Data Files by a Unique Identifier

The first step in merging data is to sort each data file by a unique identifier. Each study participant is assigned a unique identifier, stored in the variable SEQN. Use PROC SORT to sort the DEMO and CVX files by SEQN. If you have just downloaded the data from NCHS, the files are already sorted by SEQN and you can comment out the PROC SORT steps; the code is included in case any changes were made to the files before merging. In this segment of code, the OUT= option stores each sorted dataset in the temporary SAS library named WORK. You can explore the WORK library by opening the SAS Explorer and navigating to “Libraries”.
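The sample code in this step reads from librefs named demo_b, cvx_b, demo_c, and cvx_c, which must be assigned before the sort steps run. As a minimal sketch, assuming the files were downloaded as SAS transport (XPT) files into a hypothetical folder C:\NHANES\DATA, the librefs might be assigned as follows. The folder and file paths are illustrative assumptions, not part of the original task.

```sas
/* Hypothetical paths: replace with the folder where you saved the XPT files. */
/* The XPORT engine reads SAS transport files such as those released by NCHS. */
libname demo_b xport 'C:\NHANES\DATA\demo_b.xpt';
libname cvx_b  xport 'C:\NHANES\DATA\cvx_b.xpt';
libname demo_c xport 'C:\NHANES\DATA\demo_c.xpt';
libname cvx_c  xport 'C:\NHANES\DATA\cvx_c.xpt';
```

With librefs assigned this way, a two-level name such as demo_b.demo_b refers to the DEMO_B member inside the transport file assigned to the demo_b libref.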

Sample Code

proc sort data = demo_b.demo_b out = demo_b;
 by seqn;
run ;

proc sort data = cvx_b.cvx_b out = cvx_b;
 by seqn;
run ;

proc sort data = demo_c.demo_c out = demo_c;
 by seqn;
run ;

 

proc sort data = cvx_c.cvx_c out = cvx_c;
 by seqn;
run ;
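Because the merge relies on SEQN identifying each participant uniquely, it can be worth confirming that before moving on. The following is a minimal sketch, not part of the original task, that compares the number of rows with the number of distinct SEQN values in one of the sorted WORK datasets. The column aliases n_rows and n_seqn are illustrative names.

```sas
/* If SEQN is unique, n_rows and n_seqn should be equal. */
proc sql;
 select count(*) as n_rows,
        count(distinct seqn) as n_seqn
 from demo_b;
quit ;
```

The same query can be repeated for cvx_b, demo_c, and cvx_c by changing the dataset name in the FROM clause.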

Merge Data Files by the Unique Identifier

Merging, like sorting, relies on the unique identifier. Use the SEQN variable to merge the demographic (DEMO) and cardiovascular fitness (CVX) data. This task creates two intermediate datasets, cvx1 and cvx3. The cvx1 dataset merges the DEMO_B and CVX_B files from the NHANES 2001-2002 cycle. Similarly, the cvx3 dataset merges the DEMO_C and CVX_C files from the NHANES 2003-2004 cycle.

After you have merged the data files, check the contents of your intermediate datasets to make sure the files merged correctly. Use PROC CONTENTS to list all variable names and labels. You can also use PROC MEANS to check the number of observations for each variable, as well as the number of missing values and the minimum and maximum values.

Sample Code

data cvx1;
merge demo_b cvx_b;
by SEQN;
run ;

proc contents data = cvx1;
run ;

data cvx3;
merge demo_c cvx_c;
by SEQN;
run ;

proc contents data = cvx3;
run ;
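The sample code above runs PROC CONTENTS only. The PROC MEANS check described earlier, along with one way to see how many participants came from each input file, could be sketched as follows. The dataset name cvx1_check and the variables in_demo, in_cvx, and source are illustrative names introduced here, not part of the original task.

```sas
/* Number of nonmissing and missing values, minimum, and maximum */
/* for every numeric variable in the merged dataset.             */
proc means data = cvx1 n nmiss min max;
run ;

/* Repeat the merge with IN= flags to record which file(s) */
/* contributed each SEQN.                                  */
data cvx1_check;
 merge demo_b (in = in_demo) cvx_b (in = in_cvx);
 by seqn;
 length source $ 9;
 if in_demo and in_cvx then source = 'both';
 else if in_demo then source = 'demo only';
 else source = 'cvx only';
run ;

proc freq data = cvx1_check;
 tables source;
run ;
```

The same checks apply to cvx3 by substituting demo_c and cvx_c.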

Append Data for Multiple NHANES Cycles

Before appending data from two or more cycles, examine the contents of the data files to identify variables whose names may have changed between cycles.

  • If the names or labels of the variables of interest are identical in the selected cycles, you can append the data files directly.
  • If the variables of interest have changed, you will need to evaluate the differences in the wording of the question, definitions, and response choices that were used during data collection. You may need to recode the variables before the files can be appended.

The CVX variable names and labels did not change in any significant way between the 2001-2002 and 2003-2004 cycles of the continuous NHANES, so we append the data from these two cycles directly. Code for the resulting intermediate dataset, titled “cvx”, is included below.

Sample Code

data cvx;
set cvx1 cvx3;
run ;
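Before running an append like the one above, the comparison of variable names and labels across cycles could be sketched as follows. This is one possible approach under stated assumptions, not the method prescribed by the task. The dataset names vars1, vars3, and var_diff and the variables label3, in_cvx1, and in_cvx3 are illustrative.

```sas
/* Write the variable names and labels of each intermediate dataset */
/* to a dataset instead of printing them.                           */
proc contents data = cvx1 out = vars1 (keep = name label) noprint;
run ;

proc contents data = cvx3
 out = vars3 (keep = name label rename = (label = label3)) noprint;
run ;

proc sort data = vars1;
 by name;
run ;

proc sort data = vars3;
 by name;
run ;

/* Keep variables found in only one cycle or whose labels differ. */
data var_diff;
 merge vars1 (in = in1) vars3 (in = in3);
 by name;
 in_cvx1 = in1;
 in_cvx3 = in3;
 if not (in1 and in3) or label ne label3;
run ;

proc print data = var_diff;
run ;
```

Any variable listed in var_diff that you plan to analyze should be reviewed against the documentation for both cycles before the files are appended.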

Construct New Sample Weights

When you combine two or more 2-year cycles of the continuous NHANES (NHANES 2001-2002 and beyond), you must construct new sample weights before beginning any analyses. Estimates from the combined cycles are representative of the population at the midpoint of the combined survey period.

For the 4 years of CVX data from 2001-2004, a weight should be constructed as:

Newly Constructed Weight (WTMEC4BC) = 1/2 * WTMEC2YR

For the CVXMSTR.sas dataset, we name the newly constructed weight for the 2001-2004 CVX data “WTMEC4BC.”
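A minimal sketch of the weight construction described above, assuming the appended dataset is named cvx as in the earlier step and contains WTMEC2YR from the DEMO files. The label text and the closing PROC MEANS check are illustrative additions.

```sas
data cvx;
 set cvx;
 /* 4-year weight for the combined 2001-2004 data: */
 /* half of each participant's 2-year MEC weight.  */
 wtmec4bc = 1/2 * wtmec2yr;
 label wtmec4bc = 'Constructed 4-year MEC weight, 2001-2004';
run ;

/* Quick check: the sum of WTMEC4BC should be half the sum of WTMEC2YR. */
proc means data = cvx n nmiss min max sum;
 var wtmec2yr wtmec4bc;
run ;
```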