Task 1a: How to Append NHANES Data in SAS

Here are the steps for appending NHANES data in SAS:

Step 1: Compare variable names and labels

Before appending any data, examine the contents of each data file. The SAS proc contents procedure lists the variable names and variable labels for each data file you select. While reviewing its output, compare the variable names and labels across files to see whether anything changed from one survey cycle to the next.

Program to Check Datasets' Contents
Statements Explanation
libname NH "C:\NHANES\DATA" ;

Use the libname statement to refer to the data folder.

proc contents data =NH.ALQ varnum ;

Use the proc contents procedure to list the contents of the 1999-2000 Alcohol Questionnaire file (NH.ALQ).

Use the varnum option to list the variables according to their positions in the dataset. Otherwise, SAS lists the variables alphabetically.

proc contents data =NH.ALQ_b varnum ;

Use the proc contents procedure to list the contents of the 2001-2002 Alcohol Questionnaire file (NH.ALQ_B).

proc contents data =NH.DEMO varnum ;

Use the proc contents procedure to list the contents of the 1999-2000 Demographic file (NH.DEMO).

proc contents data =NH.DEMO_b varnum ;

run ;

Use the proc contents procedure to list the contents of the 2001-2002 Demographic file (NH.DEMO_B).
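Comparing two proc contents listings by eye works for a few files, but a renamed variable is easy to miss. As one alternative, the sketch below writes each file's variable list to a dataset with the out= option and keeps only the names that appear in one file but not the other. It assumes two small toy datasets (WORK.ALQ_9900 and WORK.ALQ_0102, hypothetical names with made-up values) that mimic the ALQ100/ALD100 rename described in Step 3; to use it on your own files, point the two proc contents steps at NH.ALQ and NH.ALQ_B instead.

```sas
/* Hypothetical toy files standing in for two NHANES cycles.    */
/* The variable names mirror the ALQ100/ALD100 rename; the      */
/* values are made up.                                          */
data work.alq_9900;
  input seqn alq100;
  datalines;
1 1
2 2
;
run;

data work.alq_0102;
  input seqn ald100;
  datalines;
3 1
4 2
;
run;

/* Write each file's variable list to a dataset, not the listing. */
proc contents data=work.alq_9900 out=vars_9900(keep=name) noprint;
run;
proc contents data=work.alq_0102 out=vars_0102(keep=name) noprint;
run;

/* Upper-case the names so differences in case do not matter. */
data vars_9900; set vars_9900; name = upcase(name); run;
data vars_0102; set vars_0102; name = upcase(name); run;

proc sort data=vars_9900; by name; run;
proc sort data=vars_0102; by name; run;

/* Keep only names that appear in just one of the two files. */
data only_one_cycle;
  merge vars_9900(in=in1) vars_0102(in=in2);
  by name;
  length found_in $9;
  if in1 and not in2 then found_in = '1999-2000';
  else if in2 and not in1 then found_in = '2001-2002';
  else delete;
run;

proc print data=only_one_cycle noobs;
run;
```

With these toy files, the listing should show ALQ100 for 1999-2000 and ALD100 for 2001-2002; SEQN is in both files, so it is dropped.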

 

[Image not available: highlighted results from this demonstration]

Step 2: Append directly, if variables are identical

After carefully reviewing the Demographic, Blood Pressure Examination, Blood Pressure Questionnaire, Laboratory 13, and Medical Conditions Questionnaire files, you will find that the variables of interest in the two cycles remain the same. Therefore, you can directly append without any further changes.

Because you are interested in only a subset of the variables, you can use the keep= data set option to select the relevant variables. These data steps produce no printed output, so check the SAS log to make sure that each step completed successfully. You can also use SAS Explorer to confirm that the new 4-year datasets (demo_4yr, bpx_4yr, bpq_4yr, mcq_4yr, and lab13_4yr) are in your WORK library.
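Because each keep= list is typed twice, once per cycle, a typo in one copy is easy to miss. One option, sketched below, is to store the list once in a macro variable. The sketch assumes two toy Blood Pressure Questionnaire files (WORK.BPQ_9900 and WORK.BPQ_0102, hypothetical names with made-up values, including an unwanted variable EXTRA) in place of NH.BPQ and NH.BPQ_B. The automatic macro variable SYSERR, written to the log at the end, holds the return code of the most recent step; 0 means it finished without errors or warnings.

```sas
/* Hypothetical toy files; EXTRA stands for a variable you do */
/* not want to keep. All values are made up.                  */
data work.bpq_9900;
  input seqn bpq010 bpq020 extra;
  datalines;
1 1 2 9
2 2 1 9
;
run;

data work.bpq_0102;
  input seqn bpq010 bpq020 extra;
  datalines;
3 1 1 9
;
run;

/* One shared keep= list, so both files are subset identically. */
%let bpq_vars = seqn bpq010 bpq020;

data bpq_4yr;
  set work.bpq_9900(keep=&bpq_vars)
      work.bpq_0102(keep=&bpq_vars);
run;

/* SYSERR is the return code of the step that just ran. */
%put NOTE: BPQ_4YR step return code is &syserr;
```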

 

Program to Directly Append Datasets
Statements Explanation
libname NH "C:\NHANES\DATA" ;

Use the libname statement to refer to the data folder.

data demo_4yr;

Use the data step to create a dataset for your 4 years of demographic data (DEMO_4YR).

set NH.demo ( keep =seqn sddsrvyr ridstatr ridpreg sdmvpsu sdmvstra wtmec4yr riagendr ridageyr ridreth1 dmdeduc)

NH.demo_b (keep =seqn sddsrvyr ridstatr ridpreg sdmvpsu sdmvstra wtmec4yr riagendr ridageyr ridreth1 dmdeduc);

Use the set statement to append the 2001-2002 demographic data file (NH.DEMO_B) to the 1999-2000 demographic data file (NH.DEMO). Use the keep= data set option to select the variables of interest.

Notice that the keep= list includes a variable named SEQN. SEQN (sequence number) is a unique identifier for each observation (participant) in NHANES and should be included whenever datasets are appended. Every time you extract variables from an NHANES data file, include SEQN in your selection; leaving it out will cause problems if you later want to sort or merge your data files. See Append & Merge Module Task 2 for more information on merging.
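A quick way to confirm that SEQN still identifies each participant after appending is to compare the number of rows with the number of distinct SEQN values. The sketch below assumes two toy files (WORK.DEMO_9900 and WORK.DEMO_0102, hypothetical names with made-up SEQN values) in place of NH.DEMO and NH.DEMO_B.

```sas
/* Hypothetical toy files with made-up SEQN values. */
data work.demo_9900;
  input seqn;
  datalines;
1
2
3
;
run;

data work.demo_0102;
  input seqn;
  datalines;
4
5
;
run;

/* Append the two files, as in Step 2. */
data demo_check;
  set work.demo_9900 work.demo_0102;
run;

/* Count all rows and distinct SEQN values; they should match. */
proc sql noprint;
  select count(*), count(distinct seqn)
    into :n_rows trimmed, :n_seqn trimmed
    from demo_check;
quit;

data _null_;
  if &n_rows ne &n_seqn then
    put 'WARNING: SEQN is not unique in DEMO_CHECK.';
  else put 'NOTE: SEQN is unique in DEMO_CHECK.';
run;
```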

data bpx_4yr;

Use the data step to create a dataset for your 4 years of blood pressure examination data (BPX_4YR).

set NH.bpx ( keep =seqn bpxsy1-bpxsy4 bpxdi1-bpxdi4)

NH.bpx_b (keep =seqn bpxsy1-bpxsy4 bpxdi1-bpxdi4);

Use the set statement to append the 2001-2002 blood pressure examination data file (NH.BPX_B) to the 1999-2000 blood pressure examination data file (NH.BPX). Use the keep= data set option to select the variables of interest.

data bpq_4yr;

Use the data step to create a dataset for your four years of blood pressure questionnaire data (BPQ_4YR).

set NH.bpq ( keep =seqn bpq010 bpq020 bpq030 bpq050a bpq070 bpq080 bpq100d)

NH.bpq_b (keep =seqn bpq010 bpq020 bpq030 bpq050a bpq070 bpq080 bpq100d);

Use the set statement to append the 2001-2002 blood pressure questionnaire data file (NH.BPQ_B) to the 1999-2000 blood pressure questionnaire data file (NH.BPQ). Use the keep= data set option to select the variables of interest.

data mcq_4yr;

Use the data step to create the dataset for your 4 years of medical conditions questionnaire data (MCQ_4YR).

set NH.mcq (keep=seqn mcq160b mcq160c mcq160d mcq160e mcq160f)

NH.mcq_b (keep=seqn mcq160b mcq160c mcq160d mcq160e mcq160f);

Use the set statement to append the 2001-2002 medical conditions questionnaire data file (NH.MCQ_B) to the 1999-2000 medical conditions questionnaire data file (NH.MCQ). Use the keep= data set option to select the variables of interest.

data lab13_4yr;

Use the data step to create a dataset for your 4 years of laboratory data (LAB13_4YR).

set NH.lab13 (keep=seqn lbxtc)

NH.l13_b (keep=seqn lbxtc);

run ;

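Use the set statement to append the 2001-2002 laboratory data file (NH.L13_B) to the 1999-2000 laboratory data file (NH.LAB13). Use the keep= data set option to select the variables of interest. Only one run statement is needed: each new data statement ends the previous step, and run ends the final one.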

Step 3: Rename variables and/or recode variables before appending, if variables are different

Because the 1999-2000 Alcohol Questionnaire data file contains a variable (ALQ100) that was renamed ALD100 in 2001-2002, you will need to rename the variable first and then append the data. If the response categories of a variable differ between cycles, you will also need to recode it.
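This demonstration needs only a rename, but where response categories differ, one way to recode during the append is to flag each record's source with the in= data set option. The item Q_YESNO and its codes below are hypothetical (1=Yes, 2=No in one toy cycle; 1=Yes, 0=No in the other) and are not taken from an NHANES codebook; check the codebooks for the actual categories before recoding real variables.

```sas
/* Hypothetical item Q_YESNO with invented codes:              */
/* CYCLE_A uses 1=Yes, 2=No; CYCLE_B uses 1=Yes, 0=No.         */
data work.cycle_a;
  input seqn q_yesno;
  datalines;
1 1
2 2
;
run;

data work.cycle_b;
  input seqn q_yesno;
  datalines;
3 1
4 0
;
run;

/* FROM_B is 1 only for records read from CYCLE_B, so the     */
/* recode to CYCLE_A's scheme is applied to that cycle alone. */
data yesno_combined;
  set work.cycle_a
      work.cycle_b(in=from_b);
  if from_b and q_yesno = 0 then q_yesno = 2;
run;
```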

In the code below, the variable ALD100 in the 2001-2002 Alcohol Questionnaire data file is renamed to ALQ100, the name used for the same variable in the 1999-2000 Alcohol Questionnaire data file. After the rename, both cycles' responses share one column, and the data files are ready to append.

Program to Rename Variables and Append
Statements Explanation
libname NH "C:\NHANES\DATA" ;

Use the libname statement to refer to the data folder.

data alq_4yr;

Use the data step to create the dataset for your 4 years of alcohol questionnaire data (ALQ_4YR).

set NH.alq

   NH.alq_b

   (rename=(ald100=alq100));

run ;

Use the set statement to append the 2001-2002 alcohol questionnaire data file (NH.ALQ_B) to the 1999-2000 alcohol questionnaire data file (NH.ALQ).

Use the rename= data set option to rename the variable ALD100 in the 2001-2002 alcohol questionnaire data file to ALQ100, which is the name given to the same variable in the 1999-2000 alcohol questionnaire data file.

This data step produces no printed output, so check the SAS log to make sure that it completed successfully. You can also use SAS Explorer to confirm that the new 4-year dataset (alq_4yr) is in your WORK library.
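To see why the rename matters, the sketch below appends the same two toy files with and without the rename= option. WORK.ALQ_9900 and WORK.ALQ_0102 are hypothetical stand-ins for NH.ALQ and NH.ALQ_B with made-up values. Without the rename, the combined file has separate ALQ100 and ALD100 columns, each missing for the other cycle's records; with it, both cycles share one ALQ100 column.

```sas
/* Hypothetical toy records: the same question is ALQ100 in the */
/* first file and ALD100 in the second. Values are made up.     */
data work.alq_9900;
  input seqn alq100;
  datalines;
1 1
2 2
;
run;

data work.alq_0102;
  input seqn ald100;
  datalines;
3 1
4 2
;
run;

/* Without rename=: rows are stacked, but the answers sit in two */
/* columns, each missing for the other file's records.           */
data alq_no_rename;
  set work.alq_9900 work.alq_0102;
run;

/* With rename=: both files fill a single ALQ100 column. */
data alq_renamed;
  set work.alq_9900
      work.alq_0102(rename=(ald100=alq100));
run;

proc print data=alq_no_rename; run;
proc print data=alq_renamed; run;
```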

 

Step 4: Check results

After appending the data files, check the contents again to make sure that the files were appended correctly. Use the proc contents procedure, as demonstrated in Step 1, on the combined files. See the table Program to Check Datasets' Contents, above, for further instruction if necessary.

Double-check variable names and labels, and make sure that variables were renamed correctly. Pay special attention to the number of observations in each combined dataset, which should equal the sum of the observations in the two data files that were appended.
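The observation-count check can also be done in code rather than by reading the listings. The sketch below uses the set statement's nobs= option, which reports a dataset's row count without reading any rows when paired with "if 0 then set", on two toy files (WORK.BPX_9900 and WORK.BPX_0102, hypothetical names with made-up values) and writes a note or warning to the log.

```sas
/* Hypothetical toy files standing in for NH.BPX and NH.BPX_B. */
data work.bpx_9900;
  input seqn bpxsy1;
  datalines;
1 120
2 118
3 131
;
run;

data work.bpx_0102;
  input seqn bpxsy1;
  datalines;
4 125
5 110
;
run;

data bpx_4yr;
  set work.bpx_9900 work.bpx_0102;
run;

/* nobs= is filled in at compile time, so "if 0 then set" reads */
/* no rows; stop ends the step after a single pass.            */
data _null_;
  if 0 then set work.bpx_9900 nobs=n1;
  if 0 then set work.bpx_0102 nobs=n2;
  if 0 then set bpx_4yr       nobs=n_all;
  if n_all = n1 + n2 then
    put 'NOTE: Row counts agree: ' n_all= n1= n2=;
  else
    put 'WARNING: Row counts differ: ' n_all= n1= n2=;
  call symputx('n_all', n_all);
  call symputx('n_sum', n1 + n2);
  stop;
run;
```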

[Image not available: highlighted results of the proc contents procedure on the new dataset]
