Task 1a: How to Identify and Recode Missing Data in NHANES Using SAS

The first task is to identify missing data and recode it. Here are the steps:

 

Step 1: Identify missing and unavailable values

In this step, you will use the proc means procedure to check for missing, minimum and maximum values of continuous variables, and the proc freq procedure to look at the frequency distribution of categorical variables in your master analytic dataset. The output from these procedures provides the number and frequency of missing values for each variable listed in the procedure statement. 

 

WARNING

Typically, proc means is used for continuous variables, and proc freq is used for categorical variables. In the following example, we provide proc means and proc freq procedures on the same set of variables without distinguishing continuous and categorical variables. If you perform a proc freq on a continuous variable with many values, the output could be extensive.

 

proc means
Statements Explanation

Proc means data=demo_BP N Nmiss min max;

Use the proc means procedure to determine the number of nonmissing observations (N), the number of missing observations (Nmiss), minimum values (min), and maximum values (max) for the selected variables.

where ridstatr= 2 and ridageyr>= 20 ;

Use the where statement to select the participants who were interviewed and examined in the MEC and who were age 20 years and older.

var BPQ010--BPQ100d  MCQ160b--MCQ160f;
run ;

Use the var statement to indicate the variables of interest. The double dash (--) selects every variable stored between the two named variables in the dataset. The run statement executes the procedure.

 

proc freq
Statements Explanation

Proc freq data=demo_BP;

Use the proc freq procedure to determine the frequency of each value of the variables listed.

where ridstatr= 2 and ridageyr>= 20 ;

Use the where statement to select the participants who were interviewed and examined in the MEC and who were age 20 years and older.

Table BPQ010--BPQ100d  MCQ160b--MCQ160f/list missing;
run ;

Use the table statement to indicate the variables of interest. Use the list missing option to display the missing values: the missing option treats missing values as a valid category, so they appear in each table and are included in the percentages. The run statement executes the procedure.
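
The warning above notes that proc freq output can become extensive for variables with many values. One possible way to keep the output short is to apply a format that collapses every value into "Missing" or "Not missing" before tabulating. The following is a minimal sketch, not part of the original tutorial: the format name missfmt is hypothetical, and the code assumes every variable in the two ranges is numeric (which the proc means step above also requires).

```sas
/* Hypothetical format: groups all values into two categories. */
proc format;
  value missfmt
    .     = 'Missing'
    other = 'Not missing';
run;

proc freq data=demo_BP;
  /* Same study group as in the tables above. */
  where ridstatr = 2 and ridageyr >= 20;
  /* proc freq tabulates the formatted values, so each variable */
  /* yields at most two rows instead of one row per raw value.  */
  format BPQ010--BPQ100d MCQ160b--MCQ160f missfmt.;
  tables BPQ010--BPQ100d MCQ160b--MCQ160f / missing nocum;
run;
```

Before the recoding in Step 2, "refused" (7) and "don't know" (9) responses still count as "Not missing" in this output, so the unformatted proc freq remains the way to spot those codes.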

 

Highlighted items from proc means and proc freq output:

 

Step 2: Recode unavailable values as missing

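"Refused" and "don't know" responses are stored as numeric codes (7 and 9 in the examples below) rather than as missing values. You can recode them as missing in one of two ways: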

Option 1 – Assign Missing Values One Variable at a Time
Statements Explanation

Data demo_BP1;
     set demo_BP;

Use the data statement to create a new dataset from your existing dataset; the name of the existing dataset is listed after the set statement.

if BPQ010 in ( 7 , 9 ) then BPQ010= . ;

Use the if…then statement to recode "7" and "9" values of a variable as missing.

Option 2 – Assign Missing Values by Group Using an Array
Statements Explanation

Data demo_BP1;
     set demo_BP;

Use the data statement to create a new dataset from your existing dataset; the name of the existing dataset is listed after the set statement.

array _rdmiss bpq020 bpq070 bpq080 mcq160b--mcq160f;
  do over _rdmiss;
    if _rdmiss in ( 7 , 9 ) then _rdmiss= . ;
  end ;

Use the array statement to group the variables you want to recode. The do over loop then applies the if…then statement to each variable in the array, recoding "7" and "9" values as missing. In this example, _rdmiss designates the name of the array. Use this option when you want to recode multiple variables that use the same numeric values for "refused" and "don't know".
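
For reference, the same recode can be written as a complete data step with an explicitly indexed array, since do over is an older implicit-array form. This is a sketch under the same assumptions as Option 2: every variable in the array is numeric and uses 7 and 9 for "refused" and "don't know". The index variable i is introduced here for illustration and is dropped from the output dataset.

```sas
data demo_BP1;
  set demo_BP;

  /* {*} lets SAS count the array elements from the variable list. */
  array _rdmiss {*} bpq020 bpq070 bpq080 mcq160b--mcq160f;

  /* Visit each variable in the array and recode 7 and 9 to missing. */
  do i = 1 to dim(_rdmiss);
    if _rdmiss{i} in (7, 9) then _rdmiss{i} = .;
  end;

  drop i;  /* the loop counter is not needed in demo_BP1 */
run;
```

Variables that use different codes for "refused" and "don't know" would need their own array and their own list of values.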

 

Step 3: Evaluate extent of missing data

In this step, you will use the proc freq procedure to confirm that the recoding in the previous step was done correctly and to see how much data is missing for each variable. As a general rule, if 10% or less of your data for a variable are missing from your analytic dataset, it is usually acceptable to continue your analysis without further evaluation or adjustment. However, if more than 10% of the data for a variable are missing, you may need to determine whether the missing values are distributed equally across socio-demographic characteristics, and decide whether imputation of missing values or use of adjusted weights is necessary. (Please see Analytic Guidelines for more information.)

 

Check the extent of missing data
Statements Explanation

Proc freq data =demo_BP1;

Use the proc freq procedure to determine the frequency of each value of the variables listed.

where ridstatr= 2 and ridageyr>= 20 ;

Use the where statement to select the participants who were interviewed and examined in the MEC and who were age 20 years and older.

table BPQ010--BPQ100d MCQ160b--MCQ160f/ list missing ;
run ;

Use the table statement to indicate the variables of interest.
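
The 10% rule of thumb described in this step can also be checked directly for a single variable. The sketch below is illustrative only: it uses BPQ020 as the example variable, the dataset name demo_BP_chk and the flag bpq020_miss are hypothetical, and it assumes the demographic variable riagendr (gender) is present in demo_BP1. All counts are unweighted.

```sas
/* Share of missing values for one variable in the study group. */
proc sql;
  select count(*)      as n_total,
         nmiss(BPQ020) as n_missing,
         calculated n_missing / calculated n_total
                       as pct_missing format=percent8.1
  from demo_BP1
  where ridstatr = 2 and ridageyr >= 20;
quit;

/* Flag missing responses: 1 if BPQ020 is missing, 0 otherwise. */
data demo_BP_chk;
  set demo_BP1;
  where ridstatr = 2 and ridageyr >= 20;
  bpq020_miss = missing(BPQ020);
run;

/* Row percentages show the share missing within each category */
/* of riagendr (assumed to be available in the dataset).       */
proc freq data=demo_BP_chk;
  tables riagendr * bpq020_miss / nocol nopercent;
run;
```

If pct_missing exceeds 10%, comparing the row percentages across categories gives a first indication of whether the missing values are spread evenly.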

 

Highlighted items from the proc freq output for recoding missing values:

 
