Task 1c: How to Identify and Recode Missing Data in NHANES Using Stata

The first task is to identify missing data and recode it. Here are the steps:

 

Step 1: Identify missing and unavailable values

In this step, you will use the tabstat and nmissing commands to check the number of missing values and the minimum and maximum values of continuous variables, and the tabulate command to examine the frequency distribution of categorical variables in your master analytic dataset. The output from these commands shows the number of missing values for each variable listed in the command.

 

WARNING

Typically, the tabstat or summarize command is used for continuous variables, and the tabulate command is used for categorical variables. In the following example, the tabstat and tabulate commands are run on the same set of variables without distinguishing continuous from categorical variables. If you use the tabulate command on a continuous variable with many values, the output can be extensive.

 

Use the tabstat and nmissing commands to determine the minimum value (min), the maximum value (max), and the number of missing observations for the selected variables among participants who were interviewed and examined in the MEC and who were age 20 years and older.

 

IMPORTANT NOTE

The nmissing command can be installed from http://www.stata-journal.com/software/sj5-4/dm67_3/.
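
One way to install the package from within Stata is sketched below. The package name dm67_3 is taken from the URL above, and the command assumes that your copy of Stata has internet access; treat it as an illustration rather than the only installation route.

```stata
* Install the nmissing package from the Stata Journal archive
* (package name dm67_3 is inferred from the URL above)
net install dm67_3, from(http://www.stata-journal.com/software/sj5-4)

* Confirm that Stata can now find the command
which nmissing
```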

 

tabstat bpq* mcq* if (ridageyr >=20 & ridageyr <.) & ridstatr==2, stat(n min max)
nmissing bpq* mcq* if (ridageyr >=20 & ridageyr <.) & ridstatr==2
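
If nmissing cannot be installed, Stata's built-in misstable summarize command (available in Stata 11 and later) can serve as a cross-check on the missing-value counts. The lines below are a minimal sketch that assumes the master analytic dataset is already in memory and reuses the same subgroup restriction as the commands above.

```stata
* Built-in alternative to nmissing: report missing-value counts
* for MEC-examined participants age 20 and older.
* The all option also lists variables that have no missing values.
misstable summarize bpq* mcq* if (ridageyr >=20 & ridageyr <.) & ridstatr==2, all
```

Note that misstable summarize counts only values that Stata already treats as missing; codes such as 7 and 9 are not counted until they have been recoded.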

 

Use the tabulate command to determine the frequency of each value of the variables listed for participants who were interviewed and examined in the MEC and who were age 20 years and older. Use the missing option to display the missing values.

 

tabulate bpq010 if (ridageyr >=20 & ridageyr <.) & ridstatr==2, missing

 

Highlighted items from the commands tabstat, nmissing and tabulate output:

 

 

Step 2: Recode unavailable values as missing

Two options can be used to recode the missing data:

Option 1 – Assign Missing Values One Variable at a Time

Use the if qualifier to recode "7" and "9" values of a variable as missing.

replace bpq010=. if bpq010==7 | bpq010==9
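
If you want to keep track of why a value is missing, Stata's extended missing values can be used in place of the plain replace above. The following is a sketch only: the choice of .r for "refused" and .d for "don't know" is an illustrative convention, not part of the NHANES documentation, and it assumes that bpq010 still contains its original 7 and 9 codes.

```stata
* Optional variation: preserve the reason for missingness
* .r and .d are arbitrary labels chosen for this illustration
replace bpq010 = .r if bpq010 == 7    // refused
replace bpq010 = .d if bpq010 == 9    // don't know

* Extended missing values appear as separate rows with the missing option
tabulate bpq010, missing
```

Extended missing values are still treated as missing by Stata, so the missing() function returns 1 for them and conditions such as ridageyr <. continue to exclude them.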

 

Option 2 - Assign Missing Values by Group

Use the foreach loop command to recode "7" and "9" values of multiple variables as missing.

Use this option when you want to recode multiple variables that use the same numeric values for "refused" and "don't know". Use the save command to create a new dataset with the recoded values.

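foreach i in bpq020 bpq050a bpq100d bpq070 bpq080 mcq160b mcq160c mcq160d mcq160e mcq160f {
    * Recode 7 (refused) and 9 (don't know) as missing
    replace `i' = . if `i' == 7 | `i' == 9
}
* Save the recoded data as a new dataset
save C:\Nhanes\Data\demo_bp1, replace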
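
As an alternative to the loop, Stata's built-in mvdecode command recodes a list of numeric values to missing across several variables in one statement. The sketch below assumes the same variable list as Option 2 and would be run in place of the loop, before the save command; the assert loop is an optional check that no 7 or 9 codes remain.

```stata
* Recode 7 and 9 to system missing (.) for all listed variables at once
mvdecode bpq020 bpq050a bpq100d bpq070 bpq080 mcq160b mcq160c mcq160d mcq160e mcq160f, mv(7 9)

* Optional check: stop with an error if any 7 or 9 value is left
foreach v of varlist bpq020 bpq050a bpq100d bpq070 bpq080 mcq160b mcq160c mcq160d mcq160e mcq160f {
    assert !inlist(`v', 7, 9)
}
```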

 

 

Step 3: Evaluate extent of missing data

In this step, you will use the tabulate command to verify that the recoding in the previous step was done correctly. As a general rule, if 10% or less of the data for a variable are missing from your analytic dataset, it is usually acceptable to continue your analysis without further evaluation or adjustment. However, if more than 10% of the data for a variable are missing, you may need to determine whether the missing values are distributed equally across socio-demographic characteristics, and decide whether imputation of missing values or use of adjusted weights is necessary. (Please see Analytic Guidelines for more information.)

 

Check the extent of missing data

Use the tabulate command to determine the frequency of each value of the variables listed for participants who were interviewed and examined in the MEC and who were age 20 years and older. Use the missing option to display the missing values. Use the foreach loop command to get the frequency of multiple variables.

 

tabulate bpq010 if (ridageyr >=20 & ridageyr <.) & ridstatr==2, missing
 
foreach i in bpq020 bpq070 bpq080 mcq160b mcq160c mcq160d mcq160e mcq160f {
    tabulate `i' if (ridageyr >=20 & ridageyr <.) & ridstatr==2, missing
}
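
To compare each variable against the 10% guideline directly, the percentage of missing values can also be computed rather than read off the tabulate output. The following is a minimal sketch, assuming the recoded dataset is in memory; the percentages are unweighted, and the local macro name total is chosen here for illustration.

```stata
* Denominator: MEC-examined participants age 20 and older
quietly count if (ridageyr >=20 & ridageyr <.) & ridstatr==2
local total = r(N)

foreach v of varlist bpq020 bpq070 bpq080 mcq160b mcq160c mcq160d mcq160e mcq160f {
    * Numerator: missing values of this variable in the same subgroup
    quietly count if missing(`v') & (ridageyr >=20 & ridageyr <.) & ridstatr==2
    display "`v': " %5.1f (100 * r(N) / `total') " percent missing"
}
```

Keep in mind that some questions are asked only of a subset of participants, so a missing value can reflect a questionnaire skip pattern rather than nonresponse.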

 

Highlighted items from the tabulate output for recoding missing values:

 
