Task 1: Key Concepts about Missing Data in NHANES

A number of variables in NHANES contain missing values; that is, data for some individuals are unavailable for analysis. Because missing values may distort your results, you must evaluate the extent of missing data in your dataset to determine whether the data are usable without additional reweighting for item non-response.

As a general rule, if 10% or less of the data for a variable of interest are missing from your analytic dataset, it is usually acceptable to continue your analysis without further evaluation or adjustment. However, if more than 10% of the data for a variable are missing, you may need to determine whether the missing values are distributed equally across socio-demographic characteristics, and decide whether imputation of missing values or use of adjusted weights is necessary. (See the Analytic and Reporting Guidelines for more information.)
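The 10% rule of thumb above can be checked with a simple count. The following is a minimal Python sketch, assuming missing values have been read in as `None` or NaN; the function names and the example values are hypothetical, and the count is unweighted.

```python
import math


def percent_missing(values):
    """Return the percentage of entries that are missing (None or NaN)."""
    if not values:
        raise ValueError("values must not be empty")
    n_missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return 100.0 * n_missing / len(values)


def needs_further_evaluation(values, threshold=10.0):
    """True when more than `threshold` percent of the values are missing."""
    return percent_missing(values) > threshold


# Hypothetical variable with 2 of 8 values missing.
example = [22.1, None, 27.4, 30.2, float("nan"), 24.8, 26.0, 21.5]
print(percent_missing(example))           # 25.0
print(needs_further_evaluation(example))  # True
```

A variable with exactly 10% missing is not flagged, which matches the "10% or less" wording of the rule.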

Data that are coded as “missing” in the dataset are completely unavailable (for example, a measurement may have been unobtainable or spoiled). In the codebooks of current NHANES data, missing values are denoted as shown in the table “Unavailable Values in NHANES Data” below.

Note that SAS treats the period used for a missing numeric value as smaller than any number, including negative values. When recoding data and creating variables, SAS logical expressions that use the less than operator (<) with no lower bound may therefore include missing data erroneously. For example, a condition such as X < 20 is also true when X is missing.

Other types of values in the dataset should also be treated as unavailable for analysis, or missing. When a participant refuses to answer a question, the “refused” response is assigned a value of “7,” “77,” or “777,” depending on the number of digits in the variable value range. Similarly, a “don’t know” response is assigned a value of “9,” “99,” or “999,” depending on the number of digits in the variable value range.

If you fail to identify these variations of missing data, and treat the assigned values for “refused” or “don’t know” as real values, your statistical analyses will produce distorted results.  Therefore, you should recode “refused” or “don’t know” responses as true missing values (either as a period (.) for numeric variables or as a blank for character variables).

 

Unavailable Values in NHANES Data

NHANES codes      Description                Action
. (period)        Missing numeric value      None
(blank space)     Missing character value    None
7 or 77 or 777    "Refused" response         Code as missing (period or blank)
9 or 99 or 999    "Don't know" response      Code as missing (period or blank)
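The recoding described above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, `None` stands in for a true missing value, and the number of digits must be taken from the codebook for each variable, because 7 or 9 can be a legitimate value for some variables.

```python
def refused_and_dont_know_codes(n_digits):
    """Return the ("refused", "don't know") codes for a variable whose
    value range has n_digits digits, e.g. (77, 99) for two digits."""
    if n_digits not in (1, 2, 3):
        raise ValueError("n_digits must be 1, 2, or 3")
    return int("7" * n_digits), int("9" * n_digits)


def recode_to_missing(values, n_digits):
    """Replace "refused" and "don't know" codes with None (missing)."""
    refused, dont_know = refused_and_dont_know_codes(n_digits)
    return [None if v in (refused, dont_know) else v for v in values]


# Hypothetical two-digit variable: 77 and 99 become missing, while 7
# stays because it is a real value at this width.
raw = [3, 77, 12, 99, None, 7]
print(recode_to_missing(raw, n_digits=2))
# [3, None, 12, None, None, 7]
```

Passing the digit count explicitly, rather than guessing it from the data, keeps a real value such as 7 in a two-digit variable from being discarded.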

 

Missing data present few concerns for dietary recall data

For most participants, missing data are not much of an issue in examining dietary recall data, because the data were evaluated before release, and complete and reliable recalls are identified by a specific variable. Depending on your analysis, it may be important to check the variables for recall status (DR1DRSTZ, DR2DRSTZ) and the variables for pregnancy and lactation status (RIDEXPRG in the Demographics file and RHQ200 in the Reproductive Health file).
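Restricting an analysis to reliable recalls might look like the Python sketch below. It assumes that a recall status value of 1 marks a complete and reliable recall; that code is an assumption to confirm against the codebook for your survey cycle. The function name and the records are hypothetical.

```python
# Assumption: status code 1 marks a complete and reliable recall.
# Confirm this value in the codebook before relying on it.
RELIABLE_STATUS = 1


def reliable_recalls(records, status_var="DR1DRSTZ"):
    """Keep only records whose recall status equals RELIABLE_STATUS.

    Records that lack the status variable, or where it is None,
    are dropped along with the unreliable ones.
    """
    return [r for r in records if r.get(status_var) == RELIABLE_STATUS]


# Hypothetical records keyed by respondent sequence number (SEQN).
records = [
    {"SEQN": 1, "DR1DRSTZ": 1},
    {"SEQN": 2, "DR1DRSTZ": 2},
    {"SEQN": 3, "DR1DRSTZ": 1},
    {"SEQN": 4, "DR1DRSTZ": None},
]
print([r["SEQN"] for r in reliable_recalls(records)])  # [1, 3]
```

The same function can be applied to the second recall day by passing `status_var="DR2DRSTZ"`.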

Missing data are not a big issue for the dietary supplement data. However, analysts will encounter missing data in two variables: the number of days a supplement was taken in the past 30 days and the amount taken each day.

NHANES collects data on prescription drugs after the dietary supplement collection. The questions differ slightly: the prescription drug section does not collect the number of days the prescription was taken in the past 30 days or how much was taken on each day. Occasionally, participants report prescription and over-the-counter dietary supplements and/or antacids during the prescription medication section. NHANES moves these responses to the dietary supplement section when the data are released. However, data for these two variables are missing for the moved items because they were never collected.

 

Missing data are an issue with the food frequency questionnaire (FFQ) because individuals sometimes failed to answer some items. The FFQ data comprise two files: one containing the raw data (where missing data can be identified) and one that has been cleaned so that the data can be used as covariates.

WARNING

Even if your dietary variable of interest contains no missing values, missing data can still be an important issue if you are looking at dietary data in relation to other NHANES variables.  Data values are sometimes missing for demographic variables and health status measures.

 
