Task 3: Key Concepts about Outliers in NHANES Data

Outliers, or extreme values in the data, are common in surveys such as NHANES.  They can result from errors in data collection or recording, or they can arise for other reasons.  Data collection and recording errors should be minimal in the publicly released NHANES dietary data, as discussed in the task on missing values, because the data were reviewed carefully before release.  The more problematic outliers are legitimate values that fall far outside the range of the other values in the data.  Examples might include a report of consuming an entire watermelon during a single eating occasion or of taking 20 or more kinds of supplements every day.

Consider outliers carefully, as their presence may substantially affect your results, especially if the sample weight associated with the outlying value is large (see Key Concepts about Identifying Correct Sample Weights and Their File Locations, in the Locate Variables module, for more information about sample weights).  In some types of analysis, outliers can distort statistical estimates, alter apparent relationships, and lead to faulty conclusions.  In these cases, the outliers may be deleted or the data transformed to lessen their impact.  On the other hand, if the data are assumed to be correct and the statistical methods are robust to outlying values, the outliers may sometimes be kept in the analysis.
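
The interaction between an outlying value and its sample weight can be illustrated with a small sketch.  The Python example below is not part of the NHANES materials: the intake values, the sample weights, and the function names are all hypothetical, and the calculation ignores the complex survey design that a real NHANES analysis must account for when estimating variances.  It compares a weighted mean with and without one extreme value, and then shows one possible transformation (a weighted geometric mean computed on the log scale) that lessens the outlier's influence.

```python
import math


def weighted_mean(values, weights):
    """Return the sample-weighted mean of values."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def weighted_geometric_mean(values, weights):
    """Return the weighted mean computed on the log scale, back-transformed."""
    log_mean = weighted_mean([math.log(v) for v in values], weights)
    return math.exp(log_mean)


# Hypothetical intakes in grams (NOT real NHANES data); the last value is the outlier.
intakes = [150.0, 200.0, 180.0, 220.0, 4500.0]

# Two hypothetical weighting scenarios for the same five respondents.
equal_weights = [1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
large_outlier_weight = [1000.0, 1000.0, 1000.0, 1000.0, 6000.0]

# Weighted mean with the outlier deleted.
mean_without_outlier = weighted_mean(intakes[:-1], equal_weights[:-1])

# Weighted mean with the outlier kept, under each weighting scenario.
mean_equal = weighted_mean(intakes, equal_weights)
mean_large = weighted_mean(intakes, large_outlier_weight)

# Log-scale summary with the outlier kept and its large weight.
geo_mean_large = weighted_geometric_mean(intakes, large_outlier_weight)

print("Mean without outlier:            ", mean_without_outlier)  # 187.5
print("Mean with outlier, equal weights:", mean_equal)            # 1050.0
print("Mean with outlier, large weight: ", mean_large)            # 2775.0
print("Geometric mean, large weight:    ", round(geo_mean_large, 1))
```

In this made-up example the single extreme report raises the weighted mean from 187.5 to 1050.0, and to 2775.0 when its sample weight is six times that of the other respondents.  The log-scale summary is pulled upward much less, which is the sense in which a transformation can lessen an outlier's impact.  Whether deletion, transformation, or keeping the value is appropriate depends on the analysis.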

Please consult the Analytical Guidelines for more information on this topic.

 
