Task 2: How to Identify Outliers and Evaluate Their Impact

In this task, you will check for outliers and their potential impact using the following steps:

Warning: The steps below assume that you are already familiar with the SAS code used to identify outliers and evaluate their impact in NHANES datasets. If you need more detailed instructions, please review the Clean & Recode Data module in the Continuous NHANES Web Tutorial before continuing.

Step 1: Check for Outliers by Running a "PROC Univariate" Analysis

Before you analyze your data, examine the distribution of each variable of interest, assess its normality, and identify any outlying values.

 

Program to Obtain Descriptive Statistics for Numeric Variables

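/* NORMAL requests tests for normality; PLOT requests the distribution plots. */
proc univariate data=Phthalate normal plot;
    var URXMHP;   /* analysis variable: urinary mono-(2-ethyl)-hexyl phthalate */
    id seqn;      /* label extreme observations with the respondent sequence number */
run;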
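By default, PROC UNIVARIATE lists only the five lowest and five highest values of each variable. When screening for outliers it can help to see a longer list. The following is a minimal sketch, assuming the same Phthalate dataset as above. It uses the NEXTROBS= option to request more extreme observations and ODS SELECT to print only that table. The value 10 is an arbitrary choice for illustration, not a recommendation from the tutorial.

```sas
/* Sketch: print only the extreme-observations table, listing the 10 lowest */
/* and 10 highest URXMHP values. The value 10 is an arbitrary choice.       */
ods select ExtremeObs;
proc univariate data=Phthalate nextrobs=10;
    var URXMHP;
    id seqn;   /* show the respondent sequence number beside each value */
run;
```

The ID statement makes each listed value traceable to a respondent, so the sequence numbers of suspected outliers can be noted for later steps.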


Step 2: Plot Sample Weight Against the Variable of Interest

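Example: Plot the phthalate subsample weight (WTSPH6YR) against the values of urinary mono-(2-ethyl)-hexyl phthalate (URXMHP) to identify any outliers. Observations that combine an extreme value with a large sample weight have the greatest influence on weighted estimates.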

 

Program to Plot the Values of Weight against the Values of Urinary Phthalate

/********************************************************************************
* Use the GPLOT procedure to plot urinary mono-(2-ethyl)-hexyl phthalate        *
* (URXMHP) by the corresponding weights for each observation in the dataset.    *
* The SYMBOL1 statement options VALUE= and HEIGHT= format the plotted points.   *
********************************************************************************/
symbol1 value=dot height=.2;
proc gplot data=Phthalate;
    plot WTSPH6YR*URXMHP / frame;
run;
quit;
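The scatter plot shows where the outlying points are but not which respondents they belong to. The following is a minimal sketch of one way to label them, assuming SAS 9.2 or later (for PROC SGPLOT). The macro variable cutoff, the dataset PhthalateLbl, and the variable seqn_label are hypothetical names introduced here, and the cutoff value of 500 is an arbitrary placeholder that you would replace after inspecting your own plot.

```sas
/* Hypothetical cutoff: replace 500 with a value chosen from your own plot. */
%let cutoff = 500;

/* Build a character label that is blank except for high URXMHP values. */
data PhthalateLbl;
    set Phthalate;
    length seqn_label $12;
    if URXMHP > &cutoff then seqn_label = strip(put(seqn, 12.));
run;

/* Same axes as the GPLOT example: weight on the vertical axis. */
proc sgplot data=PhthalateLbl;
    scatter x=URXMHP y=WTSPH6YR / datalabel=seqn_label;
run;
```

Because the label is blank for all other observations, only the points above the cutoff are annotated with their sequence numbers.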


Step 3: Identify Outliers and Compare Estimates with Outliers Deleted Against the Original Estimates with Outliers Included

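In this step you will remove the suspected outliers from the dataset and compare the weighted means and standard errors estimated with and without them.

For this example, assume that four observations may be outliers.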

 

Program to Create Dataset Without Outliers and to Produce Means

/*******************************************************************************
* Use an IF-THEN statement with DELETE to remove the identified outliers.      *
* Use the MEANS procedure to produce means and standard errors for the         *
* dataset with and without outlier values.                                     *
*******************************************************************************/
data Exclu4SPs;
    set Phthalate;
    if seqn in (3140, 11249, 14737, 24817) then delete;
run;

proc means data=Phthalate mean stderr maxdec=1;
    title 'Without exclusion';
    var URXMHP;
    class RIAGENDR;
    weight WTSPH6YR;
run;

proc means data=Exclu4SPs mean stderr maxdec=1;
    title 'After removing 4 outlier values';
    var URXMHP;
    class RIAGENDR;
    weight WTSPH6YR;
run;
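The two PROC MEANS listings have to be compared by eye. The following is a minimal sketch of one way to place the estimates side by side, assuming the Phthalate and Exclu4SPs datasets created above. The dataset names MeansAll, MeansExcl, and CompareMeans and the variables mean_all, mean_excl, and pct_change are hypothetical names introduced for illustration.

```sas
/* Save the weighted mean by gender from the full dataset. */
proc means data=Phthalate noprint nway;
    var URXMHP;
    class RIAGENDR;
    weight WTSPH6YR;
    output out=MeansAll mean=mean_all;
run;

/* Save the weighted mean by gender with the four outliers removed. */
proc means data=Exclu4SPs noprint nway;
    var URXMHP;
    class RIAGENDR;
    weight WTSPH6YR;
    output out=MeansExcl mean=mean_excl;
run;

/* With NWAY, each output dataset is ordered by RIAGENDR, so it can be merged. */
data CompareMeans;
    merge MeansAll(keep=RIAGENDR mean_all)
          MeansExcl(keep=RIAGENDR mean_excl);
    by RIAGENDR;
    /* Percent change in the mean after removing the outliers */
    pct_change = 100 * (mean_excl - mean_all) / mean_all;
run;

proc print data=CompareMeans noobs;
    title 'Weighted means with and without the 4 outlier values';
    format mean_all mean_excl pct_change 8.1;
run;
```

A large percent change for either gender suggests that the suspected outliers have a substantial effect on the estimate. How large a change matters is a judgment for the analyst.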

 
