How to Check Data Normality

In this example, you will use SAS to examine descriptive statistics for the analyte URXMHP and to assess its normality with graphical approaches.


Descriptive Statistic Estimates and Plots of Normality in SAS - the Univariate Procedure

Basic information about how the data are distributed is provided by a five-number summary: the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum.
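A five-number summary can also be requested directly. The sketch below is a minimal illustration, assuming a hypothetical simulated data set WORK.SIM whose right-skewed (lognormal) variable CONC stands in for an analyte such as URXMHP. It uses PROC MEANS rather than PROC UNIVARIATE, and it does not apply the NHANES sample weight.

```sas
/* Hypothetical data: 500 lognormal values standing in for an analyte */
data work.sim;
   call streaminit(2024);               /* fixed seed, reproducible       */
   do id = 1 to 500;
      conc = exp(rand('normal'));       /* exp of a N(0,1) draw: lognormal */
      output;
   end;
run;

/* Five-number summary: minimum, Q1, median, Q3, maximum */
proc means data=work.sim n min q1 median q3 max maxdec=3;
   var conc;
   /* Save the summary to a data set for later use */
   output out=work.five min=c_min q1=c_q1 median=c_med q3=c_q3 max=c_max;
   title "Five-number summary for simulated CONC";
run;
```

For right-skewed data like this, the distance from the median to Q3 would typically be larger than the distance from Q1 to the median.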

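More detailed descriptive statistics can be examined with PROC UNIVARIATE, which reports measures such as the mean, standard deviation, variance, skewness, kurtosis, and percentiles, and, when the NORMAL option is specified, formal tests for normality such as the Shapiro-Wilk and Kolmogorov-Smirnov tests.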
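As a minimal sketch, assuming a hypothetical simulated variable CONC in WORK.SIM (not the NHANES data), the NORMAL option adds the tests for normality, and ODS SELECT limits the printed output to the Moments table (which includes skewness and kurtosis) and the TestsForNormality table. The data set name WORK.NORMTESTS is also an illustrative choice.

```sas
/* Hypothetical data: 500 lognormal values (clearly non-normal) */
data work.sim;
   call streaminit(2024);
   do id = 1 to 500;
      conc = exp(rand('normal'));
      output;
   end;
run;

/* Print only the moments and the normality tests, and save the tests */
ods select Moments TestsForNormality;
ods output TestsForNormality=work.normtests;
proc univariate data=work.sim normal;
   var conc;
   title "Moments and tests for normality, simulated CONC";
run;
```

For lognormal data like this, one would expect a large positive skewness and very small p-values for the normality tests.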

Descriptive plots, including histograms, box plots, and normal probability plots, give a graphical representation of the shape of the data distribution. A box-and-whisker plot conveys location and variation, including the mean, median, quartiles, minimum and maximum observations, and ‘potential outliers’. A histogram displays the frequencies of the data tabulated in bins; if the histogram is bell-shaped, the underlying distribution is roughly symmetric and may be approximately normal.
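The sketch below draws both plots for a hypothetical simulated variable CONC in WORK.SIM. It assumes ODS Graphics is available; the histogram overlays a fitted normal curve so that departures from a bell shape are easy to see, and the box plot uses PROC SGPLOT, which is a choice made here rather than the tutorial's own method.

```sas
/* Hypothetical data: 500 lognormal values standing in for an analyte */
data work.sim;
   call streaminit(2024);
   do id = 1 to 500;
      conc = exp(rand('normal'));
      output;
   end;
run;

ods graphics on;

/* Histogram with a fitted normal curve overlaid for comparison */
proc univariate data=work.sim noprint;
   histogram conc / normal;
   title "Histogram of simulated CONC with normal curve";
run;

/* Box plot: box spans Q1 to Q3, line = median, marker = mean; */
/* points beyond the whiskers are potential outliers           */
proc sgplot data=work.sim;
   vbox conc;
   title "Box plot of simulated CONC";
run;
```

With skewed data such as this, the histogram should show a long right tail that the normal curve does not follow, and the box plot should show many potential outliers above the upper whisker.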

A normal probability plot, sometimes called a quantile-quantile (Q-Q) plot, compares the observed values with the quantiles of a theoretical normal distribution. If a normal distribution fits the variable well, the points follow a nearly straight line (close to 45 degrees when both axes are on the same scale). If the points depart clearly from a straight line, consider a suitable transformation of the data (for example, a log transformation) or a nonparametric test.
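To see how a transformation can change the Q-Q plot, the sketch below compares a hypothetical right-skewed variable CONC with its natural log LOGCONC (both names, and the simulated data set WORK.SIM, are assumptions for illustration). The MU=EST and SIGMA=EST suboptions draw a reference line based on the sample mean and standard deviation.

```sas
/* Hypothetical data: 500 lognormal values and their natural logs */
data work.sim;
   call streaminit(2024);
   do id = 1 to 500;
      conc    = exp(rand('normal'));
      logconc = log(conc);      /* log transform; normal by construction */
      output;
   end;
run;

/* Normal Q-Q plots with a reference line from the sample mean and SD */
proc univariate data=work.sim noprint;
   qqplot conc logconc / normal(mu=est sigma=est);
   title "Normal Q-Q plots: raw vs. log-transformed CONC";
run;
```

The plot for CONC should bend away from the reference line, while the plot for LOGCONC should lie close to it, which is the pattern that suggests a log transformation before analysis.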

None of the above statistics or graphs should be used as the sole basis for judging whether data are normally distributed; use them together. The Key Concepts and How to Check Frequency Distribution and Normality in SAS tasks from the Continuous NHANES Web Tutorial provide more details about descriptive statistics.

Below is some example code to produce descriptive statistics and plots of normality using environmental chemical variables.

Program to Calculate Descriptive Statistics and Plots of Normality


Sample Code

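libname mh 'C:\myfiles\temp\NHANES';

/* Descriptive statistics, line-printer plots (PLOT), and tests for  */
/* normality (NORMAL); FREQ replicates each observation by the       */
/* integer part of the weight WTMHP6YR                               */
proc univariate data=mh.mehp plot normal;
     var URXMHP;
     freq WTMHP6YR;
     title "Check data distribution for URXMHP";
run;

/* Normal Q-Q plot; NORMAL(MU=EST SIGMA=EST) adds a reference line   */
/* based on the sample mean and standard deviation                   */
proc univariate data=mh.mehp noprint;
     qqplot URXMHP / normal(mu=est sigma=est) cframe=ligr;
     title "QQ plot for URXMHP";
run;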

Output of Program

Descriptive Statistics Output [PDF - 76 KB]

Graph of Plot of Normality [PDF - 88 KB]

