Task 1a: How to Check Frequency Distribution and Normality

The SAS procedure proc univariate generates descriptive and summary statistics that describe the characteristics of a distribution. These statistics can also help you determine whether parametric tests (which assume a normal distribution) or non-parametric tests are appropriate for your analysis. As noted in the Clean & Recode Data module, it is advisable to check for extreme weights and outliers before starting any analysis.

 

Step 1: Use the univariate procedure to generate descriptive statistics in SAS

Use the SAS procedure proc univariate to generate descriptive statistics. The frequency distribution can be presented in tabular or graphic form. The freq option generates the frequency distribution in tabular form by listing the number of observations for each value of the variable. Because the sample is large and a continuous variable can take a long list of distinct values, request the freq option only for nominal or ordinal variables. The plot option generates the frequency distribution in graphic form (histogram, box, and normal probability plots), and the normal option generates statistics to test the normality of the distribution.
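
As a minimal sketch of how these options might be requested, the two calls below contrast the freq option with the plot and normal options. The choice of analysis variables is an assumption made only for illustration: age is taken to be a numeric, coded age-group variable (ordinal), and ridageyr is taken to be age in years (continuous). Substitute the variables relevant to your own analysis.

```sas
/* Tabular frequency distribution: reasonable here because the       */
/* variable is ordinal and has few distinct values.                  */
/* Assumption: age is a numeric, coded age-group variable.           */
proc univariate data=analysis_data freq;
  var age;
run;

/* Graphic distribution and normality tests for a continuous         */
/* variable. The freq option is omitted to avoid a long table.       */
/* Assumption: ridageyr (age in years) is used only for illustration. */
proc univariate data=analysis_data plot normal;
  var ridageyr;
run;
```

The options can also be combined in a single proc univariate statement when all three kinds of output are wanted for the same variable.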

 

These programs use variable formats listed in the Tutorial Formats page. You may need to format the variables in your dataset the same way to reproduce results presented in the tutorial.


SAS Univariate Procedure for Descriptive Statistics

Statements Explanation

proc sort data=analysis_data;
 by riagendr age;
run;

Use the sort procedure to sort the data by the same variables used in the by statement of the univariate procedure. In the example, the data are sorted by gender (riagendr) and age (age).

 

proc univariate data=analysis_data plot normal;

Use the univariate procedure to generate descriptive statistics, which include number of missing values, mean, standard errors, percentiles, and extreme values. Use the plot option to generate histogram, box and normal probability plots, and the normal option to generate statistics to test normality.

In this example, plots (plot) and normality test statistics (normal) are requested, and the results are generated separately, in sorted order, for each combination of the variables on the by statement.

 where ridageyr >= 20;