Task 2b: How to Generate Population Counts in SAS Survey Procedures

In this example, you will use SAS Survey Procedures to combine age subgroups and generate population estimates of high blood pressure (HBP) by sex and race/ethnicity for persons aged 20 years and older. The method outlined in this module uses a SAS data file containing Current Population Survey (CPS) population totals. The code outlined below then automates combining the subgroups and calculating the population estimates.

Alternatively, you can take the CPS population totals posted on the NHANES web page for the respective survey cycle (referred to in Key Concepts), combine them with the results of a proc surveymeans procedure, and calculate the population estimates manually in a spreadsheet. If you choose this option, you will need to define the age, race/ethnicity, and gender subgroups of interest and calculate their population totals in the spreadsheet yourself.
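Whichever route you take, the underlying arithmetic is the same: the population estimate for a subgroup is its prevalence (as a proportion) multiplied by that subgroup's CPS population total. The SAS sketch below shows the calculation on hypothetical numbers. The dataset and variable names (pop_sketch, prev_pct, se_pct, cps_total) are illustrative only, and scaling the standard error by the same factor assumes the CPS total is treated as a fixed constant.

```sas
/* Minimal sketch: turn a prevalence (percent) into a population count.
   All numbers below are HYPOTHETICAL placeholders, not NHANES or CPS values.
   Assumption: the CPS total is treated as a fixed constant, so the standard
   error is scaled by the same factor as the estimate. */
data pop_sketch;
  input group $ prev_pct se_pct cps_total;
  pop_hbp = (prev_pct / 100) * cps_total;  /* estimated number of persons with HBP */
  pop_se  = (se_pct  / 100) * cps_total;   /* standard error of that count */
  datalines;
A 20 1.5 1000000
B 30 2.0 2000000
;
run;

proc print data=pop_sketch noobs;
run;
```

For group A, a 20 percent prevalence applied to a population of 1,000,000 gives an estimated 200,000 persons with HBP.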

 

 

Step 1: Calculate Prevalence of Health Condition of Interest

The SAS Survey Procedure proc surveymeans is used to generate the prevalence estimates from which the population estimates are derived. The general program for obtaining population estimates follows the 3-step process below:

In the first step, you will calculate the prevalence of the health condition (here, HBP) within the subdomains of interest. You will need to use appropriate weights, especially when combining data across survey cycles.

 

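The health outcome must be coded as a dichotomous variable: 0 for absence or 100 for presence of the health condition of interest. In this example, the HBP variable (hbp) is recoded into a new variable (hbpx). With this coding, the mean of hbpx estimated by proc surveymeans is the prevalence expressed as a percentage.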

 

hbpx=. ;
/* assumes hbp is coded 1 = has HBP, 2 = does not have HBP */
if hbp= 1 then hbpx= 100 ;
else if hbp= 2 then hbpx= 0 ;
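To see why the 0/100 coding is convenient, the sketch below applies the recode to a few hypothetical records and takes the unweighted mean of hbpx. It assumes hbp is coded 1 = has HBP and 2 = does not; the dataset name toy and its values are invented for illustration. proc surveymeans produces the weighted, design-based version of the same quantity.

```sas
/* Minimal sketch on HYPOTHETICAL toy records (not NHANES data).
   Assumes hbp is coded 1 = has HBP, 2 = does not have HBP. */
data toy;
  input seqn hbp;
  hbpx = .;                        /* stays missing unless hbp is 1 or 2 */
  if hbp = 1 then hbpx = 100;
  else if hbp = 2 then hbpx = 0;
  datalines;
1 1
2 2
3 2
4 1
5 .
;
run;

/* Unweighted mean of a 0/100 variable = percent of nonmissing records with HBP */
proc means data=toy n mean;
  var hbpx;
run;
```

Record 5 has a missing hbp, so its hbpx stays missing and proc means excludes it. The mean is computed over the four remaining records, two of which have HBP, giving 50 (percent).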

 

Next, create a new variable (sel) that identifies the study subpopulation of interest (persons aged 20 years and older). This variable is used in the domain statement of the proc surveymeans procedure.

 

sel=. ;
if ridageyr ge 20 then sel=1;   /* aged 20 years and older */
else sel=2;                     /* all other records */
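The sel variable is used as a domain rather than as a filter (for example, a WHERE clause) so that all records remain available for design-based variance estimation. The sketch below applies the same recode to a few hypothetical ages (toy_age is an illustrative name). One detail worth checking: SAS treats a missing numeric value as smaller than any number, so a record with a missing ridageyr fails the ge 20 test and lands in sel=2.

```sas
/* Minimal sketch on HYPOTHETICAL ages (not NHANES data). */
data toy_age;
  input ridageyr;
  sel = .;
  if ridageyr ge 20 then sel = 1;  /* subpopulation of interest */
  else sel = 2;                    /* includes missing ages: . is less than 20 */
  datalines;
19
20
45
.
;
run;

/* Cross-check the recode, keeping missing ages visible */
proc freq data=toy_age;
  tables ridageyr*sel / list missing;
run;
```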

 

Population estimates will not be age standardized, so they reflect the actual age distribution of the population that the sample represents. The results will be written to a SAS dataset using the ods output statement below.

 

IMPORTANT NOTE

These programs use variable formats listed in the Tutorial Formats page. You may need to format the variables in your dataset the same way to reproduce results presented in the tutorial.

SAS Survey Procedure for Generating Prevalence Rates
Statements Explanation
proc surveymeans data=ANALYSIS_DATA nobs mean stderr clm;

Use the proc surveymeans procedure to obtain the number of observations, mean, standard error, and confidence limits.

strata sdmvstra; 

Use the strata statement to define the strata variable (sdmvstra).

cluster sdmvpsu; 

Use the cluster statement to define the PSU variable (sdmvpsu).

class riagendr race;

Use the class statement to specify the discrete variables used to select the subpopulations of interest (i.e., gender [riagendr] and race [race]).

var hbpx; 

 

Use the var statement to specify which variable(s) will be analyzed. In this example, the HBP variable (hbpx) is used.

weight wtmec4yr;

Use the weight statement to account for the unequal probability of sampling and non-response. In this example, the MEC weight for 4 years of data (wtmec4yr) is used.

domain sel sel*riagendr sel*race sel*riagendr*race;

Use the domain statement to specify the subpopulations of interest.

ods OUTPUT domain(match_all)=unadj;

run ;

Use the ods output statement to save the estimates for the subdomains listed in the domain statement as SAS datasets. Because of the match_all option, one dataset is created for each of the four domains specified in the domain statement: unadj for sel, unadj1 for sel*riagendr, unadj2 for sel*race, and unadj3 for sel*riagendr*race.
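To check which datasets the ods output statement produces, the sketch below runs the same statements on a tiny hypothetical dataset (toy_design: two strata with two PSUs each; every value is invented, not NHANES data). After it runs, the WORK library should contain unadj, unadj1, unadj2, and unadj3, one per domain request. With so few records, SAS may print notes about small domains; the point is only the output naming.

```sas
/* Minimal sketch on HYPOTHETICAL toy data (not NHANES): 2 strata x 2 PSUs.
   Assumes hbp is coded 1 = has HBP, 2 = does not have HBP. */
data toy_design;
  input sdmvstra sdmvpsu riagendr race ridageyr hbp wtmec4yr;
  hbpx = .;
  if hbp = 1 then hbpx = 100;
  else if hbp = 2 then hbpx = 0;
  if ridageyr ge 20 then sel = 1;
  else sel = 2;
  datalines;
1 1 1 1 45 1 1000
1 1 2 2 30 2 1200
1 2 1 2 62 1 900
1 2 2 1 15 2 1100
2 1 1 1 50 2 1300
2 1 2 2 71 1 800
2 2 1 2 25 2 1000
2 2 2 1 40 1 950
;
run;

proc surveymeans data=toy_design nobs mean stderr clm;
  strata sdmvstra;
  cluster sdmvpsu;
  class riagendr race;
  var hbpx;
  weight wtmec4yr;
  domain sel sel*riagendr sel*race sel*riagendr*race;
  ods output domain(match_all)=unadj;  /* creates unadj, unadj1, unadj2, unadj3 */
run;
```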

 

Format Data from SAS Output Dataset
Statements Explanation
data bp_stats;
set unadj unadj1 unadj2 unadj3;

Use the data statement to create a new dataset (bp_stats) by stacking the SAS datasets created previously (unadj, unadj1, unadj2, and unadj3).

if sel= 1 ;

if race= . then race= 0 ;

if riagendr= . then riagendr= 0 ;