Data and Documentation

The GTSS uses a two-stage cluster sample design. To reflect this complex sample design, your data set contains two sample design variables, STRATUM and PSU (Primary Sampling Unit).

Each value of the variable STRATUM usually identifies two schools that are paired because they have similar enrollment sizes. However, a stratum sometimes contains only one school. For example, a school with a 100% chance of being on the selected school list (because of its large enrollment) is the only school in its stratum; this type of school is called a Certainty School.

In most cases, the Primary Sampling Unit is a school. If the school is a Certainty School, the PSUs are the classes within that school.
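
The relationship between STRATUM and PSU described above can be checked on a data set by counting the distinct PSUs in each stratum. The following Python lines are only a minimal sketch: the records, the numeric codes, and the function name psus_per_stratum are hypothetical and are not taken from any GTSS file.

```python
from collections import defaultdict

def psus_per_stratum(records):
    """Map each STRATUM value to the set of distinct PSU values found in it."""
    psus = defaultdict(set)
    for rec in records:
        psus[rec["STRATUM"]].add(rec["PSU"])
    return dict(psus)

# Hypothetical student records (one dict per student).
# Stratum 1 imitates two paired schools; stratum 2 imitates a
# Certainty School whose three classes act as the PSUs.
records = [
    {"STRATUM": 1, "PSU": 101},
    {"STRATUM": 1, "PSU": 101},
    {"STRATUM": 1, "PSU": 102},
    {"STRATUM": 2, "PSU": 201},
    {"STRATUM": 2, "PSU": 202},
    {"STRATUM": 2, "PSU": 203},
]

counts = {stratum: len(psus) for stratum, psus in psus_per_stratum(records).items()}
print(counts)  # {1: 2, 2: 3}
```

A stratum that shows only one PSU in such a count needs special handling when variances are estimated.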

The sampling weight variable is named FINALWGT.

Each student in the data set is assigned a sampling weight, which accounts for the following:

Selection probability of the school
Selection probability of the class
Distribution of the population by grade and sex
Non-responding schools
Non-responding students
Non-responding classes
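
The list above names the components of the weight but not how they are combined. A common approach, assumed here and not confirmed by this page, is to multiply the inverse selection probabilities by the adjustment factors. The Python sketch below illustrates that assumption; the function name, the parameter names, and the numbers are hypothetical.

```python
def final_weight(p_school, p_class, school_adj, class_adj, student_adj, poststrat_adj):
    """Combine weight components multiplicatively (an assumed form).

    p_school, p_class: selection probabilities of the school and the class.
    school_adj, class_adj, student_adj: non-response adjustment factors.
    poststrat_adj: adjustment to the population distribution by grade and sex.
    """
    if not (0 < p_school <= 1 and 0 < p_class <= 1):
        raise ValueError("selection probabilities must be in (0, 1]")
    base_weight = (1 / p_school) * (1 / p_class)  # inverse selection probabilities
    return base_weight * school_adj * class_adj * student_adj * poststrat_adj

# Hypothetical student: school selected with probability 0.25, class with
# probability 0.5, and a student non-response adjustment of 1.25.
w = final_weight(0.25, 0.5, 1.0, 1.0, 1.25, 1.0)
print(w)  # 10.0
```

Under this assumption, the hypothetical student represents 10 students in the population. The weights in a released data set are already computed in FINALWGT and do not need to be rebuilt.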

Point estimates and 95% confidence intervals can be calculated with several statistical software packages that analyze correlated (clustered) data, including EPIINFO, SUDAAN and STATA. Sample code for EPIINFO and SUDAAN follows.

EPIINFO Sample Code:
FREQ CR3 STRATAVAR=Stratum WEIGHTVAR=FinalWgt PSUVAR=PSU


SUDAAN Sample Code:
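/* Sort the data by the design variables before calling SUDAAN */
proc sort data = sasdata.dataset;
  by stratum psu;
run;

/* design = wr requests with-replacement variance estimation */
proc crosstab data = sasdata.dataset design = wr;
  nest stratum psu / missunit;  /* missunit: handling for strata with a single PSU */
  weight finalwgt;
  tables cr3;
run;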
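
For readers who want to see the arithmetic behind a design-based estimate, the Python sketch below computes a weighted proportion and a Taylor-linearized standard error that treats PSUs as sampled with replacement within strata. It is an illustration under stated assumptions, not the exact method of any package: the records are hypothetical, the value 1 of CR3 is assumed to be the category of interest, strata with a single PSU are skipped (a package may treat them differently), and the interval uses the normal critical value 1.96 where a package may use a different one.

```python
import math
from collections import defaultdict

def weighted_proportion(records, is_case):
    """Return (estimate, standard_error) for a weighted proportion."""
    total_w = sum(r["FINALWGT"] for r in records)
    p = sum(r["FINALWGT"] for r in records if is_case(r)) / total_w

    # Sum the linearized values w * (y - p) / total_w within each PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for r in records:
        y = 1.0 if is_case(r) else 0.0
        psu_totals[r["STRATUM"]][r["PSU"]] += r["FINALWGT"] * (y - p) / total_w

    # Add up the between-PSU variability stratum by stratum.
    variance = 0.0
    for totals in psu_totals.values():
        n = len(totals)
        if n < 2:
            continue  # assumption of this sketch: single-PSU strata are skipped
        mean = sum(totals.values()) / n
        variance += n / (n - 1) * sum((z - mean) ** 2 for z in totals.values())
    return p, math.sqrt(variance)

# Hypothetical records: two strata, two PSUs each, equal weights.
records = [
    {"STRATUM": 1, "PSU": 1, "FINALWGT": 10.0, "CR3": 1},
    {"STRATUM": 1, "PSU": 1, "FINALWGT": 10.0, "CR3": 1},
    {"STRATUM": 1, "PSU": 2, "FINALWGT": 10.0, "CR3": 2},
    {"STRATUM": 1, "PSU": 2, "FINALWGT": 10.0, "CR3": 2},
    {"STRATUM": 2, "PSU": 3, "FINALWGT": 10.0, "CR3": 1},
    {"STRATUM": 2, "PSU": 3, "FINALWGT": 10.0, "CR3": 2},
    {"STRATUM": 2, "PSU": 4, "FINALWGT": 10.0, "CR3": 2},
    {"STRATUM": 2, "PSU": 4, "FINALWGT": 10.0, "CR3": 2},
]

p, se = weighted_proportion(records, lambda r: r["CR3"] == 1)
ci_low, ci_high = p - 1.96 * se, p + 1.96 * se  # normal-approximation interval
print(p)  # 0.375
```

With so few PSUs the interval is very wide, which shows why the design variables, and not only the weights, matter for the confidence interval.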

Page last reviewed: December 20, 2016
Page last updated: December 20, 2016
Content source:
- Office on Smoking and Health, National Center for Chronic Disease Prevention and Health Promotion
