
Lesson 2: Summarizing Data

Section 1: Organizing Data

Whether you are conducting routine surveillance, investigating an outbreak, or conducting a study, you must first compile information in an organized manner. One common method is to create a line list or line listing. Table 2.1 is a typical line listing from an epidemiologic investigation of an apparent cluster of hepatitis A.

A variable can be any characteristic that differs from person to person, such as height, sex, smallpox vaccination status, or physical activity pattern. The value of a variable is the number or descriptor that applies to a particular person, such as 5'6" (168 cm), female, and never vaccinated.

The line listing is one type of epidemiologic database, and is organized like a spreadsheet with rows and columns. Typically, each row is called a record or observation and represents one person or case of disease. Each column is called a variable and contains information about one characteristic of the individual, such as race or date of birth. The first column or variable of an epidemiologic database usually contains the person's name, initials, or identification number. Other columns might contain demographic information, clinical details, and exposures possibly related to illness.

Table 2.1 Line Listing of Hepatitis A Cases, County Health Department, January — February 2004

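| ID | Date of Diagnosis | Town | Age (Years) | Sex | Hosp | Jaundice | Outbreak | IV Drugs | IgM Pos | Highest ALT* |
|----|-------------------|------|-------------|-----|------|----------|----------|----------|---------|--------------|
| 01 | 01/05 | B | 74 | M | Y | N | N | N | Y | 232 |
| 02 | 01/06 | J | 29 | M | N | Y | N | Y | Y | 285 |
| 03 | 01/08 | K | 37 | M | Y | Y | N | N | Y | 3250 |
| 04 | 01/19 | J | 3 | F | N | N | N | N | Y | 1100 |
| 05 | 01/30 | C | 39 | M | N | Y | N | N | Y | 4146 |
| 06 | 02/02 | D | 23 | M | Y | Y | N | Y | Y | 1271 |
| 07 | 02/03 | F | 19 | M | Y | Y | N | N | Y | 300 |
| 08 | 02/05 | I | 44 | M | N | Y | N | N | Y | 766 |
| 09 | 02/19 | G | 28 | M | Y | N | N | Y | Y | 23 |
| 10 | 02/22 | E | 29 | F | N | Y | Y | N | Y | 543 |
| 11 | 02/23 | A | 21 | F | Y | Y | Y | N | Y | 1897 |
| 12 | 02/24 | H | 43 | M | N | Y | Y | N | Y | 1220 |
| 13 | 02/26 | B | 49 | F | N | N | N | N | Y | 644 |
| 14 | 02/26 | H | 42 | F | N | N | Y | N | Y | 2581 |
| 15 | 02/27 | E | 59 | F | Y | Y | Y | N | Y | 2892 |
| 16 | 02/27 | E | 18 | M | Y | N | Y | N | Y | 814 |
| 17 | 02/27 | A | 19 | M | N | Y | Y | N | Y | 2812 |
| 18 | 02/28 | E | 63 | F | Y | Y | Y | N | Y | 4218 |
| 19 | 02/28 | E | 61 | F | Y | Y | Y | N | Y | 3410 |
| 20 | 02/29 | A | 40 | M | N | Y | Y | N | Y | 4297 |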

* ALT = Alanine aminotransferase
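
The row-and-column structure of a line listing maps naturally onto a list of records in a program. The following is a minimal sketch in Python, not part of the original lesson and not an Epi Info command. It transcribes Table 2.1 as comma-separated text, loads each row as one record, and tallies two of the variables. The shortened column names (such as `dx_date`, `hosp`, and `alt`) and the helper names `load_records` and `column` are assumptions made for illustration.

```python
import csv
import io
from collections import Counter

# Table 2.1 transcribed as CSV. The short column names are illustrative
# assumptions; the values are copied from the table.
LINE_LISTING = """\
id,dx_date,town,age,sex,hosp,jaundice,outbreak,iv_drugs,igm_pos,alt
01,01/05,B,74,M,Y,N,N,N,Y,232
02,01/06,J,29,M,N,Y,N,Y,Y,285
03,01/08,K,37,M,Y,Y,N,N,Y,3250
04,01/19,J,3,F,N,N,N,N,Y,1100
05,01/30,C,39,M,N,Y,N,N,Y,4146
06,02/02,D,23,M,Y,Y,N,Y,Y,1271
07,02/03,F,19,M,Y,Y,N,N,Y,300
08,02/05,I,44,M,N,Y,N,N,Y,766
09,02/19,G,28,M,Y,N,N,Y,Y,23
10,02/22,E,29,F,N,Y,Y,N,Y,543
11,02/23,A,21,F,Y,Y,Y,N,Y,1897
12,02/24,H,43,M,N,Y,Y,N,Y,1220
13,02/26,B,49,F,N,N,N,N,Y,644
14,02/26,H,42,F,N,N,Y,N,Y,2581
15,02/27,E,59,F,Y,Y,Y,N,Y,2892
16,02/27,E,18,M,Y,N,Y,N,Y,814
17,02/27,A,19,M,N,Y,Y,N,Y,2812
18,02/28,E,63,F,Y,Y,Y,N,Y,4218
19,02/28,E,61,F,Y,Y,Y,N,Y,3410
20,02/29,A,40,M,N,Y,Y,N,Y,4297
"""


def load_records(text):
    """Return one dict per row (record); keys are the column (variable) names."""
    return list(csv.DictReader(io.StringIO(text)))


def column(records, variable):
    """Return the values of a single variable across all records."""
    return [record[variable] for record in records]


records = load_records(LINE_LISTING)

# Count how many records take each value of the variable "sex".
sex_counts = Counter(column(records, "sex"))

# Count the records whose "hosp" value is "Y".
hospitalized = sum(1 for record in records if record["hosp"] == "Y")

print(len(records), dict(sex_counts), hospitalized)
```

Each dictionary corresponds to one row of the line listing, and `column` pulls out one variable at a time, which is the form most summary calculations need.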

Some epidemiologic databases, such as line listings for a small cluster of disease, may have only a few rows (records) and a limited number of columns (variables). Such small line listings are sometimes maintained by hand on a single sheet of paper. Other databases, such as birth or death records for the entire country, might have thousands of records and hundreds of variables and are best handled with a computer. However, even when records are computerized, a line listing with key variables is often printed to facilitate review of the data.

Epi Info

One software package widely used by epidemiologists to manage data is Epi Info, a free package developed at CDC. Epi Info allows the user to design a questionnaire, enter data directly into the questionnaire, edit the data, and analyze the data. Two versions are available:

Epi Info 3 (formerly Epi Info 2000 or Epi Info 2002) is Windows-based, and continues to be supported and upgraded. It is the recommended version and can be downloaded from the CDC website: http://www.cdc.gov/epiinfo/downloads.htm.

Epi Info 6 is DOS-based and still widely used, but is being phased out.

This lesson includes Epi Info commands for creating frequency distributions and calculating some of the measures of central location and spread described in the lesson. Since Epi Info 3 is the recommended version, only commands for this version are provided in the text; corresponding commands for Epi Info 6 are offered at the end of the lesson.
