Lesson 2: Summarizing Data
Section 1: Organizing Data
Whether you are conducting routine surveillance, investigating an outbreak, or conducting a study, you must first compile information in an organized manner. One common method is to create a line list or line listing. Table 2.1 is a typical line listing from an epidemiologic investigation of an apparent cluster of hepatitis A.
The line listing is one type of epidemiologic database, and is organized like a spreadsheet with rows and columns. Typically, each row is called a record or observation and represents one person or case of disease. Each column is called a variable and contains information about one characteristic of the individual, such as race or date of birth. The first column or variable of an epidemiologic database usually contains the person's name, initials, or identification number. Other columns might contain demographic information, clinical details, and exposures possibly related to illness.
Table 2.1 Line Listing of Hepatitis A Cases, County Health Department, January-February 2004
ID | Date of Diagnosis | Town | Age (Years) | Sex | Hosp | Jaundice | Outbreak | IV Drugs | IgM Pos | Highest ALT* |
---|---|---|---|---|---|---|---|---|---|---|
01 | 01/05 | B | 74 | M | Y | N | N | N | Y | 232 |
02 | 01/06 | J | 29 | M | N | Y | N | Y | Y | 285 |
03 | 01/08 | K | 37 | M | Y | Y | N | N | Y | 3250 |
04 | 01/19 | J | 3 | F | N | N | N | N | Y | 1100 |
05 | 01/30 | C | 39 | M | N | Y | N | N | Y | 4146 |
06 | 02/02 | D | 23 | M | Y | Y | N | Y | Y | 1271 |
07 | 02/03 | F | 19 | M | Y | Y | N | N | Y | 300 |
08 | 02/05 | I | 44 | M | N | Y | N | N | Y | 766 |
09 | 02/19 | G | 28 | M | Y | N | N | Y | Y | 23 |
10 | 02/22 | E | 29 | F | N | Y | Y | N | Y | 543 |
11 | 02/23 | A | 21 | F | Y | Y | Y | N | Y | 1897 |
12 | 02/24 | H | 43 | M | N | Y | Y | N | Y | 1220 |
13 | 02/26 | B | 49 | F | N | N | N | N | Y | 644 |
14 | 02/26 | H | 42 | F | N | N | Y | N | Y | 2581 |
15 | 02/27 | E | 59 | F | Y | Y | Y | N | Y | 2892 |
16 | 02/27 | E | 18 | M | Y | N | Y | N | Y | 814 |
17 | 02/27 | A | 19 | M | N | Y | Y | N | Y | 2812 |
18 | 02/28 | E | 63 | F | Y | Y | Y | N | Y | 4218 |
19 | 02/28 | E | 61 | F | Y | Y | Y | N | Y | 3410 |
20 | 02/29 | A | 40 | M | N | Y | Y | N | Y | 4297 |
* ALT = Alanine aminotransferase
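The row-and-column structure described above can be sketched in Python. This is only an illustration, not part of the original lesson: the first five records of Table 2.1 are re-typed as CSV text, and the short column names (`dx_date`, `hosp`, `igm_pos`, and so on) and the helper `load_line_list` are hypothetical names chosen for the example.

```python
import csv
import io

# First five records of Table 2.1, re-typed as CSV for illustration.
# The column names are hypothetical short identifiers, not from the lesson.
LINE_LIST_CSV = """id,dx_date,town,age,sex,hosp,jaundice,outbreak,iv_drugs,igm_pos,highest_alt
01,01/05,B,74,M,Y,N,N,N,Y,232
02,01/06,J,29,M,N,Y,N,Y,Y,285
03,01/08,K,37,M,Y,Y,N,N,Y,3250
04,01/19,J,3,F,N,N,N,N,Y,1100
05,01/30,C,39,M,N,Y,N,N,Y,4146
"""


def load_line_list(text):
    """Return a list of records; each record is a dict keyed by variable name."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        # Numeric variables arrive as strings and are converted here.
        row["age"] = int(row["age"])
        row["highest_alt"] = int(row["highest_alt"])
        records.append(row)
    return records


records = load_line_list(LINE_LIST_CSV)

# One row = one record (a case); one column = one variable.
first_case = records[0]
ages = [r["age"] for r in records]  # the "age" variable across all records

print(len(records), "records,", len(first_case), "variables")
print(first_case["id"], first_case["town"], first_case["age"])
print(ages)
```

Each dictionary corresponds to one row of the line listing, and pulling a single key out of every dictionary yields one column, which is the form most summary calculations start from.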
Some epidemiologic databases, such as line listings for a small cluster of disease, may have only a few rows (records) and a limited number of columns (variables). Such small line listings are sometimes maintained by hand on a single sheet of paper. Other databases, such as birth or death records for the entire country, might have thousands of records and hundreds of variables and are best handled with a computer. However, even when records are computerized, a line listing with key variables is often printed to facilitate review of the data.
One software package that epidemiologists widely use to manage data is Epi Info, a free package developed at CDC. Epi Info allows the user to design a questionnaire, enter data directly into the questionnaire, edit the data, and analyze the data. Two versions are available:
- Epi Info 3 (formerly Epi Info 2000 or Epi Info 2002) is Windows-based, and continues to be supported and upgraded. It is the recommended version and can be downloaded from the CDC website: http://www.cdc.gov/epiinfo/downloads.htm.
- Epi Info 6 is DOS-based and still widely used, but is being phased out.
This lesson includes Epi Info commands for creating frequency distributions and calculating some of the measures of central location and spread described in the lesson. Since Epi Info 3 is the recommended version, only commands for this version are provided in the text; corresponding commands for Epi Info 6 are offered at the end of the lesson.
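For readers without Epi Info at hand, the kind of summary mentioned here can also be sketched in Python. The example below is not an Epi Info command and is not taken from the lesson; it simply assumes the ages and sexes of the 20 cases in Table 2.1, transcribed by hand, and uses the standard library to tabulate a frequency distribution and a few measures of central location and spread.

```python
from collections import Counter
import statistics

# (age in years, sex) pairs transcribed from Table 2.1, in ID order 01-20.
CASES = [
    (74, "M"), (29, "M"), (37, "M"), (3, "F"), (39, "M"),
    (23, "M"), (19, "M"), (44, "M"), (28, "M"), (29, "F"),
    (21, "F"), (43, "M"), (49, "F"), (42, "F"), (59, "F"),
    (18, "M"), (19, "M"), (63, "F"), (61, "F"), (40, "M"),
]

ages = [age for age, _ in CASES]

# Frequency distribution: number of cases in each category of a variable.
sex_counts = Counter(sex for _, sex in CASES)

# Measures of central location and spread for a numeric variable.
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
age_range = (min(ages), max(ages))

for sex, count in sorted(sex_counts.items()):
    print(f"{sex}: {count} ({100 * count / len(CASES):.0f}%)")
print("mean age:", mean_age)
print("median age:", median_age)
print("age range:", age_range)
```

For these 20 cases the sketch reports 12 males and 8 females, a mean age of 37 years, a median age of 38 years, and ages ranging from 3 to 74 years.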
- Page last reviewed: May 18, 2012
- Page last updated: May 18, 2012