 

Statistical Policy Working Paper 13 - Federal Longitudinal Surveys





 

 

 

 

 

MEMBERS OF THE FEDERAL COMMITTEE ON

STATISTICAL METHODOLOGY

 

                          (November 1985)

 

Maria Elena Gonzalez (Chair)       Daniel Kasprzyk

Office of Information and          Bureau of the Census

Regulatory Affairs (OMB)             (Commerce)

 

Barbara A. Bailar                  William E. Kibler

Bureau of the Census               Statistical Reporting Service

(Commerce)                           (Agriculture)

 

Yvonne M. Bishop                   David Pierce

Energy Information                 Federal Reserve Board

Administration (Energy)

 

Edwin J. Coleman                   Thomas Plewes

Bureau of Economic Analysis        Bureau of Labor Statistics

(Commerce)                          (Labor)

 

John E. Cremeans                   Jane Ross

Business Analysis                  Social Security Administration

(Commerce)                         (Health and Human Services)

 

Zahava D. Doering                  Fritz Scheuren

Defense Manpower Data Center       Internal Revenue Service

(Defense)                           (Treasury)

 

Daniel H. Garnick                  Monroe G. Sirken

Bureau of Economic Analysis        National Center for Health

(Commerce)                          Statistics (Health and

                                    Human Services)

 

Terry Ireland                      Thomas G. Staples

National Security Agency           Social Security Administration

 (Defense)                         (Health and Human Services)

 

Charles D. Jones                   Robert D. Tortora

Bureau of the Census               Statistical Reporting Service

 (Commerce)                         (Agriculture)

 

 

                              PREFACE

 

The Federal Committee on Statistical Methodology was organized by

OMB in 1975 to investigate methodological issues in Federal 

statistics. Members of the committee, selected by OMB on the basis

of their individual expertise and interest in statistical methods,

serve in their personal capacity rather than as agency

representatives.  The committee carries out its work through

subcommittees that are organized to study particular issues and

that are open to any federal employees who wish to participate in

the studies.  Working papers are prepared by the subcommittee

members and reflect only their individual and collective views.

 

This working paper of the Subcommittee on Federal Longitudinal

Surveys discusses the goals, management, operations, sample

designs, estimation methods, and analysis of longitudinal surveys. 

Conclusions are drawn about where to use longitudinal surveys, and

the need to have an evaluation component in these surveys.  The

Appendices contain twelve case studies of recent longitudinal

surveys.  The report is intended primarily to be useful to Federal

agencies in choosing to do, and then in designing, carrying out,

and analyzing data from longitudinal surveys.  The Federal

Committee on Statistical Methodology intends to organize seminars

to discuss the report with interested Federal agency staff members.

 

The Subcommittee on Federal Longitudinal Surveys was co-chaired by

Barbara A. Bailar and Daniel Kasprzyk, Bureau of the Census, Department

of Commerce.

 

 

 

 

    MEMBERS OF THE SUBCOMMITTEE ON FEDERAL LONGITUDINAL SURVEYS

 

 

Barbara A. Bailar* (Co-chair)      Lawrence Ernst

Bureau of the Census (Commerce)    Bureau of the Census (Commerce)

 

Daniel Kasprzyk* (Co-chair)        Maria E. Gonzalez* (ex officio)

Bureau of the Census (Commerce)    Office of Information and

                                     Regulatory Affairs (OMB)

 

Barry Bye                          Catherine Hines

Social Security Administration     Bureau of the Census (Commerce)

(Health and Human Services)

 

Dennis Carroll                     Curtis Jacobs

Center for Statistics              Bureau of Labor Statistics

     (Education)                     (Labor)

 

Robert Casady                      Inderjit Kundra

National Center for Health         Energy Information 

Statistics                          Administration

(Health and Human Services)        (Energy)

 

Steven B. Cohen                    Bruce Taylor

National Center for Health         Bureau of Justice Statistics

 Services Research (Health           (Justice)

 and Human Services)

 

               ADDITIONAL CONTRIBUTOR TO THE REPORT

 

 

Lawrence Corder

Research Triangle Institute

  (Previously National Center

     for Health Statistics)

 

 

*Member, Federal Committee on Statistical Methodology

 

 

 

 

 

                         ACKNOWLEDGEMENTS

 

 

     This report is the result of collective work and many meetings

of the Subcommittee on Federal Longitudinal Surveys.  Each chapter

had a principal author (or authors), as noted below, but the final

report, particularly the introduction and summary sections,

reflects contributions from all of the Subcommittee members.

 

 

     Many useful suggestions on content and organization were made 

by Maria Gonzalez, chairperson of the Federal Committee on

Statistical Methodology (FCSM).

 

     Barbara Bailar, Co-Chair of the Subcommittee, prepared the

Introduction and the concluding Chapter, which embody the

discussions held by the whole Subcommittee.

 

     All of the FCSM members reviewed several drafts and made many

important suggestions.  The Subcommittee in particular wishes to

recognize the valuable contributions made by the primary reviewers:

Zahava Doering, Fritz Scheuren and especially Monroe Sirken, who

read and commented on two drafts of the complete report.

 

     The principal authors of each chapter of the report are:

 

     Chapter One         Catherine Hines

     Chapter Two         Lawrence Corder

     Chapter Three       Bruce Taylor

     Chapter Four        Daniel Kasprzyk and Lawrence Ernst

     Chapter Five        Barry V. Bye

 

     The Subcommittee thanks also the following persons who were

responsible for preparing the Case Studies that appear in the

Appendix: Edith McArthur (SIPP), Curtis Jacobs (CPI), Steve Kaufman

(ECI), Dennis Carroll (NLS-72, HS&B), Catherine Hines (NLS), Barry

V. Bye (RHS, WIE), Steven B. Cohen (NMCES), Robert Casady

(NMCUES), James L. Monahan (LED), John DiPaolo, Robert Wilson, and

Peter J. Sailer (SOI).

 

     Catherine Hines edited the report.  Joanne Watson (Bureau of

the Census) prepared each of the drafts, and the Subcommittee

thanks her for her patience and accuracy.

 

 


 

 

 

                     GLOSSARY OF ABBREVIATIONS

 

 

AHS       American Housing Survey (Formerly Annual Housing Survey)

 

CPI       Consumer Price Index

 

CPS       Current Population Survey

 

ECI       Employment Cost Index

 

HCFA      Health Care Financing Administration

 

HS&B      Longitudinal Survey of High School and Beyond

 

ISDP      Income Survey Development Program

 

ISR       Institute for Social Research (University of Michigan)

 

NCES      National Center for Education Statistics

 

NCHS      National Center for Health Statistics

 

NCS       National Crime Survey

 

NLS       National Longitudinal Surveys of Labor Market Experience

 

NLS-72    National Longitudinal Study of the High School Class of

          1972

 

NMCES     National Medical Care Expenditure Survey

 

NMCUES    National Medical Care Utilization and Expenditure Survey

 

OSIRIS    Statistical Analysis software, Survey Research Center, U.

          Michigan

 

PSID      Panel Study of Income Dynamics

 

RAMIS     Data base management system, Mathematical Research Inc.,

          Princeton, N.J.

 

RAPID     Data base management system, Statistics Canada, Ottawa

 

RHS       Retirement History Study

 

SAS       Data base management system, SAS Institute, Cary, N.C.

 

SSA       Social Security Administration

 

SIPP      Survey of Income and Program Participation

 

SIR       Data base management system, SIR, Inc., Evanston, IL

 

 

SOI       Statistics of Income Program, IRS

 

WIE       Work Incentive Experiment, SSA

 

 


 

 

 

                         TABLE OF CONTENTS

 

                                                               Page

GLOSSARY OF ABBREVIATIONS                                        vi

 

INTRODUCTION                                                      1

 

Chapter I:     The Goals of Longitudinal Research                 5

 

Chapter   II:  Managing Longitudinal Surveys                     11

 

Chapter   III: Longitudinal Survey Operations                    19

 

Chapter   IV:  Sample Design and Estimation                      35

 

Chapter   V:   Longitudinal Data Analysis                        49

 

Chapter   VI:  Summary and Conclusions                           63

 

APPENDIX:

 

Case  Study 1  Survey of Income and Program Participation        67

 

Case  Study 2  Consumer Price Index                              75

 

Case  Study 3  Employment Cost Index                             89

 

Case  Study 4  National Longitudinal Study of the High School    97

               Class of 1972

 

Case  Study 5  High School and Beyond                           101

 

Case  Study 6  National Longitudinal Surveys of Labor Market    105

               Experience

 

Case Study 7   Social Security Administration's Retirement      111

               History Study

 

Case Study 8   Social Security Administration's Disability      115

               Program Work Incentive Experiments

 

Case Study 9   National Medical Care Expenditures Survey        123

 

Case  Study 10 National Medical Care Utilization and            127

               Expenditures Survey

 

Case  Study 11 Longitudinal Establishment Data File             137

 

Case  Study 12 Statistics of Income Data Program                147

 

REFERENCES                                                      153

 

 

 

 

 

                                                       INTRODUCTION

 

     Since the 1960's, the Federal government has sponsored an

increasing number of longitudinal surveys as vehicles for research

on administrative and policy issues.  The goal of the Federal

Committee on Statistical Methodology's subcommittee on Federal

Longitudinal Surveys is to identify the strengths and limitations

of longitudinal surveys, and to propose some guidelines for using

them most effectively.

 

     Beginning its work, the subcommittee found that there were

multiple definitions of a longitudinal survey, so our first task

was to define what this report would mean by the term.  The

difficulty arises because there are two facets to the definition,

design and analysis.  To be absolutely clear, one must distinguish

between a longitudinally designed survey and a survey with

longitudinal analysis.  We have elected to put these components

together in our definition.  The distinguishing features of a

longitudinal survey are:

     -    repeated data collection for a sample of observational

          units over time;

     -    the linkage of data records for different time periods to

          create a longitudinal record for each observational unit;

          and

     -    analysis that is based on the longitudinal microdata and

          refers to data collected over time.

 

The essential feature is that, from the beginning, there is a plan

to collect data at future points in time for each observational unit.
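
The linkage step in this definition can be illustrated with a minimal sketch. The record layout and field names (unit_id, wave, employed) are assumptions made for illustration only; they are not taken from any of the surveys discussed in this report.

```python
from collections import defaultdict

# Hypothetical wave files: one record per observational unit per wave.
# Field names (unit_id, wave, employed) are illustrative assumptions.
wave_records = [
    {"unit_id": 101, "wave": 1, "employed": True},
    {"unit_id": 102, "wave": 1, "employed": False},
    {"unit_id": 101, "wave": 2, "employed": False},
    {"unit_id": 102, "wave": 2, "employed": False},
    {"unit_id": 101, "wave": 3, "employed": True},
]


def link_waves(records):
    """Group wave-level records by unit and order them by wave."""
    linked = defaultdict(list)
    for rec in records:
        linked[rec["unit_id"]].append(rec)
    return {
        unit: sorted(recs, key=lambda r: r["wave"])
        for unit, recs in linked.items()
    }


longitudinal_file = link_waves(wave_records)
for unit, history in sorted(longitudinal_file.items()):
    print(unit, [(r["wave"], r["employed"]) for r in history])
```

In this sketch unit 102 has no wave 3 record. A production linkage step would also have to decide how to treat such gaps, which is one form of the attrition problem discussed later in the report.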

 

     This definition excludes some surveys with longitudinal

elements, such as the Current Population Survey (CPS).  The Survey

of Income and Program Participation (SIPP) is included here as a

longitudinal survey, although there are as yet no longitudinal

analyses of SIPP.  Federal agencies also conduct surveys of

establishments that have longitudinal elements but these are not

yet true longitudinal surveys either.  There is an effort to create

a longitudinal file for manufacturing firms at the Bureau of the

Census.  We included this program as a case study in this report

because, although it does not meet our definition, it may be of

interest to readers.  Similarly, Federal agencies maintain

longitudinal files of administrative records that do not meet our

definition.  Yet they may be used in ways that are similar to the

analysis of longitudinal surveys, so we have included an example,

the Statistics of Income Data Program, as a case study.

 


 

 

 

     Rotating panel surveys* are often described as longitudinal

surveys.  They are not, but they may share many sampling,

estimation, and analysis characteristics with longitudinal surveys. 

In addition, there is a tendency for ongoing rotating panel surveys

to be changed to make longitudinal analysis possible.  The National

Crime Survey (NCS) is currently considering such a transition, and

one possible result of the current redesign activities will be to

create a longitudinal NCS data file if the cost is not prohibitive. 

There is interest in moving in the same direction with both CPS and

the American Housing Survey (AHS, formerly the Annual Housing

Survey).  We should anticipate that eventually more rotating panel

surveys will be modified, or designed from the beginning, to make

longitudinal analysis possible.  At this time, however, many

rotating panels lack longitudinal data files, and many longitudinal

surveys are designed without rotating panels.
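
The rotation mechanism described in the footnote definition of a rotating panel can be sketched as follows. The sketch assumes the simplest possible pattern, in which one new panel enters each period and every panel stays for a fixed number of consecutive periods; the function name and parameters are hypothetical, and actual surveys may use more elaborate rotation patterns.

```python
def panels_in_sample(period, duration):
    """Return the panels interviewed in a given period.

    Assumes one new panel enters each period (panel k enters in period k)
    and each panel stays for `duration` consecutive periods.
    """
    first = max(1, period - duration + 1)
    return list(range(first, period + 1))


for period in range(1, 7):
    print(period, panels_in_sample(period, duration=3))
```

Once the design reaches steady state, consecutive periods share all but one panel. That overlap is what gives rotating panels some of the properties of longitudinal surveys, even when no linked file is created.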

 

     The subcommittee members examined in detail 12 recent

longitudinal surveys sponsored by the Federal Government, as

examples and illustrations.  These are: (1) the Survey of Income

and Program Participation (SIPP); (2) the Consumer Price Index

(CPI); (3) the Employment Cost Index Survey (ECI); (4) the National

Longitudinal Study of the High School Class of 1972 (NLS-72); (5)

High School and Beyond (HS&B); (6) The National Longitudinal

Surveys of Labor Market Experience (NLS); (7) the Social Security

Administration's Retirement History Study (RHS); (8) The Social

Security Administration's Disability Program Work Incentive

Experiments (WIE); (9) The National Medical Care Expenditure Survey

(NMCES); (10) the National Medical Care Utilization and Expenditure

Survey (NMCUES); (11) the Longitudinal Establishment Data File; and

(12) the Statistics of Income Data Program (SOI).  The surveys

chosen for case study treatment were selected to represent a

variety of sponsors, research questions and kinds of respondents. 

Each of the 12 case studies is described in the Appendix, and they

are frequently cited to illustrate important points throughout the

text.

 

     We hope that the chapters of the text and the case studies in

the Appendix will convince readers of four points that emerged from

the subcommittee's review of longitudinal surveys.  First,

longitudinal survey designs are appropriate, and even required, for

certain kinds of research.  These include, but are not limited to,

such topics as gross change, the causes of change, or the role of

attitudes in change.  However, many longitudinal surveys have not

made full use of their longitudinal design in the analysis.

 

     Second, longitudinal survey design, operation, and analysis

techniques are still evolving.  There are a number of important

design issues that are not yet explored or understood.  Examples

are the optimal length of time between interviews and the number of

interviews to conduct to achieve research objectives.  To some

extent the variations in survey design

reflect the wide and legitimate differences between the research

goals that each survey was designed to accomplish.  This does not

explain, however, all the existing variation in methods.  Decisions

about sample design and attrition, about selecting the best

respondent or analytical units, about the best estimation,

imputation or weighting schemes, or about the impact of varying

personal, mail or telephone interviews over the course of a

longitudinal survey, have not always been consistent.

 

___________________________

 

* A panel is a sample of persons selected to participate at a

particular point in the longitudinal sequence.  In a rotating panel

survey the sample units have a fixed duration.  As they leave the

sample, they are replaced by new units which are introduced at

specific points in time.

 

     Third, the important question of the costs of longitudinal

surveys compared to cross-sectional surveys has yet to be answered. 

There are conflicting reports about the relative costs of the two

types of survey.  Costs are usually cited as higher for

longitudinal surveys, but the costs being reported are confined to

data collection costs and processing costs.  This does not compare

the full range of survey costs including quality costs, costs of

analysis, and other such elements which could, in the long run,

change the picture of the relative costs.

 

     The fourth and final point that emerged from the

subcommittee's review was that the surest method for learning

answers to design, operational, and analysis issues is to build an

evaluation component into a longitudinal survey.  By this means a

record of comparative performance is created which benefits others. 

The case studies presented in this report, in particular, show how

progress occurs when evaluation is built into survey operations,

and how forethought and planning, far more than additional expense,

are needed to increase our knowledge about longitudinal survey

design.

 

     This report is presented in 6 chapters.  The first chapter is

a review of the kind of research question for which a longitudinal

approach is appropriate, illustrated with examples.  The second and

third chapters describe some of the problems encountered in

planning and managing longitudinal surveys.  Chapter four discusses

problems related to sample design and analytical units in

longitudinal surveys, and special problems of estimation and

weighting.  Chapter five describes and evaluates major approaches

to the analysis of longitudinal surveys.  The final chapter, number

six, summarizes some issues the subcommittee members recognized as

important, and outlines the need for building an evaluation

component into prospective longitudinal surveys, both to answer

questions about the quality of data derived from each survey and to

answer questions about optimal design for future longitudinal

surveys.

 


 

 

 

 

                                                          CHAPTER 1

 

                                 THE GOALS OF LONGITUDINAL RESEARCH

 

     There are at least five distinctive advantages to using a

longitudinal survey rather than a cross-sectional survey; some of

these advantages are shared by rotating panel surveys.

 

     1.   A longitudinal sample reduces sampling variability in

          estimates of change.  This is an advantage shared with

          rotating panel surveys such as CPS and NCS.

 

     2.   A matched longitudinal file provides a measure of

          individual gross change for each sample unit.  This is an

          advantage shared to some extent by rotating panels, which

          can provide a measure of gross change, but not usually on

          an individual basis.

 

     3.   Longitudinal survey interviews usually have a shorter,

          bounded reference period that reduces recall bias in

          comparison to a retrospective interview with a long

          reference period.  Rotating panels such as CPS and NCS

          also share this advantage.  Longitudinal surveys with

          long intervals between interviews may lose this

          advantage.

 

     4.   Longitudinal data are collected in a time sequence that

          clarifies the direction as well as the magnitude of

          change among variables.

 

     5.   Longitudinal interviews reduce the respondent burden

          involved in creating a record that contains many

          variables.  A single interview could not collect

          comparable detail without excessive respondent burden and

          fatigue.  In addition, the quantity of data collected in

          a longitudinal survey is usually greater than that from

          several cross-sectional surveys because of the

          correlational structure of longitudinal data.
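
The first advantage in this list follows from a standard variance identity. The notation here is introduced only for illustration: the two estimates are written as \bar{y}_1 and \bar{y}_2, their variances as \sigma_1^2 and \sigma_2^2, and the correlation between them as \rho.

```latex
\operatorname{Var}(\bar{y}_2 - \bar{y}_1)
  = \sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2
```

With independent samples at the two times, \rho = 0 and the variances simply add. When the same units are observed both times, \rho is usually positive, so the variance of the estimated change is smaller. In the special case \sigma_1 = \sigma_2 = \sigma, the expression reduces to 2\sigma^2(1 - \rho).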

 

     There are also some distinct disadvantages to longitudinal

surveys.  Some of these are:

 

     1.   The analysis of longitudinal surveys is dependent on the

          assembly of the microrecord data.  The full advantage of

          compiling a detailed longitudinal record with many

          variables may not be available until years after the

          start of data collection.

 

     2.   Beginning refusal rates may be comparable to those of

          cross-sectional surveys, but the attrition suffered over

          time may create serious biases in the analysis.

 

 

Principal Author: Catherine Hines

 

 


 

 

 

     3.   A longitudinal survey, including several data

          collections, is more costly than a single retrospective

          cross-sectional survey.  A longitudinal survey may be

          less costly than a series of cross-sectional surveys.  It

          is speculative whether a longitudinal survey is more

          costly than a rotating panel survey.

 

     4.   The estimates of gross change derived from longitudinal

          surveys tend to be inflated over time by simple response

          variance.  The combined or net effects of such influences

          as simple response variance, response bias and time-in-

          sample bias on longitudinal estimates of gross

          change are still poorly measured.

 

     5.   Longitudinal surveys are often improperly analyzed, not

          taking into account longitudinal characteristics or

          attrition.
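
The inflation of gross change by simple response variance, noted in item 4 of this list, can be illustrated with a small calculation. The model is a deliberately simple assumption and not a description of any survey in this report: the item has two categories, and each report is wrong with the same probability, independently across waves and units. The function and parameter names are hypothetical.

```python
def observed_change_rate(true_change, error_rate):
    """Expected share of units whose reported status differs between two waves.

    Assumes a two-category item, and that each wave's report is wrong with
    probability `error_rate`, independently across waves and units.
    """
    # Probability that exactly one of the two reports is wrong.
    one_wrong = 2 * error_rate * (1 - error_rate)
    # True changers appear to change unless exactly one report is wrong;
    # true stayers appear to change only if exactly one report is wrong.
    return true_change * (1 - one_wrong) + (1 - true_change) * one_wrong


for e in (0.0, 0.02, 0.05):
    print(e, round(observed_change_rate(true_change=0.10, error_rate=e), 4))
```

Under these assumptions, a true change rate of 10 percent is reported as about 17.6 percent when the error rate is 5 percent. The errors largely cancel in net estimates but accumulate in gross change.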

 

     For some research goals, the advantages clearly outweigh the

disadvantages.  For other research goals this may not be the case. 

Research goals that demand longitudinal surveys are described in

this chapter.

 

A.   Measuring Change

 

     Both cross-sectional and longitudinal surveys can be used to

measure change.  The national monthly estimate of unemployment

based on the CPS is always compared to the estimate for the

previous month or the same month a year ago.  Estimates of such

things as crime victimizations, retail sales, housing starts, or

health conditions are all compared to estimates from a previous

time period.  None of these data are currently based on

longitudinal surveys.

 

     Which measures of change need a longitudinal file structure?

One example is the components of individual change.  These are

measures of gross change for the observational units between points

in time.* Longitudinal data are frequently displayed in a time-

referenced table, showing the characteristics, attitudes, or

beliefs of the sample at time 1, cross-tabulated by the same

characteristics, attitudes, or beliefs at time 2. Another example

is the average change for an observational unit.  As pointed out by

Duncan and Kalton (1985), if data are available for several time

points for each observational unit, then a measure of average

change or trend can be estimated.  Finally, a longitudinal design

permits the measurement of stability or lack of stability for each

observational unit.
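
A time-referenced table of this kind, and the difference between gross and net change, can be sketched with a few matched records. The data are invented for illustration and the variable names are hypothetical.

```python
from collections import Counter

# Hypothetical matched records: (status at time 1, status at time 2).
matched = [
    ("employed", "employed"),
    ("employed", "unemployed"),
    ("unemployed", "employed"),
    ("employed", "employed"),
    ("unemployed", "unemployed"),
    ("employed", "unemployed"),
    ("unemployed", "employed"),
    ("employed", "employed"),
]

table = Counter(matched)  # cell counts of the time 1 by time 2 table

# Gross change: every unit whose status differs between the two times.
gross_change = sum(n for (t1, t2), n in table.items() if t1 != t2)

# Net change in the number unemployed: all that two cross-sections reveal.
unemployed_t1 = sum(n for (t1, _), n in table.items() if t1 == "unemployed")
unemployed_t2 = sum(n for (_, t2), n in table.items() if t2 == "unemployed")
net_change = unemployed_t2 - unemployed_t1

for cell, n in sorted(table.items()):
    print(cell, n)
print("gross change:", gross_change, "net change:", net_change)
```

In this example half of the units change status, yet the number unemployed is the same at both times. Only the matched file shows the offsetting flows.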

 

     Measures of gross change are of interest in several of the

case studies described in this report.  Respondents are followed

through employment and unemployment (NLS), training and the labor

force (NLS-72, HS&B), into and out of poverty (SIPP), or between

health, treatment, and disability (NMCES, NMCUES, RHS, WIE).  The

focus is sometimes on movement across an arbitrary threshold (such

as poverty, defined by household composition and income), and

sometimes on a continuous measure.

 

     The observation periods in a longitudinal survey are commonly

called waves.  A wave describes one complete cycle of interviewing,

from sampling to data collection, regardless of its duration.

 


 

 

 

 

 

     In independent (i.e., cross-sectional) samples, sub-

populations with very different gross-change patterns are

indistinguishable if the sum of the changes is similar.  This has

been important to studies of employment.  The NLS, for example, can

distinguish a hypothetical population where 15% of the people are

never employed, from a population where at each interview a

different 15% of respondents report unemployment.  A cross-sectional

survey could not make the same distinction, which is vital to the

development of intervention policies.  Another example can be cited

from the field of social indicators research.  A series of

variables, measured longitudinally, can be used to construct models

for estimation to examine change over time with great elegance.

(See Land, 1971, 1975.)
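
The hypothetical 15 percent example above can be made concrete with two constructed populations. The population size, the number of interviews, and the reading of the first population as "the same 15 percent unemployed at every interview" are assumptions made for this sketch.

```python
def summarize(histories):
    """Return (rate at each wave, share always unemployed, share ever unemployed)."""
    n = len(histories)
    waves = len(histories[0])
    rates = [sum(h[w] for h in histories) / n for w in range(waves)]
    always = sum(all(h) for h in histories) / n
    ever = sum(any(h) for h in histories) / n
    return rates, always, ever


N, WAVES = 100, 4  # True in a history means "unemployed at that interview"

# Population A: the same 15 people are unemployed at every interview.
pop_a = [[i < 15] * WAVES for i in range(N)]

# Population B: a different group of 15 is unemployed at each interview.
pop_b = [[15 * w <= i < 15 * (w + 1) for w in range(WAVES)] for i in range(N)]

print("A:", summarize(pop_a))
print("B:", summarize(pop_b))
```

Both populations show the same rate at every interview, so a series of independent cross-sections could not tell them apart. The linked histories show that in one population a small group is persistently unemployed, while in the other a much larger share experiences a single spell.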

 

     Young adults in the years after full-time school are frequent

longitudinal survey subjects (NLS Youth Cohorts, NLS-72, HS&B)

because individuals in these years are known to pass between

statuses (employment and unemployment, school and training

programs, in and out of the armed services, between households)

rapidly and irregularly.  Cross-sectional studies would miss all

the individual reversals and repetitive change.  To develop

detailed models of the causes of change in these fluid populations,

longitudinal measures are needed to capture the record of

individual and gross change.

 

     For example, cross-sectional studies of college enrollments

have generally found relatively high stability over a number of

years, whereas analysis of NLS-72 data identified frequent

individual change occurring at a stable rate.  A substantial

percentage of the college students surveyed exhibited erratic

enrollment patterns characterized by dropping out or transferring

between 4-year and 2-year colleges.  In light of these findings,

student financial assistance (grants and loans) has changed. 

Legislation has shifted aid to channel the funds directly to the

students, who choose the college they wish to attend -- rather than

channelling the funds to college officials, who decide how the

funds are doled out to enrolled students.

 

     Studying the relationship between attitudes and behavioral

change poses particularly difficult problems in research design. 

The problems inherent in determining which variable in a pair

changes first are present, and they are exacerbated by the problems

encountered in surveys of subjective phenomena, such as attitudes. 

Using retrospective questions to ask respondents to reconstruct

thoughts or feelings as they existed in the past has proved

unreliable.

 

     Prospective longitudinal surveys provide the most reliable

data on change in knowledge or attitudes, because longitudinal

measures are collected while the subjective states actually exist. 

This appears to reduce the bias frequently caused by suppression or

distortion of respondent recall.  In addition, unlike retrospective

measures of attitudes, contemporary measures can sometimes be

probed or even verified.

 

     The longitudinal surveys of high school students (NLS-72 and

HS&B) demonstrate the method's power to collect data on changing

subjective states, and to study causation.  These surveys have

measured attitudes and expectations about employment, and

subsequent employment experiences and behavior.  The data, which

could not have been collected cross-sectionally, can be analyzed to

understand the formation of attitudes, as well as to evaluate the

effects that attitudes have on subsequent behavior.

 


 

 

 

 

 

When the research goal is to measure a component of individual

change, longitudinal surveys have strong advantages.  They are the

only method available to collect data on a recent occurrence basis

over a long period of time.  Although a retrospective cross-

sectional survey could be used to attempt the same thing, the

recall bias may be a strong force against this decision.  The bias

from the attrition in a longitudinal survey has to be balanced

against the bias or lack of information in a retrospective cross-

sectional survey.  The bias from attrition is usually preferred.

 

     Price and wage changes are measured in longitudinal surveys

(i.e., the CPI and ECI) because the longitudinal sample design

holds other variables constant.  The assumption can be made that

whatever unknown sampling bias exists in later waves was also

present in earlier waves, and can be dismissed as a possible source

of the changes being measured.

 

 

B.   Assembling Detailed Individual Records

 

     Longitudinal surveys generally provide researchers with more

detailed records for each individual than is practicable through a

cross-sectional design.  In a longitudinal design, an extremely

detailed record can be accumulated for each subject without making

any single observation period (i.e., interview or wave) excessively

burdensome.  By 1982, for example, records for the original

respondents in the NLS contained up to 1,000 data items for each

sample case.  To create a record of comparable detail and complexity

would have required a one-time questionnaire of extraordinary

length.  In addition, responses referring to earlier time periods

would have been reconstructed from memory, reducing their

reliability.  In many instances, researchers are looking for cause-

and-effect relationships that are more likely to be accurate if the

data are compiled on a current rather than retrospective basis.

 

 

C.   Collecting Data That Are Hard to Recall

 

Some surveys ask questions that respondents have difficulty in

answering precisely or objectively after much time has passed. 

These include questions that call for the kind of detail that

people seldom recall clearly (such as complete records of

expenditures, or health treatments), and questions that refer to

events that respondents tend to telescope, embellish or suppress in

their memories after time has passed (such as crime victimization,

health problems, or visits to the doctor).

 

     Questions such as these have been used successfully in

longitudinal surveys, in which the previous interview provides a

clear marker to bound respondent recall, and which are constructed

with short reference periods between interviews.  For example, the

Consumer Expenditure Survey, conducted as part of the CPI program,

collects detailed records of household spending patterns through

longitudinal interviews. (See Case Study no. 2 in the appendix.)

 

     A longitudinal survey with relatively short reference periods

is one of the best methods for producing aggregated data for a

longer time period, such as a year.  For example, the primary goal

of the NMCES and NMCUES programs

 


 

 

 

 

was to develop estimates of medical expenditures for a calendar

year.  This was accomplished by obtaining medical expenditure data

every 3 months and compiling an annual total.  A similar example is

the new continuing Consumer Expenditure Survey, which covers all

consumer expenditures.  The SIPP program employs a similar design,

using interviews at 4-month intervals to produce annual aggregates. 

The relatively short, bounded reference periods for these

longitudinal surveys improve reporting by eliciting events closer

to the time they occur.  This increases the completeness of

aggregated estimates and reduces error.
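
     The compilation of an annual aggregate from short, bounded
reference periods can be sketched as follows.  This is a minimal
illustration only: the record layout, the respondent identifiers and
the dollar amounts are hypothetical and are not taken from NMCES,
NMCUES, SIPP or the Consumer Expenditure Survey.  Respondents who
miss a wave are set aside here rather than imputed, which is an
assumption of the sketch and not a recommended procedure.

```python
from collections import defaultdict

# Hypothetical wave records: (respondent_id, wave_number, reported_expenditure).
# Each wave covers the 3 months since the previous interview.
WAVE_RECORDS = [
    ("R1", 1, 120.0), ("R1", 2, 80.0), ("R1", 3, 0.0), ("R1", 4, 250.0),
    ("R2", 1, 40.0), ("R2", 2, 60.0), ("R2", 4, 10.0),  # wave 3 is missing
]


def annual_totals(records, waves_per_year=4):
    """Sum wave-level amounts per respondent; list incomplete cases separately."""
    by_person = defaultdict(dict)
    for person, wave, amount in records:
        by_person[person][wave] = amount

    complete, incomplete = {}, []
    for person, waves in by_person.items():
        if len(waves) == waves_per_year:
            complete[person] = sum(waves.values())
        else:
            incomplete.append(person)
    return complete, sorted(incomplete)


totals, incomplete = annual_totals(WAVE_RECORDS)
```

In this sketch only respondent R1 contributes an annual total; R2 is
flagged as incomplete so that the analyst must decide explicitly how
to treat the missing quarter.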

 

D.   Modelling Studies and Pilot Programs

 

     The detailed case histories built up in longitudinal surveys

are important in analyzing the impact of alternative policies or

intervention strategies.  The complex individual case records

accumulated in a longitudinal panel survey provide a microcosm in

which the impact of changes can be simulated.  Questions can be

answered about the probable impact of changing a program's

eligibility criteria, for example, or about the benefits which

specified classes of respondents might anticipate under various

program changes.  Intervention programs can be evaluated through

longitudinal surveys to study their effect on respondents with

known characteristics.  A sufficiently detailed record makes it

possible to simulate alternative interventions, and predict a range

of effects. (See Case Study 9 on the WIE, for example.)

 

     In some cases longitudinal surveys, pilot intervention

programs and Federal policy experiments evolved together in the

1960's.  Several longitudinal surveys were authorized as components of

pilot or experimental intervention programs to measure program

effects and ensure that decision-making information would be

available when it was needed.  Longitudinal data collection

components were built into pilot income maintenance programs, for

example, administered temporarily in cities in New Jersey, Indiana,

Colorado and Washington State.

 

     In conclusion, two points about the periodicity of

longitudinal research should be stressed.  First, longitudinal data

are never available immediately; any data that are based on the

sequence of measures over time cannot be fully extracted until the

final measures are collected.  If information is needed at once,

another research design has to be used which incorporates some

alternative to a true longitudinal approach, such as retrospective

measures, or the use of administrative records.  Even if the

quality of data from a longitudinal survey would be clearly

superior, that would be irrelevant if the schedule outweighs these

other considerations.

 

     Second, longitudinal data can be used cross-sectionally to

provide immediate data as long as the research focus is not

specifically on changing measures over time.  Each wave of a

longitudinal survey can also be analyzed as a cross-sectional

survey.  Thus some data can always be made available immediately. 

Recent data from on-going longitudinal surveys can be analyzed

quickly from a cross-sectional perspective to serve certain

analytical purposes without delay.  It is also possible to add

questions to the current waves of a longitudinal survey to meet

immediate data needs, using an existing longitudinal sample and

base-line demographic data for maximum efficiency.  In these ways a

longitudinal design adds analytical strengths without sacrificing

the potential for cross-sectional research.
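
     The distinction between the two uses of the same file can be
sketched as follows.  The two-wave panel, the identifiers and the
values are hypothetical.  The point of the sketch is only that a
single-wave estimate needs one wave of data, while a measure of
individual change is defined only for respondents observed in both
waves.

```python
from statistics import mean

# Hypothetical two-wave file: respondent_id -> {wave: measure}.
PANEL = {
    "A": {1: 10.0, 2: 14.0},
    "B": {1: 20.0, 2: 18.0},
    "C": {1: 30.0},            # not interviewed in wave 2
}


def cross_sectional_mean(panel, wave):
    """Treat one wave as a cross-sectional survey: use every respondent in it."""
    return mean(rec[wave] for rec in panel.values() if wave in rec)


def individual_changes(panel, first, last):
    """Longitudinal use: change exists only for respondents in both waves."""
    return {
        pid: rec[last] - rec[first]
        for pid, rec in panel.items()
        if first in rec and last in rec
    }


wave1_mean = cross_sectional_mean(PANEL, 1)   # available after wave 1
changes = individual_changes(PANEL, 1, 2)     # available only after wave 2
```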

 

                                                          CHAPTER 2

 

                                      MANAGING LONGITUDINAL SURVEYS

 

 

As described in the previous chapter, prospective longitudinal

surveys have proved to be an important research approach, but

certain limitations have also emerged that must be considered when

these surveys are planned.  The problems related to staff and

management of longitudinal research differ in kind as well as

degree from those encountered in cross-sectional research.

 

     The core of the problem in managing a longitudinal survey is a

conflict between the need for long-term and for short-term

resources.  Plans and funding must be stable over many years, but

the need for staff rises and falls over the course of a

longitudinal survey.  Most organizations sponsoring longitudinal

surveys have solved the dilemma through some combination of

permanent and temporary staff.  Fluctuations in resources are less

pronounced in longitudinal surveys that employ on-going rotating

panels (such as SIPP or, to some extent, the CPI) than they are in

fixed panel surveys in which interviews are conducted at longer

intervals (such as NLS, NLS-72, or HS&B).

 

     The major difficulty faced in planning and managing a

longitudinal survey is in maintaining a core group dedicated to the

project, and maintaining consensus between this group and senior

agency staff.  These groups tend to view long-term commitment of

staff and resources in different ways.  The schedule, funding, and

staff needs of a longitudinal survey are viewed differently by

survey designers, by agency directors, and by those responsible for

operations.  It is a constant challenge to generate commitment to a

long-term goal such as analysis of data, when senior staff with

direct authority over the project often change before the survey

is completed.

 

A.   The Need for Long-Range Planning

 

     The need for long-range planning and organization for a

longitudinal survey should be brought to the attention of senior

staff very early with a planning document that outlines the

workload, survey tasks, and anticipated products over time.  The

planning document should be prepared in conjunction with an

analysis plan, and the design of the instruments and procedures

will then follow once all groups are in agreement with the planning

document.

 

     Long-range planning is vitally important to a longitudinal

survey, because it promotes enduring support at a senior agency

level, it widens the pool of sponsors and supporters, and it begins

the process of documentation that ensures continuity of operations.

 

Principal Author: Lawrence Corder

 

A large-scale longitudinal Federal survey generally has at least

nine principal management phases which may be briefly described as

follows:

 

     1.   Budget Planning.  Up to five years before data collection

          is to begin, a general plan must be conceived and

          provisions made to obtain continuing staff and funding

          resources throughout the longitudinal project.

 

     2.   Development of Position Papers.  These are draft planning

          documents which discuss options, costs, and yields

          associated with various sampling plans, data collection

          designs, or questionnaires.  These ensure widespread and

          enduring support for the longitudinal research.

 

     3.   Procuring Outside Assistance.  If a contract is to be

          awarded, requests for proposals must be prepared, cleared

          and advertised, and responses must be evaluated before a

          contract is signed.  This is a common approach to

          levelling out resource needs.

 

     4.   Final Research Plans.  This stage includes final OMB

          clearance, conduct of field tests, revisions as

          necessary, and detailed agreements with any other

          cooperating agencies.

 

     5.   Data Collection.  This refers to the full-scale field

          data collection.  Longitudinal surveys (such as NLS)

          which have been extended beyond the original research

          period have repeated these 5 stages independently several

          times.

 

     6.   File Preparation.  Development of the system for data

          entry, data base design, processing, etc., may also

          require systems for optical scanning of questionnaires,

          machine or manual edit steps, preparation of code books,

          the construction of composite variables, plans to

          preserve privacy in public data files, and numerous other

          activities.  Each operation must be fully documented, to

          ensure comparability between waves.

 

     7.   Planning the Analysis.  While the overall goals of the

          analysis must be planned in the early stages, some

          details cannot be finalized until the data are available

          on computer files and code books are completed.  Also, as

          policies shift, new analytical priorities must be met. 

          In all cases, this process requires plans which may

          include in-house analyses and contracts for analyses. 

          Contracts require a repetition of the procurement process

          described in phase 3.

 

     8.   Conduct of Analyses.  These may go on for several years. 

          Cross-sectional analyses can be conducted as soon as one

          wave of interviews has taken place.  Longitudinal

          analyses take place after some or all other waves are

          completed.

 

     9.   Publications.  With in-house and professional peer

          reviews, these may continue for several years.

 

     Each phase requires substantial time to complete, contains

specific activities and results in the preparation of key

documents.  The final products of any longitudinal survey are

usually public-use data files and reports.* Ideally, these should

be supplemented by rapid preparation of in-house documents as part

of the policy-making process.  Schedule milestones and due dates

are part of any longitudinal survey, and the ultimate success of

the project and even the usefulness of the analytical results may

be judged against their timeliness.

 

     It is not unusual for a longitudinal survey to consume a

decade or more from inception to completion of the publication

plan.  The NMCES and NMCUES studies, for example, both took 8 to 10

years to complete.  While field operations and the period for

analysis vary with each survey's objectives and resources, the

successful pre-field period is probably very similar in each case. 

The planning period should be dedicated to achieving consensus

internally, then to producing instruments and obtaining clearances

and approvals (for contracts as well as for questionnaires).  A

typical schedule for completing pre-field activities alone

(excluding budget planning) would frequently require 12 to 18

months.

 

     Some of the most severe criticisms of longitudinal surveys

have resulted from insufficient planning.  It is not uncommon, for

example, to omit thorough planning of the analysis.  Then, at a

production stage, it is discovered that people have different ideas

on the tables and data to be produced and analyzed.  It is also

necessary to plan the linked files carefully so that the data

needed for longitudinal analyses are readily available. 

Unfortunately, the planning of budgets and field work often takes

precedence over the planning of processing and analysis, sometimes

leading to delays, acrimony, and shifts in support.

 

B.   Funding Longitudinal Research

 

The actual unit costs of doing longitudinal surveys may be no

higher than for a series of cross-sectional surveys of comparable

size and complexity (Wall & Williams:30).  There is conflicting

evidence on comparable costs, probably reflecting non-standard cost

reporting on survey operations.  Funds, however, must be committed

over a number of fiscal years and budget plans are not easily

altered.  There is a trade-off to be made when errors are

discovered or improvements can be implemented.  Additional costs

must be carefully considered, as well as the effect of changes in

methodology on the longitudinal analysis.  Errors, of course,

should be corrected or, if too costly, an indication of their

effects provided.  Changes in methodology are different from

changes necessitated by errors and must be thoroughly explored. 

Provision should be made to share information with analysts and

data users on real change vs. methodologically-induced change. (The

change to computer assisted telephone interviewing is one such

change that needs careful exploration.) If errors or methodological

changes result in higher costs, alternative methods of meeting

those costs should be considered: higher funding, smaller sample

size, more time between interviews, delayed processing, and so

forth.

 

     * Surveys of business or industrial establishments are often an

exception to this rule, to protect the identity of large firms that

dominate certain samples.

 

     Inter-agency cooperation can help meet long-term funding

needs.  The Health Care Financing Administration (HCFA) and the National

Center for Health Statistics (NCHS) chose this approach in

conducting NMCUES.  Inter-agency agreements frequently involve the

Census Bureau for data collection and analysis, but they may also

be used between other agencies with related research goals.  Inter-

agency cooperation in longitudinal surveys could take the form of

joint sponsorship of a new longitudinal survey, or it could be in

the form of using an existing longitudinal sample as a vehicle for

research to save the cost of starting a new longitudinal survey.

 

     The NLS-72 provides an example of a consortium approach: For

the fifth follow-up interview in NLS-72, the National Science

Foundation appended questions on math and science teachers, and the

National Institute of Child Health and Human Development joined

with the National Center for Education Statistics (NCES) to fund

questions on child care and early childhood education issues. 

Longitudinal surveys are generally long term projects with

significant start-up costs.  If a survey can be constructed to

serve more than one agency through an inter-agency agreement,

start-up costs may be shared and several agencies will be bound to

multiple-year funding commitments.

 

     When agencies select outside contractors to conduct

longitudinal research, competitive procurement is required.  The

decision to use a contractor to conduct a survey increases the time

needed to start a project, because approval of contracting plans

must be added to other planning tasks.  One advantage of

contracting out the survey work is that it gives an agency access

to additional staff support in cases where the agency has no

authority to add permanent staff.

 

     Contracting for data collection by an outside agency may or

may not be more expensive than employing a government organization

for this purpose.  In comparing costs, NCES found that the first

NLS-72 follow-up, conducted by the Census Bureau, cost slightly

more than the second follow-up, conducted by Research Triangle

Institute (RTI), despite inflation.  Other longitudinal surveys,

including NMCES and NMCUES, have had just the opposite experience. 

The most cost-effective mode of operation appears to depend on the

kind of survey, not on the agency conducting it.

 

     The duration of longitudinal surveys often requires periodic

recompetition once a competitive award has been made.  As a result,

agencies have found themselves switching contractors part way

through the data collection phase of a longitudinal survey.  The

competitive award of each data collection wave can, however, help

control overall survey costs, because it provides contractors with

an incentive to hold down their costs.

 

     The possibility of changing contractors over the life of a

longitudinal survey requires a detailed documentation of methods

that goes far beyond what is needed for any one-time survey.  This

level of documentation was not anticipated when the original

contract to collect data for NLS-72 passed from the Educational

Testing Service to RTI, and the change in contractors caused

difficulties.  Based on this experience, NCES now

builds a sub-contract to the previous contractor into any

subsequent data collection awards.  As a result, a later transfer

of the NLS-72 contract from RTI to NORC was accomplished without

problems.

 

C.   Staff Needs

 

     Staffing requirements for a longitudinal survey typically vary

substantially, both by number and by type of staff throughout the

history of the project.  Staffing is much more controlled in

rotating sample surveys, whether they are longitudinal or cross-

sectional.  Funding and staff needs for a longitudinal survey are

much greater during the data collection period than during any

other phase.  However, some of the types of people needed for data

collection, such as interviewers, are not needed in later phases. 

Staff monitors for field work and data processing are in high

demand at early stages as well as intermediate stages.  Because of

sporadic needs, the use of a core group of survey professionals in

combination with temporary staff, or interagency agreements or

outside contracts, can be the best method to ensure adequate

staffing for the entire effort.

 

     To distribute the costs of a contract more evenly over a

longitudinal survey, NCES and NCHSR have used incrementally-funded

contracts.  During the longitudinal survey, separate contracts are

awarded for each phase or wave.  Each contract extends over two or

more years.  At any point, some survey tasks are being advertised

for competition while others are being completed under contract. 

Looked at from the standpoint of each fiscal year, the total costs

and level of effort remain more nearly constant.  NCES has also

found that giving agency survey analysts the responsibility for

monitoring contract performance will help control variations in

staffing patterns.

 

     By employing temporary peripheral groups in addition to

permanent staff groups, two problems are solved: Research staff

needs are met without adding permanent personnel to an agency; and

peak workload needs are met without jeopardizing tight survey

schedules.  Inter-agency agreements or contracts not only bind

parties to a specified set of research goals, but they also permit

the level of staff effort to rise and fall as needed.

 

D.   Maintaining Core Staff

 

     The duration of longitudinal research projects creates another

management problem (which has been called a Methuselah effect by

Herbert Parnes).  Each phase of a longitudinal study, such as

planning, data collection, or analysis, is frequently carried out

by different individuals, who may not even be part of the same

organization.  The relative inflexibility of a longitudinal study

plan is an analytical necessity, but it could also prevent interim

analysis or refinements in the design.  For these reasons, it has

been suggested that on-going longitudinal surveys may hold little

interest for the calibre of professional staff that is needed for

management or analysis (Wall & Williams: 35).

 

     NCES, however, has successfully attracted talented analysts to

manage the agency's longitudinal surveys.  To some extent this may

be because NCES ensures that the agency's staff have challenging

responsibilities for program

analysis.  Agencies which see only data collection as their primary

mission may be more apt to encounter the staff problems recognized

by Wall and Williams.  In order to allow mid-course corrections and

modifications of the survey plan, NCES uses a multi-phase sampling

design (as in HS&B).  This, too, contributes to the flexibility of

the NCES longitudinal survey program.

 

E.   Data Collection and Processing Schedules

 

Longitudinal surveys have become notorious for developing serious

backlogs because data collection takes precedence over all other

tasks.  The schedule for observations is usually the least flexible

aspect of the design, because each subject must have an identical

record structure.  As data collection continues, it creates an

ever-growing backlog of other procedures, such as analysis. 

Uncompleted tasks tend to accumulate, becoming increasingly

difficult to finish.  To prevent backlogs and delays, a

longitudinal survey must be well-organized and planned so that

analysis and data release keep pace with data collection.

 

     Data collection schedules are not the only factor in backlogs. 

Another factor is data processing, including file linkage.  Survey

organizations that are more accustomed to doing cross-sectional

surveys or other non-longitudinal surveys often have difficulty

recognizing the special processing needs of longitudinal surveys. 

Databases need specification, key variables need identification,

and a policy on imputation needs to be thought through.  Ideally,

all this needs to be done when the survey questionnaire is

designed, but this ideal is seldom, if ever, met.

 

F.   Data Analysis

 

Data analysis is often looked on as the rewarding part of the job

after the difficulties of data collection and data processing. 

Analytical interests often go beyond the agency conducting the

study.  Some agencies include analysis contracts in their

contracting for services.  Usually some analysis is done by agency

personnel.

     One possibility to counter some of the delay caused by the

time it takes to complete a longitudinal survey is to analyze each

wave as if it were from a cross-sectional survey.  This not only

provides timely data, but raises questions to be answered at later

stages, and generally whets the appetite for more data and more

analysis.  Recent data from on-going longitudinal programs can be

analyzed relatively quickly to serve some analytical purposes

without delay.  It is also possible to add questions to the current

data collections of a longitudinal survey to meet immediate data

needs.

 

G.   Release of Data

 

     A principal goal of any longitudinal survey should be to

produce public use data tapes and analytical reports rapidly, both

for policy-makers and the interested public.  If public use files

are to be created, then procedures to

protect confidentiality must be worked out in advance.  File

structure and documentation need to be readily available.  Variance

estimation must be provided for those using the file.  The

permanent survey staff should maintain a role in the preparation of

files and reports, so that their expertise and interest are not

lost.

 

     In conclusion, longitudinal surveys, sometimes taking 5 years

or more to complete, inevitably encounter staff changes.  Two

management approaches can minimize the loss of institutional

memory.  First, it is vital that every survey activity be

documented.  Interview instructions, edit specifications, variable

definitions, file layouts, sampling, weighting and imputation

methodologies, all instruments and procedures should be recorded

and readily available.  This task is very labor-intensive and,

unfortunately, apt to be slighted when staff time is short. 

Second, inter-agency agreements or contracts may clearly lay out

both the procedures to be used and the final products.  It is also

wise to specify key contractor staff persons who cannot be replaced

without sponsor approval.  These actions are important to minimize

the effect of staff changes and to prevent errors and delays.

 

                                                          CHAPTER 3

 

                                     LONGITUDINAL SURVEY OPERATIONS

 

 

     The principal differences between field and processing

operations in one-time surveys and in longitudinal surveys are

created by the use of time as a significant factor in research. 

Longitudinal surveys typically encounter changing conditions, and

survey designers have developed and evaluated a variety of methods

for controlling the problems that can be caused by change in the

sample or changes in the design or administration of the survey.

 

A.   Sample Change Over Time

 

     The composition of the sample may be expected to change across

waves for a variety of reasons.  Respondents may refuse to

participate, they may die, they may move and cannot be found, or

they may leave the sampling frame (e.g., by entering an

institutional population or by moving abroad).  The danger is that

the sample becomes increasingly less representative of the target

population as time passes.  To minimize the effects of these

problems, new observational units are routinely introduced into the

samples of some continuing surveys as time passes.

 

     1.   Selection of new units into sample

 

     For some longitudinal surveys, there are a number of concerns

related to the length of time respondents are kept in sample. 

Respondent burden across several interviews may produce a decline

in the quality of data gathered or may result in increasing refusal

rates.  Respondents may also leave the sampling frame, move and

cannot be tracked, or die, thereby affecting the representativeness

of the sample.  For these reasons, it may be desirable to institute

a rotating panel design, which regularly moves new respondents into

the sample and retires other respondents after a fixed number of

interviews or period of time.

 

     The Survey of Income and Program Participation (SIPP), the

National Crime Survey (NCS), the new Consumer Expenditure Survey

(CE), and the Consumer Price Index (CPI) have all adopted rotating

panels.  SIPP introduces new respondents annually and retains them

for 2-1/2 years (7 or 8 interviews) before rotating them out; NCS

introduces new respondents monthly and interviews them for 3-1/2

years (7 interviews).  The CE Survey introduces respondents monthly

and interviews them five times on a quarterly basis, while the CPI

introduces new respondents once every five years and interviews

monthly or bimonthly.
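
     The mechanics of a rotating panel can be sketched as a simple
schedule.  The fragment below is an illustration under stated
assumptions, not the procedure of any of the surveys named above:
the function names are hypothetical, months are counted from zero,
and the parameters (a new panel every month, five interviews spaced
three months apart) are chosen only to resemble the general pattern
described for a quarterly design.

```python
def interview_months(start_month, n_interviews, spacing):
    """Months (counted from 0) in which a panel entering at start_month is interviewed."""
    return [start_month + k * spacing for k in range(n_interviews)]


def panels_in_sample(month, intro_interval, n_interviews, spacing):
    """Entry months of all panels scheduled for an interview in the given month."""
    active = []
    start = 0
    while start <= month:
        if month in interview_months(start, n_interviews, spacing):
            active.append(start)
        start += intro_interval
    return active


# Illustrative parameters only: a new panel every month, five quarterly interviews.
schedule = interview_months(0, 5, 3)
active = panels_in_sample(12, 1, 5, 3)
```

Under these assumptions the panel that enters in month 0 is retired
after its interview in month 12, and in that month five panels at
different stages of their time in sample are interviewed together.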

 

     Fienberg and Tanur (1983) note that rotating panel designs may

create some problems of inference, according to conventional sample

survey theory, in that random selections of respondents occur at

different times for different respondents.  They argue, however,

that this is only important when date of selection is related to

temporal changes in the phenomena the survey was designed to

measure.  The inferential

difficulties which might result from a rotating panel design must

be balanced against the reduction of attrition-related bias, which

is the alternative.

 

Principal Author: Bruce Taylor

 

     2.   Movers

 

     Some respondents may be expected to move from originally

sampled housing locations (or telephone numbers) during their time

in sample.  Depending on the purpose of the survey and procedures

adopted to track movers, respondent mobility has varying

implications for the representativeness of the sample over time.  A

number of factors may enter into decisions regarding whether, or

how, to follow movers.

 

     A crucial consideration is to determine the most important

unit of observation for the survey.  A longitudinal survey of

persons may be designed to follow sample individuals or households,

if the substantive goals of the survey would be served by retaining

as many of the originally sampled respondents as possible.  A

number of surveys, such as SIPP and NLS, focus on individual and

household economic data, which continue to be relevant to the

purposes of the survey regardless of respondent mobility. 

Consequently, following movers is an appropriate means to maintain

data quality over time for such surveys.

 

     Following movers may create other problems, however. For

instance, if there are ecological correlates for the phenomena of

interest, such as crime or quality of housing, then following

mobile respondents may result in deterioration of the geographic

representativeness of the original sample, with a consequent

potential for bias in some measures for later waves.  A rotating

panel design may minimize this problem, because newer respondents

are more likely to reside in the originally sampled housing

location.

 

     Another reason for following movers is that respondents may

move for reasons related to the substantive goals of the survey. 

This makes it important to know why they move.  If this is the only

reason for following movers, then collecting data for only one wave

after a move may be enough.  In NCS, for example, some respondents

may move from a high-crime area to a safer neighborhood, and it

would be important to determine the proportion of moves which were

related to crime victimization.  With one wave of follow-up, that

proportion can be measured, but not the future consequences of

victimizations for such movers.

 

     The SIPP is attempting to follow all individual movers. 

Because living arrangements vary according to economic circumstance

-- and affect eligibility for social welfare programs -- a change in

residence can be related to changes in income and program

participation.  Thus, for SIPP it is crucial not to lose data on

movers.  The CPI, on the other hand, follows only those movers who

provide services, such as doctors or lawyers, since their expertise

is the item being purchased.  When a commodity outlet changes

location, this move is considered a unit "death" and the CPI record

is terminated.

 

The actual procedures developed for following movers are likely to

reflect the field procedures of the organization conducting the

survey, the collection mode used, the distance involved, and the

costs associated with tracking movers.  If the organization

conducting the survey uses decentralized collection procedures, a

respondent moving from the jurisdiction of one regional office to

another may be more difficult and more expensive to track.  Also,

the costs of following movers may be greater if a face-to-face

collection mode is used, rather than a telephone design, where

tracking procedures may

be limited to obtaining a new telephone number.  Depending on the

cost, administrative difficulty, and proportion of respondents who

move far enough to create problems, it may not be desirable to

follow all movers or to rely on standard collection modes.  SIPP 

field procedures, for instance, indicate that personal interviews

need not be administered if the respondent has moved beyond 100

miles from any sample PSU, and rules also differ for respondents

younger than fifteen years of age.  If survey procedures allow

telephone interviews in lieu of face-to-face interviews, a phone

contact may be a desirable alternative for movers who are difficult

to reach.

 

     The type of sample involved may also affect the ease with

which movers may be located.  For instance, it is usually easier to

find a mover through neighbors or subsequent occupants of a sample

housing unit if an area sample has been adopted rather than with a

random digit dial sample.  Asking respondents to notify the field

office with pre-printed cards when they move can be a partial

solution, but this option relies heavily on the respondent's

cooperation.

 

     3.  Attrition

 

     When projected across waves of a longitudinal survey,

manageable levels of non-response in a cross-sectional survey can

become significant sample attrition.  The potential for attrition

in a longitudinal survey sometimes limits sample definition. 

Tracing mobile respondents generally accounts for a large

proportion of field problems as well as costs, and refusal rates

are likely to grow over the life of the survey.  Incomplete records

and missing interviews create analytical complexities that are

unparalleled in cross-sectional research.  Attrition is most

dangerous when it is correlated with the objectives of the survey. 

For example, there is evidence that sample attrition may be related

to victim status in the NCS.  To the extent that the sample loses

victims at a faster rate than non-victims, estimates from later

waves will be biased.  Also, Fienberg and Tanur (p. 17) note that in

social experiments disproportionate loss of respondents for

different treatments may be a problem, because treatments often

vary in their attractiveness to participants.

 

     Sample attrition between observation periods may create the

illusion of change when means are compared between waves, without

adjusting for non-response.  In a study focused on identifying

change, there is a risk that changes are spurious, due to sample

attrition.  In addition, respondent participation that varies from

panel to panel could produce the appearance of change even when

aggregate non-response is stable.  Attrition can also bias estimates

of central tendency (Cook & Alexander: 191).  Mean test results from

longitudinal panels of students taking ETS exams were compared to

mean test results derived from a cross-sectional survey of the same

population.  The means were significantly different, which the

analysts attributed to selective attrition in the longitudinal

sample.

     Effects of attrition in demographic surveys have been harder

to predict.  Attrition does not necessarily create unmanageable

bias in a longitudinal survey:  The NLS was still contacting 92

percent of living respondents 3 years after the original contact,

and still contacting 80 percent of eligible respondents 12 years

after the study began (U.S. Department of Commerce:321). In the

ISDP panels of 1978 and 1979, attrition did not climb steadily over

the five or six interviews administered to respondents.  Instead,

it leveled off and then declined slightly over all waves

(Ycas:150).  Nonetheless, a combination of attrition and varying

participation from wave to wave can create serious

problems in creating complete records.  In the 1979 ISDP panel, for

instance, only two thirds of the original sample persons had

complete interview records (Ycas:150).

 

     Calculating the response rate in longitudinal surveys is

itself difficult.  The measures used in cross-sectional research

are often not adequate for measuring non-response in complex

records, as they do not reflect cumulative non-response across

waves and do not take into account changes in the size of the

eligible sample due to births, deaths, and the addition of new

household members.  To illustrate, non-response for entire housing

units in the NCS is sometimes reported at 4 percent.  However, when

records for housing locations are linked to form a longitudinal

file, it has been found that over half of the originally sampled

housing units are missing at least one interview.  This discrepancy

is due to the fact that the former figure is a cross-sectional

measure of unit non-response in a particular wave and does not

account for the approximately 10% of sample housing units

unoccupied at the time of interview (Fienberg & Tanur:14).  This

figure also does not cumulate non-response over time.  While the

lower figure is an appropriate measure for many cross-sectional

uses of NCS data, it clearly is inadequate for reflecting the

completeness of linked housing unit records.
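
     The gap between the two figures can be illustrated with a rough calculation.  The sketch below assumes, purely for illustration, that each housing unit is scheduled for seven interviews and that the 4 percent non-response rate and the 10 percent vacancy rate apply independently at every interview.  The number of interviews, the independence assumption, and the function name are not taken from the sources cited above.

```python
def share_with_complete_records(nonresponse_rate, vacancy_rate, n_waves):
    """Share of originally sampled units interviewed in every wave.

    Assumes (for illustration only) that non-response and vacancy are
    independent of each other and from one wave to the next.
    """
    per_wave_interview_rate = (1.0 - nonresponse_rate) * (1.0 - vacancy_rate)
    return per_wave_interview_rate ** n_waves


# Illustrative inputs: 4% unit non-response, 10% vacancy, 7 interviews.
complete = share_with_complete_records(0.04, 0.10, 7)
missing_at_least_one = 1.0 - complete
print(f"complete records: {complete:.1%}")
print(f"missing at least one interview: {missing_at_least_one:.1%}")
```

     Under these assumptions well over half of the units are missing at least one interview, even though the cross-sectional non-response rate in any single wave is only 4 percent.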

 

     The methods that have been developed for tracing respondents

in longitudinal surveys have been successful, but they have also

proven to be expensive.  The Census Bureau has estimated that the

cost of contacting each wave of an ISDP research panel increased by

8 percent over the previous wave, due to the costs of following

movers and interviewing additional households (Fienberg & Tanur:11-

12, White & Huang).  However, NCES also found that per-unit tracing

costs for the High School and Beyond (HS&B) Survey were

approximately 20% less than the cost of base year sampling, which

illustrates the economies which can be realized by mounting a

longitudinal study, rather than separate cross-sectional studies. 

To control costs, as well as potential bias, each longitudinal

survey must investigate the characteristics of respondents who

move.  Depending on empirical evidence about how atypical non-

respondents are, a judgment can be made about the proper balance

between the costs of tracing respondents and an acceptable level of

non-response.

 

     Sample definition offers another approach to limiting

unscheduled attrition.  The probability of becoming a non-

respondent is not randomly distributed among the population.  In

longitudinal samples such factors as rural residence, interval since

contact, and region of the U.S. affect the probability of

maintaining contact (Artzrouni:21-24).  Some longitudinal designs

have therefore sought to minimize attrition by avoiding the

respondent classes that are most susceptible to attrition.

 

     Setting aside respondent classes to control attrition can

conflict with attaining a sample that truly represents the

reference population.  However, a sample chosen without regard to

eventual tracing difficulties may also gradually lose its

representative power through attrition. Only empirical evidence can

indicate the extent to which characteristics that predict attrition

co-vary with the characteristics that the study is designed to

investigate.  A sampling design which sets aside respondent classes

with potential attrition problems should be undertaken only after

careful consideration of the relative magnitude of bias which could

be introduced by such a strategy and other alternatives, such as

imputation for missing data or performing analysis on the remaining

sample cases of an initially representative sample.

 

     In cohort or panel studies, which require measurement to begin

and end at the same time for all respondents, implementation of a

rotating panel design, which reduces the impact of attrition by

replacing respondents over time, will clearly not serve the goals

of the survey.  One possible strategy for dealing with attrition in

such studies is to impute

missing data, based either on statistical models or on complete

data from prior waves or from respondents with similar

characteristics. Another possibility is to reweight the sample for

each wave to reflect non-response for various demographic groups in

the sample. (See Chapter 4.)
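
     A minimal sketch of one common form of such reweighting, a weighting-class adjustment, is given here.  It is an illustration rather than the procedure of any survey named in this chapter; the record layout, the group labels, and the function name are hypothetical.

```python
from collections import defaultdict


def nonresponse_adjusted_weights(cases):
    """Weighting-class non-response adjustment for a single wave.

    `cases` is a list of dicts with hypothetical keys:
      'id', 'group' (adjustment cell), 'base_weight', 'responded' (bool).
    Each respondent's weight is inflated so that the respondents in a
    group carry the total base weight of all eligible cases in that group.
    """
    total = defaultdict(float)       # base weight of all eligible cases
    responding = defaultdict(float)  # base weight of respondents only
    for case in cases:
        total[case["group"]] += case["base_weight"]
        if case["responded"]:
            responding[case["group"]] += case["base_weight"]

    adjusted = {}
    for case in cases:
        if not case["responded"]:
            continue  # non-respondents receive no weight
        factor = total[case["group"]] / responding[case["group"]]
        adjusted[case["id"]] = case["base_weight"] * factor
    return adjusted


# Invented example: one of two rural cases fails to respond.
wave_cases = [
    {"id": 1, "group": "urban", "base_weight": 100.0, "responded": True},
    {"id": 2, "group": "urban", "base_weight": 100.0, "responded": True},
    {"id": 3, "group": "rural", "base_weight": 100.0, "responded": True},
    {"id": 4, "group": "rural", "base_weight": 100.0, "responded": False},
]
weights = nonresponse_adjusted_weights(wave_cases)
```

     The remaining rural respondent's weight is doubled, so the weighted total for each group is preserved.  The adjustment removes bias only to the extent that respondents and non-respondents within a group are alike on the measures of interest.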

 

     Duncan, Juster, and Morgan (1982) model such a procedure for

the Panel Study of Income Dynamics (PSID), conducted by the

Institute for Social Research (ISR) at the University of Michigan. 

They compare results for data gathered with persistent efforts to

pursue respondents and for the data set which would have resulted

if less intensive respondent contact strategies had been adopted. 

When the latter is reweighted to adjust for missing cases and

compared with the first data set, there are minimal differences in

outcome measures.  While this procedure has promise for minimizing

bias resulting from non-response across waves, it may also allow

some relaxation in pursuing respondents, allowing cost reductions

in survey administration.  The authors do note, however, that

reweighting entails some risk of covariation-related bias in

multivariate estimates, especially for models that are not well

specified, and that maintaining an adequate number of respondents

in some key subsamples may remain a problem.

 

     A reasonable precaution to minimize the deleterious effects of

sample attrition is to minimize respondent burden, which has been

variously described as the amount of time which an interview

entails or as the complexity of the task required of respondents

for successful completion of an interview.  Under the Paperwork

Reduction Act of 1980, each Federal statistical program is

restricted to a limited number of hours available for data

collection in a fiscal year, thereby encouraging reduction of the

burden placed on respondents.  In addition to the statutory reasons

for limiting the length of Federally sponsored surveys, controlling

respondent burden may also improve data quality for longitudinal

surveys in a number of ways. An important aspect of this data

quality enhancement is that respondent participation may be

encouraged by reducing interview tedium, thereby reducing refusal

rates and enhancing the representativeness of the sample over time.

 

     Respondent burden hours may be reduced by a careful evaluation

of the utility of collecting information in every wave.  The SIPP,

for example, minimizes respondent burden by dividing the survey

into a core questionnaire administered at each interview, plus

"topical modules" to collect data not required as regularly. 

Sometimes only a subsample of respondents should answer certain

topics.  Finally, lengthening and/or varying the intervals between

waves should also be considered as a means for reducing respondent

burden.  The CPS, while not a longitudinal survey, adopts this

strategy of varying time between interviews.  Respondents are

interviewed for four months in succession, not contacted for the

following eight months, and then interviewed for a final four

months.
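
     The interview pattern just described can be written out as a small schedule generator.  This is a sketch only; the function name and the convention of counting months from zero are assumptions made for the example.

```python
def months_interviewed(first_month, on=4, off=8):
    """Month indices in which a rotation group entering in `first_month`
    is interviewed under an on/off/on pattern (default 4-8-4)."""
    first_spell = list(range(first_month, first_month + on))
    second_start = first_month + on + off
    second_spell = list(range(second_start, second_start + on))
    return first_spell + second_spell


# A group entering in month 0 is interviewed in months 0-3 and 12-15.
schedule = months_interviewed(0)
print(schedule)
```

     Because four months in sample plus eight months out equals twelve, the second spell of interviews falls in the same calendar months as the first, one year later.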

 

 

     4. Changes in Units of Observation

 

     A slightly different sample of respondents participates in

each wave of a longitudinal survey.  Such changes in sample may

result from scheduled introduction or retirement of sample units in

a rotating panel design, from attrition, or from introducing new

respondents when household composition changes.  This variation

causes difficulties related to defining the correct reference

population, in weighting for item non-response, and in weighting

respondents who enter and leave the sample.  In addition, the

changing sample of respondents and aggregate units creates unique

difficulties in analyzing data above the person level.  A variety of

approaches has been used to define units of analysis in

longitudinal research, and each has specific problems and

strengths.  These are discussed in detail in Chapter 4.

 

     It should be noted here, however, that all weighting

adjustments should be planned simultaneously.  The problem of

adjusting for non-response is the converse of problems created by

persons entering the sample, and the adjustments for entrants and

non-coverage, once selected, can be accomplished in a single

operation.

 

     Split and merged households present particular problems for

sample comparability across waves.  Such recomposition of

households creates obvious difficulties for longitudinal matching,

which will be discussed below.  However, changes in household

membership also raise questions about how to treat new members of

split households who were not members of the originally sampled

household but who came into sample because of their associations

with original sample persons.  Rules developed by the ISDP offer

one method which seems generally applicable to a number of surveys: 

New household members were added to the sample, but if they left

the household, or if this household subsequently split, only those

members who were selected for the original sample were followed. 

This procedure avoids excessive growth of the panel, thus

minimizing artifactual changes in aggregate panel statistics, but

still collects relevant household data which correspond to data

from "stable" households.
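
     One reading of this following rule can be sketched as a short function.  The data layout, identifiers, and function name are hypothetical, and the sketch simplifies the actual ISDP field rules, which are not reproduced here.

```python
def persons_to_interview(households, original_sample):
    """Apply an ISDP-style following rule to the current wave.

    `households` maps a household id to the set of person ids now living
    there; `original_sample` is the set of person ids selected for the
    original sample.  A household is kept only if it contains at least
    one original sample person.  Everyone living in a kept household is
    interviewed, so added members stay in sample only while they live
    with an original sample person.
    """
    interviewed = {}
    for hh_id, members in households.items():
        if members & original_sample:
            interviewed[hh_id] = set(members)
    return interviewed


# Invented example: A and B were originally sampled; C joined later and
# has since moved out on his or her own; D has joined A's household.
original = {"A", "B"}
current_households = {"H1": {"A", "D"}, "H2": {"B"}, "H3": {"C"}}
followed = persons_to_interview(current_households, original)
```

     In the example, A (with new member D) and B are followed into their separate households, while C, who no longer lives with an original sample person, is dropped.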

 

     Whether a change in a household constitutes the birth or death

of the sample unit depends on the goals of the survey.  If the

survey samples households and does not follow movers, then a

complete turnover in the household occupants would indicate the

birth of a new unit.  If housing locations are sampled, then such a

turnover would not constitute a death as long as the housing unit

remains occupied.  The death of a member of the household, or even

the head, does not constitute death of the unit for a household-

based sample, but a divorce or separation often will be defined as

termination of the unit.  If an individual respondent leaves the

sample, the reason for the departure should be determined.  If the

respondent has died, then the individual record should be

terminated.  However, if the respondent leaves the sampling frame

for other reasons (e.g., entering the military or moving abroad),

it is possible that he or she may return during the life of the

panel, and the record should be retained.

 

     Often the death of a unit can be determined by observation. 

For instance, when a housing unit is vacant or destroyed and the

sample is location-based, termination of the record may be

indicated.  However, in other cases respondents must be queried

regarding the status of the unit.  If the unit of measurement is

the household, occupants of the sample location must be asked

whether they lived at the current address when the previous

interview took place to determine whether they should be considered

part of the sample.  (Rules for this decision will vary between

surveys.)  If only part of the household has moved since the

previous visit, it may be necessary to determine the reason for the

departure to ascertain whether the movers remain in the sampling

frame.  In designs which do follow movers and which allow the

formation of new households during the life of the sample,

permanent departure of individuals to form new households will

indicate the need to establish new household records.  (See Chapter

4 for a fuller discussion of these issues.)

 

B.  Changes Related to Respondents' Time in Sample

 

     Varying sample participation is not the only change over time

which complicates inference from longitudinal data.  A number of

factors related to the time respondents remain in sample may

produce changes in survey measures which are independent of any

substantive changes in the phenomena under investigation.  These

factors include variation over time in the rules for interviewing

particular respondents and changes in

respondents' approach to the interview based on increased

experience with the survey instrument as the sample matures.

 

 

     1. Response Variability Due to Changes in Respondent

 

     The manner in which a survey is administered may vary from

respondent to respondent. "Proxy" interviews may be administered,

in which adult household members complete interviews on behalf of

younger respondents, or in which available household members supply

data for other individuals in the household. (In some cases such

proxies are restricted to household members who are not present,

but, in other instances, one household member will supply personal

data for all individuals in the household.) Respondent rules are

also frequently needed for collecting household information if

there is more than one respondent per household.  A number of

possibilities exist for respondent rules.  For example, one

respondent in a household may be selected to provide household

data, while personal data is requested from each respondent

individually.  Alternatively, all respondents may be asked for

household data.  In the latter case, inconsistencies might be

reconciled in the field, for instance, when respondents report

conflicting details regarding a household crime incident.  A

computer edit or a post-weighting algorithm might also adjust for

differences in reporting, when household measures are simply the

sum of individual measures.

 

     Respondent rules can affect longitudinal data over time.  For

instance, during a longitudinal survey, younger respondents may

become eligible to complete an interview without proxy, and may

begin to report information of which previous proxies are unaware. 

There is also evidence that household-respondent status may affect

the manner in which personal data are reported, particularly if the

two types of information requested are related.  Biderman, Cantor,

and Reiss (1982), for example, find that respondents who report

household data also report higher levels of personal crime

victimization than do respondents who do not report household data. 

They also find that, if the household respondent changed between

interviews, levels of personal victimization for the affected

persons would also change.  The authors hypothesize that the

initial battery of household victimization items serves as a warm-

up for personal items and aids recall for household respondents.

 

     If the household respondent is allowed to change across waves,

then two effects should be anticipated.  First, the quality of

personal data reported by a given respondent is likely to change

over time, depending on whether he or she serves as the household

respondent.  Second, different household members will vary in their

knowledge of the relevant data, so the quality of household data

may also be expected to change over time and thereby bias

transition estimates.

 

 

     There are some obvious remedies for these problems.  First,

proxy interviews should be minimized, recognizing that obtaining

certain information directly from younger respondents may be

inappropriate or that there may be no other way to collect data for

some respondents.

 

     Surveys vary in their reliance on data collected by proxy

(e.g., about 60% for NCS, 40% for SIPP), and such a policy is likely

to produce an improvement in data quality proportionate to the

fraction of data currently collected in this manner.  Second, care

should be taken in assigning responsibility for answering

questions about the household over time, either by consistently

assigning this responsibility to the same respondent or by

requesting these data of all respondents.  The latter procedure

minimizes the effect of an unavoidable change in household

respondent and makes any respondent effect consistent across all

waves.  However, due to mandated

ceilings on response burden for federally sponsored data

collections, the additional precision realized may not justify the

substantial number of redundant questions which are required.  It

should also be noted that the reconciliation procedures or post-

weighting that would be required may make such a strategy very

difficult to use.

 

 

     2. Panel Bias

 

     A number of factors associated with respondents' time in

sample may produce changes in survey measures over time and thereby

complicate explanation.  The impact of these factors has been

described as a history effect, secular effect, maturation effect,

rotation group bias, time-in-sample bias, or Heisenberg effect. 

These factors include the reactivity of respondents to survey

measures, changes in the performance of the respondent role, the

"conditioning" effect of multiple administrations of the survey

instrument, the aging of the panel, interaction between

interviewers and respondents, interviewers' perceptions of their

role, and the correlation between variables of interest and the

probability of response.  Changes in survey measures due to such

effects present a danger for bias in longitudinal estimation. 

Consequently it is important to consider the influence of such

factors when designing a longitudinal survey and to minimize the

potential for such changes.  This is a difficult task, because the

reasons for the phenomenon are not clearly understood.

 

     Ideally, the process of measurement should itself produce no

change in the phenomenon under investigation.  Research methodology

in experimental psychology, for example, often involves disguising

the purposes of research, so that the subject will produce the

behavior under investigation with minimal "contamination" by the

research procedure.  In survey research, however, the respondent

must not only understand the measures being collected but also must

be led to appreciate the purposes and value of the research if

response rates are to remain high.  This is particularly important

for longitudinal surveys, where retaining sample is a crucial goal.

Consequently, the danger of reactivity between survey interviewing

and the phenomena under investigation is a particular problem.

 

     Researchers studying labor market experience, for example,

have speculated that repeated interviews asking about job mobility

might cause some of the mobility reported (Parnes:15).  Questions

about mobility may in fact cause subjects to consider the

possibility and act upon it.  National Crime Survey data also

indicate that proportionately fewer crime incidents are reported in

successive waves.  This finding may stem from respondents'

heightened awareness of vulnerability to crime, caused by

participation in the NCS, which results in increased precautions

taken against crime victimization.  It has been suggested that

respondents in a longitudinal sample might exhibit non-typical

behavior simply because repeated questioning regarding a topic may

alter respondents' perceptions of the subject under investigation

and change their behavior or attitudes accordingly.

 

     For respondents who remain in sample, their responses can

change over time solely as a function of longevity in the panel.

These temporal variations in response have implications for the

quality of longitudinal data which are often unpredictable.  In

some cases, the quality of data may improve over time. Respondents

may understand the respondent role better with repeated

interviewing or pay greater attention on a day-to-day basis to the

experiences being measured, with a consequent improvement in the

richness or accuracy of the data gathered.  Alternatively, if

respondents or interviewers find the interview tedious or

burdensome, they may become less enthusiastic about the

task over successive waves and avoid or give incomplete responses

to survey items. One aspect of such a decline in data quality is

the possibility that respondents may be "conditioned" by their

participation over several waves to provide answers which produce

artifactual changes over time. For instance, respondents may learn

that a particular response will trigger a long battery of

questions, which they may prefer to avoid in the future.

 

     This is one alternative explanation for the decline in the

rate of crime victimization reported in the NCS over successive

waves.  Respondents may learn that reporting a crime incident leads

to an additional series of items for each incident reported, which

results in a substantially longer interview.  The Census Bureau's

Current Population Survey (CPS), which is not strictly a

longitudinal panel survey but which has many of the attributes of a

longitudinal survey, exhibits a similar trend.  Reporting

unemployment triggers a battery of questions dealing with reasons

for unemployment and activities directed towards looking for work. 

Reported unemployment invariably falls between the first and second

waves of interviews in the CPS.  This phenomenon in CPS could be

related to several factors.  One has to do with repeated

interviewing and attrition.  Williams and Mallows showed that, if

the probability of response in a given wave of interviewing was

correlated with variables of interest, then, even with no change in

the variables, a spurious change would occur.
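
     The mechanism can be illustrated with a simple expected-value calculation.  The sketch below is not Williams and Mallows' own formulation; the unemployment rate, the response probabilities, and the function name are invented for the example.

```python
def observed_rate(true_rate, resp_prob_with, resp_prob_without):
    """Expected rate of a characteristic among respondents when the
    probability of responding differs for persons with and without it."""
    responding_with = true_rate * resp_prob_with
    responding_without = (1.0 - true_rate) * resp_prob_without
    return responding_with / (responding_with + responding_without)


true_unemployment = 0.10  # held constant in both waves (illustrative)

# Wave 1: both groups respond at the same rate, so the estimate is unbiased.
wave1 = observed_rate(true_unemployment, 0.95, 0.95)

# Wave 2: the unemployed are assumed to respond less often than others.
wave2 = observed_rate(true_unemployment, 0.80, 0.95)

print(f"wave 1 observed rate: {wave1:.3f}")
print(f"wave 2 observed rate: {wave2:.3f}")
```

     Although the true rate is fixed at 10 percent, the rate observed among wave 2 respondents is lower, which would appear as a decline in unemployment between waves.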

 

     The passage of time can also produce unintended change between

observations because of gradual shifts in the meaning of questions

and answers.  Even when questionnaires are not changed, there may

be evolution in the way respondents perceive or answer questions,

which produces the appearance of movement (Parnes:14).  This might

be caused by events (including the survey itself), by maturation in

the sample, or by non-response.

 

     It is very difficult to determine whether a change across

waves is real change or spurious change.  Continuing validation

research is necessary to identify panel bias in longitudinal data. 

Panel bias may be studied by comparing data collected in subsequent

waves of a longitudinal survey to data collected in cross-sectional

surveys (as in Cook & Alexander).

 

     Although some conditioning or panel effects may be inevitable,

several tactics can be used to minimize their impact.  One option

is to implement a rotating panel design to replace respondents

after a predetermined number of interviews.  This procedure affords

two primary benefits.  First, those respondents who have been in

sample the longest are replaced with more "inexperienced"

respondents.  Second, the temporal overlap of old and new sample

facilitates studies of time in sample effects.  All respondents are

administered the same instrument under the same conditions at the

same time, which serves to test alternative hypotheses about panel

effects.
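
     One simple way to use the overlap is to express each rotation group's estimate for the same period as an index relative to the average over all groups.  The sketch below assumes this index approach; the rates and the function name are invented and do not come from any survey discussed here.

```python
def time_in_sample_index(estimates_by_group):
    """Express each rotation group's estimate as an index, where 100 is
    the unweighted average across all groups for the same period."""
    average = sum(estimates_by_group.values()) / len(estimates_by_group)
    return {group: 100.0 * value / average
            for group, value in estimates_by_group.items()}


# Invented rates for one reference period, keyed by the number of
# interviews each rotation group has completed so far.
rates = {1: 110.0, 2: 100.0, 3: 95.0, 4: 95.0}
index = time_in_sample_index(rates)
print(index)
```

     An index well above 100 for groups in their first interview, declining for groups with longer time in sample, would be consistent with a time-in-sample effect, although it does not by itself identify the cause.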

 

     Another possible means to attenuate or postpone the effects of

panel bias is to minimize the respondent burden imposed by the

interview.  Careful construction of the instrument to minimize

tedium and encourage respondent rapport should be central concerns

in planning any survey but take on added importance in longitudinal

data collections, because of the need to sustain the active

participation of respondents over repeated interviews.  The overall

length of the instrument may play a role in the respondent's

willingness to participate fully in successive contacts.  However,

design of the instrument to minimize tasks which the respondent is

likely to find either tedious or particularly difficult is also an

important consideration.  Use of long follow-up batteries should

also be minimized, to attenuate the effects of respondent

conditioning.

 

C. Operations Change Over Time

 

 

     Changes in the administration of a continuing survey are

almost inevitable.  Revisions to the instrument, redesign of the

sample, introduction of new collection modes, and transfer of data

collection responsibilities to another organization can all

introduce changes in the data and compromise the validity of

longitudinal comparisons.  While a consistent time series may be

difficult to maintain under such circumstances, means exist which

allow the analyst to deal with the effects of such changes.

 

     Eventually in most longitudinal research there is a pressure

to change the survey measures in response to changing hypotheses. 

In addition, later findings frequently indicate a need for measures

of new variables.  Particularly when longitudinal research is

exploratory and designed to identify significant correlates of

change, researchers may be inclined to collect large amounts of

data to minimize future requirements for change in the

questionnaire design.  This aspect of longitudinal research may be

costly, but it is an understandable precaution given the tendency

for research hypotheses and/or policy aims to change over time.

 

     To accommodate changing methods, a survey may be run under old

and new procedures simultaneously for a period of time, to allow

comparisons between data collected before and after the change. 

Ideally, both old and new designs should be implemented at full

sample, in effect twice the usual sample size, but budget

constraints will often make this impractical.  The CPS has adopted

this double-sample strategy to phase in new samples based on the

1980 Census.  The CPI also used both old and new sample designs

simultaneously for a six-month period in 1978, when the survey was

revised.

 

     Another strategy to consider when a questionnaire item is

rewritten or a derived variable in a file is altered is to make

changes in such a way that analysts may recode the revised variable

to correspond to the original variable (and vice versa), or to

retain old questionnaire items in the revised instrument for some

time.  NCES adopted the latter strategy for the HS&B survey when it

adopted an "event history" approach to gathering employment and

education data.  In addition to the new items, the previous "point

in time" activity item was continued, allowing calibration of new

items to the old and providing a degree of comparability between

versions.

 

     To reduce field costs, many sponsor agencies have approved

designs which permit data collection by telephone after the first

visit.  NMCES and NMCUES, for example, used phone contacts for

follow-up interviews.  The available evidence suggests that such

changes in mode may not produce uncontrollable fluctuations in the

measures obtained.  Benus (1975) notes that data collected by

telephone and by personal visit for the Panel Study of Income

Dynamics (PSID) are quite similar.  Groves and Kahn (1979) found

overall that univariate distributions and bivariate relationships

were not significantly different for 200 questions administered by

telephone and in person.  However, they note that telephone

interviews elicited more rounded financial figures, less detailed

responses to open-ended questions and narrower distributions on

some attitude items.  They also indicate that respondents tend to

perceive telephone interviews as longer than personal interviews of

the same length.  Findings that telephone respondents tend to give

more "don't know" answers to filter questions triggering other

questions may be related to this difference in perception of

length.  Telephone respondents may be more eager to bring the

interview to a close.  Consequently, minimizing respondent burden

seems particularly crucial for interviews conducted by telephone.

 

     While the research literature on the effects of interviewing

mode on survey response is generally encouraging, there are enough

examples of differences in respondent behavior to indicate that a

mixed mode design should not be implemented without adequate

pretesting and analysis of the effects.  One danger is that a

particular questionnaire design or questions about a certain

subject area might trigger mode-related differences in respondent

behavior.  To facilitate measurement of such mode-related response

variability, it is desirable to design shifts in mode of data

collection so that the changes across waves are systematic, making

the effects measurable.  It is also important in surveys which do

not require interviews with all household members to ensure that

interviews are obtained from the same household members when the

interviewing mode varies across waves, as respondent availability

may vary by mode.

 

     In conclusion, prospective longitudinal surveys require

administrative and operational features that are different in kind

as well as degree from those in cross-sectional research.  The

long-term analytical goals of the survey must be considered in

planning every aspect of sample definition and weighting. 

Provisions should be made for validation studies to evaluate such

factors as attrition and panel bias.  Finally, changes in format,

operations and staff must be anticipated and managed in ways that

ensure the comparability of measures from wave to wave.

 

     In practice it is worth noting that there are only a limited

number of organizations which handle nearly all large-scale

longitudinal surveys.  Due to their experience, these organizations

have a high level of expertise, and the continuity of experience

contributes to successful planning and implementation.  However,

the concentration of longitudinal research in such a small number

of organizations increases the impact that any errors, such as

limitations in the sampling frames most commonly used, would have

on the representativeness of longitudinal research.

 

 

D. Processing

 

     While the measures collected in longitudinal research may be

similar to those collected in cross-sectional studies, there are

special problems in controlling and interpreting them.  The sheer

size of the data files created in national longitudinal surveys

creates special problems in processing and analysis.  The massive

files can be difficult, expensive, and slow to process, which has

often limited their use to organizations with the staff, equipment,

and software, often complex, capable of handling such data sets. 

As a result, data analysis has typically lagged behind the

accumulation of data (Kalachek:17).  Fortunately, this situation is

changing with the advent of public use files for multivariate

analysis and with the dissemination of more user-friendly

"statistical data base" packages to facilitate data management and

analysis.

 

     In processing data from longitudinal surveys, difficulties are

encountered related to cross-wave case matching, cross-wave data

revisions, and preparation of data files for analysis.  Often there

is no single "best" procedure for processing, because ease of

processing and analytical requirements are not always compatible

goals.

 

     Errors in individual record files can cause multiple problems. 

Often items which should remain consistent across waves (e.g., race

and sex) or which should change only in predictable ways (like age

and marital status) will exhibit changes due to respondent

confusion, transcription error by interviewers, or keypunching

errors by processing staff.  Detecting these errors is important,

not only because such items often define key

demographic variables for analysis, but because such items are

frequently needed to match cases.  Errors are also inevitably

introduced when imputations are made for missing data.

 

     Several procedures are possible to minimize errors.  For SIPP,

the field office staff immediately checks completed interviews to

reconcile discrepancies, avoiding more costly correction of data

after they have been keyed.  Another possible procedure is to build

computer edits into the processing system to detect inconsistencies

between current and prior interviews.  NLS-72 and HS&B use machine

edits to identify and resolve inconsistencies for about thirty

critical items.  Another option, utilized by CPI, is to create a

machine-generated control card, which avoids errors in

transcription and which provides interviewers with prior-wave data

necessary to reconcile discrepancies in the field.  This latter

procedure, however, can also lead to reduced reporting of actual

change.
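
A computer edit of the kind described above might be sketched as
follows.  This is only an illustration: the record layout, the field
names, the list of impossible marital status transitions, and the
allowed gain in age between waves are all assumptions, not the rules
used by SIPP, NLS-72, or HS&B.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WaveRecord:
    """Hypothetical subset of items carried on each wave's record."""
    person_id: str
    sex: str
    race: str
    age: int
    marital_status: str


# Assumed rule set: transitions that cannot occur between waves.
IMPOSSIBLE_MARITAL = {
    ("married", "never married"),
    ("divorced", "never married"),
    ("widowed", "never married"),
}


def cross_wave_edit(prior, current, max_age_gain=1):
    """Return the names of items that are inconsistent across waves."""
    flags = []
    # Items that should never change.
    if prior.sex != current.sex:
        flags.append("sex")
    if prior.race != current.race:
        flags.append("race")
    # Items that should change only in predictable ways.
    age_gain = current.age - prior.age
    if age_gain < 0 or age_gain > max_age_gain:
        flags.append("age")
    if (prior.marital_status, current.marital_status) in IMPOSSIBLE_MARITAL:
        flags.append("marital_status")
    return flags
```

Flagged items would then be referred for reconciliation rather than
corrected automatically, since an apparent inconsistency may reflect
an error in either wave.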

 

 

     1. Cross-Wave Matching

 

     In order to link data across waves, variables must be created

to match records at the desired unit of analysis.  A number of data

management issues must be addressed, including the consistency of

linking variables across waves, providing for longitudinal matching

at multiple levels of analysis, and rules for matching merged and

split households.

 

     If longitudinal records are not matched correctly between

waves, the effects can be similar to sample attrition or

nonresponse.  The records of one or more observations will be missing

from a respondent's longitudinal file, giving the appearance of

missing interviews.  One possible consequence of matching errors is

error in analysis, either because incomplete records are deleted,

or because missing data are imputed.  If records are linked

incorrectly, longitudinal data are also likely to produce flawed

results by showing false changes in status.  Even cross-sectional

analyses may be in error, if control card information or data from

previous interviews are carried over onto the improperly matched

record by the processing system.

 

     A number of procedures are possible for linking units

accurately from wave to wave, including matching of household and

individual line numbers, or matching independent person and/or

household identification numbers.  Economy in the number of

variables used for a match is generally a virtue, because the

opportunity for mismatches due to transcription or coding errors

increases with the number of variables used.  So does the

likelihood of missing data; the computer then often assigns a

missing data code, which hampers matching.  Limited

redundancy in linking variables can, however, provide some

protection against false matches, in that such cases are more

likely to be flagged in the matching process.

 

     Validation procedures to detect longitudinal mismatches should

be incorporated into the processing system and can often rely on

demographic variables which either should not change over time

(e.g., race, sex, or date of birth) or which can be expected to

change in predictable fashion (e.g., marital status or age).  Such

methods are particularly useful when person-level matching is

performed using the assigned line number of respondents within

household.  It is also useful to imbed check digits in key linkage

numbers, to detect miskeying.  In addition to careful design of

validation variables, immediate error checking by the field office

of items important for matching and validation is likely to reduce

the number of mismatches significantly.
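
The report does not say which check digit scheme is intended.  As
one possibility, the sketch below uses the widely known Luhn
algorithm, which detects any single miskeyed digit; the function
names are illustrative only.

```python
def luhn_check_digit(payload: str) -> str:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every second digit starting with
    # the rightmost digit of the payload.
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return str((10 - total % 10) % 10)


def append_check_digit(payload: str) -> str:
    """Return the linkage number with its check digit appended."""
    return payload + luhn_check_digit(payload)


def is_valid(number: str) -> bool:
    """True if the final digit matches the check digit of the rest."""
    if len(number) < 2 or not number.isdigit():
        return False
    return luhn_check_digit(number[:-1]) == number[-1]
```

A keying program could call is_valid on each linkage number as it is
entered and reject the entry immediately when the check fails.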

 


 

 

 

     Often, person records are linked across waves by matching on

household ID and on the line number of an individual within the

household record.  This is usually cumbersome, and it makes linking

individual data across waves extremely difficult if an individual

moves out of the sampled household, if the household dissolves, or

if the household merges with another household, all of which render

the previously assigned household ID obsolete.  Consequently, for

surveys which are intended to follow individuals, regardless of the

duration of their association with a sampled household or household

location, assignment of an independent person ID is highly

desirable.  This is not to argue that IDs at other levels of

observation are not useful, as longitudinal analysis at household,

person, or event level is often needed.  The important

consideration is that linking variables be designed so that changes

in sample composition do not prevent record matches.

 

     SIPP has implemented an ID which, while complex, illustrates

the sort of linkage which is often desirable (cf. Jean & McArthur,

1984).  The ID consists of:

 

 

     PSU number          - 3  digits

     Segment number      - 4  digits

     Serial number       - 2  digits

     Address ID          - 2  digits

     Entry address ID    - 2  digits

     Person number       - 3  digits

 

 

Household ID consists of address ID, PSU, segment, and serial

numbers.  The latter three numbers are fixed once assigned.  The

entry address ID also does not change.  The first digit of the

address ID indicates the wave at which the household was

interviewed at that address.  The second digit sequentially

numbers, by address, households resulting from a split into two or

more households by original sample persons.  The first digit of the

person number indicates the wave at which the respondent entered

the sample, and the second two digits sequentially number persons

within the household.  This ID also remains fixed.

 

     Linking households or individuals with the SIPP system is

fairly straightforward.  Households whose composition does not

change require the household ID, and individuals require the

household ID and person number to provide a match.  The inclusion

of a fixed entry address ID also facilitates matching records for

individuals or households who move, and for split households. 

Combining the person number and the entry address ID provides a

person number which remains constant regardless of changes in

address and household composition.  This provides a link to data

collected for an individual across all waves, allows a match to the

initial household, and permits the analyst to filter data for only

the original survey respondents, if desired.  This system remains

adequate for multiple movers or for households which split a number

of times.
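
The linkage logic can be illustrated with a short sketch.  The class
and property names are hypothetical, the example values are invented,
and the person number is assumed to have three digits, as the
description of its first digit and following two digits implies.  The
sketch also assumes that the constant person key includes the PSU,
segment, and serial numbers, so that it is unique across the sample.

```python
from typing import NamedTuple


class SippId(NamedTuple):
    """Hypothetical container for the ID components, kept as strings."""
    psu: str               # 3 digits, fixed once assigned
    segment: str           # 4 digits, fixed once assigned
    serial: str            # 2 digits, fixed once assigned
    address_id: str        # 2 digits, changes when a household moves or splits
    entry_address_id: str  # 2 digits, fixed
    person_number: str     # assumed 3 digits, fixed

    @property
    def household_id(self) -> str:
        """Key for matching households at a given address."""
        return self.psu + self.segment + self.serial + self.address_id

    @property
    def person_key(self) -> str:
        """Key that stays constant when a person changes address."""
        return (self.psu + self.segment + self.serial
                + self.entry_address_id + self.person_number)

    @property
    def entry_wave(self) -> int:
        """Wave at which the person entered the sample."""
        return int(self.person_number[0])


# The same person before and after a move between waves 1 and 3.
wave1 = SippId("123", "4567", "89", "11", "11", "101")
wave3 = SippId("123", "4567", "89", "31", "11", "101")
```

Here the household IDs differ between the two waves while the person
key does not, and an entry wave of 1 identifies an original sample
person.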

 

     In 1979 two waves of interviews from an ISDP panel were merged

into a single longitudinal file using personal identification

variables.  Mismatching between records proved to be a significant

problem, and there was evidence that additional matching errors

were undetected (Kalton & Lepkowski:26).  A second file was created

using ID numbers rather than personal characteristics.  This file

had significantly fewer discrepancies during edit checks for such

items as sex and age, indicating that fewer matching errors

occurred with the use of the ID number for linking.

 


 

 

 

 

 

     Sometimes the potential of longitudinal data has not been exploited

because of the complexities involved in updating data with

information collected in subsequent waves.  For instance, a

respondent may report a crime victimization or a health problem,

but information on insurance coverage will remain incomplete,

because the claim had not been settled at the time of the

interview.  It is frequently desirable to revise or add data during

a later interview and to create an automated control system which

would allow revision of the original record.  One possibility is to

provide a check item on the instrument for information which is

frequently incomplete.  The control system could then flag

incomplete data during processing and direct the interviewer to

follow up on this question in a later wave.  Similar procedures

were used in NMCES and NMCUES, which allowed validation of

data collected on health care payments and insurance coverage

during later interviews.
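
A control system of this kind might look roughly like the sketch
below.  The record layout and function names are assumptions made for
illustration and do not describe the NMCES or NMCUES systems.

```python
from dataclasses import dataclass, field


@dataclass
class EventRecord:
    """Hypothetical event record with a check item for incomplete data."""
    event_id: str   # links follow-up data to the original report
    wave: int       # wave in which the event was first reported
    data: dict
    incomplete_items: set = field(default_factory=set)
    revisions: list = field(default_factory=list)


def follow_up_list(events):
    """Items the interviewer should ask about again in a later wave."""
    return sorted(
        (event.event_id, item)
        for event in events
        for item in event.incomplete_items
    )


def apply_follow_up(event, item, value, wave):
    """Revise the original record and log which wave supplied the value."""
    if item not in event.incomplete_items:
        raise KeyError(f"{item} was not flagged for {event.event_id}")
    event.data[item] = value
    event.incomplete_items.remove(item)
    event.revisions.append((item, wave))
```

Keeping a log of revisions alongside the updated value lets an
analyst distinguish data reported at the original interview from data
supplied later.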

 

     Revising files obviously creates some complications, and there

are trade-offs between ease of processing and ease of analyzing the

revised records. One of the simplest procedures for processing is

to reserve a field for follow-up data in the interview along with

an incident or event ID which allows a match to the original

record.  This procedure unfortunately would make the analyst's task

considerably more difficult, in that several files would have to be

scanned to locate all updated material. The required matching and

file restructuring routines would also be rather cumbersome and

expensive to run, unless the data were released in a form

compatible with a statistical data base which performed the

matching.  These complexities create potential for data management

errors, particularly for inexperienced users accessing public use

files.

 

     The alternative is to correct the original records based on

followup data and to release the updated files.  A disadvantage of

this procedure is that several versions of the same file would be

in circulation.*  Nonetheless this procedure appears to have

greater potential for facilitating straightforward analysis and

management of the data, particularly if early versions of a file

are labeled as "preliminary."

 

 

     2. Data Structures to Facilitate Analysis

 

     A number of strategies may be used to create longitudinal data

files.  One is to create a separate fixed length record for each

case at the smallest unit of analysis, with separate fields devoted

to repeated measures of the same variable.  Often this is not

feasible, because this procedure entails a thorough revision of the

file every time a new wave is completed.  It is often preferable to

produce a separate file for each completed wave (or even more

frequently if data collection extends over a lengthy period) and to

include in the files a number of linking variables which remain

constant for each case across waves.  Other than the size of the

files produced, the main difference between these two approaches

then is in the processing system adopted: The former produces

integrated longitudinal files, while the latter produces files

resembling cross-sectional data sets which allow the analyst to link

the records later.

 

________________________________

*This is not as serious a problem for longitudinal files, the

latest version of which can more easily be identified, as it is for

cross-sectional files created from a particular wave.

 

     Producing a file which uses the smallest unit of observation

as the basis for a record is often not the most efficient structure

for a data set.  A number of surveys

collect data on households, individuals within households, and

discrete events experienced by the household in aggregate or by

individual members. Given the implicit "nesting" of such data,

creating a file based on the smallest unit will result in much

redundant information for higher level units.  The number of events

recorded and the number of household members may also be expected

to vary between households, and variable length records will

result, necessitating extensive "padding" to create a rectangular

file.

 

     A more efficient strategy in such cases is to produce

hierarchical files with the data pertaining to each level of

observation appearing in separate records and with variables

appearing in more than one type of record to allow for linkage

across levels.  A number of software packages such as SAS and

OSIRIS now exist which can process and analyze such files.  In

addition, a number of "statistical data base" packages are

available, such as SIR, Canada's RAPID, and Mathematica Policy

Research's RAMIS, which provide sophisticated capabilities for

matching across waves and levels, and which thereby simplify the

analyst's data management tasks in working with longitudinal files.
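
The contrast between a hierarchical structure and a rectangular file
built at the smallest unit can be sketched with a few invented
records.  All field names and values below are hypothetical.

```python
# Hierarchical structure: one record type per level of observation,
# with linking variables repeated across levels.
households = [
    {"hh_id": "H1", "region": "South"},
    {"hh_id": "H2", "region": "West"},
]
persons = [
    {"person_id": "P1", "hh_id": "H1", "age": 41},
    {"person_id": "P2", "hh_id": "H1", "age": 39},
    {"person_id": "P3", "hh_id": "H2", "age": 67},
]
events = [
    {"event_id": "E1", "person_id": "P1", "kind": "doctor visit"},
    {"event_id": "E2", "person_id": "P1", "kind": "hospital stay"},
    {"event_id": "E3", "person_id": "P3", "kind": "doctor visit"},
]


def flatten_to_events(households, persons, events):
    """Build a rectangular file with one row per event."""
    hh_by_id = {h["hh_id"]: h for h in households}
    person_by_id = {p["person_id"]: p for p in persons}
    rows = []
    for event in events:
        person = person_by_id[event["person_id"]]
        household = hh_by_id[person["hh_id"]]
        # Household and person items are copied onto every event row.
        rows.append({**household, **person, **event})
    return rows


flat = flatten_to_events(households, persons, events)
```

In the flattened file the household and person items for P1 are
stored twice, and P2, who reported no events, does not appear at all
unless a padded row is added.  The hierarchical form avoids both
problems at the cost of requiring a match across levels at analysis
time.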

 

     Decisions regarding the optimum structure for a longitudinal

file also need to take into account the expected size of files. 

Limits on the number of records many software packages can process

may be exceeded by the size of large federal data collections. 

Consequently, file structure options for facilitating analysis of

longitudinal data may be constrained.  Sponsors may find it

necessary either to forego compatibility with some otherwise useful

software packages or to release subsets of their data to provide

compatibility with a wider range of software packages.

 

 

     3. Confidentiality

 

     Processing operations and data structures for analysis cannot

be designed solely to reduce costs, complexity, or bias.  They must

also protect respondent privacy as far as possible.  This is

sometimes not compatible with maximum efficiency.  Procedures for

protecting confidentiality of paper records and of tape records

must be thought through carefully.

 

     The problem of maintaining respondent confidentiality is more

difficult in longitudinal surveys than in cross-sectional surveys. 

In cross-sectional research, the confidentiality of a response can

be protected by stripping responses of identifiers at an early

stage in processing.  In longitudinal surveys, response records

must be linked to personal identifiers, sometimes for decades,

until data collection and analysis are complete.  Longitudinal

records commonly contain multiple identifiers in order to

facilitate tracing and to ensure that records can be matched after

each wave, regardless of missing data.  Name, address and Social

Security number are often augmented with the name and address of

family, neighbors, or friends who are to be contacted in tracing

respondents who have moved.  The large number of identifiers, plus

their dispersion across records and across time, makes protecting

confidentiality in a longitudinal survey far more difficult than in

cross-sectional research.  However, most research organizations

have learned over the years how to protect paper records.

 

     An illustration of one solution to this problem is that adopted by

NCES for the NLS-72 and HS&B: Identifiers are stripped from the

tape prepared by the contractor before it is turned over to the

sponsor agency.  These data are maintained by the contractor but

may only be used with the explicit approval of the sponsor.  The

arrangement is a complicated, layered procedure which inhibits

any unauthorized access by sponsor, contractor, or public users and

provides protection similar to that of a cross-sectional study.

 

     This example illustrates a number of the basic safeguards

which should be integrated into any longitudinal data collection

effort.  First, identifiers should be used only to maintain the

quality of the data, e.g., for tracing respondents or for matching

purposes.  Second, only staff performing these functions should be

allowed access.  Hardcopy media containing identifiable data should

be stored in a secured area to limit access.  Electronic files

should be similarly secured and, when in use, access should be

restricted by the operating system to authorized processing

personnel only.  Third, all privacy-relevant data should be

stripped from public use tapes before release.  Ideally, the

collection agency should separate identifiers during processing and

store them on a file separate from the substantive data.  Finally,

when data collection is complete, all copies of identifiers should be

destroyed.  Even when such measures are taken, agencies and

research organizations must consider the possibility of

confidentiality breaks.  The quantity of information available

about respondents creates the possibility that a series of rare

responses can identify respondents.  Current research in

confidentiality is addressing this problem and should provide

useful guidelines for enhanced security measures in the near

future.
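
Separating identifiers from substantive data during processing might
be sketched as follows.  The field names and the use of a random link
key are assumptions for illustration; they are not the NCES
procedure.

```python
import secrets

# Assumed list of identifying fields; a real survey would have more.
IDENTIFIER_FIELDS = ("name", "address", "ssn")


def split_identifiers(records, identifier_fields=IDENTIFIER_FIELDS):
    """Split each record into an identifier row and a substantive row.

    The two rows share only a random link key, so the substantive file
    carries no personal identifiers but can still be matched by staff
    who are authorized to hold the identifier file.
    """
    identifier_file = []
    substantive_file = []
    for record in records:
        link_key = secrets.token_hex(8)  # random, not derived from the data
        id_row = {"link_key": link_key}
        data_row = {"link_key": link_key}
        for name, value in record.items():
            if name in identifier_fields:
                id_row[name] = value
            else:
                data_row[name] = value
        identifier_file.append(id_row)
        substantive_file.append(data_row)
    return identifier_file, substantive_file
```

Destroying the identifier file when data collection is complete then
removes the only route from the link key back to a respondent,
although rare combinations of responses may still need separate
review.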

 


 

 

 

 

 

CHAPTER 4

SAMPLE DESIGN AND ESTIMATION

 

     There are many issues in the design and estimation strategies

for longitudinal surveys that are identical to those for cross-

sectional surveys. Some issues, however, such as weighting and

compensating for nonresponse become more complicated with a

longitudinal survey.  Usually the complications arise because of

the changing nature of the population, as discussed in Chapter 3.

In this chapter, we discuss some of the major design and estimation

problems, many of which need more research.

 

A.   Defining a Longitudinal Universe

 

     Defining the initial study universe for a longitudinal survey

is no more complicated than defining the universe for a cross-

sectional study.  The initial universe is fixed at a specific point

in time and is explicitly defined.  Sample units can be selected

and the only difficulties are related to the sampling frame itself. 

Time, however, gradually complicates the problem of defining a

longitudinal universe.

 

     The study universe usually does not remain constant over the

period of the longitudinal survey, as was discussed earlier.  The

universe of individuals, households, families, or establishments

changes over time.  If a universe changes slowly along the critical

dimensions of the survey, the problem of a longitudinal universe

definition may be ignored.  However, if changes in the universe

over time are not trivial, a static universe definition may not be

sufficient.  The choice of definition for the longitudinal universe

will have a direct effect on data collection and analysis.

 

     Judkins et al. (1984) describe three methods for defining a

longitudinal universe. These ideas are generalizable to any

longitudinal study of persons or other units. One method for

defining a longitudinal universe is to select a specific time

during the course of the study as the point that defines the

universe.  If the universe is defined at the time of sample

selection, it is called a cohort study.  Units in the sample are

defined at the time of the first interview.  At later waves of

interviewing, data need be collected only from these units.  All

inferences and estimates refer only to the universe in existence at

the time of the first interview.  For example, for the CPI

commodities and service sector, the universe is a set of cohort

samples with attrition due to deaths.  Births are introduced only

when an entire cohort is replaced with a new sample.

 

Principal Authors: Daniel Kasprzyk and Lawrence R. Ernst          

 

 


 

 

 

 

 

     The longitudinal universe may also be defined at a time other

than the time of sample selection.  Under both scenarios,

statistical, operational and methodological problems may arise

because the sample was selected at one point in time and the

analyses of the study universe reflect a different point in time. 

It is possible that elements of the study universe at the time of

sample selection are no longer part of the longitudinal universe;

it is also probable that elements of the longitudinal universe

which exist at the time of definition were not in existence at the

time the sample was drawn.  This creates an operational problem --

whether to collect data from these "entrants" to the longitudinal

universe -- and it creates a statistical issue, the development of

estimation methods for this universe. For example, in the SIPP

universe (the non-institutional population, and members of the

military not living in barracks) individuals may leave the universe

by moving outside the United States, to an institution, to military

barracks, or by dying.  At any time during the study period persons

may enter the SIPP universe by returning from overseas,

institutions, or military barracks, or through birth.

 

     A second method of defining a longitudinal universe extends

the first method by looking at more than one time point.  Several

time points are selected, each one defining a universe at that

time.  Then the entire set of units defined by these different

cross-sectional universes is included in the longitudinal universe. 

Thus, if a person entered a sample household by being born or

returning from overseas sometime after the initial interview, that

person would be included in the longitudinal universe.  People can

be added to the universe, and anyone who is in the universe for any

of the time periods should be included in the estimation.
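
The first two definitions can be expressed as simple set operations.
The sketch below is illustrative only; the unit labels, time points,
and function names are invented.

```python
def point_in_time_universe(universe_by_time, time_point):
    """First method: the universe as it exists at one chosen time."""
    return set(universe_by_time[time_point])


def union_universe(universe_by_time, time_points):
    """Second method: every unit in the universe at any chosen time."""
    return set().union(*(universe_by_time[t] for t in time_points))


# Hypothetical membership of the cross-sectional universe at three waves.
universe_by_time = {
    1: {"a", "b", "c"},   # at sample selection
    2: {"a", "b", "d"},   # "c" has left, "d" has entered
    3: {"a", "d", "e"},   # "b" has left, "e" has entered
}

cohort = point_in_time_universe(universe_by_time, 1)
ever_in_universe = union_universe(universe_by_time, [1, 2, 3])
entrants = ever_in_universe - cohort
```

Under the cohort definition only units a, b, and c are followed,
while the second definition also brings in the entrants d and e, for
whom data collection rules and estimation methods must be developed.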

 

     For analysis of a