Data-Driven Risk Models Could Help Target Pipeline Safety Inspections
by Rick Kowalewski, Pipeline and Hazardous Materials Safety Administration, and
Peg Young, Ph.D., Bureau of Transportation Statistics
Federal safety agencies share a common problem—the need to
target resources effectively to reduce risk. One way this targeting is commonly done is with a risk model that uses
safety data along with expert judgment to identify and weight risk
factors. In a joint effort, the U.S.
Department of Transportation's Bureau of Transportation Statistics (BTS) and
Pipeline and Hazardous Materials Safety Administration (PHMSA) sought to
develop a new statistical approach for modeling risk by letting the data weight the data—by using
the statistical relationships among the data, not expert opinion, to identify and weight the risk factors.
Some key findings:
- Weighting the data through statistical procedures was superior to judgment-weighting in predicting (targeting) relative risk.
- Statistical modeling can help not only target which operators to inspect but also focus what to inspect, based on a set of risk factors.
- Data on pipeline infrastructure, operator performance, and incident history appear to be about equally useful in predicting future risk.
PHMSA's mission is to protect people and the environment from
the risks inherent in the transportation of hazardous materials by pipeline and
other modes of transportation. Each year
the pipeline safety program inspects several hundred thousand miles of
interstate pipelines carrying natural gas and hazardous liquids across the
United States. These pipelines are
operated by over 1,000 operators who manage systems ranging from a few miles to
tens of thousands of miles. While a
pipeline might seem to be a simple system, pipeline systems are in fact very
complex, and each system has some unique characteristics.
The general approach for conducting standard inspections
until now has been to inspect each major part of each system every 3
years. In 2006, PHMSA initiated a research/pilot project to
integrate the various kinds of inspections it conducted, to re-examine the
3-year inspection interval for standard inspections, and to focus the scope of
its inspections based on operator risk. Changing inspection intervals from a periodic basis
to a risk basis and changing from
comprehensive to focused inspections reflect a significant change in
approach. Program managers understood
from the outset that the new approach would require a better risk model.
The Current Risk Model
For more than a decade, PHMSA has used the Pipeline
Inspection Prioritization Program (PIPP) to schedule inspections and allocate
resources. PIPP is a data-based model
using 10 to 12 data variables (depending on type of pipeline) that are
transformed into 9 indexes, which are added together for an overall risk
score. The data variables for both
hazardous liquid and gas transmission pipelines are listed in table 1.
Beginning with these input variables, each one is transformed
into an individual PIPP score ranging from 0 to 9 points,
depending on the input variable, and then combined into the final total PIPP
score. The variables were selected
using expert judgment, and the transformations that determine the weight for
each variable also used expert judgment. PIPP results are used with other information to help set scheduling
priorities for inspections.
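The additive, judgment-weighted structure described above can be sketched in a few lines of Python. This is a minimal sketch under loudly stated assumptions: the variable names, breakpoints, and point values are hypothetical (the actual PIPP inputs are in table 1, and the report does not publish the expert-chosen transformations), and for simplicity each input maps to its own index. The point is that every weight is fixed by judgment rather than estimated from incident data.

```python
# Sketch of a PIPP-style additive score. ALL variable names, thresholds,
# and point values below are HYPOTHETICAL illustrations, not PIPP's.

def points_from_breakpoints(value, breakpoints, points):
    """Map a raw value to an expert-chosen score between 0 and 9.

    breakpoints: ascending thresholds; points has one more entry than
    breakpoints (the last entry applies above the highest threshold).
    """
    for threshold, score in zip(breakpoints, points):
        if value < threshold:
            return score
    return points[-1]

# Hypothetical judgment-based transformations, one per input variable.
TRANSFORMS = {
    "miles": ([100, 1000, 5000], [1, 3, 6, 9]),
    "incidents_last_3_years": ([1, 3, 10], [0, 3, 6, 9]),
    "years_since_last_inspection": ([1, 2, 3], [0, 2, 5, 8]),
}

def total_score(operator):
    """Add the individual index scores into one overall risk score."""
    return sum(
        points_from_breakpoints(operator[name], bps, pts)
        for name, (bps, pts) in TRANSFORMS.items()
    )

example = {"miles": 2500, "incidents_last_3_years": 4,
           "years_since_last_inspection": 3}
print(total_score(example))  # 6 + 6 + 8 = 20
```

Because the breakpoints and point values are chosen by hand, nothing in this structure guarantees that a higher score tracks a higher incident count, which is the weakness the statistical approach targets.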
PIPP has been shown to be 3 to 4 times better than random selection in
identifying ("predicting") future risk as reflected in the number of pipeline
incidents.1 However, PIPP tends to underestimate risk
(substantially) where the actual number of incidents is high, and overestimate
risk (somewhat) where the number of incidents is low. This difference is illustrated in the two
PIPP score scatterplots in figure 1 for hazardous liquid pipelines and for
natural gas pipelines, respectively.
The New Model
The new model predicts the number of
pipeline incidents and the incident rate per mile of pipeline for each pipeline
operator. To develop predictions,
researchers took several years of historical data to run simulations—using, for
example, data from 2002 to 2004 to "predict" 2005. The data were organized conceptually into
three sets, each using different data; the results are reflected in the six
remaining "risk" scatterplots in figure 1:
- The inherent risk associated
with the pipeline—represented by physical and operating characteristics such as
age, materials and coatings, diameter, location, and throughput—is estimated
using annual reports submitted by each pipeline operator.2 Inherent risk should be independent of how
the pipeline is managed and maintained.
- The performance risk associated with the operator (i.e., the company)—represented by safety
deficiencies—is estimated using the results of past safety
inspections—particularly those with the broadest scope, known as Integrity
Management (or IM) inspections.3 Performance risk should be independent of the physical characteristics of the pipeline.
- The historical risk associated with past incidents is estimated from incident data reported to
PHMSA by operators.4 Historical risk is assumed to reflect the
combination of both the inherent risk of the pipe and the performance risk of the operator.
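The simulation setup described above (for example, using 2002 to 2004 data to "predict" 2005) is a rolling backtest. A minimal sketch of how such windows can be generated follows; the 3-year history matches the example in the text, while extending the range to 2006 is a hypothetical illustration of how additional windows would be formed.

```python
def backtest_windows(first_year, last_year, history=3):
    """Return (training_years, target_year) pairs for a rolling backtest.

    Each window uses `history` consecutive years of data to "predict"
    the following year, e.g. 2002-2004 -> 2005.
    """
    windows = []
    for target in range(first_year + history, last_year + 1):
        training = list(range(target - history, target))
        windows.append((training, target))
    return windows

# 2006 is a hypothetical end year, used only to show a second window.
for training, target in backtest_windows(2002, 2006):
    print(training, "->", target)
```

Keeping the target year strictly after the training years ensures that each "prediction" uses only information that would have been available at the time.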
Each set of data generated separate
predictions of future incidents that were also combined into a single
prediction for each operator. The diagonal line in each graph in figure 1
represents perfect prediction in which the predicted number of incidents equals
the actual number of incidents. The further the data points are from the
diagonal line, the poorer the performance of the predictive model. Gas
transmission operators were separated from hazardous liquid operators, as they
are in PIPP, because they present very different system profiles, different
risks, different data, and different numbers of incidents (see table 2). Other breakouts might also make sense (e.g.,
by product for liquid pipelines, or onshore v. offshore pipelines), but the research
has not explored these.
For presentation purposes, small
operators (with less than 500 miles of pipeline) were separated from large
operators because their operating environment tends to be different and the
relatively lower number of incidents makes the results somewhat less reliable.
The analysis behind all the models was performed in the statistical software
package SAS 9.1.
Three key characteristics of the data influenced the choice
of statistical models:
- Incidents occur infrequently, so the models would have to deal well with small numbers.
- The number of incidents is a count value, with no fractional or negative values.
- The number of incidents per operator is highly skewed, with a large number of
operators having zero incidents in any given year.
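These characteristics can be checked directly before choosing a model. The sketch below, using hypothetical counts rather than PHMSA data, reports the share of zero-incident operators and compares the variance with the mean: a Poisson model implies the two are equal, so a variance well above the mean (overdispersion) is a common signal that the negative binomial is the better fit.

```python
from statistics import mean, pvariance

def describe_counts(counts):
    """Summarize per-operator incident counts to guide model selection."""
    m = mean(counts)
    v = pvariance(counts)
    return {
        "share_zero": sum(1 for c in counts if c == 0) / len(counts),
        "mean": m,
        "variance": v,
        # Poisson implies variance == mean; a ratio well above 1
        # (overdispersion) favors the negative binomial.
        "dispersion_ratio": v / m if m > 0 else float("nan"),
    }

# Hypothetical incident counts for ten operators in one year.
counts = [0, 0, 0, 0, 0, 0, 1, 1, 3, 12]
print(describe_counts(counts))
```

For these illustrative counts, 60 percent of operators have zero incidents and the variance (about 12.6) is more than seven times the mean (1.7).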
Traditional linear regression, which relies on the assumption
of normally distributed data, is inappropriate for count data that are highly
skewed towards zero. Two other
models—Poisson regression and negative binomial regression5—can
handle such data. Another important
quality of these two models is their ability to control for exposure variables,
such as miles of pipeline. The negative
binomial is the more general model (unlike the Poisson, it allows the variance to exceed the mean), and it was used to detect and weight risk
variables for both inherent risk and performance risk.6
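To make the exposure idea concrete, the sketch below writes out the expected count with mileage as an offset and the negative binomial log-likelihood that a fitting routine would maximize. It assumes the common "NB2" parameterization (variance = mu + alpha * mu^2); the report does not state which form or options were used in SAS, and the coefficients in the usage lines are hypothetical. Fitting itself (choosing coefficients that maximize the summed log-likelihood) is left to statistical software.

```python
import math

def expected_count(miles, covariates, coefs, intercept):
    """Mean incident count with mileage as the exposure (offset).

    log(mu) = log(miles) + intercept + sum(b_j * x_j). Because log(miles)
    enters with a fixed coefficient of 1, the other coefficients describe
    incidents per mile.
    """
    eta = math.log(miles) + intercept + sum(b * x for b, x in zip(coefs, covariates))
    return math.exp(eta)

def nb_log_likelihood(y, mu, alpha):
    """Log-probability of count y under NB2 with mean mu and dispersion alpha > 0.

    Var(Y) = mu + alpha * mu**2; as alpha -> 0 this approaches the Poisson.
    """
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            - r * math.log1p(mu / r)            # r * log(r / (r + mu))
            + y * math.log(mu / (r + mu)))

def poisson_log_likelihood(y, mu):
    """Log-probability of count y under a Poisson with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

# Hypothetical operator: 1,000 miles, one covariate (pipe age in decades = 3).
mu = expected_count(1000, [3], [0.2], intercept=-8.0)
print(mu, nb_log_likelihood(2, mu, alpha=0.5), poisson_log_likelihood(2, mu))
```

Doubling an operator's mileage doubles its expected count while leaving the covariate effects unchanged, which is what "controlling for exposure" means here.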
The analysis of the historical risk associated with past
incidents presented a different set of conditions. The past 3 years of incidents and the next
(to-be-predicted) year of incidents most likely are not independent from one
another, so the data were transformed to create an "orthogonal" regression
model that would allow modeling the 3 years of incidents together to estimate
future risk.7
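The report does not spell out its exact transformation. One standard way to make correlated lagged counts orthogonal is Gram-Schmidt orthogonalization, sketched below as an assumption rather than as the authors' procedure. Ordering the columns with the most recent year first keeps that year intact; each older year keeps only the part not already explained by more recent years. The counts are hypothetical.

```python
def center(v):
    """Subtract the mean so that orthogonal also means uncorrelated."""
    m = sum(v) / len(v)
    return [x - m for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthogonalize(columns):
    """Gram-Schmidt: return centered columns made mutually orthogonal.

    The first column is kept (centered); each later column has its
    projections onto the earlier (already orthogonal) columns removed.
    """
    basis = []
    for col in columns:
        v = center(col)
        for u in basis:
            uu = dot(u, u)
            if uu > 0:
                coef = dot(v, u) / uu
                v = [vi - coef * ui for vi, ui in zip(v, u)]
        basis.append(v)
    return basis

# Hypothetical incident counts for six operators, most recent year first.
lag1 = [0, 1, 2, 5, 0, 3]   # year t-1
lag2 = [0, 2, 1, 4, 1, 2]   # year t-2
lag3 = [1, 1, 2, 3, 0, 2]   # year t-3
basis = orthogonalize([lag1, lag2, lag3])
```

The resulting columns can then enter a count regression together without the overlapping information in consecutive years distorting each year's estimated weight.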
Each of these major outputs—inherent risk, performance risk,
and historical risk—provides a separate prediction of risk, but they can also be
combined to present a single estimate. The approach taken here was to take the average of the three results.8 Other possibilities not examined here might use another model to weight these
three as inputs to an overall risk score, again letting the data weight the data, or develop an equation
that might relate any one output to the other two. Figure 1 provides a graphical synopsis of the
predictive accuracy for estimating the number of incidents per operator based
on PIPP scores, inherent risk, performance (operator) risk, and historical risk.
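The equal-weight combination described above is simple enough to state as code. The operator names and predicted values below are hypothetical.

```python
def combine_predictions(predictions):
    """Equal-weight average of (inherent, performance, historical) per operator."""
    return {op: sum(parts) / len(parts) for op, parts in predictions.items()}

# Hypothetical next-year incident predictions from the three components.
predictions = {
    "operator_A": (2.0, 1.0, 3.0),
    "operator_B": (1.5, 0.0, 0.0),
}
combined = combine_predictions(predictions)
print(combined)  # {'operator_A': 2.0, 'operator_B': 0.5}
```

Replacing the fixed 1/3 weights with weights estimated from the data would be one way to pursue the alternative mentioned in the text.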
The predictive quality of each model tested was compared
using a standard statistical measure of error—the mean absolute deviation
(MAD)—which averages the absolute difference between the predicted value and
the actual value for each operator (see table 3). For example, when the model predicts 7.5
incidents and 5 actually occur, the error is 2.5; when the model predicts 4
incidents and 5 actually occur, the error is 1. MAD provides a sense of "how far off" the model predictions are from the actual values.
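The two-operator example above translates directly into a MAD calculation; averaging the errors of 2.5 and 1 gives 1.75.

```python
def mean_absolute_deviation(predicted, actual):
    """Average of |predicted - actual| across operators."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# The two cases from the text: predict 7.5 vs. 5 actual, predict 4 vs. 5 actual.
print(mean_absolute_deviation([7.5, 4], [5, 5]))  # 1.75
```

Because MAD uses absolute rather than squared differences, a few operators with many incidents do not dominate the comparison as strongly as they would under a squared-error measure.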
Testing Inputs to the Model
A key indicator for the effectiveness of any new model was
its ability to predict risk better than the existing judgment-weighted model
(PIPP ranking). In practice, this should
be fairly easy because a statistical model could simply reweight the 10 input
variables in PIPP or the 9 transformed variables for a better prediction using
data-weighting. Other obvious inputs to test included:
- a naïve model (which says that what happened last year is likely to happen again);
- mileage alone (which suggests that the extent of the system might be the most important
indicator of the risk of incidents);
- the input variables into PIPP, reweighted using the new statistical procedures;
- the output variables (L-scores) from PIPP before the PIPP ranking is
calculated, reweighted using the new statistical procedures; and
- each of
the new indicators of risk—estimating inherent risk associated with the
pipeline, performance risk associated with the operator, and historical risk
associated with past incidents.
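The two simplest benchmarks in this list can be sketched as follows. The operator data are hypothetical, and the mileage-only sketch assumes that last year's industry total is the expected total to be distributed; the report does not say how the mileage model was scaled.

```python
def naive_prediction(last_year):
    """Naive model: each operator's next-year count equals last year's count."""
    return dict(last_year)

def mileage_prediction(miles, expected_total):
    """Mileage-only model: split an expected total in proportion to miles."""
    total_miles = sum(miles.values())
    return {op: expected_total * m / total_miles for op, m in miles.items()}

# Hypothetical operators.
last_year = {"A": 12, "B": 2, "C": 0}
miles = {"A": 8000, "B": 1500, "C": 500}
print(naive_prediction(last_year))
# Assumption: use last year's total (14 incidents) as the expected total.
print(mileage_prediction(miles, sum(last_year.values())))
```

Neither benchmark says anything about why an operator is risky, which is why, as noted below, such models offer little guidance on what to inspect even when their error is small.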
The results demonstrate that PIPP performs the worst in
targeting risk, and that reweighting the PIPP variables can improve the
predictive quality (reduce the error). Surprisingly, mileage alone and the naïve model both were better
(smaller error) than PIPP in predicting future risk, but such simple models
offer little guidance in selecting appropriate sites to inspect. The new model performed well (with a MAD of
1.0), although the analysis indicated noticeable differences between gas
transmission operators and hazardous liquid operators. Hazardous liquid pipeline incidents are more
prevalent and more concentrated (fewer operators), so the data provide a better
basis for prediction.
The three main components of the new model—inherent risk, performance risk, and
historical risk—performed about equally well in predicting future incidents.
Findings From the Modeling Research
Modeling inherent risk associated with the pipeline
demonstrated that mileage, throughput (barrel-miles per year), date of installation, and pipeline diameter were
significant risk factors. Six variables
were significant in predicting future incidents for gas transmission systems,
and 14 variables were significant for hazardous liquid systems. About half of these variables were negatively
correlated with risk, meaning that they had a "protective effect." (Table 4 provides the listing of the significant variables for both models.)
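In a negative binomial model with a log link, a coefficient b translates into an incidence rate ratio exp(b): the multiplicative change in expected incidents for a one-unit increase in that variable. A negative coefficient gives a ratio below 1, which is what a "protective effect" means here. The variable names and coefficient values below are hypothetical, not the estimates in table 4.

```python
import math

def incidence_rate_ratios(coefs):
    """exp(b): multiplicative change in expected incidents per unit increase."""
    return {name: math.exp(b) for name, b in coefs.items()}

# Hypothetical fitted coefficients (not from table 4).
coefs = {"hypothetical_risk_factor": 0.25, "hypothetical_protective_factor": -0.4}
irr = incidence_rate_ratios(coefs)
for name, ratio in irr.items():
    change = (ratio - 1) * 100
    print(f"{name}: rate ratio {ratio:.3f} ({change:+.0f}% per unit)")
```

With these illustrative values, each unit of the protective variable would reduce expected incidents by about a third, while each unit of the risk variable would raise them by about 28 percent.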
Modeling performance risk associated
with the operator demonstrated that a few key inspection areas from Integrity
Management9 inspections were most highly correlated with future risk. One area (integrity
assessment review) was negatively correlated, suggesting that
finding deficiencies in this area helped an operator rapidly improve its safety
program. The most significant risk
factor was in the area of continual
evaluation and assessment—which inspection staff have suggested
might be a critical indicator of an operator's safety program.
Modeling historical risk associated with
past incidents demonstrated that the passage of time rapidly degrades the
utility of the data. After 2 years, past
incidents do not appear to be useful in predicting future risk. The most recent year is most important, and
the model weights this year most heavily.
Significant Data and Modeling Issues
While the model demonstrates the general
effectiveness of statistical tools as an alternative to judgment-weighting,
several important data limitations and modeling issues remain to be
addressed. Some of the more important
issues are listed here:
- Data on operators' systems and operator relationships
reflect a snapshot in time; changes might not be captured for up to a year, so
some data are outdated.
- Deficiency data from inspections are largely limited to
one major type of inspection—Integrity Management inspections—representing only
a small portion of the inspections conducted.
- The model does not differentiate more serious incidents
(the focus of the agency's performance goals) from those with less severe
consequences (actual or potential).
- The model introduces an exponential function that can
dramatically over-predict incidents when new data fall outside the historical range.
- Small numbers of incidents each year limit the ability to
isolate combinations of factors that might be statistically significant.
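The over-prediction issue in the list above follows from the log link: the predicted count is exp(intercept + slope * x), so each additional unit of x multiplies the prediction by a constant factor. The sketch below uses hypothetical coefficients and assumes the historical data spanned x from 0 to 5; a value of 10 lies outside that range.

```python
import math

def predicted_incidents(x, intercept=-2.0, slope=0.8):
    """Log-link prediction mu = exp(intercept + slope * x); coefficients are hypothetical."""
    return math.exp(intercept + slope * x)

# Assume the historical data covered x between 0 and 5.
inside = predicted_incidents(5)     # at the edge of the observed range
outside = predicted_incidents(10)   # well outside it
print(outside / inside)  # exp(0.8 * 5), about 54.6
```

Doubling x from 5 to 10 multiplies the prediction by roughly 55, even though nothing in the historical data supports that growth; capping inputs at the observed range is one possible safeguard.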
The first line of research, currently
underway, is to refine the incident measures to reflect the consequences of incidents—to weight
incidents by potential severity in terms of harm to people and/or the
environment. Using conditional
probabilities, we have found so far that three variables help explain whether
an incident is likely to be serious: fire/explosion (indicating a violent incident), whether the incident
occurred in a high consequence area (indicating proximity to people), and
incident cause (e.g., corrosion or excavation damage).
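A conditional probability such as P(serious | fire or explosion) can be estimated by grouping incident records on one factor and computing the share of serious incidents in each group. The sketch below assumes hypothetical record fields ("fire_explosion", "serious") and made-up records; how PHMSA defines a serious incident is not specified here.

```python
from collections import Counter

def conditional_serious_rate(incidents, factor):
    """Estimate P(serious | factor value) from a list of incident records."""
    totals = Counter()
    serious = Counter()
    for rec in incidents:
        key = rec[factor]
        totals[key] += 1
        if rec["serious"]:
            serious[key] += 1
    return {key: serious[key] / totals[key] for key in totals}

# Hypothetical records: 4 with fire/explosion (3 serious), 6 without (1 serious).
records = (
    [{"fire_explosion": True, "serious": s} for s in (True, True, True, False)]
    + [{"fire_explosion": False, "serious": s}
       for s in (True, False, False, False, False, False)]
)
print(conditional_serious_rate(records, "fire_explosion"))
```

The same function can be applied to the other factors named in the text (high consequence area, incident cause) by changing the `factor` argument, although small group sizes limit how many combinations can be estimated reliably.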
Some general model improvements are
planned as well. These would separate
out onshore v. offshore systems, interstate v. intrastate operators, and
certain commodities that have special risk characteristics. The relationship between inherent risk,
performance risk, and historical risk needs to be further explored and
modeled. The issue of total number of
incidents v. the rate of incidents per mile needs to be addressed; it is not
clear which is more important in targeting inspections. And operator relationships—where some
operators are part of a larger group of operators that share certain plans and
management—need to be addressed because some inspections are targeted at this
higher corporate level.
There are several areas where the
measures for inherent risk, performance risk, and historical risk could be
enhanced. Improvements would include
targeted analyses of certain key variables to better understand why they are or
aren't significant risk factors, adding more inspection data, and testing the
time-sensitivity of inspection data.
After refinements are made, the model
needs to be validated with data from other years, uncertainty should be incorporated
into the results, and PHMSA program staff need to be involved in formulating
the best presentation of results for the intended use—targeting and focusing inspections.
A parallel effort will extend the
concepts from this modeling effort to another safety program—hazardous
materials transportation safety—which cuts across four other modes of
transportation. The model might be more
generally applicable in other federal safety programs as well.
1 By scaling PIPP scores to the number of actual incidents, predictive quality
was measured by the correct "hits" to determine the percent correct. This was compared to a random selection model
where each operator was simply assigned an equal share of points.
3 Deficiency data are captured at the point of inspection for Integrity
Management (IM) inspections of pipeline operators. Where deficiencies are serious, PHMSA pursues
enforcement action. Data on these
actions are available at www.phmsa.dot.gov.
5 In a recent review of the Motor Carrier Safety Status Measurement System, or
SAFESTAT, model used by the Federal Motor Carrier Safety Administration, the
Government Accountability Office (GAO) recommended a negative binomial
regression in place of expert opinion to weight the risk factors used in
targeting motor carrier safety inspections. This work by GAO was a strong factor in the risk modeling effort by BTS
and PHMSA. See Motor Carrier Safety: A Statistical Approach Will
Better Identify Commercial Carriers That Pose High Crash Risks Than Does the
Current Federal Approach, June 2007 (GAO-07-585).
6 For a good explanation of the Poisson and negative binomial models and how they
are estimated in SAS, see Logistic
Regression Using SAS: Theory and Application, by Paul D. Allison,
1999 (SAS Institute Inc.).
7 "Orthogonal variables" have a zero inner product, which makes them linearly independent (and, once centered, uncorrelated). For details on orthogonal
regression, see A. Stuart, J.K. Ord, and S.F. Arnold. 1999. Kendall's Advanced
Theory of Statistics, 6th ed. London: Edward Arnold.
8 Although historical risk—using incident data—might reflect the nexus of the
inherent risk associated with the pipeline and the performance risk associated
with the operator, using equal weights to average provides a simple
approximation of overall risk. Other
statistical methods might provide a better way to combine these factors.
9 The Integrity Management program was introduced over the last several years,
first for hazardous liquid pipelines then later for gas transmission
pipelines. This program requires
pipeline operators to identify and understand the risks in their systems,
identify high consequence geographic areas, establish programs for inspecting
and repairing pipelines, and continuously monitor their systems.
About this Report
This report is the result of joint research by Rick Kowalewski, Senior Advisor of the Pipeline and Hazardous Materials Safety Administration (PHMSA), and Peg Young, Statistician for the Bureau of Transportation Statistics (BTS).
For related BTS data and publications: www.bts.gov