Data-Driven Risk Models Could Help Target Pipeline Safety Inspections
by Rick Kowalewski, Pipeline and Hazardous Materials Safety Administration, and Peg Young, Ph.D., Bureau of Transportation Statistics
Federal safety agencies share a common problem—the need to target resources effectively to reduce risk. One way this targeting is commonly done is with a risk model that uses safety data along with expert judgment to identify and weight risk factors. In a joint effort, the U.S. Department of Transportation's Bureau of Transportation Statistics (BTS) and Pipeline and Hazardous Materials Safety Administration (PHMSA) sought to develop a new statistical approach for modeling risk by letting the data weight the data—by using the statistical relationships among the data, not expert opinion, to develop the weights.
Some key findings:
- Weighting data through statistical procedures was superior to judgment-weighting in predicting (targeting) relative risk.
- Statistical modeling can help not only target which operators to inspect but also focus what to inspect based on a set of risk factors.
- Pipeline infrastructure, operator performance, and incident history appear to be about equally useful in predicting future risk.
PHMSA's mission is to protect people and the environment from the risks inherent in the transportation of hazardous materials by pipeline and other modes of transportation. Each year the pipeline safety program inspects several hundred thousand miles of interstate pipelines carrying natural gas and hazardous liquids across the United States. These pipelines are operated by over 1,000 operators who manage systems ranging from a few miles to tens of thousands of miles. While a pipeline might seem to be a very simple system, in fact these systems are very complex, and each system has some unique characteristics.
The general approach for conducting standard inspections until now has been to inspect each major part of each system every 3 years. In 2006, PHMSA initiated a research/pilot project to integrate the various kinds of inspections it conducted, to re-examine the 3-year inspection interval for standard inspections, and to focus the scope of its inspections based on operator risk. Changing inspection intervals from a periodic basis to a risk basis and changing from comprehensive to focused inspections reflect a significant change in approach. Program managers understood from the outset that the new approach would require a better risk model.
The Current Risk Model
For more than a decade, PHMSA has used the Pipeline Inspection Prioritization Program (PIPP) to schedule inspections and allocate resources. PIPP is a data-based model using 10 to 12 data variables (depending on type of pipeline) that are transformed into 9 indexes, which are added together for an overall risk score. The data variables for both hazardous liquid and gas transmission pipelines are listed in table 1.
Each input variable is transformed into another variable (an individual PIPP score) ranging from 0 to 9 points, depending on the input variable, and the individual scores are then combined into the final total PIPP score. The variables were selected using expert judgment, and the transformations that determine the weight for each variable also used expert judgment. PIPP results are used with other information to help set scheduling priorities for inspections.
PIPP has been shown to be 3 to 4 times better than random selection in identifying ("predicting") future risk as reflected in the number of pipeline incidents.1 However, PIPP tends to substantially underestimate risk where the actual number of incidents is high, and to somewhat overestimate risk where the number of incidents is low. This difference is illustrated in the two PIPP score scatterplots in figure 1, for hazardous liquid pipelines and for natural gas pipelines, respectively.
The New Model
The new model predicts the number of pipeline incidents and the incident rate per mile of pipeline for each pipeline operator. To develop predictions, researchers took several years of historical data to run simulations—using, for example, data from 2002 to 2004 to "predict" 2005. The data were organized conceptually into three sets, each using different data; the results are reflected in the six remaining "risk" scatterplots in figure 1:
- The inherent risk associated with the pipeline—represented by physical and operating characteristics such as age, materials and coatings, diameter, location, and throughput—is estimated using annual reports submitted by each pipeline operator.2 Inherent risk should be independent of how the pipeline is managed and maintained.
- The performance risk associated with the operator (i.e., the company), represented by safety deficiencies, is estimated using the results of past safety inspections, particularly those with the broadest scope, known as Integrity Management (or IM) inspections.3 Performance risk should be independent of the pipeline characteristics.
- The historical risk associated with past incidents is estimated from incident data reported to PHMSA by operators.4 Historical risk is assumed to reflect the combination of both inherent risk of the pipe and performance risk of the operator.
Each set of data generated separate predictions of future incidents that were also combined into a single prediction for each operator. The diagonal line in each graph in figure 1 represents perfect prediction in which the predicted number of incidents equals the actual number of incidents. The further the data points are from the diagonal line, the poorer the performance of the predictive model. Gas transmission operators were separated from hazardous liquid operators, as they are in PIPP, because they present very different system profiles, different risks, different data, and different numbers of incidents (see table 2). Other breakouts might also make sense (e.g., by product for liquid pipelines, or onshore v. offshore pipeline) but the research has not explored these.
For presentation purposes, small operators (with fewer than 500 miles of pipeline) were separated from large operators because their operating environment tends to be different and their relatively low number of incidents makes the results somewhat less reliable. The analysis behind all the models was performed in the statistical software package SAS 9.1.
Three key characteristics of the data influenced the choice of statistical models:
- Incidents occur infrequently, so the models would have to deal well with small numbers.
- The number of incidents is a count value, with no fractional or negative values.
- The number of incidents per operator is highly skewed, with a large number of operators having zero incidents in any given year.
Traditional linear regression, which relies on the assumption of normally distributed errors, is inappropriate for count data that are highly skewed towards zero. Two other models, Poisson regression and negative binomial regression,5 can handle such data. Another important quality of these two models is their ability to control for exposure variables, such as miles of pipeline. The negative binomial is the more general model, and this was used to detect and weight risk variables for both inherent risk and performance risk.6
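For readers unfamiliar with count regression, the general form of such a model can be written out. The block below is a sketch of a common textbook parameterization of negative binomial regression with an exposure term; the article does not state the exact parameterization used, so the symbols (Y_i for an operator's incident count, m_i for miles of pipeline, x_ik for risk variables, beta_k for the data-derived weights, and alpha for the dispersion parameter) are assumptions made for illustration.

```latex
\begin{aligned}
Y_i &\sim \mathrm{NegBin}(\mu_i, \alpha), \\
\log \mu_i &= \log m_i + \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik}, \\
\operatorname{Var}(Y_i) &= \mu_i + \alpha \mu_i^{2}.
\end{aligned}
```

In this form the expected incident rate per mile is \mu_i / m_i = \exp(\beta_0 + \sum_k \beta_k x_{ik}), and as \alpha approaches zero the variance equals the mean and the model reduces to Poisson regression, which is the sense in which the negative binomial is the more general of the two.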
The analysis of the historical risk associated with past incidents presented a different set of conditions. The past 3 years of incidents and the next (to-be-predicted) year of incidents most likely are not independent of one another, so the data were transformed to create an "orthogonal" regression model that would allow modeling the 3 years of incidents together to estimate future risk.7
Each of these major outputs (inherent risk, performance risk, and historical risk) provides a separate prediction of risk, but they can also be combined to present a single estimate. The approach taken here was to take the average of the three results.8 Other possibilities not examined here might use another model to weight these three as inputs to an overall risk score, again letting the data weight the data, or develop an equation that relates any one output to the other two. Figure 1 provides a graphical synopsis of the predictive accuracy for estimating the number of incidents per operator based on PIPP scores, inherent risk, performance risk, and historical risk.
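The equal-weight combination described above amounts to a simple average of three predictions per operator. The following is a minimal sketch in Python; the function name, operator names, and predicted values are hypothetical and are not taken from the study.

```python
def combined_risk(inherent, performance, historical):
    """Equal-weight average of the three predicted incident counts."""
    return (inherent + performance + historical) / 3.0


# Hypothetical predictions (inherent, performance, historical) per operator.
predictions = {
    "Operator A": (2.0, 3.5, 0.5),
    "Operator B": (0.0, 1.0, 2.0),
}

combined = {name: combined_risk(*parts) for name, parts in predictions.items()}

for name, value in combined.items():
    print(f"{name}: {value:.2f} predicted incidents")
```

A data-weighted alternative would replace the fixed 1/3 weights with coefficients estimated from past results, which is one of the possibilities the authors mention but did not examine.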
The predictive quality of each model tested was compared using a standard statistical measure of error—the mean absolute deviation (MAD)—which averages the absolute difference between the predicted value and the actual value for each operator (see table 3). For example, when the model predicts 7.5 incidents and 5 actually occur, the error is 2.5; when the model predicts 4 incidents and 5 actually occur, the error is 1. MAD provides a sense of "how far off" the model predictions are from the actual values.
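The MAD calculation can be sketched in a few lines of Python. The two data points below are the ones from the example in the text; the function name is chosen here for illustration.

```python
def mean_absolute_deviation(predicted, actual):
    """Average of |predicted - actual| across operators."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one operator is required")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# The two cases from the text: errors of 2.5 and 1.
predicted = [7.5, 4.0]
actual = [5, 5]

print(mean_absolute_deviation(predicted, actual))  # 1.75
```

Because the errors are taken as absolute values, over-prediction and under-prediction count equally and do not cancel each other out.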
Testing Inputs to the Model
A key indicator for the effectiveness of any new model was its ability to predict risk better than the existing judgment-weighted model (PIPP ranking). In practice, this should be fairly easy because a statistical model could simply reweight the 10 input variables in PIPP or the 9 transformed variables for a better prediction using data-weighting. Other obvious inputs to test included:
- the naive model (which says that what happened last year is likely to happen again next year);
- mileage alone (which suggests that the extent of the system might be the most important indicator of the risk of incidents);
- the input variables into PIPP—reweighted using the new statistical procedures;
- the output variables (L-scores) from PIPP before the PIPP ranking is calculated—reweighted using the new statistical procedures; and
- each of the new indicators of risk—estimating inherent risk associated with the pipeline, performance risk associated with the operator, and historical risk associated with past incidents.
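The two simplest baselines in this list can be illustrated with a short Python sketch. The operator data are invented for illustration, and the mileage-only predictor shown here (applying last year's overall incident rate per mile to each operator's mileage) is an assumption; the article does not say how the mileage-only model was specified.

```python
def mad(predicted, actual):
    """Mean absolute deviation between predicted and actual counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical operators: (miles, incidents last year, incidents this year).
operators = [
    (1200, 3, 2),
    (300, 0, 1),
    (4500, 6, 7),
    (800, 1, 0),
]

actual = [this_year for _, _, this_year in operators]

# Naive model: next year's count equals last year's count.
naive_pred = [float(last_year) for _, last_year, _ in operators]

# Mileage-only model (assumed form): last year's overall rate per mile
# multiplied by each operator's mileage.
total_miles = sum(miles for miles, _, _ in operators)
total_last_year = sum(last_year for _, last_year, _ in operators)
rate_per_mile = total_last_year / total_miles
mileage_pred = [miles * rate_per_mile for miles, _, _ in operators]

print("naive MAD:  ", mad(naive_pred, actual))
print("mileage MAD:", round(mad(mileage_pred, actual), 3))
```

Both baselines produce a predicted count per operator that can be scored with MAD, but neither says anything about which risk factors to examine during an inspection.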
The results demonstrate that PIPP performs the worst in targeting risk, and that reweighting the PIPP variables can improve the predictive quality (reduce the error). Surprisingly, mileage alone and the naive model both were better (smaller error) than PIPP in predicting future risk, but such simple models offer little guidance in selecting appropriate sites to inspect. The new model performed well (with a MAD of 1.0), although the analysis indicated noticeable differences between gas transmission operators and hazardous liquid operators. Hazardous liquid pipeline incidents are more prevalent and more concentrated (fewer operators), so the data provide a better basis for prediction.
The three main components of the new model—inherent risk, performance risk, and historical risk—performed about equally well in predicting future incidents.
Findings From the Modeling Research
Modeling inherent risk associated with the pipeline demonstrated that mileage, throughput (barrel-miles per year), date of installation, and pipeline diameter were significant risk factors. Six variables were significant in predicting future incidents for gas transmission systems, and 14 variables were significant for hazardous liquid systems. About half of these variables were negatively correlated with risk, meaning that they had a "protective effect." (Table 4 provides the listing of the significant variables for both models.)
Modeling performance risk associated with the operator demonstrated that a few key inspection areas from Integrity Management9 inspections were most highly correlated with future risk. One area (integrity assessment review) was negatively correlated, suggesting that finding deficiencies in this area helped an operator rapidly improve its safety program. The most significant risk factor was in the area of continual evaluation and assessment—which inspection staff have suggested might be a critical indicator of an operator's safety program.
Modeling historical risk associated with past incidents demonstrated that the passage of time rapidly degrades the utility of the data. After 2 years, past incidents do not appear to be useful in predicting future risk. The most recent year is most important, and the model weights this year most heavily.
Significant Data and Modeling Issues
While the model demonstrates the general effectiveness of statistical tools as an alternative to judgment-weighting, several important data limitations and modeling issues remain to be addressed. Some of the more important issues are listed here:
- Data on operators' systems and operator relationships reflect a snapshot in time; changes might not be captured for up to a year, so some data are outdated.
- Deficiency data from inspections are largely limited to one major type of inspection—Integrity Management inspections—representing only a small portion of the inspections conducted.
- The model does not differentiate more serious incidents (the focus of the agency's performance goals) from those with less severe consequences (actual or potential).
- The model introduces an exponential function that can dramatically over-predict incidents when new data are outside the historical range.
- Small numbers of incidents each year limit the ability to isolate combinations of factors that might be statistically significant.
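The over-prediction issue in the list above can be illustrated numerically. Count regression models typically predict through an exponential of a weighted sum of the inputs, so a value well outside the range used to fit the model can inflate the prediction sharply. The sketch below assumes that form; the coefficients, the single risk variable x, and its historical range are all hypothetical.

```python
import math

# Hypothetical coefficients for illustration only; not estimates from the study.
BETA_0 = -7.0  # intercept on the log scale (per mile of pipeline)
BETA_1 = 0.5   # weight on a single risk variable x


def predicted_incidents(miles, x, beta_0=BETA_0, beta_1=BETA_1):
    """Expected incidents under a log-link model with mileage as exposure."""
    return miles * math.exp(beta_0 + beta_1 * x)


# Suppose x ranged from 0 to 4 in the data used to fit the model.
in_range = predicted_incidents(miles=1000, x=4)

# A new operator reports x = 10, well outside that range.
out_of_range = predicted_incidents(miles=1000, x=10)

print(f"x = 4:  {in_range:.2f} predicted incidents")
print(f"x = 10: {out_of_range:.2f} predicted incidents")
print(f"ratio:  {out_of_range / in_range:.1f}x")
```

Each one-unit increase in x multiplies the prediction by the same factor, exp(BETA_1), so the growth compounds rather than adds. Capping inputs at the historical range, or flagging out-of-range inputs for review, are possible mitigations that the article does not discuss.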
The first line of research, currently underway, is to refine the incident measures to reflect the consequences of incidents—to weight incidents by potential severity in terms of harm to people and/or the environment. Using conditional probabilities, we have found so far that three variables help explain whether an incident is likely to be serious: fire/explosion (indicating a violent incident), whether the incident occurred in a high consequence area (indicating proximity to people), and incident cause (e.g., corrosion or excavation damage).
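A conditional probability of this kind is the share of incidents that were serious among those meeting a given condition. The Python sketch below uses invented incident records and field names (fire, hca, serious); it shows only the arithmetic and not the authors' actual analysis or data.

```python
def conditional_probability(records, outcome, condition):
    """Estimate P(outcome | condition) as a simple proportion.

    Returns None when no record satisfies the condition.
    """
    matching = [r for r in records if condition(r)]
    if not matching:
        return None
    return sum(1 for r in matching if outcome(r)) / len(matching)


# Hypothetical incident records for illustration only.
incidents = [
    {"fire": True,  "hca": True,  "serious": True},
    {"fire": True,  "hca": False, "serious": True},
    {"fire": True,  "hca": False, "serious": False},
    {"fire": False, "hca": True,  "serious": True},
    {"fire": False, "hca": False, "serious": False},
    {"fire": False, "hca": False, "serious": False},
]


def is_serious(record):
    return record["serious"]


p_given_fire = conditional_probability(incidents, is_serious, lambda r: r["fire"])
p_given_no_fire = conditional_probability(incidents, is_serious, lambda r: not r["fire"])
p_given_hca = conditional_probability(incidents, is_serious, lambda r: r["hca"])

print(f"P(serious | fire/explosion) = {p_given_fire:.2f}")
print(f"P(serious | no fire)        = {p_given_no_fire:.2f}")
print(f"P(serious | in HCA)         = {p_given_hca:.2f}")
```

A large gap between the probability with and without a condition suggests that the condition helps explain severity, which is the kind of evidence that could support weighting incidents by potential consequence.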
Some general model improvements are planned as well. These would separate out onshore v. offshore systems, interstate v. intrastate operators, and certain commodities that have special risk characteristics. The relationship between inherent risk, performance risk, and historical risk needs to be further explored and modeled. The issue of total number of incidents v. the rate of incidents per mile needs to be addressed; it is not clear which is more important in targeting inspections. And operator relationships—where some operators are part of a larger group of operators that share certain plans and management—need to be addressed because some inspections are targeted at this higher corporate level.
There are several areas where the measures for inherent risk, performance risk, and historical risk could be enhanced. Improvement would include targeted analyses of certain key variables to better understand why they are or aren't significant risk factors, adding more inspection data, and testing the time-sensitivity of inspection data.
After refinements are made, the model needs to be validated with data from other years, uncertainty should be incorporated into the results, and PHMSA program staff need to be involved in formulating the best presentation of results for the intended use—targeting and focusing inspections.
A parallel effort will extend the concepts from this modeling effort to another safety program—hazardous materials transportation safety—which cuts across four other modes of transportation. The model might be more generally applicable in other federal safety programs as well.
1 By scaling PIPP scores to the number of actual incidents, predictive quality was measured by the correct "hits" to determine the percent correct. This was compared to a random selection model where each operator was simply assigned an equal share of points.
3 Deficiency data are captured at the point of inspection for Integrity Management (IM) inspections of pipeline operators. Where deficiencies are serious, PHMSA pursues enforcement action. Data on these actions are available at www.phmsa.dot.gov.
5 In a recent review of the Motor Carrier Safety Status Measurement System, or SAFESTAT, model used by the Federal Motor Carrier Safety Administration, the Government Accountability Office (GAO) recommended a negative binomial regression in place of expert opinion to weight the risk factors used in targeting motor carrier safety inspections. This work by GAO was a strong factor in the risk modeling effort by BTS and PHMSA. See Motor Carrier Safety: A Statistical Approach Will Better Identify Commercial Carriers That Pose High Crash Risks Than Does the Current Federal Approach, June 2007 (GAO-07-585).
6 For a good explanation of the Poisson and negative binomial models and how they are estimated in SAS, see Logistic Regression Using SAS: Theory and Application, by Paul D. Allison, 1999 (SAS Institute Inc.).
7 "Orthogonal variables" are linearly independent. For details on orthogonal regression, see A. Stuart, J.K. Ord, and S.F. Arnold. 1999. Kendall's Advanced Theory of Statistics, 6th ed. London: Edward Arnold, pp. 764-766.
8 Although historical risk—using incident data—might reflect the nexus of the inherent risk associated with the pipeline and the performance risk associated with the operator, using equal weights to average provides a simple approximation of overall risk. Other statistical methods might provide a better way to combine these factors.
9 The Integrity Management program was introduced over the last several years, first for hazardous liquid pipelines then later for gas transmission pipelines. This program requires pipeline operators to identify and understand the risks in their systems, identify high consequence geographic areas, establish programs for inspecting and repairing pipelines, and continuously monitor their systems.
About this Report
This report is the result of joint research by Rick Kowalewski, Senior Advisor of the Pipeline and Hazardous Materials Safety Administration (PHMSA), and Peg Young, Statistician for the Bureau of Transportation Statistics (BTS).
For related BTS data and publications: www.bts.gov