Methodology: Seasonal Adjustment of the Transportation Services Index Time Series Data
The data underlying the Transportation Services Index (TSI) are seasonally adjusted by the Bureau of Transportation Statistics (BTS) using the methodology discussed here. This seasonal adjustment enables consistent comparisons of data between time periods.
Origin
To seasonally adjust transportation services time-series data, BTS adopted X12-ARIMA, Release 0.2, which was created by the U.S. Department of Commerce, U.S. Census Bureau. X12-ARIMA grew out of X11-ARIMA, which in turn originated from Census X-11 software developed at the Census Bureau in the 1950s and 1960s. The basic approach of X-11 is undoubtedly the most widely used statistical method for seasonal adjustment.[1]
Isolating Seasonality
The basic model of this approach is to decompose the time series into three components: trend, seasonality, and irregular (including cyclic phenomena). By a series of iterative steps, the seasonality component is eventually isolated and removed from the original data series. In applying this methodology to the transportation services’ time-series data, we found that each element of the Transportation Services Index—rail passenger, rail freight, pipeline (petroleum and natural gas), transit, waterborne, trucking, and aviation (passenger and freight)—displays strong seasonality patterns. However, in some series the seasonality was less pronounced than in others. For example, transit seasonality must be isolated from a background of considerable fluctuations or noise. When there is a great deal of fluctuation in the data, as in this instance, the seasonality component is much smaller relative to the other components (trend and irregular).
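To make the three-component idea concrete, the following is a minimal sketch of a classical additive decomposition in Python. It is not the iterative X12-ARIMA procedure that BTS used. The function names (`centered_ma_12`, `additive_decompose`) and the choice of a 2x12 centered moving average for the trend are assumptions made for illustration only.

```python
def centered_ma_12(y):
    """Estimate the trend with a 2x12 centered moving average.

    Returns a list the same length as y, with None where the
    13-month window does not fit (first and last six months).
    """
    n = len(y)
    trend = [None] * n
    for t in range(6, n - 6):
        window = y[t - 6 : t + 7]  # 13 consecutive months centered on t
        # The two end months get half weight, so each calendar month
        # contributes equally and a stable seasonal pattern cancels out.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / 12.0
    return trend


def additive_decompose(y):
    """Split a monthly series into trend, seasonal, and irregular parts."""
    if len(y) < 24:
        raise ValueError("need at least 24 monthly observations")
    trend = centered_ma_12(y)

    # Average the detrended values separately for each month position.
    sums, counts = [0.0] * 12, [0] * 12
    for t, tr in enumerate(trend):
        if tr is not None:
            sums[t % 12] += y[t] - tr
            counts[t % 12] += 1
    raw = [s / c for s, c in zip(sums, counts)]

    # Center the seasonal factors so they sum to zero over a year.
    mean_raw = sum(raw) / 12.0
    seasonal = [r - mean_raw for r in raw]

    # Seasonally adjusted series: original minus the seasonal factor.
    adjusted = [obs - seasonal[t % 12] for t, obs in enumerate(y)]
    irregular = [
        None if tr is None else y[t] - tr - seasonal[t % 12]
        for t, tr in enumerate(trend)
    ]
    return trend, seasonal, adjusted, irregular
```

Here `seasonal[0]` belongs to the calendar month of the first observation, so a series starting in January yields factors in January-to-December order. When the irregular part is large relative to these factors, as described above for transit, the monthly averages become noisy and the seasonal estimate is correspondingly less certain.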
Holiday and Trading-Day Effects
The seasonality component in a typical time series is a familiar pattern of peaks and troughs that consistently repeat in certain months. Another source of regularity in the data is the effect of holidays and peculiarities of the calendar. Some months are longer than others, and some contain more weekend days than others, which can affect the monthly totals of output services. Because these effects can be identified and measured, they are typically grouped with the seasonality component. We call the calendar effects, such as month length and additional weekends in a month, trading-day effects. Not surprisingly, trading-day effects and holidays influence the data in many of the time series used as components in the Transportation Services Index.
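The calendar composition behind trading-day effects can be counted directly. The sketch below uses Python's standard `calendar` module to count how many times each weekday falls in a month. The helper names and the contrast form (each weekday count minus the Sunday count) are illustrative assumptions, one common way of building trading-day regressors, and are not taken from the BTS setup.

```python
import calendar


def weekday_counts(year, month):
    """Return the number of Mondays, Tuesdays, ..., Sundays in a month.

    Index 0 is Monday and index 6 is Sunday, following the
    convention of the standard calendar module.
    """
    counts = [0] * 7
    first_weekday, n_days = calendar.monthrange(year, month)
    for day in range(n_days):
        counts[(first_weekday + day) % 7] += 1
    return counts


def trading_day_contrasts(year, month):
    """Six contrasts: (count of each Monday..Saturday) minus (count of Sundays)."""
    counts = weekday_counts(year, month)
    return [counts[i] - counts[6] for i in range(6)]
```

For example, `weekday_counts(2005, 3)` shows that March 2005 contained five Tuesdays, Wednesdays, and Thursdays and four of every other weekday, while February 2005 contained exactly four of each. Differences of this kind are what a trading-day adjustment tries to account for.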
Outlier Adjustments
In all of the transportation services’ time series it was necessary to adjust the defaults offered in the software. One of the series used in the Transportation Services Index (waterborne) required an adjustment to the software’s default trend-line estimation. Additionally, each of the series had too many extreme observations, or outliers, when the default for estimating the trend line was used. Too many outliers undermine the reliability of the whole model. Our objective was to ensure that no more than one out of eight data observations was extreme. The criterion of one out of eight comes from best practices in the field. For the most part we met that objective.
Small Outliers
We limited the number of outlier observations by adjusting the program’s weights for the relatively less extreme observations (i.e., less extreme outliers). These outliers as a rule receive weights less than 1.0 (fully weighted) and greater than 0.0 (no weight). However, in computing the estimates, preliminary and final, of the trend and the seasonality components, we did not treat the relatively less extreme observations as true outliers; we gave them a weight of 1.0 rather than the partial weight the program would have assigned. The weighting method we used is justified with transportation services’ time series data because the 10 transportation time-series data sets are strongly impacted by the vicissitudes of weather patterns as well as a host of unknown variables, making transportation time-series data inherently variable. Our experience has shown that nearly all time series need a similar weighting adjustment; the weighting defaults in X-12 can rarely be left alone without creating too many outliers.
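The weighting idea can be sketched as follows. This assumes a graduated scheme in which an irregular value within a lower limit of standard deviations gets full weight, a value beyond an upper limit gets zero weight, and values in between are weighted linearly. The limits of 1.5 and 2.5 used as defaults here, and all function names, are assumptions for illustration; the source does not state the limits BTS actually used.

```python
def outlier_weight(z, lower=1.5, upper=2.5):
    """Weight for an irregular value lying |z| standard deviations out.

    lower and upper are assumed, illustrative limits: full weight
    at or below lower, zero weight at or above upper, linear between.
    """
    z = abs(z)
    if z <= lower:
        return 1.0
    if z >= upper:
        return 0.0
    return (upper - z) / (upper - lower)


def share_downweighted(z_scores, lower=1.5, upper=2.5):
    """Fraction of observations that receive less than full weight."""
    flagged = sum(1 for z in z_scores if outlier_weight(z, lower, upper) < 1.0)
    return flagged / len(z_scores)


def meets_one_in_eight(z_scores, lower=1.5, upper=2.5):
    """Check the objective of at most one outlier per eight observations."""
    return share_downweighted(z_scores, lower, upper) <= 1.0 / 8.0
```

Raising `lower` toward `upper` reproduces the effect described above: the relatively less extreme observations keep a weight of 1.0, and only the most extreme ones count as outliers. With the hypothetical scores `[0.2, -0.5, 1.0, 1.8, -2.0, 0.3, 3.1, 0.1]`, three of eight observations are down-weighted under the assumed default limits, but only one of eight when `lower=2.5`.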
Extreme Outliers
Using a variety of procedures, including a close look at changes in the monthly data, we detected several extreme outliers. The terrorist attacks of September 11, 2001, for example, produced extreme outliers in many of the time series. The impact of 9/11 was particularly powerful on aviation passenger and freight outputs.
We intend in the future to hard-code the most extreme observations as outliers when we have external information (e.g., catastrophe, strike, weather) that points to their origin. For purposes of identifying the seasonality component and measuring its effect, the extreme outliers, whether found by the X-12 program or by other sources, are temporarily removed from the data series while computations for measuring seasonality are in progress. Outlier hard-coding removes any uncertainty as to whether a particular observation is an outlier, thereby making the seasonality pattern easier to discern and measure. A few extreme outliers found by the program are noted in the data summaries that follow. Future data summaries will identify all hard-coded outliers.
Modeling the Time Series: Multiplicative or Additive?
A series is modeled in one of two forms: multiplicative or additive. In practice, when the size of the peaks and troughs remains constant as the trend increases or decreases, the additive model is used. When the peaks and troughs grow or shrink in proportion to the trend, the multiplicative model will fit the time series better. In the seasonality adjustment for the May 2004 index, we determined the best model using the Census Bureau’s X12-ARIMA monthly seasonal adjustment method. Previous adjustments had used the multiplicative model. However, we found that five of the series were best rendered by an additive model: trucking, transit, rail freight (two series, carloads and intermodal), and waterborne. In these instances, the model adopted was changed to “additive” for the duration of the Transportation Services Index’s experimental phase.
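In symbols, the two forms can be written as below. The notation is assumed here for illustration and does not come from the source: $Y_t$ is the observed value in month $t$, $T_t$ the trend, $S_t$ the seasonal component, $I_t$ the irregular component, and $A_t$ the seasonally adjusted value.

```latex
\begin{align*}
\text{Additive:} \quad & Y_t = T_t + S_t + I_t, & A_t &= Y_t - S_t = T_t + I_t \\
\text{Multiplicative:} \quad & Y_t = T_t \times S_t \times I_t, & A_t &= Y_t / S_t = T_t \times I_t
\end{align*}
```

In the additive form the seasonal component is an amount in the units of the series, so its size does not depend on the level of the trend. In the multiplicative form it is a factor around 1.0, so the seasonal swings scale with the trend.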
Moving Seasonality
Adjustments were made to the seasonally adjusted data based on several criteria developed to assess the basic model, which are given as measures M1 through M11. These criteria involve the way the trend line is estimated, the amount and type of variation of the irregular component, and the stability of the model, especially indications of moving seasonality. The latter occurs when the high and low months and undulations in the data are shifting, thereby introducing additional variability. Although moving seasonality should be a matter of concern, its existence does not preclude executing seasonal adjustments.
Meeting the Criteria
Failing a single criterion by no means implies that the model is poor, just as passing all the criteria does not ensure a good adjustment. These measures serve only as aids to guide the adjustment. A criterion is met if its measure is less than 1.0; failure means that the measure was equal to or greater than 1.0. With experience, we have learned to put the most emphasis on one measure, M7, which determines whether there is any identifiable seasonality, and to be lenient with all the others. A failure in the other measures may mean, for example, a weak trend line, a lot of autocorrelation, or shifting seasonality; nevertheless, although these other commonly computed measures may fail for some of the Transportation Services Index series, they do not act as show-stoppers. Judged by M7 as the dominant criterion, all the series used in the TSI display very strong seasonality components.
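The pass/fail rule and the emphasis on M7 can be expressed in a few lines. This is a minimal sketch; the function name `assess_quality` and the sample values are hypothetical and are not taken from any TSI series.

```python
def assess_quality(measures, dominant="M7", threshold=1.0):
    """Apply the rule: a measure passes if it is below the threshold.

    measures maps names such as "M1".."M11" to their computed values.
    Returns the failed measures (in numeric order) and whether the
    dominant measure, M7 by default, passed.
    """
    failed = sorted(
        (name for name, value in measures.items() if value >= threshold),
        key=lambda name: int(name[1:]),
    )
    return {"failed": failed, "dominant_passed": measures[dominant] < threshold}


# Hypothetical values for illustration only.
example = {"M1": 0.4, "M2": 0.1, "M4": 1.2, "M7": 0.3, "M10": 1.0}
report = assess_quality(example)
```

In this example `report["failed"]` lists M4 and M10 (a value of exactly 1.0 counts as a failure), while `report["dominant_passed"]` is true. Under the approach described above, such a series would still be treated as having identifiable seasonality.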
Months to Cyclical Dominance (MCD)
One of the measures, M5, uses the concept of months to cyclical dominance (MCD), an indicator of how strong the trend is relative to the irregular component. MCD is defined as the number of months it takes before variations in the trend are expected to become larger than variations in the irregular component. When the irregular is under control, the MCD is about 3. An MCD of 3 means it takes approximately 3 months before changes in the seasonally adjusted series can be attributed to changes in the underlying trend rather than the irregular. For the Transportation Services Index, the MCDs of the 10 individual time series range in value from 3 to 12. These are noted in the summaries of each series that follow this introduction.
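The definition above can be turned into a small calculation. The sketch below compares the average absolute change of the irregular and of the trend over spans of 1, 2, 3, ... months and returns the first span at which the trend change is larger. It is a simplified illustration, not the program's exact computation: it uses absolute differences (which suits an additive model; percent changes would be the analogue for a multiplicative one), and the function names and the cap of 12 months are assumptions.

```python
def mean_abs_change(series, span):
    """Average absolute change in a series over a given span of months."""
    diffs = [abs(series[t] - series[t - span]) for t in range(span, len(series))]
    return sum(diffs) / len(diffs)


def months_to_cyclical_dominance(trend, irregular, max_span=12):
    """Smallest span at which trend changes exceed irregular changes.

    trend and irregular are equal-length lists of monthly values.
    Returns None if the trend never dominates within max_span months.
    """
    last_span = min(max_span, len(trend) - 1)
    for span in range(1, last_span + 1):
        if mean_abs_change(irregular, span) < mean_abs_change(trend, span):
            return span
    return None
```

A steeper trend or a quieter irregular lowers the result. A flat trend can return `None`, which corresponds to the situation where the irregular dominates at every span examined.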
Standardizing Each Series
The 10 time series used to create the Transportation Services Index had varying amounts of historical data. Some series began as early as 1973. Earlier data may be useful for historical purposes, but are no help in seasonally adjusting data in recent years. Why? The X12-ARIMA program doesn’t allow the earlier data to carry much weight for the years at the end of the series. For this reason, with the exception of transit, all time series used in the current (2004) TSI begin at January 1990. This is more than sufficient time to obtain a good seasonal adjustment. Moreover, data prior to 1990 could only influence the seasonal adjustment if they contained one or more extreme outliers, which the TSI strives to eliminate from the data series during the computation phase. Therefore, it is best to eliminate this potential problem by dropping the older data.
Waterborne
Comparing 12-month periods in the series, it is clear that February is the low point (January is also frequently low) and October most often the high point. No substantial amount of moving seasonality was detected. December 2003 saw a 12-percent increase, and this made the modeling more difficult. At this time, it isn’t known if this increase is an outlier. Extreme outliers include January and May 1995, October 1997, and January 2000. The years 1995 and 2000 were more erratic than the other years, but no level shifts were detected.
The waterborne series is composed chiefly of three series: coal, petroleum, and farm products. These series are largely independent of each other. The coal series is not seasonal, while the farm products series is very unstable. After January 1994, the coal portion was left alone (i.e., not seasonally adjusted), the remainder was seasonally adjusted, and then coal was added back in. An additive model was used to fit the series. The months to cyclical dominance (MCD) value for waterborne is exceptionally high: 12. (The average MCD value of the ten series is 5.3 for the February 2005/March 2005 data.) This means it takes 12 months or one year for changes in the trend to equal changes in the irregular, indicating that the trend in waterborne is very weak or the irregular component is very strong, or both. An examination of the graph of the irregular component of waterborne without coal indicates this series to be the most variable of all 10 transportation series included in the Transportation Services Index.
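The treatment of coal described above (set the non-seasonal part aside, adjust the remainder, then add the part back) can be sketched generically. The function name `adjust_excluding_component` is hypothetical, the numbers are invented, and `adjust` stands in for whatever seasonal adjustment routine is applied; here a trivial stand-in subtracts a known pattern.

```python
def adjust_excluding_component(total, excluded, adjust):
    """Seasonally adjust a total while leaving one component untouched.

    total    -- monthly values of the full series
    excluded -- monthly values of the component left unadjusted
    adjust   -- callable that seasonally adjusts a list of values
    """
    remainder = [t - e for t, e in zip(total, excluded)]
    adjusted_remainder = adjust(remainder)
    # Add the untouched component back to the adjusted remainder.
    return [a + e for a, e in zip(adjusted_remainder, excluded)]


# Invented numbers for illustration: a non-seasonal part and a
# remainder that swings by a known amount each month.
non_seasonal = [10.0, 12.0, 11.0, 13.0]
known_swing = [2.0, -2.0, 2.0, -2.0]
full_series = [c + 50.0 + s for c, s in zip(non_seasonal, known_swing)]


def remove_known_swing(values):
    """Stand-in adjuster: subtract the known monthly swing."""
    return [v - s for v, s in zip(values, known_swing)]


result = adjust_excluding_component(full_series, non_seasonal, remove_known_swing)
```

The design choice is that noise in the non-seasonal component never enters the seasonal estimation, so it cannot distort the seasonal factors of the remaining components.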
NOTE: For the March 2005 TSI, no data values for waterborne were required to be forecasted and used in the model. Indeed, data were available for waterborne through April 2005.
Rail
Freight. Two time series constitute rail freight: carloads and intermodal. For both time series, we find a double peak in August and October with October as the larger of the two. February is reliably the low month. The holidays were not a significant factor in the seasonality component in either series with the exception of Easter and Thanksgiving in intermodal. Notable outliers for carloads were August 1993, and January in 1996 and 1998. We haven’t seen extreme outliers in carloads for the last six years. For intermodal, April 1991, June 1992, October 2002, June 2004, and September 2004 were extreme outliers. Over time it will become clear whether or not the September 2004 value is a true outlier. Trading-day effects were not significant in either series. Rail freight is very seasonal; the adjustment in both series reveals clear trend lines and stable seasonality. No significant moving seasonality was detected in either series. The months to cyclical dominance (MCD) values are 5 and 3 for carloads and intermodal, respectively. The carloads’ MCD was average for the 10 transportation series used in creating the Transportation Services Index. Both series receive a “pass” on all diagnostics, M1 through M11.
Each of the rail time series was first seasonally adjusted and then combined into a rail output index. Additive models were used to fit both the carloads and intermodal time series.
NOTE: For the March 2005 TSI, no data values in either series (carloads and intermodal) needed to be forecasted and used in the model. Indeed, data were available through April 2005.
Passenger. Like freight data, rail passenger data are well behaved, with a clear trend and consistent, stable seasonality. Rail passenger adjusts similarly to rail freight. Trading-day effects were not significant, but Easter week, Labor Day, and Thanksgiving were determined to be significant factors in rail passenger. No significant moving seasonality was detected. The high points of the year are July and August; the low months are in the winter, with February most often the lowest. Extreme outliers found in the model were April 1991, June 1992, and February 1998. Like rail freight, rail passenger easily passed the diagnostic standards, M1 through M11. An MCD value of 4 is good. Rail passenger data are best modeled with the multiplicative model.
NOTE: For the March 2005 TSI, one data value was forecasted and used in the model for rail passenger.
Pipeline
Petroleum. Monthly amounts of petroleum and petroleum products show a distinct pattern, with February consistently the low month. July, August, December, and January are the peak months, with December slightly more often the annual high month. Significant effects for trading days or holidays were not detected. The number of outliers was reduced by adjusting the weighting as in the other series. In January 1985, a new upward trend began. With seasonality removed, the increase from December 1984 to January 1985 was 37 percent. This level was sustained over the following months, and the series never returned to its previous level. This change in the trend was one reason why we start the series later, at January 1990. Extreme outliers include March 1991, February 1992, October 1994, and February 2000. February occurs as a moderate or extreme outlier quite frequently, indicating that the model is compromised to some degree by the relatively extreme results in February.
The criterion for each of the diagnostic measures, M1 through M11, was met with the exception of M4, which indicates autocorrelation in the irregular component of the model. We now believe this is not a problem in the deseasonalized data and did not attempt to eliminate it. No significant amount of moving seasonality was detected. The MCD value was 7, which is the second highest value among all the transportation series used in the Transportation Services Index. Multiplicative models were used for both pipeline series.
NOTE: For the March 2005 TSI, one data point was forecasted and used in the model.
Natural gas. The peaks and low months in natural gas complement the petroleum time series. January is most often the high month for gas usage; December and February are also quite high. The low months occur between June and September. The final adjustment included a change in the weighting to manage the number of outliers. No trading-day effects or holiday effects were significant. Three extreme outliers were February 1994, December 2000, and October 2001. Pipeline gas behaved the same way as petroleum with respect to the diagnostic measures, M1 through M11. The criterion for each of the diagnostic measures was met with the exception of M4, which indicates autocorrelation in the irregular component of the model. We now believe this is not a problem in the deseasonalized data and did not attempt to eliminate it. While strong seasonality is not in doubt, it is confounded by the existence of significant moving seasonality, which was detected at the 5 percent level. This means the seasonality tended to shift, adding an element of uncertainty to the adjustment. The irregular component is quite large in natural gas; the number of months to cyclical dominance (MCD) is 9. Because pipeline petroleum had an MCD value of 7 and pipeline gas had an MCD value of 9, pipeline data, in general, show strong irregular components, weak underlying trends, or both.
NOTE: For the March 2005 TSI, one data point was forecasted and used in the model.
Trucking
October appears to be the most frequent peak month in the calendar year. Other high points are March and June. The low points occur during the late fall and winter months from November through February; December is most often the low month. The seasonality is very strong and consistent throughout the decades. No evidence of significant moving seasonality was detected. Only two extreme outliers were found: April 1994 and December 1994. At this time, there is no explanation for the surprisingly large output for December 1994, when the irregular component was 7 standard deviations above the average of the series. The outliers appear to be concentrated in the year 1994, indicating that this year was not normal. The average monthly standard deviation for the year 1994 exceeded three standard deviations. Although in the past trucking had been affected by trading-day effects, Easter week, and Thanksgiving, it now appears that only trading-day effects exert substantial influence. The irregular component is relatively small in trucking; the number of months to cyclical dominance (MCD) is 3. The seasonal adjustment of the trucking data met all diagnostic measures, M1 through M11. For the November Transportation Services Index, the model for trucking was changed to an additive model. The multiplicative model was more valid in the past because of the smaller fluctuations of the series during the earlier years. As the series has progressed over time, the fluctuations have become approximately constant as the trend increases.
NOTE: For the March 2005 TSI, no data values were forecasted and used in the model.
Transit
An examination of the raw data going back to 1979 shows evidence of seasonality. The lowest month in ridership is sometimes July and other times February, but often other months are lowest. The high month is most often October. The pattern of seasonality is statistically significant, but the frequent shifts in the troughs and peaks make the seasonal adjustment a real challenge. Although the series is highly erratic, there is a little more stability in recent data. For this reason, we decided to work with just the last 11 years. The day of the week plays a strong role, particularly Thursday through Sunday, with Sunday having an especially large impact. Surprisingly, holiday effects were not worth hard-coding because the day of the week absorbed most of the holiday effects. No significant moving seasonality was detected. The series passed each of the diagnostic measures. Two extreme outliers were found in June, in 1994 and 1998. The other two extreme outliers were December 1993 and April 1995. The months to cyclical dominance (MCD) value is 5. The additive model was used to fit the transit time series data.
NOTE: The most recent data available for transit are for December 2004. To make up for the data gap, three data points had to be forecasted for the March 2005 Transportation Services Index. Normally, transit lags well behind the other series in up-to-date data. In the December 2004 update, 8 years of transit data were revised.
Aviation
Freight. The seasonality pattern in aviation freight shows high points most often in October and low points in February. Since about 1986 there has been a steady increase in tons carried. As this trend has risen, the seasonal fluctuation has increased markedly. These data were relatively easy to seasonally adjust. Only the weighting was adjusted slightly to reduce the number of outliers. No significant moving seasonality was detected. Aviation freight is strongly influenced by trading-day effects. For the most part, holiday effects were absent with the notable exception of Thanksgiving. A very significant outlier was September 2001, which is to be expected after the 9/11 catastrophe. The value for September 2001 dropped 12% from the previous month in the seasonally adjusted series. Other extreme, or nearly extreme, outliers include: February 1993, June 2001, and several concentrated in 2002: August, September, and October. The last outlier in October 2002, 4.5 standard deviations from the mean, increased from the previous month by 25% after adjusting for seasonality. Aviation freight easily passed the diagnostic standards, M1 through M11. The value for months to cyclical dominance (MCD) was 3. A multiplicative model was used to model the Aviation Freight data series.
NOTE: For the March 2005 TSI, no data points were forecasted and used in the model. Indeed, data were available through April 2005.
Passenger. August is generally the highest month for passenger data, although occasionally July or September was the high for a particular year. February is consistently the low month. The seasonality of aviation passenger traffic is very evident. The defaults in the seasonal adjustment software work well, except for the weighting. As with the aviation freight data, a slight adjustment was made to the weighting to reduce the number of outliers. Moving holiday adjustment factors were applied directly to the final seasonally adjusted series. Unlike aviation freight, the trading-day effects were not significant for aviation passenger. Moving seasonality was detected at the 1 percent level. Like aviation freight, aviation passenger was impacted strongly by 9/11, when there was a 30-percent drop after seasonality was adjusted. In fact, much of 2001 was unusual, with outliers from July through October, of which the August through October outliers were extreme. The only other extreme outliers were in February and March 1991. Aviation passenger data easily passed the diagnostic standards, M1 through M11. The value for MCD was 4. A multiplicative model was used to model the aviation passenger data series.
NOTE: For the March 2005 TSI, no data points were forecasted and used for determining the seasonal factors. Indeed, data were available through April 2005.
[1] For a good overview of the X-11 method, see D. Ladiray and B. Quenneville. Seasonal Adjustment with the X-11 Method. 2001. New York: Springer-Verlag.
Transportation Services Index Composition
This table gives basic information for the time series data used to compose the Transportation Services Index (TSI). Six of the 10 series have data to at least March 2005. The other four series had missing data that had to be forecasted (see column 4). Five of the series (waterborne, rail freight carloads and intermodal, aviation freight, and aviation passenger) have more recent data, up to April 2005.
| Series | Moving seasonality detected | MCD (1) | Data points forecasted | Model (2) | TD (3) and holiday (4) effects |
| --- | --- | --- | --- | --- | --- |
| Rail | | | | | |
| Passenger | No | 4 | 1 | M | No TD; all moving holidays |
| Rail Freight | | | | | |
| Carloads | No | 5 | None | A | No TD, no holidays |
| Intermodals | No | 3 | None | A | No TD; Easter and Thanksgiving |
| Trucking | No | 3 | None | A | TD; no holidays |
| Waterborne | No | 12 | None | A | No TD, no holidays |
| Transit | No | 5 | 3 | A | TD; no holidays |
| Aviation | | | | | |
| Freight | No | 3 | None | M | TD and Thanksgiving |
| Passenger | Yes, at 1% level | 4 | None | M | No TD, all moving holidays |
| Pipeline | | | | | |
| Natural gas | Yes at 5% level | 9 | 1 | M | TD; no holidays |
| Petroleum | No | 7 | 1 | M | No TD, no holidays |
1 MCD = months to cyclical dominance. An MCD value of 3 means it takes 3 months before a change in the trend “stands out” and surpasses a change in the irregular component of the model. Note that 4 and 5 months are normal for trends in these transportation services series; an MCD higher than 5 months shows relatively more instability, making the measured seasonality less exact and reliable.
2 M = Multiplicative; A = Additive
3 TD = Trading Days
4 (Moving) Holidays = Easter, Labor Day, & Thanksgiving