Accounting for Uncertainty in Estimates of Total Traffic Volume: An Empirical Bayes Approach

Wednesday, November 30, 2011

Gary Davis*
University of Minnesota

Shimin Yang
The St. Paul Companies

Abstract

Predicted or estimated totals of traffic volume over one or more years are required in both highway pavement and safety engineering. While current recommended practices contain guidance on how to generate such estimates, they are less clear on how to quantify the uncertainty attached to the estimates. This paper describes an initial solution to this problem. Empirical Bayes methods are used to compute quantiles of the predictive probability distribution of the traffic total at a highway site, given a sample of daily traffic volumes from that site. Probable ranges and their associated probability values are readily found, and a point prediction of the total traffic can be obtained as the median, or 50th percentile, of the predictive distribution. The method of derivation can also be used to find the predictive density and the moments of the predictive distribution if needed. No data other than those routinely collected by statewide traffic monitoring programs are needed. A test comparing computed 90% credible intervals for annual traffic volume with the corresponding actual volume at 48 automatic traffic recorder sites showed that the actual coverage percentage was not significantly different from the nominal 90% value.

Introduction

In engineering design, it is sometimes necessary to work with variables whose values are not known with certainty. In such cases, a rational compromise between over- and under-design requires first the determination of a probable range for a variable's outcome and then a design to accommodate values in this range. Implementation hinges on an acceptably accurate assessment of this probable range because if the assessed range is too narrow, the likelihood of a failure will be unacceptably high, but if the assessed range is too broad, resources will be expended in anticipation of improbable events. In addition, the users of scientific and engineering measurements often expect an assessment of the uncertainty attached to those measurements. The need for an assessment of uncertainty is revealed in the reporting of error bounds associated with opinion survey estimates, in the U.S. Supreme Court's recommendation that judges consider "known or potential rate of error" when determining the admissibility of expert scientific testimony (Foster and Huber 1999), and in the recommendation by the American Association of State Highway and Transportation Officials (AASHTO) that the "precision and bias" attached to traffic volume measurements be assessed and reported (AASHTO 1992).

The ability to justify an uncertainty assessment becomes especially important when either particular values of an estimate or the uncertainty range itself could be used by partisans to justify or oppose a controversial policy (Sarewitz and Pielke 2000). In a discussion of scientific predictions and social policy, Stewart (2000) found it useful to distinguish between two sources of uncertainty, which he called aleatory and epistemic uncertainties. Aleatory uncertainty arises when the outcomes of interest are governed by physically random processes that are at least in principle capable of generating stable long-run relative frequencies. Epistemic uncertainty, on the other hand, arises when our knowledge of underlying states of nature is incomplete. This duality in the notion of uncertainty appears to date back to the origin of modern ideas concerning probability (Hacking 1975) and is arguably the source of the current debate between Bayesian and frequentist views on the foundations of statistics (Howson and Urbach 1993).

Estimates or forecasts of the total traffic volume on a section of road for one or more years are used in both pavement design and traffic safety analysis and are also used to generate state- and nationwide estimates of total distance traveled. These are most often computed by multiplying an estimate of the road's mean daily traffic (MDT) volume by the number of days in the desired time horizon. For example, in pavement design (AASHTO 1993) estimates of MDT for each vehicle class are multiplied by 365 to obtain estimated yearly traffic totals, which are in turn used to predict the traffic loading over a pavement's design life. In traffic safety, the traffic exposure at a road site is computed by multiplying an estimate of MDT by the total number of days over which traffic accidents have been counted. For an intersection, these traffic totals are then summed over the intersection's approaches to give the total entering vehicles. For a highway section, the traffic total is multiplied by the section's length, producing an estimate of total vehicle kilometers of travel. Clearly, aleatory uncertainty is attached to an estimate of total traffic because, even if we knew a site's MDT exactly, the estimated traffic total and the actual total would likely differ, due to the unpredictable decisions of individual travelers. Epistemic uncertainty is present when the true MDT is not known exactly but has been estimated from a sample of daily traffic counts. AASHTO's (1993) recommended pavement design method explicitly allows for aleatory uncertainty as one of the components making up the overall variation term in the pavement design equation; however, epistemic uncertainty is not addressed. In traffic safety, current practices address the uncertainty in estimated accident rates due to the random nature of accident counts but do not appear to consider the contributions of either aleatory or epistemic uncertainty when estimating exposure (see Parker 1991).

Draper (1995) has illustrated how an accounting of multiple sources of uncertainty can be accomplished using Bayesian statistical methods, and in this paper we will consider the problem of assessing the uncertainty attached to estimates or forecasts of the total traffic volume. The second section will illustrate, using a simple example, how a Bayesian approach can be used to combine the contributions of aleatory and epistemic uncertainties into one assessment. The reasoning illustrated in that section will then be applied in the next section to develop an expression for the predictive distribution of a traffic total, which can be used to compute both point and interval estimates. The fourth section will then describe an initial empirical evaluation of this estimation method, and the final section will present conclusions. The development described in the third section draws heavily on past research into statistical models for time series of daily traffic counts (Davis and Guan 1996) and on weak convergence results for sums of lognormal random variables (Marlow 1967).

Illustrating the Sources of Uncertainty in Total Traffic Estimation

Consider the problem of estimating the total traffic volume over a period of N days, using a short count collected with a portable traffic counter. Let

z_t = traffic volume on day t, t=1,...,N,

the total traffic volume over days t=1,...,N,

z_l = traffic count on the lth sample day, l=1,...,n,

the sample average,

n-dimensional vector containing the sample counts.

The ultimate objective is to estimate the total traffic volume z^N from the traffic count sample z. For this example, we will assume that the daily traffic counts are independent and identically distributed normal, random variables with common mean μ; and common variance . The mean value μ; is the site's MDT, and we will assume that it is unknown. However, to avoid technical sidetracks, we will assume the variance of the daily volumes, , is known. We will also assume, for simplicity, that the sampled days are not part of the time period over which the total traffic is desired. Aleatory uncertainty will be present even when the MDT is known exactly. It is straightforward to show that given μ and , the probability distribution of the total traffic count z^N is normal with mean equal to and variance equal to . But, as noted earlier, we do not generally know μ but must estimate its value from the count sample, leading to epistemic uncertainty. If we assume that, prior to collecting our sample z, our uncertainty concerning the MDT parameter is characterized by a prior probability density , Bayes Theorem can be used to find the probability density characterizing this uncertainty after collecting the sample z. That is

function (lowercase mu given lowercase z and lowercase sigma superscript {2}) equals [function (lowercase z given lowercase mu and lowercase sigma superscript {2}) times function (lowercase mu)] divided by [integral of function (lowercase z given lowercase mu and lowercase sigma superscript {2}) times (function lowercase mu) times (derivative of lowercase mu)]

In particular, if is noninformative in the sense of being uniformly distributed on the real line, it can be shown that the posterior uncertainty concerning μ is characterized by a normal distribution with mean equal to the sample average and variance equal to (Box and Tiao 1973). The joint effect of aleatory and epistemic uncertainty can then be determined by treating μ as a nuisance parameter and integrating it out of the joint density for z^N and μ

function (lowercase z superscript {uppercase n} given lowercase z and lowercase sigma superscript {2}) equals the integral from negative infinity to positive infinity [function (lowercase z superscript {uppercase n} given lowercase mu and lowercase sigma superscript {2}) times function (lowercase mu given lowercase z and lowercase sigma superscript {2}) times (derivative of lowercase mu)]

(Draper 1995). Here denotes the predictive probability density of the total traffic given the count sample, while denotes the predictive probability density of the total traffic when the MDT is known. For this example, closed form evaluation of (2) is possible (Box and Tiao 1973), leading to the conclusion that the predictive probability density is normal with mean equal to theand variance given by

variance (lowercase z superscript {uppercase n} given lowercase z and lowercase sigma superscript {2}) equals [uppercase n times (uppercase n plus lowercase n) times lowercase sigma superscript {2}] divided by lowercase n equals uppercase n times lowercase sigma superscript {2} plus uppercase n superscript {2} times (lowercase sigma superscript {2} divided by lowercase n)

In this case, the contributions to the total variance attributable to aleatory and epistemic uncertainty can be separated, with as variance due to aleatory uncertainty and as variance due to uncertainty concerning the MDT. Interestingly, while the variance due to aleatory uncertainty increases linearly with the number of days in the traffic total, the variance due to epistemic uncertainty increases quadratically. To see the relative contributions of these sources, suppose that we seek to predict one year's total traffic volume using a 10-day sample count and that the daily traffic volume has an MDT of 1,000 vehicles per day and a coefficient of variation equal to 0.1. The day-to-day variance would be , so the standard deviation due to aleatory uncertainty would be which equals 1,910 vehicles. The standard deviation due to epistemic uncertainty would be equal to 11,540 vehicles. This example illustrates how epistemic uncertainty can be the dominant source of error and that neglecting its contribution can lead to a serious overstatement of a prediction's precision.

Because the main objective of this paper is to show how a more complete accounting of uncertainty can be added to current traffic monitoring practices, we describe these practices next. The chief purpose of a traffic monitoring program is to generate estimates of MDT on each of a jurisdiction's road segments. Ideally, this is done with year-round counting on each segment, but the cost of installing and maintaining such a comprehensive traffic monitoring system is prohibitive. Therefore, MDT estimates on the majority of road segments are obtained from samples gathered using portable traffic counters. Since traffic volumes vary systematically throughout the course of the year as well as across the days of the week, averages computed from short count samples are generally biased estimates of full year averages. However, if the magnitude of the bias is known, adjustments can be made. To determine these adjustments, most states employ a small number of permanent automatic traffic recorders (ATRs) placed on a representative sample of road segments. The daily traffic counts from the ATRs are used to cluster the ATRs into factor groups such that daily traffic volumes at sites in a factor group show similar seasonal and day-of-week variation patterns. The ATR counts are also used to estimate the seasonal and day-of-week factors characterizing each group. Each non-ATR road section is then assigned to one of these factor groups, and the variation factors characterizing the assigned group are used to adjust the short-count sample, providing a better estimate of the section's MDT. It is currently recommended that a suitably adjusted short count of 48 hours produces an estimate of MDT with acceptable precision (AASHTO 1992; USDOT FHWA 1995).

At least two sources of potential error can cause an estimated MDT to differ from a section's true (but unknown) MDT: sampling error, arising anytime the estimate is based on less than a complete census of the section's traffic volumes, and adjustment error, arising if the factors used to adjust the short-count sample differ from those which actually describe the sampled section's variation pattern. In a recent review of MDT estimation, Davis (1997b) pointed out that much of the earlier research used to justify the use of short counts for estimating MDT tended to underestimate the potential effect of adjustment error, and an analysis of the potential contributions from the two error sources indicated that adjustment error can plausibly be two to three times larger than sampling error. This analysis was consistent with recent empirical work by Sharma et al. (1996), which investigated the effect of adjustment errors in estimating MDT, as well as with work which highlighted the error caused by applying adjustment factors developed for traffic dominated by passenger cars to estimate the MDT of heavy trucks (Hallenbeck and Kim 1994; Cambridge Systematics 1994). The review also pointed out that both sampling and adjustment error can be explicitly accounted for within a hierarchical statistical model of the process generating the daily traffic counts and that this model can be used to develop an empirical Bayes (EB) estimator of MDT, which does not require that each roadway section be assigned a priori to a factor group (Davis and Guan 1996; Davis 1997a). Rather, a structure similar to that shown in equations (1) and (2) is used, in which the sample data are used to assess the posterior probabilities the sample site belongs to each factor group. The MDT is then estimated as a weighted average, with the factor group probabilities providing the weights. The next section describes how this hierarchical modeling approach can be extended to develop a method for computing the predictive distribution of a site's total traffic volume, rather than its MDT, given a traffic count sample at the site.

Deriving the Predictive Distribution for Total Traffic Volume

Using Bayes Theorem to assess the information provided by a sample and then integrating out nuisance parameters, the two steps exemplified in equations (1) and (2) provide the basic framework for deriving the predictive distribution of traffic totals from more realistic assumptions. In the above example, we derived a predictive probability density but here we will focus on the corresponding predictive distribution function The distribution function is more useful from a practical standpoint since it leads immediately to a method for finding the quantiles of the predictive distribution by solving equations of the form Since the expression for the cumulative distribution function turns out to have the form of a weighted average, an argument similar to that employed below could also be used to find the predictive probability density or the moments of the predictive distribution.

Aleatory Uncertainty

We will develop an explicit expression for in several steps. As in the example, the total traffic count z^N is determined as the sum of the daily counts z_t, so that the statistical properties of the daily traffic volumes will determine the form of the conditional distribution The first step is to characterize the statistical properties of the daily traffic volumes z_t. In order to develop plausible statistical models for the day-to-day fluctuations in traffic volumes at a particular site, a detailed analysis of daily traffic counts from 50 ATRs in Minnesota was undertaken, as described in Davis (1997a) and Davis and Guan (1996). This work indicated that variations in daily traffic volumes could be described using a lognormal regression model of the form

lowercase y subscript {lowercase t} equals lowercase u plus (summation from i equals 1 to 12 (uppercase delta subscript {lowercase t, i} times lowercase m subscript {lowercase k, i})) plus (summation from lowercase j equals 1 to 7 (lowercase delta subscript {lowercase t, j} times lowercase w subscript {lowercase k, j}) plus lowercase e subscript {lowercase t})

where

, the natural logarithm of a daily count,

u = expected log traffic count on a typical day,

, if the count z_t was made during month i, i=1,...,12, (0 otherwise)

m_k,i = correction term for month i, characteristic of factor group k,

= 1, if the count z_t was made on day-of-week j, j=1,...,7, (0 otherwise)

w_k,j = correction term for day-of-week j, characteristic of factor group k,

e_t = random error.

If we let denote a column vector containing the monthly and day-of-week adjustment terms for factor group k, and equation (4) can be written in a slightly simpler form:

In the above model, the mean value of the logarithm of the daily count varies according to month and day-of-week, and the magnitude of these variations depends on the factor group to which the site of interest belongs. Analysis of the regression residuals obtained after estimating the adjustment terms indicated that the error terms e_t were not independent but showed day-to-day dependencies, which could be described by a multiplicative autoregressive (AR) model of the form

Here the a_t are independent, identically distributed, normal, random variables with zero mean and common variance, and and are site-specific autoregressive coefficients.

The above model is parameterized by u, a mean-value parameter, the monthly and day-of-week adjustment terms, the variance of the e_t terms, which we will denote by and the autoregressive coefficients In the next step, we will assume we know the values of these parameters but nothing else about the site. Properties of lognormal random variables (Shimizu and Crow 1988) can be used to show that the expected value of the total traffic volume is

lowercase mu subscript {uppercase n} equals expected value (lowercase z superscript {uppercase n}) equals summation from t equals 1 to uppercase n (exponential (lowercase u plus (lowercase x subscript {lowercase t} times lowercase beta subscript {lowercase k}) plus (lowercase sigma superscript {2} divided by 2)))

and the variance of the total traffic volume is

Here denotes the correlation between e_t and e_s, which can be computed from (see Brockwell and Davis 1991). Since z^N is a sum of lognormal random variables, characterizing its probability distribution turns out to be very difficult even for the case N=2, and no result for general N is known (Johnson, Kotz, and Balakrishnan 1994). In most situations of practical interest, however, N will be equal to the number of days in one or more years, so an asymptotic approximation might prove useful. In the Appendix, a result due to Marlow (1967) is adapted to show that the cumulative distribution function of the random variable

(lowercase mu subscript {uppercase n} divided by lowercase nu subscript {uppercase n}) times natural logorithm (lowercase z superscript {uppercase n} divided by lowercase mu subscript {lowercase n})

converges to that of a standard normal random variable, implying that for large N, log_e(z^N) can be approximated by a normal random variable with mean equal to and variance equal to This, in turn, supports using a lognormal approximation for z^N,

where denotes the standard normal distribution function.

Epistemic Uncertainty

The sample z contains two types of information concerning z^N. On the one hand, it provides information concerning the model parameters. In principle, this information could be summarized by the posterior distribution function On the other hand, the correlated noise in equation (6) implies that any given daily count z_t is correlated with its neighbors, so knowing z_t allows us to more accurately predict neighboring values. If the noise equation (6) is stationary, and if the sample counts are sufficiently separated in time from the counts comprising z^N (such as might occur when trying to predict the total volume for 1999 using a sample taken in 1997) we can take the sample and the total count to be independent of each other. Then the sample provides information on the total only by providing information on the model parameters. The predictive distribution could then be found using a generalization of equation (2):

When the sample counts are correlated with counts comprising the total z^N, the expression (11) will only be approximate, with the approximation deteriorating with increasing overlap between the sample counts and the counts entering into the total. In principle, smoothing algorithms could be used to include dependency on z in (7) and (8), so that a z-dependent asymptotic approximation could be developed, but the necessary computational labor appears to be substantial. For the situations commonly encountered in highway and safety engineering, the number of counts entering into the total will be large compared to the size of the sample (for example, one or more years for N, compared to a 48-hour or two one-week counts for n), so that most of the aggregating counts will be separated from the sample counts by at least one month. For these cases, we conjecture that equation (11) will provide a suitably accurate approximation. Some empirical evidence supporting this conjecture will be presented in the fourth section.

The final steps involve characterizing the distribution and then finding a computationally feasible way to evaluate the (multidimensional) integral in (11). It turns out, however, that this problem is very similar to the problem of computing Bayes estimates of mean daily traffic described in Davis (1997a) and Davis and Guan (1996), and a similar solution can be employed here. The essence of this approach is to assess the prior uncertainty concerning the model parameters, and then use Bayes Theorem to account for information provided by the data sample.

As in Davis (1997a), we will assume that the highway agency has divided its road segments into a set of m factor groups and that estimates of the adjustments factors for each group, , k=1,...,m, are available. We will further assume that the agency maintains a total of M ATRs and that for each ATR estimates of the covariance parameters are also available. Straightforward procedures for computing these estimates from ATR data, using commonly available software packages, are described in Davis (1997a). Prior to collecting any data for a site, we will assume that our uncertainty concerning that site's parameters is captured by the prior probability distribution

where I_b(x) denotes the indicator function

I_b(x) = 1, if x=b, (0 otherwise).

Basically, this prior assumes that before collecting data we are completely uncertain of the value of u in the sense that our prior probability is uniformly distributed on the real line. For the adjustment term , we are certain it takes on one of the values characterizing our factor groups, but we are equally uncertain which of these is correct. Similarly, for we are certain one of the sets of values estimated from our ATR sites is correct, but prior to collecting data we are equally uncertain which one. Completing the specification of this prior by generating estimates of the from ATR data results in an empirical Bayes (EB) method, in the sense of Padgett and Robinson (1978). That is, empirical distributions from samples are used to form the priors.

Because the logarithms of the traffic counts are normal random variables, the likelihood function of the sample is easy to specify. Letting y denote the vector containing the logarithms of the sample counts and V denote the correlation matrix of y (which can be computed once the value of the AR parameters is known), then if we knew the site-specific values for the parameters , the likelihood of the sample could be computed using the appropriate multivariate normal density.

Here X is a matrix, of dimension N19, each row having elements equal to 0 or 1, according to the month and day-of-week of the corresponding sample count, while 1_n is an n-dimensional column vector with each element equal to 1.0.

Predictive Distribution of Total Traffic

Applying Bayes Theorem to the prior and likelihood to obtain the posterior distribution for the parameters, substituting this into (11), and performing the indicated integrations produces, after some tedious algebra,

where y^N=log_e(z^N), V_p denotes the sample correlation matrix computed using and

where are as defined in (7) and (8) but evaluated using The distribution given in (14) is a finite mixture of normal distributions where the weights given to the mixture components are the posterior probabilities that the sampled site has adjustment factors and covariance parameters characteristic of each the m factor groups and each of the M ATR sites. Although the expressions in (14) and (15) appear rather forbidding, the implied computations are readily carried out on a personal computer.

A Calibration Test

As noted above, the distribution (14) approximates the predictive distribution of a total traffic count, the approximation being appropriate when predicting the total of a large number of days (for example, a year or more) from a small sample (for example, two weeks or less). In an earlier study, Davis (1997a) used traffic counts from the year 1992 from 50 ATRs in outstate Minnesota to estimate monthly and day-of-week adjustment terms for the Minnesota Department of Transportation's (Mn/DOT) 3 outstate factor groups, as well as covariance parameters for each of the 50 ATRs. These estimates were then used to construct the discrete prior distributions for , giving m=3 and M=50. In addition, daily counts from the year 1991 were available for 48 ATRs, and for each of these ATRs a sample consisting of a one-week count from the month of March and a one-week count from the month of July was drawn. The 1992 data were used for estimation, and 1991 data were used for validation because more ATRs had good data in 1992. A MATLAB (Mathworks 1992) program for evaluating (14) was written, and then for each of the 48 ATRs, the 5th and 95th percentile points of the predictive distribution of the logarithm of the 1991 total traffic volume were computed by embedding this routine inside MATLAB's root-finding algorithm. Finally, the logarithm of the total 1991 actual traffic volume was also computed for each ATR. The results of these computations are displayed in tables 1 through 3.(table 1, table 2, table 3)

Note that the 5th and 95th percentile points describe the bounds of a 90% credible interval, and, clearly, if a large number of actual traffic counts fell outside the bounds of our intervals, we would have evidence for inaccurate prediction. On the other hand we would still expect a few actual counts to fall outside our bounds. If the intervals caught all actual volumes, we would be inclined to believe that the computed credible intervals were too large. If the approximation is acceptably accurate, we would expect the actual count to fall outside the bounds 10% of the time, and a test of the adequacy of the estimated credible bounds can be made by treating the number of missed totals as the outcome of a binomial random variable with 48 trials and a hypothesized miss probability of p=0.1. Inspection of the tables shows that for 8 of the ATRs (2, 8, 12, 204, 208, 217, 218, and 226) the actual count fell outside the estimated bounds, for a total of 8 binomial "successes." Since the probability of obtaining 8 or more successes by chance is 0.102, this result is not inconsistent with the hypothesis that equation (14) provides a reasonable approximation of the predictive distribution.

Conclusions

Predicted or estimated traffic totals are required in both highway pavement and safety engineering and are used to produce statewide and nationwide estimates of total distance traveled. Although recommended practices exist for estimating traffic totals as part of a traffic monitoring program, it is less clear how we should characterize the uncertainty associated with these estimates. This paper describes an initial solution to this problem, in which empirical Bayes methods are used to compute the quantiles of a traffic total's predictive distribution, given a sample of daily traffic volumes. Probable ranges and their associated probability values are readily found, and, if desired, a point prediction of the total traffic can be obtained as the median, or 50th percentile, of the predictive distribution. The method of derivation can also be used to find the predictive density and the moments of the predictive distribution. No data are required beyond that routinely collected by statewide traffic monitoring programs, and the estimates of the factor group adjustment parameters can be computed using standard linear regression methods. All other computations have been successfully implemented as MATLAB macros.

In conclusion, almost all engineering decisions must be made in the face of uncertainty, and the art of successful engineering requires cost-effective hedging against this uncertainty. It was argued earlier that standard methods for predicting total traffic ignore potentially important sources of error, and, hence, understate the resulting uncertainty characterizing estimates and predictions. Many of the statistical procedures used in highway engineering date to the middle part of the 20th century and are based on simplified statistical models adapted to the computational constraints of those times. Statistical science has advanced considerably since then, and these advances can support and encourage the use of more realistic models in highway engineering. This paper proposes a modest step in this direction by providing a computationally practical method which accounts for uncertainty in traffic volume predictions. Of course, the importance of hedging against uncertainty depends on the consequences of error, and, fortunately, so far the consequences attached to using mistakenly precise traffic forecasts have not been too severe. Whether or not this state of affairs continues is of course another uncertain prediction about the future.

Acknowledgments

The authors would like to thank Mark Flinner of Mn/DOT for providing the traffic-count data used in this study. This research was sponsored by Mn/DOT. However, all facts, conclusions and opinions expressed here are solely the responsibility of the authors and do not necessarily reflect the views of Mn/DOT.

References

AASHTO. 1992. Guidelines for Traffic Data Programs. Washington, DC.

____. 1993. Guide for Design of Pavement Structures. Washington, DC.

Box, G. and G. Tiao. 1973. Bayesian Inference in Statistical Analysis. New York: Wiley and Sons.

Brockwell, P. and R. Davis. 1991. Time Series: Theory and Methods. New York: Springer-Verlag.

Cambridge Systematics. 1994. Use of Data from Continuous Monitoring Sites. Report to Federal Highway Administration, U.S. Department of Transportation, Washington, DC.

Davis, G. 1997a. Estimation Theory Approach to Monitoring and Updating Average Daily Traffic. Report 97-05, Minnesota Department of Transportation, St. Paul, MN.

Davis, G. 1997b. Accuracy of Estimates of Mean Daily Traffic: A Review. Transportation Research Record 1593:12-6.

Davis, G. and Y. Guan. 1996. Bayesian Assignment of Coverage Count Locations to Factor Groups and Estimation of Mean Daily Traffic. Transportation Research Record 1542:30-7.

Draper, D. 1995. Assessment and Propagation of Model Uncertainty. Journal of the Royal Statistical Society B 57:45-97.

Foster, K. and P. Huber. 1999. Judging Science: Scientific Knowledge and the Federal Courts. Cambridge: MIT Press.

Gallant, R. and H. White. 1988. A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models. New York: Basil-Blackwell.

Hacking, I. 1975. The Emergence of Probability. Cambridge: Cambridge University Press.

Hallenbeck, M. and S-G. Kim. 1994. Truck Loads and Flows, Task A Final Technical Report. Washington State Transportation Research Center (TRAC), Seattle, WA.

Howson, C. and P. Urbach. 1993. Scientific Reasoning: The Bayesian Approach. Chicago: Open Court.

Johnson, N., S. Kotz, and N. Balakrishnan. 1994. Continuous Univariate Distributions, Volume 1. Second edition. New York: Wiley and Sons.

Marlow, N. 1967. A Normal Limit Theorem for Power Sums of Independent Random Variables. Bell Systems Technical Journal 46:2081-9.

Mathworks. 1992. MATLAB Reference Guide, Version 4. Mathworks, Inc.

Padgett, W. and J. Robinson. 1978. Empirical Bayes Estimators of Reliability for Lognormal Failure Models. IEEE Transactions on Reliability R-27:223-336.

Parker, M. 1991. Highway Safety Engineering Studies: Procedural Guide. Report No. FHWA-HI-88-039. Federal Highway Administration, U.S. Department of Transportation, Washington, DC.

Sarewitz, D. and R. Pielke. 2000. Prediction in Science and Policy. Prediction: Science, Decision Making, and the Future of Nature. Washington, DC: Island Press.

Sharma, S., B. Gulati, and S. Rizak. 1996. Statewide Traffic Volume Studies and Precision of AADT Estimates. ASCE Journal of Transportation Engineering 122:430-9.

Shimizu, K. and E. Crow. 1988. History, Genesis, and Properties. Lognormal Distributions: Theory and Applications. New York: Marcel Dekker.

Stewart, T. 2000. Uncertainty, Judgment, and Error in Prediction. Prediction: Science, Decision Making, and the Future of Nature. Washington, DC: Island Press.

U.S. Department of Transportation (USDOT), Federal Highway Administration (FHWA). 1995. Traffic Monitoring Guide, Third Edition. Washington, DC.

Appendix

Weak Convergence of Sums For a Class of Correlated Lognormal Random Variables

As above, let z_t, t=1,...,N denote a sequence of lognormal random variables with denoting their sum. denotes the standard normal distribution function. As before, we will assume that the z_t follow the model

and the error terms {e_t} follow a stationary p-order autoregressive (AR(p)) process. Marlow (1967) showed that if there exists sequences of positive real number such that

limit as uppercase n approaches infinity (probability ((lowercase mu subscript {uppercase n}) divided by lowercase mu subscript {uppercase n}) less than or equal to lowercase x) equals uppercase phi (lowercase x),

then

limit as uppercase n approaches infinity probability [(lowercase mu subscript {uppercase n} divided by lowercase nu subscript {uppercase n}) times (natural logorithm (lowercase z subscript {uppercase n} divided by lowercase mu subscript {uppercase n})) less than or equal to lowercase x] equals uppercase phi (lowercase x)

To verify conditions (a) and (b), we will impose the restriction that the monthly and day-of-week factors for any given day are bounded from above and also bounded away from zero. That is, there exist constants such that

for all t and k. It is then possible to show that

where

lowercase gamma equals summation from lowercase k equals 1 to infinity (lowercase e superscript {lowercase rho subscript {lowercase k} times lowercase sigma {2}} minus 1) less than infinity

whenever the are the autocorrelations for a stationary AR(p) process. Similarly,

so that

lowercase nu subscript {uppercase n} divided by lowercase mu subscript {uppercase n} equals order (uppercase n superscript {minus one half})

and condition (a) is satisfied.

If the daily counts z_t were independent, we could use either the Lyaponuv or Lindbergh central limit theorems to verify condition (b), as was done by Marlow (1967). A more general central limit theorem, allowing for dependence of the sort generated by the AR(p) model for the errors e_t , is stated in Theorem 5.3 of Gallant and White (1988, 76). In particular, if we can show that

then condition (b) will be satisfied, and we are done.

1) Let , and since z_t is lognormal, its fourth central moment is known, so that

2) This condition is satisfied trivially since

implies

3) This condition is also satisfied trivially since the fact that the noise process {e_t} is a stationary AR(p) process implies that it is of all orders.

4) This follows from the fact, demonstrated above, that

Address for Correspondence

Gary A. Davis, Department of Civil Engineering, University of Minnesota, 122 CivE, 500 Pillsbury Drive SE, Minneapolis, MN 55455. Email: drtrips@tc.umn.edu.

USA Banner

Accounting for Uncertainty in Estimates of Total Traffic Volume: An Empirical Bayes Approach