Appendix B - Reliability of the Estimates

Monday, July 2, 2012

The estimates in this publication may differ from the actual, unknown population values. Statisticians define this difference as the total error of the estimate. When describing the accuracy of survey results, it is convenient to discuss total error as the sum of sampling error and nonsampling error. Sampling error is the average difference between the estimate and the result that would be obtained from a complete enumeration of the sampling frame conducted under the same survey conditions. Nonsampling error encompasses all other factors that contribute to the total error of a sample survey estimate.

The sampling error of the estimates in this publication can be estimated from the selected sample because the sample was selected using probability sampling. Common measures related to sampling error are the sampling variance, the standard error, and the coefficient of variation (CV). The sampling variance is the squared difference, averaged over all possible samples of the same size and design, between the estimator and its average value. The standard error is the square root of the sampling variance. The CV expresses the standard error as a percentage of the estimate to which it refers. This publication presents these measures in Appendix B.
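To make these definitions concrete, the following Python sketch enumerates every possible sample from a tiny hypothetical population. For each sample it computes an estimate of the total, then derives the sampling variance, standard error, and CV directly from the definitions above. The population values, simple random sampling without replacement, and the expansion estimator N × (sample mean) are illustrative assumptions. The CFS uses a three-stage design, and its standard errors are approximated from the one sample actually selected rather than by enumeration.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean


def sampling_measures(population, n):
    """Sampling variance, standard error, and CV (percent) of the
    estimator N * (sample mean), taken over every possible simple random
    sample of size n drawn without replacement from `population`."""
    N = len(population)
    # One estimate of the population total from each possible sample.
    estimates = [N * fmean(sample) for sample in combinations(population, n)]
    # Average value of the estimator over all possible samples.
    average = fmean(estimates)
    # Squared difference from that average, averaged over all samples.
    variance = fmean([(e - average) ** 2 for e in estimates])
    standard_error = sqrt(variance)
    # CV relative to the estimator's average value (here, the true total).
    cv_percent = 100 * standard_error / average
    return variance, standard_error, cv_percent
```

For the hypothetical population [2, 4, 6, 8] with samples of size 2, the six possible estimates average 20 (the true total), and the sampling variance is 80/3.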

Nonsampling errors are difficult to measure and can be introduced through inadequacies in the questionnaire, nonresponse, inaccurate reporting by respondents, errors in the application of survey procedures, incorrect recording of answers, and errors in data entry and processing. No measures of nonsampling error are presented in this publication; however, every effort is made to minimize their effect on the estimates. Data users should take into account both the measures of sampling error and the potential effects of nonsampling error when using these estimates.

More detailed descriptions of sampling and nonsampling errors for the 2002 CFS are provided in the following sections.

Sampling Error

Because the estimates are based on a sample, they are not expected to agree exactly with the results that would be obtained from a complete enumeration, using the same enumeration procedures, of all shipments made in 2002 by all establishments on the sampling frame. However, because probability sampling was used at each stage of selection, the sampling variability of the survey estimates can be estimated. For CFS estimates, sampling variability arises from each of the three stages of sampling. (See Appendix C for a description of the sample design.)

The particular sample used in this survey is one of a large number of samples of the same size that could have been selected using the same design. If all possible samples had been surveyed under the same conditions, an estimate of a population parameter of interest could have been obtained from each sample. These samples give rise to a distribution of estimates for the population parameter. A statistical measure of the variability among these estimates is the standard error, which can be approximated from any one sample. The standard error is defined as the square root of the variance. The coefficient of variation (or relative standard error) of an estimator is its standard error divided by the estimate. Measures of sampling variability, such as the standard error and coefficient of variation, are themselves estimated from the sample and are therefore also subject to sampling variability. (Strictly, we should refer to the estimated standard error or the estimated coefficient of variation of an estimator; for brevity, we omit this qualifier.) The standard error measures only sampling variability; it does not measure systematic biases in the estimates. The Census Bureau recommends that individuals using estimates contained in this report incorporate this information into their analyses, as sampling error could affect the conclusions drawn from these estimates.

An estimate from a particular sample and the standard error associated with the estimate can be used to construct a confidence interval. A confidence interval is a range around a given estimate that has a specified probability of containing the result of a complete enumeration of the sampling frame conducted under the same survey conditions. Associated with each interval is a percentage of confidence, which is interpreted as follows. If, for each possible sample, an estimate of a population parameter and its approximate standard error were obtained, then:

For approximately 90 percent of the possible samples, the interval from 1.645 standard errors below to 1.645 standard errors above the estimate would include the result as obtained from a complete enumeration of the sampling frame conducted under the same survey conditions.
For approximately 95 percent of the possible samples, the interval from 1.96 standard errors below to 1.96 standard errors above the estimate would include the result as obtained from a complete enumeration of the sampling frame conducted under the same survey conditions.

To illustrate the computation of a confidence interval for an estimate of total value of shipments, assume that an estimate of total value is $10,750 million and the coefficient of variation for this estimate is 1.8 percent, or 0.018. First obtain the standard error of the estimate by multiplying the value of shipments estimate by its coefficient of variation. For this example, multiply $10,750 million by 0.018. This yields a standard error of $193.5 million. The upper and lower bounds of the 90-percent confidence interval are computed as $10,750 million plus or minus 1.645 times $193.5 million. Consequently, the 90-percent confidence interval is $10,432 million to $11,068 million. If corresponding confidence intervals were constructed for all possible samples of the same size and design, approximately 9 out of 10 (90 percent) of these intervals would contain the result obtained from a complete enumeration.
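The arithmetic in this example can be reproduced with a short Python sketch. The function name and the lookup table are illustrative; only the 1.645 and 1.96 multipliers come from the text above. Amounts are in millions of dollars.

```python
# Normal-approximation multipliers for the confidence levels described above.
Z = {90: 1.645, 95: 1.96}


def confidence_interval(estimate, cv, level=90):
    """Return (standard error, lower bound, upper bound) for an estimate
    whose coefficient of variation is given as a fraction (1.8% -> 0.018)."""
    standard_error = estimate * cv          # SE = estimate x CV
    half_width = Z[level] * standard_error  # e.g., 1.645 standard errors
    return standard_error, estimate - half_width, estimate + half_width


se, low, high = confidence_interval(10_750, 0.018)
print(f"SE = {se:.1f}; 90% CI = {low:,.0f} to {high:,.0f}")
# SE = 193.5; 90% CI = 10,432 to 11,068
```

Passing `level=95` applies the 1.96 multiplier instead, producing the wider 95-percent interval.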

Nonsampling Error

Nonsampling error encompasses all other factors that contribute to the total error of a sample survey estimate and may also occur in censuses. It is often helpful to think of nonsampling error as arising from deficiencies or mistakes in the survey process. In the CFS, nonsampling error can be attributed to many sources: inability to obtain information about all units in the sample; response errors; differences in the interpretation of the questions; mistakes in coding or keying the data obtained; and other errors of collection, response, coverage, and processing. Although no direct measurement of the potential biases due to nonsampling error has been obtained, precautionary steps were taken in all phases of the collection, processing, and tabulation of the data in an effort to minimize their influence. The Census Bureau recommends that individuals using estimates in this report incorporate this information into their analyses, as nonsampling error could affect the conclusions drawn from these estimates.

A potential source of bias in the estimates is nonresponse. Nonresponse is defined as the inability to obtain all the intended measurements or responses from all units in the sample. Four levels of nonresponse can occur in the CFS: item, shipment, quarter (reporting week), and establishment. Item nonresponse occurs either when a question is unanswered or when the response to the question fails computer or analyst edits. Nonresponse to the shipment value or weight items is corrected by imputation, which is the procedure by which a missing value is replaced by a predicted value obtained from an appropriate model. (See Appendix C for a description of the imputation procedure.) Shipment, quarter, and establishment nonresponse describe the inability to obtain any of the substantive measurements about a sampled shipment, quarter, or establishment, respectively. Shipment and quarter nonresponse are corrected by reweighting. Reweighting allocates characteristics to the nonrespondents in proportion to the characteristics observed for the respondents. The amount of bias introduced by this nonresponse adjustment procedure depends on the extent to which the nonrespondents differ, characteristically, from the respondents. Establishment nonresponse is corrected during the estimation procedure by the industry-level adjustment weight. (See Appendix C for a description of the estimation procedure.) In most cases of establishment nonresponse, none of the four questionnaires has been returned to the Census Bureau despite several attempts to elicit a response. Approximately 63 percent of the establishments provided at least one quarter of data that contributed to tabulation.
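As a rough illustration of reweighting, the Python sketch below applies a generic weighting-class adjustment. Within each adjustment class, respondent weights are inflated by the ratio of the total weight of all sampled units to the total weight of respondents, and nonrespondents receive a weight of zero. This is a textbook form of the technique, not the CFS procedure itself (see Appendix C); the `Unit` record, its fields, and the class labels are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Unit:
    cls: str          # hypothetical adjustment-class label
    weight: float     # sampling weight before the nonresponse adjustment
    responded: bool   # True if the unit provided usable data


def reweight(units):
    """Return nonresponse-adjusted weights, aligned with `units`."""
    total = defaultdict(float)       # weight of all sampled units, per class
    respondent = defaultdict(float)  # weight of respondents only, per class
    for u in units:
        total[u.cls] += u.weight
        if u.responded:
            respondent[u.cls] += u.weight

    adjusted = []
    for u in units:
        if u.responded:
            # Inflate so respondents carry the whole class weight.
            adjusted.append(u.weight * total[u.cls] / respondent[u.cls])
        else:
            # Nonrespondents are represented through the respondents.
            adjusted.append(0.0)
    return adjusted
```

Because the adjustment preserves each class's total weight, the respondents in a class stand in for that class's nonrespondents. The resulting bias therefore depends on how much the nonrespondents differ from the respondents.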

Some possible sources of bias attributed to respondent-conducted sampling include misunderstanding the definition of a shipment, constructing an incomplete frame of shipments from which to sample, ordering the shipment sampling frame by selected shipment characteristics, and selecting shipment records by a method other than the one specified in the questionnaire's instructions. We often contact respondents who reported shipments having an atypically large value or weight compared with the rest of their reported shipments. Upon contact, if we are able to collect information on all of a given respondent's large shipments made either for a particular reporting week or for the entire quarter, then we identify these large shipments as certainty shipments. (See Appendix C for a description of how certainty shipments are used in the estimation process.)

DEFINITION OF TERMS

Confidentiality

Title 13 of the United States Code authorizes the Census Bureau to conduct censuses and surveys. Section 9 of the same Title requires that any information collected from the public under the authority of Title 13 be maintained as confidential. Section 214 of Title 13 and Sections 3559 and 3571 of Title 18 of the United States Code provide for the imposition of penalties of up to 5 years in prison and up to $250,000 in fines for wrongful disclosure of confidential census information. In accordance with Title 13, no estimates are published that would disclose the operations of an individual firm.

The Census Bureau's internal Disclosure Review Board sets the confidentiality rules for all data releases. A checklist approach is used to ensure that all potential risks to the confidentiality of the data are considered and addressed.

Disclosure Limitation

Disclosure is the release of data that have been deemed confidential. It generally reveals information about a specific individual or establishment or permits deduction of sensitive information about a particular individual or establishment. Disclosure limitation is the process used to protect the confidentiality of the survey data provided by an individual or firm. Using disclosure limitation procedures, the Census Bureau modifies or removes the characteristics that put confidential information at risk for disclosure. Although it may appear that a table shows information about a specific individual or business, the Census Bureau has taken steps to disguise or suppress the original data while making sure the results are still useful. The techniques used by the Census Bureau to protect confidentiality in tabulations vary, depending on the type of data.

Unpublished Estimates

Some unpublished estimates can be derived directly from this report by subtracting published estimates from their respective totals. However, the estimates obtained by such subtraction would be subject to poor response, high sampling variability, or other factors that may make them potentially misleading.

Individuals who use estimates in this report to create new estimates should cite the Census Bureau as the source of only the original estimates.
