A Page One story by Washington Post staff writer David A. Fahrenthold says carbon dioxide emissions in the Washington, DC, area increased 13.4% from 2001 to 2005.
The article clearly is intended to influence public policy. But there are significant problems with this estimate that are not disclosed in the article. The federal Information Quality Act does not apply to the Washington Post, but it would apply to any federal agency that attempted either to take action based on these figures or even to report them in a manner suggesting that it thought they were valid. (Congress is exempt from the statutory requirement to disseminate only scientific and statistical data that meet applicable information quality standards. Unlike Executive branch agencies, of course, Congress is never regarded as an authoritative body for scientific or statistical information.)
Below we compare the data reported by Fahrenthold with the information quality standards that apply to federal agencies.
Here’s what Fahrenthold tells us about how he derived his estimate:
The Post estimate began with data on miles traveled by cars and trucks in local jurisdictions and the amount of kilowatt hours used by utility customers.
Then, using methods from the U.S. Energy Information Administration, those figures were used to calculate the total amount of carbon dioxide emitted from vehicles and power-plant smokestacks. [See the chart for details.]
The figures from those calculations leave out greenhouse gases from other sources, such as agriculture, planes, boats and oil furnaces. Those missing figures could account for half of all emissions.
The chart referred to in square brackets above is found only in the print edition and is titled “The Rapid Rise of Emissions.” It contains the following reported data, but in graphical form:
“The Rapid Rise of Emissions”
(“The rate of increase was calculated by The Washington Post
using data from governments, environmental groups and electric utilities”)
|Cars and Trucks*||Electricity Use**||Both Sources Combined|
* Arlington County not included in Virginia Suburbs.
** Frederick County not included in Maryland Suburbs. Only partial data available for Stafford, Fauquier, Calvert, Montgomery and Prince George’s counties.
SOURCE: Staff reporting
These data do not adhere to the minimum information quality standards that would apply if they had been disseminated by the federal government.
TRANSPARENCY AND REPRODUCIBILITY
Federal information quality guidelines require government agencies to practice transparency and reproducibility when they disseminate statistical information. Transparency means fully revealing all sources and methods. Reproducibility means providing enough information that a qualified third party would obtain essentially the same answer. The Post’s data do not satisfy either of these requirements.
The Post’s choice of data is not transparent, and Fahrenthold only hints at his sources. At least one of his acknowledged sources, “environmental groups,” has a policy interest in maximizing the reported percentage increase in CO2 emissions. It is possible that these groups did not bias their data in accordance with that policy interest. However, Fahrenthold does not inform readers of this potential conflict of interest, nor does he reveal whether the Post performed due diligence to check the validity and reliability of their data. It appears that the Post simply accepted their data without question.
The Post acknowledges that its data are incomplete in two ways: first, by not counting all emissions from categories that it included, and second, by excluding source categories. When data are incomplete, inferences about them should be made with caution. Instead, the Post mentions these defects but draws inferences as if they were minor.
With regard to its analytic methods, the Post also reveals nothing of importance. Presumably, the Post performed a simple subtraction of 2001 from 2005 values and assumed the resulting difference to be an unbiased estimate. An unbiased estimate is one that is just as likely to overestimate the true but unknown value as to underestimate it. But simple subtraction yields an unbiased estimate of the difference only under certain restrictive conditions, including:
- All definitions must be identical for 2001 and 2005. Any change in definitions means that the data are not comparable across years, and the result of subtraction is uninterpretable. Apples cannot be subtracted from oranges.
- Data that were missing in each year must be missing from both years. Counties partially counted or missing in 2001 must be either missing or excluded in 2005, and vice versa. Where coverage was partial in 2001, it must be identically partial in 2005.
- The methods used to estimate values for 2001 must be the same methods used for estimating values for 2005. Any change in methods implies an explainable discrepancy in the reported difference.
These conditions might apply, but we don’t know because the Post did not reveal its sources and methods.
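To see why the coverage condition matters, consider a minimal sketch. The county names and emission figures below are entirely hypothetical (they are not the Post's data or anyone else's), and the helper names are ours. The sketch only shows that when a county is missing from the base year but counted in the end year, simple subtraction overstates the increase.

```python
def percent_change(start, end):
    """Percentage change from start to end."""
    return (end - start) / start * 100.0


def total(emissions, counties):
    """Sum emissions over the listed counties only."""
    return sum(emissions[c] for c in counties)


# Hypothetical emissions in arbitrary units; NOT real data.
emissions_2001 = {"County A": 100.0, "County B": 50.0, "County C": 30.0}
emissions_2005 = {"County A": 105.0, "County B": 52.0, "County C": 31.0}

all_counties = ["County A", "County B", "County C"]
partial_2001 = ["County A", "County B"]  # County C missing in 2001 only

# Same coverage in both years: a like-for-like comparison.
consistent = percent_change(
    total(emissions_2001, all_counties), total(emissions_2005, all_counties)
)

# Mismatched coverage: County C is counted in 2005 but not in 2001.
inconsistent = percent_change(
    total(emissions_2001, partial_2001), total(emissions_2005, all_counties)
)

print(f"consistent coverage:   {consistent:.1f}%")
print(f"inconsistent coverage: {inconsistent:.1f}%")
```

In this made-up case every county grows by 5% or less, yet the mismatched comparison reports an increase above 25%. Without disclosure of coverage in each year, a reader cannot rule out an effect of this kind.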
This leads to the Post’s second procedural failure. The Post’s calculations are not reproducible by a qualified independent third party. Fahrenthold reports that “Jonathan Cogan, a spokesman for the [Department of Energy’s] Energy Information Administration reviewed The Post’s calculations and said the agency’s formulas appeared to have been used correctly.” The extent of this external review is unclear — was it limited to fidelity to EIA formulae, or did it also include a review of the Post’s input data? (By responding to the Post’s request, Cogan put EIA in the position of violating the spirit of the law by implicitly conveying its endorsement. He did not violate the letter of the law because statements made by agency spokesmen are exempt.)
The depth of Cogan’s review notwithstanding, the reproducibility requirement in federal information quality standards can’t be satisfied by reliance on a hand-picked third party. The requirement can be satisfied only by disclosure.
Federal information quality guidelines require federal agencies to ensure that statistical information intended to influence policy be objective:
Substantive objectivity means that information must be “accurate, reliable, and unbiased.” “In a scientific, financial, or statistical context, the original and supporting data shall be generated, and the analytic results shall be developed, using sound statistical and research methods.”
Presentational objectivity means that information must be “presented in an accurate, clear, complete, and unbiased manner,” including “within a proper context” that may include “other information” necessary “to ensure an accurate, clear, complete, and unbiased presentation,” including sources and supporting data and models “so that the public can assess for itself whether there may be some reason to question the objectivity of the sources.”
We’ve already documented why the Post’s estimates are unlikely to be substantively objective. If a federal agency disseminated statistical information this way, it would be presumptively in violation of the law. So we’ll focus on presentational objectivity, which applies even if substantive objectivity is assured.
- Excess precision
An elementary principle of information quality is to present quantitative measurements or estimates at a level of precision consistent with that of the measurement instruments and analytic tools. In this case, Fahrenthold presents estimates of percentage change with three significant figures, with the last digit measuring tenths of a percentage point. This implies that Fahrenthold’s estimate of the percentage change in CO2 emissions is accurate to within 0.05 percentage points. Given just the acknowledged missing data, that level of precision is technically infeasible; he would be fortunate if his first digit were significant. But by using three significant digits, Fahrenthold falsely implies that he knows much more about CO2 emissions, and their changes over time, than is justified by his data. Presentational objectivity is never served by misleading the users of information about its precision, even when the information is accurate.
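A rough sketch of the precision problem follows. The first function computes the interval implied by reporting a figure to one decimal place. The second blends the reported change with a change for the omitted sources. The 50% share comes from the Post's own statement that missing sources "could account for half of all emissions"; treating that as a base-year share, and the growth rates assumed for the omitted sources, are our assumptions for illustration only.

```python
def implied_interval(reported, decimals):
    """Interval implied by rounding a value to the given number of decimals."""
    half_width = 0.5 * 10 ** (-decimals)
    return reported - half_width, reported + half_width


def blended_change(covered_change, missing_share, missing_change):
    """Overall percentage change when a base-year share of emissions
    (missing_share) changed by missing_change instead of covered_change."""
    return (1 - missing_share) * covered_change + missing_share * missing_change


REPORTED = 13.4       # percentage change reported by the Post
MISSING_SHARE = 0.5   # "could account for half of all emissions"

low, high = implied_interval(REPORTED, 1)
print(f"implied by one decimal place: {low:.2f}% to {high:.2f}%")

# Hypothetical growth rates for the omitted sources; NOT measured values.
scenarios = {g: blended_change(REPORTED, MISSING_SHARE, g) for g in (0.0, 13.4, 25.0)}
for growth, overall in scenarios.items():
    print(f"omitted sources change {growth:5.1f}% -> overall change {overall:.1f}%")
```

Under these assumed scenarios the overall change runs from about 7% to about 19%, a spread more than a hundred times wider than the interval that one decimal place implies.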
- Invalid baseline
It’s unclear by how much, if any, CO2 emissions actually rose because Fahrenthold chose a problematic baseline. The year 2001 was unusual in many respects, most notably a weak recession and the coordinated terrorist attacks of September 11. The average annual change in CO2 emissions likely would be different — and in particular, smaller — if Fahrenthold had chosen as a baseline a comparable date in the previous business cycle.
- Invalid comparisons
It’s also unclear what to make of estimates for the Virginia Suburbs that exclude Arlington County, the jurisdiction closest to the District of Columbia. This difficulty is exacerbated by missing data from two exurban Virginia counties (Stafford and Fauquier). Arlington, Stafford and Fauquier counties represent 9%, 5% and 3%, respectively, of the estimated 2005 population of the Virginia Suburbs. That is, data are incomplete or excluded with respect to 17% of the suburban Virginia population.
Figures for the Maryland Suburbs are even more problematic. Fahrenthold reports that there are data missing from Montgomery and Prince George’s counties, and he excludes Frederick County. These counties represent 33%, 30% and 3%, respectively, of the Maryland Suburbs. Data are incomplete or excluded with respect to 66% of the suburban Maryland population.
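The coverage gaps can be tallied directly from the population shares quoted above. This is only a bookkeeping sketch; the variable names are ours, and the shares are the rounded percentages given in the text.

```python
# Population shares (percent of each suburban area) for counties whose
# data the Post excluded or reported only partially, as quoted in the text.
virginia_gaps = {"Arlington": 9, "Stafford": 5, "Fauquier": 3}
maryland_gaps = {"Montgomery": 33, "Prince George's": 30, "Frederick": 3}

virginia_total = sum(virginia_gaps.values())
maryland_total = sum(maryland_gaps.values())

print(f"Virginia Suburbs with incomplete or excluded data: {virginia_total}%")
print(f"Maryland Suburbs with incomplete or excluded data: {maryland_total}%")
```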
Howard County, located midway between Washington and Baltimore, is also excluded by the Post. Had Howard County been included, the population for the Maryland Suburbs would have been about 10% greater.
- Invalid inferences from the data
Fahrenthold reports that District of Columbia officials took credit for their apparently lower rate of increase in CO2 emissions:
The brightest news came from the District, where emissions grew 6.7 percent. D.C. officials said they think the relatively low increase is partly a sign of changing behavior: Residents were leaving their cars at home and walking, biking or taking public transit.
But Fahrenthold did not point out that DC’s population had declined about 4% during this period, whereas the population of Suburban Virginia and Suburban Maryland increased about 11% and 10%, respectively. Adjusting for DC’s population decline, Fahrenthold’s figures, if true, would mean DC’s CO2 emissions rose 11% per capita.
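The per capita adjustment is a ratio of growth factors. The sketch below uses the Post's 6.7% figure for the District and the approximate 4% population decline cited above; the function name is ours, and the result is only as good as those two inputs.

```python
def per_capita_change(emissions_change_pct, population_change_pct):
    """Percentage change in emissions per capita, given the percentage
    changes in total emissions and in population over the same period."""
    emissions_factor = 1 + emissions_change_pct / 100.0
    population_factor = 1 + population_change_pct / 100.0
    return (emissions_factor / population_factor - 1) * 100.0


# District of Columbia: emissions up 6.7%, population down about 4%.
dc_per_capita = per_capita_change(6.7, -4.0)
print(f"DC per capita change: {dc_per_capita:.1f}%")
```

Dividing 1.067 by 0.96 gives a growth factor of about 1.11, which is the roughly 11% per capita increase stated above.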
Indeed, the entire picture changes when population changes are taken into account. When Fahrenthold’s (unverified) estimates of percentage changes in CO2 emissions from 2001 to 2005 are divided by the Census Bureau’s (validated) estimates of population changes from 2000 to 2005, DC’s performance is the worst in the region rather than the best:
How Adjusting for Population Changes the Washington Post’s Estimates
|Jurisdictions||Percentage Change in CO2 Emissions Reported by the Washington Post||Percentage Change in CO2 Emissions Reported by the Washington Post Adjusted for Population Changes|
To be clear, we hesitate to draw any inferences from Fahrenthold’s data. We doubt they are useful for any public policy purpose. Most importantly, his inferences about both the absolute change in CO2 emissions in the Washington metropolitan area and his comparisons across jurisdictions are unsupported by his own data.
- Invalid inferences beyond the data
The primary message of Fahrenthold’s article is that CO2 emissions in the Washington metropolitan area are “rapidly rising.” But Fahrenthold reports data from just two dates. Even if these data were accurate to three significant figures, it would be technically impossible to discern acceleration. The most that Fahrenthold could legitimately report is the average annual change.
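With only two observations, the single quantity that can be computed is an average rate of change. The sketch below derives it from the Post's regional figure of 13.4% over the four years from 2001 to 2005, both as a simple average and as a compound annual rate. The function names are ours. Measuring acceleration would require at least three observations, because it is a change in the rate of change.

```python
def simple_annual_change(total_change_pct, years):
    """Total percentage change spread evenly across the years."""
    return total_change_pct / years


def compound_annual_change(total_change_pct, years):
    """Constant yearly percentage change that compounds to the total."""
    growth_factor = 1 + total_change_pct / 100.0
    return (growth_factor ** (1.0 / years) - 1) * 100.0


TOTAL_CHANGE = 13.4   # percent, as reported by the Post
YEARS = 2005 - 2001   # four annual intervals

simple = simple_annual_change(TOTAL_CHANGE, YEARS)
compound = compound_annual_change(TOTAL_CHANGE, YEARS)

print(f"simple average:   {simple:.2f}% per year")
print(f"compound average: {compound:.2f}% per year")
```

Either way the answer is a little over 3% per year, and nothing in two data points indicates whether that rate was rising, falling or steady within the period.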
- Information quality defects lead others to draw invalid inferences
Information quality principles matter for many reasons, but one key reason is that when poor quality information is disseminated, others are led to draw invalid inferences. These invalid inferences often find their way into public policy unless they are successfully corrected before decisions are made.
A plausible explanation for the invalid inferences made by the anonymous DC government officials cited by Fahrenthold is that Fahrenthold himself premised his request for a reaction on invalid inferences about the data. When pressed for a reaction, public officials may offer answers that are consistent with other data at their disposal. Alternatively, they may give an explanation that is either self-serving or what they think the reporter wants to hear. (Sometimes these are the same thing.) It’s possible that DC officials have data supporting their suggestion that DC’s allegedly lower rate of increase in CO2 emissions is a “sign of changing behavior.” But it’s more plausible that they didn’t want to attribute the lower rate to a decline in the District’s population, a fact with which they would be familiar but which would not be interpreted favorably by a reporter whose narrative is that regional CO2 emissions are “rapidly rising.”
Similarly, Frank O’Donnell’s claim that “sprawl is causing a big increase in greenhouse gases” is most plausibly related to the public policy positions he and his organization advocate. Because they are opposed to what they call “suburban sprawl,” sprawl is a convenient inference from Fahrenthold’s data that also fits the reporter’s likely narrative.
If sprawl were actually the culprit, then one would expect to find that commuting times are significantly higher for jurisdictions farther away from the District. The available data don’t support that inference. Average commute times reported by the Census Bureau are not nearly as different across the region as one would expect if sprawl were the underlying cause of rising CO2 emissions. For Virginia, average commute times vary from 27.3 minutes (Arlington County) to 37.7 minutes (Stafford County). But Arlington is located adjacent to the District and Stafford is about 45 miles southwest. A 10-minute difference in average commuting time seems much less than one would expect if proximity to the District reduced CO2 emissions from commuting. For Maryland the range is 29.2 minutes (St. Mary’s County) to 39.8 minutes (Calvert County), again a range of just 10 minutes.
Indeed, the average commuting time for residents of the District was almost 30 minutes in 2000. The higher population density of the District apparently does not translate into a significantly reduced commute. When DC’s figure is treated as a baseline and subtracted from the averages for the other jurisdictions, the range in net average commuting times in Virginia becomes -2.4 to 8, and the range in Maryland becomes -0.5 to 10.1. People in the Washington metropolitan area don’t all work in the District, and they choose places to live based on many criteria other than the length of their commute. But their average commute is remarkably stable irrespective of where they live.
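The net commuting times can be reproduced with a one-line subtraction. The county averages below are the ones quoted above. The DC baseline of 29.7 minutes is our assumption: the text says only "almost 30 minutes," and 29.7 is the value that reproduces the stated net ranges.

```python
# Assumed DC baseline in minutes, inferred from the net ranges in the text.
DC_BASELINE = 29.7

# Average commute times in minutes, as quoted from Census Bureau figures.
commute_minutes = {
    "Arlington County, VA": 27.3,
    "Stafford County, VA": 37.7,
    "St. Mary's County, MD": 29.2,
    "Calvert County, MD": 39.8,
}

# Net commute time relative to the District, rounded to one decimal place.
net_minutes = {
    county: round(minutes - DC_BASELINE, 1)
    for county, minutes in commute_minutes.items()
}

for county, net in net_minutes.items():
    print(f"{county}: {net:+.1f} minutes relative to DC")
```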
Of all the errors in Fahrenthold’s story, surely the most pernicious is the claim that CO2 emissions are “rising rapidly.” As we’ve already noted, a rate of acceleration cannot be discerned from two static observations. But this narrative is clearly an appealing one for those who are predisposed to believe that “the problem” of anthropogenic global climate change is “getting worse.” This narrative is often expressed by Post reporters and the newspaper’s editorial board. The Post should make a diligent effort to understand information quality principles and apply them to the newspaper’s work products, especially when a story appears to conform to the revealed biases of its reporters and editors.
% Change CO2 (see note 4)
|Falls Church City||10,377||10,781||3.9%||26.4||6.7||—|
|Manassas Park City||10,290||11,622||12.9%||35.6||5.9||—|
|Prince William County||280,813||348,588||24.1%||36.9||7.2||—|
|Anne Arundel County||489,656||510,878||4.3%||28.9||0.1||—|
|Prince George’s County||801,515||846,123||5.7%||35.9||6.2||—|
|St. Mary’s County||86,211||96,518||11.9%||29.2||-0.5||—|
1 Estimated by Census Bureau; see data quality note.
2 Estimated by Census Bureau; see data quality note.
3 Estimated by Census Bureau; see data quality note.
4 Estimated by the Washington Post; no data quality disclosed.