How to lie with statistics

Harold Black, PhD

President Trump has said repeatedly that "We lose $800 billion a year on trade." Excuse me? If that were the case, we would not trade. Apparently, the president is equating a trade deficit with a loss. Others would opine that we are actually better off having that trade deficit. Moreover, the president has stated that we are running a trade deficit with Canada. However, Canadian data show that the US is running a trade surplus with Canada. So are trade deficits good or bad? It depends. A search of the web will turn up evidence that supports both views. The same is true of global warming. Search "is the earth warming?" and you will find scientific evidence that both affirms and denies it.

Yet many of today's opinions rest on evidence from one side of the ledger while ignoring or dismissing evidence to the contrary. The results affect policy decisions, legislation, business activity, employment and public opinion. The rational, unbiased observer would insist that the evidence presented be sound. Unfortunately, that may not be the case. The evidence may be tainted: the result of bad data, bad statistical analysis, bad statistical model construction, an incorrect time period and a host of other problems. Academic journals are full of articles with contradictory conclusions. Few of these articles attempt to replicate one another. Rather, they use different time periods and different datasets, and employ different statistical methods. For example, I published an article confirming competing hypotheses using the same model and the same dataset but two different time periods. Policymakers would have pursued opposing actions depending on which set of results they adopted.
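The time-period problem is easy to demonstrate. Here is a minimal sketch using synthetic data (not the dataset from my article): the true relationship between x and y flips sign halfway through the sample, so the identical least-squares model estimated on the two sub-periods supports opposite conclusions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "time series" of 200 observations: the true slope of
# y on x is +1 in the first half of the sample and -1 in the second.
n = 200
x = rng.normal(size=n)
noise = rng.normal(scale=0.1, size=n)
y = np.where(np.arange(n) < n // 2, x, -x) + noise

def ols_slope(x, y):
    """Slope coefficient from a simple least-squares fit of y on x."""
    return np.polyfit(x, y, 1)[0]

slope_early = ols_slope(x[:n // 2], y[:n // 2])
slope_late = ols_slope(x[n // 2:], y[n // 2:])

# Same model, same dataset, different time periods, opposite signs.
print(f"first-half slope:  {slope_early:+.2f}")
print(f"second-half slope: {slope_late:+.2f}")
```

A researcher who reports only one sub-period can honestly claim a strongly positive effect; a rival who reports the other can claim a strongly negative one, and both regressions are "correct."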

One would hope that the researchers were honest. Yet that is not always the case. When an economics journal once asked authors of accepted articles to submit their datasets so the results could be replicated, over half the articles were withdrawn and submissions plummeted. The Wall Street Journal ran an article, "How Bad Is the Government's Science?", stating that over half of all research results appearing in academic journals are probably wrong, and that in one field independent researchers were able to replicate only 38 of 100 prominent articles.

Some empirical scientists have questioned the studies employed by government agencies such as the EPA and the Consumer Financial Protection Bureau, studies that have resulted in regulations affecting everything from power plant emissions to the types of loans available to consumers. Scott Pruitt, the EPA administrator, recently proposed a rule that would exclude research whose data were not made available to the public and whose results could not be replicated. The howls from the academic community were loud and indignant: Pruitt's action, they said, would endanger public health and not keep us safe.

Yet the fact is that researchers have an incentive to cheat. At many federal agencies, researchers are motivated to produce results consistent with their bosses' point of view. I was involved in a high-profile case in which the government accused a large bank of lending discrimination. Using the same database, I demonstrated that the government's results were bogus. The government withdrew its suit. In academia, incentive systems are driven by publishable results. If the findings of research are deemed unpublishable, the temptation is to change them in order to garner more status at the university, or research grants from the government or foundations. I have thus become skeptical of academic results and of those generated by government researchers.

Scott Pruitt is to be applauded. Maybe now EPA rule making will be based on legitimate rather than junk science.