April 15, 2024

Economix Blog: A History of Oopsies in Economic Studies



As my colleague Annie Lowrey has been covering, a frenzy has erupted over errors in an influential paper by Carmen Reinhart and Kenneth Rogoff. Probably the most embarrassing mistake was an error in a simple formula in an Excel spreadsheet.

This is hardly the first time that a big, splashy economics paper — one that appeared in an elite, peer-reviewed journal, no less — has had an embarrassing error. After all, everyone makes mistakes, including tenured economics professors. From what I’ve seen, and from my discussion with academics, it seems that the mistakes rarely result from deliberate fabrication. Typically the flubs seem to result from human error, with the blame often placed (though rarely on the record) on the poor research assistant who did the grunt work. Which sounds like a terrible cop-out, but is actually credible; economists believe in comparative advantage and so often leave tasks like data entry, coding or simple regressions to undergrads or grad students.

There is a middle ground between innocent error and wholesale fraud, of course, including interpreting an ambiguous data point or result in a way that is favorable to your thesis — something not unique to economics or even the academic world.

Here is some of the prominent research whose data was questioned in recent years:

  • Emily Oster wrote her dissertation, published in 2005, about how hepatitis B skewed sex ratios at birth and was therefore responsible for the “missing women” in Asia, contra explanations from development economists like Amartya Sen. Later she published a separate paper, using different data, that rebutted her original thesis. The economist Steven Levitt praised her for publicly admitting the paper that made her famous was wrong.
  • One of Caroline Hoxby’s most cited papers, in 2000, argued that having more school choice — which in this case basically meant more school districts — improved the quality of schools. The work used a clever proxy for the historical number of school districts: the number of streams in an area, since streams are natural boundaries around which school districts have historically been formed. Later another economist, Jesse Rothstein, wrote a paper saying he could not replicate her results, and attributed the discrepancy to how the original paper categorized what counted as a stream. A quotation from the paper: “Where Hoxby reports five larger streams in Fort Lauderdale, I counted 12, and a research assistant — working independently — counted 15.” The brouhaha over how to define a stream made it to the national press, and involved accusations of racism and sexism (Professor Hoxby is black and female, in a discipline that is still predominantly male and white).

And those are just a few prominent cases in top academic journals. How do questionable if not clearly erroneous findings make it past the gatekeepers, given the rigorous, onerous, ridiculously long peer-review process?

For the most part, economics journals do not ask anonymous peer reviewers, known as referees, to replicate results fully. The referees are there chiefly to weigh in on things like: Is the question being asked an interesting one, and is it being answered in the smartest way possible? Did the author use the right data, controls and statistical tools available? Is there other research related to this topic that the author should be considering? What additional robustness checks should the author do? Rarely are the referees fact-checking arithmetic and Excel spreadsheet formulas.

That said, in response to controversies over mistakes, coding disputes and academics under siege who are protective of their data, some top journals like The American Economic Review now require authors to submit data and code “sufficient to permit replication” when sending in a paper. Sometimes the underlying numbers and code must be made publicly available. But there are exceptions, as sometimes the data are proprietary or very sensitive. Researchers have to get special permission to use the highly coveted Social Security records data, for example, and have to agree not to share the numbers with unauthorized parties.

Article source: http://economix.blogs.nytimes.com/2013/04/17/a-history-of-oopsies-in-economic-studies/?partner=rss&emc=rss

Microsoft Renews Relevance With Machine Learning Technology

Eric Horvitz remained at M.S.R., as Microsoft’s advanced research arm is known, for the fast computers and the chance to work with a growing team of big brains interested in cutting-edge research. His goal was to build predictive software that could get continually smarter.

In a few months, Mr. Horvitz, 54, may get his long-awaited payoff: the advanced computing technologies he has spent decades working on are being incorporated into numerous Microsoft products.

Next year’s version of the Excel spreadsheet program, part of the Office suite of software, will be able to comb very large amounts of data. For example, it could scan 12 million Twitter posts and create charts to show which Oscar nominee was getting the most buzz.

A new version of Outlook, the e-mail program, is being tested that employs Mr. Horvitz’s machine-learning specialty to review users’ e-mail habits. It may be able to predict whether a user will want to read each incoming message.

Elsewhere, Microsoft’s machine-learning software will crawl internal corporate computer systems much the way the company’s Bing search engine crawls the Internet looking for Web sites and the links among them. The idea is to predict which software applications are most likely to fail when seemingly unrelated programs are tweaked.

If its new products work as advertised, Microsoft will find itself in a position it has not occupied for the last few years: relevant to where technology is going.

While researchers at M.S.R. helped develop Bing to compete with Google, the unit was widely viewed as a pretty playground where Bill Gates had indulged his flights of fancy. Now, it is beginning to put Microsoft close to the center of a number of new businesses, like algorithm stores and speech recognition services. “We have more data in many ways than Google,” said Qi Lu, who oversees search, online advertising and the MSN portal at Microsoft.

M.S.R. owes its increased prominence as much to the transformation of the computing industry as to its own hard work. The explosion of data from sensors, connected devices and powerful cloud computing centers has created the Big Data industry. Computers are needed to find patterns in the mountains of data produced each day.

“Everything in the world is generating data,” said David Smith, a senior analyst with Gartner, a technology research firm. “Microsoft has so many points of presence, with Windows, Internet Explorer, Skype, Bing and other things, that they could do a lot. Analyzing vast amounts of data could be a big business for them.”

Microsoft is hardly alone among old-line tech companies in injecting Big Data into its products. Later this year, Hewlett-Packard will showcase printers that connect to the Internet and store documents, which can later be searched for new information. I.B.M. has hired more than 400 mathematicians and statisticians to augment its software and consulting. Oracle and SAP, two of the largest suppliers of software to businesses, have their own machine-learning efforts.

In the long term, Microsoft hopes to combine even more machine learning with its cloud computing system, called Azure, to rent out data sets and algorithms so businesses can build their own prediction engines. The hope is that Microsoft may eventually sell services created by software, in addition to the software itself.

“Azure is a real threat to Amazon Web Services, Google and other cloud companies because of its installed base,” said Anthony Goldbloom, the founder of Kaggle, a predictive analytics company. “They have data from places like Bing and Xbox, and in Excel they have the world’s most widely used analysis software.”

Like other giants, Microsoft also has something that start-ups like Kaggle do not: immense amounts of money — $67 billion in cash and short-term investments at the end of the last quarter — and the ability to work for 10 years, or even 20, on a big project.

It has been a long trip for Microsoft researchers. M.S.R. employs 850 Ph.D.’s in 13 labs around the world. They work in more than 55 areas of computing, including algorithm theory, cryptography and computational biology.

Machine learning involves computers deriving meaning and making predictions from things like language, intentions and behavior. When search engines like Google or Bing offer “did you mean?” alternatives to a misspelled query, they are employing machine learning. Mr. Horvitz, now a distinguished scientist at M.S.R., uses machine learning to analyze 25,000 variables and predict hospital patients’ readmission risk. He has also used it to deduce the likelihood of traffic jams on a holiday when rain is expected.

Mr. Horvitz started making prototypes of the Outlook assistant about 15 years ago. He keeps digital records of every e-mail, appointment and phone call so the software can learn when his meetings might run long, or which message he should answer first.

“Major shifts depend on incremental changes,” he said.

At a retreat in March, 100 top Microsoft executives were told to think of new ways that machine learning could be used in their businesses.

“It’s exciting when the sales and marketing divisions start pulling harder than we can deliver,” Mr. Horvitz said. “Magic in the first go-round becomes expectation in the next.”

Article source: http://www.nytimes.com/2012/10/30/technology/microsoft-renews-relevance-with-machine-learning-technology.html?partner=rss&emc=rss