big data, statistics

Disciples of enumeration, beware

This kind of analysis doesn’t just end arguments it buries them and salts the earth —unless you are prepared to raise the stakes with your own Big Data-mining operation.

Trevor Butterworth, The Awl

In a tour de force on the opportunities and challenges of big data Butterworth apparently demolishes the idea of small sample data analysis or (more questionable?) the use of anecdotes and thoughtfulness to argue points of contrversy. But finding correlations in massive amounts of data doesn’t mean that the difficulty of finding causality — what’s really going on — has disappeared. It doesn’t mean we abandon anecdote and argument and thoughtful explanation. It means only that we can calculate correlations on bigger data sets. Sometimes — Angrist-and-Pischke style — we can do something akin to experiment. Still, such efforts require much more than mere counting, more than mere enumeration.

His first example, gender bias in the media:

Pre-Big Crit, you might have had pundits setting the air on fire with a mixture of anecdote and data; or a thoughtful article in The Atlantic or The Economist or Slate, reflecting a mixture of anecdote, academic observation and maybe a survey or two; or, if you were lucky, a content analysis of the media which looked for gender bias in several hundred or even several thousand news stories, and took a lot of time, effort, and money to undertake, and which—providing its methodology is good and its sample representative—might be able to give us a best possible answer within the bounds of human effort and timeliness.

The Bristol-Cardiff team, on the other hand, looked at 2,490,429 stories from 498 English language publications over 10 months in 2010. Not literally looked at—that would have taken them, cumulatively, 9.47 years, assuming they could have read and calculated the gender ratios for each story in just two minutes; instead, after ten months assembling the database, answering this question took about two hours. And yes, the media is testosterone fueled, with men dominating as subjects and sources in practically every topic analyzed from sports to science, politics to even reports about the weather. The closest women get to an equal narrative footing with men is—surprise—fashion. Closest. The malestream media couldn’t even grant women tokenistic majority status in fashion reporting. If HBO were to do a sitcom about the voices of this generation that reflected just who had the power to speak, it would, after aggregation, be called “Boys.”

How is this useful analysis, that news stories are more likely to be about men than about women? And how is this evidence of gender bias in news stories? There is only gender bias here if the actual news had been unfairly represented by the stories — if somehow women made as much news as men. But yet we know that women don’t for a myriad of reasons. Women are busy with family. Women don’t face the same opportunity structures as men. Women face bias in the professional and political worlds. The presence of a lopsided gender ratio in magazine and news stories does not necessarily point to gender bias in journalism.

It’s just not that easy to tease truth out of numbers. I hate to restate a cliche everyone should already know and which is too often stated uncritically, but I will anyway. Correlation is not causation.

Patterns are not truth.

Big data does not, in fact, allow us to answer really big questions because most really big questions are questions about causality: Do women face unfair bias — is their unequal representation the result of bias apart from real world factors that would otherwise tend to reduce their representation (and in what context, what country, what career?) Or from my current job context — does social engagement improve academic outcomes (in what context, in what country, in what courses, in what classroom)? Big data is not so useful in answering such questions. Mere correlation in a specific context doesn’t tell you much. Broad-scale big data correlation, even less.

Here’s an example of a study that demonstrated the clear presence of gender bias. Merely changing the gender of a name from male to female  on a resume led to lower rankings on hireability, competency, and mentoring. No big data required. This is essentially experimental — everything was held constant except the gender of the applicants’ name. Big data doesn’t make experiments that control for outside factors more likely. It may reduce their use if it makes us think that big data has something more to tell us than small-data experimentation.

To elaborate on that example from my current job: since students who post to discussion threads show better grades (let’s say they are more “engaged”), will increasing discussion thread posts improve grades? Maybe – but only in very limited contexts, where discussion thread participation actually improves students’ ability to make sense of content, produce better work, spend more time in class, and ultimately do better in class. In most cases, there is only correlation, not causation. Better students post more. They are more conscientious. They are already more engaged. You can make more discussion threads mandatory but I’m skeptical that will improve outcomes.

We can count and then we can correlate counts but to make sense of those associations requires more work. It requires understanding, explanation, context, sometimes anecdote — and ideally experimentation too.

Anyway, I loved Butterworth’s article — one of the best I’ve read recently about big data. All the caveats about the perils of pure counting without context arrive later in the piece: Goodbye, Anecdotes! The Age Of Big Data Demands Real Criticism.