analytics, big data, statistics

Data science, Gladwell-style

Does Malcolm Gladwell’s brand of storytelling have any lessons for data scientists? Or is it unscientific pop-sci pablum?

Gladwell specializes in uncovering exciting and surprising regularities about the world — you don’t need to reach a lot of people to spread your ideas (The Tipping Point), your intuition wields more power than you imagined (Blink), and success depends on historical or other accident as much as individual talent (Outliers).

Gladwell’s new book David and Goliath promises to “reshape the way we think of the world around us,” according to the publisher. But Gladwell’s approach makes some empiricists cringe:

[Gladwell] excels at telling just-so stories and cherry-picking science to back them. In “The Tipping Point” (2000), he enthused about a study that showed facial expressions to be such powerful subliminal persuaders that ABC News anchor Peter Jennings made people vote for Ronald Reagan in 1984 just by smiling more when he reported on him than when he reported on his opponent, Walter Mondale. In “Blink” (2005), Mr. Gladwell wrote that a psychologist with a “love lab” could watch married couples interact for just 15 minutes and predict with shocking accuracy whether they would divorce within 15 years. In neither case was there rigorous evidence for such claims. [Christopher Chabris, The Wall Street Journal]

On his blog, Chabris further critiques Gladwell’s approach, defining a hidden rule as “a counterintuitive, causal mechanism behind the workings of the world.” Social scientists like Chabris are all too well aware that to really know what’s happening causally in the world we need replicable experimentation, not cherry-picked studies wrapped up in overblown stories.

Humans love hidden rules. We want to know if there are counterintuitive practices we should be following, practices that will make our personal and business lives rock.

Data scientists are often called upon to discover hidden rules. Predictive models potentially combine many more variables than our puny minds can handle, often doing so in interesting and unexpected ways. Predictive and other correlational analyses may identify counterintuitive rules that you might not follow if you didn’t have a machine helping you. We learned this from Moneyball. The player stats that baseball cognoscenti thought worked for identifying the best players turned out to be less effective, for putting together a winning team, than stats identified by predictive modeling.
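As a minimal sketch of that Moneyball point, with entirely made-up numbers rather than real baseball data, compare how well two hypothetical stats track a team outcome:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical teams: a "traditional" stat and a "sabermetric" stat.
wins        = [70, 75, 80, 85, 90, 95]
batting_avg = [0.260, 0.255, 0.270, 0.258, 0.265, 0.262]  # noisy by construction
on_base_pct = [0.310, 0.320, 0.330, 0.340, 0.350, 0.360]  # tracks wins by construction

print(pearson(batting_avg, wins))  # weak
print(pearson(on_base_pct, wins))  # strong
```

The data are rigged to make the point, of course; the actual Moneyball finding came from models fit to decades of real stats.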

I am sympathetic to Chabris’ complaints. When I build a predictive model, a natural urge is to deconstruct it and see what it is saying about regularities in our world. What hidden rules did it identify that we didn’t know about?  How can we use those rules to work better? But the best predictive models often don’t tell us accurate or useful things about the world. They just make good predictions about what will happen — if the world keeps behaving like it behaved in the past. Using them to generate hidden, counterintuitive rules feels somehow wrong.
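For concreteness, here is a hedged sketch of what “deconstructing” a model can mean: fit a toy logistic regression, in plain Python on synthetic data where by construction only one of two variables matters, and read the fitted weights as candidate hidden rules. Everything here, the data generator and both features, is invented for illustration.

```python
import math
import random

random.seed(0)

# Synthetic data: the outcome depends only on x2, not x1.
def make_row():
    x1, x2 = random.random(), random.random()
    p_true = 1 / (1 + math.exp(-(4 * x2 - 2)))
    y = 1 if random.random() < p_true else 0
    return (x1, x2, y)

data = [make_row() for _ in range(2000)]

# Batch gradient descent on the logistic log-loss.
w = [0.0, 0.0, 0.0]  # intercept, weight on x1, weight on x2
lr = 2.0
for _ in range(500):
    grad = [0.0, 0.0, 0.0]
    for x1, x2, y in data:
        p = 1 / (1 + math.exp(-(w[0] + w[1] * x1 + w[2] * x2)))
        err = p - y
        grad[0] += err
        grad[1] += err * x1
        grad[2] += err * x2
    w = [wi - lr * gi / len(data) for wi, gi in zip(w, grad)]

# "Deconstructing" the model: the weights are the would-be hidden rule.
# The weight on x1 should land near zero, the weight on x2 well away from it.
print(f"intercept={w[0]:.2f}  w_x1={w[1]:.2f}  w_x2={w[2]:.2f}")
```

Here the story the weights tell happens to be true, because we built the data. With real data, the same readout can be an artifact of correlated predictors or a world that stops behaving like its past, which is exactly the worry above.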

Yet the desire for good stories won’t go away. Neither will the challenges of figuring out causal realities using whatever data we have on hand. We need stories that don’t dispense with science.

How about counterintuitive examples as stone soup?

As those of you who are social scientists surely already know, ideas are like stone soup. Even a bad idea, if it gets you thinking, can move you forward. For example: is that 10,000-hour thing true? I dunno. We’ll see what happens to Steven Levitt’s golfing buddy. (Amazingly enough, Levitt says he’s spent 5000 hours practicing golf. That comes to 5 hours every Saturday . . . for 20 years. That’s a lot of golf! A lot lot lot lot of golf. Steven Levitt really really loves golf.) But, whether or not the 10,000-hour claim really holds true, it certainly gets you thinking about the value of practice. Chris Chabris and others could quite reasonably argue that everyone already knows that practice helps. But there’s something about that 10,000-hour number that sticks in the mind.

When we move from heuristic business rules to predictive models there’s a need to get people thinking with more depth and nuance about how the world works. Telling stories with predictive or other data analytic models can promote that, even if the stories are only qualifiedly true.

If the structure and outputs of a predictive model can be used to get people thinking in more creative and less rigid ways about their actions, I’m in favor. Doesn’t mean I’m going to let go of my belief in the ideal of experimentation or other careful research designs for figuring out what really works, but it does mean maybe there’s some truth to the proposition that data scientists should be storytellers. Finding and communicating hidden rules a la Gladwell can complement careful science.

books, statistics

How data science is like magic

In The Magicians[1], Lev Grossman describes magic as it might exist, but he could as well be describing the real-world practice of statistical analysis or software development:

As much as it was like anything, magic was like a language. And like a language, textbooks and teachers treated it as an orderly system for the purposes of teaching it, but in reality it was complex and chaotic and organic. It obeyed rules only to the extent that it felt like it, and there were almost as many special cases and one-time variations as there were rules. These Exceptions were indicated by rows of asterisks and daggers and other more obscure typographical fauna which invited the reader to peruse the many footnotes that cluttered up the margins of magical reference books like Talmudic commentary.

It was Mayakovsky’s [the teacher’s] intention to make them memorize all these minutiae, and not only to memorize them but to absorb and internalize them. The very best spellcasters had talent, he told his captive, silent audience, but they also had unusual under-the-hood mental machinery, the delicate but powerful correlating and cross-checking engines necessary to access and manipulate and manage this vast body of information. (p149)

To be a good data scientist, whether using traditional statistical techniques or machine learning algorithms (or both), you must know all the rules and approach it first as an orderly system. Then you begin to learn all the special cases and one-time variations and you study and study and practice and practice until you can almost unconsciously adjust to each unique situation that arises.

When I took ANOVA in my Ph.D. program, I could hardly believe there was an entire course devoted to it. But it was much like Grossman’s description above. Each week we learned new special cases and one-time variations. I did ANOVA in so many different Circumstances that now I have absorbed and internalized its application as well as the design of studies that would usefully be analyzed with it or with some more flexible variation of it (e.g., hierarchical linear modeling). It felt cookbook at the beginning, but at the end of the course, I felt like I’d begun to develop that “unusual under-the-hood mental machinery” that Grossman suggested an effective magician in his imagined world would need.
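The orderly core underneath all those special cases is small. Here is a minimal one-way ANOVA sketch in plain Python, on hypothetical toy scores: partition the total variability into between-group and within-group pieces and form the F ratio.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Three hypothetical treatment groups.
print(one_way_anova([[4, 5, 6], [7, 8, 9], [10, 11, 12]]))  # → (27.0, 2, 6)
```

The special cases and one-time variations mostly complicate the bookkeeping around exactly this decomposition.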

That’s not to say that there aren’t important universal principles and practices and foundational knowledge to understand if you are to be an effective statistician or data miner or machine learning programmer; it’s not to say that awareness of Circumstances and methodical practice are all you need. It is to say that data science is ultimately a practice, not a philosophy, and you reach expertise in it through doing things over and over again, each time in slightly different ways.

In The Magicians, protagonist Quentin practices Legrand’s Hammer Charm under thousands of different Circumstances:

Page by page the Circumstances listed in the book became more and more esoteric and counterfactual. He cast Legrand’s Hammer Charm at noon and at midnight, in summer and winter, on mountaintops and a thousand yards beneath the earth’s surface. He cast the spell underwater and on the surface of the moon. He cast it in early evening during a blizzard on a beach on the island of Mangareva, which would almost certainly never happen since Mangareva is part of French Polynesia, in the South Pacific. He cast the spell as a man, as a woman, and once–was this really relevant?–as a hermaphrodite. He cast it in anger, with ambivalence, and with bitter regret. (pp150-151)

Sometimes I feel like I have fit logistic regression in all these situations (perhaps not as a hermaphrodite). The next logistic regression I fit, I will say to myself “Wax on, wax off” as Quentin did when faced with a new spell that he had to practice according to each set of Circumstances.

[1]Highly recommended, but with caveats. Read it last summer — loved it — sent it to my 15-year-old son at camp. He loved it too and bought me the sequel for Christmas. After reading the second one, I had to re-read the first. It’s a polarizing book. Don’t pick it up if you are offended by heavy drinking, gratuitous sex, and a wandering plot. Do pick it up if you felt like your young adulthood was marked by heavy drinking, gratuitous sex, a wandering plot, and not nearly enough magic. My son tends to read adult books so I didn’t hesitate to share it with him, but it probably would not be appropriate for most teenagers.