I just watched this video of Hilary Mason* talking about data mining. Aside from the obvious thoughts about what I could have done with my life if (1) I had majored in computer science instead of philosophy/economics and (2) I hadn’t spent all of the zeroes having babies, buying and selling houses, and living out an island retirement fantasy thirty years before my time, I found myself musing about her comments on the term “data scientist.” She said she’s gotten into arguments about it. I guess some people think it doesn’t really mean anything: it’s just hype, so who needs it? Someone’s a computer scientist or a statistician or a business intelligence analyst, right? Why make up a new name?
I dunno, I rather like the term. My official title at work is “data scientist” (thank you to my management for that), and it seems more appropriate than statistician, business intelligence analyst, senior software developer, or whatever else you might want to call me. The fact is, I do way more than statistical analysis. I know SQL all too well and (as my manager knows from my frequent complaints) spend 75% or more of my time writing extract-transform-load code. I rely heavily on traditional statistical methods like factor analysis and logistic regression, but when needed I use machine learning techniques. I try to keep up with the latest online learning research and incorporate it into our analytics plans and models. Lately I’ve been looking at what sort of big data architectures might support the scale of analytics we want to do. I don’t just need to know which statistical or ML methods to use; I need to figure out how to make them scalable, real-time, and (this is critical) useful in the educational context. That doesn’t sound like pure statistics to me, so don’t just call me a statistician**.
I do way more than data analysis, and I’m capable of way more, thanks to a meandering career path that has taken me from risk assessment (heavy machinery accident analysis at Failure Analysis, now Exponent) to database application development (ERP apps at Oracle) to education (teaching AP calculus and remedial algebra at the Denver School of Science and Technology) and now to Pearson (online learning analytics). Along the way, I earned a couple of degrees in mathematical statistics and in applied statistics/research design/psychometrics.
None of what I did made sense while I was wandering the path, and yet it all adds up to something useful and rare in my current position. Data science requires an alchemical mixture of domain knowledge, data analysis capability, and a hacker’s mindset (see Drew Conway’s Venn diagram of data science reproduced here). Any term that incorporates only one or two of these circles doesn’t really capture what we do. I’m an educational researcher, a statistician, a programmer, a business analyst. I’m all of these things.
In the end, I don’t really care what you call me, so long as I get the chance to ask interesting questions, gather the data to answer them, and then give you an answer you can use: one grounded in both quantitative rigor and human meaning.
*Yes, I do have a girl-crush on Hilary. I think she’s awesome.
**Also, my kids cannot seem to pronounce the word “statistician.” I need a job title they can tell people without stumbling over it. I hope to inspire them to pursue careers that are as rewarding and engaging, intellectually and socially, as my own has been.