
Links for March 11, 2012

Depression: A genetic Faustian bargain with infection? [Emily Deans/Evolutionary Psychiatry]. Discusses the Pathogen Host Defense (PATHOS-D) theory of depression described by Raison and Miller [pdf]. Genes that make people susceptible to depression may also protect them from infection. Depression is associated with brain inflammation; inflammation is also part of the immune response that combats infectious disease. “Since infections in the developing world tend to preferentially kill young children, there is strong selection pressure for genes that will save you when you are young, even if those genes have a cost later in life.”

The people of the petabyte [Venkatesh Rao/Forbes blogs]. An “informal taxonomy and anthropological survey of data-land” based on Rao’s observations at the Strata conference. Apparently everyone’s a data scientist now:

The taxonomy part is simple. Apparently the list of species in data land is very short. It has only one item:

  • Data scientist

What is the value of big data research vs. good samples [from LinkedIn Advanced Business Analytics, Data Mining and Predictive Modeling group]. Interesting and lengthy discussion of whether and when to use sampling rather than big data sets.

The real-world experiment: New application development paradigm in the age of big data [James Kobielus/Forrester].

This year and beyond, we will see enterprises place greater emphasis on real-world experiments as a fundamental best practice to be cultivated and enforced within their data science centers of excellence.  In a next best action program, real-world experiments involve iterative changes to the analytics, rules, orchestrations, and other process and decision logic embedded in operational applications. You should monitor the performance of these iterations to gauge which collections of business logic deliver the intended outcomes, such as improved customer retention or reduced fulfillment time on high-priority orders.

So you call yourself a data scientist?

Hilary Mason (in Glamour!)

I just watched this video of Hilary Mason* talking about data mining. Aside from the obvious thoughts of what I could have done with my life if I (1) had majored in computer science instead of philosophy/economics and (2) hadn’t spent all of the zeroes having babies, buying/selling houses, and living out an island retirement fantasy thirty years before my time, I found myself musing about her comments on the “data scientist” term. She said she’s gotten into arguments about it. I guess some people think it doesn’t really mean anything; it’s just hype, so who needs it? Someone’s a computer scientist or a statistician or a business intelligence analyst, right? Why make up some new name?

I dunno, I rather like the term. My official title at work is “data scientist” (thank you to my management for that) and it seems more appropriate than statistician or business intelligence analyst or senior software developer or whatever else you might want to call me. The fact is, I do way more than statistical analysis. I know SQL all too well and (as my manager knows from my frequent complaints) spend more than 75% of my time writing extract-transform-load code. I use traditional statistical methods like factor analysis and logistic regression (heavily), but if needed I use techniques from machine learning. I try to keep on top of the latest online learning research and I incorporate that into our analytics plans and models. Lately I’ve been spending time looking at what sort of big data architectures might support the scale of analytics we want to do. I don’t just need to know what statistical or ML methods to use; I need to figure out how to make them scalable, real-time, and (this is critical) useful in the educational context. That doesn’t sound like pure statistics to me, so don’t just call me a statistician**.

I do way more than data analysis and I’m capable of way more, thanks to my meandering career path that’s taken me from risk assessment (heavy machinery accident analysis at Failure Analysis, now Exponent) to database app development (ERP apps at Oracle) to education (AP calculus and remedial algebra teaching at the Denver School of Science and Technology) and now to Pearson (online learning analytics). Along the way, I earned a couple of degrees, in mathematical statistics and in applied statistics/research design/psychometrics.

Drew Conway's Venn diagram of data science

None of what I did made sense at the time I was wandering the path — and yet it all adds up to something useful and rare in my current position. Data science requires an alchemistic mixture of domain knowledge, data analysis capability, and a hacker’s mindset (see Drew Conway’s Venn diagram of data science reproduced here). Any term that only incorporates one or two of these circles doesn’t really capture what we do. I’m an educational researcher, a statistician, a programmer, a business analyst. I’m all these things.

In the end, I don’t really care what you call me, so long as I get the chance to ask interesting questions, gather the data to answer them, and then give you an answer you can use — an answer that is grounded in quantitative rigor and human meaning.


*Yes, I do have a girl-crush on Hilary. I think she’s awesome.

**Also, my kids cannot seem to pronounce the word “statistician.” I need a job title they can tell people without stumbling over it. I hope to inspire them to pursue careers that are as rewarding and engaging, intellectually and socially, as my own has been.