Tag Archives: big data

Links for March 11, 2012

Depression: A genetic Faustian bargain with infection? [Emily Deans/Evolutionary Psychiatry]. Discusses the Pathogen Host Defense (PATHOS-D) theory of depression described by Raison and Miller [pdf]. Genes that make people susceptible to depression may also protect them from infection. Depression is associated with brain inflammation; inflammation is also part of the immune response that combats infectious disease. “Since infections in the developing world tend to preferentially kill young children, there is strong selection pressure for genes that will save you when you are young, even if those genes have a cost later in life.”

The people of the petabyte [Venkatesh Rao/Forbes blogs]. An “informal taxonomy and anthropological survey of data-land” based on Rao’s observations at the Strata conference. Apparently everyone’s a data scientist now:

The taxonomy part is simple. Apparently the list of species in data land is very short. It has only one item:

  • Data scientist

What is the value of big data research vs. good samples [LinkedIn Advanced Business Analytics, Data Mining and Predictive Modeling group]. An interesting and lengthy discussion of whether and when to use a sample rather than the full big data set.
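To make the sampling side of that debate concrete, here is a minimal sketch in Python. It assumes a synthetic data set of my own invention (the values, sizes, and the `compare_sample_to_full` helper are hypothetical, not anything from the LinkedIn thread) and simply compares the mean of a simple random sample with the mean of the full data set.

```python
import random
import statistics

def compare_sample_to_full(population, sample_size, seed=0):
    """Return (full_mean, sample_mean, standard_error) for a simple random sample."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    full_mean = statistics.fmean(population)
    sample_mean = statistics.fmean(sample)
    # Standard error of the sample mean: sample standard deviation / sqrt(n).
    standard_error = statistics.stdev(sample) / sample_size ** 0.5
    return full_mean, sample_mean, standard_error

# Hypothetical "big" data set: synthetic values with mean 100 and sd 20.
data_rng = random.Random(42)
population = [data_rng.gauss(100.0, 20.0) for _ in range(200_000)]

full_mean, sample_mean, standard_error = compare_sample_to_full(population, 10_000)
print(f"full mean = {full_mean:.2f}")
print(f"sample mean = {sample_mean:.2f} (standard error {standard_error:.2f})")
```

In this toy case a 5% sample estimates the mean to within a small standard error. That only holds when the sample is genuinely random and the question is about an aggregate; rare events or fine-grained segments are where the full data set may earn its keep.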

The real-world experiment: New application development paradigm in the age of big data [James Kobielus/Forrester].

This year and beyond, we will see enterprises place greater emphasis on real-world experiments as a fundamental best practice to be cultivated and enforced within their data science centers of excellence. In a next best action program, real-world experiments involve iterative changes to the analytics, rules, orchestrations, and other process and decision logic embedded in operational applications. You should monitor the performance of these iterations to gauge which collections of business logic deliver the intended outcomes, such as improved customer retention or reduced fulfillment time on high-priority orders.

Links for February 27, 2012

Kathy Sierra on gamification in education [Larry Ferlazzo/Larry Ferlazzo's Websites of the Day... for Teaching ELL, ESL, & EFL]. Kathy offers guidelines on when gamification may be safe vs. dangerous. What falls in the dangerous category? Learning and engagement that are intrinsically rewarding, since psychology studies suggest that rewarding such activity destroys a person’s (or a monkey’s) interest in doing it for its own sake:

The studies are both counter-intuitive and disturbing. The monkeys that enjoyed playing with wooden puzzles until given their favorite treat reward for solving the puzzles, at which time their puzzle-solving diminished. The kids given ribbons for their drawings then showed less interest in drawing. The writers shown a list of possible external reasons for writing immediately wrote less complex and interesting poems than those shown a list of intrinsically-rewarding reasons for writing. And on and on and on and on. Animals, humans, children, adults, across wide-ranging domains and in studies conducted by dozens of independent researchers.

If 99.9% of big data is irrelevant, why do we need it? [Michael Wu/Lithium Lithosphere blogs]. Wu, Lithium’s Principal Scientist of Analytics, says “Just because you can track, store, and analyze big data, doesn’t mean you should.” He argues that in many cases you can answer your questions with just the relevant data, which may be small enough to load and analyze on a single beefy computer.

Lazily musing about sharing [JP Rangaswami/Confused of Calcutta]. “Sharing is serious business” — it has serious consequences for businesses, especially for those built upon not-sharing. Five ideas about sharing:

1. For anything to be social, it must be shared.

2. Sharing, the act of making social, happens because people are made social.

3. Sharing is encouraged by good design.

4. When you share physical things like food, sharing reduces waste.

5. When you share non-physical things like ideas, sharing increases value.

Want to get value out of your data and analytics investment? Then deal with this issue before you buy the software [Maz Iqbal/B2C Business to Community]. People, even professional statisticians, don’t reason well about statistics. Getting the right data into systems that can analyze it is the easy part. The hard part is:

Getting managers to give up their pet theories, their ideological convictions, their vested interests, their intuition, their past experience and use data and analytics to make decisions. That is the central issue that you have to and should deal with.

Links for January 20, 2012

Big data market survey: Hadoop solutions [Edd Dumbill/O'Reilly Radar].

Apache Hadoop is unquestionably the center of the latest iteration of big data solutions. At its heart, Hadoop is a system for distributing computation among commodity servers. It is often used with the Hadoop Hive project, which layers data warehouse technology on top of Hadoop, enabling ad-hoc analytical queries.

I’m starting my first ever project with Hadoop this week–a prototype of an analytics warehouse using Amazon Elastic MapReduce. Colleagues have told me EMR is a great way to get your head around Hadoop-based data processing.
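For anyone else trying to get their head around the model, the map and reduce steps that Hadoop distributes across servers can be sketched in plain Python. This is a single-process illustration of the pattern only; it does not use Hadoop's or Elastic MapReduce's actual APIs, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are my own.

```python
from collections import defaultdict

def map_phase(line):
    """Emit (word, 1) pairs for one input line, as a mapper would."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key, as a reducer would."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

In a real cluster the mappers and reducers run on different machines and the framework handles the grouping in between; here everything runs in one process so the data flow is easy to follow.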

CBO Report: Medicare pilot programs don’t control health-care costs [Megan McArdle/The Atlantic blogs]. McArdle describes what happened with a housing-project demolition program whose pilot studies suggested much better effects than were actually seen at scale:

The initial study was small and involved highly screened people with a lot of support. And it seems to have suffered from publication bias–the most spectacular results got the most attention, even though these might just have been outliers.

This is distressingly common–not just in government or social-do-gooding research, but in organizations of all kinds–including corporations.

Programs at scale often don’t show results as good as pilot studies of those programs. More generally in program evaluation, it’s hard to find evidence of strong (or even weak) effects of interventions. Social systems are complex; factors other than those targeted by the intervention often determine outcomes. This is something I need to communicate regularly to my colleagues and our partners–student learning is largely determined by factors other than what we have control over. That’s not to say we shouldn’t improve our course design, teaching practices, and so forth, but it is to say that there aren’t many easy pickings out there for improving student outcomes.
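The gap between pilot and at-scale results can be illustrated with a small simulation. This is a sketch under assumptions I made up for the purpose (a small true effect of 0.1, twenty pilots of 25 participants each, one large rollout); none of the numbers come from the CBO report or McArdle's post. It shows how picking the most impressive pilot overstates an effect that a large study measures accurately.

```python
import random
import statistics

def run_study(true_effect, n, rng, noise_sd=1.0):
    """Estimated effect from one study: the mean of n noisy observations."""
    return statistics.fmean(rng.gauss(true_effect, noise_sd) for _ in range(n))

rng = random.Random(7)
true_effect = 0.1  # hypothetical small real effect

# Twenty small pilots of the same intervention; only the best gets attention.
pilots = [run_study(true_effect, n=25, rng=rng) for _ in range(20)]
best_pilot = max(pilots)

# One large rollout of the same intervention.
at_scale = run_study(true_effect, n=10_000, rng=rng)

print(f"true effect      = {true_effect:.2f}")
print(f"best pilot       = {best_pilot:.2f}")
print(f"average of pilots = {statistics.fmean(pilots):.2f}")
print(f"at scale         = {at_scale:.2f}")
```

Nothing about the intervention changes between the pilots and the rollout in this sketch; the apparent drop comes entirely from selecting the luckiest small study.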

For-profits vs not-for-profits [Felix Salmon/Reuters blog].

I know full well that a lot of not-for-profit organizations are run in a dreadful fashion; I’m just not convinced that introducing a profit motive is always or even often the best way to fix that problem…. I very much doubt that for-profit education is ever a good idea. I just don’t see how the incentives there could possibly be aligned.

But the profit motive can’t provide optimal outcomes if there isn’t consumer discipline along with it. For-profit higher education is subsidized by the government in the form of grants and low-interest loans (and note that nonprofit education is subsidized in additional ways as well, in the case of public institutions). Would-be students do not have an incentive to seriously evaluate whether the education they are purchasing is worth what they pay, because there is a third-party payer involved. The situation is much like health care. The post has a good discussion of the issues and controversy over for-profit higher education.