Links for January 20, 2012

Big data market survey: Hadoop solutions [Edd Dumbill/O’Reilly Radar].

Apache Hadoop is unquestionably the center of the latest iteration of big data solutions. At its heart, Hadoop is a system for distributing computation among commodity servers. It is often used with the Hadoop Hive project, which layers data warehouse technology on top of Hadoop, enabling ad-hoc analytical queries.

I’m starting my first-ever project with Hadoop this week: a prototype of an analytics warehouse built on Amazon Elastic MapReduce (EMR). Colleagues have told me EMR is a great way to get your head around Hadoop-based data processing.
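For anyone else new to Hadoop, here is a minimal, single-process sketch of the MapReduce idea behind it, using the classic word-count example. This is only an illustration of the map, shuffle, and reduce stages. It is not Hadoop's or EMR's API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical. In a real cluster, each stage would run in parallel across many commodity servers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper (hypothetical): emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle (hypothetical): group emitted values by key, as the framework
    does between the map and reduce stages."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer (hypothetical): combine all values for one key by summing counts."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # In Hadoop, mappers would each process a split of the input on a
    # different machine; here every stage runs in a single process.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["big data big servers", "data warehouse"]
    print(word_count(docs))
    # → {'big': 2, 'data': 2, 'servers': 1, 'warehouse': 1}
```

The appeal of the model is that mappers and reducers only ever see their own slice of the data, which is what lets the framework spread the work across many machines.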

CBO Report: Medicare pilot programs don’t control health-care costs [Megan McArdle/The Atlantic blogs]. McArdle describes what happened with a housing-project demolition program whose pilot studies suggested much better effects than were actually seen at scale:

The initial study was small and involved highly screened people with a lot of support. And it seems to have suffered from publication bias–the most spectacular results got the most attention, even though these might just have been outliers.

This is distressingly common–not just in government or social-do-gooding research, but in organizations of all kinds–including corporations.

Programs implemented at scale often don’t perform as well as their pilot studies suggested they would. More generally in program evaluation, it’s hard to find evidence of strong (or even weak) effects of interventions. Social systems are complex; factors other than those targeted by the intervention often determine outcomes. This is something I need to communicate regularly to my colleagues and our partners: student learning is largely determined by factors outside our control. That’s not to say we shouldn’t improve our course design, teaching practices, and so forth, but it is to say that there aren’t many easy pickings out there for improving student outcomes.
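The publication-bias point above can be made concrete with a small simulation. The sketch below assumes a modest true effect, runs many small, noisy pilot studies, and then "publishes" only the most spectacular results. All of the numbers (`TRUE_EFFECT`, `N_PER_ARM`, `N_STUDIES`, `PUBLISH_FRACTION`) are illustrative assumptions, not figures from the CBO report or McArdle's post.

```python
import random
import statistics
from typing import List, Tuple

TRUE_EFFECT = 0.1       # assumed true effect of the intervention (in SD units)
N_PER_ARM = 30          # assumed size of each small pilot study, per arm
N_STUDIES = 1000        # assumed number of pilot studies run
PUBLISH_FRACTION = 0.1  # assume only the top 10% of results get attention


def run_pilot(rng: random.Random) -> float:
    """Simulate one pilot study and return its estimated effect:
    the difference in mean outcomes between treated and control groups."""
    treated = [rng.gauss(TRUE_EFFECT, 1.0) for _ in range(N_PER_ARM)]
    control = [rng.gauss(0.0, 1.0) for _ in range(N_PER_ARM)]
    return statistics.mean(treated) - statistics.mean(control)


def simulate(seed: int = 42) -> Tuple[float, float]:
    """Return (mean of all estimates, mean of the 'published' estimates)."""
    rng = random.Random(seed)
    estimates: List[float] = [run_pilot(rng) for _ in range(N_STUDIES)]
    # Publication bias: keep only the largest (most spectacular) estimates.
    k = max(1, int(N_STUDIES * PUBLISH_FRACTION))
    published = sorted(estimates, reverse=True)[:k]
    return statistics.mean(estimates), statistics.mean(published)


if __name__ == "__main__":
    all_mean, published_mean = simulate()
    print(f"true effect:               {TRUE_EFFECT:.2f}")
    print(f"average of all studies:    {all_mean:.2f}")
    print(f"average of published ones: {published_mean:.2f}")
```

Averaged over every study, the estimates land near the true effect; the "published" subset overstates it considerably. A program scaled up on the strength of those published results should be expected to disappoint, even if nobody did anything wrong in the individual studies.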

For-profits vs not-for-profits [Felix Salmon/Reuters blog].

I know full well that a lot of not-for-profit organizations are run in a dreadful fashion; I’m just not convinced that introducing a profit motive is always or even often the best way to fix that problem…. I very much doubt that for-profit education is ever a good idea. I just don’t see how the incentives there could possibly be aligned.

But the profit motive can’t produce optimal outcomes without consumer discipline to go with it. For-profit higher education is subsidized by the government in the form of grants and low-interest loans (and note that nonprofit education is subsidized in additional ways as well, in the case of public institutions). Would-be students have little incentive to seriously evaluate whether the education they are purchasing is worth what they pay, because a third-party payer is involved. The situation is much like health care. The post has a good discussion of the issues and controversy surrounding for-profit higher education.


Links for January 7, 2012

Nutrition advice: The vitamin D-lemma [Amy Maxmen/Nature]. “The difficulty of distilling strong advice from weak evidence.” This is a key challenge for researchers, statisticians, and data scientists in any domain, not just in health.

Will Amazon offer analytics as a service? [Quentin Hardy/Bits]. Interesting to get an idea of what that might look like. I don’t think, though, that this would compete with SAS and similar software, as the post implies. Would someone looking to build a product recommendation engine implement it in SAS? Probably not. For example, Google is said to use R for model exploration and prototyping, then to put models into production using Python or C++. I feel a “choosing your analytics tool” post coming on.

Community college budget cuts drive students to for-profit school [Chris Kirkham/Huffington Post]. Balanced coverage of why students turn to for-profit schools and the pros and cons of such choices. My observation: community college tuition is artificially low due to government subsidies, while for-profit tuition is artificially high, again because of government interference (in the form of financial aid). There are no market forces to bring about a reasonable balance between supply and demand. The big losers are students (and taxpayers).

BenchPrep is Codecademy for any subject, high school to med school [Josh Constine/TechCrunch]. “Eventually, publishers might get a clue that interactive digital education is going to destroy their paper book business. If they’re smart they’ll start developing their own courses or raise licensing fees. Until then though, BenchPrep will be the savior of anyone frustrated by the static book-learning experience.” I’m fairly certain some big textbook publishers see that already.

Forget dieting, try intermittent fasting [Josh Ozersky/Time Ideas]. “And that’s why instead of eating healthier, I’m going for longer stretches without eating so I can actually enjoy a whole meal. I don’t starve myself; I drink a protein shake if I get hungry and consume endless glasses of diet iced tea. People tell me this is bad, that I will soon gain back all the weight I’ve lost – and these rejoinders are always given with a smug malice, as if the people uttering them actually despise me for trying to compensate for the pleasures of the plate.”

I fast most days at work until about 2 or 3 pm, then have a small snack. I eat whatever I want once I get home from work around 5 pm. I find this lets me eat more or less what I want while keeping my weight at a level I’m happy with. Like Josh, I have found that people get really upset about this plan, almost offended that I would eat this way. Funny how everyone thinks they know what is healthy and what is not, despite how difficult that is to determine (see the first link in this post).