Category Archives: links

Links for February 22, 2012

Elizabeth Gilbert on What the Porcupine Dilemma Can Teach Us About the Secret of Happiness [Maria Popova/Brain Pickings]. Elizabeth Gilbert on Schopenhauer’s porcupines. Staying warm without impaling yourself on someone else’s spines.

Target, Pregnancy, and Predictive Analytics, Part II [Dean Abbott/Data Mining and Predictive Analytics]. The Target story was interesting for what it says about the possibilities and perils of analytics. This was my favorite writeup, for its overview of how to succeed with data analysis:

1) understand the data,
2) understand why the models are focusing on particular input patterns,
3) ask lots of questions (why does the model like these fields best? why not these other fields?)
4) be forensic (now that's interesting or that's odd...I wonder...),
5) be prepared to iterate, (how can we predict better for those customers we don't characterize well)
6) be prepared to learn during the modeling process

We have to "notice" patterns in the data and connect them to behavior. This is one reason I like to build multiple models: different algorithms can find different kinds of patterns. Regression is a global predictor (one continuous equation for all data), whereas decision trees and kNN are local estimators.
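Abbott's distinction between global and local estimators can be seen in a small sketch. The data are synthetic (a noisy sine curve, chosen only for illustration), and both models are hand-rolled in plain Python rather than taken from any particular library. The regression fits one straight line to all the data. The k-nearest-neighbour model predicts each point from the k closest training points, with k=5 as an arbitrary assumption.

```python
import math
import random

random.seed(0)

# Hypothetical toy data: y follows a curve that no single straight line captures.
xs = [i / 10 for i in range(61)]                      # x from 0.0 to 6.0
ys = [math.sin(x) + random.gauss(0, 0.1) for x in xs]


def fit_line(xs, ys):
    """Ordinary least squares: one global equation y = a + b*x for all data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_knn(xs, ys, k=5):
    """k-nearest-neighbour regression: average the y-values of the k closest x's."""
    pts = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pts, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


def mse(model, xs, ys):
    """Mean squared error of a model's predictions."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


line = fit_line(xs, ys)
knn = fit_knn(xs, ys, k=5)

# Evaluate both against the noise-free curve on a finer grid.
test_xs = [i / 20 for i in range(121)]
truth = [math.sin(x) for x in test_xs]
print(f"linear regression MSE: {mse(line, test_xs, truth):.3f}")
print(f"5-NN regression MSE:   {mse(knn, test_xs, truth):.3f}")
```

The line must use one slope everywhere, so it misses the rise and fall of the curve. The nearest-neighbour model looks only at nearby points, so it can follow local shape, at the cost of noisier predictions and no ability to extrapolate.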

You Are Responsible for Getting Your Ideas to Spread [Tim Kastelle/Innovation Leadership Network]. Don’t blame the customer if your idea isn’t compelling; that’s a failure of your idea or your communication of it.

Machine Learning for Hackers [Review from David Smith/Revolution Analytics blog]. Sounds like a book I need to order.

Rather than merely providing a “cookbook” approach to say, building a “who to follow” recommendation system for Twitter, it takes the time to explain the methodology behind the algorithms and give the reader a better basis for understanding why these methods work (and, equally importantly, how they can go wrong).

What’s new? Exuberance for novelty has benefits [John Tierney/The New York Times]. In a longitudinal study, people who combined novelty-seeking with persistence and “self-transcendence” showed the most success over the years (good health, lots of friends, few emotional problems, greatest satisfaction with life).

Links for January 20, 2012

Big data market survey: Hadoop solutions [Edd Dumbill/O'Reilly Radar].

Apache Hadoop is unquestionably the center of the latest iteration of big data solutions. At its heart, Hadoop is a system for distributing computation among commodity servers. It is often used with the Hadoop Hive project, which layers data warehouse technology on top of Hadoop, enabling ad-hoc analytical queries.

I’m starting my first ever project with Hadoop this week–a prototype of an analytics warehouse using Amazon Elastic MapReduce. Colleagues have told me EMR is a great way to get your head around Hadoop-based data processing.

CBO Report: Medicare pilot programs don’t control health-care costs [Megan McArdle/The Atlantic blogs]. McArdle describes what happened with a housing-project demolition program whose pilot studies suggested much better effects than were actually seen at scale:

The initial study was small and involved highly screened people with a lot of support. And it seems to have suffered from publication bias–the most spectacular results got the most attention, even though these might just have been outliers.

This is distressingly common–not just in government or social-do-gooding research, but in organizations of all kinds–including corporations.

Programs at scale often don’t show results as good as pilot studies of those programs. More generally in program evaluation, it’s hard to find evidence of strong (or even weak) effects of interventions. Social systems are complex; factors other than those targeted by the intervention often determine outcomes. This is something I need to communicate regularly to my colleagues and our partners: student learning is largely determined by factors other than the ones we control. That’s not to say we shouldn’t improve our course design, teaching practices, and so forth, but it is to say that there aren’t many easy pickings out there for improving student outcomes.
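McArdle's explanation (small samples plus attention flowing to the most spectacular results) is a form of what is often called the winner's curse. A quick simulation makes the mechanism concrete. Every number in it is an assumption chosen for illustration: a modest true effect, many small two-arm pilots, and a rule that only the top 5% of results get noticed.

```python
import random
import statistics

random.seed(42)

TRUE_EFFECT = 0.1   # hypothetical true effect of the intervention, in SD units
N_PER_ARM = 20      # small pilot: 20 participants per arm (assumption)
N_PILOTS = 1000     # number of independent pilot studies simulated


def run_pilot():
    """Observed effect of one pilot: treatment mean minus control mean."""
    control = [random.gauss(0.0, 1.0) for _ in range(N_PER_ARM)]
    treated = [random.gauss(TRUE_EFFECT, 1.0) for _ in range(N_PER_ARM)]
    return statistics.mean(treated) - statistics.mean(control)


effects = [run_pilot() for _ in range(N_PILOTS)]

# Crude stand-in for publication bias: only the top 5% of pilots get attention.
effects.sort(reverse=True)
publicized = effects[: N_PILOTS // 20]

print(f"true effect:               {TRUE_EFFECT:.2f}")
print(f"mean of all pilots:        {statistics.mean(effects):.2f}")
print(f"mean of publicized pilots: {statistics.mean(publicized):.2f}")
```

Across all pilots the average lands close to the true effect. The publicized ones overstate it several times over, so a scaled-up program "underperforms" even when nothing about it changed.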

For-profits vs not-for-profits [Felix Salmon/Reuters blog].

I know full well that a lot of not-for-profit organizations are run in a dreadful fashion; I’m just not convinced that introducing a profit motive is always or even often the best way to fix that problem…. I very much doubt that for-profit education is ever a good idea. I just don’t see how the incentives there could possibly be aligned.

But the profit motive can’t provide optimal outcomes if there isn’t consumer discipline along with it. For-profit higher education is subsidized by the government in the form of grants and low-interest loans (and note that nonprofit education is subsidized in additional ways as well, in the case of public institutions). Would-be students do not have an incentive to seriously evaluate whether the education they are purchasing is worth what they pay, because there is a third-party payer involved. The situation is much like health care. The post has a good discussion of the issues and controversy surrounding for-profit higher education.

Links for January 15, 2012

The rise of the new group think [Susan Cain/New York Times].

Virtually all American workers now spend time on teams and some 70 percent inhabit open-plan offices, in which no one has “a room of one’s own.” During the last decades, the average amount of space allotted to each employee shrank 300 square feet, from 500 square feet in the 1970s to 200 square feet in 2010….

Privacy also makes us productive. In a fascinating study known as the Coding War Games, consultants Tom DeMarco and Timothy Lister compared the work of more than 600 computer programmers at 92 companies. They found that people from the same companies performed at roughly the same level — but that there was an enormous performance gap between organizations. What distinguished programmers at the top-performing companies wasn’t greater experience or better pay. It was how much privacy, personal workspace and freedom from interruption they enjoyed. Sixty-two percent of the best performers said their workspace was sufficiently private compared with only 19 percent of the worst performers. Seventy-six percent of the worst programmers but only 38 percent of the best said that they were often interrupted needlessly.

I work in an open-plan office and I rather like it, mainly because my coworkers are fun and because my clean, small, mostly quiet work area is such a nice change from my sprawling, messy, mostly noisy house. We work on a puzzle together when we’re taking a break from work and wear headphones when we want uninterrupted time. I wonder, though, if I’d be more productive with a private office or even a cubicle. I don’t achieve flow as much as I’d like at work. I’m not sure if that’s because the job is relatively new to me or because the work environment is an obstacle.

Hume, causation & science [Barry Ritholtz/The Big Picture]. “We humans love a grossly over-simplified narrative.” Determining when we can attribute causation to a correlation is one of the major challenges of research design and statistical analysis.
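One classic way a correlation misleads is a confounder, a third variable that drives both things we measured. The simulation below is a made-up example; the variable names and the weather framing are illustrative assumptions, not from Ritholtz's post. X has no effect on Y, yet the two are clearly correlated because Z drives both. The correlation disappears once Z's linear influence is removed from each (a partial correlation).

```python
import random
import statistics

random.seed(1)
N = 5000

# Hypothetical causal structure: Z (say, hot weather) drives both X and Y.
# X has NO causal effect on Y.
z = [random.gauss(0, 1) for _ in range(N)]
x = [zi + random.gauss(0, 1) for zi in z]   # e.g. ice-cream sales
y = [zi + random.gauss(0, 1) for zi in z]   # e.g. sunburn cases


def corr(a, b):
    """Pearson correlation coefficient."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5


def residuals(v, w):
    """Residuals of v after a least-squares regression on w."""
    mv, mw = statistics.mean(v), statistics.mean(w)
    slope = sum((vi - mv) * (wi - mw) for vi, wi in zip(v, w)) / sum(
        (wi - mw) ** 2 for wi in w
    )
    return [vi - mv - slope * (wi - mw) for vi, wi in zip(v, w)]


raw = corr(x, y)
adjusted = corr(residuals(x, z), residuals(y, z))   # partial correlation given Z
print(f"corr(X, Y):     {raw:.2f}")
print(f"corr(X, Y | Z): {adjusted:.2f}")
```

The adjustment works here only because we built the data ourselves and know Z exists. With real data, the hard part of research design is knowing which confounders to measure in the first place.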

How to work from home like you mean it [Kevin Purdy/Fast Company]. I’m thinking of working one day a week at home to achieve some of that flow I’ve been missing. If I do, I’ll follow some of these tips so it doesn’t devolve into eight hours of Internet surfing.

Lack of interest and aptitude keeps students out of STEM majors [Olga Khazan/Washington Post On Small Business blog]. “A study released this week by Georgetown University’s Center on Education and the Workforce found that recent graduates in computer science, mathematics and engineering all had unemployment rates below 9 percent (with the rates dropping below 6 percent among those who had some experience.) Conversely, the rates for graduates in architecture and the arts were 13.9 and 11.1 percent, respectively.”

What is college for? (Part 2) [Gary Gutting/The New York Times].

Concretely, students graduating from high school should, to cite one plausible model, be able to read with understanding classic literature (from, say, Austen and Browning to Whitman and Hemingway) and write well-organized and grammatically sound essays; they should know the basic outlines of American and European history, have a good beginner’s grasp of at least two natural sciences as well as pre-calculus mathematics, along with a grounding in a foreign language.

Students with this sort of education would be excellent candidates for many satisfying and well-paying jobs in, for example, sales and service industries, except for those that require highly specialized skills. From the standpoint of employment, high school graduates would have no need of college unless they wanted to be accountants or engineers, pursue pre-professional programs leading to law or medical school or train for doctoral work in science or the humanities. Apart from this, the only good reason they would have for going to college would be for its intellectual culture.

Compelling idea, but seems unlikely to happen because (1) our high schools are mostly incapable of providing such an education and (2) our culture is overly invested in the idea of college as the basic ticket to success in today’s economy. E.g.: D.C. may require college application for all [Joanne Jacobs].

Links for January 7, 2012

Nutrition advice: The vitamin D-lemma [Amy Maxmen/Nature]. “The difficulty of distilling strong advice from weak evidence.” This is a key challenge for researchers/statisticians/data scientists in any domain, not just in health.

Will Amazon offer analytics as a service? [Quentin Hardy/Bits]. Interesting to get an idea of what that might look like. I don’t think, though, that this would compete with SAS and similar software, as the post implies. Would someone looking to implement a product recommendation engine implement it in SAS? Probably not. Google, for example, is said to use R for model exploration and prototyping, then put the models into production using Python or C++. I feel a “choosing your analytics tool” post coming on.

Community college budget cuts drive students to for-profit school [Chris Kirkham/Huffington Post]. Balanced coverage of why students turn to for-profit schools and the pros and cons of such choices. My observation: community college tuition is artificially low due to government subsidization while for-profit tuition is artificially high, again because of government interference (in the form of financial aid). No market forces to bring about a reasonable balance between supply and demand. The big losers are students (and taxpayers).

Benchprep is codecademy for any subject, high school to med school [Josh Constine/TechCrunch]. “Eventually, publishers might get a clue that interactive digital education is going to destroy their paper book business. If they’re smart they’ll start developing their own courses or raise licensing fees. Until then though, BenchPrep will be the savior of anyone frustrated by the static book-learning experience.” I’m pretty certain some big textbook publishers see that already.

Forget dieting, try intermittent fasting [Josh Ozersky/Time Ideas]. “And that’s why instead of eating healthier, I’m going for longer stretches without eating so I can actually enjoy a whole meal. I don’t starve myself; I drink a protein shake if I get hungry and consume endless glasses of diet iced tea. People tell me this is bad, that I will soon gain back all the weight I’ve lost – and these rejoinders are always given with a smug malice, as if the people uttering them actually despise me for trying to compensate for the pleasures of the plate.”

I fast most days at work until about 2 or 3 pm, then have a small snack. I eat whatever I want once I get home from work around 5 pm. I find this allows me to eat generally what I want while maintaining my weight at a level I’m happy with. I have found, like Josh, that people get really upset about this plan, almost offended that I would eat this way. Funny how everyone thinks they know what is healthy and what is not, despite the difficulties in determining that (see first link in this post).

Links for December 30, 2011

Yes, and… [W.P. McNeill/Corner Cases]. Living by the “yes, and” ethos of improvisational comedy. Always build on what the other person said–stay open to their insight and direction. Be a pliable weed, not a concrete pylon. Don’t get mired in dogma. I’m thinking this would work equally well in interactions with coworkers as with kids.

College has been oversold [Alex Tabarrok/Marginal Revolution]. The total number of students graduating from college is way up, but the numbers graduating with STEM degrees haven’t increased. That’s bad for individuals and bad for the economy. “An argument can be made for subsidizing students in fields with potentially large spillovers, such as microbiology, chemical engineering, nuclear physics and computer science. There is little justification for subsidizing sociology, dance and English majors.”

You have to break connections to get your ideas to spread [Tim Kastelle/Innovation Leadership Network]. Innovation requires disruption. "When you come up with a great new idea, you need to think about this economic network in two ways. The first is: how can I connect to all of the complementary parts of the economy that are needed to get my idea to work? The second is: if I’m going to get my idea to spread, which of these existing connections need to be broken?"

The second economy [W. Brian Arthur/McKinsey Quarterly]. We are in the process of building out the economy’s neural system, what Arthur calls “the second economy” growing up alongside the first economy, the industrial economy. Downside: loss of jobs as computers take over.

Selecting amongst large classes of models [Brian D. Ripley] (pdf). We have the data and the computational resources to “trawl through literally thousands of models (and in some cases many more).” How to pick among them? A subject I intend to learn a lot more about in 2012.
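As one illustration of the general problem, and not a summary of Ripley's recommendations, the sketch below scores polynomial fits of degree 0 through 5 with the Akaike information criterion (AIC). AIC trades goodness of fit against the number of parameters. The quadratic data, the degree range, and the Gaussian form of AIC used here are all assumptions made for the example, and the least-squares solver is hand-rolled to keep the block dependency-free.

```python
import math
import random

random.seed(7)

# Hypothetical data: a quadratic signal plus Gaussian noise.
xs = [i / 10 - 3 for i in range(61)]                               # x from -3.0 to 3.0
ys = [1 + 2 * x - 0.5 * x * x + random.gauss(0, 0.5) for x in xs]


def solve(a, b):
    """Solve a @ c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        coef[r] = (m[r][n] - sum(m[r][c] * coef[c] for c in range(r + 1, n))) / m[r][r]
    return coef


def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations."""
    p = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(p)] for i in range(p)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(p)]
    return solve(ata, aty)


def aic(xs, ys, coef):
    """Gaussian AIC up to a constant: n*ln(RSS/n) + 2k, k = number of coefficients."""
    n, k = len(xs), len(coef)
    rss = sum((y - sum(c * x ** i for i, c in enumerate(coef))) ** 2 for x, y in zip(xs, ys))
    return n * math.log(rss / n) + 2 * k


scores = {d: aic(xs, ys, fit_poly(xs, ys, d)) for d in range(6)}
best_degree = min(scores, key=scores.get)
for d, s in scores.items():
    print(f"degree {d}: AIC = {s:.1f}")
print("selected degree:", best_degree)
```

Counting the noise variance as an extra parameter would add the same constant to every model, so it would not change which degree wins.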

Curing the big data storage fetish [Dan Woods/Forbes]. “One popular way to express lust for big data for its own sake is to create a gargantuan Hadoop cluster.” It’s not enough to just store the data; you need to build a data-driven culture. “But how do you create a company culture like CapitalOne or Google or eBay or Zynga or LinkedIn, where data is essentially part of the management team? At all of these companies there are data scientists, the elite professionals, but there are also swarms of data enthusiasts, people who are eager to use data to help do their jobs better.”

links for 2011-03-05

  • "I claim that some of the reasons why so many people who have greatness within their grasp don't succeed are: they don't work on important problems, they don't become emotionally involved, they don't try and change what is difficult to some other situation which is easily done but is still important, and they keep giving themselves alibis why they don't. They keep saying that it is a matter of luck."

links for 2011-03-04


links for 2011-02-13

links for 2011-01-28

links for 2011-01-21