Links for February 27, 2012

Kathy Sierra on gamification in education [Larry Ferlazzo/Larry Ferlazzo's Websites of the Day... for Teaching ELL, ESL, & EFL]. Kathy offers guidelines around when gamification may be safe vs. dangerous. What falls in the dangerous category? Learning and engagement that is intrinsically rewarding, since psychology studies have suggested that rewarding such activity destroys a person’s (or a monkey’s) interest in doing the activity for its own sake:

The studies are both counter-intuitive and disturbing. The monkeys that enjoyed playing with wooden puzzles until given their favorite treat reward for solving the puzzles, at which time their puzzle-solving diminished. The kids given ribbons for their drawings then showed less interest in drawing. The writers shown a list of possible external reasons for writing immediately wrote less complex and interesting poems than those shown a list of intrinsically-rewarding reasons for writing. And on and on and on and on. Animals, humans, children, adults, across wide-ranging domains and in studies conducted by dozens of independent researchers.

If 99.9% of big data is irrelevant, why do we need it [Michael Wu/Lithium Lithosphere blogs]. Wu, Lithium’s Principal Scientist of Analytics, says “Just because you can track, store, and analyze big data, doesn’t mean you should.” He argues that in many cases you can answer the questions you need to answer just by getting the relevant data, which may well be small enough to load and analyze on a single beefy computer.

Lazily musing about sharing [JP Rangaswami/Confused of Calcutta]. “Sharing is serious business” — it has serious consequences for businesses, especially for those built upon not-sharing. Five ideas about sharing:

1. For anything to be social, it must be shared.

2. Sharing, the act of making social, happens because people are made social.

3. Sharing is encouraged by good design.

4. When you share physical things like food, sharing reduces waste.

5. When you share non-physical things like ideas, sharing increases value.

Want to get value out of your data and analytics investment? Then deal with this issue before you buy the software [Maz Iqbal/B2C Business to Community]. People don’t naturally think in statistically sound ways, not even professional statisticians. Getting the right data into systems that can analyze it is the easy part. The hard part is:

Getting managers to give up their pet theories, their ideological convictions, their vested interests, their intuition, their past experience and use data and analytics to make decisions. That is the central issue that you have to and should deal with.

Desultory musing* about mashed-up selves

Trying to find yourself is a staple of the self-help literature, along with the striving for authenticity and building up your self-esteem. I probably wrote about authenticity and how you needed to practice it in my book** because way back in 2007, I thought it was a good thing, a necessary thing.

Now I’m convinced that’s all wrong. The self that matters isn’t some tightly defined, self-loving, individuated thing in the world. The self that matters is the mashed-up self, the networked self — the self made up of relationships and experiences and interactions and ideas. It’s way bigger and more powerful than the un-networked you.

These are some ideas I want to explore: combinatorial creativity, connectivist learning, the third person perspective in the creative process, and self-transcendence. What all these have in common is they all overturn the idea that the individuated self is primary.

Writer and artist Austin Kleon on how we are mashups:

We can pick our teachers and we can pick our friends and we can pick the books we read and the music we listen to and the movies we see, etcetera. You are a mashup of what you let into your life. [via Maria Popova]

Maria Popova on networked knowledge and combinatorial creativity:

Which is interesting, recognizing not only the absolute value of content but also its relational value, the value not just of information itself but also of information architecture, not just of content but also of content curation….

The idea that in order for us to truly create and contribute to the world, we have to be able to connect countless dots, to cross-pollinate ideas from a wealth of disciplines, to combine and recombine these pieces and build new castles.

This relates to how we conceive of ourselves. Are we distinct individuals with hard boundaries? Or are we somehow only ourselves when you consider how we fit into a network of experiences, people, and knowledge?

Robert Fritz, author of The Path of Least Resistance, might be called a creativity guru, but I would call him a creation guru. He doesn’t write so much about being creative as about actually creating. In his view, the self that we want so much to develop and pin down should be set aside:

Don’t try to define yourself, instead, suspend the question. That gives you a better lens by which to create anything you want to be.

Here’s what Fritz discovered when he tried to help people become more creative by improving their self-image:

Questions of identity got in the way of the creative process, even when people thought well about themselves.  It took years to come to understand this.  Everything pointed to a fairly simple and, yet, revolutionary pair of insights.  The first was that what people thought about themselves, good, bad, or indifferent, wasn’t going to change.  All notions of self-esteem training are predicated on the idea that people can change how they see themselves.  This is one reason they don’t work.

The other insight was just as major.  It is that your view of yourself has no place in the creative process.  Simply put, the moment you make your success or failure about you, that’s the moment you can’t learn what you need to learn, experience what comes with the creative territory, and keep your focus where it needs to be, on the outcome you are working to create, and where you currently are in the creative process.

In his books, Fritz suggests that creators need to use a third-person perspective that takes themselves out of the equation rather than the first-person which makes creation all about the self, the I, the creator herself. Creation doesn’t grow out of some authentic, independent self. It launches from a networked self which is almost like no self at all.

Maybe the reason that thinking too much about our identities as distinct individuals stops us from creating is that creation comes through mashing up, through navigating networks of people and knowledge and ideas, not from the perspective of one isolated node in the network. The node alone is useless.

So we need to stop thinking so much about our individual selves; we need to transcend ourselves. It’s interesting that some of the most satisfied people combine a love of the new with persistence and self-transcendence. These seem like exactly the traits you’d need to succeed in a networked world. Neophilia (novelty-seeking, love of the new) draws you to new ideas, new people, and new experiences, giving you more material for the mashup that is you and the mashups you create. Persistence keeps you from being merely a dilettante, flitting from one new thing to another. And self-transcendence stops you from thinking that it’s all about you.

It’s really freeing to realize your self alone is this puny, incompetent thing whose self-love or self-loathing matters not a bit. It’s your networked mashup self that matters, that’s capable of doing and creating great things.


* With credit to JP Rangaswami for his “musing lazily about” series. I like to be able to wander around a topic without reaching any conclusions or forcing it into some structure that might obscure the evolving ideas.

** Which shall remain unnamed and unlinked because I’m so beyond what I wrote then even though I feel like there were some really great ideas that I’d like to expand upon and refine. E.g., busy vs. bursty.

Links for February 24, 2012

Cognitive inequality [The Economist Free Exchange].

is this an iron rule of innovation in information technology—that the cheaper information becomes and the easier it becomes to manipulate it the greater will be the gap, productive and otherwise, between the informationally capable and the rest? …

We might well be in an initial phase of the information age in which technology amplifies cognitive gaps which gives way to a period in which technology mutes those gaps.

Our greedy colleges 2.0 [Andrew Gillen/Inside Higher Ed]. The Bennett Hypothesis says that increases in federal financial aid subsidies enable colleges to raise their tuition without concern for what students can actually afford. The study described here found that aid directed to low-income students is less likely to lead to tuition increases than aid directed at relatively affluent students.

A modeled student [Cathy O'Neil/mathbabe]. Do systems that recommend courses and majors for students reinforce discrimination?

Economics of the cold start problem in talent discovery [John Horton/Online Labor]. Novices can’t get hired if their talent won’t be revealed until after they get hired. Some empirical evidence. One possible help: “talent revealing sites like StackOverflow and Github as replacements for traditional resumes.”

Links for February 22, 2012

Elizabeth Gilbert on What the Porcupine Dilemma Can Teach Us About the Secret of Happiness [Maria Popova/Brain Pickings]. Elizabeth Gilbert on Schopenhauer’s porcupines. Staying warm without impaling yourself on someone else’s spines.

Target, Pregnancy, and Predictive Analytics, Part II [Dean Abbott/Data Mining and Predictive Analytics]. The Target story was interesting for what it says about the possibilities and perils of analytics. This was my favorite writeup, for its overview of how to succeed with data analysis:

1) understand the data,
2) understand why the models are focusing on particular input patterns,
3) ask lots of questions (why does the model like these fields best? why not these other fields?)
4) be forensic (now that's interesting or that's odd...I wonder...),
5) be prepared to iterate, (how can we predict better for those customers we don't characterize well)
6) be prepared to learn during the modeling process

We have to "notice" patterns in the data and connect them to behavior. This is one reason I like to build multiple models: different algorithms can find different kinds of patterns. Regression is a global predictor (one continuous equation for all data), whereas decision trees and kNN are local estimators.
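The global vs. local distinction in that quote can be made concrete with a toy example. This is only a sketch in plain Python on made-up data with a jump at x = 5; the function names (fit_line, knn_predict) are mine, not Abbott's, and a real project would use a proper modeling library.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of one global equation, y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def knn_predict(xs, ys, x_new, k=3):
    """Local estimate: average the targets of the k training points nearest x_new."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return mean(y for _, y in nearest)

# Hypothetical data: the outcome jumps from 0 to 10 at x = 5.
xs = list(range(10))
ys = [0.0 if x < 5 else 10.0 for x in xs]

intercept, slope = fit_line(xs, ys)
for x_new in (2, 7):
    line_pred = intercept + slope * x_new
    knn_pred = knn_predict(xs, ys, x_new)
    print(f"x={x_new}: line={line_pred:.2f}, kNN={knn_pred:.2f}")
```

The single line has to compromise across the whole range, so it smooths over the jump, while kNN only looks at neighbors and recovers the flat levels on either side. Neither is better in general, which is the argument for building more than one kind of model.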

You Are Responsible for Getting Your Ideas to Spread [Tim Kastelle/Innovation Leadership Network]. Don’t blame the customer if your idea isn’t compelling; that’s a failure of your idea or your communication of it.

Machine Learning for Hackers [Review from David Smith/Revolution Analytics blog]. Sounds like a book I need to order.

Rather than merely providing a “cookbook” approach to say, building a “who to follow” recommendation system for Twitter, it takes the time to explain the methodology behind the algorithms and give the reader a better basis for understanding why these methods work (and, equally importantly, how they can go wrong).

What’s new? Exuberance for novelty has benefits [John Tierney/The New York Times]. In a longitudinal study, people who combined novelty-seeking with persistence and “self-transcendence” showed the most success over the years (good health, lots of friends, few emotional problems, greatest satisfaction with life).

So you call yourself a data scientist?

Hilary Mason (in Glamour!)

I just watched this video of Hilary Mason* talking about data mining. Aside from the obvious thoughts of what I could have done with my life if (1) I had majored in computer science instead of philosophy/economics and (2) hadn’t spent all of the zeroes having babies, buying/selling houses, and living out an island retirement fantasy thirty years before my time, I found myself musing about her comments on the “data scientist” term. She said she’s gotten into arguments about it. I guess some people think it doesn’t really mean anything — it’s just hype — who needs it? Someone’s a computer scientist or a statistician or a business intelligence analyst, right? Why make up some new name?

I dunno, I rather like the term. My official title at work is “data scientist” — thank you to my management for that — and it seems more appropriate than statistician or business intelligence analyst or senior software developer or whatever else you might want to call me. The fact is, I do way more than statistical analysis. I know SQL all too well and (as my manager knows from my frequent complaints) spend 75% + of my time writing extract-transform-load code. I use traditional statistical methods like factor analysis and logistic regression (heavily) but if needed I use techniques from machine learning. I try to keep on top of the latest online learning research and I incorporate that into our analytics plans and models. Lately I’ve been spending time looking at what sort of big data architectures might support the scale of analytics we want to do. I don’t just need to know what statistical or ML methods to use — I need to figure out how to make them scalable and real-time and — this is critical — useful in the educational context. That doesn’t sound like pure statistics to me, so don’t just call me a statistician**.

I do way more than data analysis and I’m capable of way more, thanks to my meandering career path that’s taken me from risk assessment (heavy machinery accident analysis at Failure Analysis now Exponent) to database app development (ERP apps at Oracle) to education (AP calculus and remedial algebra teaching at the Denver School of Science and Technology) and now to Pearson (online learning analytics). I earned a couple of degrees in mathematical statistics and applied statistics/research design/psychometrics meanwhile. 

Drew Conway's Venn diagram of data science

None of what I did made sense at the time I was wandering the path — and yet it all adds up to something useful and rare in my current position. Data science requires an alchemistic mixture of domain knowledge, data analysis capability, and a hacker’s mindset (see Drew Conway’s Venn diagram of data science reproduced here). Any term that only incorporates one or two of these circles doesn’t really capture what we do. I’m an educational researcher, a statistician, a programmer, a business analyst. I’m all these things.

In the end, I don’t really care what you call me, so long as I get the chance to ask interesting questions, gather the data to answer them, and then give you an answer you can use — an answer that is grounded in quantitative rigor and human meaning.


*Yes, I do have a girl-crush on Hilary. I think she’s awesome.

** Also, my kids cannot seem to pronounce the word “statistician.” I need a job title they can tell people without stumbling over it. I hope to inspire them to pursue careers that are as rewarding and engaging, intellectually and socially, as my own has been.

Getting ready for connected learning

Here’s a cool idea: the web enables a connectivist learning style based on network navigation, where “learning is the process of creating connections and developing a network.” Seems to me before you can learn connectedly, though, you need to first learn in more socially and contextually constrained ways.

Background: Three generations of distance education pedagogies

In this week’s Learning Analytics 2012 (LAK12) web session, Dragan Gasevic pointed us at an interesting paper describing three generations of distance education: cognitive-behaviorist, social constructivist, and connectivist. From Anderson and Dron (2011):

Anderson and Dron did not claim that the connectivist model would replace the cognitive-behaviorist or social-constructivist models but said that “all three current and future generations of [distance education] pedagogy have an important place in a well-rounded educational experience.”

These three models co-exist online today

LAK12 is itself an example of a course built in the connectivist paradigm, but just because a course is massive, open, and online doesn’t mean that it’s connectivist. For example, the Stanford machine learning class offered last fall was a (very effective) example of a cognitive-behaviorist approach. Students watched videos on their own schedule. Regular quizzes and homework assignments checked understanding. Andrew Ng was content creator and sage on the stage. While there was a Q&A forum available, the course design did not rely on it. A student could use it or not.

Typical online college courses today are often built in the social-constructivist mode, with instructors seeking to design and run courses that encourage many-to-many engagement through discussion threads and group projects. Does the addition of social features drive learning? It seems to be an article of faith among instructional designers today that it does. I’m not up on the research so I can’t say — but I can say that in online courses I’ve reviewed and taken, I don’t see evidence that social features have been designed in such a way that they make a difference in learning.

When are the different approaches useful?

I am thinking that whether a cognitive-behaviorist or constructivist or connectivist approach is best depends upon the preparation and goals of the learner. Maybe something like this:

I suspect that a student needs to gain basic grounding and fluency in a subject before constructivist approaches will be useful. An elementary schooler needs to learn to read and write and do arithmetic before you can do a group science project, for example. And it seems like a connectivist approach will be most effective once you already have some intermediate and contextual knowledge of a subject to navigate out from.

What do you think? When are cognitive-behaviorist vs. social constructivist vs. connectivist approaches to learning most useful? Do you think you need to have achieved a certain level of contextual and subject knowledge before connected learning is effective?

Links for January 20, 2012

Big data market survey: Hadoop solutions [Edd Dumbill/O'Reilly Radar].

Apache Hadoop is unquestionably the center of the latest iteration of big data solutions. At its heart, Hadoop is a system for distributing computation among commodity servers. It is often used with the Hadoop Hive project, which layers data warehouse technology on top of Hadoop, enabling ad-hoc analytical queries.

I’m starting my first ever project with Hadoop this week–a prototype of an analytics warehouse using Amazon Elastic MapReduce. Colleagues have told me EMR is a great way to get your head around Hadoop-based data processing.
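For anyone else getting their head around it, the programming model Hadoop distributes across servers can be sketched in a single process. To be clear, this is not the Hadoop or EMR API, just an illustrative word count in plain Python with function names I made up (map_phase, shuffle, reduce_phase); on a real cluster the framework runs the map and reduce steps on many machines and does the grouping in between.

```python
from collections import defaultdict

def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, the job a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

# Hypothetical input records.
counts = word_count(["big data big models", "Big questions"])
print(counts)
```

Because each map call sees only one record and each reduce call sees only one key, the work can be split across commodity servers without the pieces needing to talk to each other.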

CBO Report: Medicare pilot programs don’t control health-care costs [Megan McArdle/The Atlantic blogs]. McArdle describes what happened with a housing-project demolition program whose pilot studies suggested much better effects than were actually seen at scale:

The initial study was small and involved highly screened people with a lot of support. And it seems to have suffered from publication bias–the most spectacular results got the most attention, even though these might just have been outliers.

This is distressingly common–not just in government or social-do-gooding research, but in organizations of all kinds–including corporations.

Programs at scale often don’t show results as good as pilot studies of those programs. More generally in program evaluation, it’s hard to find evidence of strong (or even weak) effects of interventions. Social systems are complex; factors other than those targeted by the intervention often determine outcomes. This is something I need to communicate regularly to my colleagues and our partners–student learning is largely determined by factors other than what we have control over. That’s not to say we shouldn’t improve our course design, teaching practices, and so forth but it is to say that there aren’t many easy pickings out there for improving student outcomes.
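One piece of this, the way attention to the most spectacular pilot inflates expectations, is easy to simulate. The sketch below uses entirely made-up numbers (a small true effect of 0.1, fifty pilots of 20 participants each, one rollout of 10,000) and a hypothetical helper, observed_effect; it illustrates the selection effect only and says nothing about the actual programs McArdle discusses.

```python
import random
from statistics import mean

def observed_effect(true_effect, n_participants, rng, noise_sd=1.0):
    """Average measured gain in one study: the true effect plus per-person noise."""
    return mean(rng.gauss(true_effect, noise_sd) for _ in range(n_participants))

rng = random.Random(2012)  # fixed seed so the run is repeatable
TRUE_EFFECT = 0.1          # assumed small real benefit, in arbitrary units

# Fifty small pilots of the same intervention; only the best one gets the attention.
pilots = [observed_effect(TRUE_EFFECT, 20, rng) for _ in range(50)]
best_pilot = max(pilots)

# One large rollout of the very same intervention.
at_scale = observed_effect(TRUE_EFFECT, 10_000, rng)

print(f"most spectacular pilot: {best_pilot:.2f}")
print(f"average across pilots:  {mean(pilots):.2f}")
print(f"result at scale:        {at_scale:.2f}")
```

Nothing about the intervention changes between the pilots and the rollout. The gap comes purely from picking the luckiest of many noisy small studies, which is why a single impressive pilot is weak evidence of what will happen at scale.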

For-profits vs not-for-profits [Felix Salmon/Reuters blog].

I know full well that a lot of not-for-profit organizations are run in a dreadful fashion; I’m just not convinced that introducing a profit motive is always or even often the best way to fix that problem…. I very much doubt that for-profit education is ever a good idea. I just don’t see how the incentives there could possibly be aligned.

But the profit motive can’t provide optimal outcomes if there isn’t consumer discipline along with it. For-profit higher education is subsidized by the government in the form of grants and low-interest loans (and note that nonprofit education is subsidized in additional ways as well, in the case of public institutions). Would-be students do not have an incentive to seriously evaluate whether the education they are purchasing is worth what they pay, because there is a third-party payer involved. The situation is much like health care. The post has a good discussion of the issues and controversy around for-profit higher education.