Category Archives: links

Daily Links 04/11/2017

Demystifying data science

The key to a successful analytical model is having a robust set of variables against which to test for their predictive capabilities. And the key to having a robust set of variables from which to test is to get the business users engaged early in the process.

How machine learning is shaking up e-commerce and customer engagement

From a content perspective, [Sitecore] performs semantic analysis to:

  • Auto generate taxonomies and tagging
  • Help improve the tone of your content by analyzing for things like wordiness, slang, and other grammar-like faux pas

From a digital marketing perspective, ML can:

  • Help detect segments of your customers or audience
  • Improve the effectiveness of your testing and optimization processes
  • Provide content and product recommendations that increase the engagement time a customer spends on your website.

And from a backend perspective, it can help with fraud detection, something that every company with an e-commerce model needs to monitor actively.

Gartner 2017 magic quadrant for data science platforms: gainers and losers

Firms covered:

  • Leaders (4): IBM, SAS, RapidMiner, KNIME
  • Challengers (4): MathWorks (new), Quest (formerly Dell), Alteryx, Angoss
  • Visionaries (5): Microsoft, H2O.ai (new), Dataiku (new), Domino Data Lab (new), Alpine Data
  • Niche Players (3): FICO, SAP, Teradata (new)

Gartner notes that even the lowest-scoring vendors in MQ are still among the top 16 firms among over 100 vendors in the heated Data Science market.

Among those not on the quadrant, I’ve been impressed by DataRobot.

Daily Links 04/05/2017

New technology pushes machine smarts to the edge

“The set of possible smart edge devices that can be used for industrial control is rapidly expanding as ever more compute and sensing capability moves to the edge,” says Greg Olsen, senior vice president, products, at Falkonry. “As long as the device can transform signal observation into operational commands or guidance, it can be considered a control device. Smartness is clearly subjective, but the range can include anything from advanced process control all the way up to artificial intelligence.”

Want to be happier and more successful? Learn to like other people

It sounds paradoxical, but according to University of Georgia researcher Jason Colquitt and his colleagues, people who tend to trust others at work score higher on a range of measures than those who don’t, from job performance to commitment to the team. And since we know that it’s our relationships—particularly with our bosses and colleagues—that determine how happy and successful we are as our careers progress, it may be worth asking some new questions. Instead of, “How can I improve?” the better question might be, “How can I start seeing more of the good in people, more often?”

Google’s Cloud Jobs API

Company career sites, job boards and applicant tracking systems can improve candidate experience and company hiring metrics with job search and discovery powered by sophisticated machine learning. The Cloud Jobs API provides highly intuitive job search that anticipates what job seekers are looking for and surfaces targeted recommendations that help them discover new opportunities. In order to provide the most relevant search results and recommendations, the API uses machine learning to understand how job titles and skills relate to one another, and what job content, location and seniority are the closest match for a jobseeker’s preferences.


Daily Links 04/04/2017

Emotion Detection and Recognition from Text Using Deep Learning

The researchers used a data set of short English text messages labeled by Mechanical Turkers with five emotion classes: anger, sadness, fear, happiness, and excitement. A multi-layered neural network was trained to classify text messages by emotion. The model was able to classify anger, sadness, and excitement well but didn’t do well at recognizing fear.

Adapting ideas from neuroscience for AI

We don’t really know why neurons spike. One theory is that they want to be noisy so as to regularize, because we have many more parameters than we have data points. The idea of dropout [a technique developed to help prevent overfitting] is that if you have noisy activations, you can afford to use a much bigger model. That might be why they spike, but we don’t know. Another reason why they might spike is so they can use the analog dimension of time, to code a real value at the time of the spike. This theory has been around for 50 years, but no one knows if it’s right. In certain subsystems, neurons definitely do that, like in judging the relative time of arrival of a signal to two ears so you can get the direction.
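For readers who have not met dropout before, here is a minimal sketch of the "noisy activations" idea mentioned in the quote. It is not taken from the interview: it assumes the common "inverted dropout" variant, and the function name `dropout` and its parameters are made up for illustration.

```python
import random

def dropout(activations, drop_prob, rng):
    """Inverted dropout (illustrative): zero each activation with
    probability drop_prob and rescale the survivors so the expected
    value of every activation stays the same."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

# Apply the noise to a layer of identical activations.
rng = random.Random(0)            # seeded so the run is repeatable
activations = [1.0] * 10000
noisy = dropout(activations, drop_prob=0.5, rng=rng)

# Each unit is now either silenced (0.0) or doubled (2.0),
# while the average over the layer stays close to 1.0.
mean_after = sum(noisy) / len(noisy)
```

The noise is applied only during training; at test time the activations are typically passed through unchanged, which is why the survivors are rescaled here.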

Five AI Startup Predictions for 2017

My favorite: “Full stack AI startups actually work”

When you focus on a vertical, you can find high level customer needs that we can meet better with AI, or new needs that can’t be met without AI. These are terrific business opportunities, but they require much more business savvy and subject matter expertise. The generally more technical crowd starting AI startups tend to have neither, and tend to not realize the need for or have the humility to bring in the business and subject matter expertise required to ‘move up the stack’ or ‘go full stack’ as I like to call it.

The Silicon Gourmet: training a neural network to generate cooking recipes

Pears Or To Garnestmeam


¼ lb bones or fresh bread; optional
½ cup flour
1 teaspoon vinegar
¼ teaspoon lime juice
2  eggs

Brown salmon in oil. Add creamed meat and another deep mixture.

Discard filets. Discard head and turn into a nonstick spice. Pour 4 eggs onto clean a thin fat to sink halves.

Brush each with roast and refrigerate.  Lay tart in deep baking dish in chipec sweet body; cut oof with crosswise and onions.  Remove peas and place in a 4-dgg serving. Cover lightly with plastic wrap.  Chill in refrigerator until casseroles are tender and ridges done.  Serve immediately in sugar may be added 2 handles overginger or with boiling water until very cracker pudding is hot.

Yield: 4 servings

Also see In Which a Neural Network Learns to Tell Knock-Knock Jokes

Daily Links 01/27/2015

Traditionally we say: If we find statistical significance, we’ve learned something, but if a comparison is not statistically significant, we can’t say much. (We can “reject” but not “accept” a hypothesis.)

But I’d like to flip it around and say: If we see something statistically significant (in a non-preregistered study), we can’t say much, because garden of forking paths. But if a comparison is not statistically significant, we’ve learned that the noise is too large to distinguish any signal, and that can be important.

So, to sum up, science is not about data; it’s not about the empirical content, about our vision of the world. It’s about overcoming our own ideas and continually going beyond common sense. Science is a continual challenging of common sense, and the core of science is not certainty, it’s continual uncertainty—I would even say, the joy of being aware that in everything we think, there are probably still an enormous amount of prejudices and mistakes, and trying to learn to look a little bit beyond, knowing that there’s always a larger point of view to be expected in the future. 

We really have no idea what dolphins or octopi or crows could achieve if their brains were networked in the same way. Conversely, if human beings had remained largely autonomous individuals they would have remained rare hunter-gatherers at the mercy of their environments as the huge-brained Neanderthals indeed did right to the end. What transformed human intelligence was the connecting up of human brains into networks by the magic of division of labour, a feat first achieved on a small scale in Africa from around 300,000 years ago and then with gathering speed in the last few thousand years.

Take Salesforce for example. Right now it just presents data, and the human user has to draw her or his predictive insights in their heads. Yet most of us have been trained by Google, which uses information from millions of variables based on ours and others’ usage to tailor our user experience … why shouldn’t we expect the same here? Enterprise applications — in every use case imaginable — should and will become inherently more intelligent as the machine implicitly learns patterns in the data and derives insights. It will be like having an intelligent, experienced human assistant in everything we do.

Daily Links 01/19/2015

He told me to get a big wall calendar that has a whole year on one page and hang it on a prominent wall. The next step was to get a big red magic marker.

He said for each day that I do my task of writing, I get to put a big red X over that day. “After a few days you’ll have a chain. Just keep at it and the chain will grow longer every day. You’ll like seeing that chain, especially when you get a few weeks under your belt. Your only job next is to not break the chain.”

On January 2nd of this year I started publishing a daily data science blog post for my team at IQNavigator with analytic results of some sort or another–charts, statistical analyses, machine learning output. My goal is to write such a post every working day for 2015, following Seinfeld’s advice of seeking consistent daily action. I’ve missed one working day so far (last Friday) but otherwise it’s been a great way to ensure I stay engaged with hands-on data science work and consistently discover interesting insights in our data set.

As value shifts from software to the ability to leverage data, companies will have to rethink their businesses, just as Netflix and Google did. In the next decade, data-driven, personalized experiences will continue to accelerate, and development efforts will shift towards using contextual data collected through passive user behaviors.

We in the West hate to acknowledge – and most refuse to believe – that our leaders have been flagrantly wasteful of Muslim lives for a century now, in countless wars and military encounters instigated by overwhelming Western power. What is the message to Muslims of the US-led invasion of Iraq in 2003? More than 100,000 Iraqi civilians – a very conservative estimate – died in a war that was based on utterly false pretenses. The US has never apologized, much less even recognized the civilian slaughter.

“The Google search algorithm” names something with an initial coherence that quickly scurries away once you really look for it. Googling isn’t a matter of invoking a programmatic subroutine—not on its own, anyway. Google is a monstrosity. It’s a confluence of physical, virtual, computational, and non-computational stuffs—electricity, data centers, servers, air conditioners, security guards, financial markets—just like the rubber ducky is a confluence of vinyl plastic, injection molding, the hands and labor of Chinese workers, the diesel fuel of ships and trains and trucks, the steel of shipping containers.

Daily Links 12/22/2014

The bottom line is that science is not merely a bag of clever tricks that turn out to be useful in investigating some arcane questions about the inanimate and biological worlds. Rather, the natural sciences are nothing more or less than one particular application — albeit an unusually successful one — of a more general rationalist worldview, centered on the modest insistence that empirical claims must be substantiated by empirical evidence.

I have said many times that teamwork is over-rated. It can be a smoke screen for office bullies to coerce fellow workers. The economic stick often hangs over the team: be a team player or lose your job, is the implication in many workplaces. One of my main concerns with teams is that people are placed on them by those holding hierarchical power and are then told to work together (or else). However, there are usually power plays internal to the team so that being a team player really means doing what the leader says. For example, I know many people who work in call centres and I have heard how their teams are often quite dysfunctional. Teamwork too often just means toeing the party line.

A more accurate title for this role might be CDMO – Chief Data Monetization Officer – as their role needs to be focused on deriving value from, or monetizing, the organization’s data assets.  This also needs to include determining how much to invest to acquire additional data sources that would complement the organization’s existing data sources and enhance their analytic results.

Block out time
Change your defaults
Rely on apps and automation
Do routine cleanup
Think ahead (long-term)
Create separate calendars

I know many others that are like me in this regard and for you I have these recommendations: 1- avoid unnecessary meetings, especially if you are already in full-productivity mode. Don’t be afraid to use this as an excuse to cancel.  If you are in a soft $ institution, remember who pays your salary.  2- Try to bunch all the necessary meetings all together into one day. 3- Separate at least one day a week to stay home and work for 10 hours straight. Jason Fried also recommends that every work place declare a day in which no one talks. No meetings, no chit-chat, no friendly banter, etc… No talk Thursdays anyone?

We have identified that when these four skills are brought together as one, they produce an optimal collaborative environment that breeds the most successful teams and a workplace culture that continuously propels innovation and initiative:

  • Seeing opportunities with broadened observation
  • Sowing opportunities with extensive innovation
  • Growing the seeds of opportunity of greatest potential
  • Sharing the opportunities you create and sustain with others

In fact, a study by my organization revealed that the workplace is not innovative enough because employees are mostly proficient “sowers” (with the propensity of doing what they are told very well).

Daily Links 12/21/2014

The winds of change originate in the unconscious minds of domain experts. If you’re sufficiently expert in a field, any weird idea or apparently irrelevant question that occurs to you is ipso facto worth exploring. [3] Within Y Combinator, when an idea is described as crazy, it’s a compliment—in fact, on average probably a higher compliment than when an idea is described as good.

Today I believe that a major transition towards what some futurists call a “knowledge-based society” is underway. In that context what I call wirearchy represents an evolution of traditional hierarchy. I don’t think most humans can tolerate a lack of some hierarchical structure, primarily for the purposes of decision-making. The working definition I developed (and which has been ‘tested’ by a range of colleagues and friends interested in the issue(s)) recognizes that the necessary adaptations to new conditions will likely involve temporary, transient but more intelligent hierarchy. The implication is that people in a wirearchy should be focused on seeking to better understand and use the growing presence of feedback loops and double-loop learning.

In this paper, we present the benchmark data set CauseEffectPairs that consists of 88 different “cause-effect pairs” selected from 31 datasets from various domains. We evaluated the performance of several bivariate causal discovery methods on these real-world benchmark data and on artificially simulated data. Our empirical results provide evidence that additive-noise methods are indeed able to distinguish cause from effect using only purely observational data. In addition, we prove consistency of the additive-noise method proposed by Hoyer et al. (2009).

In an interview with Kevin Smith, writer and television producer Paul Dini complained about a worrying trend he sees in television animation and superhero shows in particular: executives spurning female viewers because they believe girls and women don’t buy the shows’ toys.