Daily Links 12/17/2014

My personal feeling is that this will really take off if you can start linking performance information to the more objective factual data within the various systems. How does the performance of interim staff vary and is that linked to which agency they come through, their employment history, the length of their assignment or other factors? We’ve probably all had experience of working with interim staff who were brilliant; and with others who weren’t worth a fraction of their day rate. So you can imagine some really powerful analysis that might give a strong steer into how you best choose, structure and manage your contingent workforce – and maybe even take that into the permanent staff world!

Totally agree! Now we just need to get hold of comprehensive and reliable performance data…

Truly some awesome stuff here, including the link below on writing an R package from scratch. I should definitely do that for the utility functions I use over and over. 

This tutorial is not about making a beautiful, perfect R package. This tutorial is about creating a bare-minimum R package so that you don’t have to keep thinking to yourself, “I really should just make an R package with these functions so I don’t have to keep copy/pasting them like a goddamn luddite.” Seriously, it doesn’t have to be about sharing your code (although that is an added benefit!). It is about saving yourself time. (n.b. this is my attitude about all reproducibility.)
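For reference, the bare-minimum layout such a tutorial walks through is just a DESCRIPTION file, a NAMESPACE file, and an R/ directory of function definitions. A rough sketch (package and function names here are placeholders; in practice devtools::create() or the usethis package will generate this skeleton for you):

```shell
# Minimal R package skeleton -- "mypkg" and say_hello are invented examples
mkdir -p mypkg/R

cat > mypkg/DESCRIPTION <<'EOF'
Package: mypkg
Title: Personal Utility Functions
Version: 0.0.1
EOF

# Export everything not starting with a dot (crude but fine for a personal package)
echo 'exportPattern("^[^.]")' > mypkg/NAMESPACE

cat > mypkg/R/utils.R <<'EOF'
# One of those copy/pasted utility functions, now living in a package
say_hello <- function(name) paste("Hello,", name)
EOF
```

From there, `R CMD build mypkg` or `devtools::install("mypkg")` gets the functions onto your library path instead of your clipboard.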

People are searching for products on Amazon, rather than using Google. The only reason search makes money for Google is that people use it to search for products they would like to buy on the internet, and Google shows ads for those products. Increasingly, however, people are going straight to Amazon to search for products. Desktop search queries on Amazon increased 47% between September 2013 and September 2014, according to ComScore.

Jeff: I think it takes more time to analyze something like that. Again, one of my jobs is to encourage people to be bold. It’s incredibly hard.  Experiments are, by their very nature, prone to failure. A few big successes compensate for dozens and dozens of things that didn’t work. Bold bets — Amazon Web Services, Kindle, Amazon Prime, our third-party seller business — all of those things are examples of bold bets that did work, and they pay for a lot of experiments.

What really matters is, companies that don’t continue to experiment, companies that don’t embrace failure, they eventually get in a desperate position where the only thing they can do is a Hail Mary bet at the very end of their corporate existence. Whereas companies that are making bets all along, even big bets, but not bet-the-company bets, prevail. I don’t believe in bet-the-company bets. That’s when you’re desperate. That’s the last thing you can do.

“The dirty secret is that a significant majority of big-data projects aren’t producing any valuable, actionable results,” said Michael Walker, a partner at Rose Business Technologies, which helps enterprises build big-data systems. According to a recent report from the research firm Gartner Inc., “through 2017, 60% of big-data projects will fail to go beyond piloting and experimentation and will be abandoned.”

Daily Links 12/16/2014

3) The Convergence of VMS and FMS

The continued adoption of FMS software in 2015 will produce ramifications for other segments of the labor ecosystem, particularly project-based contingent labor. Vendor Management Systems (VMS), which are used primarily to manage temporary staff and contract labor, do not address the specific needs of freelance management.

Yet data science, as a business, is still young. As the technology moves beyond the Internet incubators like Google and Facebook, it has to be applied company by company, in one industry after another.

At this stage, there is a lot of hand craftsmanship rather than software automation.

So the aspiring software companies find themselves training, advising and building pilot projects for their commercial customers. They are acting far more as services companies than they hope to be eventually.

While that may sound like a condition to be remedied, in fact we are living in an era where uncertainty and ambiguity are increasing. The reality is that we can’t shoo it away by becoming more rigid, creating more rules, or imposing more authoritarian controls. We need to loosen control, make more whitespace, give people more autonomy, and rely on the network of loose connections to influence everyone’s actions. We need a climate of soft power in a social network based on sparsity, not density, where weak and lateral connections dominate. That is the wellspring of organizational flexibility and adaptability.

Daily Links 12/10/2014

In his 2003 book, Open Innovation, Henry Chesbrough defined this important concept. In short, open innovation is a product or technology development model that extends beyond the boundaries of a firm to involve others in a collaborative way. Today, much of this activity uses various social networking tools and technologies to empower people to generate ideas, fine-tune concepts, share knowledge or solve critical problems.

When you look at the evolution of digital measurement in the enterprise and study organizations that have achieved a significant degree of maturity, you’ll notice that they come in two distinct flavors: the analytic and the informational. Analytic organizations have strong teams studying the data and driving testing, personalization and customer lifecycle strategies. Informational organizations have widespread, engaged usage of data across the organization with key stakeholders absorbing and using data intelligently to make decisions. It’s not impossible for an enterprise to be both analytic and informational, but the two aren’t necessarily related either. You might expect that organizations that have gotten to be good in measurement would be mature in both areas, but that’s not really the common case. Instead, it seems that most enterprises have either a culture or a problem set that drives them to excel in one direction or the other.

“Garbage in, garbage out” is the cliché of data-haters everywhere. “It is not true that companies need good data to use predictive analytics,” Taylor said. “The techniques can be robust in the face of terrible data, because they were invented by people who had terrible data,” he noted.

Revolution R Open (RRO) is the enhanced distribution of R from Revolution Analytics. RRO is based on version 3.1.1 of the statistical software R and includes additional capabilities for improved performance, reproducibility and platform support.

Daily Links 12/09/2014

But the world operates differently today. Companies own less infrastructure, inventory and manufacturing equipment than ever. They’ve outsourced everything from customer service to supply chain. And a growing portion of their workforce is not on their full-time payroll. 

MetaMind delivers Artificial Intelligence enterprise solutions via its AI platform and Smart Module offerings. The general-purpose platform can predict outcomes for language, vision and database tasks, and delivers best-in-class accuracies on standard benchmarks.

A reader recently pointed my attention to the following quote from the composer and artist John Cage:

“If something is boring after two minutes, try it for four. If still boring, then eight. Then sixteen. Then thirty-two. Eventually one discovers that it is not boring at all.”

The scientific method, based in deduction and falsifiability, is better at proliferating questions than it is at answering them.

As a result, predictive API providers will face increasing pressure to specialize in one or a few verticals. At this point, elegant and general APIs become not only irrelevant, but a potential liability, as industry- and domain-specific feature engineering increases in importance and it becomes crucial to present results in the right parlance. Sadly, these activities are not thin adapters that can be slapped on at the end, but instead are ravenous time beasts that largely determine the perceived value of a predictive API. No single customer cares about the generality and wide applicability of a platform; each is looking for the best solution to the problem as he conceives it.
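To make the feature-engineering point concrete, here is a toy contrast between what a general-purpose predictive API might extract and what a domain specialist would actually want — using a contingent-workforce example in the spirit of this post. All field and feature names are invented for illustration:

```python
# Hypothetical staffing-assignment record (every field name is made up)
assignment = {"day_rate": 600, "market_rate": 500,
              "assignment_days": 90, "agency": "AgencyX"}

def generic_features(record):
    # The kind of shallow, domain-agnostic features a general API can offer
    return {
        "n_fields": len(record),
        "has_rate": "day_rate" in record,
    }

def staffing_features(record):
    # Domain-specific features a staffing analyst would recognize:
    # these encode industry knowledge no general platform ships with
    return {
        "rate_vs_market": record["day_rate"] / record["market_rate"],  # 1.2 = 20% above market
        "tenure_months": record["assignment_days"] / 30,
        "via_agency": record["agency"] is not None,
    }

print(generic_features(assignment))
print(staffing_features(assignment))
```

The second function is short, but inventing and validating features like `rate_vs_market` for each vertical is exactly the "ravenous time beast" the author describes — it cannot be slapped on as a thin adapter at the end.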

The meta-spiral

What she said:

Now I want to keep doing the same things over and over again but maybe in a different key sometimes or with different backup singers or in a different arrangement. I want Nietzschean eternal recurrence except the intra-life version. I’m happy with what I have and if I can redo it again and again, into eternity, I will be satisfied. I am satisfied, even if it is February.

Despite the difficulties I’ve faced since I wrote those words in 2007, I still feel that way–happy and satisfied with the opportunities and challenges life has presented to me; wanting more of the same (but in variation) as long as I can keep spiraling and evolving. I want more time with my family and friends. More chances to engage with smart people and good ideas at work. More laughter and joy. More heartbreak? Sure, that too, because it means I’m still alive and connecting.

Spiraling progress: Looking to a bursty 2015

I like to think that progress in life happens in a spiral – we return again and again to the same places and lessons, going deeper each time as we evolve into our best and most whole selves.

I cherish the spiral of my life, as long as I find more meaning and human connection each time I come back around to what sometimes seems like the same exact place I was five or ten or twenty years ago.

This afternoon I’ve been browsing the Wayback Machine looking at past blogging I’ve done (such as at The Barely Attentive Mother and Anne 2.1), thinking as I am of dedicating myself in 2015 to renewed blogging and a whole lot more connecting than I’ve done recently. The past five or so years I was focused first on getting my PhD, then recovering from the divorce that may have been related in some complex way to my pursuit of the PhD at the same time that I was launching a career in data science. Between those three activities I had little energy and inspiration left to consider any but the most mundane concerns. I was working working working all the time. At least when I wasn’t crying.

So I’ve been busy, not bursty, for the past five years. It’s been a whole lot of perspiration, not inspiration. But I’m feeling inspired and excited – ready to make connections again – with great people and great ideas – with great people who have great ideas.

In addition to returning to regular writing online, other things I’m spiraling back to are these: mountain adventure (skiing, snowboarding, hiking, backpacking), music (playing both guitar and piano, plus helping my middle child find her own musical muse), leading a team at work (I’ve just put together the processes and people I need to accelerate IQN’s data-driven innovation efforts in 2015), plenty of dating of the casual and serious varieties, and Latin American travel (planning a Christmas trip to South America for next year).

I haven’t decided exactly what my blogging and connecting in 2015 will look like but I’m excited to get going, excited to evolve and grow and spiral some more.

Putting the science in data science

Data science is not just overhyped marketing BS, at least not if you are doing it right.

Owning up to the title of data scientist [Sean McClure | Data Science Central]:

To own up to the title of data scientist means practitioners, vendors and organizations must be held accountable to using the term science, just as is expected from every other scientific discipline. What makes science such a powerful approach to discovery and prediction is the fact that its definition is fully independent of human concerns. Yes, we apply science to the areas we are interested in, and are not immune to bias and even falsification of results. But these deviations of the practice do not survive the scientific approach. They are weeded out by the self-consistent and testable mechanisms that underlie the scientific method. There is a natural momentum to science that self-corrects and its ability to do this is fully understandable because what survives is the truth. The truth, whether in line with our wishes or not, is simply the way the world works.

Opinions, tools of the trade, programming languages and ‘best’ practices come and go, but what always survives is the underlying truth that governs how complex systems operate. That ‘thing’ that does work in real world settings. That concept that does explain the behavior with enough predictive accuracy to solve challenges and help organizations compete. This requires discovery; not engineered systems, business acumen, or vendor software. Those toolsets and approaches are only as powerful as the science that drives their execution and provides them their modeled behavior. It is not a product that defines data science, but an intangible ability to conduct quality research that turns raw resources into usable technology.

Why are we doing this? To make our software better – to help it learn about the world and then, based on that learning, improve business outcomes:

The software of tomorrow isn’t programming ‘simple’ logic into machines to produce some automated output. It is using probabilistic approaches and numerical and statistical methods to ‘learn’ the behavior and act accordingly. The software of tomorrow is aware of the market in which it operates and takes actions that are in line with the models sitting under its hood; models that have been built from intense research on some underlying phenomenon that the software interacts with. Science is now being called upon to be a directly-involved piece of real-world products and for that reason, like never before in history, the demand for ushering in science to help enterprises compete is exploding.
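A toy sketch of the contrast: instead of hard-coded logic, the software keeps a simple probabilistic model of observed outcomes and acts on whatever the model currently favors. The action names and smoothing choice here are invented for illustration — real systems use far richer models, but the shape is the same:

```python
from collections import defaultdict

class OutcomeModel:
    """Estimate success probabilities per action from observed outcomes."""

    def __init__(self):
        self.trials = defaultdict(int)
        self.successes = defaultdict(int)

    def observe(self, action, success):
        # Feed the model a real-world outcome instead of hand-coding a rule
        self.trials[action] += 1
        self.successes[action] += int(success)

    def p_success(self, action):
        # Laplace smoothing: unseen actions get a neutral 0.5 prior
        return (self.successes[action] + 1) / (self.trials[action] + 2)

    def choose(self, actions):
        # "Act accordingly": pick the action the model currently favors
        return max(actions, key=self.p_success)

model = OutcomeModel()
for action, success in [("discount", True), ("discount", True), ("upsell", False)]:
    model.observe(action, success)

print(model.choose(["discount", "upsell"]))  # -> discount (0.75 vs 0.33 estimated)
```

The "simple logic" version would be an `if` statement someone wrote once; this version keeps revising itself as outcomes arrive, which is the shift the quoted passage is describing.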

Any time someone equates data science with storytelling I get worked up. Science is not storytelling and neither is data science. There is science to figuring out how the world works and how to make things better based on knowing how it works.