Category Archives: links

Daily Links 12/21/2014

The winds of change originate in the unconscious minds of domain experts. If you’re sufficiently expert in a field, any weird idea or apparently irrelevant question that occurs to you is ipso facto worth exploring. Within Y Combinator, when an idea is described as crazy, it’s a compliment—in fact, on average probably a higher compliment than when an idea is described as good.

Today I believe that a major transition towards what some futurists call a “knowledge-based society” is underway. In that context what I call wirearchy represents an evolution of traditional hierarchy. I don’t think most humans can tolerate a lack of some hierarchical structure, primarily for the purposes of decision-making. The working definition I developed (and which has been ‘tested’ by a range of colleagues and friends interested in the issue(s)) recognizes that the necessary adaptations to new conditions will likely involve temporary, transient but more intelligent hierarchy. The implication is that people in a wirearchy should be focused on seeking to better understand and use the growing presence of feedback loops and double-loop learning.

In this paper, we present the benchmark data set CauseEffectPairs that consists of 88 different “cause-effect pairs” selected from 31 datasets from various domains. We evaluated the performance of several bivariate causal discovery methods on these real-world benchmark data and on artificially simulated data. Our empirical results provide evidence that additive-noise methods are indeed able to distinguish cause from effect using only purely observational data. In addition, we prove consistency of the additive-noise method proposed by Hoyer et al. (2009).
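The core trick behind the additive-noise approach is easy to sketch: fit a flexible regression in each direction and check in which direction the residuals look statistically independent of the input. Below is a minimal R illustration of that idea; it uses mgcv for the nonparametric fit and distance correlation from the energy package as the independence measure, a stand-in for the HSIC test the paper actually uses — a sketch of the idea, not the authors’ implementation.

```r
# Minimal additive-noise sketch for one (x, y) pair.
# Assumes the mgcv and energy packages are installed; dcor() here is a
# stand-in for the HSIC independence test used in the paper.
library(mgcv)    # gam() for a flexible nonlinear fit
library(energy)  # dcor() measures (possibly nonlinear) dependence

anm_score <- function(cause, effect) {
  # Regress effect on cause with a smooth, then measure how dependent
  # the residuals remain on the putative cause (lower = more plausible).
  fit <- gam(effect ~ s(cause))
  dcor(residuals(fit), cause)
}

infer_direction <- function(x, y) {
  if (anm_score(x, y) < anm_score(y, x)) "x -> y" else "y -> x"
}

# Toy check: y = x^2 + noise, so the true direction is x -> y.
set.seed(1)
x <- rnorm(300)
y <- x^2 + 0.3 * rnorm(300)
infer_direction(x, y)
```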

In an interview with Kevin Smith, writer and television producer Paul Dini complained about a worrying trend he sees in television animation and superhero shows in particular: executives spurning female viewers because they believe girls and women don’t buy the shows’ toys.

Daily Links 12/17/2014

My personal feeling is that this will really take off if you can start linking performance information to the more objective factual data within the various systems. How does the performance of interim staff vary, and is that linked to which agency they come through, their employment history, the length of their assignment or other factors? We’ve probably all had experience of working with interim staff who were brilliant, and with others who weren’t worth a fraction of their day rate. So you can imagine some really powerful analysis that might give a strong steer on how best to choose, structure and manage your contingent workforce – and maybe even take that into the permanent staff world!

Totally agree! Now we just need to get hold of comprehensive and reliable performance data…

Truly some awesome stuff here, including the link below on writing an R package from scratch. I should definitely do that for the utility functions I use over and over. 

This tutorial is not about making a beautiful, perfect R package. This tutorial is about creating a bare-minimum R package so that you don’t have to keep thinking to yourself, “I really should just make an R package with these functions so I don’t have to keep copy/pasting them like a goddamn luddite.” Seriously, it doesn’t have to be about sharing your code (although that is an added benefit!). It is about saving yourself time. (n.b. this is my attitude about all reproducibility.)
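For future me: the whole bare-minimum workflow is a handful of devtools calls. A rough sketch along the lines of the linked tutorial (package and function names are made up for illustration, and it assumes devtools and roxygen2 are installed):

```r
# Scaffold, document, and install a bare-minimum package with devtools.
library(devtools)

create("cats")  # creates ./cats with DESCRIPTION, R/, etc.

# Drop one roxygen-documented function into the package:
writeLines(c(
  "#' Make a cat say something",
  "#'",
  "#' @param msg what the cat should say",
  "#' @export",
  "cat_function <- function(msg = 'meow') cat('cat says:', msg, '\\n')"
), "cats/R/cat_function.R")

document("cats")  # generates man/ pages and NAMESPACE from the roxygen comments
install("cats")   # installs locally; afterwards library(cats) just works
```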

People are searching for products on Amazon, rather than using Google. The only reason search makes money for Google is that people use it to search for products they would like to buy on the internet, and Google shows ads for those products. Increasingly, however, people are going straight to Amazon to search for products. Desktop search queries on Amazon increased 47% between September 2013 and September 2014, according to ComScore.

Jeff: I think it takes more time to analyze something like that. Again, one of my jobs is to encourage people to be bold. It’s incredibly hard.  Experiments are, by their very nature, prone to failure. A few big successes compensate for dozens and dozens of things that didn’t work. Bold bets — Amazon Web Services, Kindle, Amazon Prime, our third-party seller business — all of those things are examples of bold bets that did work, and they pay for a lot of experiments.

What really matters is, companies that don’t continue to experiment, companies that don’t embrace failure, they eventually get in a desperate position where the only thing they can do is a Hail Mary bet at the very end of their corporate existence. Whereas companies that are making bets all along, even big bets, but not bet-the-company bets, prevail. I don’t believe in bet-the-company bets. That’s when you’re desperate. That’s the last thing you can do.

“The dirty secret is that a significant majority of big-data projects aren’t producing any valuable, actionable results,” said Michael Walker, a partner at Rose Business Technologies, which helps enterprises build big-data systems. According to a recent report from the research firm Gartner Inc., “through 2017, 60% of big-data projects will fail to go beyond piloting and experimentation and will be abandoned.”

Daily Links 12/16/2014

3) The Convergence of VMS and FMS

The continued adoption of freelancer management system (FMS) software in 2015 will produce ramifications for other segments of the labor ecosystem, particularly project-based contingent labor. Vendor Management Systems (VMS), which are used primarily to manage temporary staff and contract labor, do not address the specific needs of freelance management.

Yet data science, as a business, is still young. As the technology moves beyond the Internet incubators like Google and Facebook, it has to be applied company by company, in one industry after another.

At this stage, there is a lot of hand craftsmanship rather than software automation.

So the aspiring software companies find themselves training, advising and building pilot projects for their commercial customers. They are acting far more as services companies than they hope to be eventually.

While that may sound like a condition to be remedied, in fact we are living in an era where uncertainty and ambiguity are increasing. The reality is that we can’t shoo them away by becoming more rigid, creating more rules, or imposing more authoritarian controls. We need to loosen control, make more whitespace, give people more autonomy, and rely on the network of loose connections to influence everyone’s actions. We need a climate of soft power in a social network based on sparsity, not density, where weak and lateral connections dominate. That is the wellspring of organizational flexibility and adaptability.

Daily Links 12/10/2014

In his 2003 book, Open Innovation, Henry Chesbrough defined this important concept. In short, open innovation is a product or technology development model that extends beyond the boundaries of a firm to involve others in a collaborative way. Today, much of this activity uses various social networking tools and technologies to empower people to generate ideas, fine-tune concepts, share knowledge or solve critical problems.

When you look at the evolution of digital measurement in the enterprise and study organizations that have achieved a significant degree of maturity, you’ll notice that they come in two distinct flavors: the analytic and the informational. Analytic organizations have strong teams studying the data and driving testing, personalization and customer lifecycle strategies. Informational organizations have widespread, engaged usage of data across the organization with key stakeholders absorbing and using data intelligently to make decisions. It’s not impossible for an enterprise to be both analytic and informational, but the two aren’t necessarily related either. You might expect that organizations that have gotten to be good in measurement would be mature in both areas, but that’s not really the common case. Instead, it seems that most enterprises have either a culture or a problem set that drives them to excel in one direction or the other.

“Garbage in, garbage out” is the cliché of data-haters everywhere. “It is not true that companies need good data to use predictive analytics,” Taylor said. “The techniques can be robust in the face of terrible data, because they were invented by people who had terrible data,” he noted.

Revolution R Open (RRO) is the enhanced distribution of R from Revolution Analytics. RRO is based on version 3.1.1 of the statistical software R and includes additional capabilities for improved performance, reproducibility and platform support.

Daily Links 12/09/2014

But the world operates differently today. Companies own less infrastructure, inventory and manufacturing equipment than ever. They’ve outsourced everything from customer service to supply chain. And a growing portion of their workforce is not on their full-time payroll. 

MetaMind delivers Artificial Intelligence enterprise solutions via its AI platform and Smart Module offerings. The general-purpose platform can predict outcomes for language, vision and database tasks, and delivers best-in-class accuracies on standard benchmarks.

A reader recently pointed my attention to the following quote from the composer and artist John Cage:

“If something is boring after two minutes, try it for four. If still boring, then eight. Then sixteen. Then thirty-two. Eventually one discovers that it is not boring at all.”

The scientific method, based in deduction and falsifiability, is better at proliferating questions than it is at answering them.

As a result, predictive API providers will face increasing pressure to specialize in one or a few verticals. At this point, elegant and general APIs become not only irrelevant, but a potential liability, as industry- and domain-specific feature engineering increases in importance and it becomes crucial to present results in the right parlance. Sadly, these activities are not thin adapters that can be slapped on at the end, but instead are ravenous time beasts that largely determine the perceived value of a predictive API. No single customer cares about the generality and wide applicability of a platform; each is looking for the best solution to the problem as he conceives it.

Daily Links 07/18/2014

RelateIQ and Salesforce: It’s not just about data science | VentureBeat | Big Data | by Andy Byrne, Clari

5 things I wish I knew about Tableau when I started – The Information Lab

One staffing buyer comes to mind when I think about harnessing the competitive spirit of the supplier community. This buyer discloses the ranking and number of placements for each of the top 15 vendors in the program each month on an all-supplier call, and those stats are also provided via email following the call to the vendors and internal stakeholders. This not only brings transparency, builds credibility and creates trust in the program, but it also generates a level of focus and priority for that client because of the open competition it creates.

I’ve just scratched the surface of this, but I hope you got the idea that scalability can mean quite different things. In Big Data (meaning the infrastructure side of it) what you want to compute is pretty well defined, for example some kind of aggregate over your data set, so you’re left with the question of how to parallelize that computation well. In machine learning, you have much more freedom because data is noisy and there’s always some freedom in how you model your data, so you can often get away with computing some variation of what you originally wanted to do and still perform well. Often, this allows you to speed up your computations significantly by decoupling computations. Parallelization is important, too, but alone it won’t get you very far.
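Stochastic gradient descent is the textbook case of getting away with “some variation of what you originally wanted”: instead of the exact least-squares solution over the whole data set, update the weights from one random row at a time. A toy R sketch (step size and iteration count are arbitrary choices):

```r
# Exact least squares would solve over all n rows at once; instead,
# take cheap, noisy gradient steps on single random rows (SGD).
set.seed(1)
n <- 1e5
x <- cbind(1, rnorm(n))            # design matrix: intercept + one feature
y <- x %*% c(2, -3) + rnorm(n)     # true weights are (2, -3)

w <- c(0, 0)
for (i in sample(n, 2e4)) {        # visit 20k random rows, one at a time
  g <- as.vector(x[i, ] %*% w - y[i]) * x[i, ]  # gradient of squared error
  w <- w - 0.01 * g
}
w  # lands close to c(2, -3) without ever forming the full problem
```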

Why does data need to have sex? – High Scalability -

Sex is nature’s way of bringing different data sets together, that is our genome, and creating something new that has a chance to survive.

Daily Links 07/01/2014

I’m going to argue here that a business model that could make money for software companies, while benefiting users, is creating an open market for data. Yes, your data. For sale. On an open market. For anyone to buy. Privacy is dead. Isn’t it time we leverage the death of privacy for our own gain?

The idea is to create an ecosystem around the production, consumption, and exploitation of data so that all the players can get the energy they need to live and prosper.

You need a custom MapReduce programmer every time you want to get something out of Hadoop, but that’s not the case for Spark, said Mathew. Alteryx is working toward a standardized Spark interface for asking questions directly against data sets, which broadens Spark’s accessibility from hundreds of thousands of data scientists to millions of data analysts — folks who know how to write SQL queries and model data effectively, but aren’t experts in writing MapReduce programming jobs in Java.

The Spark framework is well equipped to handle those queries, as it exploits the memory spread across all of the servers in a cluster. That means it can run analytics models at blazing-fast speeds compared to MapReduce: Programs can go as much as 100 times faster in memory or 10 times faster on disk. Those performance enhancements, and the subsequent customer demand, have prompted Hadoop distribution vendors like Cloudera and MapR to support Spark.

Namely, as enterprise applications become more data-centric, the roles of data scientist and application developer are merging. In the short term, this means the two roles must learn to collaborate more effectively and both must assume new ways of thinking. For data scientists, this means starting to think more about how the insights they uncover can be translated into repeatable form factors consumable by end users. And application developers need to gain a better understanding of data flows and how analytic requirements impact application performance.