Daily Links 07/18/2014

RelateIQ and Salesforce: It’s not just about data science | VentureBeat | Big Data | by Andy Byrne, Clari

5 things I wish I knew about Tableau when I started – The Information Lab

One staffing buyer comes to mind when I think about harnessing the competitive spirit of the supplier community. This buyer discloses the ranking and number of placements for each of the top 15 vendors in the program each month on an all-supplier call, and those stats are also provided via email following the call to the vendors and internal stakeholders. This not only brings transparency, builds credibility and creates trust in the program, but it also generates a level of focus and priority to that client because of the open competition it creates.

I’ve just scratched the surface of this, but I hope you got the idea that scalability can mean quite different things. In Big Data (meaning the infrastructure side of it) what you want to compute is pretty well defined, for example some kind of aggregate over your data set, so you’re left with the question of how to parallelize that computation well. In machine learning, you have much more freedom because data is noisy and there’s always some freedom in how you model your data, so you can often get away with computing some variation of what you originally wanted to do and still perform well. Often, this allows you to speed up your computations significantly by decoupling computations. Parallelization is important, too, but alone it won’t get you very far.
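The "aggregate over your data set" case in the excerpt above can be sketched roughly as follows. This is only an illustrative sketch, not anything from the linked article: the function names (`partial_aggregate`, `combine`, `partitioned_mean`) and the chunk size are made up for the example. Each chunk is reduced independently to a (sum, count) pair, which is the part that could be spread over workers, and the pairs are then merged.

```python
from functools import reduce


def partial_aggregate(chunk):
    """Map step: reduce one chunk to a (sum, count) pair."""
    return (sum(chunk), len(chunk))


def combine(a, b):
    """Reduce step: merge two partial (sum, count) pairs."""
    return (a[0] + b[0], a[1] + b[1])


def chunked(values, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def partitioned_mean(values, chunk_size):
    """Compute a mean from independent per-chunk aggregates."""
    # Each partial aggregate depends only on its own chunk, so this
    # list comprehension is the step that could run in parallel.
    partials = [partial_aggregate(c) for c in chunked(values, chunk_size)]
    total, count = reduce(combine, partials, (0, 0))
    if count == 0:
        raise ValueError("cannot take the mean of an empty data set")
    return total / count


data = list(range(1, 101))  # hypothetical data set: 1..100
result = partitioned_mean(data, chunk_size=10)
print(result)
```

Because the per-chunk step never looks at other chunks, the combined answer matches the mean computed in one pass; the machine learning case described in the excerpt is different in that it may accept an approximate answer in exchange for that independence.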

Why does data need to have sex? – High Scalability

Sex is nature’s way of bringing different data sets together, that is our genome, and creating something new that has a chance to survive.

Daily Links 07/01/2014

I’m going to argue here that a business model that could make money for software companies, while benefiting users, is creating an open market for data. Yes, your data. For sale. On an open market. For anyone to buy. Privacy is dead. Isn’t it time we leverage the death of privacy for our own gain?

The idea is to create an ecosystem around the production, consumption, and exploitation of data so that all the players can get the energy they need to live and prosper.

You need a custom MapReduce programmer every time you want to get something out of Hadoop, but that’s not the case for Spark, said Mathew. Alteryx is working toward a standardized Spark interface for asking questions directly against data sets, which broadens Spark’s accessibility from hundreds of thousands of data scientists to millions of data analysts: folks who know how to write SQL queries and model data effectively, but aren’t experts in writing MapReduce programming jobs in Java.

The Spark framework is well equipped to handle those queries, as it exploits the memory spread across all of the servers in a cluster. That means it can run analytics models at blazing-fast speeds compared to MapReduce: Programs can go as much as 100 times faster in memory or 10 times faster on disk. Those performance enhancements, and the subsequent customer demand, have prompted Hadoop distribution vendors like Cloudera and MapR to support Spark.

Namely, as enterprise applications become more data-centric, the roles of data scientist and application developer are merging. In the short term, this means the two roles must learn to collaborate more effectively and both must assume new ways of thinking. For data scientists, this means starting to think more about how the insights they uncover can be translated into repeatable form factors consumable by end-users. And application developers need to gain a better understanding of data flows and how analytic requirements impact application performance.

Daily Links 06/26/2014

Why your kids will want to be data scientists

According to Burtch Works’ 2014 study of salaries for data scientists – typically those with university degrees in a quantitative field of study that are comfortable with programming languages and statistical methods – the median salary for employees not working as part of a team was $80,000 for those with 0-3 years’ experience and $150,000 for those with 9 or more years’ experience.

At the managerial level the median salaries were higher, with those responsible for a team of 1-3 earning $140,000 and those responsible for a team of 10 or more earning $232,500.

By contrast, the mean average annual income for a lawyer in America was $131,990 in 2013, while doctors earned $183,940, according to data from the U.S. Bureau of Labor Statistics.

Daily Links 06/09/2014

The reason I’m skeptical is because I believe in the science portion of our field’s name. One of the primary things that separates a data scientist from someone just building models is the ability to think carefully about things like endogeneity, causal inference, and experimental and quasi-experimental design. Data scientists must understand and think about things like data generating processes and reason through how misspecifying them could influence or undermine the inferences they draw from their analyses.

But what data can do is disprove things, often quite easily. Scott Winship will argue to death that Piketty’s market-income data is not the best kind of data to understand changes in income inequality, but what you can’t do is proclaim or expound a theory explaining a decrease in market income inequality.

Daily Links 06/08/2014

The basic idea here was that if we exposed students more directly to the educational market that Bennett had identified—making them borrow the money to attend—we could then count on those self-interested economic actors to behave as consumers are supposed to and do something about the problem. The policy was “to re-emphasize self-help,” per the New York Times. But this particular market has never worked that way, and the only effect, of course, was to raise up the Himalayas of student debt that are such a familiar part of the landscape today.

If we can’t or won’t speak in our authentic voices, if we disconnect from our own inner authority, if we refuse to ask for what we need (or don’t know what we need), how can men and women reach across the divide that separates them and recognize each other for who we truly are? Maybe we shouldn’t cluck our tongues over the rising divorce rate; maybe we should just be awed and amazed that men and women stay together for any length of time at all.

Nilofer Merchant suggests a small idea that just might have a big impact on your life and health: Next time you have a one-on-one meeting, make it into a “walking meeting” — and let ideas flow while you walk and talk.

Wonder if any of my colleagues would be up for that…

Big Data teaches you to build these systems using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to big data systems that can be built and run by a small team. Following a realistic example, this book guides readers through the theory of big data systems, how to implement them in practice, and how to deploy and operate them once they’re built.

Chapter one “a new paradigm for big data” is free – need to check it out.

Cloud security trends: The significant six-pack | Smarter Computing Blog

Instead of protecting data portals or pipelines, data-centric security focuses on data’s three states:

at rest,
in motion
and in use.

Daily Links 06/06/2014

Weiner, who in a past life worked for Warner Brothers and Yahoo, said his company wants to build the “economy graph” which would allow LinkedIn to map jobs to skills, talent, companies and geographies.

Ever since the iPad landed in the market as a clear luxury item without a specifically defined use case, market researchers have been tracking it, alongside other tablets, to figure out what we actually use them for and where. In short, we use tablets almost everywhere we don’t use laptops, or where we would use laptops in an absolute pinch but would prefer not to: bedrooms, living rooms, bathrooms, and kitchens. Tablets have filled in a particular kind of pared-down computer experience everywhere the laptop wasn’t, or everywhere that it might have been incidentally but not quite suited to the job—sitting open and waiting to be spilled on, or nestled in some blankets on a couch or in a bed, the less-than-capacious battery trickling away.

Daily Links 06/05/2014

School funding declined in 2012 for the first time in 35 years, reports the Census Bureau. New York was the top spender, at $19,552 per pupil, while Utah spent only $6,206.

The ultimate goal is to reduce friction and error, and to deliver the value of predictive analytics accessibly and quickly. “It’s all about not treating advanced analytics as this big scary thing requiring PhDs and a big cumbersome architecture,” Hillion says. Instead, the goal is to allow enterprises to “look at an existing business problem and get results this week.”

The two classes of women also defined “sluttiness” differently, but neither definition had much to do with sexual behavior. The rich ones saw it as “trashiness,” or anything that implied an inability to dress and behave like an upper-middle-class person….

The poorer women, meanwhile, would regard the richer ones as “slutty” for their seeming rudeness and proclivity for traveling in tight-knit herds. As one woman said, “Sorority girls are kind of whorish and unfriendly and very cliquey.”