Data science: Don’t forget psychometrics

I can hardly believe that in this 5,000-word-plus blog post on data science there is not one single mention of psychometrics.

Statistician Andrew Gelman calls psychometrics “the most underrated science.” He also says:

This reminds me of a longstanding principle in statistics, which is that, whatever you do, somebody in psychometrics already did it long before.

What? You don’t know what psychometrics is? Well, don’t feel bad. I didn’t know what it was until after I enrolled in a doctoral program that had it as one of its core topics. Yet even the program description doesn’t mention the word “psychometrics.” So what is it? Here’s one definition:

Psychometrics is the field of study concerned with the theory and technique of educational and psychological measurement, which includes the measurement of knowledge, abilities, attitudes, and personality traits. The field is primarily concerned with the construction and validation of measurement instruments, such as questionnaires, tests, and personality assessments. [Wikipedia]

So psychometrics is the science of measuring unobservable characteristics of people. Another name for those unobservables is “latent variables.” You can’t measure knowledge, abilities, attitudes, and personality traits directly. I can’t just look at you and know how good you are at math, for example. Psychometricians develop measurement instruments like standardized tests, questionnaires, IQ assessments, and so forth to measure these latent psychological constructs. They rely on a vast foundation of theory and tools that help ensure these instruments measure what they purport to measure (validity) and measure it consistently, without excessive error (reliability).
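
To make “reliability” a little more concrete: it can be quantified. One widely used index of internal-consistency reliability is Cronbach’s alpha, which compares the variance of the individual items on a test with the variance of respondents’ total scores. The sketch below is my own illustration, not a procedure from any particular instrument. It assumes a small, hypothetical set of questionnaire responses (rows are respondents, columns are items scored 1 to 5), and the function name `cronbach_alpha` is just a label I chose. It uses only the Python standard library.

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items score matrix.

    scores[r][i] is respondent r's score on item i.
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    # Variance of each item across respondents
    item_vars = [pvariance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of each respondent's total score across all items
    total_var = pvariance([sum(row) for row in scores])
    # alpha = k/(k-1) * (1 - sum of item variances / total-score variance)
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 4 people answering 3 items on a 1-5 scale
responses = [
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [1, 2, 2],
]
print(round(cronbach_alpha(responses), 3))  # prints 0.958
```

A high alpha suggests the items vary together, as you would expect if they tap a common latent trait. It does not by itself show that they measure the trait you intended; that is a question of validity.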

Of course psychometrics is relevant to analyzing web data, because what is the web about anyway? People doing things online, as well as what they might like to do online (subscribe to a web service, rent a movie, buy a nutritional supplement). Website owners want their vast pools of data to tell them something about the psychology and likely behavior of the people using their sites. Psychometrics can help.