Two kinds of people in the world…

… those who like to classify people into different kinds and those who don’t. I’m a classifier.

That’s why I’m intrigued by latent class analysis (LCA), where you statistically divide people into unobserved classes based on some observed variables (like behavior). Take the example of autism. Is Asperger’s Syndrome on the autistic spectrum, or is it an altogether different thing? LCA might help answer that question.

I’ve spent the last couple of days reading through simulation studies on identifying classes in an LCA-type technique called growth mixture modeling (GMM), where you try to identify the classes underlying different developmental trajectories. The oft-cited example in this area is alcohol use, tracked during adolescence and sometimes into adulthood. These studies typically find a few distinctly different trajectories, so different that they (apparently) qualify as different latent classes. For example, this 2003 study found five growth trajectories:

  • Occasional very light drinkers
  • Moderate escalators
  • Infrequent bingers
  • Rapid escalators
  • De-escalators

I’m thinking of designing and running my own simulation study of growth mixture modeling, starting from the ideas in Bauer & Curran (2003). They demonstrated that GMM, using the information criteria in routine use at the time, would likely extract too many classes when given non-normal inputs.

I’m thinking I could go the opposite way: look at cases where multiple classes generate the data and see what happens when you treat the data as coming from a single population. Jedidi, Jagpal, & DeSarbo (1997) tackled this question in the case of LCA (not growth curve analysis) with applications to marketing.
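
To make that “opposite way” concrete, here’s a rough sketch of what such a data-generating setup might look like. Everything specific in it is my own assumption for illustration (two classes, linear trajectories, equal class sizes, the intercepts, slopes, and noise level); none of it comes from Bauer & Curran or Jedidi et al. It simulates two hypothetical latent classes and then fits one straight line to the pooled data, as if there were a single population.

```python
import random
import statistics


def simulate_two_class_trajectories(n_per_class=200, times=(0, 1, 2, 3, 4), seed=1):
    """Simulate linear growth trajectories from two hypothetical latent classes."""
    rng = random.Random(seed)
    # Hypothetical class parameters (intercept, slope); illustrative values only.
    classes = {"stable": (1.0, 0.0), "escalator": (1.0, 1.0)}
    rows = []  # each row is (class label, time, outcome)
    for label, (intercept, slope) in classes.items():
        for _ in range(n_per_class):
            for t in times:
                y = intercept + slope * t + rng.gauss(0.0, 0.5)
                rows.append((label, t, y))
    return rows


def ols_line(points):
    """Ordinary least squares fit of y = a + b*t; returns (a, b)."""
    ts = [t for t, _ in points]
    ys = [y for _, y in points]
    t_bar, y_bar = statistics.fmean(ts), statistics.fmean(ys)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in points)
    sxx = sum((t - t_bar) ** 2 for t in ts)
    b = sxy / sxx
    return y_bar - b * t_bar, b


rows = simulate_two_class_trajectories()

# Ignore the class labels and fit one line, as a single-population model would.
pooled_intercept, pooled_slope = ols_line([(t, y) for _, t, y in rows])

# Spread of the residuals around that single pooled line, by time point.
resid_sd_by_time = {}
for t in sorted({t for _, t, _ in rows}):
    resid = [y - (pooled_intercept + pooled_slope * t)
             for _, tt, y in rows if tt == t]
    resid_sd_by_time[t] = statistics.pstdev(resid)

print(f"pooled slope: {pooled_slope:.2f}")
for t, sd in resid_sd_by_time.items():
    print(f"time {t}: residual SD {sd:.2f}")
```

In this particular setup the pooled slope lands between the two class slopes, and the residual spread fans out over time. A single-population growth model would have to absorb that as slope variance, which is the kind of symptom I’d want the real simulation to examine.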

But what I’m struggling with is this: when you see non-normal data, is that because there really are multiple classes generating it? Or is the data inherently non-normal? How can you detect the difference, given that non-normal distributions can be approximated by mixtures of normal distributions?
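
Here’s a small sketch of why this worries me. It draws skewed data from one population (a lognormal, which is my choice for illustration, as are the sample size and parameters), then compares a single normal against a two-component normal mixture using BIC. The EM routine is a bare-bones version I wrote for this example, not what any GMM package actually does.

```python
import math
import random


def normal_logpdf(x, mu, var):
    """Log density of a normal distribution with mean mu and variance var."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; lower is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def component_logs(x, weights, mus, variances):
    """Log of (weight * density) for each mixture component at x."""
    return [math.log(w) + normal_logpdf(x, m, v)
            for w, m, v in zip(weights, mus, variances)]


def loglik_one_normal(xs):
    """Log-likelihood of the data under a single maximum-likelihood normal."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return sum(normal_logpdf(x, mu, var) for x in xs)


def loglik_two_normals(xs, n_iter=200, var_floor=1e-3):
    """Fit a two-component normal mixture by plain EM; return its log-likelihood."""
    xs = sorted(xs)
    n = len(xs)
    # Crude starting values: one component for each half of the sorted data.
    halves = [xs[: n // 2], xs[n // 2:]]
    mus = [sum(h) / len(h) for h in halves]
    variances = [max(sum((x - m) ** 2 for x in h) / len(h), var_floor)
                 for h, m in zip(halves, mus)]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from component 0.
        resp0 = []
        for x in xs:
            logs = component_logs(x, weights, mus, variances)
            resp0.append(math.exp(logs[0] - logsumexp(logs)))
        # M-step: weighted updates, with guards against a collapsing component.
        for k in (0, 1):
            r = resp0 if k == 0 else [1.0 - p for p in resp0]
            total = max(sum(r), 1e-9)
            weights[k] = min(max(total / n, 1e-6), 1.0 - 1e-6)
            mus[k] = sum(ri * x for ri, x in zip(r, xs)) / total
            variances[k] = max(
                sum(ri * (x - mus[k]) ** 2 for ri, x in zip(r, xs)) / total,
                var_floor,
            )
    return sum(logsumexp(component_logs(x, weights, mus, variances)) for x in xs)


# One population, no latent classes: skewed draws from a single lognormal.
rng = random.Random(0)
xs = [rng.lognormvariate(0.0, 0.75) for _ in range(500)]

bic_one = bic(loglik_one_normal(xs), 2, len(xs))   # mean, variance
bic_two = bic(loglik_two_normals(xs), 5, len(xs))  # 2 means, 2 variances, 1 weight

print(f"BIC, one normal:  {bic_one:.1f}")
print(f"BIC, two normals: {bic_two:.1f}")
```

With data this skewed, BIC prefers the two-component model even though only one population generated the data. The fit statistic alone can’t tell me whether the second “class” is real or just the mixture soaking up the skew.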

On the one hand, I have this philosophical sense that there aren’t any “classes” of people in the world, just different ways of classifying. On the other hand, genotype differences are real, so I need to keep the medical interpretation in mind. For example, there is clearly a class of people who have cystic fibrosis and a much larger class of people who do not. Those are the sorts of situations I want the simulation design to reflect. Alcohol use is interesting, but I’m not sure I’d use it as a template for what I’d like to explore.