Modeling scale usage heterogeneity the Bayesian way

Posts in my journal club category are my summaries and thoughts on journal articles I read. I’ve found I absorb material much better if I try to summarize it in a way that might make sense to someone else. The article covered here offers a potential solution to a problem I ran into in the TIMSS data set.

Rossi, P.E., Gilula, Z., Allenby, G.M. (2001). Overcoming scale usage heterogeneity: A Bayesian hierarchical approach. Journal of the American Statistical Association 96(453), 20-31.

Abstract. Questions that use a discrete ratings scale are commonplace in survey research. Examples in marketing include customer satisfaction measurement and purchase intention. Survey research practitioners have long commented that respondents vary in their usage of the scale: Common patterns include using only the middle of the scale or using the upper or lower end. These differences in scale usage can impart biases to correlation and regression analyses. To capture scale usage differences, we developed a new model with individual scale and location effects and a discrete outcome variable. We model the joint distribution of all ratings scale responses rather than specific univariate conditional distributions as in the ordinal probit model. We apply our model to a customer satisfaction survey and show that the correlation inferences are much different once proper adjustments are made for the discreteness of the data and scale usage. We also show that our adjusted or latent ratings scale is more closely related to actual purchase behavior.

Assume the observed item responses (matrix X) are a discretized version of underlying latent continuous data Y, where i indexes the individuals and j the questions. There are K+1 common, ordered cutoff points $c_0, \dots, c_K$, the first at negative infinity and the last at positive infinity, such that

$x_{i,j} = k \;\; \textup{if} \;\;c_{k-1}\leq y_{i,j}\leq c_k$
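To make the cutoff rule concrete, here is a minimal sketch in Python. The cutoff values are made up for a 5-point scale (they are not from the paper), and `discretize` is just an illustrative helper name.

```python
import bisect
import math

def discretize(y, cutoffs):
    """Return the rating k such that cutoffs[k-1] < y <= cutoffs[k]."""
    # bisect_left finds the first index whose cutoff is >= y, which is k.
    return bisect.bisect_left(cutoffs, y)

# Hypothetical cutoffs for K = 5: c_0 = -inf, c_5 = +inf, four interior points.
cutoffs = [-math.inf, 1.5, 2.5, 3.5, 4.5, math.inf]

latent = [0.2, 2.9, 3.6, 9.0]
ratings = [discretize(y, cutoffs) for y in latent]
print(ratings)  # [1, 3, 4, 5]
```

Because the outer cutoffs are infinite, every finite latent value lands in exactly one of the K categories.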

The underlying latent continuous variables are distributed multivariate normal:

$y_i \sim N(\mu^*_i, \Sigma^*_i)$

The cutoffs discretize the latent variable Y. This is similar to an ordinal probit model, except that we’re interested not in the conditional distribution of one discrete variable but in the joint distribution of all of them.

This model provides for a different mean vector and covariance matrix for each respondent, but the authors simplify it by using a respondent-specific location-scale shift:

$y_i = \mu + \tau_i \iota + \sigma_i z_i$

$z_i \sim N(0, \Sigma)$
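Here is a rough sketch of what one draw from the location-scale model looks like for a single respondent. The numbers for $\mu$, $\Sigma$, $\tau_i$ and $\sigma_i$ are invented for illustration, and the small hand-rolled `cholesky` helper is only there to keep the example free of external libraries.

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a small positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = sum(low[r][k] * low[c][k] for k in range(c))
            if r == c:
                low[r][c] = math.sqrt(a[r][r] - s)
            else:
                low[r][c] = (a[r][c] - s) / low[c][c]
    return low

def draw_latent(mu, tau_i, sigma_i, chol, rng):
    """One draw of y_i = mu + tau_i * iota + sigma_i * z_i, with z_i ~ N(0, Sigma)."""
    m = len(mu)
    eps = [rng.gauss(0.0, 1.0) for _ in range(m)]
    # z = L * eps has covariance L L^T = Sigma.
    z = [sum(chol[r][k] * eps[k] for k in range(r + 1)) for r in range(m)]
    # The same tau_i and sigma_i apply to every question j.
    return [mu[j] + tau_i + sigma_i * z[j] for j in range(m)]

# Illustrative values only (not from the paper): three questions.
mu = [3.0, 3.5, 2.5]
Sigma = [[1.0, 0.5, 0.2],
         [0.5, 1.0, 0.3],
         [0.2, 0.3, 1.0]]
rng = random.Random(0)
y_i = draw_latent(mu, tau_i=0.8, sigma_i=0.5, chol=cholesky(Sigma), rng=rng)
print(y_i)
```

The point to notice is that the respondent only gets two parameters of their own, a shift and a stretch, while $\mu$ and $\Sigma$ are shared by everyone.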

This allows for acquiescent/disacquiescent response styles, for overuse of a particular response value, and for extreme response styles:

• An acquiescent (disacquiescent) style would be represented by a large positive (negative) location shift and a shrunken scale parameter.
• Overuse of a particular response value would be represented by a location shift to that value with a shrunken scale parameter.
• An extreme response style would be represented by no location shift and a very large scale parameter, which would tend to put a lot of probability density into the two tails.
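The styles in the list above can be checked numerically for a single question. This is a sketch under simplifying assumptions: one item with mean 3, unit variance for $z$, made-up cutoffs for a 5-point scale, and made-up $(\tau_i, \sigma_i)$ pairs for each style. With those, the latent value is $N(\mu_j + \tau_i, \sigma_i^2)$ and the category probabilities are differences of normal CDFs.

```python
import math

def norm_cdf(x):
    """Standard normal CDF (handles +/- infinity)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def rating_probs(mu_j, tau_i, sigma_i, cutoffs):
    """P(x = k) for k = 1..K when y ~ N(mu_j + tau_i, sigma_i^2)."""
    mean = mu_j + tau_i
    cdf = [norm_cdf((c - mean) / sigma_i) for c in cutoffs]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]

# Hypothetical cutoffs and style parameters, chosen only for illustration.
cutoffs = [-math.inf, 1.5, 2.5, 3.5, 4.5, math.inf]
styles = {
    "baseline":    (0.0, 1.0),
    "acquiescent": (1.5, 0.5),  # shifted up, shrunken scale
    "midpoint":    (0.0, 0.3),  # sits on one value, shrunken scale
    "extreme":     (0.0, 3.0),  # no shift, very large scale
}
probs = {name: rating_probs(3.0, tau, sigma, cutoffs)
         for name, (tau, sigma) in styles.items()}
for name, p in probs.items():
    print(f"{name:12s}", [round(v, 3) for v in p])
```

Printing the rows side by side shows the mass moving to the top of the scale, piling onto the middle value, or splitting into the two end categories.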

The location and log scale parameters are modeled as bivariate normal, allowing them to be correlated with each other:

$\begin{bmatrix} \tau_i\\ \textup{ln} \: \sigma_i \end{bmatrix} \sim N(\varphi, \Lambda)$
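As a sketch of what this hierarchy implies, the snippet below draws respondent-level $(\tau_i, \sigma_i)$ pairs. The values of $\varphi$ and $\Lambda$ are invented for illustration, and `draw_location_scale` is a hypothetical helper. Modeling the log of the scale keeps every $\sigma_i$ positive.

```python
import math
import random

def draw_location_scale(phi, Lam, rng):
    """Draw (tau_i, sigma_i) with (tau_i, ln sigma_i) ~ N(phi, Lam), Lam 2x2."""
    # Cholesky factor of the 2x2 covariance matrix, written out by hand.
    l11 = math.sqrt(Lam[0][0])
    l21 = Lam[1][0] / l11
    l22 = math.sqrt(Lam[1][1] - l21 ** 2)
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    tau = phi[0] + l11 * e1
    log_sigma = phi[1] + l21 * e1 + l22 * e2
    return tau, math.exp(log_sigma)

# Illustrative hyperparameters: a negative covariance means respondents who
# shift upward also tend to use a narrower range of the scale.
phi = [0.0, 0.0]
Lam = [[1.0, -0.2],
       [-0.2, 0.25]]
rng = random.Random(42)
draws = [draw_location_scale(phi, Lam, rng) for _ in range(2000)]
print(draws[:3])
```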

You need to specify or model the cutoff values somehow. You could assume them to be known, say equally spaced between the actual values on the rating scale. This model instead specifies them as a quadratic in k, which lets the spacing between cutoffs vary along the scale, as you can imagine might be the case:

$c_k = a + bk + ek^2$
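A quick sketch of how the quadratic term changes the spacing. I am assuming the formula applies to the interior cutoffs $k = 1, \dots, K-1$, with the two outer cutoffs fixed at minus and plus infinity; the values of a, b and e are made up.

```python
import math

def quadratic_cutoffs(a, b, e, K):
    """Cutoffs c_0..c_K with c_k = a + b*k + e*k**2 for the interior points."""
    interior = [a + b * k + e * k ** 2 for k in range(1, K)]
    # The cutoffs only make sense if they are ordered.
    if any(hi <= lo for lo, hi in zip(interior, interior[1:])):
        raise ValueError("cutoffs must be strictly increasing")
    return [-math.inf] + interior + [math.inf]

# e = 0 gives equally spaced cutoffs; e > 0 widens the gaps toward the top.
equal = quadratic_cutoffs(a=0.5, b=1.0, e=0.0, K=5)
curved = quadratic_cutoffs(a=0.5, b=1.0, e=0.1, K=5)
print(equal)
print([round(c, 2) for c in curved])
```

With e = 0 this reduces to the equally spaced case mentioned above, so e is what carries the nonlinearity.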

The authors go over a number of assumptions that force identification of the model. I get the need for this, even if I don’t quite understand why they chose the restrictions they did or what the implications are. I’ll come back to that at some point.

Then you need priors for $\mu$, $\Sigma$, $\varphi$, $\Lambda$, and $e$. They use flat priors on the means and on the “cutoff” parameter e, and inverse-Wishart priors for the covariance matrices.

And from there it’s just an easy simulation problem. Ha, right!
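To give a flavor of what the simulation involves, here is one ingredient a sampler for this kind of model typically needs: drawing a latent value consistent with an observed rating. This is an assumption on my part (a data-augmentation step as commonly used for probit-type models), not a description of the authors' exact algorithm, and it treats a single item with a known conditional mean and standard deviation.

```python
import math
import random
from statistics import NormalDist

def draw_truncated_normal(mean, sd, lo, hi, rng):
    """Draw y ~ N(mean, sd^2) restricted to (lo, hi) by inverting the CDF."""
    dist = NormalDist(mean, sd)
    p_lo, p_hi = dist.cdf(lo), dist.cdf(hi)
    u = p_lo + rng.random() * (p_hi - p_lo)
    # Keep u strictly inside (0, 1) so inv_cdf is defined.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return dist.inv_cdf(u)

# Hypothetical setup: observed rating x = 3 on a 5-point scale, so the latent
# value must fall between the cutoffs c_2 = 2.5 and c_3 = 3.5.
rng = random.Random(7)
samples = [draw_truncated_normal(3.2, 1.0, 2.5, 3.5, rng) for _ in range(1000)]
print(min(samples), max(samples))

# An observed rating of 1 has an infinite lower cutoff.
low_draw = draw_truncated_normal(3.2, 1.0, -math.inf, 1.5, rng)
```

Inverse-CDF sampling is simple but can lose accuracy far out in the tails, so a real implementation would likely use something more careful.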

Thankfully, I’m not on my own trying to understand and implement something like this because Rossi, Allenby, & McCulloch wrote a textbook that includes a case study dealing with it. There’s even software and data sets to go with it. But since Penrose doesn’t have the book, I have to wait to get it from the University of Northern Colorado, darn. Too bad, because it would have been fun to spend spring break sorting it all out.