I’ve put together a data set that includes eighth-grade math achievement scores from TIMSS 2007, per-capita GDP from the CIA World Factbook, and country-level cultural values from the Inglehart-Welzel Cultural Map of the World. I also constructed my own Positive Attitude Towards Math (PATM) index, using four self-report items from the TIMSS data set.
Before I model this in HLM, I’m getting to know the data set. I’ve learned from somewhat painful experience that it’s better to know beforehand what’s in the data rather than doing a bunch of regressions and then later having to go back and explore why you’re getting unexpected results.
One issue is potential multicollinearity among my predictors: if two predictors are highly correlated, the regression coefficient estimates may not be stable. I expect GDP to be correlated with both dimensions from the Inglehart-Welzel cultural map, which are these:
- Traditional vs. Secular-rational – measures the extent to which religion is very important in a society. I’d expect more secular-rational countries to have higher math achievement.
- Survival vs. Self-expressive – measures the transition from industrial society to post-industrial. As economic survival becomes more assured, people become more interested in expressing themselves. I’d expect this to be negatively correlated with math achievement, because putting effort into math is typically a means of achieving economic success in the job market, not of expressing your authentic self.
The transition from traditional to secular-rational occurs during industrialization, so of course you’d expect GDP to be positively correlated with the secular-rationality dimension. The transition from survival to self-expressive also occurs alongside an economic transition, so I’d expect GDP to be correlated with that dimension too.
Here’s a scatterplot matrix of the country-level variables (including country mean scores on PATM and math achievement), with lowess fit lines added. While linear fit lines might show me the correlation, I think lowess fits are more informative. You can’t assume every relationship is linear.
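For anyone who hasn’t used lowess before, the idea is a series of small weighted regressions, one per point, each using only nearby observations. Below is a minimal sketch of that idea, not the code behind the plot above. It is simplified (tricube weights, no robustness iterations), the function name `lowess_fit` is my own, and the `gdp` and `math_score` values are simulated purely for illustration.

```python
import numpy as np

def lowess_fit(x, y, frac=0.67):
    """Simplified lowess: a local weighted line at each x, tricube weights,
    no robustness iterations. Returns fitted values in the order of x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = max(2, int(np.ceil(frac * n)))        # neighbours used in each local fit
    design = np.column_stack([np.ones(n), x])  # intercept + slope
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        # bandwidth = distance to the k-th nearest point (guard against zero)
        h = max(np.sort(dist)[k - 1], np.finfo(float).eps)
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, 1.0) ** 3  # tricube weights
        sw = np.sqrt(w)
        # weighted least squares via ordinary lstsq on sqrt-weighted rows
        beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        fitted[i] = beta[0] + beta[1] * x[i]
    return fitted

# Simulated (made-up) country-level data with a relationship that flattens out
rng = np.random.default_rng(0)
gdp = np.sort(rng.uniform(1, 50, 48))
math_score = 350 + 150 * (1 - np.exp(-gdp / 10)) + rng.normal(0, 20, 48)

smooth = lowess_fit(gdp, math_score)  # one smoothed value per country
```

Drawing `smooth` against `gdp` on top of the scatter gives the kind of curve shown in each panel. A smaller `frac` follows the data more closely, and a larger one smooths more. In practice I’d reach for a library implementation (statsmodels has a `lowess` function, for example) rather than this hand-rolled version.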
- Rational and self-expressive values don’t seem to have much relationship. That’s what you would expect, given that they are supposed to represent two distinct dimensions of cultural values.
- GDP is positively related to both the rational and self-expressive dimensions. That’s not surprising, given that the development of both rational and self-expressive values occurs alongside economic transitions.
- PATM is negatively related to GDP, except at the highest levels. That uptick at the end probably reflects the skewed distributions I saw on PATM for some regions with relatively low math scores (e.g., Latin America, the Middle East).
- There doesn’t seem to be a clear relationship between self-expressive values and PATM, which is not what I would expect. I would think countries that had made the post-industrial shift would show lower mean liking for math.
- There is a positive relationship between math scores and GDP, but it’s not linear. The relationship seems to flatten out at high levels of GDP (which may represent the influence of the transition from survival to self-expressive values).
- At the country level, liking for math and math scores are negatively correlated. Why? Countries with generally higher math scores report lower mean levels of liking for math. But if you look within countries, you’ll see that higher liking for math usually means higher math scores. This is exactly why a hierarchical linear model is needed: it lets me model what’s happening within clusters while still taking into account cluster-level influences (in this case, cultural values).
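The sign flip between the country level and the student level is easy to reproduce with simulated data. The sketch below is only an illustration of the pattern, not the TIMSS data: every number in it is made up, and the variable names (`liking`, `score`, `country`) are hypothetical. It compares the correlation of country means with the correlation of student scores after centering on each country’s mean.

```python
import numpy as np

rng = np.random.default_rng(42)
n_countries, n_students = 30, 200

# Made-up country level: higher mean score goes with LOWER mean liking
country_mean_score = rng.normal(500, 60, n_countries)
country_mean_liking = (3.0 - 0.005 * (country_mean_score - 500)
                       + rng.normal(0, 0.1, n_countries))

# Made-up student level: within a country, liking above the country
# mean goes with HIGHER scores
country = np.repeat(np.arange(n_countries), n_students)
liking = country_mean_liking[country] + rng.normal(0, 0.5, country.size)
score = (country_mean_score[country]
         + 40 * (liking - country_mean_liking[country])
         + rng.normal(0, 50, country.size))

def group_means(values, groups, n_groups):
    """Mean of `values` within each integer-coded group."""
    sums = np.bincount(groups, weights=values, minlength=n_groups)
    counts = np.bincount(groups, minlength=n_groups)
    return sums / counts

liking_bar = group_means(liking, country, n_countries)
score_bar = group_means(score, country, n_countries)

# Between-country: correlate the country means
between_r = np.corrcoef(liking_bar, score_bar)[0, 1]

# Within-country: correlate deviations from each student's country mean
within_r = np.corrcoef(liking - liking_bar[country],
                       score - score_bar[country])[0, 1]

print(f"between-country r: {between_r:.2f}")
print(f"within-country r:  {within_r:.2f}")
```

A single pooled regression would blend these two relationships together, which is the problem a multilevel model is meant to avoid.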
As far as multicollinearity goes, it looks like including GDP alongside the two cultural values indexes in the model could be a problem, since GDP is related to both of them.
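One common way to put a number on this is the variance inflation factor, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the other predictors. The VIFs are also the diagonal of the inverse of the predictors’ correlation matrix, which is what the sketch below uses. The data here are simulated and the column names are hypothetical, so this shows the calculation rather than my actual results.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (rows = countries).
    Computed as the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Simulated (made-up) standardized predictors: GDP drives both value dimensions
rng = np.random.default_rng(1)
n = 48
gdp = rng.normal(0, 1, n)
rational = 0.7 * gdp + rng.normal(0, 0.5, n)
self_expressive = 0.7 * gdp + rng.normal(0, 0.5, n)

X = np.column_stack([gdp, rational, self_expressive])
for name, value in zip(["gdp", "rational", "self_expressive"], vif(X)):
    print(f"{name:16s} VIF = {value:.2f}")
```

A VIF of 1 means a predictor is uncorrelated with the others. Cutoffs such as 5 or 10 are often quoted as warning signs, but they are rules of thumb rather than hard limits.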