psychometrics, statistics

Confirmatory factor analysis: The basics

My psychometrics prof did a quick intro to confirmatory factor analysis in class last night, and since next quarter I’m taking a class out of sequence that depends on it (latent growth curve modeling), thought I’d summarize here to consolidate what I learned.


You can use exploratory factor analysis (EFA) or confirmatory factor analysis (CFA) to investigate the construct validity of a psychometric instrument. With exploratory, you don’t specify the factor structure up front — the analysis finds factors and their loadings on items for you. With confirmatory, you specify the factors and how they relate to items from the instrument.

The “factors” you are looking for are also known as latent variables, things you can’t directly measure (that’s why they’re called “latent”). As an example, intrinsic motivation to study math is a latent variable or construct. There’s no way to measure it directly, so you develop some measurement instrument, usually a set of items asking about that construct. You might assess more than one construct at a time, for example intrinsic and extrinsic motivation, and so you want to see if the items relate to the underlying factors as you theorized.

You can do a confirmatory factor analysis with EFA techniques (e.g., with principal axis factoring and oblique or orthogonal factor rotation) but there are additional benefits to using CFA (from Gable & Wolf, 1993):

  • You get a unique factor solution with CFA
  • CFA assesses the degree of model fit
  • CFA output on individual model parameters suggests how to improve the model
  • You can test factorial invariance across groups

How to do a simple CFA

You express a factor analysis model using structural equation modeling (SEM) notation. A circle or oval indicates a latent variable (a.k.a. factor). A square or rectangle indicates an observed variable (a.k.a. indicator). A single-headed arrow shows causality, with factors causing indicators, not the other way around. A curved double-headed arrow indicates unanalyzed association.

Here’s an example of a CFA diagram that I made in Amos. This shows two factors — positive attitudes towards math (should be positive attitude toward math I think) and extrinsic motivation to learn math, each with four indicators.

Then you input correlations or covariances from your data set and run the analysis. You’ll get estimated factor loadings as well as a bunch of measures of goodness of fit of the model.
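To make the "specify the factors and how they relate to items" idea concrete, here is a minimal sketch in plain Python of the covariance matrix that a two-factor, eight-indicator model implies. The loadings, the factor correlation of 0.3, and the function name `implied_covariance` are all made up for illustration; they are not estimates from my data or output from Amos. Roughly speaking, the software searches for loadings that make this implied matrix as close as possible to the observed one.

```python
# Hypothetical two-factor CFA model: 8 indicators, 4 per factor.
# Every number below is invented for illustration only.

def implied_covariance(loadings, factor_cov, unique_var):
    """Return Sigma = Lambda * Phi * Lambda^T + Theta as nested lists.

    loadings   -- one row per indicator, one column per factor (Lambda)
    factor_cov -- covariance matrix of the factors (Phi)
    unique_var -- unique (error) variance of each indicator (diagonal of Theta)
    """
    n_items = len(loadings)
    n_factors = len(factor_cov)
    sigma = [[0.0] * n_items for _ in range(n_items)]
    for i in range(n_items):
        for j in range(n_items):
            total = 0.0
            for a in range(n_factors):
                for b in range(n_factors):
                    total += loadings[i][a] * factor_cov[a][b] * loadings[j][b]
            if i == j:
                total += unique_var[i]  # error variance only affects the diagonal
            sigma[i][j] = total
    return sigma

# A zero loading means "this item does not load on that factor".
loadings = [[0.8, 0.0], [0.7, 0.0], [0.6, 0.0], [0.7, 0.0],
            [0.0, 0.8], [0.0, 0.6], [0.0, 0.7], [0.0, 0.5]]
factor_cov = [[1.0, 0.3], [0.3, 1.0]]  # standardized factors, assumed correlation 0.3
# Choose unique variances so each indicator has total variance 1.
unique_var = [1.0 - sum(l * l for l in row) for row in loadings]

sigma = implied_covariance(loadings, factor_cov, unique_var)
```

The zeros in `loadings` are what make this confirmatory: I am saying up front which items belong to which factor, rather than letting every item load on every factor.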

I won’t go over running the analysis here — I haven’t actually gotten that far myself — but here’s what to look for to see if you have good model fit:

  • Root mean squared error of approximation (RMSEA) should be less than .05.
  • Comparative Fit Index (CFI): excellent model if > .95, good if between .90 and .95, poor if less than .90

The prof also mentioned the chi-square measure of fit, but admitted it is worthless. Note that for the chi-square test statistic you are looking for a nonsignificant result, not a significant one, in contrast to most statistical tests. For large samples the chi-square will virtually always be significant, so some statisticians recommend dividing it by its degrees of freedom. Some say that a model with chi-square / df less than three is good.
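Here is a small sketch of how these rules of thumb could be checked in Python. The function names are my own, the example numbers are invented, and I am reading the CFI cutoffs as excellent above .95, good from .90 to .95, and poor below .90. The RMSEA formula is one common formulation; some software divides by N rather than N - 1, so treat it as an approximation of what your package reports.

```python
import math

def chi_square_ratio(chi_square, df):
    """Chi-square divided by its degrees of freedom."""
    return chi_square / df

def rmsea(chi_square, df, n):
    """RMSEA from the model chi-square, its df, and sample size n.

    Uses sqrt(max(chi2 - df, 0) / (df * (n - 1))); some programs use n
    instead of n - 1 in the denominator.
    """
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))

def describe_cfi(cfi):
    """Label a CFI value using the rule-of-thumb cutoffs from the post."""
    if cfi > 0.95:
        return "excellent"
    if cfi >= 0.90:
        return "good"
    return "poor"

# Invented example values, not results from a real analysis.
ratio = chi_square_ratio(57.0, 19)          # compare against the "< 3" guideline
approx_rmsea = rmsea(57.0, 19, 500)         # compare against the "< .05" guideline
label = describe_cfi(0.93)
```

With these made-up numbers the ratio sits right at the cutoff of three and the RMSEA is a bit above .05, which is the kind of borderline case where looking at several indices together matters.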

Here’s a useful page summarizing more measures of goodness of fit of structural equation models.


Gable, R. K., & Wolf, M. B. (1993). Instrument development in the affective domain: Measuring attitudes and values in corporate and school settings. Evaluation in education and human services. Boston: Kluwer Academic.

psychology, psychometrics

Because I’m bored: A post about novelty seeking

Do you know anyone who is easily bored? Always looking for the novel, the exciting, the stimulating? You might see this trait manifest in different ways: the heli-skiier, the intellectual omnivore, the golf-sensation/sex-addict, even the heroin abuser likely have in common a drive to avoid boredom and a twin drive to experience excitement in whatever form works best for them.

Some psychologists call this novelty seeking, sensation seeking, or stimulation seeking.* Here’s one definition of sensation seeking:

a trait defined by the seeking of varied, novel, complex, and intense sensations and experiences, and the willingness to take physical, social, legal, and financial risks for the sake of such experiences. (Zuckerman & Kuhlman, 2000)

I took the Zuckerman-Kuhlman Personality Questionnaire sensation seeking scale and scored 84%, High bordering on Very High. I’m easily bored. I’m always looking for the next excitement, usually intellectual but could be something else. My need for novelty makes it hard to ever reach equilibrium because I inevitably get distracted by sparkly objects passing by.

It’s a good thing

That definition makes sensation seeking sound like a mostly negative thing (all those risks!) but in my experience, it’s not. Because I’m easily bored, I’ve had a pretty exciting life, I think. In Penelope Trunk’s framing, I’ve prioritized having an interesting life over a happy one. In the past, I have sacrificed comfort and stability for the new and different, whether it was a new job or a new career or a new house or a new state.

But now I find myself leaning more towards trying to have a happy, stable life rather than an interesting and exciting one. Sensation seeking declines with age, and I think I might have reached a pretty optimal level for where I am in my life. I’m totally willing to take intellectual leaps and risks, where some people in their early 40s might be stuck with tired ideas. I wouldn’t rule out moving our family yet again if the right opportunity arose. I take risks like blogging about random stuff that enters my bored brain. And yet I’m settled and stable in many ways I couldn’t have imagined in my 20s: I am satisfied with my husband, my neighborhood, my house, my career path, my colleagues.

It’s in the genes

There’s evidence of a genetic basis for novelty seeking, and also evidence that novelty seeking may be a risk factor for drug dependence.

And, novelty seekers may be more intelligent on average. From a 2002 paper in the Journal of Personality and Social Psychology:

The prediction that high stimulation seeking 3-yr-olds would have higher IQs by 11 yrs old was tested in 1,795 children on whom behavioral measures of stimulation seeking were taken at 3 yrs, together with cognitive ability at 11 yrs. High 3-yr-old stimulation seekers scored 12 points higher on total IQ at age 11 compared with low stimulation seekers and also had superior scholastic and reading ability. Results replicated across independent samples and were found for all gender and ethnic groups. Effect sizes for the relationship between age 3 stimulation seeking and age 11 IQ ranged from 0.52 to 0.87. Findings appear to be the first to show a prospective link between stimulation seeking and intelligence. It is hypothesized that young stimulation seekers create for themselves an enriched environment that stimulates cognitive development. [emphasis added]

I think adult stimulation/sensation/novelty seekers can do the same thing.


*Are novelty-seeking and sensation-seeking the same thing? Maybe.

psychometrics, statistics

Measurement, comparison, and variability

Andrew Gelman:

the activity we call “statistics” exists in the middle of the Venn diagram formed by measurement, comparison, and variability. No two of the three is enough.

I’m dealing with the middle of that Venn diagram right now:

  • Measurement — am I measuring what I want to measure? I wanted to measure intrinsic motivation to learn math but I’ve got an index that’s likely better described as positive attitudes towards math (PATM). And I don’t even know what it’s really measuring.
  • Comparison — can I compare PATM across countries? If I use it as a regression predictor for math achievement, are results across countries comparable?
  • Variability — how do scores on PATM vary across countries? Certainly not enough to just know the mean. This provides some information to answer the question about comparison.

When I plot histograms of different countries’ scores on PATM, I see some strange results. Some countries (most English-speaking and Asian ones, for example) have roughly normal distributions of scores, with extra mass sometimes at the top or bottom since scores are bounded between -6 and 6. But most Middle Eastern, Latin American, and African countries have negatively skewed distributions, with half or more of the students hitting the very top score. And Eastern European countries tend to have positively skewed distributions, with more students saying they don’t like math than saying they do.

For example, here’s Japan:

(My histograms use IMLM not PATM because I made them when I was still calling the measure Intrinsic Motivation to Learn Math).

And here’s Jordan:

What’s the problem? I would expect positive attitudes towards math to be basically normally distributed throughout the population — most people would be neutral, some would dislike it, and some would really like it. If a lot of students say “I really really like it” (or, as in Eastern Europe, “I really really don’t like it”) then I wonder if my measurement instrument is measuring some additional thing besides what they feel about math. I don’t know that I can compare a Japanese student who scores 6 on PATM to a Jordanian student who scores the same.

What to do?

I’m not sure what to make of this or what to do with it. One thought I had is to just do my initial analysis with the countries that have roughly normal distributions of scores. For those cases, I feel like I can have some confidence that the PATM measure is getting at the actual distribution of positive attitudes towards math in the population, rather than measuring something else or something in addition (optimist/pessimist mindset? lack of any rigorous math education which would separate the like-maths from the dont-like-maths?).

I guess I might also correlate skewness of the PATM distribution with other measures, for example, the country-level cultural measures I’m using to characterize the context in which students learn math or even just with math scores. Maybe that would give some insight into what’s going on here.
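A rough sketch of that idea in Python: compute the skewness of each country's scores, then correlate the skewness values with a country-level measure. The country labels, the scores, and the "cultural measure" values below are placeholders I made up, not my actual data, and the helper names are my own.

```python
from statistics import mean

def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5.

    Negative means scores pile up at the high end with a tail to the left;
    positive means scores pile up at the low end with a tail to the right.
    """
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Placeholder scores on the -6 to 6 scale for three hypothetical countries.
scores_by_country = {
    "Country A": [-3, -1, 0, 0, 1, 3],          # roughly symmetric
    "Country B": [6, 6, 6, 6, 5, 4, 2, -1],     # piled up at the top
    "Country C": [-6, -6, -5, -5, -3, 0, 4],    # piled up at the bottom
}
# Placeholder country-level measure (hypothetical values).
cultural_measure = {"Country A": 0.2, "Country B": 0.9, "Country C": -0.4}

countries = sorted(scores_by_country)
skews = [skewness(scores_by_country[c]) for c in countries]
measures = [cultural_measure[c] for c in countries]
r = pearson(skews, measures)
```

With only a few dozen countries the correlation would be noisy, so I would treat it as a hint about what to look at next rather than a result.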

This is another example of how useful it was for me to present at our research meeting on Tuesday. One of the other students asked if there was good variability on the scores. I said, “sure, all the countries vary across all the values.” But then I realized the point she was making: how do they vary across the score levels… now that is important.


Measuring intrinsic motivation vs. positive attitude towards math

One comment I got on my research directions presentation was that my measure of intrinsic motivation to study math was not measuring motivation. I called it “liking math” in the chart I put together, but what is it really? It is composed of these four items:

  • I would like to take more math
  • I enjoy learning math
  • Math is boring (reversed)
  • I like math

I should probably call it “positive affect towards math” or better “positive attitudes towards math.” TIMSS calls it the latter (PATM) though they have created a transformed index based on the raw item scores. They report PATM as low, medium, high while I just averaged the scores. Their measure has less information than mine.
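As a sketch of the difference, here is how averaging the four items (with the "boring" item reversed) compares with collapsing to three levels. I am assuming a 1 to 4 agreement scale where higher means more agreement, and the cutoffs in `three_level` are hypothetical; neither is taken from the TIMSS documentation, and the function names are mine.

```python
def reverse(score, low=1, high=4):
    """Reverse-code an item so that a high score means a positive attitude."""
    return low + high - score

def patm_average(take_more, enjoy, boring, like, low=1, high=4):
    """Average the four items after reversing the 'Math is boring' item."""
    items = [take_more, enjoy, reverse(boring, low, high), like]
    return sum(items) / len(items)

def three_level(score, cut_low=2.0, cut_high=3.0):
    """Collapse an averaged score to low/medium/high (hypothetical cutoffs)."""
    if score < cut_low:
        return "low"
    if score <= cut_high:
        return "medium"
    return "high"

# Two made-up students with different response patterns.
student_1 = patm_average(take_more=4, enjoy=3, boring=2, like=3)
student_2 = patm_average(take_more=4, enjoy=4, boring=1, like=4)
levels = (three_level(student_1), three_level(student_2))
```

Both made-up students land in the same category even though their averages differ, which is what I mean by the categorical index carrying less information.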

You might measure intrinsic motivation with items like this, from the Intrinsic Motivation Inventory:

  • I enjoyed doing this activity very much
  • This activity was fun to do.
  • I thought this was a boring activity. (R)
  • This activity did not hold my attention at all. (R)
  • I would describe this activity as very interesting.
  • I thought this activity was quite enjoyable.
  • While I was doing this activity, I was thinking about how much I enjoyed it.

Seems to me that “positive attitudes towards math” and intrinsic motivation are going to be highly correlated. But maybe you could have positive affect towards math but not much intrinsic motivation? I can’t really see that.

I don’t think this distinction is critical to my project. What I’m investigating is how country-level cultural values influence “returns” to a positive attitude towards math that can be invested in math study as a sort of raw material. I would prefer to look at intrinsic motivation but I’m stuck with what’s in my data set.

Anyway, the issue I’m trying to explore is how different contexts (in this case different cultural values) better support or discourage higher achievement in math for students that have good raw material for learning math.

One potential problem is that positive attitudes towards math are partially caused by prior success in math. So if you’ve been unable to succeed in math in the past, you’re going to have lower PATM. Thus the distributions of PATM across countries could be very different; countries with poor educational systems might have positively skewed PATM distributions (scores piled up at the low end) while countries with good ones might have negatively skewed PATM distributions (scores piled up at the high end). I’m not sure how this would affect the slopes of country-specific regressions of math achievement on PATM.