Category Archives: diary of a doctoral student

Dissertation topic: Constructing predictive indexes

The actual working title of my dissertation is: Modeling Social Participation as Predictive of Life Satisfaction and Social Connectedness: Scale or Index?

When I tell people my topic, I usually start with the domain area: social participation as related to life satisfaction in older U.S. adults (my data set is people age 65 and over from the Health and Retirement Study), but really, the topic is a statistical and measurement one. Participation happens to be something I’m personally interested in and fits the statistical problem area, but I could do this same project in a variety of domains with a range of constructs. Maybe I ought to change my elevator speech to start with the statistical/measurement part.

Most psychometrics concerns itself with the measurement of latent psychological constructs like attitudes, intelligence, academic achievement and so forth. Psychometricians have developed sophisticated means of constructing instruments (surveys or assessments, for example) that can measure these latent constructs. The approach taken is often based on either classical test theory or item response theory. Either way, the assumption is that observed data (such as a student’s answers to test questions or a subject’s survey responses) are caused by whatever unobserved trait is intended to be measured.

However, there are some things we want to measure that don’t fit this model. Social participation is one of them. Participation instruments generally ask the respondent to report his or her level of participation in various activities. In a latent factor setting, you would then assume some underlying level of participation that gave rise to the observed frequencies of participation. That’s not quite right though. If someone increases their participation in some area (say, by joining an investment club), their overall level of participation goes up. The increase in participation in the investment club seems causally prior to the increase in overall participation. This is the opposite direction of causality from the one proposed by traditional psychometric models.

Some people call a measurement instrument developed by some sort of summation of disparate items an index rather than a scale, where a scale follows the latent factor model. The development of such indexes follows a so-called formative measurement model, where what you’re trying to measure is formed of what you observe, in contrast to the development of scales that follows a reflective measurement model, where what you observe reflects the underlying latent factor of interest. In the diagram, the first figure represents formative measurement (observed indicators x1-x3 cause the latent construct eta 1) and the second figure represents reflective (observed indicators y1 to y3 reflect the level of the latent construct).
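
One way to see the difference between the two models is to simulate them. The sketch below is purely illustrative: the loading (0.8), the noise level (0.6), the sample size, and the function names are all made up for the example, and none of it comes from the Health and Retirement Study data. In the reflective half, a latent trait eta generates three indicators; in the formative half, three unrelated indicators are simply summed into an index.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=5000, seed=1):
    rng = random.Random(seed)

    # Reflective: the latent trait eta causes each observed indicator y1..y3.
    # The loading 0.8 and noise SD 0.6 are arbitrary illustrative values.
    eta = [rng.gauss(0, 1) for _ in range(n)]
    y = [[0.8 * e + rng.gauss(0, 0.6) for e in eta] for _ in range(3)]

    # Formative: indicators x1..x3 are generated independently,
    # and the index is formed from them (here, an unweighted sum).
    x = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(3)]
    index = [a + b + c for a, b, c in zip(*x)]

    return {
        "reflective_item_r": pearson(y[0], y[1]),
        "formative_item_r": pearson(x[0], x[1]),
        "formative_item_index_r": pearson(x[0], index),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.2f}")
```

In this setup the reflective indicators correlate with each other because they share a cause, while the formative indicators need not correlate at all and still each contribute to the index. That is one reason internal-consistency checks designed for scales tend to be a poor fit for indexes.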

There has been plenty of criticism of formative measurement, but I think it can be made useful, and that’s the aim of my dissertation project. I’m now at the analysis stage and just beginning to really understand the usefulness and potential of formative indexes.

As an aside, I don’t like to call formative measurement “measurement.” I prefer to think of it as “modeling.” I think what you’re doing with index development is constructing a one- or few-number summary of a lot of individual data items in a way that predicts outcomes of interest. Think of the Apgar score as a good example. It gives you a one-number summary of the health of the baby and its likelihood to survive and thrive, but you’re not measuring one thing in particular about the baby. Well, maybe you are measuring overall health. Hmmmm.
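
To make the Apgar example concrete, here is a minimal sketch of it as an index. It assumes the standard scoring, in which each of five signs is rated 0, 1, or 2 and the ratings are added up; the function and argument names are my own, chosen for illustration.

```python
# The five Apgar signs, each rated 0, 1, or 2 by the clinician.
APGAR_SIGNS = ("appearance", "pulse", "grimace", "activity", "respiration")


def apgar_score(**signs):
    """Sum five 0-2 ratings into a single 0-10 index."""
    if set(signs) != set(APGAR_SIGNS):
        raise ValueError(f"expected exactly these signs: {APGAR_SIGNS}")
    for name, value in signs.items():
        if value not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1, or 2, got {value!r}")
    # Formative logic: the total is formed from the parts, not the reverse.
    return sum(signs.values())


if __name__ == "__main__":
    print(apgar_score(appearance=2, pulse=2, grimace=1, activity=2, respiration=2))
```

Nothing here assumes the five signs move together. A baby can score well on pulse and poorly on appearance, and the total still summarizes both.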

To be continued…

Eight-week doctoral exam prep plan

I’ve spent the last week and a half or so getting the kids started with school; in the meantime my Ph.D. comp prep has fallen off the schedule. Now I’m trying to get back to it, as the exam is less than two months away.

I searched online for information about doctoral comprehensive or qualifying exams from other departments, and found a useful exam description (pdf) from Rutgers’ doctoral program in planning and public policy. Their suggestions for preparing include this useful tip:

Practice defining concepts and succinctly discussing their relevance (e.g., What is an ANOVA test and under what circumstances is it used?). Also practice comparing concepts and commenting on the appropriateness of alternative methods (e.g., clustered vs. stratified sampling, t distribution vs. normal curve, logit model vs. linear regression). Finally, prepare yourself to discuss “big picture” issues such as research design in longer essay questions.

Their topic and reading list covers almost everything I am expected to know: research design, quantitative and qualitative methods, basic measurement, and survey sampling. To that, I’ll have to add item response theory and research ethics. I’ll also have to attack ANOVA in more depth, since that is a special favorite of my department head. Oh, and let me not forget the advanced statistical techniques I love so well: structural equation modeling, latent growth curve modeling, and (my favorite) hierarchical linear modeling.

Here’s my eight-week plan. I’m going to have to cover a lot of ground each week. For each subtopic, I’ll write a blog post, make flashcards for key points, sources, and formulas, then formulate some essay questions of my own and write answers for them.

Week  Begins  Topics
1     9/6     Research design, introductory stats
2     9/13    Correlation and regression, ANOVA
3     9/20    Psychometrics, validity and reliability
4     9/27    Multivariate methods, qualitative methods
5     10/4    Structural equation modeling, hierarchical linear modeling, latent growth curve modeling
6     10/11   Program evaluation theory, survey research, research ethics
7     10/18   Review
8     10/25   Review

Dot plan for autumn

I have really great memories of my first job after finishing my master’s degree. I worked as a Unix/C++ programmer on an intelligence agency software development contract. The people I worked with were really smart and the work was engaging.

Many of us at that workplace kept “.plan” (say it “dot plan”) files in our home directories that said what we were working on. You could see what someone else was doing by “fingering” them (kind of a precursor to Facebook poke, but with a reaction–a listing of the person’s .plan). Keeping public plans was a good way for us to share what we were working on, without being annoying about it. People use Twitter for that now, and I do intend to get back to Twitter, someday soon. But for now, it feels comfortable to write and think alone in my hermit-cave here.

Back to school

I completed my two big summer projects: I submitted two studies to the AERA 2011 conference, then prepared for and passed the SAS base programming certification exam. Now I’m thinking about back-to-school activities and fall quarter. It feels like the right time to update my plan.

These are my fall projects:

Submit a manuscript to a journal. I haven’t decided which study to rework into a journal article. Both studies are based on the TIMSS 2007 data set and fortunately I’m attending training in D.C. at the end of this month to learn more about that and other international education databases, so I think I’ll be in good shape to do this.

Prepare for my doctoral comprehensive exam, scheduled for late October. I’ll be blogging about the topics I expect to see on the exam, so if you see some tutorial-like posts, that’s why.

Study for and pass the SAS advanced programming certification. I plan to do this after taking comps, but ideally before January, when I’ll start looking for a job. Some of the most interesting statistician positions I’ve seen require SAS. Plus my advisor and I have a plan to do a missing data simulation study in the winter and she suggested we use SAS. I might have selected R if it were up to me, but I plan to use R for my dissertation research, so I’ll have both adequately covered.

Find a good middle school for my middle child. It kills me that Denver no longer supports neighborhood schools; it’s all choice choice choice. This is great when you find a school that suits your child and your family circumstances. The problem is there’s no default choice in many neighborhoods now. I don’t know anyone who sends their kids to our neighborhood middle school or high school, and I wouldn’t feel comfortable sending my daughter to either of those schools since her peers will go elsewhere. We’ll be looking at private and magnet schools. We may also consider trying to “choice in” to a traditional public school that’s near us but has a better reputation than the one that we are assigned to.

Complete 14 units of coursework. I am taking Cost-Benefit Analysis, Economic Fundamentals: Global Applications, Item Response Theory, and a required seminar in which I learn to administer IQ tests. After this quarter, I’ll have just two classes left, Qualitative Research and Analysis of Variance, and I can focus on my dissertation research and job search.

Meanwhile, keep the family happy and healthy. I’d like to get in the habit of starting my kids off each day with healthy breakfasts: scrambled eggs, berry smoothies, pancakes and waffles made with good stuff. We eat dinner together almost every night and I’d like to continue that too, including continuing to try new recipes on a regular basis so I can feed my need for novelty.

Summer break postponed

It seemed like I had no break between spring quarter and summer, as the three-credit research practicum I signed up for started immediately. My goal was to submit two paper proposals to the AERA 2011 conference, scheduled for next April in New Orleans. The due date was today. So I’m done, right?

No! I’m not! The conference planners extended the deadline by a week. I would be done if the deadline hadn’t been extended — because my collaborators would have been forced to give me their feedback on the two draft proposals by today. Then I’d address their feedback and submit. Now, they have a reprieve. And that means I’m not done with the practicum yet.

In my heart, I’m done. The proposals are good, I think. If one or both get accepted, I’ll be happy. It means I can justify a trip next year to New Orleans, where my sister lives. If they don’t get accepted, that’s okay too. My long-term plans don’t depend on presenting papers at AERA conferences. My direction is back into industry, once I finish the doctorate.

I really miss the energy of business and of tech. Tech conferences don’t have people plan their talks almost a year in advance! It’s ludicrous that I’m submitting paper proposals for a conference next April.

Even with the extension, it feels like summer break. My older two kids finished their four weeks at sleep-away camp. I withdrew the third from her day camp since now that big bro and big sis are home, she wants to be home too. We’re going to the pool, seeing movies, playing tennis, making ice cream, hanging out. It’s fun. They’re all old enough to entertain themselves when I’m working. And they don’t need too much support when we go out. This is a great period in motherhood, I think. It’s one I’m actually pretty good at — I was never great with babies, toddlers, or preschoolers. They need too much attention and they don’t have enough ideas of their own to be interesting to me.

In a week, my summer work will be over. But I have other plans: studying to take the SAS base programmer certification so I can put that on my resume and reviewing for comps (my Ph.D. comprehensive exam, that is). I’d also like to blog a bit more. This is the start. See you soon.

Energetic ideas

Seems that the one original idea I had for my book may have some basis in science or at the very least can be expressed as a stochastic process.

So maybe it wasn’t an original idea. Still, it was a good idea, an idea with energy. You’d think that being in grad school I’d find a lot of ideas with energy but no, not much energy there.

That’s one reason I’d really like to blog regularly again, to find some energy in ideas, to find energetic ideas, to connect with people who are energetically exploring ideas. But all my energy and time has been drained by data analysis labs and departmental politics and trying to overcome my temperamental limitations.

I do actually enjoy the data analysis labs but they are not stimulating in the way that exchange of ideas is stimulating. I analyze data because I want to know, I want to understand, I want to explain. I analyze data because I want to find meaning and insight and energy.

But data analysis labs are not about understanding something new; they are about showing the professor that you have some grasp of the basic concepts she has presented. Also, you have to show the professor that you can wrangle the often-angry software that will estimate the model she has presented. You don’t have to actually understand the model you’ve drawn graphically pretty, just make it run and get the right output.

I am busy now but I will be bursty later; bursty and angry and maybe not all that pretty but energetic indeed.

Plans for spring fun

Aside from undergoing death and rebirth and thereby achieving atonement with the father this spring, I have a lot of fun stuff to look forward to.

First, my classes, which promise to be extra fun since they are almost all statistics:

  • Structural equation modeling. Social scientists do this mainly by drawing graphs showing presumed causal relationships between variables but I came across these long ago in economics, and economists like to do them with equations (I think I like equations better). Sometimes this is called “causal modeling” but better to avoid that term because causal inference is hard.
  • Latent growth curve modeling. The prereq is SEM, but my advisor agreed I can take this simultaneously. Good thing because otherwise I’d have to wait two years to take this required course. In two years I’m going to be done!
  • Multivariate analysis. According to my friends, this is just a grab bag of different techniques. I’ve seen most of them before, so this should be pretty easy. And I have my data set for the project all ready. I’ll just re-use the TIMSS 2007 8th grade math achievement data.
  • Meta-analysis. Cool thing about this is it’s a great way to get a publishable paper out of a lit review plus some data analysis. I’m actually assigned as the TA but I’ve never taken it before, so I won’t be that helpful.
  • Ethnographic research. The lone qualitative class amongst a sea of quantitative. Not even sure I will take it — how fun would a quarter of all statistics be? — but it will cover one more requirement and get me one class closer to finished with coursework. Also, I like the idea of finding out what my sister the anthropology professor has been doing for the past 10 or 15 years. I’m terrified at the thought of doing my own mini-ethnography, though.

Then NCME and AERA are in Denver, score! I signed up for a few extras:

  • Data visualization using R. So excited for that, because my graph-making skills really suck. I think I could do a whole lot better if I started using R instead of fighting with SPSS.
  • Using the TIMSS 2007 International Database for secondary data analysis. I really need to know more about using TIMSS. I think I will use it in my dissertation research.
  • Grad student mentoring session. “What They’re Looking for in Hiring New Graduates: Transitioning From Graduate Student to Professional.” Maybe it’s kind of early for me to be attending something like that but since I’m on the bullet train to a doctorate I figured I might as well start thinking about what comes after. It’s probably most useful to know now, not a year or two from now, what employers are looking for.

Also taking this online workshop:

On the personal side:

  • Road trip to see Mount Rushmore. The girls already decided which stuffed animals they’re bringing even though it’s weeks away. The older one asked, “Can I bring two backpacks, one for my clothes and one for stuffed animals?”
  • April birthday marathon. So many people in my family were born in April — three out of five in my immediate family alone. The many birthdays are a great excuse to celebrate and hang out with my favorite people. But by the time mine arrives in late April I don’t want cake or presents or a party, I want a nap.
  • 500 — 500! — daffodils and tulips in my front yard. I splurged and had them planted last fall. Now that is something to look forward to.
  • Tennis classes at the neighborhood park. Love the teacher, enjoy the other students, and can’t wait to feel the smack of my racket against the ball. Too bad it doesn’t start until May. I really need to get out and play more.

Happiness expert Gretchen Rubin says one key to feeling good is having something to look forward to. I couldn’t agree more. There’s so much pleasure in anticipation, maybe even more than in the realization.

Research directions, first cut

I talked about my research interests at last night’s department meeting. Made me realize how little I know about what I want to study. Anyway, first step is to finish the bulk of my coursework and pass comps. In parallel though I’d like to figure out what to do for my dissertation. Right now I’m leaning towards comparing fully Bayesian hierarchical linear models to maximum likelihood estimation with empirical Bayes.

Here’s the preso I gave.

Six tools for research in educational statistics

In this month’s Journal of Educational and Behavioral Statistics, Howard Wainer* identifies six necessary tools that researchers in educational and behavioral statistics should master:

  • Bayesian methods. “Bayesian methods allow us to do easily what would be hard otherwise.” Sounds like I am on the right track.
  • Causal inference. It’s not enough to chant “correlation is not causation.” You need to read and understand Rubin.
  • Missing data. That’s not exactly a tool, is it? It’s a problem that afflicts all researchers working with educational and behavioral data. Wainer says, “Dealing with missing data is, quite simply, the most important practical problem facing researchers.”
  • Picturing data. “A graph of data is the best way to find something that you were not looking for.” Resources: Tufte, Flowing Data.
  • Writing clear prose. I think it’s funny he should bring this up, because to make this list clearer, he should have made each bullet point parallel (e.g., “Use Bayesian methods,” “Understand causal inference,” “Handle missing data,” “Explore data visually,” “Write clear prose”). In other words, I agree with him on the importance of writing clear prose.
  • A deep understanding of Type I (false positive) and Type II (false negative) errors. Specifically, statistical researchers need to pay as much attention to Type II errors as to Type I.

* Wainer, H. (2010). 14 Conversations about three things. Journal of Educational and Behavioral Statistics 35(1).
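
For my own comps prep, here is a small simulation of the two error types from the last bullet. A Type I error is rejecting a null hypothesis that is actually true (a false positive); a Type II error is failing to reject one that is actually false (a false negative). This is only a sketch under assumptions I picked for convenience: a two-sided z-test of a zero mean with known standard deviation 1, samples of 25, a 1.96 cutoff, and a true effect of 0.5 in the second scenario. None of these numbers come from Wainer’s article.

```python
import random


def rejection_rate(true_mean, n=25, reps=4000, z_crit=1.96, seed=0):
    """Share of simulated samples in which H0: mean = 0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(true_mean, 1.0) for _ in range(n)) / n
        z = sample_mean * n ** 0.5  # standard error is 1 / sqrt(n)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps


# H0 is true: every rejection is a Type I error (false positive).
type_i_rate = rejection_rate(true_mean=0.0)

# H0 is false: every non-rejection is a Type II error (false negative).
type_ii_rate = 1 - rejection_rate(true_mean=0.5, seed=1)

if __name__ == "__main__":
    print(f"Type I rate:  {type_i_rate:.3f}")
    print(f"Type II rate: {type_ii_rate:.3f}")
```

The Type I rate should land near the nominal 5 percent by construction, while the Type II rate depends on effect size and sample size and can easily be much larger. I take that to be the point of paying as much attention to Type II errors as to Type I.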

Finding the right thing to work on

How do you find a dissertation topic? It’s got to be something that interests you, or you won’t feel intrinsically motivated to work on it. It’s got to be something that fits somehow into your discipline, or your department faculty won’t approve it. It’s got to be important, in some way or another, so that it’s worthy of a dissertation.

In Working hard is overrated, Caterina Fake says:

Much more important than working hard is knowing how to find the right thing to work on. Paying attention to what is going on in the world. Seeing patterns. Seeing things as they are rather than how you want them to be. Being able to read what people want. Putting yourself in the right place where information is flowing freely and interesting new juxtapositions can be seen.

And where do you find free-flowing information with interesting new juxtapositions? On the web, of course. That’s why I’m blogging again.