The normal distribution

Ph.D. Topics : Statistics

The normal distribution, or bell curve, is probably the most important probability distribution in statistics. Many quantities we observe are roughly normally distributed; the central limit theorem provides a mathematical explanation for this.

The probability density function is given by:

f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}
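To make the formula concrete, here is a minimal Python sketch that evaluates the density directly. The function name `normal_pdf`, its defaults, and the sample points are my own choices for illustration, not part of any particular library.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    # exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi * sigma ** 2)

peak = normal_pdf(0.0)    # height of the standard normal curve at its mean
one_sd = normal_pdf(1.0)  # height one standard deviation from the mean

# Crude Riemann-sum check that the density integrates to roughly 1
# over the interval [-8, 8] (illustrative interval and step size).
step = 0.001
area = sum(normal_pdf(-8.0 + i * step) * step for i in range(16000))
```

The curve is symmetric about mu, so `normal_pdf(mu - d, mu, sigma)` and `normal_pdf(mu + d, mu, sigma)` should agree for any distance `d`.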


links for 2010-09-12

  • "Simply by providing a context in which users establish what they are working on, and posting notes about their progress — or asking other for help to make progress — and receiving feedback as they make progress, workers using streaming apps are likely to experience time as moving more quickly. This is either associated in our minds with other experiences that make us happy, or directly makes us happy. In either case, it seems fairly obvious that users are happier when exposed to social work contexts with these characteristics."

links for 2010-09-10

The significance of t-tests

Ph.D. Topics : Statistics

No, I’m not talking about statistical significance here; I’m talking about practical significance.

The first statistical significance test an intro stats student learns is usually a t-test to test differences between group means. If she goes on to use statistics, she may never use a t-test for such a purpose again. Why not? Because few real-world data analysis projects involve just one dichotomous independent variable and one normally distributed dependent variable. It almost seems like t-tests aren’t that important.

But they are, because they:

  • provide small-sample estimators; unlike many other statistical methods, they don’t rely on asymptotic properties
  • illustrate null hypothesis testing in a simple manner
  • present the basics of frequentist statistics in the barest form possible
  • allow you to test the significance of regression coefficients, something much more common than two-group comparisons in day-to-day data analysis
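The two-group comparison described above can be sketched in a few lines of Python. This is a minimal illustration of the classic pooled-variance (Student's) t statistic using only the standard library; the function name `pooled_t_statistic` and the two small samples are hypothetical, made up for the example.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(a, b):
    """Return (t, df) for a two-sample t-test that assumes equal variances."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # statistics.variance is the sample variance (n - 1 in the denominator)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(a) - mean(b)) / std_err, df

# Hypothetical scores for two small groups
group_a = [5.1, 4.9, 6.2, 5.8, 6.0]
group_b = [4.2, 4.8, 5.0, 4.4, 4.6]

t_stat, df = pooled_t_statistic(group_a, group_b)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

To turn the statistic into a p-value you would compare it against a t distribution with `df` degrees of freedom. The t statistic for a regression coefficient has the same shape: an estimate divided by its standard error.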


links for 2010-09-08

links for 2010-09-05

Eight-week doctoral exam prep plan

I’ve spent the last week and a half or so getting the kids started with school; in the meantime my Ph.D. comp prep has fallen off the schedule. Now I’m trying to get back to it, as the exam is less than two months away.

I searched online for information about doctoral comprehensive or qualifying exams from other departments, and found a helpful exam description (pdf) from Rutgers’ doctoral program in planning and public policy. Their suggestions for preparing include this tip:

Practice defining concepts and succinctly discussing their relevance (e.g., What is an ANOVA test and under what circumstances is it used?). Also practice comparing concepts and commenting on the appropriateness of alternative methods (e.g., clustered vs. stratified sampling, t distribution vs. normal curve, logit model vs. linear regression). Finally, prepare yourself to discuss “big picture” issues such as research design in longer essay questions.

Their topic and reading list covers almost everything I am expected to know: research design, quantitative and qualitative methods, basic measurement, and survey sampling. To that, I’ll have to add item response theory and research ethics. I’ll also have to attack ANOVA in more depth, since that is a special favorite of my department head. Oh, and let me not forget the advanced statistical techniques I love so well: structural equation modeling, latent growth curve modeling, and (my favorite) hierarchical linear modeling.

Here’s my eight-week plan. I’m going to have to cover a lot of ground each week. For each subtopic, I’ll write a blog post, make flashcards for key points, sources, and formulas, then formulate some essay questions of my own and write answers for them.

Week  Begins  Topics
1     9/6     Research design, introductory stats
2     9/13    Correlation and regression, ANOVA
3     9/20    Psychometrics, validity and reliability
4     9/27    Multivariate methods, qualitative methods
5     10/4    Structural equation modeling, hierarchical linear modeling, latent growth curve modeling
6     10/11   Program evaluation theory, survey research, research ethics
7     10/18   Review
8     10/25   Review

links for 2010-09-02

links for 2010-09-01

  • "New research indicates that exercise also increases the sensitivity of neurons that are related to the control of the feeling of satiation. Therefore, you feel full rather than hungry sooner and/or more often.

    In rodents. So far."

  • On selection bias and measurement bias in survey data. One of my research projects is looking at another bias in survey research: response style.
  • "# Validity is a measure of the degree to which the assumptions employed in the construction of the model are thought to correspond to the real processes underlying the phenomena represented by the model.
    # Comprehensiveness is the degree to which the model is thought to succeed in capturing the major causal factors that influence the features of the behavior of the system in which we are interested.
    # Robustness is a measure of the degree to which the results of the model persist under small perturbations in the settings of parameters, formulation of equations, etc.
    # Autonomy refers to the stability of the model's results in face of variation of contextual factors.
    # Reliability is a measure of the degree of confidence we can have in the data employed in setting the values of the parameters."

Midlife career reboot

My husband Rick started work as a patent attorney this week. He was trained as an aeronautical and mechanical engineer, worked as a NASA engineer and later a Boeing executive, then went back to school in his early forties to launch a second career. This isn’t typical today, but according to Virginia Postrel, perhaps it should be. In a time of increasing life expectancy and good health in old age, we need to reframe the way we think about career evolution:

But changing that picture means exchanging today’s architectural metaphor, “building a career,” for another one: adaptive reuse. This is the human-capital equivalent of turning industrial lofts into apartments, factories into medical schools, power plants into art museums, or saw mills into shopping centers. Your original career may be economically obsolete, or you may just want a change, but your knowledge and experience still have their charms. Instead of equating success with a steady progression of better-paying jobs, each related to the previous one, this model emphasizes taking on new challenges and making new contributions, even if that means going back to school, taking a pay cut, or starting as a trainee when you’re middle-aged.

I’m engaged in some human capital adaptive reuse myself, as I upgrade my statistical education from master’s to Ph.D. level. I’m looking towards a second career that combines my experience in software development with new expertise in research design, psychometrics, and the latest in statistical modeling.

Carlo Strenger and Arie Ruttenberg, writing in the Harvard Business Review in 2008, suggest that midlife career change is not just desirable but existentially necessary. It’s important for financial risk management too, they say:

Hanging on for dear life is usually the wrong strategy. In terms of long-term risk management, it might be much better to start a new career at a relatively young age. Many people need to start thinking about alternatives that suit their abilities and personalities when they still have two or three productive decades ahead of them. In this way, they can discover the possibilities that will allow them to work much longer and thus ensure their financial well-being.

It’s scary, for sure, to be doing this. It’s exhilarating too.