Which of the following must be present in order to make a causal claim between two variables?


Correlation and Causation

What are correlation and causation and how are they different?


Two or more variables are considered to be related, in a statistical context, if their values change together: as the value of one variable increases or decreases, so does the value of the other (although it may move in the opposite direction).

For example, for the two variables "hours worked" and "income earned" there is a relationship between the two if the increase in hours worked is associated with an increase in income earned. If we consider the two variables "price" and "purchasing power", as the price of goods increases a person's ability to buy these goods decreases (assuming a constant income).

Correlation is a statistical measure (expressed as a number) that describes the size and direction of a relationship between two or more variables. A correlation between variables, however, does not automatically mean that the change in one variable is the cause of the change in the values of the other variable.

Causation indicates that one event is the result of the occurrence of the other event; i.e. there is a causal relationship between the two events. This is also referred to as cause and effect.

Theoretically, the difference between the two types of relationships is easy to identify — an action or occurrence can cause another (e.g. smoking causes an increase in the risk of developing lung cancer), or it can correlate with another (e.g. smoking is correlated with alcoholism, but it does not cause alcoholism). In practice, however, it remains difficult to clearly establish cause and effect, compared with establishing correlation.



Why are correlation and causation important?

The objective of much research or scientific analysis is to identify the extent to which one variable relates to another variable. For example:

  • Is there a relationship between a person's education level and their health?
  • Is pet ownership associated with living longer?
  • Did a company's marketing campaign increase their product sales?

These and other questions explore whether a correlation exists between two variables; if there is a correlation, this may guide further research into whether one action causes the other. Understanding correlation and causality allows policies and programs that aim to bring about a desired outcome to be better targeted.

How is correlation measured?
For two variables, statistical correlation is measured using a correlation coefficient, represented by the symbol r, which is a single number that describes the degree of relationship between the two variables.

The coefficient's numerical value ranges from +1.0 to –1.0, which provides an indication of the strength and direction of the relationship.

If the correlation coefficient has a negative value (below 0), it indicates a negative relationship between the variables. This means that the variables move in opposite directions (i.e. when one increases the other decreases, or when one decreases the other increases).

If the correlation coefficient has a positive value (above 0), it indicates a positive relationship between the variables, meaning that both variables move in tandem: as one variable decreases the other also decreases, and when one variable increases the other also increases.

Where the correlation coefficient is 0 this indicates there is no relationship between the variables (one variable can remain constant while the other increases or decreases).
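These three cases can be sketched numerically. The helper below implements the standard Pearson formula, and the data are invented to echo the article's "hours worked"/"income earned" and "price"/"purchasing power" examples:

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Invented illustrative data
hours  = [1, 2, 3, 4, 5]
income = [20, 40, 60, 80, 100]   # income rises with hours worked
price  = [1, 2, 3, 4, 5]
power  = [50, 40, 30, 20, 10]    # purchasing power falls as price rises

print(round(pearson_r(hours, income), 3))   # 1.0  (perfect positive)
print(round(pearson_r(price, power), 3))    # -1.0 (perfect negative)
```

Real data rarely produce values of exactly +1 or -1; these perfectly straight-line examples simply mark the two ends of the scale.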

While the correlation coefficient is a useful measure, it has its limitations:

Correlation coefficients are usually associated with measuring a linear relationship.


For example, if you compare hours worked and income earned for a tradesperson who charges an hourly rate for their work, there is a linear (or straight line) relationship since with each additional hour worked the income will increase by a consistent amount.

If, however, the tradesperson charges an initial call-out fee plus an hourly rate that progressively decreases the longer the job goes on, the relationship between hours worked and income becomes non-linear, and the correlation coefficient will understate the relationship (moving closer to 0 the more the pattern departs from a straight line).
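This linearity caveat can be demonstrated with invented numbers (the fee schedule below is hypothetical; the helper re-implements Pearson's r):

```python
import statistics

def pearson_r(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sum((x - mx) ** 2 for x in xs) ** 0.5
                  * sum((y - my) ** 2 for y in ys) ** 0.5)

hours = [1, 2, 3, 4, 5, 6, 7, 8]

# Flat $50/hour: a perfectly linear schedule, so r = 1
flat_rate = [50 * h for h in hours]

# $100 call-out fee plus an hourly rate that shrinks as the job drags on:
# income still rises, but the curve pulls r below 1
rates = [60, 50, 40, 30, 20, 10, 10, 10]
call_out = [100 + sum(rates[:h]) for h in hours]

# A strongly non-linear (U-shaped) relationship can drive r all the way to 0
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

print(round(pearson_r(hours, flat_rate), 3))  # 1.0
print(round(pearson_r(hours, call_out), 3))   # high, but below 1
print(round(pearson_r(xs, ys), 3))            # 0.0
```

Note that the curved-but-still-rising schedule keeps r fairly high; it takes a non-monotonic pattern, like the U-shape, to push r near 0 despite a clear relationship.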


Care is needed when interpreting the value of 'r'. It is possible to find correlations between many variables; however, the relationships can be due to other factors and have nothing to do with the two variables being considered.
For example, sales of ice cream and sales of sunscreen can increase and decrease across a year in a systematic manner, but the relationship would be due to the effects of the season (i.e. hotter weather sees an increase in people wearing sunscreen as well as eating ice cream) rather than to any direct relationship between sales of sunscreen and ice cream.

The correlation coefficient should not be used to say anything about a cause-and-effect relationship. By examining the value of 'r', we may conclude that two variables are related, but that 'r' value does not tell us if one variable was the cause of the change in the other.

How can causation be established?

Causality is the area of statistics most commonly misunderstood and misused, in the mistaken belief that because the data show a correlation there is necessarily an underlying causal relationship.

The use of a controlled study is the most effective way of establishing causality between variables. In a controlled study, the sample or population is split in two, with both groups being comparable in almost every way. The two groups then receive different treatments, and the outcomes of each group are assessed.

For example, in medical research, one group may receive a placebo while the other group is given a new type of medication. If the two groups have noticeably different outcomes, the different experiences may have caused the different outcomes.
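The logic of random assignment just described can be sketched in a few lines of code. This is a toy simulation with invented numbers, not any real medical study:

```python
import random

random.seed(0)                  # fixed seed so the sketch is reproducible
n = 1000                        # people per group

# Each hypothetical person has a stable baseline outcome score
baseline = {p: random.gauss(50, 10) for p in range(2 * n)}

# Random assignment: shuffle, then split into two comparable groups
people = list(baseline)
random.shuffle(people)
treatment, control = people[:n], people[n:]

# Assume the treatment adds 5 points on average (an invented effect size)
treated_scores = [baseline[p] + 5 for p in treatment]
control_scores = [baseline[p] for p in control]

diff = sum(treated_scores) / n - sum(control_scores) / n
print(round(diff, 1))           # lands near the built-in effect of 5
```

Because assignment was random, the two groups' baselines are comparable on average, so the observed group difference can be attributed to the treatment rather than to pre-existing differences.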

Due to ethical reasons, there are limits to the use of controlled studies; it would not be appropriate to use two comparable groups and have one of them undergo a harmful activity while the other does not. To overcome this situation, observational studies are often used to investigate correlation and causation for the population of interest. The studies can look at the groups' behaviours and outcomes and observe any changes over time.

The objective of these studies is to provide statistical information to add to the other sources of information that would be required for the process of establishing whether or not causality exists between two variables.


Further information

ABS: 1500.0 - A guide for using statistics for evidence based policy


What do the studies suggest about why people procrastinate? Photo credit: Timothy Hodgkinson/Shutterstock


The BBC magazine, Science Focus, collected a set of studies on procrastination that might help students at the beginning of a new school year.  The journalist opens with a couple of simple frequency claims. 

[If you procrastinate,] you’re not alone: an estimated 20 per cent of adults (and above 50 per cent of students) regularly procrastinate.

Then the article addresses three topics: 1. Is procrastination a sign of poor time management? 2. Is procrastination unhealthy? 3. How can we stop? Let's take each topic and apply some research methods content as we go.

1. Is procrastination a sign of poor time management?

The journalist quoted psychologist Fuschia Sirois, who explained that contrary to what many people believe, procrastinators aren't more likely to be bad at time management. Instead, she explains, it's about emotion regulation:

“At its core, procrastination is about not being able to manage your moods and emotions. Although many think impulsivity and self-control are the problems – and they do play a factor – underneath is a poor emotional response.”

As Sirois explains, every person faces stressful situations and demanding tasks that trigger activity in a brain region known as the amygdala. It's the amygdala that processes emotions and signals threats, and it can prompt a 'fight or flight' response linked to procrastination.

“Interestingly, people who say they are chronic procrastinators tend to have larger grey matter volume in the amygdala,” says Sirois.

“This means they will also be more sensitive to the potential negative consequences of their actions, leading to more negative emotions and procrastination.”

a) What were the variables in the amygdala study? Was this study correlational or experimental? 

b) Sketch a scatterplot (or bar graph) of the results Dr. Sirois describes here.

2. Is procrastination "bad for your health"?

The journalist implies causality with this headline, and with lines like these: 

...procrastination can cause a lot more problems than missed deadlines. Over decades Sirois has examined the impact of chronic procrastinating on a person’s health, her findings worrying at best – and downright terrifying at worst.

Let's see if the evidence can support these causal claims. He quotes Dr. Sirois again, who stated, 

“People who chronically procrastinate – people who make it a habit – have higher levels of stress and a greater number of acute health problems. They are more likely to have headaches or insomnia or digestive issues. And they’re more susceptible to the flu and colds.”

Even more alarming, Sirois has found that procrastination is a factor that can lead to hypertension and cardiovascular disease, with chronic procrastinators more likely to put off healthy behaviour such as exercise.

c) Is Dr. Sirois describing correlational or experimental evidence here? (Explain your answer by specifying the variables she mentions).

d) Why can't the evidence described by Dr. Sirois support the journalist's assertions about "the impact of chronic procrastinating"? Specify the temporal precedence problem and the internal validity problems here (when you discuss internal validity, specify a particular third, or "C" variable that might correlate with both A and B). 

3. How can we stop procrastinating?

In this section, the journalist describes some intervention studies targeting procrastination.

For example, one compelling Psychological Science paper described how downsizing larger metrics of time (think 48 hours instead of 2 days, or 10,950 days instead of 30 years) can make events seem more immediate, prompting people to engage in upcoming tasks.

You can follow the underlined link in the quoted text (above) to see that this paper described several experiments (for example, the abstract states, "we manipulated time metric").

e) Given the journalist's description (and the empirical study's abstract), name the independent variable in this experiment and name its levels.

f) Name the dependent variable in this experiment.

g) Sketch a bar graph of the study's result, as described by the journalist. 

h) Because this study was an experiment, it is more likely to support causation.
Think about something you have to do in the near future. How can you apply the results from the Lewis and Oyserman study in Psychological Science to that task, to prevent yourself from procrastinating on it?

Selected Answers

a) One variable was "saying you are a chronic procrastinator or not" and the other was "volume of grey matter in the amygdala". Both variables must have been measured, so this was a correlational study.  

b) One axis should have "grey matter volume in amygdala" and the other should have "level of chronic procrastination"; the scatterplot should have a positive slope, but you don't have information on how strong the relationship was, so you can't really know how spread out the dots should be. 

c) This is likely correlational evidence. They are measuring (not manipulating) people's procrastination habits. They are measuring (one cannot manipulate) people's acute health problems, susceptibility to the flu, hypertension, and cardiovascular disease. 

d) We can take acute health problems as an example. Even if there is an association between chronic procrastination (A) and acute health problems (B), we still don't know if the procrastination led to the health problems (A to B), or if the health problems led to more procrastination (B to A). In addition, there could be some outside variable, such as life stressors or poverty (C), that is associated with both procrastination (A) and health problems (B).

e) The independent variable is apparently "time metric" and the levels were hours vs. days, or perhaps days versus years. 

f) The dependent variable seems to be "how immediate events seem to be".

g) The x-axis should have hours and days, and the y-axis should indicate "how immediate events seem to be". The bar for "hours" should be higher than the bar for "days". 
h) This study seems to suggest that if you have homework due in two days, you might be more likely to get started on it if you say to yourself, "My homework is due in 48 hours."


Page 2


Is social media use responsible for depressed mood? Photo: Ian Allenden/Alamy stock

Do smartphones harm teenagers? If so, how much? In this blog, I've written before about the quasi-experimental and correlational designs used in research on screen time and well-being in teenagers. In that post you can practice identifying the different designs we can use to study this question. 

Today's topic is more about the size of the effect in studies that have been published. A recent Wired story tried to put the effect size in perspective. 

One side of the argument, as presented by Robbie Gonzalez in Wired, scares us into seeing social media as dangerous.

For example, first

...there were the books. Well-publicized. Scary-sounding. Several, really, but two in particular. The first, Irresistible: The Rise of Addictive Technology and the Business of Keeping Us Hooked, by NYU psychologist Adam Alter, was released March 2, 2017. The second, iGen: Why Today's Super-Connected Kids are Growing Up Less Rebellious, More Tolerant, Less Happy – and Completely Unprepared for Adulthood – and What That Means for the Rest of Us, by San Diego State University psychologist Jean Twenge, hit stores five months later.

In addition,

...Former employees and executives from companies like Facebook worried openly to the media about the monsters they helped create. 

But is worry over phone use warranted? Here's what Gonzalez wrote after talking to more researchers:

When Twenge and her colleagues analyzed data from two nationally representative surveys of hundreds of thousands of kids, they calculated that social media exposure could explain 0.36 percent of the covariance for depressive symptoms in girls.

But those results didn’t hold for the boys in the dataset. What's more, that 0.36 percent means that 99.64 percent of the group’s depressive symptoms had nothing to do with social media use. Przybylski puts it another way: "I have the data set they used open in front of me, and I submit to you that, based on that same data set, eating potatoes has the exact same negative effect on depression. That the negative impact of listening to music is 13 times larger than the effect of social media."

In datasets as large as these, it's easy for weak correlational signals to emerge from the noise. And a correlation tells us nothing about whether new-media screen time actually causes sadness or depression. 

There are several things to notice in the extended quote above. First, let's unpack what it means to "explain 0.36 percent of the covariance". Sometimes researchers square the correlation coefficient r to create the value R². The R² tells you the percentage of variance in one variable explained by the other (incidentally, researchers usually say "percent of the variance" rather than "percent of covariance"). In this case, it tells you how much of the variance in depressive symptoms is explained by social media time (and, by elimination, what percentage is attributable to something else). We can take the square root of 0.0036 (the decimal form of 0.36%) to recover the original r between depressive symptoms and social media use. It's r = .06.
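The arithmetic is quick to verify (the 0.36% figure is from the Wired piece; the code is just a check):

```python
r_squared = 0.0036                             # 0.36% written as a proportion
r = r_squared ** 0.5                           # square root recovers r
print(round(r, 4))                             # 0.06

explained = round(r_squared * 100, 2)          # 0.36  (% of variance explained)
unexplained = round((1 - r_squared) * 100, 2)  # 99.64 (% left unexplained)
print(explained, unexplained)
```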

Questions

a) Based on the guidelines you learned in Chapter 8, is an r of .06 small, medium, or large? 

b) Przybylski claims that the effect of social media use on depression is the same size as eating potatoes. On what data might he be basing this claim? Illustrate your answer with two well-labelled scatterplots, one for social media and the other for potatoes. Now add a third scatterplot, showing listening to music. 

c) When the article notes that the correlation held for the girls, but not the boys, what kind of model is that? (Here are your choices: moderation, mediation, or a third-variable problem.)

d) Finally, Przybylski notes that in large data sets, it's easy for weak correlation signals to appear from the noise. What statistical concepts are being applied here? 

e) Chapter 8 presents another example of a large data set that found a weak (but statistically significant) correlation. What is it? 

f) The discussion above between Gonzalez and Przybylski concerns which of the four big validities? 

g) Finally, Przybylski mentions that "a correlation tells us nothing about whether new-media screen time actually causes sadness or depression".  Why not? 

Suggested answers:

a) An r of .06 is probably going to be characterized as "small" or "very small" or even "trivial."  That's what the "potatoes" point is trying to illustrate, in a more concrete way. 

b) One scatterplot should be labeled with "potato eating" on the x axis and "depression symptoms" on the y axis. The second scatterplot should be labeled with "social media use" on the x axis and "depression symptoms" on the y axis. These first two plots should show a positive slope of points with the points very spread out--to indicate the weakness of the association. The spread of the first two scatterplots should be almost the same, to represent the claim the two relationships are equal in magnitude. The third scatterplot should be labeled with "listening to music" on the x axis and "depression symptoms" on the y axis, and this plot should show a much stronger, positive correlation (a tighter cloud of points).

c) It is a moderator. Gender moderates (changes) the relationship between screen use and depression. 

d) Very large data sets have a lot of statistical power. Therefore, large data sets can show statistical significance for even very, very small correlations--even correlations that are not of much practical interest. A researcher might report a "statistically significant" correlation, but it's essential to also ask about the effect size and its practical value (the potatoes argument). Note: you can see the r = .06 value in the original empirical article here, on p. 9.
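A sketch of this power point (my own illustration, not from the article or its dataset): the t statistic for testing whether r differs from zero grows with the square root of n, so even r = .06 eventually clears the conventional 1.96 cutoff once the sample is large enough.

```python
import math

def t_for_r(r, n):
    """t statistic for testing H0: correlation = 0, given n paired observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.06
for n in (100, 1_000, 10_000, 100_000):
    t = t_for_r(r, n)
    verdict = "significant" if t > 1.96 else "not significant"
    print(f"n={n:>6}  t={t:6.2f}  {verdict}")
```

With a few hundred thousand respondents, as in the surveys Twenge analyzed, a correlation this small is overwhelmingly "significant" while remaining trivial in practical terms.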

e) The example in Chapter 8 is the one about meeting one's spouse online and having a happier marriage--that was a statistically significant relationship, but r was only .03. That didn't stop the media from hyping it up, however. 

f) Statistical validity

g) The research on smartphones and depressive symptoms is correlational, making causal claims (and causal language) inappropriate. That means that we can't be sure if social media is leading to the (slight) increase in depressive symptoms, or if people who have more depressive symptoms end up using more social media, or if there's some third variable responsible for both social media use and depressive symptoms. As the Wired article states, 

...research on the link between technology and wellbeing, attention, and addiction finds itself in need of similar initiatives. They need randomized controlled trials, to establish stronger correlations between the architecture of our interfaces and their impacts; and funding for long-term, rigorously performed research.

Finally, the Wired article quotes (the seemingly skeptical) Przybylski as saying,

"Don't get me wrong, I'm concerned about the effects of technology. That's why I spend so much of my time trying to do the science well," Przybylski says.

Good science is the best way to answer our questions. 


Page 3


The sun sets in Amarillo, TX an hour later than it does in Huntsville, AL, though they are in the same time zone. Amarillo residents get less sleep and earn less money: is there a causal connection? Photo: Creativeedits/Wikimedia Commons


Sleep is an essential human function, and getting more sleep is associated with improved mood, cognitive performance, and physical performance. Therefore, it might make sense that sleep would improve people's productivity and ability to earn money. That's the topic of a Freakonomics episode on the "Economics of Sleep." You can read the transcript or listen to the 45-minute episode here. (The section I focus on starts around minute 10.)

Freakonomics' hosts interviewed a set of economists (including Matthew Gibson, Jeff Shrader, Dan Hamermesh, and Jeff Biddle) about their research on sleep, work hours, and income. The economists mentioned that, in order to establish a causal link between sleep and income:

What we need is something like an experiment for sleep. Almost as though we go out in the United States and force people to sleep different amounts and then watch what the outcome is on their wages.

While it is theoretically possible to conduct such an experiment, it is practically difficult to assign people to different sleep conditions for a long enough period of time to notice an impact on their wages. So the economists took an alternative path and used quasi-experimental data. In a creative twist, they compared wages at two ends of a single American time zone. The example they gave is Huntsville, AL and Amarillo, TX. Here's why. Gibson stated: 

It turns out that ever since we’ve put time zones into place, we’ve basically been running just that sort of giant experiment on everyone in America.

The story continued. You'll see the transcript version quoted below:

Consider two places like Huntsville, Alabama — which is near the eastern edge of the Central Time Zone — and Amarillo, Texas, near the western edge of the Central zone. [...]

...even though Amarillo and Huntsville share a time zone, the sun sets about an hour later in Amarillo, according to the clock, and since the two cities are at roughly the same latitude as well, they get roughly the same amount of daylight too.

So you’ve got two cities on either end of a time zone, roughly the same size — just under 200,000 people each — where, according to the clock time, sunset is an hour apart. Now, what good is that to a pair of economists interested in sleep research?

GIBSON: It turns out that the human body, our sleep cycle responds more strongly to the sun than it does to the clock. People who live in Huntsville and experience this earlier sunset go to bed earlier.

And the people of Amarillo go to bed quite a bit later. You can see this in data from the American Time Use Survey.

GIBSON: If we plot the average bedtime for people as a function of how far east they are within a time zone, we see this very nice, clean nice straight line with earlier bedtime for people at the more eastern location.

But since Huntsville and Amarillo are in the same time zone, people start work at roughly the same time, which means alarm clocks go off at roughly the same time.

GIBSON: That means if you go to bed earlier in Huntsville, you sleep longer.

The economists didn't use only Huntsville and Amarillo--they also conducted multiple comparisons of cities around the U.S. that were similarly on each end of a single time zone. Using "city of residence" as their quasi-experimental operationalization of "amount of sleep", the economists were ready to report the results for wages: 

So now Gibson and Shrader plugged in wage data for Huntsville vs. Amarillo and other pairs of cities that had a similar sleep gap.

GIBSON: We find that permanently increasing sleep by an hour per week for everybody in a city, increases the wages in that location by about 4.5 percent.

Four and a half percent — that’s a pretty good payout for just one extra hour of sleep per week. If you get an extra hour per night, Gibson and Shrader discovered — here, let me quote you their paper: “Our main result is that sleeping one extra hour per night on average increases wages by 16%, highlighting the importance of restedness to human productivity.”

Questions:

a) What is the independent variable in this time zone and wages study? What is the dependent variable?

b) Is the IV independent groups or within groups?

c) Which of the four quasi-experimental designs is this? Non-equivalent control group posttest only, Non-equivalent control group pretest-posttest, Interrupted time series, or Non-equivalent control group interrupted time series? 

d) The economists asserted, "sleeping one extra hour per night on average increases wages by 16%" (italics added). What do you think? Can their study support this claim? Apply the three causal rules, especially taking note of internal validity issues that this study might have. 

e) If you consider only one pair of cities, there are multiple alternative explanations, besides sleep, that can account for wage differences. Name two or three such threats (considering Huntsville and Amarillo as an example). Now consider, how might many of these internal validity threats be reduced by conducting the same analysis over many other city pairs?  

f) This Freakonomics episode was aired in 2015, but the study (about time zones) they reviewed is not yet published. What do you think about that? 

Answers to selected questions

a) The IV is "Hours of sleep" (but you could also call it "location on the time zone: East or West") and the DV is "Wages".

b) The IV is independent-groups.

c) Non-equivalent control group posttest only.

d & e) The results of the study support covariance: people in cities in the eastern portion of time zones get more sleep and have higher wages than people in the western portions. Temporal precedence is unclear, I think: because the data were collected at the same time, it's not clear if the time zone position came first, leading to more sleep and higher wages, or if people began to earn higher wages first and then systematically moved eastward. (However, the second direction certainly seems less plausible than the first.)

As for internal validity, if we consider only the city pair of Huntsville and Amarillo, we could come up with several alternative explanations. The two cities have different historical trajectories and different ethnic diversities; they are in two different states that have different fiscal policies and industry bases. Perhaps Amarillo has poorer wages in general and people are losing out on sleep there because they are working more than one job. However, these internal validity threats become less of an issue when you consider multiple pairs of cities. It is less plausible that internal validity threats that apply to one city pair would also, coincidentally, apply to all the other city pairs that are at opposite ends of a time zone. 

Even though the method is fairly strong, psychologists would be unlikely to make a strong causal claim simply from quasi-experimental data like these, because the independent variable is not truly manipulated. Nevertheless, the method and results of this quasi-experiment are certainly consistent with the argument that getting more sleep may be a factor in earning higher wages. 


Page 4


Do surgeons perform better, or worse, after taking a day off? Photo: Altrendo Images / Getty Images.

There's a great example of a quasi-experiment in the news. The research question concerned the skills of surgeons.  As the story points out, surgeons are similar to musicians, in that their manual dexterity skills develop with constant use and practice.  The study asked, Would surgeons lose their skills at surgery after a few days off? 

This summary of the research was reported by NPR's Shankar Vedantam.  The research was conducted by Lorens Helmchen and Jason Hockenberry of George Mason University. They analyzed hospital records of about 56,000 surgery patients. As Vedantam explains:

...the researchers compared the outcomes of patients in two different groups. In the first group, the patient's surgeon had performed surgery on other patients the previous day. In the second group, the patient's surgeon had not performed surgery the previous day.

The previous day's lack of surgery might have been a weekend day, a vacation day, or even an office visit day with no surgeries.

Now there are three possible outcomes. One, that the surgeons are so skillful that the break makes no difference whatsoever. The second possibility - which is what I would have picked - is that the surgeons are actually going to be better when they come back from vacation because they're going to be refreshed.

...unfortunately, [the data supported] the third hypothesis.... The outcomes were worst when the doctors had not practiced surgery the previous day.

a) Which of the four quasi-experimental designs from Ch 13 does this study appear to follow? The non-equivalent control group posttest-only design? The non-equivalent control group pretest-posttest design? The interrupted time-series design? Or the non-equivalent control group interrupted time-series design?

b) Sketch a graph of this outcome.

The story on NPR has some additional information about the study:

VEDANTAM: I want to be very clear, the difference is very small. In fact, it was so small that an individual doctor or even an individual hospital probably would not notice the difference. And I asked Helmchen to describe the size of the effect that he found.

HELMCHEN: If you take 100,000 heart bypass surgery patients, of those about 2700 die before they are discharged from the hospital. Our study suggests that every additional day that the surgeon was away from the operating room increases that number by an additional 70 patients.

(NPR host INSKEEP's response): So if they had a day off, there's a tiny difference. If they had a couple days off, there's a little bigger difference. If they had a two-week vacation or a month-long sabbatical, there's a big difference.

c) What kind of validity is being discussed in the segment above? Does this conversation change how you drew your sketch in part b?

Finally, the NPR story discussed a number of explanations for these findings.  The first two explanations appear to be mediators. That is, they are reasons why vacation days might lead to worse outcomes for patients.  Here they are:

It could ...be that the surgeons are doing fine during surgery, but they're missing potential complications. Helmchen and Hockenberry find that when surgeons come back from a break, hospital costs go down. So it could be surgeons are ordering fewer tests or not thinking about very rare risks. There's another possibility, which is it might have to do with the team surrounding the surgeon and on the first day back, the team is still sort of getting its act together or getting its edge together and they're not quite as good as they were when they've had several days of practice.

d) Sketch one or both of these mediator explanations, following the models in Figure 9.13 (p. 260).

The other explanation that the story mentioned for this finding is that there was a third variable problem, or an internal validity problem:

There's a final explanation, and this is completely innocuous, which is the hospitals are lining up the sickest patients when the surgeons come back first, but because they're so sick they're more likely to die.

e) What is the third variable here? Can you sketch it according to the models for the third-variable problem in Figure 9.13?

f) You knew it was coming: The headline of this story reads, "Study: Time away can hurt surgeons' job performance." The verb "hurt" is a causal one. Can the study really support NPR's causal-claim headline?

Suggested answers:

a) I think that the best option here is to call this quasi-experiment a non-equivalent control group, posttest-only design. Patients are non-randomly assigned to surgeons who have either just had a vacation day or who have not.

b) Given the design, a simple bar graph might suffice here. The x-axis would have "surgeons who just had a vacation day" and "surgeons who did surgery the day before." The y-axis would have "Quality of patient outcomes."  The bar for the "surgeons who just had a vacation day" would be lower in height.

c) This section is talking about the effect size of the result, which is part of statistical validity. This means that the two bars you drew in the graph should be different, but not greatly. 

By the way, this might be an example of a small effect size that, nonetheless, has large practical implications. The researcher himself mentioned that the vacation day effect adds 70 deaths to the usual number of 2700 deaths out of 100,000 heart bypass patients.  That's a small effect, but a large number of lives.
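To make that "small effect, large number of lives" point concrete, here is a quick back-of-the-envelope calculation. The numbers come straight from Helmchen's quote; treating the extra 70 deaths as a per-day increment is just an illustration of his claim:

```python
# Baseline mortality for heart bypass patients, per the NPR quote:
# about 2700 of every 100,000 patients die before discharge.
patients = 100_000
baseline_deaths = 2_700

# Estimated additional deaths per extra day the surgeon was away.
extra_deaths_per_day = 70

baseline_rate = baseline_deaths / patients               # 2.7% of patients
relative_increase = extra_deaths_per_day / baseline_deaths  # ~2.6% relative bump per day away

print(f"Baseline mortality: {baseline_rate:.1%}")
print(f"Relative increase per vacation day: {relative_increase:.1%}")
```

A roughly 2.6% relative change is indeed too small for any single surgeon or hospital to notice, yet across 100,000 patients it amounts to dozens of deaths.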

d) You could sketch the first mediator explanation like this:

Surgeon vacation day --> surgeon not thinking about rare risks and not ordering tests --> worse patient outcomes

You could sketch the second mediator explanation like this:

Surgeon vacation day --> surgical team re-learning to work together --> worse patient outcomes

e) You could sketch this internal validity, or third-variable problem, like so:

Sicker patients --> worse outcomes
and Sicker patients --> surgeons returning after a break

f) Of course, we can't determine causation from a quasi-experiment.

There is covariance (surgeons returning after a day off of surgery had worse patient outcomes).  And there is temporal precedence (the day off came before the patient outcome).
But as noted in question e), patients were not randomly assigned to surgeon days. Therefore, we can't rule out third variables such as patient health in this association. The study does not meet the internal validity criterion. NPR should have used the wimpier association headline: "After a surgeon's day off, surgery patients' outcomes are worse."


Page 5

An NPR story carried the headline, "Study: Commuting Adversely Affects Political Engagement." The verb in the headline, "affects," makes this a causal claim. Is there evidence in the story that commuting causes changes in political engagement?

In an interview, researcher Joshua Johnson stated the relationship in causal terms, too:

We found that when people spent more time commuting [or spending] extra hours on their commute, that made them less likely to be engaged in politics.

a) Think for a moment: What are the variables in this research? Would they most likely be measured or manipulated?

b) Sketch a scatterplot that would show the relationship between commuting time and political engagement.

c) Can the researchers really support a causal claim from these data, at least as described here?

Shankar Vedantam, the NPR host who described the study, explained:

There's something about commuting in particular that seems to affect engagement, and the researchers are drawing here on earlier work by the behavioral economist Daniel Kahneman. He's found that commuting ranks among the most unpleasant parts of people's day. There's something uniquely stressful about commuting, and so when you get home after a hellacious day, you really have nothing to give to other people in terms of civic engagement, in terms of getting involved in your neighborhood politics.

Here's a reminder to my readers--just because you can offer a good explanation for a correlation, that doesn't change the fact that a correlational study cannot definitively support causation!

But back to the story. The correlational pattern gets more complex--it involves a moderator! I'll quote  the transcript of the NPR report:

INSKEEP: Are all people affected the same way by this stress?

VEDANTAM: So this is a really important point, Steve, because it turns out that even though there's a general connection between political engagement and commuting, the effect is not experienced evenly by everyone. Commuting disproportionately seems to cause the poor to disengage from politics. And as we go up the income ladder, the effects that commuting have on political engagement actually decrease....

Commuting is stressful for everyone, but the poor find it harder to buffer themselves against the effects of the stress. When you're well off, you come home from a terrible day, you can go out for dinner. You can buy yourself a treat. When you're poor, you have less access to those kinds of safety nets.

d) In this example, we can say that social class moderates the relationship between commuting and political engagement. Sketch a moderator table that would capture this pattern.

Suggested Answers:

a) Think for a moment: What are the variables in this research? Would they most likely be measured or manipulated?

The two variables are time spent commuting and degree of political engagement. These are both measured variables, because it would be practically and ethically impossible to assign people to have a long commute.

b) Sketch a scatterplot that would show the relationship between commuting time and political engagement.

Your scatterplot should have Time spent commuting on one axis and Degree of political engagement on the other. The dots should be spread in a negatively sloping pattern, because the more time people spend commuting, the less politically engaged they are.

c) Can the researchers really support a causal claim from these data, at least as described here?

No. There is covariance here: time spent commuting goes with less political engagement. Temporal precedence seems ambiguous: Which variable came first? Did commuting come first, followed by less engagement? Or do less politically engaged people prefer to work farther from the center of things? Finally, internal validity isn't clear--there may be third variables that explain this relationship. Perhaps younger people tend to commute further, and they are also less politically engaged. Perhaps less educated people commute further and are also less engaged. We would want to see Johnson's original research report--he probably statistically controlled for several such variables. His report would tell us which of these possible third variables were statistically controlled for, to help rule out third variable explanations.

d) In this example, we can say that social class moderates the relationship between commuting and political engagement. Sketch a moderator table that would capture this moderation pattern.

Here's one possible pattern (These data are fabricated, but I made them to fit the pattern described):

Social class level        Relationship between commuting time and engagement

Lower SES                    -0.21*

Middle SES                 0.03

Note: Initially, to make the moderator easier to understand, I oversimplified the quote.  But in fact, the moderation pattern was a little more complex:

Commuting disproportionately seems to cause the poor to disengage from politics. And as we go up the income ladder, the effects that commuting have on political engagement actually decrease until we get to the very wealthy, where the longer your commute, the more likely you are to be politically engaged.

To represent the full pattern, you might show this (again, I fabricated the data to illustrate):

Social class level        Relationship between commuting time and engagement

Lower SES                    -0.21*

Middle SES                 0.03

Highest SES                0.17*


Page 6

Recently, researchers tested how makeup affects people's impressions of women. Here is how the story was covered by ABC News--the story includes a video.

In the study, a set of female targets posed for photos in which they were wearing no makeup, some makeup, professional makeup, or full, "glamour" makeup. Then the photos were rated by a large number of observers. You can read about the details of the study on the ABC News website.

The results showed that for the most part, women wearing makeup were rated more positively on all four ratings. Here are some quotes from the lead researcher:

"We found that when faces were shown very quickly, all ratings went up with cosmetics in all different looks," said Nancy Etcoff.... "The women were judged as more competent, likable, attractive and trustworthy."

When the photos were shown more slowly, the results were a little bit different:

"When they got to the more dramatic makeup looks, people saw them as equally likable and much more attractive and competent, but less trustworthy," Etcoff said. "Dramatic makeup was no longer an advantage compared to when people saw the photos very quickly."

Here are some questions about the study:

a.) Consider only the study in which participants viewed the photos for 250 ms. What were the independent and dependent variables in this study? Was the independent variable between-subjects or within-subjects?

b.) Sketch a graph of the results of the 250ms study. What is the best way to graph these results?

c.) Is this study an experiment? Does it support the causal claim in the headline, "Makeup makes women seem more competent"?

d.) Why did the authors conduct the study under two conditions--250 ms of judgment as well as a slower one? (As you reread the article, look for the part about "cognitive pressure.") What do these conditions suggest about the generalizability and real-world applicability of their findings?

Suggested answers:

a.) The independent variable was the degree of makeup that appeared on each target. It had four levels. According to the description in the web story, the independent variable was probably within-subjects--participants viewed all four levels of makeup (none, some, professional, and "glamour"). The dependent variables were the observers' ratings of likability, competence, attractiveness, and trustworthiness.

b.) You would probably make a bar graph with "level of makeup" on the x-axis and "rating" on the y-axis. You might present all four DVs on the same graph, labeling four different colored bars or lines that represent likability, competence, attractiveness, and trustworthiness. (To see the actual results of the study, of course, you'd need to look at the original research.)

c.) This study is an experiment--it manipulated the independent variable (makeup level) and measured the dependent variables. Because it was an experiment, the study meets the temporal precedence rule. The results of the study also show covariance--women wearing makeup were judged more favorably. What about internal validity? Using each target as her own control (and varying only how much makeup she wore) helps ensure that makeup use was not confounded with the women's baseline attractiveness. That supports good internal validity. In addition, the researchers did not let the women in the photos know how much makeup they were wearing, preventing the target women from inadvertently posing more confident facial expressions. Because the experiment appears to be well done (and meets all three causal criteria), we can conclude that makeup can cause women to appear more competent.

d.) The researchers seem to be using the 250ms version of the study as a way of mimicking real-world situations in which people make quick decisions or when they are distracted. The basic pattern of results replicated across two studies (with the exception of the trustworthiness ratings of women in the "glamour" condition), supporting the study's replicability and external validity. In addition, by presenting 25 diverse women as targets in this study, the researchers are able to say that the makeup effect can probably generalize to a variety of female targets. 


Page 7

b) What seem to be the two variables in the association and causal claims above? 

The journalist summarizes the study this way:

The study, which drew on 163 responses from staff at five medical institutes in Sydney, also delved into the reality of working at home for researchers. [...]

Some 28 percent of scientists said they wore pajamas at least once a week -- a cohort who were twice as likely to report worsened levels of mental health than those who dressed normally each day, according to the study, by David Chapman and Cindy Thamrin [...]

Given the survey format of this study, we can assume that all the variables in this study were self-report. To learn more about this relatively simple study, you can visit the open-access original journal article here, and you can see the full, original text of the survey here. 

Let's work through the four big validities. 

c) Construct validity: Look through the original survey (here) and find the variables that measured pajama-wearing and mental health assessment. What do you think--how well do these variables seem to be measured? (Guess what? You're assessing face validity.) 

d) Construct validity: When you view the original text of the survey (still here), you might be a bit surprised by the lighthearted tone of some of the survey questions. For example, when they asked about the "typical home working environment," they included the option "hiding in the bathroom." When they asked people what they wear during remote meetings, one of the options was "none of your business, camera turned off." How might this casual tone affect the construct validity of the variables being measured? Do you think it will affect the accuracy of self-reports?

e) External validity: The journalist refers to "scientists", so that seems to be the population of interest. The sample was described this way by the journalist:

The study, which drew on 163 responses from staff at five medical institutes in Sydney...

In the empirical article, there is more detail about the sample: 

An invitation to participate was emailed to all staff, students and affiliates of the Woolcock Institute of Medical Research, Sydney, and later extended to other medical research institutes in Sydney (Garvan Institute, Children’s Medical Research Institute, Centenary Institute, Brain and Mind Centre).

Can this study generalize from this sample to the population of interest? Why or why not? (Don't be tempted to focus on the size of the sample here. Remember that it's not the size that matters for external validity.)

f) Statistical validity: Here is the effect size of the relationship between wearing pajamas and mental health: 

[people who wore pajamas at least once per week] were twice as likely to report worsened levels of mental health than those who dressed normally each day

In the empirical article, the rates of poorer mental health were described as 59% (for pajama-wearers) vs. 26% (for non-wearers). What do you think of the strength of this effect? What more information would you like that is relevant to statistical validity?
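Assuming the article's percentages, the journalist's "twice as likely" figure is a risk ratio, and you can check it directly. The 59% and 26% come from the empirical article as quoted above; everything else here is just arithmetic:

```python
# Rates of worsened mental health reported in the empirical article.
pajama_rate = 0.59   # among at-least-weekly pajama wearers
dressed_rate = 0.26  # among those who dressed normally each day

# Risk ratio: how many times more likely the pajama group was
# to report worsened mental health.
risk_ratio = pajama_rate / dressed_rate
print(f"Risk ratio: {risk_ratio:.2f}")        # about 2.3 -- "twice as likely"

# The absolute difference is also worth reporting for statistical validity.
risk_difference = pajama_rate - dressed_rate
print(f"Risk difference: {risk_difference:.0%}")  # 33 percentage points
```

Note that the ratio alone hides the base rates: "twice as likely" would also describe 2% vs. 1%. Reporting the absolute rates (59% vs. 26%) and, ideally, a confidence interval gives a fuller picture of statistical validity.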

g) Internal validity: Let's evaluate whether the study can support the causal claim, "wearing pajamas put the scientists at risk for poorer mental health." In order to support a causal claim, we need to conduct an experiment. Was this an experiment or a correlational study? Explain your answer.

h) Internal validity: Now let's apply the three criteria for causation. 

The study does show covariance: people who reported working in pajamas had twice the rate of worsened mental health, compared with those who did not.

What about temporal precedence? Does the method establish which variable came first in time? 

What about internal validity? Can you think of a third variable (some "C" variable) that might be associated with both wearing pajamas and having worse mental health?