If You Pay a Mouse To Eat a Cookie, Will He Like It More or Less?

Researchers Constança Esteves-Sorenson and Robert Broce reviewed more than 100 tests — and ran one of their own — to find out if pay harms performance on enjoyable tasks. Here’s what they learned.
By: Brigitte C. Madrian

The beloved children’s storybook “If You Give a Mouse a Cookie” details the long sequence of follow-on events that happen if, as the title suggests, you give a mouse a cookie to eat. It does not, understandably, address whether, if you pay that mouse to eat a cookie, he will like it more or less. Traditional economic models posit that paying the mouse to eat a cookie will increase the reward from cookie eating, encouraging the mouse to eat more cookies.

But psychologists offer a compelling counterargument: that paying the mouse to eat cookies will crowd out the mouse’s intrinsic pleasure from such consumption, reducing the likelihood that the mouse will want to eat cookies in the future absent compensation.

Although “If You Give a Mouse a Cookie” doesn’t answer this question, there is a large academic literature that does, or at least that attempts to do so. The evidence for the idea that incentives may crowd out intrinsic motivation comes primarily from a canonical experiment in which participants are recruited to engage in a task that they enjoy, such as solving puzzles.

Researchers pay some participants to perform this enjoyable task, say $1 per puzzle solved, but at some point remove this compensation. They then measure the impact of being paid on outcomes such as the time participants spend on the task (e.g., minutes spent solving puzzles), their output (e.g., the number of puzzles solved), or their productivity (e.g., minutes spent per puzzle). A decline in any of these measures when compensation is removed relative to never being compensated in the first place is taken as evidence that pay harms performance by crowding out intrinsic interest in the task.

Although many variants of the experiment described above have been run by researchers, the answer to the question of whether incentives crowd out intrinsic motivation is not at all clear. Some systematic reviews conclude that the evidence supports this notion, whereas others conclude it does not.

In a compelling new study, Constança Esteves-Sorenson and Robert Broce point out deficiencies in the existing literature and propose an explanation to reconcile the contradictory findings, before running a new experiment designed to more definitively address the question at hand. The enjoyable task in their new experiment? What else — eating cookies!

They start by investigating whether the conflicting evidence in the existing literature could stem from the outcomes studied. The three main performance metrics — time spent on the task, output (quantity produced), and productivity (the ratio of time over output) — are interrelated and can yield contradictory results. Suppose that participants in the canonical experiment spend less time on the task, but their output remains unchanged because they are more productive. If researchers focus only on time spent on the task, they will conclude that pay harms performance. But if they focus instead on output or productivity, they will conclude the opposite.
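
The potential for contradiction among the three metrics can be illustrated with a few lines of arithmetic. The numbers below are invented for illustration and are not data from any study; productivity follows the article’s definition of time over output (minutes per puzzle, where fewer minutes means higher productivity).

```python
# Hypothetical illustration: the same data can look like harm on one
# metric and improvement on another. All numbers are invented.

control = {"minutes": 30, "puzzles": 10}   # never-paid group
treated = {"minutes": 24, "puzzles": 10}   # previously paid group

def minutes_per_puzzle(group):
    """Productivity as defined in the article: time over output."""
    return group["minutes"] / group["puzzles"]

# Time on task fell (30 -> 24): on this metric, pay looks harmful.
assert treated["minutes"] < control["minutes"]

# Output is unchanged: on this metric, pay looks harmless.
assert treated["puzzles"] == control["puzzles"]

# Minutes per puzzle fell (3.0 -> 2.4), i.e. productivity rose:
# on this metric, the previously paid group looks *better*.
assert minutes_per_puzzle(treated) < minutes_per_puzzle(control)
```

A researcher reporting only the first metric would conclude pay harms performance; one reporting only the third would conclude the opposite.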

In reviewing more than 100 tests on the effects of pay on the performance of enjoyable tasks, Esteves-Sorenson and Broce find that not a single one jointly reports results for all three potentially contradictory metrics. They focus either on time spent on the task, or on output, or on productivity, but not all three. Thus, the conflicting evidence could stem from the choice of metrics. They also find that the typical sample size in these tests is quite small — only 15 participants — further muddling the evidence as small samples generate results that are often too imprecise to allow researchers to appropriately detect true effects.

Esteves-Sorenson and Broce then replicate the canonical experiment, although with a novel enjoyable task: they recruit subjects to participate in a cookie taste-testing activity. The upshot: Paying people does not reduce performance in the enjoyable activity of eating cookies. However, the unexpected withdrawal of pay leads people to retaliate by reducing the quality of their cookie evaluations.

In our exchange below, Constança Esteves-Sorenson explains what motivated the reassessment of the existing literature on whether compensation crowds out intrinsic motivation, and how a better-conceived experiment allows for a more definitive answer to this question.

Brigitte C. Madrian: Before we get to the substance of your study, I need to first ask the obvious question: which cookie came out on top in your taste tests?

Constança Esteves-Sorenson: Bisconova Classici Ladyfingers came out on top, but a very close second was Bahlsen Waffeletten Dark Wafer Rolls.

BCM: Other than having scientific license for ready access to lots of cookies, what motivated you to do this study? What do we learn from it?

CE-S: We found intriguing the idea that paying people to perform activities they enjoy could lead to lower productivity. Although most jobs entail a mix of enjoyable and unenjoyable tasks, it seems desirable for workers to largely enjoy their jobs. Not only are they happier but they should also be more productive. But research on the crowding out of enjoyment by monetary incentives suggests that pay would harm performance in these seemingly desirable situations.

Our review of existing studies of this phenomenon indicates that the evidence for this idea depends on the outcomes studied, and also suffers from small samples. Further, the results of our test, which mirrors the canonical experiment, revealed no evidence that pay undermines performance. This suggests that the harmful effects of pay on interesting activities are not as easily produced as suggested by prior research, even when using the canonical test. So it is possible that these effects, if they exist, are rarer than previously thought.

We hope our study spurs the use of a comprehensive set of metrics in future investigations of the effect of rewards on interesting activities, even if these yield contradictory evidence, and also the use of larger samples.

BCM: There has been a push to replicate experiments in psychology and in other social sciences. How does your study fit into this trend?

CE-S: The Open Science Collaboration, a consortium of scientists, replicated several published studies in psychology using larger samples. They found that for every 100 published results, only 36 held when replicated with larger samples. In other words, most results reported in published research seem to disappear in larger samples. The consortium also found that the magnitude of published results appeared inflated, as replications with larger samples led to smaller effect sizes.

The fact that many published results do not replicate with larger samples could stem from “publication bias.” Often samples are too small to disentangle whether an experimental result is a signal of a true phenomenon or whether it is noise. So, if by chance, a small sample shows a large result, researchers may conclude that the signal is strong and, therefore, that there is an effect (even though the result was due to random chance or noise). These results are nonetheless published because scientists “found an effect.”

But replications with larger samples, which are less noisy, allow for a more precise assessment of the signal relative to the noise. Thus, they often reveal that the effect never existed in the first place: it arose by chance in the smaller sample. Research with small samples has therefore been challenged as not very robust.

This is very relevant for our study, which documents that the typical sample size in tests of the effect of pay on the performance of enjoyable tasks is small, at 15 people. So there is concern about the robustness of these results.

BCM: Beyond small samples, have social scientists looked at other issues that might lead to conflicting conclusions?

CE-S: Beyond small samples, researchers have noted that the choice of outcome can also lead to contradictory results. Uri Simonsohn, Joseph Simmons and co-authors have a series of articles in psychology discussing how experiments can yield different outcomes and how researchers’ choice of outcome influences whether they find an effect for the phenomenon they are investigating.

This is also very pertinent for our study. Our review of more than 100 tests documents that none jointly reported results for all three potentially conflicting metrics: output, productivity, and time spent on the task. This suggests that the choice of outcome could have influenced whether researchers found evidence for the harmful effects of pay.

BCM: You mention above that there is a “canonical experiment” for testing the effect of pay on enjoyable tasks. What is this canonical experiment and why is it so used?

CE-S: The canonical experiment starts by picking an interesting task for participants, such as solving puzzles, writing headlines, or taking pictures. Researchers need an enjoyable task because if the task is not enjoyable, then participants will not have any “intrinsic interest” for monetary rewards to undermine, and as a result, pay will not harm performance. So it is crucial that subjects enjoy the task in order for pay to be harmful.

Researchers have often assessed whether subjects like the activity with a questionnaire, for example. Then, participants are randomly assigned to one of two groups to work on the task for two sessions. One group gets an unexpected payment in the first session. For example, at the beginning of the first session they are informed that they will receive $1 per puzzle completed. But, at the beginning of the second session, they are told they will no longer be paid.

The second group also solves puzzles across the two sessions but is never paid to do so. This group serves as the “control” group. Researchers use it to estimate baseline performance on the enjoyable task, without the potential contamination of financial rewards on intrinsic interest.

Scientists use this two-session design because, in the first session, it can be hard to tell whether pay harmed the paid group’s performance. Monetary incentives motivate people to produce more, but they may also push people to produce less by crowding out enjoyment, and the motivating effect of pay may win out.

Hence, the harmful effects of pay will only become noticeable in the second session, when pay is withdrawn. Participants no longer receiving the $1 per puzzle reward in the second session are therefore expected to perform worse in this session than those in the control group because pay in the first session crowded out their interest in solving puzzles in the second session.

This type of experiment has become the canonical test because it is considered the most compelling way to detect whether pay is harmful. Researchers feel they need, at a minimum, an interesting task, two sessions, the introduction and withdrawal of pay, and something akin to a control group to assess what would have occurred if monetary rewards had not been introduced in the first place. The canonical design delivers on all these aspects.
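
The core comparison the canonical design makes can be sketched in a few lines. The session counts below are invented for illustration only; they are not data from any actual experiment.

```python
# Hypothetical session-level output (puzzles solved). Invented numbers.
paid_group    = {"session1": 14, "session2": 8}   # paid in session 1 only
control_group = {"session1": 10, "session2": 10}  # never paid

# The canonical test largely sets aside session 1, where pay's motivating
# effect can mask any harm, and compares second-session performance only.
crowding_out_effect = paid_group["session2"] - control_group["session2"]

# A negative difference is read as evidence that pay, once withdrawn,
# crowded out intrinsic motivation for the task.
assert crowding_out_effect < 0   # -2 in this made-up example
```

If the previously paid group matches or exceeds the control group in session two, the design yields no evidence of crowding out.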

BCM: You note that your experiment builds on the canonical test. What was your experiment and what did you find?

CE-S: Our experiment builds on the canonical test, though we improved on it. Our experiment has an enjoyable task, two sessions of work, a group that is paid in the first session but not in the second, and a control group that is never paid. But for our enjoyable activity, we had participants taste and evaluate cookies, a simple market research task. Participants volunteered to taste and rate cookies for no pay, just more cookies at the end. They did not know they were part of an academic study, so we had a fairly natural setting.

The self-selection into the cookie-tasting ensured interest in the task. We did not use a questionnaire to assess people’s interest, as this had led to debate in prior studies: Does a rating of 5 on enjoyment mean that the person enjoys the task whereas a rating of 4 does not? So we instead used participants’ decision to volunteer for the cookie tasting to infer whether they found the task enjoyable. If they did not enjoy tasting cookies, they would not volunteer to do so.

Then we randomly assigned some participants to the typical paid group. Participants in this group learned, at the beginning of the first session, that they would receive 75 cents per cookie tasted and evaluated. And, at the beginning of the second session, they got the news that they would no longer be paid. We also randomly assigned some participants to a control group who were not paid in either session.

We measured and reported on the three potentially conflicting metrics: output, productivity, and time spent on the task. We prioritized the analyses of output and productivity because they are stronger indicators of performance. Time spent on the task is a weaker measure of performance if not paired with output or productivity. For example, if time spent on the task declines, but output does not and, as a result, productivity increases, then it is unclear that pay is harmful.

We found that the payment of 75 cents per cookie boosted output and productivity by a lot in the first session: Paid participants tasted and rated 62 percent more cookies than unpaid participants in the control group and tasted them 29 percent faster.

But when pay was withdrawn in the second session participants did not produce less than those in the control group, nor were they less productive, as posited by the crowding out of enjoyment idea. In fact, they tasted and rated slightly more cookies and did so at a faster rate. And although they spent less time on the task, this effect was small and statistically insignificant.

Interestingly, though participants tasted and evaluated cookies at a faster rate despite the withdrawal of pay, it seems that they did so at the expense of the quality of the evaluations. We find suggestive evidence that the evaluations they supplied to the market research team were sloppier than those in the control group. This pattern is consistent with prior research indicating that when pay expectations are not met people lose morale or retaliate.

Finally, to assess whether pay harms performance in enjoyable tasks in environments in which there are no surprise introductions and removals of pay, we also randomly assigned some participants to a third “no-surprises” condition. In this condition, after being recruited for the cookie tasting, participants received the news that they would be paid in the first session but not in the second, so the introduction and withdrawal of pay was not a surprise. We found, again, that pay did not harm performance on our enjoyable task.

Brigitte C. Madrian is the Dean of the Brigham Young University Marriott School of Business and a former editor of the Review of Economics and Statistics.

Constança Esteves-Sorenson is an Adjunct Professor at the UCLA Anderson School of Management.