Web Box 9.4 The Cutting Edge: Does Midbrain Dopaminergic Cell Firing Encode Reward-Prediction Error?

Schultz's (2010) reward-prediction error hypothesis was derived from changes in the firing of midbrain DA neurons recorded in monkeys across several experimental paradigms. Under baseline conditions, these cells fired intermittently at a low rate, depicted by the straight horizontal lines in Figure 1. Brief (about 100-ms) bursts of firing were observed in response to a novel sensory stimulus (top panel) or an unsignaled (unexpected) reward such as a food treat (second panel). Interestingly, when the reward was paired with a conditioned stimulus (CS) in a classical conditioning paradigm, so that the CS reliably predicted the reward, the burst came to be elicited by the CS rather than by the reward itself (third panel). Finally, when conditioned monkeys were presented with the CS but no reward followed, DA cell firing was actually depressed at the time the reward would have arrived (bottom panel). These results led Schultz to conclude that the neurons signal the difference between the predicted and the actual occurrence of a reward, that is, a reward-prediction error. An unexpected reward increases firing, a predicted reward causes no change, and the omission of a predicted reward briefly depresses cellular activity. This reward-related role of DA has been linked to rapid, short-lived bursts of firing that produce large but transient increases in synaptic DA.
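The three outcomes described above can be summarized as a simple difference: prediction error = actual reward minus predicted reward. The short Python sketch below is a simplified illustration only, not Schultz's quantitative model. The reward magnitudes (1 for a reward, 0 for none) are hypothetical, and the function names `prediction_error` and `firing_change` are introduced just for this example. It also does not capture how the burst transfers from the reward to the CS during conditioning, which would require a model of how predictions are learned over time.

```python
def prediction_error(actual_reward: float, predicted_reward: float) -> float:
    """Reward-prediction error: how much better (or worse) the outcome was than predicted."""
    return actual_reward - predicted_reward


def firing_change(delta: float) -> str:
    """Map the sign of the prediction error onto the qualitative change in DA cell firing."""
    if delta > 0:
        return "burst (increased firing)"
    if delta < 0:
        return "depression (decreased firing)"
    return "no change"


# Hypothetical reward magnitudes: 1.0 = reward delivered (or predicted), 0.0 = none.
conditions = {
    "unexpected reward": (1.0, 0.0),
    "predicted reward": (1.0, 1.0),
    "predicted reward omitted": (0.0, 1.0),
}

for label, (actual, predicted) in conditions.items():
    delta = prediction_error(actual, predicted)
    print(f"{label}: delta = {delta:+.1f} -> {firing_change(delta)}")
```

Running it reproduces the qualitative pattern in Figure 1: a positive error (burst) for the unexpected reward, zero error (no change) for the predicted reward, and a negative error (depression) when a predicted reward is omitted.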

Figure 1 Firing patterns of midbrain dopaminergic neurons (x-axis: time, in 100-ms intervals; y-axis: firing rate). Top: a novel stimulus produces a burst of firing (at about 800 ms). Second: an unsignaled reward delivered at about 700 ms, with no CS, produces a burst at about 800 ms. Third: a CS presented at about 200 ms produces a burst at about 300 ms, and the reward follows at about 700 ms. Bottom: a CS presented at about 200 ms produces a burst at about 300 ms; when no reward is given at about 700 ms, firing decreases at about 800 ms.

Later studies by Schultz and others extended these initial findings in several ways. First, there is now strong evidence that DA neurons encode reward-prediction error not only in monkeys but also in other species, most notably humans (Diederen and Fletcher, 2021). Second, the increase in DA cell firing in response to a novel sensory stimulus, mentioned above, has been separated in time from the reward-related response (Schultz, 2016). Dopaminergic signaling therefore has two components. The first is thought to encode the physical salience of a novel stimulus (e.g., the intensity of a light or tone). Because this stimulus need not be related to reward at all, the salience in question is not the same as the incentive salience proposed by the incentive sensitization theory. The second component, which occurs slightly later than the first, encodes reward-prediction error. This property of DA signaling points to a key role for the dopaminergic system in reward learning (Diederen and Fletcher, 2021). Finally, Schultz (2016) has proposed that the large increases in synaptic DA produced by addictive drugs might engage both the salience-encoding function and the reward-prediction error encoding function of the dopaminergic system, making drug consumption highly salient while also mimicking the receipt of an unexpectedly large reward.

There is no simple way to reconcile the incentive salience theory of DA action with the reward-prediction error theory. Each theory is shaped by the kinds of experiments used to test it, and each is supported by those experiments. What is clear is that the dopaminergic neurons that give rise to the mesolimbic, mesocortical, and nigrostriatal pathways play multiple important roles in behavioral regulation, including the control of motivation, learning, and action selection. Because all of these functions bear on addiction, it is easy to see why DA is so important in the development and maintenance of this maladaptive pattern of behavior.

References

Diederen, K. M. J., and Fletcher, P. C. (2021). Dopamine, prediction error and beyond. Neuroscientist, 27, 30–46.

Schultz, W. (2010). Dopamine signals for reward value and risk: Basic and recent data. Behav. Brain Funct., 6, 24. doi: 10.1186/1744-9081-6-24.

Schultz, W. (2016). Dopamine reward prediction error coding. Dialogues Clin. Neurosci., 18, 23–32.