Timing is everything: How your brain learns to predict rewards
It is always nice to be rewarded for what you do. Rewards cause a burst of dopamine to be released in certain circuits in the brain, which makes us feel good. This process motivates us to behave in certain ways, to pursue the ‘rewards’ that keep us alive, such as food.
More than 100 years ago, Ivan Pavlov demonstrated that our brains could be trained to respond to a certain stimulus that consistently preceded the reward. He conditioned dogs to associate a stimulus such as the ringing of a bell with the imminent arrival of food. Dogs that used to salivate when their food arrived now started salivating as soon as the bell rang instead.
More recent research has shown that a conditioned stimulus (the bell, in the case of Pavlov’s dogs) causes the same release of dopamine that the reward alone (the dog food) would have done, but the reward – if it comes as expected – no longer causes a dopamine boost. The brain has learned to recognise the signal that a reward is on its way, almost as if the signal were a reward in its own right.
But although the brain no longer responds to an expected reward, it does not ignore it. If the reward comes earlier than expected, it will trigger dopamine release. If the reward does not arrive at the predicted time, the brain reduces the level of dopamine in the relevant pathways – a negative response.
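For readers who like to see the logic spelled out, the three cases above (reward on time, reward early, reward missing) can be sketched as a simple "prediction error": what arrived minus what was expected at each moment. This is only an illustration, not the model used in any of the research described here. The function name, the ten time steps and the rule that an early reward cancels the later expectation are all assumptions made for the sketch.

```python
def prediction_errors(expected_step, actual_step, reward=1.0, n_steps=10):
    """Return 'received minus expected' for each time step after the cue.

    expected_step: step at which the reward is predicted (hypothetical units).
    actual_step:   step at which it really arrives, or None if it is omitted.
    Assumption: once a reward has been delivered, nothing more is expected.
    """
    errors = []
    delivered = False
    for step in range(n_steps):
        received = reward if step == actual_step else 0.0
        expected = reward if (step == expected_step and not delivered) else 0.0
        errors.append(received - expected)
        if step == actual_step:
            delivered = True
    return errors


on_time = prediction_errors(expected_step=6, actual_step=6)     # no surprise anywhere
early = prediction_errors(expected_step=6, actual_step=3)       # positive blip at step 3
omitted = prediction_errors(expected_step=6, actual_step=None)  # negative dip at step 6
```

A positive number stands in for a dopamine burst and a negative number for a dip; a fully predicted reward produces neither.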
Clearly the brain has not only associated the stimulus with the reward, but has also identified how long it has to wait for the reward; it then monitors whether or not the reward arrives on time. How does it do that? And what happens if the time between stimulus and reward is variable?
Such questions were the focus of research published last week in the journal Neuron. In a study funded by the Wellcome Trust and the Medical Research Council, 28 people played a game while in an MRI scanner. Each player would see a symbol which would be followed a few seconds later by a resulting ‘prize’ of either 40p or nothing. There were three symbols in the game: one was always followed by a 40p win, one always by 0p, and the third by either 40p or nothing with a 50:50 chance.
The colour of the symbol indicated whether the reward would follow after a set time of 6 seconds, or after a random time between 3 and 10 seconds. Each player played 224 rounds: in about one round in seven, the result (0p or 40p) was not shown. Instead, the player had to push a button at the point they thought it would have appeared had this been a normal round. This was called the ‘bucket test’ and it was the only way players could influence the game – if they got the timing right and held out their ‘bucket’ at the right moment, they would increase their winnings (the most anyone won overall was a whopping £26).
While each participant was playing the game, the scientists studied blood flow in two specific areas of the brain closely associated with learning. Their analysis showed that the pattern of activity in a part of the brain called the ventral tegmental area was consistent with a model of how the brain might track both the expected size of reward and the probability of it coming at any particular time after seeing the symbol.
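One way to picture "the probability of it coming at any particular time" is to ask, at each moment: given that the result has not appeared yet, how likely is it to appear right now? The sketch below works this out for the two timing conditions in the game. It is not the model from the Neuron paper. It assumes, purely for illustration, that the random delays fall on whole seconds from 3 to 10 and are equally likely, which the description of the game does not state. The function names are invented for this example.

```python
from fractions import Fraction


def hazard(delays):
    """Chance the result appears at each delay, given it has not appeared yet.

    Assumption: every delay in `delays` is distinct and equally likely.
    """
    ordered = sorted(delays)
    n = len(ordered)
    # After i delays have passed without a result, n - i candidates remain.
    return {t: Fraction(1, n - i) for i, t in enumerate(ordered)}


def expected_prize_pence(win_probability, prize_pence=40):
    """Average prize for a symbol that pays out with the given probability."""
    return win_probability * prize_pence


fixed = hazard([6])              # fixed 6-second delay: certain at 6 s
variable = hazard(range(3, 11))  # rises from 1/8 at 3 s to 1 at 10 s
uncertain_symbol = expected_prize_pence(Fraction(1, 2))  # the 50:50 symbol
```

Under these assumptions, the longer a player waits in the variable condition, the more likely the result is to appear in the next second, while the expected size of the prize depends only on which symbol was shown.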
A different part of the brain, the ventral striatum, had a more flexible role in the game. The ventral striatum has the same capacity to track the size and timing of reward as the ventral tegmental area and, indeed, some previous research had suggested that both follow the same pattern. This was a plausible hypothesis because the ventral striatum and ventral tegmental area are strongly linked, so researchers had thought activity in one was simply causing the same pattern to appear in the other.
In this new research, however, the players were not just interested in the reward they got on each round, they were also trying to follow the timings used in the game so as to have a better chance of bumping up their winnings in the ‘bucket test’. In this game, the ventral striatum was no longer interested in the reward per se. Instead, its activity was more consistent with a model of computation for learning to follow and predict the timings of the game.
The findings suggest that the ventral tegmental area always keeps an eye on the size of the prize and whether it has yet arrived. The ventral striatum, on the other hand, can monitor more sophisticated activities, such as the player’s ability to predict the timing of information about the reward, regardless of how big the reward turned out to be. Our brains can use the ventral striatum for tasks less directly related to getting rewards. Perhaps it is involved in curiosity and investigation? Perhaps it is the neural centre of scientific research itself?
Perhaps I am getting carried away. (Almost certainly, in fact, as the authors of the Neuron paper were quick to point out when I suggested it to them!) But research like this – whether or not it involves the scientists’ own ventral striatums – is starting to unpick the way our brains learn and regulate our responses to the rest of the world. We can’t predict how long we’ll have to wait for a full explanation, but it will definitely be a worthwhile reward.
Klein-Flügge MC, Hunt LT, Bach DR, Dolan RJ, Behrens TE (2011). Dissociable reward and timing signals in human midbrain and ventral striatum. Neuron, 72(4), 654-664. PMID: 22099466