Softmax rule for exploration-exploitation

Posted by Neville Sanjana at 12:28 PM EST

A very nice neuroeconomics experiment in the newest Nature:

Daw et al. find that humans choosing between multiple slot machines (with different payoffs) follow a softmax rule: how often they pick each machine depends on its expected value. The alternative would be going with the highest-valued machine most of the time and then randomly choosing another one every so often, regardless of how good it looks. Then, with fMRI, they find brain areas whose activity correlates with different value predictions.
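To make the contrast concrete, here is a minimal sketch of the two choice rules. It is not the authors' code: the function names, the inverse temperature `beta`, the exploration rate `epsilon`, and the example values are all assumptions picked for illustration.

```python
import math
import random


def softmax_probs(values, beta=1.0):
    """Choice probabilities under a softmax rule.

    `beta` (an assumed inverse temperature) sets how strongly choices
    follow expected value; beta = 0 gives uniform random choice.
    """
    m = max(values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (v - m)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def epsilon_greedy_probs(values, epsilon=0.1):
    """Choice probabilities under an epsilon-greedy rule.

    With probability 1 - epsilon pick the best option; otherwise pick
    uniformly at random among all options, ignoring their values.
    """
    n = len(values)
    best = max(range(n), key=lambda i: values[i])
    probs = [epsilon / n] * n
    probs[best] += 1.0 - epsilon
    return probs


def choose(probs, rng=random):
    """Sample one option index according to `probs`."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical expected values for three slot machines.
values = [5.0, 4.5, 1.0]
print(softmax_probs(values, beta=1.0))
print(epsilon_greedy_probs(values, epsilon=0.1))
```

The difference shows up in the non-best options: softmax explores the near-best machine far more often than the clearly bad one, while epsilon-greedy gives both the same small probability.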

News & Views (Daeyeol Lee)

Cortical substrates for exploratory decisions in humans (Daw, Dayan)

Abstract:

Decision making in an uncertain environment poses a conflict between the opposing demands of gathering and exploiting information. In a classic illustration of this ‘exploration-exploitation’ dilemma, a gambler choosing between multiple slot machines balances the desire to select what seems, on the basis of accumulated experience, the richest option, against the desire to choose a less familiar option that might turn out more advantageous (and thereby provide information for improving future decisions). Far from representing idle curiosity, such exploration is often critical for organisms to discover how best to harvest resources such as food and water. In appetitive choice, substantial experimental evidence, underpinned by computational reinforcement learning (RL) theory, indicates that a dopaminergic, striatal and medial prefrontal network mediates learning to exploit. In contrast, although exploration has been well studied from both theoretical and ethological perspectives, its neural substrates are much less clear. Here we show, in a gambling task, that human subjects’ choices can be characterized by a computationally well-regarded strategy for addressing the explore/exploit dilemma. Furthermore, using this characterization to classify decisions as exploratory or exploitative, we employ functional magnetic resonance imaging to show that the frontopolar cortex and intraparietal sulcus are preferentially active during exploratory decisions. In contrast, regions of striatum and ventromedial prefrontal cortex exhibit activity characteristic of an involvement in value-based exploitative decision making. The results suggest a model of action selection under uncertainty that involves switching between exploratory and exploitative behavioural modes, and provide a computationally precise characterization of the contribution of key decision-related brain systems to each of these functions.
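The abstract mentions using the fitted model to classify decisions as exploratory or exploitative. One simple reading, sketched here as an assumption rather than as the paper's exact procedure, is to call a choice exploitative when it picks the option the model currently values most, and exploratory otherwise. The function name and the value estimates below are hypothetical.

```python
def classify_choice(estimated_values, chosen):
    """Label a single choice given the model's current value estimates.

    Assumed rule: a choice is 'exploit' if the chosen option has the
    highest estimated value (ties count as exploit), else 'explore'.
    """
    best_value = max(estimated_values)
    return "exploit" if estimated_values[chosen] == best_value else "explore"


# Hypothetical value estimates for four slot machines on one trial.
estimates = [42.0, 55.5, 30.0, 51.0]
print(classify_choice(estimates, chosen=1))  # picks the top-valued machine
print(classify_choice(estimates, chosen=3))  # picks a lower-valued machine
```

Labels like these, computed trial by trial, are the kind of thing one could then line up against the imaging data.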

One Response to “Softmax rule for exploration-exploitation”

  1. Gambler Says:

    Can we derive a method for characterizing types of gamblers by measuring this magnetic resonance activity?


  • contact us

    Neurodudes is moderated by Neville Sanjana, Bayle Shanks, and Stephen Larson. Comments that you post might be delayed so that we can tell our software that it's not spam -- however, not all comments are pre-screened so don't assume that we have read them, either. Any money we make off this site is used to pay for hosting, or given to charity; if in the future we pay contributors, we will include reader-authors. None of us are medical doctors so please don't ask for medical advice. Contact us here.