Bayesian decision theory (BDT) is a probabilistic framework concerned with how decisions are made, or should be made, in ambiguous or uncertain situations. The “normative” approach to BDT focuses on identifying the “optimal” or “rational” decision in a given context, and has been applied extensively to decision-supporting methods (e.g., statistical testing). In contradistinction, “descriptive” BDT attempts to describe what people actually do. In this context, optimal decisions are treated as quantitative predictions that can be tested against observed behaviour.

BDT relies on two processes: belief updating and decision making, which are related to prior distributions and utility functions, respectively. In the context of associative learning, prior beliefs capture subjective (and possibly implicit) constraints regarding behaviourally-relevant contingencies. Utility functions, in turn, can be regarded as a surrogate for a task goal and/or potentially non-trivial incentives that eventually shape people's heuristic policies.
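To make these two processes concrete, here is a minimal MATLAB sketch (with made-up numbers, purely for illustration): a prior belief over two hidden states is updated via Bayes' rule, and the resulting posterior is combined with a utility function to select the action with the highest expected utility.

```matlab
% Minimal sketch of belief updating + decision making (hypothetical numbers).
prior      = [0.5; 0.5];          % prior belief over two hidden states
likelihood = [0.8; 0.3];          % P(datum | state), for each state
posterior  = prior .* likelihood;
posterior  = posterior / sum(posterior);   % Bayes' rule (normalization)
U  = [1 -1; -1 1];                % utility of action (rows) given state (cols)
EU = U * posterior;               % expected utility of each action
[~, action] = max(EU);            % "optimal" decision: argmax expected utility
```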

Bayesian learning models can be used to capture people's hidden prior beliefs (which shape the way they learn) and/or their preferences (which map beliefs onto actions). Many Bayesian learning models for choices and/or reaction times are available in the VBA toolbox. We refer interested readers to Daunizeau et al. (2010a) for details regarding the specifics of inverting BDT models.
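As a rough template (not a verbatim excerpt from any demo script), inverting such a model with the toolbox follows the generic call below. Here, f_myLearner and g_myResponse are hypothetical placeholders for user-supplied evolution (belief-update) and observation (belief-to-response) functions, and y and u stand for the recorded responses and experimental inputs.

```matlab
% Hedged sketch of a generic VBA model inversion; f_myLearner and
% g_myResponse are hypothetical placeholder function names.
dim.n       = 3;   % number of hidden states (e.g., belief moments)
dim.n_theta = 2;   % number of evolution parameters
dim.n_phi   = 2;   % number of observation parameters
options     = struct();  % priors over states/parameters go in options.priors
% y: observed responses (e.g., reaction times); u: experimental inputs (cues)
[posterior, out] = VBA_NLStateSpaceModel(y, u, @f_myLearner, @g_myResponse, dim, options);
```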

A cue-outcome associative learning model for reaction times

The script demo_AVL_recog.m demonstrates a simple Bayesian learning model that predicts trial-by-trial variations in reaction times during an associative learning task. This script reproduces the simulations of the companion paper Daunizeau et al. (2010b).

The model was designed to provide a normative way of solving the participants' task, which we briefly describe here. On each trial, participants were asked to categorize a visual outcome (face or house) as quickly as possible. Just prior to the visual stimulus presentation, they were exposed to an auditory cue (high-pitch or low-pitch tone) that was predictive of the visual outcome. Participants had to learn this association, which changed over the course of the experiment.

The constituents of the BDT model are:

  • a set of hierarchically organized states (3 per available action). The first hierarchical level captures potential perceptual uncertainty regarding visual outcomes. The second hierarchical level contains the moments (mean and variance: 2 states) of people's posterior belief about the cue-outcome contingency.
  • two evolution parameters that control the perceptual uncertainty and the prior variance of the cue-outcome contingency, respectively.
  • two observation parameters that control the speed-accuracy trade-off during perceptual categorization.

The Bayesian learning rule consists in a trial-by-trial update of the cue-outcome contingency, which essentially approximates a Kalman filter.
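To illustrate the form of such an update (with made-up parameter values, not the demo's actual code): the belief's mean moves toward the prediction error with a gain set by the posterior variance.

```matlab
% Illustrative Kalman-filter-like update of the cue-outcome contingency
% (made-up values; not taken from demo_AVL_recog.m).
o  = double(rand(1, 100) > 0.5);   % simulated binary outcome sequence
mu = 0; sigma2 = 1;                % prior mean/variance of the contingency
v  = 0.5;                          % evolution parameter: belief drift variance
s2 = 1;                            % outcome (perceptual) uncertainty
for t = 1:numel(o)
    sigma2 = sigma2 + v;             % prediction: beliefs diffuse over trials
    K      = sigma2 / (sigma2 + s2); % Kalman gain = effective learning rate
    mu     = mu + K * (o(t) - mu);   % move mean toward the prediction error
    sigma2 = (1 - K) * sigma2;       % posterior variance shrinks after update
end
```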

Associative learning in the context of volatile environments

A volatile environment undergoes frequent and unpredictable changes in behaviourally-relevant contingencies. Critically, an optimal agent should adapt her learning rate to the environmental volatility, which may itself change over time. This implies that an optimal Bayesian learner should also track the environmental volatility. In Mathys et al. (2011), we proposed a simple Bayesian learning rule that possesses this form of adaptive fitness. The script demo_volatileVB.m demonstrates this (hierarchical) Bayesian learning model in the context of an associative learning task.

The constituents of this model are:

  • a set of hierarchically organized states (6 per available action). The first two levels are similar to the above model (3 states). The third level contains the moments (mean and variance: 2 states) of people's posterior belief about the action-outcome contingency's volatility.
  • three evolution parameters that control the dynamical changes of the agent’s effective learning rate (volatility weight, base volatility, and prior variance of the volatility transition distribution).
  • two observation parameters: bias and temperature.
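For intuition, the sketch below shows, in simplified and schematic form inspired by Mathys et al. (2011) rather than the toolbox's exact equations, how the evolution parameters can inflate the predicted variance of the contingency belief (and hence the effective learning rate) as estimated volatility rises, and how bias and temperature shape the response mapping. All values and variable names are illustrative.

```matlab
% Schematic volatility-adaptive learning rate and response mapping
% (made-up values; simplified relative to the actual model equations).
kappa  = 1;  omega = -2;     % volatility weight and base volatility
mu3    = 0.5;                % current estimate of (log-)volatility
sigma2 = 0.1;                % posterior variance of the contingency belief
sigma2hat = sigma2 + exp(kappa * mu3 + omega);  % volatility-inflated variance
lr = sigma2hat / (sigma2hat + 1);  % effective learning rate grows with volatility
% Observation mapping: belief -> choice probability via bias and temperature
mu2 = 0.3;  bias = 0.1;  temperature = 2;
p = 1 / (1 + exp(-(mu2 + bias) / temperature));  % P(choosing action 1)
```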

The script demo_dynLearningRate.m demonstrates how one can use inversion diagnostics (e.g., Volterra kernels) to detect systematic changes in learning rates that conform to volatile Bayesian learning. In turn, these diagnostics can be used to gradually increase the complexity of learning models to capture the sophistication of (hidden) computational mechanisms.
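As a toolbox-independent illustration of the idea (not the demo's actual code), a first-order Volterra kernel can be approximated by regressing a trial series of effective learning rates onto lagged experimental inputs; systematically non-zero lagged weights betray history-dependent modulation of the learning rate. The data below are simulated for the sake of the example.

```matlab
% Toolbox-independent sketch of a first-order Volterra-kernel diagnostic
% (simulated data; demo_dynLearningRate.m uses the VBA machinery instead).
T = 200; lagmax = 8;
u  = randn(T, 1);                  % input sequence (e.g., prediction errors)
lr = 0.2 + 0.1 * randn(T, 1);      % trial-wise learning rates (from inversion)
X = zeros(T, lagmax);
for k = 1:lagmax
    X(k+1:end, k) = u(1:end-k);    % lagged copies of the input
end
w = X \ lr;                        % least-squares estimate of the kernel
plot(1:lagmax, w, 'o-'); xlabel('lag (trials)'); ylabel('kernel weight');
```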