As we have described here, VBA deals with so-called nonlinear state-space models. As we discuss below, this class of generative models is in fact very general, and can be used to emulate more sophisticated generative models that would apparently necessitate an extension of VBA’s inference capabilities. What follows is a mathematical note that exemplifies how to frame apparently VBA-incompatible statistical models as state-space models that VBA can handle.

Auto-regressive AR(1) models

First-order auto-regressive or AR(1) models capture the simplest form of Markovian stochastic processes, namely random walks. They are defined recursively as follows:

$$ x_t = c + \theta x_{t-1} + \eta_t $$

where $c$ and $\theta$ are constant parameters, and $\eta_t$ is typically assumed to be i.i.d. Gaussian random noise.

Some parameter constraints are necessary for the model to remain wide-sense stationary. In particular, AR(1) processes with $|\theta| > 1$ are not stationary.

Note that, in the seminal definition of AR(1) processes, the variable $x_t$ is directly observable. This corresponds to a specific case of nonlinear state-space models, where the evolution function of hidden states is given by $f(x_t) = c + \theta x_t$, and the observation function is trivial and given by $g(x_t) = x_t$ with a measurement noise precision $\sigma$. Practically speaking, one can use high values for the measurement noise precision (e.g. $10^4$).
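For illustration, here is a minimal sketch of what the corresponding evolution and observation functions could look like, assuming VBA's usual (states, parameters, input, optional-input) argument convention; the function names and the ordering of parameters are arbitrary, and each function would typically live in its own .m file:

```matlab
function fx = f_AR1(x, theta, u, in)
    % evolution function: x_{t+1} = c + theta * x_t
    % (the state noise eta_t is added by VBA, not here)
    c  = theta(1);   % constant drift c
    th = theta(2);   % AR(1) coefficient theta
    fx = c + th * x;
end

function gx = g_Id(x, phi, u, in)
    % trivial observation function: y_t = x_t (plus measurement noise)
    gx = x;
end
```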

Ornstein-Uhlenbeck processes

An Ornstein-Uhlenbeck process is a continuous-time variant of AR(1) processes that drifts back and forth towards its long-term mean, which is why it is called mean-reverting. Formally speaking, an Ornstein–Uhlenbeck process x(t) satisfies the following stochastic differential equation:

$$ dx(t) = \theta \left( \mu - x(t) \right) dt + \beta \, dW(t) $$

where $\theta$ and $\beta$ are constant parameters, $\mu$ is the long-term mean of the process, and $dW(t)$ is a Wiener process. As can be seen in the equation above, the process is expected to exhibit some form of regression to the mean, because deviations from the mean, $\mu - x(t)$, effectively induce restoring forces.

VBA can approximate such a continuous process using the following evolution function: $f(x_t) = x_t + \Delta t \, \theta (\mu - x_t)$, where $\Delta t$ is the discretization step. Evolution parameters now include both $\mu$ and $\theta$. The continuous limit is obtained by increasing the number of recursive calls to the evolution function between two time samples (e.g., by setting VBA's micro-time resolution to options.decim = 10 or more). Note that the variance $\beta$ of the approximated Wiener process can be recovered from VBA's estimate of the state noise precision.
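A minimal sketch of the corresponding evolution function, assuming the discretization step $\Delta t$ is made available to the evolution function through its optional input structure (the field name in.dt is arbitrary):

```matlab
function fx = f_OU(x, theta, u, in)
    % Euler discretization of the OU drift: x <- x + dt * theta * (mu - x)
    th = theta(1);   % mean-reversion rate theta
    mu = theta(2);   % long-term mean mu
    fx = x + in.dt * th * (mu - x);
end
```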

Auto-regressive AR(p) models

Higher-order auto-regressive models or AR(p) are a non-Markovian generalization of AR(1) processes:

$$ x_t = c + \sum_{i=1}^{p} \theta_i x_{t-i} + \eta_t $$

where $c$ and $\theta_1, \ldots, \theta_p$ are constant parameters, and $\eta_t$ is assumed to be i.i.d. Gaussian random noise.

In principle, VBA only deals with Markovian systems, i.e. systems whose evolution depends solely upon their current state. Thus, the non-Markovian structure of AR(p) processes should preclude any VBA-based system identification. But in fact, a very simple solution to this apparent issue is to augment the native state-space of AR(p) processes with dummy states that are copies of past instances of native states. As we will see, this allows us to describe any non-Markovian system in terms of a (higher-dimensional) Markovian system. Let us define $z_t$ as the following $p \times 1$ augmented state variable:

$$ z_t = \begin{bmatrix} x_t \\ x_{t-1} \\ \vdots \\ x_{t-p+1} \end{bmatrix} \quad , \quad z_{t+1} = \begin{bmatrix} x_{t+1} \\ x_t \\ \vdots \\ x_{t-p+2} \end{bmatrix} = f(z_t) $$

where $p$ is the target order of the autoregressive process. This augmented state-space can be used to circumvent the non-Markovian dynamics of AR(p) processes, the structure of which can be emulated using the following evolution function on $z_t$:

$$ f(z_t) = \begin{bmatrix} c + \sum_{i=1}^{p} \theta_i x_{t-i+1} \\ x_t \\ \vdots \\ x_{t-p+2} \end{bmatrix} = \begin{bmatrix} c \\ 0 \\ \vdots \\ 0 \end{bmatrix} + \begin{bmatrix} A^T \\ L_1^T \\ \vdots \\ L_{p-1}^T \end{bmatrix} z_t $$

where $A$ and $L_1, \ldots, L_{p-1}$ are $p \times 1$ vectors given by:

$$ A = \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_p \end{bmatrix}, \quad L_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad L_{p-1} = \begin{bmatrix} 0 \\ \vdots \\ 1 \\ 0 \end{bmatrix} $$

One can see that the native unidimensional AR(p) process is now described in terms of Markovian dynamics on a p-dimensional state-space z, whose evolution function f(z) is compatible with VBA analyses.
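As a sketch, the augmented evolution function above could be written as follows (illustrative MATLAB, assuming the evolution parameters are stacked as $[c; \theta_1; \ldots; \theta_p]$):

```matlab
function fz = f_ARp(z, theta, u, in)
    % z = [x_t; x_{t-1}; ...; x_{t-p+1}] is the augmented state
    c  = theta(1);
    th = theta(2:end);             % AR coefficients theta_1 ... theta_p
    p  = numel(th);
    fz        = zeros(p, 1);
    fz(1)     = c + th(:)' * z;    % new sample: c + sum_i theta_i * x_{t-i+1}
    fz(2:end) = z(1:end-1);        % shift ("copy-paste") the past history down by one slot
end
```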

The corresponding observation function would then simply be given by:

$$ g(z_t) = L_1^T z_t = x_t $$

with a measurement noise precision $\sigma$ (in practice, $10^4$ or so).

Evolution parameters include $c$ and $\theta_1, \ldots, \theta_p$, which can be estimated using VBA in the usual way. Note that in this example, there is no observation parameter.

Now, there is a last problem to fix. Recall that, when inverting stochastic models, VBA assumes that hidden states' dynamics are driven by a mixture of deterministic (the evolution function) and random (state noise) forces, i.e.: $z_{t+1} = f(z_t) + \eta_t$. State noise is required for AR(p) processes because it eventually triggers observed variations in $x_t$. However, we do not want state noise to perturb the "copy-paste" operation performed on the $p-1$ last entries of $z_t$. In principle, this can be achieved by resetting the state noise precision matrix $Q_x^{-1}$ to the following diagonal matrix:

$$ Q_x^{-1} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & r & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & r \end{bmatrix} $$

where $r \to \infty$ ensures that the past history of hidden states is transferred without distortion. Practically speaking, this can be approximated by setting the matrix options.priors.iQx{t} as above, with $r$ set to an appropriately high value (e.g., $10^4$).

Finally, we recommend that VBA's Kalman backward lag be adjusted to the order of the AR(p) process, as follows: options.backwardLag = p. This is necessary to properly account for the delayed influence of the states' history.
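In practice, these two settings could look as follows (a sketch only; nT stands for the number of time samples, and the value $10^4$ for $r$ is purely indicative):

```matlab
r = 1e4;   % high precision = negligible state noise on the copied history
for t = 1:nT
    options.priors.iQx{t} = diag([1; r * ones(p-1, 1)]);   % noise only enters the new sample
end
options.backwardLag = p;   % Kalman backward lag matched to the AR order
```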

It is straightforward to generalize the above "augmented state-space" trick to multivariate AR(p) processes, or, in fact, to any form of non-Markovian dynamics (including, e.g., delayed dynamical systems)!

ARCH models

In brief, autoregressive conditional heteroskedastic (ARCH) processes are such that the variance of the current error term or innovation depends upon the system’s state. In discrete time, an example of a generalized ARCH process would be given by:

$$ x_t = a(x_{t-1}) + \beta(x_{t-1}) \, \eta_t $$

where $a(x)$ is an arbitrary mapping of the system's states and $\beta(x)$ acts as the state-dependent standard deviation of the state noise $\eta$. Note that this dependency can be arbitrary, which endows ARCH models with a great deal of flexibility. Typically, ARCH models are employed in modeling financial time series that exhibit time-varying volatility, i.e. periods of swings (when $\beta(x)$ is high) interspersed with periods of relative calm (when $\beta(x)$ is low).

In its native form, VBA's generative model does not apparently handle state-dependent noise. But in fact, one can use the same trick as for AR(p) models. First, one augments the native state-space with dummy states $w_t$ as follows:

$$ z_t = \begin{bmatrix} x_t \\ w_t \end{bmatrix} $$

As we will see, the dummy state $w_t$ will be composed of "pure" noise.

Second, one defines their evolution function as follows:

$$ f(z_t) = \begin{bmatrix} a(x_t) + \beta(x_t) \, w_t \\ 0 \end{bmatrix} = \begin{bmatrix} a(L_1^T z_t) + \beta(L_1^T z_t) \, L_2^T z_t \\ 0 \end{bmatrix} $$

where $L_1$ and $L_2$ are $2 \times 1$ vectors given by:

$$ L_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad L_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix} $$

Since the deterministic flow of dummy states is null, their stochastic dynamics are solely driven by state noise, i.e. $w_t = \eta_t$.

Both the native deterministic flow $a(x)$ and the state-dependent standard deviation $\beta(x)$ can be parameterized through some evolution parameter set $\theta$, i.e.: $a(x) = a(x, \theta)$ and $\beta(x) = \beta(x, \theta)$. These evolution parameters can then be estimated using VBA.
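As an illustration only, here is a sketch of such an augmented evolution function, assuming a linear drift $a(x) = \theta_1 x$ and a positive sigmoidal standard deviation $\beta(x) = \exp(\theta_2) / (1 + \exp(-\theta_3 x))$; any other parameterization would do:

```matlab
function fz = f_ARCH(z, theta, u, in)
    x = z(1);                                       % native state
    w = z(2);                                       % dummy "pure noise" state
    a    = theta(1) * x;                            % state-dependent drift a(x)
    beta = exp(theta(2)) / (1 + exp(-theta(3)*x));  % state-dependent noise std beta(x) > 0
    fz   = [a + beta * w; 0];                       % the dummy state has a null deterministic flow
end
```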

Third, one resets the state noise precision matrix $Q_x^{-1}$ as follows:

$$ Q_x^{-1} = \begin{bmatrix} r & 0 \\ 0 & 1 \end{bmatrix} $$

with $r \to \infty$. This ensures that state noise $\eta$ only enters at the level of the dummy state $w$, which is then rescaled by $\beta(x)$ before perturbing the hidden state $x$. Practically speaking, this can be approximated by setting the matrix options.priors.iQx{t} as above, with $r$ set to an appropriately high value (e.g., $10^4$).
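In practice (again, nT and the value $10^4$ are only indicative):

```matlab
r = 1e4;
for t = 1:nT
    options.priors.iQx{t} = diag([r, 1]);   % state noise enters only through the dummy state w
end
```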

Finally, one defines the observation function on the augmented state-space as follows:

$$ g(z_t) = L_1^T z_t = x_t $$

with a measurement noise precision $\sigma$ (in practice, $10^4$ or so).

Note that, as for AR(p) processes, we recommend that VBA's Kalman backward lag be increased as much as possible (at least options.backwardLag = 2, but preferably more…).

The computational cost incurred when emulating ARCH models using homoscedastic state-space models is twofold: (i) some form of nonlinearity in the evolution function, and (ii) an increase in the dimensionality of the system. Note that the latter cost is, in a sense, paid twice here, since the computational load of VBA's inversion scales with the Kalman backward lag.

Covariance component models

For the sake of simplicity, we will only consider below static generative models of the form $y = g(\phi) + \eta$, where $y$ is the data, $\phi$ are unknown observation parameters, $g$ is an arbitrary observation function and $\eta$ are model residuals.

Recall that, by default, VBA can only estimate one variance hyperparameter (namely: $\sigma$) per data source. In other terms, the covariance of model residuals $\eta$ is constrained to be a rescaling of a fixed covariance matrix $Q_y$, i.e.: $E[\eta \eta^T] = \sigma^{-1} Q_y$. However, one may have prior information regarding the statistical structure of model residuals, in the form of a mixture of covariance components:

$$ E[\eta \eta^T] = \sum_i \lambda_i Q_i $$

where $Q_i$ are fixed covariance components and $\lambda_i$ are unknown mixture coefficients. Covariance component models simply provide estimates of both model parameters $\phi$ and hyperparameters $\lambda$. Critical here is the fact that structured residuals can (and should) change parameter estimates…

Here as well, one can use a trick similar to the ones above. To begin with, note that the above covariance component mixture would follow from defining model residuals $\eta$ as the following weighted sum of dummy random variables $w_i$:

$$ \begin{cases} \eta = \sum_i \lambda_i w_i \\ E\left[ w_i w_i^T \right] = Q_i \end{cases} $$

Thus, it suffices to augment the native parameter space with two sets of dummy parameters z and λ, and replace the native observation function g by the following augmented observation function h:

$$ h(\phi, z, \lambda) = g(\phi) + \sum_i \lambda_i U_i z_i $$

where $U_i$ are the known matricial square roots of the covariance components $Q_i$, i.e.: $Q_i = U_i U_i^T$ (these can be obtained from numerical SVD decompositions).

Setting i.i.d. normal priors on dummy variables z would then emulate covariance component models. On a practical note, parameter estimation would be better behaved if one used some form of hard positivity constraint on the λs.

If the covariance components reduce to channel-specific variances, then one can use VBA’s “multi-source” inversion as a simple and elegant shortcut!

Note that, by construction, native model parameters $\phi$ and dummy noise variables $z$ compete for explaining variability in the observed data $y$. This is not an artefactual consequence of our way of treating covariance component models. This competition is simply more implicit in the standard covariance component model, whereby one does not directly derive posterior densities over noise variables, but rather provides estimates of model residuals from the posterior prediction error: $\hat{\eta} = y - g(\hat{\phi})$.

The posterior distribution of the variables $z_i$ can deviate from their prior distribution. To make sure the dummy residual variables $w_i$ have the proper covariance structure, one can constrain the sample mean of $z_i$ to be zero and their sample variance to be one. This can be done by z-scoring the $z_i$ prior to entering the above augmented observation function $h$.
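Putting the above pieces together, a sketch of the augmented observation function could look as follows. Note that the indexing fields (in.indPhi, in.indZ, in.indLambda), the native function g_native, and the use of exp() to enforce positivity on the $\lambda$'s are all illustrative choices; the square roots U{i} would be precomputed (e.g. via svd or sqrtm) and passed through the optional input structure:

```matlab
function gx = h_covComp(x, P, u, in)
    % static model: the hidden state x is unused; P stacks [phi; z_1; ...; z_K; lambda's]
    phi = P(in.indPhi);                   % native observation parameters
    gx  = g_native(phi);                  % native observation function g(phi)
    for i = 1:numel(in.U)
        zi  = P(in.indZ{i});              % dummy noise variables for component i
        zi  = (zi - mean(zi)) / std(zi);  % z-score so that U{i}*zi has covariance Q_i
        lam = exp(P(in.indLambda(i)));    % positivity constraint on the mixture weight
        gx  = gx + lam * in.U{i} * zi;
    end
end
```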

Switching dynamical systems

Many natural systems, ranging from neuronal firing patterns to the collective motion of animal crowds, give rise to time series data with complex dynamics that may not be amenable to explicit modelling. Nevertheless, one can gain insight into these systems by decomposing the data into segments that are each explained by simpler dynamic units. This induces so-called switching dynamical systems. The state-space of such a system is composed of both continuous ($x$) and discrete ($z$) states. The latter control the form of the evolution function of the former, eventually capturing potentially abrupt changes in the system's deterministic flow.

Strictly speaking, a switching dynamical system cannot be derived as a limiting subcase of nonlinear state-space models. The issue here derives from the (mutually exclusive) discrete states, which evolve according to transition probabilities that have no formal equivalent in continuous state-spaces…

Having said this, one can approximate discrete states in terms of continuous states passed through steep sigmoidal mappings. For example, let z be the n×1 indicator vector of a discrete state, i.e. the only non-zero entry of z specifies which element of the discrete space is currently active. Then, there always exists some continuous n×1 vector x that verifies:

$$ z = s_\beta(x) = \frac{\exp(\beta x)}{\sum_{i=1}^{n} \exp(\beta x_i)} $$

where $s_\beta(\cdot)$ is a softmax mapping with inverse temperature $\beta$.

In switching dynamical systems, such discrete states evolve according to the following transition probability matrix:

$$ P(z_t \mid z_{t-1}) = \Pi = \begin{bmatrix} \pi_{11} & \pi_{12} & \cdots & \pi_{1n} \\ \pi_{21} & \pi_{22} & \cdots & \pi_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \pi_{n1} & \pi_{n2} & \cdots & \pi_{nn} \end{bmatrix} $$

where $\sum_{i=1}^{n} \pi_{ij} = 1$ for all $j$.

Conditional on $z_{t-1}$, the first- and second-order moments of $z_t$ are thus given by:

$$ E[z_t \mid z_{t-1}] = \Pi z_{t-1} $$

and:

$$ V[z_t \mid z_{t-1}] = \mathrm{diag}(\Pi z_{t-1}) - \Pi z_{t-1} z_{t-1}^T \Pi^T $$

where $\mathrm{diag}(x)$ is, by abuse of notation, the $n \times n$ diagonal matrix whose diagonal entries are given by the vector $x$.

Thus, switching dynamical systems can be seen as some form of multivariate ARCH model, whereby the noise variance-covariance matrix depends upon the previous state of the system (and is given by the above equation). One would then define an augmented state-space using $n$ continuous states mapped through a sigmoid mapping and $n$ dummy states that are driven by noise but are rescaled appropriately prior to perturbing the first half of the state-space…
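To fix ideas, here is a heavily simplified sketch of one possible augmented evolution function along these lines, where the transition matrix $\Pi$, the inverse temperature $\beta$ and the number of discrete states $n$ are assumed to be passed through the optional input structure; all names are illustrative, and this is only one way of implementing the idea, not a validated recipe:

```matlab
function fz = f_switch(z, theta, u, in)
    n = in.n;
    x = z(1:n);              % continuous states encoding the (approximate) discrete state
    w = z(n+1:2*n);          % dummy noise states
    s = exp(in.beta * x) / sum(exp(in.beta * x));   % softmax approximation of the indicator z
    m = in.Pi * s;                                  % conditional mean of the next discrete state
    V = diag(m) - in.Pi * (s * s') * in.Pi';        % conditional variance (cf. equation above)
    C = real(sqrtm(V + 1e-8 * eye(n)));             % regularized matrix square root of V
    fz = [m + C * w; zeros(n, 1)];                  % rescaled dummy noise perturbs the first half
end
```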