Toy Bayesian Data Assimilation

Bayesian data assimilation for a toy problem.
Data Assimilation
Author

Rui-Yang Zhang

Published

July 1, 2025

We perform Bayesian data assimilation for a deterministic ODE model using noisy observations of it. The Python code used is here.

Model and Observations

The ODE of interest is defined as:

\[ \frac{dx}{dt} = \exp(-l t) \cos(\alpha t) \]

where \(l\) and \(\alpha\) are unknown parameters to be inferred. The system is numerically integrated using the Euler method with an initial condition \(x(0) = x_0\) over the time interval \([0, 5]\) with time step \(\Delta t = 0.01\).
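A minimal NumPy sketch of this forward simulation (function and variable names are illustrative, not the linked code):

```python
import numpy as np

def simulate(alpha, l, x0, t_max=5.0, dt=0.01):
    """Euler integration of dx/dt = exp(-l * t) * cos(alpha * t)."""
    ts = np.arange(0.0, t_max + dt, dt)
    xs = np.empty_like(ts)
    xs[0] = x0
    for i in range(1, len(ts)):
        t_prev = ts[i - 1]
        xs[i] = xs[i - 1] + dt * np.exp(-l * t_prev) * np.cos(alpha * t_prev)
    return ts, xs
```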

A synthetic trajectory is generated using the true parameter values: \(\alpha = 5, l = 0.5, x_0 = 1.0\). Noisy observations are sampled at 100 randomly chosen time points \(\{t_i\}_{i=1}^{100}\) following:

\[ y_i = x(t_i) + \epsilon_i, \quad \epsilon_i \sim \mathcal{N}(0, \sigma^2) \]

with observation noise standard deviation \(\sigma = 0.05\). The dynamics and observations are shown in the figure below.

Dynamics of \(dx/dt = \exp(-l t) \cos(\alpha t)\) with noisy observations.
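To make the setup concrete, here is a sketch of how such synthetic observations could be generated, reusing the `simulate` sketch above (the random seed and variable names are illustrative):

```python
rng = np.random.default_rng(0)

# True parameter values and noise level from the text
alpha_true, l_true, x0_true, sigma = 5.0, 0.5, 1.0, 0.05

ts, xs = simulate(alpha_true, l_true, x0_true)

# 100 observation times chosen at random from the simulation grid
obs_idx = np.sort(rng.choice(len(ts), size=100, replace=False))
t_obs = ts[obs_idx]
y_obs = xs[obs_idx] + sigma * rng.normal(size=100)
```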

Bayesian Inference Setup

We assume the observation noise standard deviation is known and wish to conduct Bayesian inference on the ODE parameters and initial condition \(\theta = (\alpha, l, x_0)\). We place a flat prior on all the parameters, so the (unnormalised) posterior distribution of \(\theta\) is proportional to the likelihood.

Since the dynamics are assumed to be deterministic, all the stochasticity is captured by the observation noise. The log-likelihood is constructed by simulating the system forward (via Euler’s method with step size \(\Delta t = 0.01\)) under candidate parameters and comparing the predicted states to the observations:

\[ \log L(\theta) = \log p(y \mid \theta) = \sum_{i=1}^{N} \log\left[ \frac{1}{\sigma}\,\phi\left( \frac{y_i - x_\theta(t_i)}{\sigma}\right) \right] \]

where \(x_\theta(t_i)\) denotes the simulated trajectory’s value at time \(t_i\) under parameters \(\theta\), \(\phi\) is the density function of a standard normal distribution, and \(N = 100\) is the number of observations.
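A sketch of this log-likelihood, reusing the `simulate` function above and reading the predicted states off the Euler grid at the observation indices (names are illustrative):

```python
from scipy.stats import norm

def log_likelihood(theta, obs_idx, y_obs, sigma=0.05):
    alpha, l, x0 = theta
    _, xs = simulate(alpha, l, x0)     # forward simulation under candidate parameters
    x_pred = xs[obs_idx]               # simulated states at the observation times
    return norm.logpdf(y_obs, loc=x_pred, scale=sigma).sum()
```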

MCMC Results

We run MCMC on the constructed posterior using Python’s BlackJax package (Cabezas et al. 2024). I first consider a random walk Metropolis (RWM) sampler. After trying to hand-tune the stepsize for each parameter and getting frustrated, I switched to an adaptive version of RWM with Robbins-Monro adaptation (Andrieu and Thoms 2008), targeting the optimal acceptance rate (around 0.3, given the low posterior dimension; Roberts and Rosenthal 2001).
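The actual run uses BlackJax; the following is only a plain NumPy sketch of the Robbins-Monro idea, reusing the log-likelihood sketch above and adapting a single scalar stepsize (whereas the real run adapts a stepsize per parameter):

```python
# Flat prior: the log posterior is just the log-likelihood sketched earlier
log_post = lambda theta: log_likelihood(theta, obs_idx, y_obs)

def adaptive_rwm(log_post, theta0, n_steps=20_000, target_accept=0.3, seed=1):
    """Random walk Metropolis with Robbins-Monro stepsize adaptation (sketch)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    log_step = 0.0                                   # log of the proposal stepsize
    samples = np.empty((n_steps, theta.size))
    for n in range(n_steps):
        proposal = theta + np.exp(log_step) * rng.normal(size=theta.size)
        lp_prop = log_post(proposal)
        accept_prob = np.exp(min(0.0, lp_prop - lp))
        if rng.uniform() < accept_prob:
            theta, lp = proposal, lp_prop
        # Robbins-Monro update: push the acceptance rate towards the target,
        # with an adaptation rate that decays as the chain progresses
        log_step += (accept_prob - target_accept) / (n + 1) ** 0.6
        samples[n] = theta
    return samples, np.exp(log_step)

warmup_samples, tuned_step = adaptive_rwm(log_post, np.array([4.0, 1.0, 0.5]))
```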

RWM with Robbins-Monro adaptation. (Top) Trace plots for the sampled parameters with true values shown as red dashed lines. (Bottom) Histograms of the samples with true values shown as red dashed lines.

I initialise the chain at \((\alpha, l, x_0) = (4, 1, 0.5)\), which is moderately close to the truth yet far enough away that the chain has to make some genuine moves. I first run 20,000 steps with adaptation as burn-in and warm-up, then use the learned stepsize to run another 20,000 steps without adaptation to obtain samples from the posterior. The resulting trajectories (burn-in removed) are shown in the figure above. The assimilated dynamics with a 90% credible interval are shown below, and the result looks great. The assimilated trajectories are simulated using 1,000 sets of parameter values bootstrapped from the joint posterior samples; pointwise means and quantiles across these trajectories give the posterior mean and credible interval.

Assimilated dynamics with 90% credible interval.
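A sketch of the credible-band computation described above, assuming `rwm_samples` is an array of posterior draws of shape `(num_samples, 3)` (the name is illustrative) and reusing `simulate` and `rng` from the earlier sketches:

```python
# Bootstrap 1,000 parameter sets from the joint posterior samples
idx = rng.choice(len(rwm_samples), size=1000, replace=True)
trajectories = np.stack([simulate(*rwm_samples[i])[1] for i in idx])

# Pointwise mean and 90% credible band across the simulated trajectories
mean_traj = trajectories.mean(axis=0)
lower, upper = np.quantile(trajectories, [0.05, 0.95], axis=0)
```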

NUTS

I initially tried NUTS but could not get it to work. I later tried again, with the help of Thomas Newman, and it worked! With 1,000 warm-up steps and 1,000 further sampling steps, we obtain the following results, which are visually indistinguishable from those of RWM, so NUTS seems to work correctly too.

NUTS. (Top) Trace plots for the sampled parameters with true values shown as red dashed lines. (Bottom) Histograms of the samples with true values shown as red dashed lines.
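For reference, here is a sketch of a NUTS setup along the lines of the standard BlackJax pattern, with the log posterior rewritten in `jax.numpy` (via `lax.scan`) so that gradients are available. It reuses `obs_idx` and `y_obs` from the earlier sketches, starts from the same initial point as the RWM run, and the specific calls follow BlackJax's documented interface rather than the author's exact code:

```python
import jax
import jax.numpy as jnp
import blackjax

dt, sigma = 0.01, 0.05
ts = jnp.arange(0.0, 5.0 + dt, dt)
obs_idx_j, y_obs_j = jnp.asarray(obs_idx), jnp.asarray(y_obs)

def log_post(theta):
    alpha, l, x0 = theta
    # Euler integration written with lax.scan so the log density is differentiable
    def euler_step(x, t):
        x_new = x + dt * jnp.exp(-l * t) * jnp.cos(alpha * t)
        return x_new, x_new
    _, xs_tail = jax.lax.scan(euler_step, x0, ts[:-1])
    xs = jnp.concatenate([jnp.reshape(x0, (1,)), xs_tail])
    # Flat prior: the log posterior equals the Gaussian log-likelihood
    return jnp.sum(jax.scipy.stats.norm.logpdf(y_obs_j, loc=xs[obs_idx_j], scale=sigma))

rng_key = jax.random.PRNGKey(0)
init_position = jnp.array([4.0, 1.0, 0.5])   # same starting point as the RWM run

# Warm-up: window adaptation tunes the NUTS step size and mass matrix
warmup = blackjax.window_adaptation(blackjax.nuts, log_post)
(state, parameters), _ = warmup.run(rng_key, init_position, num_steps=1000)

# Sampling: run the tuned NUTS kernel for a further 1,000 steps
kernel = jax.jit(blackjax.nuts(log_post, **parameters).step)
positions = []
for _ in range(1000):
    rng_key, sample_key = jax.random.split(rng_key)
    state, _ = kernel(sample_key, state)
    positions.append(state.position)
```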

References

Andrieu, Christophe, and Johannes Thoms. 2008. “A Tutorial on Adaptive MCMC.” Statistics and Computing 18: 343–73.
Cabezas, Alberto, Adrien Corenflos, Junpeng Lao, Rémi Louf, Antoine Carnec, Kaustubh Chaudhari, Reuben Cohn-Gordon, et al. 2024. “Blackjax: Composable Bayesian Inference in JAX.” arXiv Preprint arXiv:2402.10797.
Roberts, Gareth O., and Jeffrey S. Rosenthal. 2001. “Optimal Scaling for Various Metropolis-Hastings Algorithms.” Statistical Science 16 (4): 351–67.