Skip to contents

This vignette demonstrates how to use the predict method for calculating k-step-ahead forecasts for the mean and variance, available for the Kalman filtering methods (lkf, ekf and ukf).

Note: These moment predictions assume that the underlying probability density of the SDE remains approximately Gaussian. This will hold for linear and mildly non-linear SDEs. In addition the assumption will be worse the longer the prediction horizon. In those cases the simulate method may be more useful (see the relevant vignette).

Introduction

Notation

  • We use subscript xt=x(t)x_{t} = x(t) to denote time for continuous variables.

  • We use subscript yk=y(tk)y_k = y(t_k) to indicate discrete time-points for discrete variables.

  • We denote the set of observations from the initial time t0t_0 until the “current” time tit_{i} by 𝒴i={yi,yi1,...,y1,y0} \mathcal{Y}_{i} = \left\{ y_{i}, y_{i-1},...,y_{1},y_{0} \right\}

  • We denote mean and variance by μt=E[xt]\mu_{t} = \mathrm{E}\left[x_t\right] and Pt=V[xt]P_{t} = \mathrm{V}\left[x_t\right].

  • We use μkk1\mu_{k \mid k-1} and Pkk1P_{k \mid k-1} to denote the mean/variance at time tkt_{k} conditional on the observations 𝒴k1\mathcal{Y}_{k-1} i.e. μkk=E[xtk𝒴k]Pkk=V[xtk𝒴k] \mu_{k \mid k} = \mathrm{E}\left[x_{t_k} \mid \mathcal{Y}_{k}\right] \qquad\qquad P_{k \mid k} = \mathrm{V}\left[x_{t_k} \mid \mathcal{Y}_{k}\right]

Stochastic State Space System

We consider the following types of stochastic state space systems

xt=f(t,xt,ut,θ)dt+G(t,xt,ut,θ)dBt x_{t} = f(t,x_t,u_t,\theta) \, \mathrm{d}t + G(t,x_t,u_t, \theta) \, \mathrm{d}B_{t}

yk=h(tk,xtk,utk,θ)+εk y_{k} = h(t_k, x_{t_k}, u_{t_k}, \theta) + \varepsilon_{k}

where the observation noise is zero-mean Gaussian εk𝒩(0,Σ(tk,xtk,utk,θ))\varepsilon_{k} \sim \mathcal{N}(0,\Sigma(t_k, x_{t_k}, u_{t_k}, \theta)). In this notation uu are inputs and θ\theta are fixed effects parameters to be estimated.

We refer to the functions ff as the drift, GG as the diffusion, and hh as the link. We may omit arguments and just write e.g. f(xt)f(x_t) for readability.

Observations and Likelihood

The likelihood, which is the joint density of all observations, can be rewritten by using repeated conditioning as

L(θ)=p(𝒴N)=k=1Np(yk𝒴k1)=k=1Np(h(xk)+εk𝒴k1) L(\theta) = p(\mathcal{Y}_{N}) = \prod_{k=1}^{N} p(y_{k} \mid \mathcal{Y}_{k-1}) = \prod_{k=1}^{N} p(h(x_{k}) + \varepsilon_{k} \mid \mathcal{Y}_{k-1})

What is a “Prediction”?

When we say predictions we mean forecasts of the mean and variance (first and second order moments) of the underlying SDE. The fact that these are used to represent the forecast distribution implies the underlying assumption of linearity of the SDE, such that distributions are always Gaussian.

We use the moment differential equations of the SDE to generate these forecasts.

Moment Differential Equations

The moment differential equations (MDEs) are ordinary differential equations that govern the evolution of the moments of xtx_{t}. The first two moments (mean and variance) are governed by

dμdt=E[f(xt)] \dfrac{\mathrm{d}\mu}{\mathrm{d}t} = \mathrm{E}\left[f(x_t)\right]

dPdt=E[f(xt)(xtE[xt])T]+E[(xtE[xt])f(xt)T]+E[G(xt)G(xt)T] \dfrac{\mathrm{d}P}{\mathrm{d}t} = \mathrm{E}\left[ f(x_{t})\left(x_t - \mathrm{E}\left[x_t\right]\right)^{T} \right] + \mathrm{E}\left[\left(x_t - \mathrm{E}\left[x_t\right]\right)f(x_{t})^{T} \right] + \mathrm{E}\left[ G(x_t) G(x_t)^{T} \right]

In the approximate Gaussian setting where, with respect to xx, ff is linear and GG is constant, the MDEs simplify to

dμdt=f(μt) \dfrac{\mathrm{d}\mu}{\mathrm{d}t} = f(\mu_t)

dPdt=[fx(μt)]P+P[fx(μt)]T+G(μt)[G(μt)]T \dfrac{\mathrm{d}P}{\mathrm{d}t} = \left[\dfrac{\partial f}{\partial x}(\mu_t)\right] P + P \left[\dfrac{\partial f}{\partial x}(\mu_t)\right]^{T} + G(\mu_t) \left[G(\mu_t)\right]^{T}

A k-step prediction is a prior estimate of the assumed Gaussian distribution k time-steps ahead, relatively to the current position. This distribution is parametrized by the prior mean μi+ki\mu_{i+k \mid i} and variance Pi+kiP_{i+k \mid i}. These priors are obtained by integrating the MDEs forward in time, starting from the current posterior distribution parametrized by μii\mu_{i \mid i} and PiiP_{i \mid i} i.e.

μi+k|i=μi|i+titi+kf(μt)dt \mu_{i+k|i} = \mu_{i|i} + \int_{t_{i}}^{t_{i+k}} f(\mu_{t}) \, dt

Pi+k|i=Pi|i+titi+k([fx(μt)]P+P[fx(μt)]T+G(μt)[G(μt)]T)dt P_{i+k|i} = P_{i|i} + \int_{t_{i}}^{t_{i+k}} \left( \left[\dfrac{\partial f}{\partial x}(\mu_t)\right] P + P \left[\dfrac{\partial f}{\partial x}(\mu_t)\right]^{T} + G(\mu_t) \left[G(\mu_t)\right]^{T} \right) \, dt

In a Kalman filter setting this amounts to repeatedly performing time-update steps while skipping the data-update step entirely.

Algorithm

When using the predict method the forecast horizon is controlled by the k.ahead argument, i.e. how many time-steps “into the future” forecasts are wanted for. The algorithm returns NN (N = nrow(data) - k.ahead) forecast scenarios. Each of these i=0,1,..,N1i=0,1,..,N-1 scenarios consist of kahead+1k_{\text{ahead}}+1 state values, one for each time point t=ti+jt=t_{i+j} where j=0,1,..,kaheadj=0,1,..,k_{\text{ahead}}.

The algorithm carries out the following step-wise procedure:

  • Filter with the provided data to obtain posterior state and variance estimates for every point in time in the provided data[,"t"] column.

  • Extract the first NN estimates and discard the remainder.

  • For each i=0,1,..,N1i=0,1,..,N-1 posteriors, integrate the MDEs forward from the initial time tit_{i} until time ti+kaheadt_{i+k_{\text{ahead}}} producing moment forecasts at each of the intermediate times ti+jt_{i+j}.

The output of predict is a list with the two entries states and observations. Each entry is a matrix with N(1+kahead)N \cdot (1+k_{\text{ahead}}) rows, obtained by row-binding each forecast scenario together.

Example

We consider the Ornstein Uhlenbeck SDE with time-varying mean and with identify link function:

dxt=θ(atxt)dt+σxdbt \mathrm{d}x_{t} = \theta (a_t - x_{t}) \, \mathrm{d}t \, + \sigma_{x} \, \mathrm{d}b_{t}

ytk=xtk+εtk y_{t_{k}} = x_{t_{k}} + \varepsilon_{t_{k}}

The mean is the “arbitrary” time-varying signal at=tut2cos(tut)a_t = tu_{t}^{2}-\cos(tu_{t}), where utu_{t} is a known input signal.

Create Model

First we create the model as follows:

## Load libraries
library(ctsmTMB)
library(ggplot2) ## plots
library(latex2exp) ## LaTeX plot fonts

## Create model
model <- newModel()
model$addSystem(dx ~ theta * (a*u^2-cos(2*pi*t*u) - x) * dt + sigma_x*dw)
model$addObs(y ~ x)
model$setVariance(y ~ sigma_y^2)
model$addInput(u)

## Set parameter values
## note: not strictly necessary to set lower/upper bounds
model$setParameter(
  theta   = c(initial = 2, lower = 0,    upper = 100),
  a       = c(initial = 1, lower = 1e-5, upper = 100),
  sigma_x = c(initial = 0.2, lower = 1e-5, upper = 5),
  ## fix sigma_y to 0.05 by not giving any upper/lower bounds
  sigma_y = c(initial = 5e-2)
)

## Set initial state mean and covariance
## note: diag(1) is not strictly needed
model$setInitialState(list(1, 1e-1*diag(1)))

Create Data using Simulate

Now we create data to perform predictions on using the simulate method, using the true parameters

θ=20a=1σx=1.00σy=0.05 \theta = 20 \qquad a = 1 \qquad \sigma_{x} = 1.00 \qquad \sigma_{y}=0.05 and the input signal

ut=tanh(sin(Bt)) u_{t} = \tanh\left(\sin\left(B_{t}\right)\right)

## set true parameters, and create data
true.pars <- c(theta=4, a=10, sigma_x=1, sigma_y=0.05)
dt.sim <- 1e-3
t.sim <- seq(0, 1, by=dt.sim)

## seed for input creation
set.seed(20)
u.sim <- cumsum(rnorm(length(t.sim),sd=0.1))
# u.sim <- tanh(sin(u.sim))
df.sim <- data.frame(t=t.sim, y=NA, u=u.sim)

## set rng seeds for C++ states and observations
cpp.seeds <- c(20,20)

## perform simulation
sim <- model$simulate(data=df.sim, 
                      pars=true.pars, 
                      n.sims=1,
                      silent=T,
                      cpp.seeds = cpp.seeds)

## extract simulations
y.sim <- sim$observations$y$i0[,1]

## extract observations as every 10th simulation
iobs <- seq(1, length(t.sim), by=10)
t.obs <- t.sim[iobs]
y.obs <- y.sim[iobs]
u.obs <- u.sim[iobs]

## create data
df.obs <- data.frame(
  t = t.obs,
  u = u.obs,
  y = y.obs
)

We show the simulated and observed inputs and observations in the plot below.

ggplot() + 
  geom_line(aes(x=t.sim, y=y.sim, color="y_t")) +
  geom_point(aes(x=t.obs, y=y.obs, color="y_k")) +
  geom_line(aes(x=t.sim, y=u.sim, color="u_t")) +
  geom_point(aes(x=t.obs, y=u.obs, color="u_k")) +
  scale_color_discrete(
    labels=unname(TeX(c("$y_{t}$","$y_{k}$","$u_{t}$","$u_{k}$"))),
    breaks = c("y_t", "y_k", "u_t", "u_k"),
  ) +
  ctsmTMB:::getggplot2theme() +
  theme(legend.text = ggplot2::element_text(size=15)) +
  labs(color="",x="Time",y="")

Predict

We can now try to predict on the generated data

## set the first observation to NA
df.temp <- df.obs
df.temp$y[1] <- NA

## predict
pred <- model$predict(data=df.temp,
                      pars=true.pars,
                      k.ahead=nrow(df.temp)-1)
## Checking and setting data...
## Predicting with C++...
## Returning results...
## Finished!

A few notes for the code above:

  • We use the default value of argument k.ahead = nrow(data)-1 to get a forecast over the entire time-vector. This means that there is only N=1N=1 forecast scenario, but its very long.

  • We remove the first observation to prevent a posterior update at the initial time-point t[1]. This allows us to fully control the initial state mean and variance via setInitialState.

  • Since k.ahead = nrow(data)-1 and N=1N=1 implies just one forecast scenario starting from the initial time point the remaining observations y[2:nrow(data)] are not used at all, and might as well be NA.

Output

The returned pred object is a list of two matrices named states and observations.

  1. The first five columns of each matrix are time indices and time-values for each of the forecast scenarios:

    • The column i holds the time-index from where the forecast began.
    • The column j holds the time-index for where the forecast is.
    • The column t.i is the numeric time value associated with index i.
    • The column t.j is the numeric time value associated with index j.
    • The column k.ahead is the number of forecasted timesteps since the last posterior update (if any data was available at that time-point).
  2. The next columns in both matrices are mean and covariances for the states and observations respectively, named appropriately.

  3. The final columns in the observations matrix also hold the corresponding observations in the data. The name is suffixed by .data e.g. y.data.

head(pred$states)
##      i. j. t.i  t.j k.ahead        x     var.x
## [1,]  0  0   0 0.00       0 1.000000 0.1000000
## [2,]  0  1   0 0.01       1 0.935318 0.1019221
## [3,]  0  2   0 0.02       2 0.909732 0.1036964
## [4,]  0  3   0 0.03       3 1.009002 0.1053343
## [5,]  0  4   0 0.04       4 1.226051 0.1068463
## [6,]  0  5   0 0.05       5 1.357398 0.1082420

head(pred$observations)
##   i. j. t.i  t.j k.ahead        y     var.y    y.data
## 1  0  0   0 0.00       0 1.000000 0.1025000        NA
## 2  0  1   0 0.01       1 0.935318 0.1044221 0.4999998
## 3  0  2   0 0.02       2 0.909732 0.1061964 0.7304777
## 4  0  3   0 0.03       3 1.009002 0.1078343 0.6074823
## 5  0  4   0 0.04       4 1.226051 0.1093463 0.8235081
## 6  0  5   0 0.05       5 1.357398 0.1107420 1.1861044

Long prediction horizons

We can perform long horizon estimates simply by setting the k.ahead argument very large. The default and maximum allowed value is k.ahead = nrow(.data)-1. Thus, by default, the predict method will generate one single prediction the length of the provided data.

This is useful for instance when searching for good initial guesses for the optimizer before calling the estimate method. Consider the example below where we calculate predictions for a series of differnet values of theta:

## create vectors with different theta's
pars <- lapply(c(1,2,4,8), function(x) c(x, 10, 1, 0.05))

## predict for the various theta values
pred1 = model$predict(df.obs, pars=pars[[1]])
pred2 = model$predict(df.obs, pars=pars[[2]])
pred3 = model$predict(df.obs, pars=pars[[3]])
pred4 = model$predict(df.obs, pars=pars[[4]])

Plotting these state estimates against the data we get a feeling that θ[10,50]\theta \in \left[10,50\right] (with θ=20\theta=20 being the truth here).

plot.df <- data.frame(t = pred$states[,"t.j"],
                      t.obs = t.obs,
                      y.obs = y.obs,
                      x1=pred1$states[,"x"],
                      x2=pred2$states[,"x"],
                      x3=pred3$states[,"x"],
                      x4=pred4$states[,"x"])
ggplot(data=plot.df) +
  geom_line(aes(x=t, y=x1, color="x1")) +
  geom_line(aes(x=t, y=x2, color="x2")) +
  geom_line(aes(x=t, y=x3, color="x3")) +
  geom_line(aes(x=t, y=x4, color="x4")) +
  geom_point(aes(x=t.obs, y=y.obs, color="y")) +
  scale_color_discrete(
    labels=unname(TeX(c("$\\theta=1$","$\\theta=10$","$\\theta=50","$\\theta=100$","$y_{k}$"))),
    breaks = c("x1", "x2", "x3", "x4", "y"),
  ) +
  ctsmTMB:::getggplot2theme() + 
  theme(legend.text = ggplot2::element_text(size=15)) +
  labs(color="",x="Time",y="")

Forecast Evaluation

We can evaluate the forecast performance of our model by comparing predictions against the observed data. We start by estimating the most likely parameters of the model:

fit = model$estimate(df.obs)
## Checking and setting data...
## Constructing objective function and derivative tables...
## Minimizing the negative log-likelihood...
##   0:     12263.953:  2.00000  1.00000 0.200000
##  10:    -48.649843:  3.63327  10.1890  1.17251
##  20:    -49.410101:  3.63423  10.2280  1.30571
##   Optimization finished!:
##             Elapsed time: 0.005 seconds.
##             The objective value is: -4.94101e+01
##             The maximum gradient component is: 3.6e-05
##             The convergence message is: relative convergence (4)
##             Iterations: 20
##             Evaluations: Fun: 23 Grad: 21
##             See stats::nlminb for available tolerance/control arguments.
## Returning results...
## Finished!

print(fit)
## Coefficent Matrix 
##         Estimate Std. Error t value  Pr(>|t|)    
## theta    3.63423    0.27084  13.418 < 2.2e-16 ***
## a       10.22801    0.56680  18.045 < 2.2e-16 ***
## sigma_x  1.30571    0.11602  11.255 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

and then predict over an appropriate forecast horizon. In this example we let that horizon be 25-steps:

pred.horizon <- 15
pred = model$predict(df.obs, k.ahead=pred.horizon)
## Checking and setting data...
## Predicting with C++...
## Returning results...
## Finished!

Let’s plot the pred.horizon-step predictions against the observations.

pred.obs = pred$states[pred$states[,"k.ahead"]==pred.horizon,]

The plot shows 25-step predictions with associated 95% prediction interval in grey, against the observations.

Lastly we calculate the mean prediction accuracy using an RMSE-score for all prediction horizons:

pdf <- data.frame(pred$states, y.data = pred$observations[,"y.data"])
rmse <- data.frame(
  k.ahead = 0:pred.horizon,
  rmse = sapply(
    with(pdf, split(pdf, k.ahead)), 
    function(df) {
      sqrt(mean((df$x - df$y.data)^2, na.rm = TRUE))
    })
)

The plot shows the Root Mean Square Error score for prediction horizons from 0 to 25.

Arguments


The predict method accepts the following arguments

model$predict(data,
              pars = NULL,
              method = "ekf",
              ode.solver = "rk4",
              ode.timestep = diff(data$t),
              k.ahead = nrow(data)-1,
              return.k.ahead = 0:min(k.ahead, nrow(data)-1),
              return.variance = c("marginal", "none", "covariance", "correlation"),
              ukf.hyperpars = c(1, 0, 3),
              initial.state = self$getInitialState(),
              estimate.initial.state = private$estimate.initial,
              use.cpp = TRUE,
              silent = FALSE,
              ...)

pars

This argument is a vector of parameter values, which are used to generate the predictions. The default behaviour is to use the parameters obtained from the latest call to estimate (if any) or alternative to use the initial guesses defined in setParameter.


use.cpp

Perform predictions in R-based code (use.cpp=FALSE) or C++-based code or C++ (use.cpp=TRUE, default).


method

See the description in the estimate vignette.


ode.solver

See the description in the estimate vignette.

Note: When the argument use.cpp=TRUE then the only solvers available are euler and rk4.


ode.timestep

See the description in the estimate vignette.


k.ahead

This integer argument determines the number of prediction steps desired.


return.k.ahead

This vector of integers determines which k-step predictions are returned. The default behaviour is to return all prediction steps (as determined by k.ahead).


return.variance

The four options are:

  1. ‘marginal’ - return marginal variances only.
  2. ‘none’ - return no dispersion-related quantities.
  3. ‘covariance’ - return full covariance matrix.
  4. ‘correlation’ - return full correlation matrix.

initial.state

Sets the initial state x̂00\hat{x}_{0 \mid 0} and variance P00P_{0 \mid 0}.


estimate.initial.state

A boolean to indicate whether or not estimate the initial state mean value instead of using the one provided via the model$setInitialState method or the initial.state argument to this method.

The estimation is carried out by root-finding the stationary mean equation equivalent to minimizing

x̂00=minx(f(t0,x,u0,θ))2 \hat{x}_{0 \mid 0} = \min_{x} \left(f(t_0, x, u_{0}, \theta)\right)^{2}

Note: This option is only available when use.cpp=FALSE using the pure R implementation.


silent

See the description in the estimate vignette.