The following public methods are used to construct a stochastic state space model system, consisting of a set of stochastic differential equations (SDEs), and one or more algebraic observation equations (AOEs). The AOEs are used to infer information about the value of the (latent) states governed by the SDEs, and thus must be functions of at least one state.
Value
The function returns an object of class R6
and ctsmTMB
,
which can be used to define a stochastic state space system.
Methods
Method .private()
Extract the private fields of a ctsmTMB model object. Primarily used for debugging.
Method addSystem()
Define stochastic differential equation(s) on the form
d<state> ~ f(t,<states>, <inputs>) * dt + g(t, <states>, <inputs>) * dw
Method addObs()
Define algebraic observation equations on the form
<observation> ~ h(t, <states>, <inputs>) + e)
where h
is the observation function, and e
is normally
distributed noise with zero mean.
This function only specifies the observation name, and its mean through h
.
Arguments
form
a formula specifying an observation equation
...
additional formulas similar to
form
for specifying multiple equations at once.obsnames
character vector specifying the name of the observation. This is used when the left-hand side of
form
consists of more than just a single variable (of class 'call').
Method setVariance()
Specify the variance of an observation equation.
A defined observation variable y
in e.g. addObs(y ~
h(t,<states>,<inputs>)
is perturbed by Gaussian noise with zero mean and
variance
to-be specified using setVariance(y ~ p(t,<states>,<inputs>)
.
We can for instance declare setVariance(y ~ sigma_x^2
where sigma_x
is a fixed effect parameter to be declared through
setParameter
.
Method addInput()
Declare variables as data inputs
Declare whether a variable contained in system, observation or observation
variance equations is an input variable. If e.g. the system equation contains
an input variable u
then it is declared using addInput(u)
.
The input u
must be contained in the data.frame .data
provided
when calling the estimate
or predict
methods.
Method setParameter()
Declare which variables that are (fixed effects) parameters in the specified model, and specify the initial optimizer guess, as well as lower / upper bounds during optimization. There are two ways to declare parameters:
You can declare parameters using formulas i.e.
setParameter( theta = c(1,0,10), mu = c(0,-10,10) )
. The first value is the initial value for the optimizer, the second value is the lower optimization bound and the third value is the upper optimization bound.You can provide a 3-column matrix where rows corresponds to different parameters, and the parameter names are provided as rownames of the matrix. The columns values corresponds to the description in the vector format above.
Method setAlgebraics()
Add algebraic relations.
Algebraic relations is a convenient way to transform parameters in your equations.
In the Ornstein-Uhlenbeck process the rate parameter theta
is always positive, so
estimation in the log-domain is a good idea. Instead of writing exp(theta)
directly
in the system equation one can transform into the log domain using the algebraic relation
setAlgebraics(theta ~ exp(logtheta))
. All instances of theta
is replaced
by exp(logtheta)
when compiling the C++ function. Note that you must provide values
for logtheta
now instead of theta
when declaring parameters through
setParameter
Method setInitialState()
Declare the initial state values i.e. mean and covariance for the system states.
Method setInitialVarianceScaling()
A scalar value that is multiplied onto the estimated initial state covariance matrix. The scaling is only applied when the initial state/cov is estimated, not when it is set by the user.
Method setLamperti()
Set a Lamperti Transformation
If the provided system equations have state dependent diffusion in a few available ways then it is advantageous to perform a transformation to remove the state dependence. This comes at the cost of a more complicated drift function. The following types of state-dependence is currently supported
'identity' - when the diffusion is state-independent (default)
'log' - when the diffusion is proportional to to x * dw
'logit' - when the diffusion is proportional to x * (1-x) * dw
'sqrt-logit' - when the diffusion is proportional to sqrt(x * (1-x)) * dw
Method setModelname()
Set modelname used to create the C++ file for TMB
When calling TMB::MakeADFun
the (negative log) likelihood function
is created in the directory specified by the setCppfilesDirectory
method with name <modelname>.cpp
Method setMAP()
Enable maximum a posterior (MAP) estimation.
Adds a maximum a posterior contribution to the (negative log) likelihood
function by evaluating the fixed effects parameters in a multivariate Gaussian
with mean
and covariance
as provided.
Method estimate()
Estimate the fixed effects parameters in the specified model.
Usage
ctsmTMB$estimate(
data,
method = "ekf",
ode.solver = "euler",
ode.timestep = diff(data$t),
loss = "quadratic",
loss_c = NULL,
control = list(trace = 1, iter.max = 1e+05, eval.max = 1e+05),
use.hessian = FALSE,
laplace.residuals = FALSE,
unconstrained.optim = FALSE,
estimate.initial.state = FALSE,
silent = FALSE
)
Arguments
data
data.frame containing time-vector 't', observations and inputs. The observations can take
NA
-values.method
character vector specifying the filtering method used for state/likelihood calculations. Must be one of either "lkf", "ekf", "laplace".
ode.solver
Sets the ODE solver used in the Kalman Filter methods for solving the moment differential equations. The default "euler" is the Forward Euler method, alternatively the classical 4th order Runge Kutta method is available via "rk4".
ode.timestep
numeric value. Sets the time step-size in numerical filtering schemes. The defined step-size is used to calculate the number of steps between observation time-points as defined by the provided
data
. If the calculated number of steps is larger than N.01 where N is an integer, then the time-step is reduced such that exactly N+1 steps is taken between observations The step-size is used in the two following ways depending on the chosen method:Kalman filters: The time-step is used as the step-size in the numerical Forward-Euler scheme to compute the prior state mean and covariance estimate as the final time solution to the first and second order moment differential equations.
TMB method: The time-step is used as the step-size in the Euler-Maruyama scheme for simulating a sample path of the stochastic differential equation, which serves to link together the latent (random effects) states.
loss
character vector. Sets the loss function type (only implemented for the kalman filter methods). The loss function is per default quadratic in the one-step residuals as is natural when the Gaussian (negative log) likelihood is evaluated, but if the tails of the distribution is considered too small i.e. outliers are weighted too much, then one can choose loss functions that accounts for this. The three available types available:
Quadratic loss (
quadratic
).Quadratic-Linear (
huber
)Quadratic-Constant (
tukey
)
The cutoff for the Huber and Tukey loss functions are determined from a provided cutoff parameter
loss_c
. The implementations of these losses are approximations (pseudo-huber and sigmoid approximation respectively) for smooth derivatives.loss_c
cutoff value for huber and tukey loss functions. Defaults to
c=3
control
list of control parameters parsed to
nlminb
as itscontrol
argument. See?stats::nlminb
for more informationuse.hessian
boolean value. The default (
TRUE
) causes the optimization algorithmstats::nlminb
to use the fixed effects hessian of the (negative log) likelihood when performing the optimization. This feature is only available for the kalman filter methods without any random effects.laplace.residuals
boolean - whether or not to calculate one-step ahead residuals using the method of oneStepPredict.
unconstrained.optim
boolean value. When TRUE then the optimization is carried out unconstrained i.e. without any of the parameter bounds specified during
setParameter
.estimate.initial.state
boolean value. When TRUE the initial state and covariance matrices are estimated as the stationary solution of the linearized mean and covariance differential equations. When the system contains time-varying inputs, the first element of these is used.
silent
logical value whether or not to suppress printed messages such as 'Checking Data', 'Building Model', etc. Default behaviour (FALSE) is to print the messages.
Method likelihood()
Construct and extract function handlers for the negative log likelihood function.
The handlers from TMB
's MakeADFun
are constructed and returned.
This enables the user to e.g. choose their own optimization algorithm, or just
have more control of the optimization workflow.
Usage
ctsmTMB$likelihood(
data,
method = "ekf",
ode.solver = "euler",
ode.timestep = diff(data$t),
loss = "quadratic",
loss_c = NULL,
estimate.initial.state = FALSE,
silent = FALSE
)
Arguments
data
a data.frame containing time-vector 't', observations and inputs. The observations can take
NA
-values.method
character vector specifying the filtering method used for state/likelihood calculations. Must be one of either "lkf", "ekf", "laplace".
ode.solver
Sets the ODE solver used in the Kalman Filter methods for solving the moment differential equations. The default "euler" is the Forward Euler method, alternatively the classical 4th order Runge Kutta method is available via "rk4".
ode.timestep
the time-step used in the filtering schemes. The time-step has two different uses depending on the chosen method.
Kalman Filters: The time-step is used when numerically solving the moment differential equations.
Laplace Approximation: The time-step is used in the Euler-Maruyama simulation scheme for simulating a sample path of the stochastic differential equation, which serves to link together the latent (random effects) states.
The defined step-size is used to calculate the number of steps between observation time-points as defined by the provided
data
. If the calculated number of steps is larger than N.01 where N is an integer, then the time-step is reduced such that exactly N+1 steps is taken between observations The step-size is used in the two following ways depending on the chosen method:Kalman filters: The time-step is used as the step-size in the numerical Forward-Euler scheme to compute the prior state mean and covariance estimate as the final time solution to the first and second order moment differential equations.
TMB method: The time-step is used as the step-size in the Euler-Maruyama scheme for simulating a sample path of the stochastic differential equation, which serves to link together the latent (random effects) states.
loss
character vector. Sets the loss function type (only implemented for the kalman filter methods). The loss function is per default quadratic in the one-step residuals as is natural when the Gaussian (negative log) likelihood is evaluated, but if the tails of the distribution is considered too small i.e. outliers are weighted too much, then one can choose loss functions that accounts for this. The three available types available:
Quadratic loss (
quadratic
).Quadratic-Linear (
huber
)Quadratic-Constant (
tukey
)
The cutoff for the Huber and Tukey loss functions are determined from a provided cutoff parameter
loss_c
. The implementations of these losses are approximations (pseudo-huber and sigmoid approximation respectively) for smooth derivatives.loss_c
cutoff value for huber and tukey loss functions. Defaults to
c=3
estimate.initial.state
boolean value. When TRUE the initial state and covariance matrices are estimated as the stationary solution of the linearized mean and covariance differential equations. When the system contains time-varying inputs, the first element of these is used.
silent
logical value whether or not to suppress printed messages such as 'Checking Data', 'Building Model', etc. Default behaviour (FALSE) is to print the messages.
Method predict()
Perform prediction/filtration to obtain state mean and covariance estimates. The predictions are
obtained by solving the moment equations n.ahead
steps forward in time when using the current step posterior
state estimate as the initial condition.
Usage
ctsmTMB$predict(
data,
pars = NULL,
method = "ekf",
ode.solver = "euler",
ode.timestep = diff(data$t),
k.ahead = nrow(data) - 1,
return.k.ahead = 0:k.ahead,
return.covariance = TRUE,
initial.state = self$getInitialState(),
estimate.initial.state = private$estimate.initial,
silent = FALSE,
use.cpp = FALSE
)
Arguments
data
data.frame containing time-vector 't', observations and inputs. The observations can take
NA
-values.pars
fixed parameter vector parsed to the objective function for prediction/filtration. The default parameter values used are the initial parameters provided through
setParameter
, unless theestimate
function has been run, then the default values will be those at the found optimum.method
The prediction method
ode.solver
Sets the ODE solver used in the Kalman Filter methods for solving the moment differential equations. The default "euler" is the Forward Euler method, alternatively the classical 4th order Runge Kutta method is available via "rk4".
ode.timestep
numeric value. Sets the time step-size in numerical filtering schemes. The defined step-size is used to calculate the number of steps between observation time-points as defined by the provided
data
. If the calculated number of steps is larger than N.01 where N is an integer, then the time-step is reduced such that exactly N+1 steps is taken between observations The step-size is used in the two following ways depending on the chosen method:Kalman filters: The time-step is used as the step-size in the numerical Forward-Euler scheme to compute the prior state mean and covariance estimate as the final time solution to the first and second order moment differential equations.
TMB method: The time-step is used as the step-size in the Euler-Maruyama scheme for simulating a sample path of the stochastic differential equation, which serves to link together the latent (random effects) states.
k.ahead
integer specifying the desired number of time-steps (as determined by the provided data time-vector) for which predictions are made (integrating the moment ODEs forward in time without data updates).
return.k.ahead
numeric vector of integers specifying which n.ahead predictions to that should be returned.
return.covariance
boolean value to indicate whether the covariance (instead of the correlation) should be returned.
initial.state
a named list of two entries 'x0' and 'p0' containing the initial state and covariance of the state
estimate.initial.state
bool - stationary estimation of initial mean and covariance
silent
logical value whether or not to suppress printed messages such as 'Checking Data', 'Building Model', etc. Default behaviour (FALSE) is to print the messages.
use.cpp
a boolean to indicate whether to use C++ to perform calculations
Returns
A data.frame that contains for each time step the posterior state estimate at that time.step (k = 0
), and the
prior state predictions (k = 1,...,n.ahead
). If return.covariance = TRUE
then the state covariance/correlation
matrix is returned, otherwise only the marginal variances are returned.
Method simulate()
Perform prediction/filtration to obtain state mean and covariance estimates. The predictions are
obtained by solving the moment equations n.ahead
steps forward in time when using the current step posterior
state estimate as the initial condition.
Usage
ctsmTMB$simulate(
data,
pars = NULL,
use.cpp = FALSE,
method = "ekf",
ode.solver = "rk4",
ode.timestep = diff(data$t),
simulation.timestep = diff(data$t),
k.ahead = nrow(data) - 1,
return.k.ahead = 0:k.ahead,
n.sims = 100,
initial.state = self$getInitialState(),
estimate.initial.state = private$estimate.initial,
silent = FALSE
)
Arguments
data
data.frame containing time-vector 't', observations and inputs. The observations can take
NA
-values.pars
fixed parameter vector parsed to the objective function for prediction/filtration. The default parameter values used are the initial parameters provided through
setParameter
, unless theestimate
function has been run, then the default values will be those at the found optimum.use.cpp
a boolean to indicate whether to use C++ to perform calculations
method
The natural TMB-style formulation where latent states are considered random effects and are integrated out using the Laplace approximation. This method only yields the gradient of the (negative log) likelihood function with respect to the fixed effects for optimization. The method is slower although probably has some precision advantages, and allows for non-Gaussian observation noise (not yet implemented). One-step / K-step residuals are not yet available in the package.
(Continuous-Discrete) Extended Kalman Filter where the system dynamics are linearized to handle potential non-linearities. This is computationally the fastest method.
(Continuous-Discrete) Unscented Kalman Filter. This is a higher order non-linear Kalman Filter which improves the mean and covariance estimates when the system display high nonlinearity, and circumvents the necessity to compute the Jacobian of the drift and observation functions.
All package features are currently available for the kalman filters, while TMB is limited to parameter estimation. In particular, it is straight-forward to obtain k-step-ahead predictions with these methods (use the
predict
S3 method), and stochastic simulation is also available in the cases where long prediction horizons are sought, where the normality assumption will be inaccurateode.solver
Sets the ODE solver used in the Kalman Filter methods for solving the moment differential equations. The default "euler" is the Forward Euler method, alternatively the classical 4th order Runge Kutta method is available via "rk4".
ode.timestep
numeric value. Sets the time step-size in numerical filtering schemes. The defined step-size is used to calculate the number of steps between observation time-points as defined by the provided
data
. If the calculated number of steps is larger than N.01 where N is an integer, then the time-step is reduced such that exactly N+1 steps is taken between observations The step-size is used in the two following ways depending on the chosen method:Kalman filters: The time-step is used as the step-size in the numerical Forward-Euler scheme to compute the prior state mean and covariance estimate as the final time solution to the first and second order moment differential equations.
TMB method: The time-step is used as the step-size in the Euler-Maruyama scheme for simulating a sample path of the stochastic differential equation, which serves to link together the latent (random effects) states.
simulation.timestep
timestep used in the euler-maruyama scheme
k.ahead
integer specifying the desired number of time-steps (as determined by the provided data time-vector) for which predictions are made (integrating the moment ODEs forward in time without data updates).
return.k.ahead
numeric vector of integers specifying which n.ahead predictions to that should be returned.
n.sims
number of simulations
initial.state
a named list of two entries 'x0' and 'p0' containing the initial state and covariance of the state
estimate.initial.state
bool - stationary estimation of initial mean and covariance
silent
logical value whether or not to suppress printed messages such as 'Checking Data', 'Building Model', etc. Default behaviour (FALSE) is to print the messages.
return.covariance
boolean value to indicate whether the covariance (instead of the correlation) should be returned.
Returns
A data.frame that contains for each time step the posterior state estimate at that time.step (k = 0
), and the
prior state predictions (k = 1,...,n.ahead
). If return.covariance = TRUE
then the state covariance/correlation
matrix is returned, otherwise only the marginal variances are returned.
Examples
library(ctsmTMB)
model <- ctsmTMB$new()
# adding a single system equations
model$addSystem(dx ~ theta * (mu+u-x) * dt + sigma_x*dw)
# adding an observation equation and setting variance
model$addObs(y ~ x)
model$setVariance(y ~ sigma_y^2)
# add model input
model$addInput(u)
# add parameters
model$setParameter(
theta = c(initial = 1, lower=1e-5, upper=50),
mu = c(initial=1.5, lower=0, upper=5),
sigma_x = c(initial=1, lower=1e-10, upper=30),
sigma_y = 1e-2
)
# set the model initial state
model$setInitialState(list(1,1e-1))
# extract the likelihood handlers
nll <- model$likelihood(data=Ornstein)
#> Building model...
#> Checking data...
#> Constructing objective function...
#> ...took: 0.093 seconds.
#> Succesfully returned function handlers
# calculate likelihood, gradient and hessian w.r.t parameters
nll$fn(nll$par)
#> [1] 159.3085
nll$gr(nll$par)
#> [,1] [,2] [,3]
#> [1,] 30.81298 -38.65746 -170.1326
nll$he(nll$par)
#> [,1] [,2] [,3]
#> [1,] 46.88374 -66.66397 -61.73576
#> [2,] -66.66397 19.99982 77.31402
#> [3,] -61.73576 77.31402 908.18058
# estimate the parameters using an extended kalman filter
fit <- model$estimate(data=Ornstein)
#> Checking data...
#> Minimizing the negative log-likelihood...
#> 0: 159.30847: 1.00000 1.50000 1.00000
#> 1: 154.15990: 0.826082 1.71820 1.96028
#> 2: 136.01211: 0.614019 2.04466 1.64652
#> 3: 129.29879: 0.625113 2.06221 1.47868
#> 4: 128.42258: 0.666452 2.12632 1.14915
#> 5: 122.27008: 0.825282 2.37200 1.31892
#> 6: 106.61489: 1.59116 3.48574 1.25982
#> 7: 101.72505: 2.18624 3.49501 1.18917
#> 8: 99.610055: 2.43782 3.00283 0.957522
#> 9: 91.160123: 3.00760 2.93833 1.13182
#> 10: 87.333934: 3.60427 2.95994 1.07975
#> 11: 86.392347: 3.62036 3.05856 1.06026
#> 12: 85.804273: 4.02690 3.03800 1.04788
#> 13: 85.776558: 4.04412 3.02977 1.06498
#> 14: 85.771776: 4.04182 3.03209 1.06035
#> 15: 85.771772: 4.04188 3.03208 1.06018
#> 16: 85.771772: 4.04186 3.03210 1.06019
#> 17: 85.771772: 4.04186 3.03210 1.06019
#> Optimization finished!:
#> Elapsed time: 0.007 seconds.
#> The objective value is: 8.577177e+01
#> The maximum gradient component is: 2.7e-05
#> The convergence message is: relative convergence (4)
#> Iterations: 17
#> Evaluations: Fun: 25 Grad: 18
#> See stats::nlminb for available tolerance/control arguments.
#> Returning results...
#> Finished!
# perform moment predictions
pred <- model$predict(data=Ornstein)
#> Checking data...
#> Predicting with R...
#> Returning results...
#> Finished!
# perform stochatic simulations
sim <- model$simulate(data=Ornstein, n.sims=10)
#> Checking data...
#> Simulating with R...
#> Constructing return data.frame...
#> Finished.