Methods for the 'ctsmTMB' R6 class

The following public methods are used to construct a stochastic state space model system, consisting of a set of stochastic differential equations (SDEs), and one or more algebraic observation equations (AOEs). The AOEs are used to infer information about the value of the (latent) states governed by the SDEs, and thus must be functions of at least one state.

Value

The function returns an object of class R6 and ctsmTMB, which can be used to define a stochastic state space system.

Methods

Method `new()`

Initialize private fields

Usage

ctsmTMB$new()

Method `.private()`

Extract the private fields of a ctsmTMB model object. Primarily used for debugging.

Usage

ctsmTMB$.private()

Method `addSystem()`

Define stochastic differential equation(s) on the form

d<state> ~ f(t,<states>, <inputs>) * dt + g(t, <states>, <inputs>) * dw

Usage

ctsmTMB$addSystem(form, ...)

Arguments

form: a formula specifying a stochastic differential equation
...: additional formulas similar to form for specifying multiple equations at once.

Method `addObs()`

Define algebraic observation equations on the form

<observation> ~ h(t, <states>, <inputs>) + e)

where h is the observation function, and e is normally distributed noise with zero mean.

This function only specifies the observation name, and its mean through h.

Usage

ctsmTMB$addObs(form, ..., obsnames = NULL)

Arguments

form: a formula specifying an observation equation
...: additional formulas similar to form for specifying multiple equations at once.
obsnames: character vector specifying the name of the observation. This is used when the left-hand side of form consists of more than just a single variable (of class 'call').

Method `setVariance()`

Specify the variance of an observation equation.

A defined observation variable y in e.g. addObs(y ~ h(t,<states>,<inputs>) is perturbed by Gaussian noise with zero mean and variance to-be specified using setVariance(y ~ p(t,<states>,<inputs>). We can for instance declare setVariance(y ~ sigma_x^2 where sigma_x is a fixed effect parameter to be declared through setParameter.

Usage

ctsmTMB$setVariance(form, ...)

Arguments

form: formula class specifying the observation equation to be added to the system.
...: additional formulas identical to form to specify multiple observation equations at a time.

Method `addInput()`

Declare variables as data inputs

Declare whether a variable contained in system, observation or observation variance equations is an input variable. If e.g. the system equation contains an input variable u then it is declared using addInput(u). The input u must be contained in the data.frame .data provided when calling the estimate or predict methods.

Usage

ctsmTMB$addInput(...)

Arguments

...: variable names that specifies the name of input variables in the defined system.

Method `setParameter()`

Declare which variables that are (fixed effects) parameters in the specified model, and specify the initial optimizer guess, as well as lower / upper bounds during optimization. There are two ways to declare parameters:

You can declare parameters using formulas i.e. setParameter( theta = c(1,0,10), mu = c(0,-10,10) ). The first value is the initial value for the optimizer, the second value is the lower optimization bound and the third value is the upper optimization bound.
You can provide a 3-column matrix where rows corresponds to different parameters, and the parameter names are provided as rownames of the matrix. The columns values corresponds to the description in the vector format above.

Usage

ctsmTMB$setParameter(...)

Arguments

...: a named vector or matrix as described above.

Method `setAlgebraics()`

Add algebraic relations.

Algebraic relations is a convenient way to transform parameters in your equations. In the Ornstein-Uhlenbeck process the rate parameter theta is always positive, so estimation in the log-domain is a good idea. Instead of writing exp(theta) directly in the system equation one can transform into the log domain using the algebraic relation setAlgebraics(theta ~ exp(logtheta)). All instances of theta is replaced by exp(logtheta) when compiling the C++ function. Note that you must provide values for logtheta now instead of theta when declaring parameters through setParameter

Usage

ctsmTMB$setAlgebraics(form, ...)

Arguments

form: algebraic formula
...: additional formulas

Method `setInitialState()`

Declare the initial state values i.e. mean and covariance for the system states.

Usage

ctsmTMB$setInitialState(initial.state)

Arguments

initial.state: a named list of two entries 'x0' and 'p0' containing the initial state and covariance of the state.

Method `setInitialVarianceScaling()`

A scalar value that is multiplied onto the estimated initial state covariance matrix. The scaling is only applied when the initial state/cov is estimated, not when it is set by the user.

Usage

ctsmTMB$setInitialVarianceScaling(scaling)

Arguments

scaling: a numeric scalar value.

Method `setLamperti()`

Set a Lamperti Transformation

If the provided system equations have state dependent diffusion in a few available ways then it is advantageous to perform a transformation to remove the state dependence. This comes at the cost of a more complicated drift function. The following types of state-dependence is currently supported

'identity' - when the diffusion is state-independent (default)
'log' - when the diffusion is proportional to to x * dw
'logit' - when the diffusion is proportional to x * (1-x) * dw
'sqrt-logit' - when the diffusion is proportional to sqrt(x * (1-x)) * dw

Usage

ctsmTMB$setLamperti(transforms, states = NULL)

Arguments

transforms: character vector - one of either "identity, "log", "logit", "sqrt-logit"
states: a vector of the state names for which the specified transformations should be applied to.

Method `setModelname()`

Set modelname used to create the C++ file for TMB

When calling TMB::MakeADFun the (negative log) likelihood function is created in the directory specified by the setCppfilesDirectory method with name <modelname>.cpp

Usage

ctsmTMB$setModelname(name)

Arguments

name: string defining the model name.

Method `setMAP()`

Enable maximum a posterior (MAP) estimation.

Adds a maximum a posterior contribution to the (negative log) likelihood function by evaluating the fixed effects parameters in a multivariate Gaussian with mean and covariance as provided.

Usage

ctsmTMB$setMAP(mean, cov)

Arguments

mean: mean vector of the Gaussian prior parameter distribution
cov: covariance matrix of the Gaussian prior parameter distribution

Method `getSystems()`

Retrieve system equations.

Usage

ctsmTMB$getSystems()

Method `getObservations()`

Retrieve observation equations.

Usage

ctsmTMB$getObservations()

Method `getVariances()`

Retrieve observation variances

Usage

ctsmTMB$getVariances()

Method `getAlgebraics()`

Retrieve algebraic relations

Usage

ctsmTMB$getAlgebraics()

Method `getInitialState()`

Retrieve initially set state and covariance

Usage

ctsmTMB$getInitialState()

Method `getParameters()`

Get initial (and estimated) parameters.

Usage

ctsmTMB$getParameters(type = "all", value = "all")

Arguments

type: one of "all", free" or "fixed" parameters.
value: one of "all", initial", "estimate", "lower" or "upper"

Method `getEstimate()`

Retrieve initially set state and covariance

Usage

ctsmTMB$getEstimate()

Method `getLikelihood()`

Retrieve initially set state and covariance

Usage

ctsmTMB$getLikelihood()

Method `getPrediction()`

Retrieve initially set state and covariance

Usage

ctsmTMB$getPrediction()

Method `getSimulation()`

Retrieve initially set state and covariance

Usage

ctsmTMB$getSimulation()

Method `estimate()`

Estimate the fixed effects parameters in the specified model.

Usage

ctsmTMB$estimate(
  data,
  method = "ekf",
  ode.solver = "euler",
  ode.timestep = diff(data$t),
  loss = "quadratic",
  loss_c = NULL,
  control = list(trace = 1, iter.max = 1e+05, eval.max = 1e+05),
  use.hessian = FALSE,
  laplace.residuals = FALSE,
  unconstrained.optim = FALSE,
  estimate.initial.state = FALSE,
  silent = FALSE
)

Arguments

data

data.frame containing time-vector 't', observations and inputs. The observations can take NA-values.

method

character vector specifying the filtering method used for state/likelihood calculations. Must be one of either "lkf", "ekf", "laplace".

ode.solver

Sets the ODE solver used in the Kalman Filter methods for solving the moment differential equations. The default "euler" is the Forward Euler method, alternatively the classical 4th order Runge Kutta method is available via "rk4".

ode.timestep

numeric value. Sets the time step-size in numerical filtering schemes. The defined step-size is used to calculate the number of steps between observation time-points as defined by the provided data. If the calculated number of steps is larger than N.01 where N is an integer, then the time-step is reduced such that exactly N+1 steps is taken between observations The step-size is used in the two following ways depending on the chosen method:

Kalman filters: The time-step is used as the step-size in the numerical Forward-Euler scheme to compute the prior state mean and covariance estimate as the final time solution to the first and second order moment differential equations.
TMB method: The time-step is used as the step-size in the Euler-Maruyama scheme for simulating a sample path of the stochastic differential equation, which serves to link together the latent (random effects) states.

loss

character vector. Sets the loss function type (only implemented for the kalman filter methods). The loss function is per default quadratic in the one-step residuals as is natural when the Gaussian (negative log) likelihood is evaluated, but if the tails of the distribution is considered too small i.e. outliers are weighted too much, then one can choose loss functions that accounts for this. The three available types available:

Quadratic loss (quadratic).
Quadratic-Linear (huber)
Quadratic-Constant (tukey)

The cutoff for the Huber and Tukey loss functions are determined from a provided cutoff parameter loss_c. The implementations of these losses are approximations (pseudo-huber and sigmoid approximation respectively) for smooth derivatives.

loss_c

cutoff value for huber and tukey loss functions. Defaults to c=3

control

list of control parameters parsed to nlminb as its control argument. See ?stats::nlminb for more information

use.hessian

boolean value. The default (TRUE) causes the optimization algorithm stats::nlminb to use the fixed effects hessian of the (negative log) likelihood when performing the optimization. This feature is only available for the kalman filter methods without any random effects.

laplace.residuals

boolean - whether or not to calculate one-step ahead residuals using the method of oneStepPredict.

unconstrained.optim

boolean value. When TRUE then the optimization is carried out unconstrained i.e. without any of the parameter bounds specified during setParameter.

estimate.initial.state

boolean value. When TRUE the initial state and covariance matrices are estimated as the stationary solution of the linearized mean and covariance differential equations. When the system contains time-varying inputs, the first element of these is used.

silent

logical value whether or not to suppress printed messages such as 'Checking Data', 'Building Model', etc. Default behaviour (FALSE) is to print the messages.

Method `likelihood()`

Construct and extract function handlers for the negative log likelihood function.

The handlers from TMB's MakeADFun are constructed and returned. This enables the user to e.g. choose their own optimization algorithm, or just have more control of the optimization workflow.

Usage

ctsmTMB$likelihood(
  data,
  method = "ekf",
  ode.solver = "euler",
  ode.timestep = diff(data$t),
  loss = "quadratic",
  loss_c = NULL,
  estimate.initial.state = FALSE,
  silent = FALSE
)

Arguments

data

a data.frame containing time-vector 't', observations and inputs. The observations can take NA-values.

method

character vector specifying the filtering method used for state/likelihood calculations. Must be one of either "lkf", "ekf", "laplace".

ode.solver

ode.timestep

the time-step used in the filtering schemes. The time-step has two different uses depending on the chosen method.

Kalman Filters: The time-step is used when numerically solving the moment differential equations.
Laplace Approximation: The time-step is used in the Euler-Maruyama simulation scheme for simulating a sample path of the stochastic differential equation, which serves to link together the latent (random effects) states.

The defined step-size is used to calculate the number of steps between observation time-points as defined by the provided data. If the calculated number of steps is larger than N.01 where N is an integer, then the time-step is reduced such that exactly N+1 steps is taken between observations The step-size is used in the two following ways depending on the chosen method:

Kalman filters: The time-step is used as the step-size in the numerical Forward-Euler scheme to compute the prior state mean and covariance estimate as the final time solution to the first and second order moment differential equations.
TMB method: The time-step is used as the step-size in the Euler-Maruyama scheme for simulating a sample path of the stochastic differential equation, which serves to link together the latent (random effects) states.

loss

Quadratic loss (quadratic).
Quadratic-Linear (huber)
Quadratic-Constant (tukey)

loss_c

cutoff value for huber and tukey loss functions. Defaults to c=3

estimate.initial.state

silent

logical value whether or not to suppress printed messages such as 'Checking Data', 'Building Model', etc. Default behaviour (FALSE) is to print the messages.

Method `predict()`

Perform prediction/filtration to obtain state mean and covariance estimates. The predictions are obtained by solving the moment equations n.ahead steps forward in time when using the current step posterior state estimate as the initial condition.

Usage

ctsmTMB$predict(
  data,
  pars = NULL,
  method = "ekf",
  ode.solver = "euler",
  ode.timestep = diff(data$t),
  k.ahead = nrow(data) - 1,
  return.k.ahead = 0:k.ahead,
  return.covariance = TRUE,
  initial.state = self$getInitialState(),
  estimate.initial.state = private$estimate.initial,
  silent = FALSE,
  use.cpp = FALSE
)

Arguments

data

data.frame containing time-vector 't', observations and inputs. The observations can take NA-values.

pars

fixed parameter vector parsed to the objective function for prediction/filtration. The default parameter values used are the initial parameters provided through setParameter, unless the estimate function has been run, then the default values will be those at the found optimum.

method

The prediction method

ode.solver

ode.timestep

Kalman filters: The time-step is used as the step-size in the numerical Forward-Euler scheme to compute the prior state mean and covariance estimate as the final time solution to the first and second order moment differential equations.
TMB method: The time-step is used as the step-size in the Euler-Maruyama scheme for simulating a sample path of the stochastic differential equation, which serves to link together the latent (random effects) states.

k.ahead

integer specifying the desired number of time-steps (as determined by the provided data time-vector) for which predictions are made (integrating the moment ODEs forward in time without data updates).

return.k.ahead

numeric vector of integers specifying which n.ahead predictions to that should be returned.

return.covariance

boolean value to indicate whether the covariance (instead of the correlation) should be returned.

initial.state

a named list of two entries 'x0' and 'p0' containing the initial state and covariance of the state

estimate.initial.state

bool - stationary estimation of initial mean and covariance

silent

logical value whether or not to suppress printed messages such as 'Checking Data', 'Building Model', etc. Default behaviour (FALSE) is to print the messages.

use.cpp

a boolean to indicate whether to use C++ to perform calculations

Returns

A data.frame that contains for each time step the posterior state estimate at that time.step (k = 0), and the prior state predictions (k = 1,...,n.ahead). If return.covariance = TRUE then the state covariance/correlation matrix is returned, otherwise only the marginal variances are returned.

Method `simulate()`

Usage

ctsmTMB$simulate(
  data,
  pars = NULL,
  use.cpp = FALSE,
  method = "ekf",
  ode.solver = "rk4",
  ode.timestep = diff(data$t),
  simulation.timestep = diff(data$t),
  k.ahead = nrow(data) - 1,
  return.k.ahead = 0:k.ahead,
  n.sims = 100,
  initial.state = self$getInitialState(),
  estimate.initial.state = private$estimate.initial,
  silent = FALSE
)

Arguments

data

data.frame containing time-vector 't', observations and inputs. The observations can take NA-values.

pars

use.cpp

a boolean to indicate whether to use C++ to perform calculations

method

The natural TMB-style formulation where latent states are considered random effects and are integrated out using the Laplace approximation. This method only yields the gradient of the (negative log) likelihood function with respect to the fixed effects for optimization. The method is slower although probably has some precision advantages, and allows for non-Gaussian observation noise (not yet implemented). One-step / K-step residuals are not yet available in the package.
(Continuous-Discrete) Extended Kalman Filter where the system dynamics are linearized to handle potential non-linearities. This is computationally the fastest method.
(Continuous-Discrete) Unscented Kalman Filter. This is a higher order non-linear Kalman Filter which improves the mean and covariance estimates when the system display high nonlinearity, and circumvents the necessity to compute the Jacobian of the drift and observation functions.

All package features are currently available for the kalman filters, while TMB is limited to parameter estimation. In particular, it is straight-forward to obtain k-step-ahead predictions with these methods (use the predict S3 method), and stochastic simulation is also available in the cases where long prediction horizons are sought, where the normality assumption will be inaccurate

ode.solver

ode.timestep

Kalman filters: The time-step is used as the step-size in the numerical Forward-Euler scheme to compute the prior state mean and covariance estimate as the final time solution to the first and second order moment differential equations.
TMB method: The time-step is used as the step-size in the Euler-Maruyama scheme for simulating a sample path of the stochastic differential equation, which serves to link together the latent (random effects) states.

simulation.timestep

timestep used in the euler-maruyama scheme

k.ahead

integer specifying the desired number of time-steps (as determined by the provided data time-vector) for which predictions are made (integrating the moment ODEs forward in time without data updates).

return.k.ahead

numeric vector of integers specifying which n.ahead predictions to that should be returned.

n.sims

number of simulations

initial.state

a named list of two entries 'x0' and 'p0' containing the initial state and covariance of the state

estimate.initial.state

bool - stationary estimation of initial mean and covariance

silent

logical value whether or not to suppress printed messages such as 'Checking Data', 'Building Model', etc. Default behaviour (FALSE) is to print the messages.

return.covariance

boolean value to indicate whether the covariance (instead of the correlation) should be returned.

Returns

Method `print()`

Function to print the model object

Usage

ctsmTMB$print()

Method `clone()`

The objects of this class are cloneable with this method.

Usage

ctsmTMB$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples

library(ctsmTMB)
model <- ctsmTMB$new()

# adding a single system equations
model$addSystem(dx ~ theta * (mu+u-x) * dt + sigma_x*dw)

# adding an observation equation and setting variance
model$addObs(y ~ x)
model$setVariance(y ~ sigma_y^2)

# add model input
model$addInput(u)

# add parameters
model$setParameter(
  theta   = c(initial = 1, lower=1e-5, upper=50),
  mu      = c(initial=1.5, lower=0, upper=5),
  sigma_x = c(initial=1, lower=1e-10, upper=30),
  sigma_y = 1e-2
)

# set the model initial state
model$setInitialState(list(1,1e-1))

# extract the likelihood handlers
nll <- model$likelihood(data=Ornstein)
#> Building model...
#> Checking data...
#> Constructing objective function...
#> ...took: 0.093 seconds.
#> Succesfully returned function handlers

# calculate likelihood, gradient and hessian w.r.t parameters
nll$fn(nll$par)
#> [1] 159.3085
nll$gr(nll$par)
#>          [,1]      [,2]      [,3]
#> [1,] 30.81298 -38.65746 -170.1326
nll$he(nll$par)
#>           [,1]      [,2]      [,3]
#> [1,]  46.88374 -66.66397 -61.73576
#> [2,] -66.66397  19.99982  77.31402
#> [3,] -61.73576  77.31402 908.18058

# estimate the parameters using an extended kalman filter
fit <- model$estimate(data=Ornstein)
#> Checking data...
#> Minimizing the negative log-likelihood...
#>   0:     159.30847:  1.00000  1.50000  1.00000
#>   1:     154.15990: 0.826082  1.71820  1.96028
#>   2:     136.01211: 0.614019  2.04466  1.64652
#>   3:     129.29879: 0.625113  2.06221  1.47868
#>   4:     128.42258: 0.666452  2.12632  1.14915
#>   5:     122.27008: 0.825282  2.37200  1.31892
#>   6:     106.61489:  1.59116  3.48574  1.25982
#>   7:     101.72505:  2.18624  3.49501  1.18917
#>   8:     99.610055:  2.43782  3.00283 0.957522
#>   9:     91.160123:  3.00760  2.93833  1.13182
#>  10:     87.333934:  3.60427  2.95994  1.07975
#>  11:     86.392347:  3.62036  3.05856  1.06026
#>  12:     85.804273:  4.02690  3.03800  1.04788
#>  13:     85.776558:  4.04412  3.02977  1.06498
#>  14:     85.771776:  4.04182  3.03209  1.06035
#>  15:     85.771772:  4.04188  3.03208  1.06018
#>  16:     85.771772:  4.04186  3.03210  1.06019
#>  17:     85.771772:  4.04186  3.03210  1.06019
#> 	 Optimization finished!:
#>             Elapsed time: 0.007 seconds.
#>             The objective value is: 8.577177e+01
#>             The maximum gradient component is: 2.7e-05
#>             The convergence message is: relative convergence (4)
#>             Iterations: 17
#>             Evaluations: Fun: 25 Grad: 18
#>             See stats::nlminb for available tolerance/control arguments.
#> Returning results...
#> Finished!

# perform moment predictions
pred <- model$predict(data=Ornstein)
#> Checking data...
#> Predicting with R...
#> Returning results...
#> Finished!

# perform stochatic simulations
sim <- model$simulate(data=Ornstein, n.sims=10)
#> Checking data...
#> Simulating with R...
#> Constructing return data.frame...
#> Finished.

Methods for the 'ctsmTMB' R6 class

Value

Methods

Public methods

Method new()

Usage

Method .private()

Usage

Method addSystem()

Usage

Arguments

Method addObs()

Usage

Arguments

Method setVariance()

Usage

Arguments

Method addInput()

Usage

Arguments

Method setParameter()

Usage

Arguments

Method setAlgebraics()

Usage

Arguments

Method setInitialState()

Usage

Arguments

Method setInitialVarianceScaling()

Usage

Arguments

Method setLamperti()

Usage

Arguments

Method setModelname()

Usage

Arguments

Method setMAP()

Usage

Arguments

Method getSystems()

Usage

Method getObservations()

Usage

Method getVariances()

Usage

Method getAlgebraics()

Usage

Method getInitialState()

Usage

Method getParameters()

Usage

Arguments

Method getEstimate()

Usage

Method getLikelihood()

Usage

Method getPrediction()

Usage

Method getSimulation()

Usage

Method estimate()

Usage

Arguments

Method likelihood()

Usage

Arguments

Method predict()

Usage

Arguments

Returns

Method simulate()

Usage

Arguments

Returns

Method print()

Usage

Method clone()

Usage

Method `new()`

Method `.private()`

Method `addSystem()`

Method `addObs()`

Method `setVariance()`

Method `addInput()`

Method `setParameter()`

Method `setAlgebraics()`

Method `setInitialState()`

Method `setInitialVarianceScaling()`

Method `setLamperti()`

Method `setModelname()`

Method `setMAP()`

Method `getSystems()`

Method `getObservations()`

Method `getVariances()`

Method `getAlgebraics()`

Method `getInitialState()`

Method `getParameters()`

Method `getEstimate()`

Method `getLikelihood()`

Method `getPrediction()`

Method `getSimulation()`

Method `estimate()`

Method `likelihood()`

Method `predict()`

Method `simulate()`

Method `print()`

Method `clone()`