Extracting the Likelihood Function and Changing Optimizer

In this document we show how to use the likelihood method to obtain function handlers for the objective function, and gradient, (and hessian if using a Kalman filter), for instance to use another optimization algorithm than stats::nlminb.

Simulate from the Ornstein-Uhlenbeck process

We use the common Ornstein-Uhlenbeck process to showcase the use of likelihood.

$\mathrm{d}X_{t} = \theta (\mu - X_{t}) \, \mathrm{d}t \, + \sigma_{X} \, \mathrm{d}B_{t}$

$Y_{t_{k}} = X_{t_{k}} + e_{t_{k}}, \qquad e_{t_{k}} \sim \mathcal{N}\left(0,\sigma_{Y}^{2}\right)$ We first create data by simulating the process

# Simulate data using Euler Maruyama
set.seed(10)
theta=10; mu=1; sigma_x=1; sigma_y=1e-1
# 
dt.sim = 1e-3
t.sim = seq(0,1,by=dt.sim)
dw = rnorm(length(t.sim)-1,sd=sqrt(dt.sim))
x = 3
for(i in 1:(length(t.sim)-1)) {
  x[i+1] = x[i] + theta*(mu-x[i])*dt.sim + sigma_x*dw[i]
}

# Extract observations and add noise
dt.obs = 1e-2
t.obs = seq(0,1,by=dt.obs)
y = x[t.sim %in% t.obs] + sigma_y * rnorm(length(t.obs))

# Create data
.data = data.frame(
  t = t.obs,
  y = y
)

Construct model object

We now construct the ctsmTMB model object

# Create model object
obj = ctsmTMB$new()

# Add system equations
obj$addSystem(
  dx ~ theta * (mu-x) * dt + sigma_x*dw
)

# Add observation equations
obj$addObs(
  y ~ x
)

# Set observation equation variances
obj$setVariance(
  y ~ sigma_y^2
)

# Specify algebraic relations
obj$setAlgebraics(
  theta ~ exp(logtheta),
  sigma_x ~ exp(logsigma_x),
  sigma_y ~ exp(logsigma_y)
)

# Specify parameter initial values and lower/upper bounds in estimation
obj$setParameter(
  logtheta   = log(c(initial = 5,    lower = 0,    upper = 20)),
  mu         = c(    initial = 0,    lower = -10,  upper = 10),
  logsigma_x = log(c(initial = 1e-1, lower = 1e-5, upper = 5)),
  logsigma_y = log(c(initial = 1e-1, lower = 1e-5, upper = 5))
)

# Set initial state mean and covariance
obj$setInitialState(list(x[1], 1e-1*diag(1)))

Estimation

We are in principle ready to call the estimate method to run the optimization scheme using the built-in optimization which uses stats::nlminb i.e.

fit = obj$estimate(.data)

## Checking model components...

## Checking and setting data...

## Constructing objective function and derivative tables...

## Minimizing the negative log-likelihood...

##   0:     921.77031:  1.60944  0.00000 -2.30259 -2.30259
##  10:    -38.611349:  2.53386  1.03053 0.151828 -2.34627

##   Optimization finished!:
##             Elapsed time: 0.004 seconds.
##             The objective value is: -3.934664e+01
##             The maximum gradient component is: 1.3e-04
##             The convergence message is: relative convergence (4)
##             Iterations: 19
##             Evaluations: Fun: 27 Grad: 20
##             See stats::nlminb for available tolerance/control arguments.

## Returning results...

## Finished!

Inside the package we optimise the objective function with respect to the fixed parameters using the construction function handlers from TMB::MakeADFun and parsing them to stats::nlminb i.e.

nll = TMB::MakeADFun(...)
opt = stats::nlminb(start=nll$par, objective=nll$fn, grad=nll$gr, hessian=nll$he)

Extract function handlers

The likelihood method allows you to retrieve the nll object that holds the negative log-likelihood, and its derivatives. The method takes arguments similar to those of estimate.

nll = obj$likelihood(.data)

## Finished!

The initial parameters (supplied by the user) are stored here

nll$par

##   logtheta         mu logsigma_x logsigma_y 
##   1.609438   0.000000  -2.302585  -2.302585

The objective function can be evaluated by

nll$fn(nll$par)

## [1] 921.7703

The gradient can be evaluated by

nll$gr(nll$par)

##          [,1]      [,2]      [,3]      [,4]
## [1,] 1382.854 -1557.574 -1191.374 -820.9253

The hessian can be evaluated by

nll$he(nll$par)

##           [,1]       [,2]       [,3]       [,4]
## [1,]  2226.705 -2859.2069 -1636.0560 -1136.3013
## [2,] -2859.207  1663.1457  2251.5926   865.5807
## [3,] -1636.056  2251.5926   905.1284  1486.3607
## [4,] -1136.301   865.5807  1486.3607   346.7711

We can now use these to optimize the function using e.g. stats::optim instead.

Extract parameter lower/upper bounds

You can extract the parameter bounds specified when calling setParameter() method by using the getParameters method (note that nll$par and pars$initial are identical).

pars = obj$getParameters()
print(pars)

##            type   estimate   initial     lower     upper
## logtheta   free  2.8028461  1.609438      -Inf  2.995732
## mu         free  1.0774793  0.000000 -10.00000 10.000000
## logsigma_x free  0.2024819 -2.302585 -11.51293  1.609438
## logsigma_y free -2.3269231 -2.302585 -11.51293  1.609438

Optimize manually using `stats::optim`

We supply the initial parameter values, objective function handler and gradient handler, and parameter bounds to optim.

opt = stats::optim(par=nll$par, 
                   fn=nll$fn, 
                   gr=nll$gr, 
                   method="L-BFGS-B", 
                   lower=pars$lower, 
                   upper=pars$upper)

Compare results between the two optimizers

Lets compare the results from using stats::optim with the extracted function handler versus the internal optimisation that uses stats::nlminb stored in fit:

# Estimated parameters
data.frame(external=opt$par, internal=fit$par.fixed)

##              external   internal
## logtheta    2.8028428  2.8028461
## mu          1.0774788  1.0774793
## logsigma_x  0.2024796  0.2024819
## logsigma_y -2.3269222 -2.3269231

# Neg. Log-Likelihood
data.frame(external=opt$value, internal=fit$nll)

##    external  internal
## 1 -39.34664 -39.34664

# Gradient components
data.frame(external=t(nll$gr(opt$par)), internal=t(nll$gr(fit$par.fixed)))

##        external      internal
## 1 -5.284383e-05 -2.269770e-06
## 2 -1.491876e-04 -1.337658e-04
## 3 -4.592109e-05  2.628658e-05
## 4 -2.656532e-05 -2.239568e-06