Dynamic occupancy models

This R Markdown Notebook accompanies a SEEC Toolbox Seminars given by Res Altwegg and David Maphisa at the University of Cape Town on 28 November 2019. Use this notebook together with the slides and the recorded seminar.

Occupancy is the state variable describing where a species occurs and is a key quantity in ecology and conservation. The problem with occupancy is that it is usually not directly observable because non-detection of a species at a site does not establish absence of that species at this site. Two earlier SEEC Toolbox seminars showed how occupancy models can be used to deal with this problem by explicitly accounting for the detection probability. Occupancy models assume closure, i.e. that a site is either occupied or not for the entire duration of the study. However, unoccupied sites do get colonised and species can go extinct from formerly occupied sites. Such changes in occupancy are often the main study focus, i.e. interest is in the dynamics rather than just static patterns. In this Toolbox Seminar, we show how dynamic occupancy models can be used to examine changes in occupancy, by estimating colonsation and local extinction.

Simulation

We demonstrate the basic ideas using a simulated data set. This code is adapted from the unmarked Vignette on dynamic occupancy models, written by Marc Kéry and Richard Chandler. We simulate data for 250 sites that were visited twice per year for 5 years. Initial occupancy \(\Psi_1\) was 0.6. We drew values for the \(\phi_t\) (1 minus the extinction probability, \(\epsilon\)) from a uniform distribution U(0.5, 0.7), and the \(\gamma_t\) (colonisation probabilities) from U(0.1, 0.2). Extinction was thus quite a bit higher than colonisation. We chose some arbitrary values for the detection probabilities, \(p_t\).

M <- 250 # Number of sites
J <- 2 # num secondary sample periods
T <- 5 # num primary sample periods
psi <- rep(NA, T) # Occupancy probability
muZ <- z <- array(dim = c(M, T)) # Expected and realized occurrence
y <- array(NA, dim = c(M, J, T)) # Detection histories
set.seed(13973) 
psi[1] <- 0.6 # Initial occupancy probability
p <- c(0.3,0.1,0.5,0.5,0.1) 
phi <- runif(n=T-1, min=0.5, max=0.7) # Survival probability (1-epsilon)

gamma <- runif(n=T-1, min=0.1, max=0.2) # Colonization probability 
# Generate latent states of occurrence 
# First year 
z[,1] <- rbinom(M, 1, psi[1])
# Initial occupancy state
# Later years 
for(i in 1:M){
  # Loop over sites
  for(k in 2:T){
    # Loop over years
    muZ[k] <- z[i, k-1]*phi[k-1] + (1-z[i, k-1])*gamma[k-1] 
    z[i,k] <- rbinom(1, 1, muZ[k])
  } }
# Generate detection/non-detection data 
 for(i in 1:M){ for(k in 1:T){ prob <- z[i,k] * p[k] 
 for(j in 1:J){ y[i,j,k] <- rbinom(1, 1, prob)
  } } }
# Compute annual population occupancy 
 for (k in 2:T){ psi[k] <- psi[k-1]*phi[k-1] + (1-psi[k-1])*gamma[k-1] }

Then, we need to get the data into the correct format for the colext function in unmarked. The unmarked package has a really useful helper function for that, unmarkedMultFrame.

## Loading required package: lattice
## Loading required package: parallel
## Loading required package: Rcpp
## Loading required package: reshape2
yy <- matrix(y, M, J*T)

year <- matrix(c(' 01' ,' 02' ,' 03' ,' 04' ,' 05'), nrow(yy), T, byrow=TRUE)

head(yy)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    0    0    0    0    0    0    0    0    0     0
## [2,]    0    1    0    0    0    0    0    1    0     0
## [3,]    0    0    0    0    0    0    0    0    0     0
## [4,]    1    0    0    0    0    0    0    0    0     0
## [5,]    0    0    0    0    0    0    0    0    0     0
## [6,]    0    0    0    0    0    0    0    0    0     0
dim(yy)
## [1] 250  10
simUMF <- unmarkedMultFrame( y = yy, 
                             yearlySiteCovs = list(year = year), 
                             numPrimary=5)
summary(simUMF)
## unmarkedFrame Object
## 
## 250 sites
## Maximum number of observations per site: 10 
## Mean number of observations per site: 10 
## Number of primary survey periods: 5 
## Number of secondary survey periods: 2 
## Sites with at least one detection: 136 
## 
## Tabulation of y observations:
##    0    1 
## 2242  258 
## 
## Yearly-site-level covariates:
##   year    
##   01:250  
##   02:250  
##   03:250  
##   04:250  
##   05:250

Fitting a model with constant parameters

The first model we fit assumes constant \(\Psi\), \(\gamma\), \(\epsilon\) and \(p\).

# Model with all constant parameters 
m0 <- colext(psiformula= ~1, 
             gammaformula = ~ 1, 
             epsilonformula = ~ 1, 
             pformula = ~ 1, 
             data = simUMF, method="BFGS")
summary(m0)
## 
## Call:
## colext(psiformula = ~1, gammaformula = ~1, epsilonformula = ~1, 
##     pformula = ~1, data = simUMF, method = "BFGS")
## 
## Initial (logit-scale):
##  Estimate    SE       z P(>|z|)
##  -0.00337 0.236 -0.0143   0.989
## 
## Colonization (logit-scale):
##  Estimate    SE     z  P(>|z|)
##      -1.7 0.195 -8.71 3.01e-18
## 
## Extinction (logit-scale):
##  Estimate    SE     z P(>|z|)
##     0.214 0.318 0.672   0.502
## 
## Detection (logit-scale):
##  Estimate   SE     z  P(>|z|)
##    -0.654 0.17 -3.86 0.000115
## 
## AIC: 1556.162 
## Number of sites: 250
## optim convergence code: 0
## optim iterations: 46 
## Bootstrap iterations: 0

We see that the estimate for \(\Psi_1\), occupancy in the first year, is -0.00337. How is that related to the 0.6 we used in the simulation? This estimate is on the logit scale so we have to use the inverse logit function to transform it back to the probability scale. This can easily be done ‘by hand’ but unmarked has helperfunctions that become very convenient when backtransformation is a bit more complicated. Using this function, it is also easy to get confidence intervals.

# parameter estimates

exp(-0.00337)/(1+exp(-0.00337))
## [1] 0.4991575
names(m0)
## [1] "psi" "col" "ext" "det"
backTransform(m0, type="psi")
## Backtransformed linear combination(s) of Initial estimate(s)
## 
##  Estimate     SE  LinComb (Intercept)
##     0.499 0.0591 -0.00337           1
## 
## Transformation: logistic
confint(backTransform(m0, type="psi"))
##      0.025     0.975
##  0.3854436 0.6129562

We estimated occupancy in the first year, \(\Psi_1\). But what about occupancy probabilities in subsequent years? They are derived parameters, calculated as \(\Psi_t = \Psi_{t-1} \times (1-\epsilon_{t-1}) + (1-\Psi_{t-1}) \times \gamma_{t-1}\).

We see that we need the estimates for \(\gamma\) and \(\epsilon\).

# derived parameters
backTransform(m0, type="ext")
## Backtransformed linear combination(s) of Extinction estimate(s)
## 
##  Estimate     SE LinComb (Intercept)
##     0.553 0.0786   0.214           1
## 
## Transformation: logistic
backTransform(m0, type="col")
## Backtransformed linear combination(s) of Colonization estimate(s)
## 
##  Estimate     SE LinComb (Intercept)
##     0.154 0.0255    -1.7           1
## 
## Transformation: logistic

When fitting a dynamic occupancy model with the colext function, the output already has the derived parameter estimates for the \(\Psi_t\). They are stored in the slot projected.mean.

m0@projected.mean
##                    1         2         3         4        5
## unoccupied 0.5008437 0.6997875 0.7580149 0.7750571 0.780045
## occupied   0.4991563 0.3002125 0.2419851 0.2249429 0.219955

What about confidence intervals? unmarked has a function that uses parametric bootstrap to work out confidence intervals. You need to choose the number of iterations (B=...), which should definitely be more than 10 but this can take a lot of time… For a publication, you probably want closer to 1000 iterations.

# confidence intervals for projected occupancy

m0 <- nonparboot(m0, B = 10)
cbind(projected=projected(m0)[2,], SE=m0@projected.mean.bsse[2,])
##   projected         SE
## 1 0.4991563 0.03243629
## 2 0.3002125 0.02345218
## 3 0.2419851 0.01802916
## 4 0.2249429 0.01764645
## 5 0.2199550 0.01811982

We saw that extinction probabilities are higher than colonisation probabilities. Does this mean that occupancy will keep declining and the species will eventually go extinct? Not necessarily. As fewer and fewer sites are occupied, there are fewer sites left from which the species can go extinct. On the other hand, there number of sites that are unoccupied and could be colonised increases. With constant extinction and colonisation probabilities, the expected number of sites from which the species goes extinct and the expected number of sites that it colonises will eventually be equal to each other. So occupancy will reach an equilibrium of sorts. It is given by \(\Psi_E = \frac{\gamma}{\gamma + \epsilon}\).

# equilibrium occupancy
# --------------------

backTransform(m0, type="col")@estimate/(backTransform(m0, type="col")@estimate + backTransform(m0, type="ext")@estimate)
## [1] 0.217891

So estimated occupancy in year 5 is actually rather close to the equilibrium occupancy probability and if \(\epsilon\) and \(\gamma\) remain constant, we would not expect occupancy to change much further over time.

A model with year-dependent extinction, colonisation and detection probabilities

When we simulated the data, we used time-varying extinction, colonisation and detection probabilities. Let’s fit the model that corresponds to the one we used to simulate the data.

# Model with gamma, epsilon and p year specific 
# ------------------------------------
m1 <- colext(psiformula= ~1, 
             gammaformula = ~ year, 
             epsilonformula = ~ year, 
             pformula = ~ year, 
             data = simUMF, method="BFGS")
summary(m1)
## 
## Call:
## colext(psiformula = ~1, gammaformula = ~year, epsilonformula = ~year, 
##     pformula = ~year, data = simUMF, method = "BFGS")
## 
## Initial (logit-scale):
##  Estimate    SE    z P(>|z|)
##      1.31 0.879 1.49   0.137
## 
## Colonization (logit-scale):
##             Estimate   SE       z P(>|z|)
## (Intercept)    -5.50 11.9 -0.4616   0.644
## year 02         1.72 19.7  0.0871   0.931
## year 03         3.10 11.9  0.2597   0.795
## year 04         5.07 12.0  0.4233   0.672
## 
## Extinction (logit-scale):
##             Estimate    SE       z P(>|z|)
## (Intercept)   -1.026  1.72 -0.5968   0.551
## year 02        0.876  2.21  0.3965   0.692
## year 03       -0.127  1.83 -0.0695   0.945
## year 04       -5.166 42.84 -0.1206   0.904
## 
## Detection (logit-scale):
##             Estimate    SE     z  P(>|z|)
## (Intercept)    -1.32 0.264 -4.99 5.98e-07
## year 02        -1.66 0.551 -3.02 2.54e-03
## year 03         1.19 0.375  3.17 1.51e-03
## year 04         1.32 0.387  3.40 6.64e-04
## year 05        -1.91 0.545 -3.50 4.74e-04
## 
## AIC: 1444.535 
## Number of sites: 250
## optim convergence code: 0
## optim iterations: 68 
## Bootstrap iterations: 0

What are the estimated occupancy probabilities over the five years (\(\Psi_t\)) and corresponding confidence intervals now?

# parameter estimates

m1@projected.mean
##                    1         2         3         4         5
## unoccupied 0.2129186 0.4198092 0.6789418 0.6996572 0.4244612
## occupied   0.7870814 0.5801908 0.3210582 0.3003428 0.5755388
m1 <- nonparboot(m1, B = 10)
cbind(projected=projected(m1)[2,], SE=m1@projected.mean.bsse[2,])
##   projected         SE
## 1 0.7870814 0.14062734
## 2 0.5801908 0.25232466
## 3 0.3210582 0.04982608
## 4 0.3003428 0.06936172
## 5 0.5755388 0.16971123

Which model is better? For some reason, we can’t use the familiar AIC function but unmarked has its own functions that calculate the model selection table.

AIC(m0,m1)
## Error in UseMethod("logLik"): no applicable method for 'logLik' applied to an object of class "c('unmarkedFitColExt', 'unmarkedFit')"
models <- fitList(' psi(.)gam(.)eps(.)p(.)' = m0,
                  ' psi(.)gam(Y)eps(Y)p(Y)' = m1)
(ms <- modSel(models))
##                         nPars     AIC  delta   AICwt cumltvWt
##  psi(.)gam(Y)eps(Y)p(Y)    14 1444.53   0.00 1.0e+00     1.00
##  psi(.)gam(.)eps(.)p(.)     4 1556.16 111.63 5.8e-25     1.00