index.Rmd

---
title: "Interrupted Times Series Regression"
author: "Chuck Cleland"
date: '`r format(Sys.time(), "%B %d, %Y")`'
output:
  html_document:
    toc: true
    toc_float: 
      collapsed: false
    toc_depth: 4
    df_print: paged
    theme: simplex
    highlight: textmate
---  

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo=TRUE, comment=NA, message=FALSE, warning=FALSE, fig.align = 'center', fig.width = 12, fig.height = 10)
options(width=110)
```

This illustration is based on the following paper and its supplementary material:

Bernal, J. L., Cummins, S., & Gasparrini, A. (2017). Interrupted time series regression for the evaluation of public health interventions: a tutorial. *International journal of epidemiology*, 46(1), 348-355.

<https://academic.oup.com/ije/article/46/1/348/2622842>

# Load Required R Packages
```{r}
library(lmtest)
library(vcd)
library(Epi)
library(tsModel)
library(splines)
library(tidyverse)
```

# Read in Comma-Delimited Data
```{r}
data <- read.csv("sicily.csv")
data %>%
  knitr::kable()
```

This dataset includes the following variables:

 * **year**
 * **month**
 * **time** = elapsed time since the start of the study
 * **aces** = count of acute coronary episodes in Sicily per month (the outcome)
 * **smokban** = smoking ban (the intervention) coded 0 before intervention, 1 after
 * **pop** = the population of Sicily (in 10000s)
 * **stdpop** =  age standardised population

# Descriptive Analysis
Examining the data is an important first step.
Looking at the pre-intervention trend can give an indication of how stable the trend is over time, whether a linear model is likely to be appropriate, and whether there appears to be a seasonal trend.

## Scatter plot
```{r}
data$rate <- with(data, aces/stdpop*10^5)

plot(data$rate,type="n",ylim=c(00,300),xlab="Year", ylab="Std rate x 10,000", bty="l",xaxt="n")
rect(36,0,60,300,col=grey(0.9),border=F)
points(data$rate[data$smokban==0],cex=0.7)
axis(1,at=0:5*12,labels=F)
axis(1,at=0:4*12+6,tick=F,labels=2002:2006)
title("Sicily, 2002-2006")
```

## Summary Statistics
```{r}
data %>%
  gather(Variable, Value) %>%
  group_by(Variable) %>%
  summarize(n = n(),
            Mean = mean(Value),
            SD = sd(Value),
            Median = median(Value),
            IQR = IQR(Value),
            Min = min(Value),
            Max = max(Value)) %>%
  knitr::kable()
```

## Acute Coronary Event Rates Before and After Smoking Ban
```{r}
data %>%
  mutate(Period = case_when(smokban == 0 ~ "1. Before Ban",
                            smokban == 1 ~ "2. After Ban")) %>%
  select(Period, aces, rate) %>%
  gather(Variable, Value, -Period) %>%
  group_by(Period, Variable) %>%
    summarize(n = n(),
            Mean = mean(Value),
            SD = sd(Value),
            Median = median(Value),
            IQR = IQR(Value),
            Min = min(Value),
            Max = max(Value)) %>%
  knitr::kable()
```

# Poisson Regression Model
In step 2 (main paper) we chose a step change model and we also used a Poisson model as we are using count data.
In order to do this we model the count data directly (rather than the rate which doesn't follow a Poisson distribution), using the population (log transformed) as an offset variable in order to transform back to rates.

```{r}
model1 <- glm(aces ~ offset(log(stdpop)) + smokban + time, family=poisson, data)
summary(model1)
summary(model1)$dispersion
round(ci.lin(model1,Exp=T),3)
```

* The exponentiated intercept is the ACE rate in December 2001, and in this example is about 200 per 10,000. 
* The exponentiated smokban effect is a rate ratio, and in this example the ban multiplied the ACE rate by 0.89 (a reduction).
* The expected ACE rate after the ban would be 0.89 * 200 = 178 per 10,000.

## Add predictions from the Poisson model to the graph
```{r}
datanew <- data.frame(stdpop = mean(data$stdpop),
                      smokban = rep(c(0, 1), c(360, 240)),
                      time = 1:600/10,
                      month = rep(1:120/10, 5))

pred1 <- predict(model1, type="response", datanew) / mean(data$stdpop) * 10^5

plot(data$rate, type="n", ylim=c(0,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 0, 60, 300, col=grey(0.9), border=F)
points(data$rate,cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines((1:600/10), pred1, col=2)
title("Sicily, 2002-2006")
```

## To plot the counterfactual scenario we create a data frame as if smokban (the intervention) had never been implemented
```{r}
datanew <- data.frame(stdpop=mean(data$stdpop),smokban=0,time=1:600/10,
  month=rep(1:120/10,5))

pred1b <- predict(model1, datanew, type="response") / mean(data$stdpop) * 10^5

plot(data$rate, type="n", ylim=c(0,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 0, 60, 300, col=grey(0.9), border=F)
points(data$rate,cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines((1:600/10), pred1, col=2)
lines(datanew$time, pred1b, col=2, lty=2)
title("Sicily, 2002-2006")
```

## Return the data frame to the scenario including the intervention
```{r}
datanew <- data.frame(stdpop = mean(data$stdpop),
                      smokban = rep(c(0, 1), c(360, 240)),
                      time = 1:600/10,
                      month = rep(1:120/10, 5))
```

# Methodological Issues
## Overdispersion: Quasi-Poisson model 

In the model above we have not allowed for overdispersion - in order to do this we can use a quasipoisson model, which allows the variance to be proportional rather than equal to the mean.

```{r}
model2 <- glm(aces ~ offset(log(stdpop)) + smokban + time, 
              family=quasipoisson, data)
summary(model2)
summary(model2)$dispersion
round(ci.lin(model2,Exp=T),3)
```

## Model checking and autocorrelation
### Check the residuals by plotting against time
```{r}
res2 <- residuals(model2,type="deviance")

plot(data$time, res2, ylim=c(-5,10), pch=19, cex=0.7, col=grey(0.6), main="Residuals over time", ylab="Deviance residuals", xlab="Date")

abline(h=0, lty=2, lwd=2)
```

### Further check for autocorrelation by examining the autocorrelation and partial autocorrelation functions
```{r}
acf(res2)
pacf(res2)
```

## Adjusting for seasonality
There are various ways of adjusting for seasonality - here we use harmonic terms specifying the number of sin and cosine pairs to include (in this case 2) and the length of the period (12 months)

```{r}
model3 <- glm(aces ~ offset(log(stdpop)) + smokban + time + harmonic(month,2,12), 
              family=quasipoisson, data)
summary(model3)
summary(model3)$dispersion
round(ci.lin(model3,Exp=T),3)
```

### EFFECTS
```{r}
ci.lin(model3,Exp=T)["smokban",5:7]
```

* Taking into account seasonality of the ACE rate, the smoking ban effect was a rate ratio of 0.885 (slightly bigger reduction)

### TREND
```{r}
exp(coef(model3)["time"]*12)
```

* Taking into account seasonality of the ACE rate, the ACE rate was being multiplied by about 1.07 each year, a long-term temporal trend.

### We again check the model and autocorrelation functions
```{r}
res3 <- residuals(model3,type="deviance")
plot(res3,ylim=c(-5,10),pch=19,cex=0.7,col=grey(0.6),main="Residuals over time",
  ylab="Deviance residuals",xlab="Date")
abline(h=0,lty=2,lwd=2)
acf(res3)
pacf(res3)
```

### Predict and plot of the seasonally adjusted model
```{r}
pred3 <- predict(model3, type="response", datanew) / mean(data$stdpop) * 10^5

plot(data$rate, type="n", ylim=c(120,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 120, 60, 300, col=grey(0.9), border=F)
points(data$rate, cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines(1:600/10, pred3, col=2)
title("Sicily, 2002-2006")
```

It is sometimes difficult to clearly see the change graphically in the seasonally adjusted model, therefore it can be useful to plot a straight line representing a 'deseasonalised' trend this can be done by predicting all the observations for the same month, in this case we use June.

```{r}
pred3b <- predict(model3, type="response", transform(datanew,month=6)) / mean(data$stdpop) * 10^5
```

### This can then be added to the plot as a dashed line
```{r}
plot(data$rate, type="n", ylim=c(120,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 120, 60, 300, col=grey(0.9), border=F)
points(data$rate, cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines(1:600/10, pred3, col=2)
lines(1:600/10, pred3b, col=2, lty=2)
title("Sicily, 2002-2006")
```

# Additional Material
## Add a change-in-slope
## We parameterize it as an interaction between time and the ban indicator
```{r}
model4 <- glm(aces ~ offset(log(stdpop)) + smokban*time + harmonic(month,2,12),
  family=quasipoisson, data)
summary(model4)
round(ci.lin(model4,Exp=T),3)
```

## Predict and plot the 'deseasonalised' trend and compare it with the step-change only model
```{r}
pred4b <- predict(model4, type="response", transform(datanew, month=6)) / mean(data$stdpop) * 10^5
plot(data$rate, type="n", ylim=c(120,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 120, 60, 300, col=grey(0.9), border=F)
points(data$rate, cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines(1:600/10, pred3b, col=2)
lines(1:600/10, pred4b, col=4)
title("Sicily, 2002-2006")
legend("topleft", 
       c("Step-change only", "Step-change + change-in-slope"),
       lty=1, col=c(2,4), inset=0.05, bty="n", cex=0.7)
```

Test if the change-in-slope improve the fit the selected test here is an F-test, which accounts for the overdispersion, while in other cases a likelihood ratio or wald test can be applied

```{r}
anova(model3, model4, test="F") %>%
  knitr::kable()
```

* Not surprisingly, the p-value is similar to that of the interaction term.
* The ban reduced the ACE rate, but did not change the longer-term temporal trend in the rate (it continued to grow over time).