Time-series analysis allows us to investigate trends over time at a population level, how two or more series co-vary, and the impact of population-level policies and interventions. Several analyses are available that take into account seasonality and autocorrelation (where measurements taken closer together in time are more similar). One popular approach is ARIMA (and ARIMAX) modelling. A major assumption of these models is stationarity, which means that the mean, variance and autocorrelation structure do not change over time. To achieve this, underlying trends must be removed through what is known as differencing (and perhaps a transformation). Differencing involves replacing each value with the difference between it and the value for the immediately preceding time period. Example data are shown in the graphs above before and after differencing. ARIMAX then assesses how one series changes over time as a function of another without confounding from any underlying trends.
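As a rough illustration of first-order differencing, here is a minimal Python sketch. The function name `difference` and the monthly values are hypothetical and are not taken from the BMJ data; the point is only that a steadily rising series becomes a series with no trend once each value is replaced by its change from the previous period.

```python
def difference(series, lag=1):
    """Return the differenced series: each value minus the value `lag` periods earlier."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical monthly values with a steady upward trend (illustrative only).
trended = [10, 12, 13, 15, 16, 18, 19, 21]

# After differencing, the trend is gone and the values hover around a constant.
differenced = difference(trended)
print(differenced)  # [2, 1, 2, 1, 2, 1, 2]
```

Note that the differenced series is one observation shorter than the original, because the first value has no predecessor to subtract.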


Causality (also referred to as causation or cause and effect) occurs when a process is responsible for a second process (or state). In other words, something is causal if it gives rise under certain conditions to something else.

Causality can only really be readily inferred from randomised controlled trials (RCTs), i.e. studies in which participants are randomly assigned to either the intervention or control group and strict procedures are put in place, including blinding of participants and research staff. Causality is difficult to infer from observational studies because of the inability to control for all confounders, both measured and unmeasured.

There are ways to reduce these biases through statistical methods (e.g. regression adjustment, propensity score matching and multi-level modelling for clustered data) and study design (e.g. prospective data collection). In the BMJ paper we adjusted for population-level policies which may have affected both the input and output series, and removed underlying trends that may have created spurious associations. However, you can never be 100% confident that there is no residual confounding and that other biases haven't been introduced.

In 1965 the English epidemiologist Sir Austin Bradford Hill devised a set of criteria for inferring causation from observational studies (these are now known as the Bradford Hill criteria):

- Strength (effect size): the larger the association the more likely it is to be causal
- Consistency (reproducibility): if the association is observed in different studies this strengthens the likelihood of an effect
- Specificity: the more specific an association, i.e. lack of another explanation, the greater the probability of causality
- Temporality: the effect has to occur after the cause
- Biological gradient: greater exposure should lead to greater incidence of an effect
- Plausibility: the mechanism between cause and effect is plausible
- Coherence: similar findings in RCTs are found
- Experiment: experimental studies show that removal of the input variable leads to removal of the output variable (or at least a decrease)
- Analogy: is there similar evidence from analogous situations?

Is the association reported in the BMJ causal on the basis of these criteria? The answer is perhaps so. The strength of the association was moderate (for every 10% increase in use of e-cigarettes, prevalence of successful quit attempts increased by 0.58%); other observational studies and RCTs have found an association between use of e-cigarettes and quitting behaviour; I am not aware of other more plausible explanations for the association; and the association appears to be temporal in nature (see discussion below on Granger causality).

Although these criteria are widely used, they have also been heavily criticised. To be fair, Hill never intended them to be either necessary or sufficient to infer causation, but they have been taught in that way on both graduate and undergraduate courses.

Clive Granger argued that causality in economics could be tested for by measuring the ability to predict the future values of a time series using prior values of another time series. According to Granger, a series X causes Y if it can be shown that X values provide statistically significant information about future values of Y. The Granger causality test is now commonly applied during time-series analysis if one is using ARIMAX modelling (the technique adopted in the BMJ paper). This is because ARIMAX has the assumption of weak exogeneity. That is, Y can depend on lagged values of X but the reverse must not be true, i.e. X cannot depend on lagged values of Y. In other words, there should be no feedback from the output series. In our paper, this assumption was not violated: we found that prevalence of e-cigarette use was good at predicting prevalence of quit success, but that quit success was not good at predicting prevalence of e-cigarette use.
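To make the logic of the test concrete, here is a minimal one-lag sketch in plain Python. It is not the procedure used in the BMJ paper: the helper names (`solve`, `rss`, `granger_f`) and the simulated series are assumptions made purely for illustration. The idea is to compare a model that predicts Y from its own previous value with a model that also includes the previous value of X, and to summarise the improvement as an F statistic.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(rows, y):
    """Residual sum of squares from a least-squares fit of y on the given predictor rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def granger_f(x, y):
    """F statistic: does x[t-1] improve the prediction of y[t] beyond y[t-1] alone?"""
    target = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r, rss_u = rss(restricted, target), rss(unrestricted, target)
    dof = len(target) - 3  # observations minus parameters in the unrestricted model
    return (rss_r - rss_u) / (rss_u / dof)


# Simulated (hypothetical) stationary series in which y follows x with a one-period delay.
rng = random.Random(42)
n = 60
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1)] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, n)]

f_x_to_y = granger_f(x, y)  # should be large: past x helps predict y
f_y_to_x = granger_f(y, x)  # should be small: past y does not help predict x
print(f_x_to_y, f_y_to_x)
```

A large F statistic in one direction and a small one in the other is the pattern consistent with no feedback from the output series. In real analyses the series would first be made stationary, more than one lag would usually be considered, and the statistic would be compared against the F distribution using a statistics package.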

Does this imply causation? Yes, if we assume that for causality to be present one only needs to prove temporality. Inferring causality from temporality is, according to Hume (1748/1975), a natural psychological phenomenon. Numerous studies have shown that people are more inclined to draw causal conclusions if Y occurs after X, and that causal inference and event timing are tightly coupled (see the work of Sloman, Rottman and Lagnado). Many, including Rothman and Greenland (1998), place a great deal of emphasis on temporality in assessing causality, seeing it as necessary and possibly sufficient. For those who argue that temporality is not enough, the Granger test only gives “predictive” causality.

If we can infer predictive causality from our time-series analysis, what does this mean? It means that we can use the statistical models to make valid predictions about the pattern of time-series data in the future. For example, if we were to assume hypothetically that e-cigarette use will increase to 50% among smokers, we could project the likely prevalence of successful quit attempts. It may then be possible to infer causation with greater confidence if our predicted value matches the actual value obtained (assuming the 50% prevalence occurs). However, a mediating variable cannot be ruled out. A rise in e-cigarette use may cause a change in another time series, M, which itself causes a change in successful quitting activity. If this is the case, we don't have direct causality but mediated causality.


This study used a methodology called time-series analysis. The aim of this blog is to provide a brief introduction to this type of analysis for those who are unfamiliar.

Time-series data generally occur when measurements of one or more variables are taken over a period of time, most often at regular intervals (e.g. month, quarter or year). An example is the data used in the BMJ article, which were collected on a monthly basis from 2006 to 2015. Such data allow us to assess interventions and associations between behaviours in a quasi-experimental manner. This is important for a number of reasons. First, RCTs (randomised controlled trials) can take many years to run, are expensive and often require a large amount of resources. Secondly, it is important to determine 'effectiveness' as well as 'efficacy', i.e. to assess use and impact in the real world. Thirdly, it is often unethical to assign people to certain conditions (such as smoking versus non-smoking) and so behaviours must be studied in an ecological manner. Fourthly, it is often of interest to assess associations while adjusting for the impact of population-level policies and interventions.

Most research does not use time-series data. This might be due to its lack of availability, the fact that you need a large number of data points (some have suggested 50-100 months' worth) or perhaps because the analysis is complex.

Statistical methods used to analyse time-series data need to take into account underlying trends, seasonality (for example, smoking is often higher in summer months and drinking in winter months), and the internal structure of the data. The last of these involves things such as the presence of autocorrelation. In very simple terms, autocorrelation occurs when measurements taken closer together in time tend to be more similar. For example, if I were to ask you your weight today, tomorrow and in a year's time, it is likely that the measurements taken today and tomorrow will be more similar to each other than to the one taken in a year. This is because weight fluctuates in many people over a 12-month period.
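The idea of autocorrelation can be sketched with a few lines of Python. The function `autocorrelation` and the monthly weight readings below are hypothetical and only for illustration: the function computes the usual sample autocorrelation at a given lag, i.e. how strongly values are related to the values `lag` periods earlier.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((value - mean) ** 2 for value in series)
    numerator = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


# Hypothetical weight readings (kg) taken once a month for a year.
weights = [70.0, 70.2, 70.1, 70.4, 70.6, 70.5, 70.9, 71.0, 71.2, 71.1, 71.5, 71.6]

r1 = autocorrelation(weights, 1)  # readings one month apart
r6 = autocorrelation(weights, 6)  # readings six months apart
print(r1, r6)
```

For data like these, the lag-1 value should be clearly higher than the lag-6 value, reflecting the fact that readings taken close together resemble each other more than readings taken far apart.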

We have statistical methods which can take these things into account. The one used in the BMJ study is called autoregressive integrated moving average modelling with exogenous variables (ARIMAX). The autoregressive and moving average components relate to the internal structure of the data (i.e. the autocorrelation discussed above); integrated refers to the part of the model that takes into account the underlying trends (it does this by removing them); and exogenous indicates that we are including another variable to predict our outcome variable, with both being time-series data (for example, we use e-cigarette use to predict quit attempts).

In the near future I hope to be publishing a methodological paper on this. I will keep you posted.
