This study develops an operational algorithm for the detection of the (re-)emergence of infectious disease. The authors illustrate its utility by successfully applying it to four (re-)emerging threats—mumps, pertussis, dengue and plague, providing early warning that could enable intervention measures.
Outbreaks of infectious diseases continue to surprise and evade public health control policy. This is due to a combination of (1) the reemergence of familiar vaccine-preventable infectious diseases, such as mumps , measles , and pertussis ; (2) the evolution of resistance to antimicrobials, including methicillin-resistant Staphylococcus aureus (MRSA) , malaria , and extensively drug-resistant tuberculosis (XDR TB) ; (3) pathogen range expansion driven by anthropogenic changes in land use  and climate ; and (4) the emergence of novel pathogens from a zoonotic reservoir, such as HIV , severe acute respiratory syndrome coronavirus (SARS-CoV) , and Ebola virus . In addition to their burden on human morbidity, mortality, and the associated social and economic toll, the existential threat posed by (re-)emerging infectious diseases is increasingly recognized .
To foreshadow such threats, field and laboratory approaches have focused on surveillance of potential zoonotic hosts , the detection of "viral chatter" in sequence data collected from putative emergence hotspots , laboratory characterization of viruses with pandemic potential , biogeographic approaches to identify risk zones , and the use of phylogenetics to pinpoint animal reservoirs . We submit that an important dimension to predicting pathogen (re-)emergence is to exploit epidemiological incidence reports. In reality, a diversity of mechanisms can drive increases in transmission that underpin disease emergence or resurgence. These include pathogen evolution leading to evasion of immunity [18, 19], host adaptation , immune waning , changes in population immune profile , environmental change , declining vaccine uptake , and changes in contacts . This mechanistic uncertainty, coupled with sparsity of data, impedes the prospects for inference-based forecasts (e.g., by fitting a transmission model). Previously, statistical approaches have been developed focusing on characteristics of the outbreak size distribution [2, 25–27]. Though promising, generalizing these methods requires overcoming the need for (1) a sufficiently large number of independent outbreaks for reliable statistical estimation and (2) well-defined transmission chains, which is often not possible. Here, we propose a mechanism-agnostic approach that harvests information contained in longitudinal epidemiological data.
In general, disease (re-)emergence requires a systematic increase in the expected number of secondary cases due to an infectious individual, which is quantified by the effective reproductive number (Reff ) . Specifically, as the threshold Reff = 1 is crossed, the system undergoes a transcritical bifurcation, and sustained chains of transmission become possible (Fig 1A). Dynamical systems theory identifies statistical footprints of such a critical transition ("critical slowing down") . These footprints are reflected in trends in the statistical moments of time series data, such as the autocorrelation and standard deviation [30, 31], as the transition is approached. Prior theoretical findings [30–32] and tests on simulated data [33, 34] support the premise of this approach and identify candidate statistical moments. The key challenge, however, is operationalizing these statistical features to serve as early warning signals (EWSs). In particular, given a time series, we need to (1) quantify emergence risk through time from a collection of EWS and (2) establish a threshold for detection of emergence. Here, we accomplish these by use of transfer learning, i.e., training a learning algorithm on simulated time series data to create a classifier that can subsequently detect emergence in incidence data (Fig 2; see Methods). The advantage of using a transfer learning approach is the identification of a generic measure of emergence risk that is robust to uncertainties in the underlying epidemiological dynamics.
We used a stochastic transmission model to generate 10,000 emerging and nonemerging time series (Fig 2B). To ensure robustness to parametric uncertainty, each time series was the result of a unique parameterization according to Latin hypercube sampling (Fig 2A; see S4 Table for ranges). For each trajectory, 8 time-varying EWSs () were calculated (Fig 2C). To classify disease emergence, logistic regression was carried out on the ensemble of EWSs to assign a weight to each signal (Fig 2D). We defined a summary measure of time-dependent emergence risk as the logistic transform of the weighted sum of our EWS, , with a range between 0 and 1 (see Methods for details). In this algorithm, emergence is predicted at any time t when Dt > c, where c is a threshold (Fig 2I). We identified this threshold by minimizing classification error, using the receiver-operator characteristic (ROC) curve (Fig 2E). We evaluated the performance of the detection algorithm as a function of lead time (the period of time before the outbreak) using the area under the ROC curve (AUC) statistic (Fig 2E). Further details of the learning algorithm are given in the Methods. The learning algorithm is designed such that the weighted EWS and detection threshold (Fig 2F–2I) may be applied to incidence data without further fitting.
We found that a composite of EWSs was a better predictor of emergence than any individual EWS (S3A Fig). Most weight was assigned to the skewness, the kurtosis, and coefficient of variation. Indeed, training on just these 3 features yields near-optimal performance (S4 Fig). Interestingly, individually, these are not the best performing EWS (in fact, the kurtosis and coefficient of variation are the 2 worst; S3A Fig). The learning algorithm exploits imbalances in these 3 EWS to detect emergence, assigning negative weight to the kurtosis and coefficient of variation (Fig 2H and S5 Table). A practical implication of this is that the performance of the algorithm is insensitive to population size (S3 Fig). That is, the outcome of application to case counts or incidence data (normalized by population size) is identical.
We examined whether performance of our algorithm was sensitive to mechanisms (e.g., waning immunity and pathogen evolution) underlying (re-)emergence that are associated with different patterns of increase of Reff (for details, see S1 Text). We retrained the learning algorithm on simulated datasets consisting of only (1) concave (d2Reff / dt2 < 0; waning immunity) and (2) convex (d2Reff / dt2 > 0; evolution) trends. Surprisingly, the EWS weights obtained by fitting to simulated data that comprised both mechanisms of increase performed best, comparable to training on concave/convex data alone (S5 and S6 Figs).
We performed a similar comparison using data simulated from a model with multiple time-varying parameters, in addition to Reff (which drives the transition), to assess their confounding effects (S7 Fig). As might be expected, the presence of covariates reduced performance; however, we again found that the EWS weights obtained by fitting to simulated data without such covariates performed comparably to the optimal fit with covariates.
To test the performance of our detection algorithm, we carried out 2 case studies on re-emerging vaccine-preventable childhood diseases. Our first challenge was to anticipate the reemergence of mumps. In England, infant mumps vaccination started in 1988 and coincided with a rapid reduction in incidence (Fig 1B). This period of low transmission was interrupted in 2004 to 2005 by outbreaks reported across the country, primarily among university-aged individuals . We examined whether our mechanism-agnostic approach could have anticipated these outbreaks.
We used the EWS weights (trained on simulated data) to calculate our emergence risk measure, Dt. We additionally explored the potential impact of spatial scale on the predictability of emergence by calculating Dt at both the national level (Fig 3A) and for each local authority (LA) (Fig 3B and S8–S16 Figs). At the national level, the 2004 to 2005 outbreak was successfully anticipated, with a lead time of approximately 4 years (Fig 3A). At the local level, from 1998 onwards, we observed an increasing number of LAs exceeding the detection threshold, shifting from a baseline average of 4.6 LAs per week before 2000 to 23 detections per week at the start of 2004 (S20 Fig).
We additionally categorized localities into those that experienced a sizeable outbreak in 2004 to 2005 and those with small epidemics (S17 Fig). As shown in Fig 3C, locations with small outbreaks had a much lower emergence detection frequency, which we interpret to mean a low false positive rate. To dissect whether differences across spatial scales may result from data aggregation alone, we generated a simulated time series for each LA assuming spatial independence, ensuring the number of emerging and nonemerging time series matched the number of LAs with large and small outbreaks, respectively. As shown in Fig 3D, there was qualitative agreement between simulated and mumps data. One discrepancy was that the number of LAs above the threshold in 2000 was lower for mumps. We speculate this is due to additional spatial heterogeneity in mumps transmission in England (beyond our emerging/nonemerging categorization), perhaps with a small set of LAs (e.g., those with large urban student populations) serving as foci.
Our measure of emergence risk should not be conflated with a prediction of future outbreak size or its imminence. Dt quantifies whether the system is approaching Reff = 1. The final outbreak size is determined by additional factors, such as the susceptible population size and the number initially infected . We found no association between Dt and epidemic size for large outbreak LAs (S18 Fig). Similarly, although there is no theoretically derived relationship between Dt and the time of an expected outbreak, we observed a negative association between an LA’s outbreak size and the detection time, defined as the last time that Dt < c prior to the outbreak (Spearman’s ρ = −0.66; S19 Fig). This may be because larger outbreaks occur in LAs with larger susceptible populations, which are more likely to experience repeated "sparks" prior to the outbreak, hence providing a more reliable probe of the system’s state.
Our second case study focused on the resurgence of pertussis in the US. In most states, pertussis incidence declined throughout the 1950s and 1960s until it reached a nadir in the mid-1970s . Since then, however, this trend has reversed. By the late 2000s, annual reported incidence in many states had reached levels not seen since the 1960s (Fig 4A). The mechanisms underlying this resurgence remain contested [21, 37, 38]. A striking feature of pertussis reemergence has been its geographic unevenness . In some states, reemergence did not take place until the mid-2000s (Fig 4C), whereas in others, resurgence occurred early, and incidence has plateaued (Fig 4B). We restrict our analysis to the period from 1980 to 2000. We were prevented from performing a similar analysis to mumps because of the substantial variation in the timing of the first large outbreak in each state (S24–S26 Figs), which precludes the aggregation of detections in the same manner (for details, see S27 Fig). Instead, we used regression analysis to identify which states experienced reemergence (37 states, including Washington, DC) and which did not (12 states). We challenged our algorithm—based on the EWS weights fitted to the simulated data—to predict whether resurgence occurred in each state (Fig 4D and 4E and S24–S26 Figs). Earlier detections of resurgence imply better performance. In Fig 4D, it is shown that from 1990, almost 100% of states experiencing resurgence exceed the detection threshold. For those states not experiencing resurgence, from 30% to 50% were above the threshold; reasons for these detections likely vary on a state-by-state basis. Some detections (such as in Delaware and Oklahoma) can be attributed to isolated sporadic outbreaks in under-vaccinated communities, not associated with the national trends . In other states (e.g., Wyoming, see panel Q of S26 Fig), the algorithm may be detecting a late resurgence not identified by the linear regression.
To quantify algorithm performance, we calculated the time-varying AUC (a measure of diagnostic ability), which crossed the nominal value of 0.8 around 1992 and continued to increase as the year 2000 was approached (Fig 4E). As with simulated data, Dt outperforms any individual EWS, as depicted in Fig 4D and 4E using the mean (the performance of the remaining 7 EWS is shown in S28 Fig).
To obtain an upper bound on the ability of EWS to classify pertussis emergence, we retrained the learning algorithm on pertussis data (Fig 4E). There is remarkable similarity in performance relative to the model fitted to simulated data (Fig 4E). Reassuringly, the fit to pertussis data assigns most weight to the same 3 EWSs: the skewness, kurtosis, and coefficient of variation (S29 Fig). Taken together, these findings suggest that our algorithm trained on simulated data can be reliably applied to incidence data. Note that there were more positives (true and false) when the algorithm is fitted to simulated data, indicating a lower detection threshold (Fig 4D). This likely arose because demographic parameter ranges in the simulated data were chosen to mimic those of England rather than the US.
In addition to the reemergence of these 2 vaccine-preventable childhood diseases, we tested the performance of our algorithm (fitted to the synthetic dataset described in the Methods) on outbreaks of 2 vector-borne diseases, bubonic plague, and dengue. Compared with the mumps and pertussis examples, the 2017 Madagascar plague outbreak took place over a much shorter timescale, driven by increasingly favorable climatic factors [40, 41]. Examining the daily case reports of bubonic plague, the emergence risk crossed the detection threshold 27 days after the first reported case, a lead time of around 30 days before the outbreak in late September (Fig 5A).
The epidemiological dynamics of dengue virus (DENV) serotypes are complex; dengue infection leads to lifelong serotype-specific immunity and a transient period of serotype-transcending protection [42, 43]. In Puerto Rico, these interactions led to a sequence of extinctions and recolonizations for DENV-1 (2000 to 2008), DENV-3 (1995 to 1998), and DENV-4 (2000 to 2008), with only DENV-2 in constant circulation over the entire period (Fig 5B–5E). Our algorithm, which is designed to detect trends in transmission from incidence data, successfully anticipated outbreaks of DENV-2 (2006) and DENV-3 (2008) that followed periods of modest but continuous transmission. These outbreaks were the result of shifts in the immunological profile of the population following the replenishment of the serotype-specific susceptible pool. This caused a systematic increase in Reff culminating in a transcritical bifurcation. For DENV-2, our algorithm made 2 sustained detections: 1 in 1998 that was associated with a small outbreak and 1 at the start of 2004 that preceded the takeoff of the 2006 outbreak by 18 months (Fig 5C). For DENV-3, the 2008 outbreak was anticipated approximately 6 months ahead of time (Fig 5D). As expected, because of the absence of transmission during extinction periods, our algorithm was unable to anticipate sudden reintroduction events (e.g., DENV-1 in 2007).
As a final test of our transfer learning approach, we applied our detection method fitted to pertussis data (using only the 3 most important EWS discussed earlier) to mumps, plague, and dengue on the national level (S31 and 32 Figs). Remarkably, there was next to no change in performance when compared with fitting to simulated data. The plague outbreak was detected on the same day, whereas for both the mumps and dengue outbreaks, the detections were within 2 months of each other. There was a reduction in positives (true and false) for mumps at the local level; however, there was still an appreciable increase in detections in LAs with large outbreaks before 2004 (S31C Fig). This robustness to the choice of training data underscores that our transfer learning–based approach is not reliant on the specifics of the simulated dataset. Instead, its success stems from the generic statistical properties of incidence data across disease emergence contexts.
At first glance, it may appear surprising that a singular detection method is able to detect emergence in the diverse contexts studied in this paper. However, underlying all these systems are dynamical commonalities inherent to the disease transmission process: as the reproductive number increases, the feedback effect of each infectious case on subsequent transmission is enhanced ("critical-slowing down"). In a similar vein to a recent study on the elimination of measles , our work shows that there is a "canonical path" for diseases emerging via increases in the reproductive number and that it can be found using statistical learning methods.
Although we have endeavored to design our detection method such that it is broadly applicable, specific usage of our algorithm necessitates decisions that cannot be made in a context-agnostic manner. In particular, all detection methods face a trade-off between reducing false positives (via a higher detection threshold) and false negatives (a lower threshold). For the mumps example, because of the rarity of outbreaks, our optimized threshold resulted in a relatively large total number of false positives prior to the 2004 outbreak, which could conceivably result in detection fatigue among end users. This threshold was arrived at by minimizing the classification error, assigning an equal cost to the false negative and false positive rates. The number of false positives can be reduced dramatically by assigning a greater cost to false positives than false negatives (S21 Fig), however, with an unavoidable reduction in the lead time provided by detections of the 2004 outbreak. The appropriate detection threshold is conditional on the potential human, economic, and political costs of a missed outbreak (which would be greater for dengue than for mumps, for example), and requires an assessment that can only be made by public health authorities.
The early warning system proposed here is likely to operate successfully for acute infectious diseases in which the approach to the critical transition (i.e., Reff → 1) is gradual. This may result from (1) steady shifts in a population’s immune profile due to either waning immunity  or turnover in “antigenic seniority,” as documented in influenza , and (2) the accumulation of mutations that facilitate immune evasion  or host adaptation . Instances in which the transition is abrupt (for example, the introduction of a reassortant influenza virus  or the de novo spillover of an easily human-to-human transmissible pathogen from a zoonotic reservoir , such as Ebola virus) cannot conceivably be predicted using approaches that rely on statistical trends in incidence data.
Here, we have demonstrated how ideas from the science of critical slowing down, implemented via a machine-learned detection method, point the way forward for early warning of disease (re-)emergence. Although our fitted model performed remarkably well in each case study, the gravity of confident declaration of disease (re-)emergence demands further scrutiny on the ideal choice of training data and the predictive impact of alternative fitting methodologies. Given the importance of anticipating such events and identifying appropriate preemptive steps to mitigate their toll, the adoption of a multiplicity of approaches is warranted [13–17]. Progress will likely require a combination of activities including pathogen discovery, characterization, and increased zoonotic surveillance, allied to cutting edge data analytics.
EWSs are indicators of approaching critical transitions in dynamical systems. Mathematically, they are defined as the moments and correlation functions of the fluctuations away from a stable equilibrium . As the critical point is approached, the strength of restorative forces decreases, and the magnitude of the fluctuations increases. These changes are captured in various different unique EWSs, Θi,t, which are indexed by the subscript i = 1…n. We considered n = 8 EWS: the mean, standard deviation, coefficient of variation, index of dispersion, skewness, kurtosis, and autocorrelation at lags 1 and 2. For R0 < 1, the disease-free equilibrium is stable; previous theoretical studies have shown that as R0 approaches 1, there are detectable trends in the EWS [30–32].
Operationally, EWSs are calculated from a single epidemiological time series (; either case reports or incidence). We grouped weekly case reports into 4-week totals, informed by previous findings . Estimators for each EWS were constructed by substituting any expectations, , in its mathematical definition with an exponentially weighted moving window average,
The decay rate λ was specified by the half-life (the length of time for the exponential weight to half in magnitude), t1/2 = ln(2)/λ . Preliminary studies found using exponential weighting in the moving averaging, rather than the more common uniform window, performed better at detecting emergence in the synthetic dataset. The estimators for each EWS are listed in S1 Table.
This dataset was generated using a stochastic SEIR model . The model incorporates demographic and environmental stochasticity as well as reporting error. Simulations were performed using the next reaction method . The population size fluctuated via births and deaths, with a stable mean population size N0 and a mean life expectancy of 75 years. Transmission due to external sources occurred at rate ζ . Latent and infectious periods were assumed to be exponentially distributed with a mean of 13 days and 6 days, respectively, values appropriate for mumps . Infection-derived immunity was assumed to be complete and lifelong. Time series were of length 20 years: a 10-year transient period with fixed parameters (not used for training the learning algorithm), followed by T = 10 years with varying parameters. For each time series, i, R0 followed a unique stochastic trajectory with initial value . We generated 2 types of data. Time series for an emerging pathogen were generated using a Brownian bridge process, with curvature determined by parameter κ. For these data, . We also generated data with no average trend in R0, such that ; these data were generated using an Ornstein–Uhlenbeck process. To ensure that for all t < T for emerging and nonemerging time series and that for the emerging time series, we ignored seasonality in transmission (a feature of all the diseases studied in this paper). The simulation algorithm returned time series of the reported number of new cases. These case counts were aggregated into weekly case reports, mimicking the practices of public health bodies. A negative binomial reporting error was applied to each weekly case report, with mean ρ . Model symbols and reaction rates are listed in S2 and S3 Tables, respectively. A total of 10,000 stochastic trajectories were generated, with an even split between emerging and nonemerging. We used Latin hypercube sampling so that each simulated time series had unique values for 5 parameters: initial population size (N0), reporting probability (ρ), import rate (ζ), the initial R0 (), and the volatility of the Brownian random walk (κ ). Parameter ranges for the Latin hypercube are given in S4 Table.
As a measure of emergence risk, we usedwhere wt is the weight applied to the i-th EWS, w0 is the intercept, and n = 8. We fitted Dt to the synthetic dataset using logistic regression with an ℓ1 -penalty ("lasso regression" ), treating each time point of each time series as an independent data point (see S1 Text for more details). Each data point was assigned equal importance in the fit—i.e., we did not prioritize classification accuracy for data points closer to the time of emergence. We used an ℓ1 -penalty both to prevent overfitting to the training data and as a means of feature selection . Our learning algorithm has 2 hyperparameters, the penalty strength p and the half-life t1/2 (used to calculate the EWSs), which were optimized using the AUC via 10-fold cross-validation  (see S1 Text). Using the optimized hyperparameters, we trained on the full synthetic dataset to get the optimum set of weights and intercept w0 for Dt. We selected a detection threshold c by calculating the ROC curve—a parametric plot of type I errors against type II errors as a function of detection threshold—and finding the threshold that minimizes the sum of the type I and type II error rates. We calculated the AUC through time by grouping the dataset by reporting week and then calculating the ROC for each group separately. The AUC was calculated from the ROC curve using the trapezoidal rule. We fitted Eq 2 to both case reports data and incidence data (calculated by scaling each simulated time series by its associated population size).
We sourced laboratory-confirmed mumps cases from Public Health England. Cases were disaggregated by specimen collection week and the respective LA. To preserve patient anonymity, each LA was assigned a unique integer identifier and every specimen week was shifted by the same constant. Because of the formation of new LAs during the time period, we restricted our analysis to the 157 LAs with cases of mumps prior to vaccination in 1988. We calculated emergence risk for each LA by grouping cases into 4-week reporting intervals and applying Eq 2, using the fit to the synthetic case reports. Emergence risk was calculated nationally by aggregating 4-week case reports from all LAs before calculating the EWSs.
For each LA, we calculated the outbreak size (total cases) in 2004 to 2005. To assign LAs as those with large and small outbreaks, we modeled outbreak size using a general mixture model. The model used was a mixture of 2 exponential distributions, , with rate parameters λ1 ≤ λ2. We fitted the model to the outbreak size data using maximum likelihood. An observed outbreak of size o was classified as large if ; otherwise, we classified it as small.
We obtained monthly pertussis case reports for each of the 48 contiguous states plus the District of Columbia from 1980 to 2000 . Using state-level population data , we converted case reports data into incidence data. We performed a linear regression on the log-transformed monthly incidence data. States were classified as either emerging or not emerging based on the significance of the slope using a 1-sided t test. We used a significance level of 0.05.
EWSs were calculated for each state. Emergence risk was then calculated with Eq 2, using the weights fitted to the synthetic incidence dataset. Performance was assessed by calculating the AUC, using the linear regression classification as the true classification. Emergence risk was also calculated using Eq 2 with weights fitted to the pertussis data instead of simulated data. For this fit, the linear regression classification of each state was used as the target in the logistic regression.
Daily case counts of bubonic plague in Madagascar were obtained from . Weekly serotype-resolved confirmed cases of dengue were made publicly available by the NOAA as part of the Dengue Forecasting project.
We thank E. B. O’Dea, J. M. Drake, and A. W. Park for helpful discussions informing the study, and J. M. Drake for valuable comments on the manuscript.