ARBO: Arbovirus modeling and uncertainty quantification toolbox

The ongoing pandemic of COVID-19 has highlighted the importance of mathematical tools to understand and predict outbreaks of severe infectious diseases, including arboviral diseases such as Zika. To this end, we introduce ARBO, a package for simulation and analysis of arbovirus nonlinear dynamics. The implementation follows a minimalist style, and is intuitive and extensible to many settings of vector-borne disease outbreaks. This paper outlines the main tools that compose ARBO, discusses how recent studies of the Brazilian Zika outbreak have explored the package’s capabilities, and describes its potential impact on future work in mathematical epidemiology.


Introduction
Arboviruses have been emerging in different parts of the world over the past several decades. Tropical countries seem to be more susceptible to these diseases because their weather favors the proliferation of mosquitoes [1]. The last update by the Pan American Health Organization reported that more than 1.6 million cases of arboviral diseases occurred in the Americas between the first and 42nd epidemiological weeks of 2020 [2]. Even though COVID-19 has of late dominated the agendas of global health institutions, the grave and harmful effects of arboviruses should not be neglected [3,4]. Indeed, after the connection between Zika infection and microcephaly in newborns was revealed, the World Health Organization declared the disease a global medical emergency.

To simulate infectious diseases transmitted by mosquitoes, double population compartmental models represent the cross-infection dynamics between humans and vectors [5-8]. Widely explored in the literature, this kind of homogeneous model provides a good balance between interpretability and simplicity. In this vein, the ARBO package was developed to model and analyze the Zika virus outbreak that occurred in Brazil in 2016 [9], but can be easily adapted to guide studies in other regions and for other arboviruses such as Dengue or Chikungunya. Moreover, the package is written and organized in a didactic form, allowing it to also be used for educational purposes.

The code (and data) in this article has been certified as Reproducible by Code Ocean (https://codeocean.com/). More information on the Reproducibility Badge Initiative is available at https://www.elsevier.com/physical-sciences-and-engineering/computer-science/journals.

Software details
ARBO provides a toolbox for vector-borne dynamics on a host population, encompassing: (i) an initial value problem (IVP) solver; (ii) an inverse problem module for (deterministic) model calibration, potentially using real data; (iii) a model enrichment module which constructs a discrepancy operator to compensate for epistemic deficiencies (uncertainties) in the structure of the mathematical model; and (iv) an uncertainty quantification module which propagates the aleatory uncertainties from the model parameters and initial conditions through the mathematical model to its response. The package is a didactic and intuitive tool written in Matlab and C++ languages. Each module in the package is briefly described in Fig. 1.

Initial Value Problem:
In the IVP module, an SEIR-SEI double population compartmental model [9,10] displays the time series evolution for several groups of interest in the host and vector populations. One-population models, like the classic SIR model, could also be implemented and would be compatible with the other modules; however, arbovirus studies require that both populations (human and vectors) be modeled since cross-infection causes most transmission. Additionally, to easily compare with real data, the cumulative number of infectious humans and the number of new cases are shown. The routines allow for simple user control of the time unit, so that the response can match available real data as usually displayed in epidemiological bulletins (days, weeks, months, etc.).
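
The structure of such an initial value problem can be illustrated with a minimal sketch. ARBO itself is written in Matlab; Python is used here only for illustration, and nothing below reproduces the package's actual interface. The equations follow a common SEIR-SEI formulation (the exact form used by ARBO is given in [9,10] and may differ), and the function names, parameter names, rates, and initial conditions are all assumptions chosen for the example. The `days_per_unit` argument mimics the user control of the time unit described above.

```python
# Minimal SEIR-SEI sketch in pure Python. Names, equations, and values are
# illustrative assumptions, not ARBO's actual interface or calibrated inputs.

def seir_sei_rhs(state, p):
    """Time derivative of [Sh, Eh, Ih, Rh, Sv, Ev, Iv, C].

    Humans follow SEIR, vectors follow SEI, and C accumulates every human
    who becomes infectious.
    """
    Sh, Eh, Ih, Rh, Sv, Ev, Iv, C = state
    Nh = Sh + Eh + Ih + Rh                  # total humans (conserved)
    Nv = Sv + Ev + Iv                       # total vectors (conserved)
    new_h = p["beta_h"] * Sh * Iv / Nv      # vector-to-human infections
    new_v = p["beta_v"] * Sv * Ih / Nh      # human-to-vector infections
    return [
        -new_h,
        new_h - p["alpha_h"] * Eh,
        p["alpha_h"] * Eh - p["gamma"] * Ih,
        p["gamma"] * Ih,
        p["delta"] * (Nv - Sv) - new_v,     # vector births balance deaths
        new_v - (p["alpha_v"] + p["delta"]) * Ev,
        p["alpha_v"] * Ev - p["delta"] * Iv,
        p["alpha_h"] * Eh,
    ]


def rk4_step(f, y, h, p):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = f(y, p)
    k2 = f([yi + 0.5 * h * k for yi, k in zip(y, k1)], p)
    k3 = f([yi + 0.5 * h * k for yi, k in zip(y, k2)], p)
    k4 = f([yi + h * k for yi, k in zip(y, k3)], p)
    return [yi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def simulate(y0, rates_per_day, n_units, days_per_unit=7, substeps=70):
    """Return the state at the end of each reporting unit (weeks by default)."""
    p = {name: r * days_per_unit for name, r in rates_per_day.items()}
    h = 1.0 / substeps
    y, out = list(y0), [list(y0)]
    for _ in range(n_units):
        for _ in range(substeps):
            y = rk4_step(seir_sei_rhs, y, h, p)
        out.append(y)
    return out


# Assumed rates (per day) and initial state; vectors are given as proportions.
rates = {"beta_h": 0.25, "alpha_h": 0.17, "gamma": 0.13,
         "beta_v": 0.25, "alpha_v": 0.11, "delta": 0.09}
y0 = [999980.0, 10.0, 10.0, 0.0, 0.999, 0.0005, 0.0005, 10.0]

traj = simulate(y0, rates, n_units=52)
cumulative = [s[7] for s in traj]
new_cases = [b - a for a, b in zip(cumulative, cumulative[1:])]
print("cumulative infectious humans after 52 weeks:", round(cumulative[-1]))
print("peak weekly new cases:", round(max(new_cases)))
```

Differencing the cumulative series at the reporting boundaries gives the number of new cases per unit time, which is the quantity usually tabulated in epidemiological bulletins.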

Model Calibration:
This module solves an inverse problem that takes one model response, the number of new cases per unit time, and compares it with experimental data. The best set of parameters (inputs), i.e., the parameter values for which the model response most closely reproduces the epidemiological data, is found via a nonlinear least squares minimization of the errors. In particular, the optimization uses the Trust-Region-Reflective method [11], which requires initial guesses for the parameters and allows for user-provided minimum and maximum values of each input.
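
A bounded nonlinear least squares fit of this kind can be sketched as follows. The example is not ARBO's calibration routine: it uses SciPy's `least_squares` with `method="trf"` (a Trust-Region-Reflective implementation) as a stand-in, fits only two transmission rates, and replaces real surveillance data with synthetic weekly counts generated from assumed parameter values plus noise. All names, rates, bounds, and initial guesses are assumptions made for the illustration.

```python
import numpy as np
from scipy.optimize import least_squares

# Rates per week held fixed during calibration (assumed values).
ALPHA_H, GAMMA, ALPHA_V, DELTA = 1.2, 0.9, 0.8, 0.6
Y0 = [999980.0, 10.0, 10.0, 0.0, 0.999, 0.0005, 0.0005, 10.0]


def rhs(y, beta_h, beta_v):
    """SEIR-SEI right-hand side for [Sh, Eh, Ih, Rh, Sv, Ev, Iv, C]."""
    Sh, Eh, Ih, Rh, Sv, Ev, Iv, C = y
    new_h = beta_h * Sh * Iv / (Sv + Ev + Iv)
    new_v = beta_v * Sv * Ih / (Sh + Eh + Ih + Rh)
    return np.array([
        -new_h, new_h - ALPHA_H * Eh, ALPHA_H * Eh - GAMMA * Ih, GAMMA * Ih,
        DELTA * (Ev + Iv) - new_v, new_v - (ALPHA_V + DELTA) * Ev,
        ALPHA_V * Ev - DELTA * Iv, ALPHA_H * Eh])


def weekly_new_cases(theta, n_weeks, substeps=10):
    """Model response: new cases per week, integrated with fixed-step RK4."""
    beta_h, beta_v = theta
    h, y = 1.0 / substeps, np.array(Y0)
    cumulative = [y[7]]
    for _ in range(n_weeks):
        for _ in range(substeps):
            k1 = rhs(y, beta_h, beta_v)
            k2 = rhs(y + 0.5 * h * k1, beta_h, beta_v)
            k3 = rhs(y + 0.5 * h * k2, beta_h, beta_v)
            k4 = rhs(y + h * k3, beta_h, beta_v)
            y = y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        cumulative.append(y[7])
    return np.diff(cumulative)


def residuals(theta, data):
    """Misfit between the model response and the observed weekly counts."""
    return weekly_new_cases(theta, len(data)) - data


# Synthetic stand-in for surveillance data: assumed "true" rates plus 5% noise.
rng = np.random.default_rng(0)
n_weeks = 40
data = weekly_new_cases([1.5, 1.4], n_weeks) * (1.0 + 0.05 * rng.standard_normal(n_weeks))

x0 = np.array([1.0, 1.0])                               # initial guess
lower, upper = np.array([0.1, 0.1]), np.array([5.0, 5.0])  # user bounds
initial_cost = 0.5 * np.sum(residuals(x0, data) ** 2)

fit = least_squares(residuals, x0, bounds=(lower, upper), method="trf",
                    args=(data,), max_nfev=50)
print("estimated (beta_h, beta_v):", fit.x)
print("cost before and after:", initial_cost, fit.cost)
```

When only case counts are observed, the two transmission rates may be weakly identifiable individually, so the fitted values should be judged by how well the response reproduces the data rather than by the individual estimates alone.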

Model Enrichment:
To address the (epistemic) modeling errors present in the SEIR-SEI compartmental model, an embedded discrepancy operator improves the model response using extra information about the existing state variables (such as state or derivative information). This strategy was first established for chemical kinetics systems [12] and then adapted for epidemiological applications [7]. Note that the quantity of interest is only modified indirectly; that is, the model equations are enriched, which in turn affects any model outputs. This method is in contrast to many discrepancy models which directly adjust the model output to the data, and is consistent with the philosophy that model errors should be corrected via modeling hypotheses.
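
The idea of correcting the equations rather than the output can be illustrated with a small sketch. The operator below is hypothetical: it adds to each state equation a term that is linear in that state and in its derivative, which is one simple way to use "state or derivative information". The operator actually constructed by ARBO is described in [7,12] and may take a different form. All names, rates, and coefficients are assumptions for the example.

```python
# Hypothetical embedded-discrepancy sketch (pure Python). The linear form of
# the correction and every coefficient below are assumptions for illustration.

ALPHA_H, GAMMA, ALPHA_V, DELTA = 1.2, 0.9, 0.8, 0.6   # per week, assumed
BETA_H, BETA_V = 1.5, 1.4                              # per week, assumed


def base_rhs(y):
    """Baseline SEIR-SEI equations for [Sh, Eh, Ih, Rh, Sv, Ev, Iv]."""
    Sh, Eh, Ih, Rh, Sv, Ev, Iv = y
    new_h = BETA_H * Sh * Iv / (Sv + Ev + Iv)
    new_v = BETA_V * Sv * Ih / (Sh + Eh + Ih + Rh)
    return [-new_h, new_h - ALPHA_H * Eh, ALPHA_H * Eh - GAMMA * Ih, GAMMA * Ih,
            DELTA * (Ev + Iv) - new_v, new_v - (ALPHA_V + DELTA) * Ev,
            ALPHA_V * Ev - DELTA * Iv]


def enrich(rhs, kappa, lam):
    """Embed dx_i/dt = f_i(x) + kappa_i * x_i + lam_i * dx_i/dt in the model.

    Solving for the derivative gives dx_i/dt = (f_i + kappa_i * x_i) / (1 - lam_i),
    so each lam_i must differ from 1.
    """
    def enriched(y):
        return [(f + k * x) / (1.0 - l)
                for f, k, x, l in zip(rhs(y), kappa, y, lam)]
    return enriched


def cumulative_cases(rhs, y0, n_weeks, substeps=20):
    """Quantity of interest C(t), with dC/dt = ALPHA_H * Eh left uncorrected."""
    def full(z):                                  # z = 7 states followed by C
        return rhs(z[:7]) + [ALPHA_H * z[1]]
    h, z = 1.0 / substeps, list(y0) + [y0[2]]
    for _ in range(n_weeks * substeps):           # fixed-step RK4
        k1 = full(z)
        k2 = full([a + 0.5 * h * b for a, b in zip(z, k1)])
        k3 = full([a + 0.5 * h * b for a, b in zip(z, k2)])
        k4 = full([a + h * b for a, b in zip(z, k3)])
        z = [a + h / 6.0 * (p + 2.0 * q + 2.0 * r + s)
             for a, p, q, r, s in zip(z, k1, k2, k3, k4)]
    return z[7]


y0 = [999980.0, 10.0, 10.0, 0.0, 0.999, 0.0005, 0.0005]
zeros = [0.0] * 7
kappa = [0.0, -0.2, 0.0, 0.0, 0.0, 0.0, 0.0]      # assumed correction on Eh only

c_base = cumulative_cases(base_rhs, y0, 30)
c_zero = cumulative_cases(enrich(base_rhs, zeros, zeros), y0, 30)
c_enriched = cumulative_cases(enrich(base_rhs, kappa, zeros), y0, 30)
print("baseline:", round(c_base), "enriched:", round(c_enriched))
```

With all coefficients set to zero the enriched model reduces to the baseline. With a nonzero coefficient on a single state equation, the cumulative number of cases changes even though its own equation is untouched, which is the indirect modification of the quantity of interest described above. In practice the coefficients would be calibrated against data rather than fixed by hand.
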

Uncertainty Quantification:
This module analyzes, through a set of statistical measures, how parameter and initial condition uncertainties are propagated to the model output [13-15]. The input PDF and possible correlations therein can thus be observed, while the output PDF reveals critical information about the outbreak, such as a specified confidence interval for the total number of cases. Such analysis has significant importance for prediction and quantitative assessments, since it is very difficult to correctly estimate biological values for arbovirus parameters. Furthermore, outbreaks usually start to be documented long after their actual onset, so initial conditions are typically subject to large uncertainty as well.
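
Forward propagation of this kind can be sketched with plain Monte Carlo sampling. The example below is an illustration, not ARBO's uncertainty quantification module: the uniform input distributions, their ranges, the fixed rates, and all function names are assumptions, and the interval reported is a simple empirical percentile interval over the samples.

```python
import random

ALPHA_H, GAMMA, ALPHA_V, DELTA = 1.2, 0.9, 0.8, 0.6   # per week, assumed
NH = 1.0e6                                             # human population, assumed


def rhs(y, beta_h, beta_v):
    """SEIR-SEI right-hand side for [Sh, Eh, Ih, Rh, Sv, Ev, Iv, C]."""
    Sh, Eh, Ih, Rh, Sv, Ev, Iv, C = y
    new_h = beta_h * Sh * Iv / (Sv + Ev + Iv)
    new_v = beta_v * Sv * Ih / (Sh + Eh + Ih + Rh)
    return [-new_h, new_h - ALPHA_H * Eh, ALPHA_H * Eh - GAMMA * Ih, GAMMA * Ih,
            DELTA * (Ev + Iv) - new_v, new_v - (ALPHA_V + DELTA) * Ev,
            ALPHA_V * Ev - DELTA * Iv, ALPHA_H * Eh]


def total_cases(beta_h, beta_v, ih0, n_weeks=52, substeps=10):
    """Cumulative infectious humans after n_weeks, using fixed-step RK4."""
    y = [NH - 2.0 * ih0, ih0, ih0, 0.0, 0.999, 0.0005, 0.0005, ih0]
    h = 1.0 / substeps
    for _ in range(n_weeks * substeps):
        k1 = rhs(y, beta_h, beta_v)
        k2 = rhs([a + 0.5 * h * b for a, b in zip(y, k1)], beta_h, beta_v)
        k3 = rhs([a + 0.5 * h * b for a, b in zip(y, k2)], beta_h, beta_v)
        k4 = rhs([a + h * b for a, b in zip(y, k3)], beta_h, beta_v)
        y = [a + h / 6.0 * (p + 2.0 * q + 2.0 * r + s)
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
    return y[7]


def monte_carlo(n_samples, seed=0):
    """Sample uncertain inputs, run the model, return sorted total case counts."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_samples):
        beta_h = rng.uniform(1.2, 1.8)     # assumed input range
        beta_v = rng.uniform(1.1, 1.7)     # assumed input range
        ih0 = rng.uniform(5.0, 50.0)       # uncertain initial condition
        totals.append(total_cases(beta_h, beta_v, ih0))
    return sorted(totals)


def percentile(sorted_values, q):
    """Empirical quantile taken at the nearest index, with q in [0, 1]."""
    return sorted_values[int(round(q * (len(sorted_values) - 1)))]


totals = monte_carlo(200)
mean = sum(totals) / len(totals)
low = percentile(totals, 0.025)
median = percentile(totals, 0.5)
high = percentile(totals, 0.975)
print("mean total cases:", round(mean))
print("95% percentile interval:", round(low), "to", round(high))
```

The sorted samples approximate the output distribution, from which summary statistics and intervals follow directly. A few hundred samples give only a rough interval; a convergence check on the estimated statistics would normally guide the sample size.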

Impact overview
The recent worldwide outbreaks of arboviruses increased the demand for relevant applied research on mathematical epidemiology [16]. The ARBO package may help in the quantitative and qualitative simulation of possible scenarios for future outbreaks. The code is easy to use and adapts readily to a wide range of model configurations, as shown in previous studies [6,7,9]. While these studies focused on the Brazilian outbreak, they showed how to construct robust analyses for a general epidemic under several sources of uncertainty. Together with the present article, they can guide users in extending this methodology to future epidemics, whether through real, predicted, or exploratory simulations.
Besides the research contributions, the ARBO package is currently adopted by courses on Uncertainty Quantification and Nonlinear Dynamics at Rio de Janeiro State University, in Brazil. Furthermore, a previous version of this code was used as a baseline structure for the modeling module of EPIDEMIC: Epidemiology Educational Code [17], a pedagogical toolbox for teaching introductory mathematical epidemiology. In this way, the ARBO package has inspired the development of additional tools for mathematical epidemiology research, each tailored to complement the existing background of mathematicians, physicists, engineers, and many other STEM professionals who may be interested in this area.

Final remarks
ARBO is a computational package for vector-borne dynamics of infectious diseases that provides a start-to-finish analysis of an outbreak model, including simulation, model calibration, model enrichment, and uncertainty quantification. Built in a cohesive and intuitive way, the package can be adapted for a diverse range of two-population compartmental disease models. Observational data can be directly imported for investigation and selection of suitable parameters (inputs). The different modules account for multiple types of uncertainties, as well as errors stemming from model hypotheses, in a comprehensive manner, allowing for robust predictions and analysis over multiple epidemic scenarios. The package has been used successfully in previous studies of real epidemics and also as a didactic tool for students of mathematical epidemiology and nonlinear dynamics.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.