## Abstract

Identifying the key factors underlying the spread of a disease is an essential but challenging prerequisite to design management strategies. To tackle this issue, we propose an approach based on sensitivity analyses of a spatiotemporal stochastic model simulating the spread of a plant epidemic. This work is motivated by the spread of sharka, caused by *plum pox virus*, in a real landscape. We first carried out a broad-range sensitivity analysis, ignoring any prior information on six epidemiological parameters, to assess their intrinsic influence on model behaviour. A second analysis benefited from the available knowledge on sharka epidemiology and was thus restricted to more realistic values. The broad-range analysis revealed that the mean duration of the latent period is the most influential parameter of the model, whereas the sharka-specific analysis uncovered the strong impact of the connectivity of the first infected orchard. In addition to demonstrating the interest of sensitivity analyses for a stochastic model, this study highlights the impact of variation ranges of target parameters on the outcome of a sensitivity analysis. With regard to sharka management, our results suggest that sharka surveillance may benefit from paying closer attention to highly connected patches whose infection could trigger serious epidemics.

## 1. Introduction

Many factors impact the spread of a plant disease. Such factors include the biology of the different species involved in the pathosystem, the processes governing primary and secondary infections and, generally, landscape structure and composition [1]. Precisely identifying the key factors for disease spread is of prime interest for disease management, because they constitute a target for research efforts and control strategies [2,3]. However, the identification of the key factors of an epidemic through experiments rapidly becomes intractable, because there can be a huge number of candidate parameters, not counting their interactions. In addition, epidemics must be considered at a large spatiotemporal scale, which is rarely possible in experiments (but see e.g. [4]). Expert opinions are often simple, efficient and quick alternatives, but are by their nature prone to error. Finally, epidemiological models are an interesting approach because of their ability to test several scenarios, while circumventing the difficulties associated with experiments.

Global sensitivity analysis of a simulation model aims to assess the fraction of the variability of the output that can be attributed to each input parameter, and thus to rank these parameters by influence on the model output [2,5,6]. This method is based on a numerical experiment designed to explore the parameter space using numerous parameter combinations. In ecological modelling, possible values of each input parameter traditionally come from the literature or via expert knowledge, and are often summarized by bounded intervals [6]. Global sensitivity analysis has been very useful in identifying key factors behind the spread of different animal pathogens [7,8] and their vectors [9], as well as of invasive plants [10]. In plant epidemiology, some studies dealing with plant viruses [11–13] and fungi [14,15] used formal sensitivity analyses; however, the developed approaches did not allow the inclusion of all candidate epidemiological processes and interactions.

A panel of statistical methods has been proposed for global sensitivity analysis of deterministic models during the last 30 years. Available approaches mainly include graphical methods to explore marginal effects, the elementary effects method [16], variance-based methods or metamodelling, i.e. fitting of a mathematical function which approximates the model [5,17,18]. By contrast, few methods have been proposed for stochastic models [7,8,15], despite their wide use in ecology and epidemiology. Model stochasticity accounts for variability and uncertainty in the different biological processes [1]. Stochastic models are thus essential when performing epidemiological risk assessment.

The first aim of this study was to present a rigorous framework for the sensitivity analysis of a stochastic spatially explicit model simulating the spread of a plant disease, and to better understand the influence of different epidemiological parameters on model outputs. The second objective was to uncover the key parameters of epidemic spread. As a case study, this work focused on sharka, caused by *plum pox virus* (PPV, genus *Potyvirus*, family *Potyviridae*), one of the most damaging diseases affecting trees of the genus *Prunus* (e.g. peach, apricot and plum) [19,20]. Depending on virus–host (or virus–cultivar) interactions, fruits of symptomatic hosts can be unmarketable due to considerable alterations or premature drop [21]. The disease has dispersed worldwide owing to commercial exchanges of contaminated plant material [20]. In addition, PPV is naturally transmitted from infectious sources to healthy trees by many aphid species in a non-persistent manner [20,22,23]. According to this transmission mechanism, an aphid vector can acquire viral particles from an infected host and may inoculate the virus to a susceptible host for a short period of time [24]. Once inoculated—and after a latent period—new infected hosts constitute in turn viral sources for aphid-mediated transmission. For PPV, it is commonly assumed, and was recently shown with young peach plants, that the beginning of the infectious period occurs at the same time as symptom expression (i.e. the end of the incubation period) on the infected host [25,26]. Prevalence at introduction in orchards can be extremely variable depending on the type of contamination in nurseries. If only some plants intended for planting are contaminated by infectious aphids, few infected trees will be planted in orchards. On the contrary, if a tree used as propagation stock is infected, the whole batch of progeny plants intended for planting will be infected, resulting in a massive introduction at orchard planting.

An explicit spatiotemporal simulation model was previously developed and used to estimate biological parameters of sharka spread in a real landscape using data from PPV (M strain) epidemics in a French peach-growing area [27]. It is based on an SEIR architecture, i.e. each host can be classified in one of the following states: *S* (susceptible: healthy), *E* (exposed: infected but not yet infectious nor symptomatic), *I* (infectious and symptomatic), *R* (removed). Transitions between states are defined by stochastic equations. Through Markov chain Monte Carlo simulations within a Bayesian framework, this model enabled estimating the transmission coefficient (i.e. the transmission intensity per infectious tree), the duration of the latent period of PPV infection, and the dispersal kernel of infectious aphids [27]. In a landscape (i.e. in two dimensions), a dispersal kernel can be defined as the probability distribution of the position of a particle (e.g. an insect) after dispersal from the origin [28].

In the present study, this simulation-based estimation model was modified in order to perform two different sensitivity analyses, destined to highlight the impact of the variation ranges of the target parameters. Global sensitivity analyses generally only focus on a specific system and use narrow variation ranges for well-known parameters and wider variation ranges to encompass all possible values of less-known parameters (e.g. [29,30]). In contrast, here we intended to treat equally all target parameters by using wide and narrow variation ranges in two separate analyses. Thus, the first analysis (called ‘broad-range sensitivity analysis’ hereafter) was designed to assess the intrinsic influence of different epidemiological parameters on the overall behaviour of the model, ignoring any prior information on parameter values. The range of variation for each target parameter was very wide and covered, when possible, the whole definition domain. The second, sharka-specific, analysis benefited from available knowledge on the epidemiological parameters of PPV in southeastern France (published data and expert opinions were jointly used to obtain biologically reasonable bounds for the variation ranges of each parameter). On the one hand, the first analysis demonstrates the interest of sensitivity analyses based on Sobol's indices and metamodelling to explore the properties of a stochastic model. The second, sharka-specific, sensitivity analysis uncovers the most influential epidemiological parameters, which may require further knowledge from research work, or further action for sharka management in peach orchards.

## 2. Model description

The model is stochastic, orchard-based, with a discrete time step of 1 week denoted *t* (*t* = 1, … , *T*). Simulations run for a period of 35 years (from *a*_{0} = 1 to *a*_{f} = 35), which is a reasonable duration to assess the long-term impact of an epidemic in cultivated perennial plants.

### 2.1. Landscape and orchard dynamics

Orchard data were collected in a peach-growing area in southeastern France. The resulting database includes 34 years of information on 553 patches (i.e. pieces of land); the distance between the centroids of neighbouring patches ranges from 16 to 496 m (median: 55 m). On these patches, 597 peach orchards (i.e. homogeneous crop of a single cultivar with the same planting date) have been continuously cultivated (i.e. removed orchards are replanted during the next winter). In our model, we use the real landscape of 553 patches (electronic supplementary material, figure S1) which has been used previously to estimate most parameters of the model [27]. However, we simulate new orchard dynamics on the patches. To stabilize the mean age of the orchards at year *a*_{0} = 1, an initial orchard planting year is uniformly drawn between −39 and −20 for each patch. Then the following algorithm is iterated: (i) the orchard duration (after which the orchard is removed) is drawn from a Poisson distribution whose mean is *ψ* = 15 years; (ii) the year after orchard removal gives the planting year of the subsequent orchard on the same patch. For each orchard, a planting density is randomly drawn from the empirical distribution of the planting densities of the 597 orchards in the database.

### 2.2. SEI architecture

Each host is in one of the following three different health states (figure 1): *S* (susceptible), *E* (exposed) or *I* (infectious). In orchard *i* at time step *t*, let the variables *S _{i,t}*,

*E*and

_{i,t}*I*describe the respective number of hosts in each state (

_{i,t}*S*+

_{i,t}*E*+

_{i,t}*I*is the total number of living hosts). Based on the characteristics of non-persistent transmission, a viruliferous vector is considered non-infectious once it has probed (and may have transmitted the pathogen) on a first healthy host. Thus, the number of hosts being infected (i.e. moving from

_{i,t}*S*to

*E*) between

*t*and

*t*+ 1 in orchard

*i*is defined as follows:

*λ*the infectious potential incurred by each tree of the orchard due to the presence of infectious hosts inside the orchard as well as in other source orchards. This infectious potential is calculated as:

_{i,t}*β*is the transmission coefficient (i.e. the number of vectors leaving a given infectious host over one year and able to initiate an infection on a healthy host);

*α*is the normalized flight density (i.e. the proportion of annual flights that occur at time step

_{t}*t*); and

*m*is the connectivity between orchards

_{i′i}*i′*and

*i*. This connectivity gives the probability that a vector disperses from orchard

*i′*to orchard

*i*, and depends on the dispersal kernel of the vector. The division by

*S*+

_{i,t}*E*+

_{i,t}*I*allows the expression of

_{i,t}*λ*as a mean infectious potential per host in the recipient orchard.

_{i,t}Infected hosts become infectious (i.e. move from state *E* to state *I*) after a latent period whose duration is given by *A*_{2} the duration between the inoculation date and the end of the year. The minimal delay imposed by the term *A*_{2} was introduced because, in natural conditions, the latent period of sharka does not end within the growing season when the tree is inoculated [21,31,32].

It is assumed that trees in state *I* are not productive any more, due either to the absence of yield induced by the disease, or to the ban on fruit sales from symptomatic trees. However, the disease is not supposed to affect host lifespan (no available data reported any increase in peach mortality due to PPV), and this work is focused on disease spread in the absence of any control action (e.g. removal of diseased trees). Consequently, this model does not include an *R* (removed) state. Nevertheless, as defined by the algorithm used to simulate orchard turnover (see ‘Landscape and orchard dynamics’), orchards are regularly renewed (due to agricultural reasons, regardless of the disease) with trees in state *S* (or possibly *I*, see ‘Pathogen introduction’) (figure 1).

### 2.3. Pathogen dispersal

The calculation of the connectivity matrix, *m*, requires the integration of the dispersal kernel, *f*, over all the points *p*_{1} in orchard *i* (whose patch is represented by polygon *A _{i}*) and the points

*p*

_{2}in orchard

*i′*(whose patch is represented by polygon

*A*):

_{i′}*p*

_{1}and

*p*

_{2}. For a given dispersal kernel, this calculation is performed using the CaliFloPP algorithm [33]. However, its computational cost is huge, and consequently not affordable if the dispersal kernel varies among the different simulations of a sensitivity analysis. Therefore, the dispersal kernel, which is isotropic in this work (i.e. we assume that aphid dispersal is homogeneous in all directions), is defined as a weighted mixture of

*J*exponential distributions using the same approach as in [27]:

*J*= 20, the equation governing

*h*enables the use of exponential distributions whose means increase exponentially from 3 m to 4.5 km, i.e. from approximately the mean distance between two trees in an orchard to the dimension of the simulated landscape. To manipulate the weights of these

_{j}*J*exponential distributions via just two parameters (

*s*

_{1}and

*s*

_{2}), and to guarantee that they sum to 1, they are calculated by integrating the probability density

*g*(

*x; s*

_{1},

*s*

_{2}) of the variable

*J*sub-intervals consisting in a partition of the interval [0; 1] (noting that the probability density of a beta distribution integrates to 1 on the interval [0; 1], see also electronic supplementary material, figure S2). The diversity of possible shapes of the beta distribution allows a lot of flexibility to define the different weights (see electronic supplementary material, figure S3C). Thus, the connectivity matrix used in the simulations is a weighted mixture of

*J*= 20 matrices calculated beforehand with their respective exponential functions. The mean dispersal distance is:

### 2.4. Pathogen introduction

At *t* = 1, the pathogen is introduced for the first time at orchard planting. In addition, starting from *t* = 1, every orchard planting (whatever the patch) is subjected to a risk *Φ* of new introduction. For each introduction, the probability that a given host is infected at orchard planting, *τ*, is described by a weighted mixture of two beta distributions:
*p*_{MI} indicates the relative probability of massive introduction. Infected hosts at planting are placed directly in state *I*. This assumption is more parsimonious than those needed to simulate the latent period duration of these trees, whose inoculation date (in nurseries) and process (from viruliferous aphids or from infected propagation stock) are unknown. The mean number of infectious trees at planting (in case of introduction) is: *τ*_{1} and *τ*_{2} is such that

To control the connectivity of the patch where the first introduction occurs, this patch is selected based on its theoretical ‘outgoing connectivity’. For each patch *i′*, this parameter, denoted *κ _{i′}*, describes the mean number of infectious aphids that would leave patch

*i′*if all trees in this patch were infectious, and land in any other patch of the landscape:

*i′*of area

*A*, calculated using the reference planting density estimated from expert opinion:

_{i′}*μ*

^{th}= 666 trees.ha

^{−1}. Thus, the patch of first introduction is selected by choosing a quantile

*q*of the outgoing connectivity calculated beforehand for all patches of the landscape. One can note that the ranking of the patches according to

_{κ}*κ*depends on their respective area and connectivity with neighbouring patches, and that

*κ*reflects the potential of patch

_{i′}*i′*to initiate an epidemic.

### 2.5. Output variable

The age-dependent fruit yield of the hosts in *S* and *E* states in orchard *i* at year *a*, denoted *y _{i,a}*, is relative to the maximum yield (i.e.

*y*is a proportion). The impact of disease spread in the landscape between the beginning of the epidemic (at year

_{i,a}*a*

_{0}) and the end of the simulation (at year

*a*

_{f}) is summarized by the mean equivalent number of fully productive trees per hectare and per year (figure 1):

*A*

_{tot}= 523 ha, the total area covered by the patches.

## 3. Sensitivity analyses

A sensitivity analysis generally consists of four steps: (i) definition of the target parameters and their respective variation ranges and probability distributions; (ii) generation of a numerical experimental design to explore the parameter space; (iii) simulation and (iv) computation of sensitivity indices [5]. These four steps can be complemented with metamodelling, which provides further insights into the relation between the model input and output.

### 3.1. Target parameters and variation ranges

The sensitivity analyses targeted six parameters: the quantile of the outgoing connectivity of the patch of first introduction (*q _{κ}*), the probability of introduction (

*Φ*), the relative probability of massive introduction (

*p*

_{MI}), the transmission coefficient (

*β*), the expected value of the dispersal weighting variable (

*W*

_{exp}) and the expected duration of the latent period (

*θ*

_{exp}).

Only one parameter per epidemiological process was targeted in the sensitivity analyses, in order to easily rank these processes by influence. Moreover, this approach helps target parameters which vary independently from each other. Thus, the beta distribution of the dispersal weighting variable (*W*) was re-parametrized with its expected value, *W*_{exp}, and its variance, *W*_{var}, in order to vary only *W*_{exp} in the sensitivity analyses (*W*_{var} was kept fixed at its reference value; table 1). The shape parameters of *W* were then calculated by: *θ*_{exp}, and its variance, *θ*_{var}, and only *θ*_{exp} was targeted in the sensitivity analyses. The shape and scale parameters were, respectively, calculated by: *θ*_{var} was not held constant, because data on incubation periods of many infections show a strong correlation between the median or mean and the standard deviation (electronic supplementary material, Methods S1, table S1 and figure S10). With regard to PPV infection, we obtained a coefficient of 0.35 from the mean and standard deviation estimates of the latent period duration [27]. This is within the range of values for various diseases (electronic supplementary material, table S1) so the variance of the latent period duration was calculated by *θ*_{var} *=* (0.35 *×* *θ*_{exp})^{2}.

In the broad-range sensitivity analysis, target parameters were varied within their whole definition domain, or, if impossible, were bounded by values below and above which the epidemic process does not fundamentally change any more. Therefore, the variation ranges of *q _{κ}*,

*Φ*and

*p*

_{MI}were easily defined by the bounds of their definition domain (i.e. [0; 1]) (table 1). For

*W*

_{exp}this interval had to be restricted to generate only unimodal beta distributions for

*W*(see electronic supplementary material, figure S3C), and thus smooth dispersal kernels (see electronic supplementary material, figure S4). Parameter

*θ*

_{exp}was bounded by 1 day (i.e. a value close to 0 time step in our model) and 33 years (see electronic supplementary material, Methods S2). At this upper bound, less than 5% (on average) of the hosts infected at planting would become infectious before orchard removal. Finally, bounds for

*β*were calculated using a deterministic susceptible-infectious model in a single orchard where only one host was infected at planting (see electronic supplementary material, Methods S2). When

*β*was set at its lower bound, only one host was newly infected after 30 years; at its upper bound, 95% of the orchard was infected after 3 years (i.e. when the orchard starts to be productive).

In the sharka-specific sensitivity analysis, the variation ranges of the target parameters were informed by available knowledge on sharka epidemics in French peach orchards (table 1). Previous work enabled the estimation of *β*, *θ*_{exp} and *W*_{exp} [27]; thus, the bounds of these parameters were defined by their 99% credibility intervals. Since the first introduction could occur in any patch of the landscape, *q _{κ}* was varied inside its whole definition domain, as in the previous analysis. Bounds of

*Φ*and

*p*

_{MI}were assessed by expert opinion based on the available field data on introductions.

Because *κ* is calculated using *m*, *q _{κ}* also depends on

*W*

_{exp}. In order to independently vary

*q*and

_{κ}*W*

_{exp},

*κ*was calculated using a different connectivity matrix, denoted

*m′*, as follows. In the broad-range sensitivity analysis,

*m′*was generated using a simple uniform function:

^{2}zone centred on patch

*i′*and occupied by other orchards. In the sharka-specific sensitivity analysis, because

*W*

_{exp}varied slightly around its reference value,

*m′*was generated using the dispersal kernel

*f*(see ‘Pathogen dispersal’) with

*W*

_{exp}= 0.486 (i.e. the reference value). Thanks to these procedures, the six target parameters were sampled independently.

One can note that the variation of some of the target parameters (*p*_{MI}, *W*_{exp}, *θ*_{exp}) led to variation of whole probability distributions (electronic supplementary material, figures S3 and S4) used, in turn, as input in the model.

### 3.2. Calculation of sensitivity indices

Simulations were performed with 15 000 different parameter combinations generated with Sobol's sequences [34,35]. To take stochasticity into account, each combination was replicated 50 times. Let *Y* denote the output variable (i.e. the mean or standard deviation of these replicates), and *X _{i}* (

*i*= 1, … ,

*p*) the input parameters of the model. Sobol's indices are calculated for each

*X*as follows:

_{i}*X*denotes the whole set of parameters except

_{-i}*X*.

_{i}*X*alone;

_{i}*X*including all its interactions with other parameters (for details on the computation of Sobol's sensitivity indices, see electronic supplementary material, Methods S3 and [17,18,36]). First-order indices were estimated with Sobol–Saltelli's method [37,38] whereas total indices were estimated with Sobol–Jansen's method [37,39], because these two methods were shown to be good estimators of small first-order indices and (large and small) total indices, respectively. The 95% confidence intervals (CI

_{i}_{95}) of the sensitivity indices were estimated using 10 000 bootstrap replicates [40].

In each sensitivity analysis, key interactions were identified using polynomial regression. A third-degree polynomial including interactions restricted to polynomial terms of degree two was fitted to the means and standard deviations of the model output. Then, the most parsimonious model was obtained using a stepwise selection algorithm based on the Bayesian information criterion (BIC) [41]. Before model fitting, the target parameters were standardized to the mean and standard deviation of their sampling distribution, in order to easily interpret the estimated coefficients as the effect of one standard deviation change in each parameter on the output variable [2]. The first-order sensitivity indices of the metamodel are deduced from Sobol's decomposition (see details in electronic supplementary material, Methods S3):
*Z _{i}* is the standardized version of

*X*;

_{i}*b*

_{0}denotes the intercept;

*Z*,

_{i}*b*and

_{i,j}*b*denote the estimated coefficients of the products

_{i,j,k}*Z*and

_{i}.Z_{j}*Z*, respectively.

_{i}.Z_{j}.Z_{k}### 3.3. Simulations with fixed parameters

In order to better visualize the impact of the introduction parameters on epidemic spread, 100 simulations were performed for each of the following scenarios:

A. ‘Disease-free’: the pathogen is not introduced in the landscape;

B. ‘Single introduction of one tree’: the pathogen is introduced once at

*t*= 1 with only one infected host;C. ‘Single introduction of several trees’: the pathogen is introduced once at

*t*= 1 with possibly several infected hosts;D. ‘Weakly connected first introduction’: the pathogen is first introduced at

*t*= 1 in a weakly connected patch (*q*= 0.01), and with a risk_{κ}*Φ*at every subsequent orchard planting;E. ‘Moderately connected first introduction (reference)’: same as scenario D, but the patch of first introduction is moderately connected (

*q*= 0.50);_{κ}F. ‘Highly connected first introduction’: same as scenario D, but the patch of first introduction is highly connected (

*q*= 0.99)._{κ}

Except the introduction parameters, all the model parameters were fixed at reference values associated with sharka epidemics in French peach orchards, estimated from epidemiological data or expert opinion (table 1). For scenarios C–F, the probability that a tree is infected at introduction is given by *τ*, whose mean is 2.6% (using reference values for *ξ*_{11}, *ξ*_{12}, *ξ*_{21}, *ξ*_{22} and *p*_{MI}).

### 3.4. Software

The model was written in R and C languages. Within the R software v. 3.0.3 [42], Sobol's sequences and indices were obtained using the packages *fOptions* (v. 3010.83) and *sensitivity* (v. 1.11), respectively. The stepwise model-selection algorithm used the function *regsubsets* of the package *leaps* (v. 2.9). A simulation takes approximately 4 s with a regular computer (Intel^{®} Core™ i7-4600M). The overall exploration of the model was carried out on a computer cluster, which enabled the parallelization of the 50 stochastic replicates for each sensitivity analysis, each replicate requiring roughly one day to explore the 15 000 parameter combinations.

## 4. Results

### 4.1. Strong influence of latent period duration on the simulation model

In the broad-range analysis, the mean output over the stochastic replicates varied from 47 to 567 equivalent fully productive plants per hectare and per year. The sensitivity analysis showed that the expected duration of the latent period (*θ*_{exp}) had the strongest impact (figure 2*a*), with a total Sobol's index (SI_{tot}) of 0.81 (CI_{95}: 0.75–0.87). Next came the probability of introduction (*Φ*), the transmission coefficient (*β*) and the relative probability of massive introduction (*p*_{MI}), with SI_{tot} of 0.12 (0.11–0.14), 0.09 (0.07–0.10) and 0.08 (0.07–0.08), respectively. The last two parameters, i.e. the expected value of the dispersal weighting variable (*W*_{exp}), and the quantile of the outgoing connectivity of the patch of first introduction (*q _{κ}*), had a negligible influence on the output as shown by their very small SI

_{tot}. The first-order indices (SI

_{1}) were close to (and mostly not significantly different from) SI

_{tot}. In addition, the total variability associated with the main effects (assessed by the sum of the SI

_{1}) explained 94% of the variability in the output. These elements indicate that the simulated process involves few interactions. In order to characterize the relationship between parameters and the mean response, a polynomial metamodel was adjusted to the model output. According to its coefficient estimates, the most important interactions (figure 3

*a*) involved firstly

*β*and

*θ*

_{exp}(SI

_{2}= 0.03) and, secondly,

*Φ*and

*p*

_{MI}(SI

_{2}= 0.01). In addition to the good coefficient of determination (

*R*

^{2}= 0.96), the goodness-of-fit of this metamodel was confirmed by the estimates of the SI

_{1}, which did not differ from the indices estimated with Sobol's method by more than 0.01 (figures 2

*a*and 3

*a*).

The estimated metamodel was useful to predict the mean number of productive plants for each value of the target parameters (all other parameters being fixed at their mean value, figure 4*a*). The number of productive plants greatly increased with the duration of the latent period, until approaching the maximum (i.e. 565 fully productive plants per hectare and per year, in the absence of disease) for expected durations of the latent period above 15 years. Because this saturation phenomenon occurred for half of the simulations, graphics associated with the other parameters showed a lot of noise. Despite this noise, it is possible to see that increasing values of *β*, *Φ* and *p*_{MI} had a negative impact on the output, whereas *W*_{exp} and *q _{κ}* had a negligible effect, as already indicated by the sensitivity indices.

The analysis performed on the standard deviation of the output revealed that the model stochasticity was mainly due to many interactions of low influence (except the one between *θ*_{exp} and *Φ*), as shown by the low values of SI_{1} and of the interaction indices calculated through the metamodel (see electronic supplementary material, figures S5*a* and S6*a*). The model output was more variable when the latent period duration and the probability of introduction were low (see electronic supplementary material, figure S7a).

### 4.2. Importance of the patch of first introduction for sharka epidemics

In the analysis specific to sharka epidemics in peach orchards, the number of productive trees varied from 469 to 559 per hectare and per year. Parameter *q _{κ}* was the most influential, with an SI

_{tot}of 0.56 (0.52–0.60). Parameters

*θ*

_{exp},

*Φ*,

*β, p*

_{MI}and

*W*

_{exp}followed with SI

_{tot}of 0.30 (0.28–0.33), 0.17 (0.16–0.19), 0.13 (0.12–0.14), 0.10 (0.09–0.11) and 0.06 (0.06–0.07), respectively. Interactions had a small effect, the main effects of the target parameters explaining 87% of the variability in the mean number of productive trees (figure 2

*b*). These results were similar to the indices calculated through the metamodel (SI

_{1}did not differ from Sobol's indices by more than 0.09). The main (although weak) interaction was between

*θ*

_{exp}and

*q*, with an SI

_{κ}_{2}of 0.005 (figure 3

*b*,

*R*

^{2}= 0.83). Similarly to the broad-range analysis, the graphical representation of the influence of each target parameter on the output showed that the epidemic impact increased when

*q*,

_{κ}*Φ*,

*p*

_{MI}and

*β*increased, and when

*θ*

_{exp}decreased (figure 4

*b*).

Stochasticity in the sharka-specific analysis was due to the main effects of *θ*_{exp}, *q _{κ}*,

*p*

_{MI}and

*β*, and to many interactions, as revealed by the large differences between SI

_{tot}and the SI

_{1}for the six target parameters. Although these interactions were numerous, the metamodel showed that each one had a small sensitivity index (see electronic supplementary material, figure S5

*b*and S6

*b*). The number of productive trees was more variable when

*q*,

_{κ}*p*

_{MI}and

*β*increased, and when

*θ*

_{exp}decreased, i.e. when the epidemic was stronger (see electronic supplementary material, figure S7

*b*).

### 4.3. Simulation of different epidemic scenarios

Because the sharka-specific sensitivity analysis highlighted the role of introductions in epidemic spread, several epidemic scenarios were simulated to illustrate the impact of introduction parameters. In the absence of disease, a median number of 565 (558–572) fully productive trees were grown per hectare and per year (figure 5, scenario A). The number of productive trees was almost the same, 564 (554–572), when PPV was introduced only once through the planting of one infectious tree (scenario B). Epidemics slightly impacted the number of productive trees (556; CI_{95} = 538–568) when several infected trees were allowed to be planted (scenario C), showing the importance of prevalence at introduction. In more realistic scenarios, corresponding to repeated introductions of PPV [27], the mean number of fully productive trees dropped to 547 (499–568), 541 (498–564) and 509 (462–551) when the first introduction occurred in a weakly (scenario D), moderately (scenario E) or highly (scenario F) connected patch, respectively. These results illustrate not only the impact of multiple introductions on sharka epidemic spread, but also and most importantly of the location of the first introduction. Videos of epidemic spread in the landscape in scenarios B to F are available as electronic supplementary material, Videos S1–S5.

## 5. Discussion

### 5.1. Sobol's method and metamodelling complement each other

The first aim of this study was to investigate the combined use of Sobol's method [44,45] and polynomial metamodelling to identify key factors and interactions in a stochastic spatiotemporal model simulating epidemics among host plants. The parameter space was explored through a quasi Monte Carlo sampling, generated using Sobol's sequences because of their good space-filling and fast convergence properties. Sobol's method to calculate sensitivity indices benefited from numerous improvements of its precision and computational cost [37–39,44,46,47]. In our work, Sobol's method proved to be an efficient tool to rank parameters by influence on the output variable, and thus to identify potential targets for further research or disease management. Additionally, in the case of large differences between first-order and total sensitivity indices, this method reveals the importance of interactions, which must be further investigated for a careful management. In our approach, Sobol's method and polynomial metamodelling complemented each other. Both methods gave extremely close estimates of the first-order sensitivity indices. Sobol's method is more straightforward in the estimation of the total indices, as well as the CIs (figure 2). Polynomial metamodelling helped identify specifically which interactions are involved in the epidemic process, and visualize model behaviour in response to the variation of input parameters (figure 4).

The approach can be applied to many simulation models, and is especially useful for stochastic models. To account for stochasticity, some authors replicated simulations for a few selected parameter combinations to characterize the variance or the distribution of the output variable [48,49]. In another approach, every parameter combination of the simulation design was replicated in such way that an extra sensitivity index could be calculated by ANOVA to describe the proportion of the total variability attributable to the variability among parameter combinations [8,15]. Here, we adopted an approach developed originally with non-spatial epidemic models [7,29]. Stochasticity was accounted for by replicating simulations for each parameter combination and by performing sensitivity analyses on the mean and standard deviation of the replicates. However, the total number of simulations (including the different parameter combinations and the stochastic replicates per combination) grows as a power of the number of parameters targeted in the sensitivity analysis. This curse of dimensionality necessitates a compromise between the resolution of parameter space exploration, the precision of the model output and the total computational cost [37,50]. In this study, the parameter space was thoroughly explored using 15 000 different parameter combinations, and each combination was replicated 50 times because this number was largely sufficient to get robust estimates of the mean and standard deviation associated with each combination (not shown).

### 5.2. Impact of the variation range of the target parameters

Our second aim was to uncover the key drivers of plant epidemics, using sensitivity analysis. The choice of variation ranges for the target parameters has a great impact on the results of such analysis. In the first sensitivity analysis, we used the broadest possible ranges of values for the target parameters, ignoring any prior information. This analysis is thus indicative of the relative intrinsic influence of the epidemiological processes on model behaviour in the absence of knowledge on parameter values. Our results showed that the latent period duration is the most influential parameter of the model. Depending on the latent period duration, the impact of the simulated epidemic may be completely different, from insignificant and stable (when the latent period lasts more than 15 years) to important and highly variable (if it lasts less than 5 years). It is very likely that the threshold distinguishing insignificant epidemics from others (i.e. approximately 15 years) is linked to the mean orchard duration. In order to test whether the high influence of the latent period in the broad-range sensitivity analysis was just due to its very wide variation range, we replicated this analysis with smaller upper bounds for the mean duration of the latent period (from 5 to 33 years, see electronic supplementary material, figure S8). This additional information demonstrates that the results are qualitatively robust to changes in this upper bound. One may also note that the most significant interaction involves the mean latent period and the transmission coefficient (figure 3*a*), which may explain why the latter becomes the most influential parameter for epidemics whose latent periods are below 7 years (see electronic supplementary material, figure S8).

In the second sensitivity analysis, the variation range of all target parameters focused on the best available values regarding sharka epidemiology in peach orchards. These variation ranges defined a parameter space which had been poorly explored in the previous sensitivity analysis (figure 4*a*, thick line on the *x*-axis), in spite of the large number of tested parameter combinations. Within this sharka-specific analysis, whatever the parameter combination the mean equivalent number of productive trees was significantly lower than what could be obtained in the absence of disease. This encourages the application of control measures. Here, in contrast with the broad-range analysis, we show that the connectivity of the patch of introduction is the most influential parameter on sharka epidemics. Consequently, the outgoing connectivity of a given patch of the landscape can be considered as an index of the risk to initiate an epidemic, and sharka control may include for example preventive and regular surveys of these patches.

Because this index of epidemic risk depends on patch connectivity, it was surprising to observe a low influence of the dispersal kernel (represented by the parameter *W*_{exp}) in both the broad-range and sharka-specific analyses. On the contrary, mean dispersal distance was shown to be a key driver of spread of fungal pathogens [14] and invasive plants [10]. In our broad-range analysis, one may think that the low impact of the dispersal kernel might be due to the fact that the variance of the weighting variable (*W*_{var}) was held constant, which may result in dispersal kernels with similar shapes. Nevertheless, variation of the dispersal kernel covered a wide range of distances (see electronic supplementary material, figure S4). Thus, another explanation may be the patchiness of the real agricultural landscape that we used for the simulations (see electronic supplementary material, figure S1). Indeed, short-distance dispersal resulted in very slow disease spread, and long-distance dispersal resulted in a higher proportion of dispersal events ending outside the orchards (where hosts are absent). This is an interesting property of fragmented agricultural landscapes. In the sharka-specific analysis, the low influence of the dispersal kernel can be explained in part by the narrow range of variation in *W*_{exp}. To investigate the impact of the parameter variation ranges, we performed a sensitivity analysis similar to the sharka-specific analysis except that *W*_{exp} varied as in the broad-range analysis. This resulted in a very high sensitivity index for *W*_{exp} (see electronic supplementary material, figure S9). This exemplifies how the sensitivity indices depend on the extent of the variation ranges of the target parameters. As another illustration of the influence that the variation range of a parameter can have, a model simulating *tomato yellow leaf curl virus* (TYLCV) epidemics was almost insensitive to the latent period duration [11], whereas it was the first and the second most influential parameter in our broad-range and sharka-specific analyses, respectively. This difference can be explained by the fact that our output variable (the number of productive hosts) included latently infected hosts whereas the output in [11] did not, but also by the fact that the variation range of the latent period duration was narrower in the TYLCV study. These elements, and the strong differences between the two analyses performed in the present study, highlight the importance of identifying appropriate ranges of variation for the target parameters in a sensitivity analysis. In particular, our results show that the parameters with the greatest intrinsic influence on overall behaviour of the model are not necessarily the key drivers of epidemic spread in a specific situation. Therefore, in different pathosystems and epidemiological contexts, the variation ranges, and especially their width, must be re-examined in order to identify relevant targets for research efforts and management strategies.

### 5.3. Genericity of the model

In this study, an existing spatiotemporal stochastic model of sharka spread, based on an SEIR architecture, was modified in order to perform sensitivity analyses. This model corresponds to vector-borne diseases of perennial plants (small turnover of crops, long latent period, pathogen introductions at crop planting), but its parametrization and flexibility enable the study of various horizontally transmitted plant infections, including wind-dispersed epidemics of annual crops. Nevertheless, some assumptions of the model are specific to sharka and should be re-examined for other diseases. In particular, for vector-borne diseases transmitted in a persistent manner, infectious vectors are able to transmit the pathogen to several healthy hosts. Thus, modelling such pathosystem needs to account for vector movement throughout their infectious period, which generally does not start immediately after acquisition. The resulting dispersal function may thus have a completely different shape. We also considered that latently infected hosts were still fully productive, whereas infectious individuals had a null productivity (no yield, or sales ban), and that the disease did not trigger host death (which influences the duration of the infectious period); such assumptions do not apply to all diseases.

Our study uncovered the key role played by highly connected introduction patches on sharka spread. This result highlights that paying close attention to these patches might prevent serious epidemics. Our next aims are to enrich this model with various control strategies and simulate various realistic landscapes in order to more directly help improve disease management.

## Data accessibility

Data and code are available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.b8kb0 [51].

## Authors' contributions

L.R., S.D., E.J., S.S. and G.T. designed the project; L.R. and D.R.J.P. performed the research; L.R., C.B., E.J., S.S. and G.T. analysed the data; and L.R., S.D., E.J., S.S. and G.T. wrote the manuscript. All authors gave final approval for publication.

## Competing interests

All authors declare they have no competing interests.

## Funding

L.R. was supported by a DGA-MRIS scholarship, and this work was partly funded by the FP7 SharCo programme and FranceAgriMer.

## Acknowledgements

The authors thank G. Labonne and S. Grizard for constructing the reference database, SRAL RA, and FREDON 30 (especially J. Aymard) for providing surveillance data and for their expertise in prunus cultivation, F. Bonnot for stimulating statistical discussions, L. Barrett for reviewing the manuscript, A. Bouvier for technical support with CaliFloPP, the INRA BioSP bioinformatics platform and particularly L. Houde for providing support.

## Footnotes

Electronic supplementary material is available online at https://dx.doi.org/10.6084/m9.figshare.c.3957709.

- Received September 22, 2017.
- Accepted December 1, 2017.

- © 2018 The Authors.

Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.