## Abstract

In this work, we explore epidemiological dynamics by the example of tuberculosis in Russian Federation. It has been shown that the epidemiological dynamics correlates linearly with the virulence of *Mycobacterium tuberculosis* during the period 1987–2012. To construct an appropriate model, we have analysed (using LogLet decomposition method) epidemiological World Health Organization (WHO) data (period 1980–2014) and obtained, as result of their integration, a curve approximated by a bi-logistic function. This fact allows a subdivision of the whole population into parts, each of them satisfies the Verhulst-like models with different constant virulences introduced into each subsystem separately. Such a subdivision could be interconnected with the heterogeneous structure of mycobacterial population that has a high ability of adaptation to the host and strong mutability.

## 1. Introduction

Tuberculosis remains a main problem for public health resulting in new TB cases and deaths [1] due to high development of multi- and extensively drug resistant (MDR and XDR) *Mycobacterium tuberculosis* (*MbT*). The scale of the tuberculosis epidemic with multi-drug resistance (MDR-TB) is huge; according to the World Health Organization (WHO) report, 450 000 patients with MDR-TB were registered in 2012 year. Also, according to the WHO, in 92 countries were registered the cases with extensively drug-resistant TB (XDR-TB) that is the most hard form of disease leading to the lethal outcome. As was reported by the Department of Public Health and Communications of Ministry of Healthcare of the Russian Federation, the incidences with sensitive form of tuberculosis decreased in 2015 comparing with 2014, but MDR-TB cases were observed to rise among patients for those whose tuberculosis was detected for the first time. Thus, MDR-TB incidence increased from 4.6 to 5.2 per 100 thousand of population, and the proportion of patients with MDR-TB among persons with bacterial excretion at pulmonary tuberculosis was 47.3%. At the same time, the bacterial load of respiratory material of patients with MDR/XDR-TB does not differ from that in the drug sensitivity tuberculosis, indicating the viability of the pathogen, even with a wide range of its resistance to anti-TB drugs and high epidemiological risks of such strains [2].

Mathematical modelling of tuberculosis development and expansion could be helpful at the diagnosis and prognoses at the TB treatment on the population-level (for review, see [3]). The basis for most models recently developed is the SIR model [4] which includes different types of subpopulations (susceptible, infected and recovered) and describes transmission dynamics between them. There are relatively simple models taking into account uninfected, latent TB and active TB persons and more detailed models that include subdivisions to TB strains: drug sensitive and drug resistant allowing to describe different ways for the infection and the recovery in population in different countries [3,5–8].

To describe a trend in tuberculosis mortality using data for a long period of time, Holloway *et al.* [9] have proposed an approach based on the data fitting by logistic curves without introduction of any detailed mathematical model. Such technique allowed to characterize the dynamics of tuberculosis decline and to show that social aspects in TB prevention, namely, nutrition, hygiene, education about TB, etc., have an impact on reducing the incidence of tuberculosis in a population in most countries and could be a serious strategy in the fight against the disease.

At the same time, a not so long-term dynamics, which has non-monotone character, is still a very challenging problem. One of the key questions here is not only social but microbiological factors affecting such dynamics as well as their combination (for review, see [10,11]). In fact, the key parameter, a knowledge of which allows the conclusion, whether or not an epidemic outbreak occurs, is the basic reproduction number

Thus, the primary goal of our work is to reveal a possible time-dependent functional composition which adequately approximates actual time dynamics of tuberculosis outbreak in Russia during last three decades reflected in WHO data; to discuss a model, which provides such dynamics, as well as possible control factors, which govern the multimodal TB epidemic activity in Russia detected over several last decades; as well as to discuss an origin of the same, qualitatively noted for *M. tuberculosis* in laboratory studies of the same period.

## 2. Model and method

First of all, we need to determine a class of functional dependencies, which are suitable for the data fitting. As a most natural candidate we consider logistic curves, which can be obtained as certain approximate solutions of the the three-compartmental Kermack–McKendrick SIR model [4]
*S* and *I* mean the number of susceptible and infected persons, respectively. The constant *k* is the parameter determined by the infection properties, e.g. by virulence of *MbT* strains, and social (a contact rate) reasons.

The variable *R*, which can be defined as
*t*]. Note that we do not connect this variable with a number of recovered/removed persons, as has been done in the original model interpretation, but rather as a generalized number of persons who cannot be further considered as susceptible to *primary* TBC infection.

In fact, we will use below the data (WHO and laboratory data) obtained for patients with tuberculosis detected for the first time, which usually is accompanied by active bacterial excretion to environment. Therefore, the main goal of the hospital’s treatment is to stop an excretion and then the patient is transferred to out-patient treatment. After cessation of bacterial excretion, the patient is either completely cured—in this case, complete decomposition of the inflammation foci or fibrosis of the destruction areas occur in the lungs—or passes into the state of the chronic process, which is characterized by the unclosed inflammation foci that can result in the tuberculosis relapse. The relapse is not considered as first time detected. Whence, the parameter *τ* is attributed to the average duration of an active phase of tuberculosis, as it is conventionally interpreted while using SIR model and its generalizations for TB modelling [7,8,12]. Its combination with the parameter *k*, described above, gives the basic reproduction number

The system (2.1)–(2.3) is supplied with the conservation law
*R*_{0}=*R*(0)=0 at the very start of an epidemic.

As a result, the system (2.1)–(2.3) with (2.5)–(2.6) substituted, reduces to one nonlinear ordinary differential equation

Following Kermack & McKendrick [4], it is possible to introduce the expansion
*kτ* (see, for example, the discussion of applicability of such simplification to modelling plague and influenza outbreaks in [17]).

However, in contrast to the original [4], where an arbitrary *S*_{0}=*S*(0) is considered that is used now as a conventional approach [18], we use the normalization *const*.=1 in equation (2.5) and take into account that *I*_{0}≪*S*_{0}, i.e. *S*_{0}≅1 for this norm. As a result, equation (2.7) reduces to Verhulst’s logistic equation

Such approach has several advantages useful for the analysis of realistic medical data. It operates with lesser number of adjustment constants that results in better stability of numerical algorithms; the solution of equation (2.8) is a relatively simple logistic function, in contrast to the conventional case of an arbitrary *S*_{0} [4,17], which requires the hyperbolic tangent function.

At the same time, this reduced representation keeps the basic features, which are required to reproduce an epidemic outbreak: equation (2.8) has two stationary points: the unstable *R*_{s1}=0 and the stable *R*_{s2}=2(*kτ*−1)/(*kτ*)^{2}. As a result, *R* takes its values within the interval *R*∈[*R*(0)>0,*R*_{s2}] and monotonously grows since d*R*/d*t*≥0 within this interval.

In particular, the second derivative
*R*=*R*_{s2}/2, which implies that *R* curve has an inflection point and the number of actively infected persons *I* has a unique maximum there due to its definition as a derivative of *R*, see (2.3). Thus, a shape of outbreak will be described.

We need to note that one needs to take a non-zero (although it can be arbitrary small) initial condition *R*_{0}=*R*(0) to solve (2.8). It is a consequence of the simplification applied to the exact equation (2.7). However, this fact does not prevent usage of the logistic curve

Thus, we will use further the logistic function as main building blocks for the analysis of recorded medical statistics. But these data (see blue circles in figure 1; their origin will be described in detail below) differ from a single pure logistic growth. For this reason, we will apply the so-called LogLet transform proposed in the work [19] for the decomposition of complex non-stationary growth processes into a series of saturated partial components. Such an approach was proposed in the work cited for a decomposition of the data representing some economic process, such as a sequential replacement of product models on a market. This situation has a certain similarity to the case of change of prevailing microbial strains presented among a population within the concept of so-called ‘Evolution of microbial markets’ [20].

If the full cumulative number of people suffering from the infection will be considered as composed by several subgroups subjected to different *k*_{i}, *τ*_{i}, then the total evolution of *R*(*t*) can be represented as a sum of solutions (2.9) supplied with indices *i* denoting individual logistic components,
*d*_{i} is determined by the choice of initial background epidemic level mentioned above.

Note also that the standard basic solutions (2.9) allow the coordinate alteration, which gives a straight line in semi-logarithmic coordinates
*F*_{i}(*t*)=*R*_{i}(*t*)/*κ*_{i} in the fraction above and expressing the logarithmic term in the denominator there as

Being applied to the full processed data

## 3. Data and results of their processing

All data explored were taken from the European health for all database (HFA-DB) presented on WHO/Europe’s portal [21]: incidence of tuberculosis per 100 000 for Russia within the time range 1980–2014. They are shown in figure 1 as blue circles. To evaluate their LogLet decomposition, the two-stage numerical procedure was used. At the first step, we applied the specialized online software `Loglet Lab 3.0` [22] to obtain first estimations for all coefficients in the series (2.10). The input data were represented as the cumulative sum
*j* denotes a time point of the current time *t*_{j}.

At the second step, the obtained coefficients *d*_{i}, *a*_{i}, *b*_{i} *k*_{i} were additionally adjusted pointwise to the processed data. The final values obtained at this stage are given in the caption to figure 1, which shows the function (2.10) with these parameters as the red solid curve. One can note quite accurate visual reproduction of the data dynamics that is confirmed by the statistical analysis: the correlation coefficient between actual and fitted data is equal to 98.8%.

Figure 1 demonstrates also two individual components of (2.10): green and black dashed lines correspond to indices *i*=1 and *i*=2, respectively, as well as the decomposition of the processed data into the partial processes obtained as difference of the total *R*_{j} and the complementary partial fitted components (they are shown as markers).

One can see that these partial processes actually resemble logistic growth with saturation (the correlations between the component’s data and their fits are 98.47% and 99.97% from these components.) This conclusion is supported by the Fisher–Pry transform too (figure 2): both components are fitted by straight lines. Moreover, this representation allows clear distinguishing between the time ranges for the prevalence of each partial logistic process.

This time-separated linear components show the coexistence time period between components is quite small. This behaviour may originate from the fact that tuberculosis is a persistent infection widely represented in population but subjected to mutations of *MbT* that correspond to the larger values of the parameters *a*_{2} and *κ*_{2} in comparison with *a*_{1} and *κ*_{1} characterizing ‘old traditional’ tuberculosis. The former is introduced into the population relatively recently and is more aggressive, while the dynamics of previously prevailing strain is practically saturated, see the green curve in figure 1. Some medical evidence of these conclusions will be given in the next section.

Thus, the results of the direct bi-logistic fitting medically recorded data argue that we deal with a replacement of the main infection agent rather than with the competitive process.

Finally, figure 3 presents the dynamics model incidences obtained via the definition (2.3) with the substituted fitting function (2.10). One can see that it catches the principal bimodal character of the actual data. Owing to the construction of the model, these feature can be explained as a presence of two sequential outbreaks. The first one practically decays within the considered time interval, while the second one only started before 1980 and reached its maximum around 2005.

## 4. Discussion and conclusion

The data processing of TB incidences evaluated based on the bi-logistic combination of the simplified SIR models shows that this models reproduces the observed time dynamics with sufficiently high accuracy. It reveals the detailed background of the non-monotonic dynamics as a coexistence of two SIR processes with a time lag and different basic reproduction numbers

It should be pointed out that such transient behaviour is not unique among the time courses of infectious diseases revealed by mathematical modelling. For example, a study [23] of the classic example of localized epidemic outbreak, the plague in Eyam village in 1665–1666, shows the picture quite similar to our figure 1, which has been interpreted as a two-phase process, where the second phase corresponded to the more deadlier (i.e. more virulent) form of the disease. This also allows the discussion of the change in virulence data of TB obtained in laboratory investigations (they are shown in figure 3 as asterisks).

It should be noted that experimental data on the correlation between strains with high and low virulence were obtained using various models, both in animals (subcutaneous, intracranial infection of guinea pigs and also intravenous infection of mice) and on the THP-1 cell line. Each value (asterisks in figure 3) is the percentage of strains with high virulence with respect to the total number of strains of the pathogen investigated during this period; these data have been obtained from patients of the clinics of the St Petersburg Research Institute of Phthisiopulmonology in 1987–2012.

Recently, it is argued [13] that the pathogen virulence has a multifaceted interconnection with the values of the basic reproduction number

Moreover, in addition to the classic coevolution model of host–virus interaction [25] (see also [10]), which connects virulence and

It should be noted that tuberculosis is the result of the dynamic and multiple-factor interaction of the host and the pathogen with specific genetic and phenotype characteristics for each of them. At the same time, the virulence of a particular strain population isolated from a specific host (macroorganism) during disease is determined by the realization of the pathogenic potential of the cells of this population, their adaptation to the internal environment of the given host. The host–pathogen relationships are realized at different stages: from molecular (genes, proteins, receptors, etc.) to population level. Each level has its own spatial volume and duration of interaction of macro- and microorganisms, while each subsequent level is characterized by the extension of time and space. For example, adhesion of *M. tuberculosis* on the surface of a human cell caused by the interaction of surface molecules of the cell membrane of the pathogen and receptors of the epithelium of the respiratory tract or macrophages is carried out simultaneously and in a volume determined by the molecular level. Bacteria phagocytosis, inflammation and then destruction in the lung or other tissue and finally systemic changes in macro-organism can last over a long period of time (days–months). Coevolution, or interaction between the population of strains of the pathogen circulating in a certain region and the host population, characterized by certain socio-, geo- and ethnic characteristics, continues for decades; *Homo sapiens–Mycobacterium tuberculosis* relationship lasts a thousand years. Of course, the host–pathogen relationships are realized at all levels simultaneously, where all levels interact in a mutual manner. Hence, we would like to clarify that the definition of ‘virulence’ is more correct to apply for the host–pathogen relationship only at the level of the macroorganism/pathogen, but for the population level, which is studied by the methods of molecular epidemiology, it would be better to suggest the concept of pathogen virulent potential of phylogenetic line/subpopulation/cluster. It should be emphasized that the host–pathogen relationship in tuberculosis is affected by the geno- and phenotypic heterogeneity of the populations of both players of this process [27,28].

This is our case, when two subpopulations are considered susceptible to two possible groups of mycobacteria characterized by different virulence reflected in *M. tuberculosis* belonging to the Beijing genetic line, one of the most representative lines on the territory of the Russian Federation. In the years 2004–2006, their population reached 44.4% in the Pskov region [29], 34.0% in central Russia [30], 50.0% in Tuva republic [31], 71.0% in Eastern Siberia [32]. In general, members of this family are distinguished by a high transmission [33], association with multi-drug resistance [34,35] and the ability to cause disease, where it should be noted that the variability of pathogenic characteristics are associated with genetic heterogeneity of strains within bacterial subpopulations [36]. The most successful clone (subline) of the Beijing family on the territory of Russia is B0/W148 [37,38]. For example, in the territory of the Republic of Karelia has been observed an insignificant decrease of the strains fraction of the Beijing family from 67.0% in 2006 to 55.1% in 2012, while the fraction of clonal cluster B0/W148 has increased from 22.9% to 34.9%, respectively. In St Petersburg (the neighbouring region), these ratios remained unchanged. At the same time, the strains of the B0/W148 cluster were 18.0% (2006) and 37.6% (2012) of the number of drug-resistant strains isolated in the territory Karelia. A high degree of association of this subline with drug resistance has been obtained also for strains in St Petersburg [37,39].

Note, respectively, that the ratio of these subgroups (black and green lines in figures 1–3) presented in the whole population changes with time that gives time-varying data for the laboratory determined virulence [40]. Thus, such subdivision opens the question for more detailed characterization and classification of pathogens and subjects of their action in microbiological studies that can be considered as a future perspective challenge.

## Data accessibility

There is no supporting material or special data since all equations, parameters used and algorithm are presented in the main text of the work and could be reproduced by anyone. Thus, the paper contains complete self-sufficient information and does not need any additional data files.

## Authors' contributions

A.I.L. and E.B.P. developed model, simulated it numerically and wrote the manuscript. O.A.M. and B.I.V. took part in the problem statement and interpreted results from medical point of view.

## Competing interests

The authors declare no competing interests.

## Funding

A.I.L. and E.B.P. are partially supported the Ministry of Education and Science of the Russian Federation within research project no. 3.9499.2017/8.9 included in the basic part of research funding assigned to Kursk State University.

- Received August 1, 2017.
- Accepted August 10, 2017.

- © 2017 The Authors.

Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.