## Abstract

The rapid growth of population in urban areas is jeopardizing the mobility and air quality worldwide. One of the most notable problems arising is that of traffic congestion. With the advent of technologies able to sense real-time data about cities, and its public distribution for analysis, we are in place to forecast scenarios valuable for improvement and control. Here, we propose an idealized model, based on the critical phenomena arising in complex networks, that allows to analytically predict congestion hotspots in urban environments. Results on real cities’ road networks, considering, in some experiments, real traffic data, show that the proposed model is capable of identifying susceptible junctions that might become hotspots if mobility demand increases.

## 1. Introduction

Urban life is characterized by a huge mobility, mainly motorized. Amidst the complex urban management problems, there is a prevalent one: traffic congestion. Several approaches exist to efficiently design road networks [1] and routing strategies [2]; however, the establishment of collective actions, given the complex behaviour of drivers, to prevent or ameliorate urban traffic congestion is still at its dawn. Usually, congestion is not homogeneously distributed around all city areas but there are salient locations where congestion is settled. We call this locations congestion hotspots. These hotspots usually correspond to junctions and are problematic for the efficiency of the network as well as for the health of pedestrians and drivers. It has been shown [3] that drivers queuing in a traffic jam are the most affected individuals to car exhaust pollution inhalation. In addition, these hotspots are usually located in the city centre, magnifying the problem [4]. Assuming that congestion is an inevitable consequence of urban motorized areas, the challenge is to develop strategies towards a sustainable congestion regime at which delays and pollution are under control. The first step to confront congestion is the modelling and understanding of the congestion phenomena.

The modelling of traffic flows has been a prevalent hot topic since the late 1970s when Gipps’ model appeared [5]. Gipps’ model and other car-following models [6,7] have evidenced the necessity of modelling traffic flows to improve road network efficiency and also have shown how congestion severely affects the traffic flows. Since 10 years ago the complex networks community has also proposed stylized models to analyse the problem of traffic congestion in networks and design optimal topologies to avoid it [8–21]. The focus of attention of the previous works was the onset of congestion, which corresponds to a critical point in a phase transition, and how it depends on the topology of the network and the routing strategies used. However, the proper analysis of the system after the onset of congestion has remained analytically slippery. It is known that when a transportation network reaches congestion, the system becomes highly nonlinear, large fluctuation exists and the travel time and the number of vehicles queued at a junction diverge [16]. This phenomenon is equivalent to a phase transition in physics, and its modelling is challenging [22–24]. Here, we propose an idealized model to predict the behaviour of transportation networks after the onset of congestion. The presented model is analytically tractable and can be iteratively solved up to convergence. To the best of our knowledge, this is the first analytical model that is able to give predictions beyond the onset of congestion. We present the model in terms of road transportation networks but it could also be applied to analyse other types of transportation networks, such as computer networks, business organizations or social networks.

## 2. Transportation balance equations

To identify congestion hotspots in urban environments, we propose a model based on the theory of critical (congestion) phenomena on complex networks. The model, that we call the *microscopic congestion model* (MCM), is a mechanistic model (yet simple) and analytically tractable. It is based on assuming that the growth of vehicles observed at each congested node of the networks is constant. This usually happens in real transportation networks at the stationary state. The assumption allows us to describe, with a set of balance equations (one for each node), the increment of vehicles in the junction queues and the number of vehicles arriving or traversing each junction from neighbouring junctions. Mathematically, the increment of the vehicles per unit time at every junction *i* of the city, Δ*q*_{i}, satisfies the following balance equation
*g*_{i}(*t*) is the average number of vehicles entering junction *i* from the area surrounding *i* at time *t*, *σ*_{i}(*t*) is the average number of vehicles that arrive to junction *i* from the adjacent links of that junction and *d*_{i}(*t*)∈[0,*τ*_{i}] corresponds to the average number of vehicles that actually finish in junction *i* or traverse towards other junctions. Note that the value of *d*_{i} is upper-bounded by the maximum number of vehicles *τ*_{i} that can traverse junction *i* in a time step. This simulates the physical constraints of the road network. A graphical explanation of the variables of the model is shown in figure 1.

The system of equations (2.1) defined for every node *i*, is coupled through the incoming flux variables *σ*_{i}(*t*), that can be expressed as
*P*_{ji}(*t*) accounts for the routing strategy of the vehicles (probability of going from *j* to *i*), *p*_{j}(*t*) stands for the probability of traversing junction *j* but not finishing at *j* and *S* is the number of nodes in the network.

For each junction *i*, the onset of congestion is determined by *d*_{i}=*τ*_{i}, meaning that the junction is behaving at its maximum capability of processing vehicles. Thus, for any flux generation (*g*_{i}), routing strategy (*P*_{ij}) and origin–destination probability distribution, equations (2.1) can be solved using an iterative approach to predict the increase of vehicles per unit time at each junction of the network (Δ*q*_{i}(*t*)) (see §3). The only hypothesis we use is that the system dynamics has reached a stationary state in which the growth of the queues is constant. It is worth commenting here that the MCM model considers a fixed average of new vehicles entering the system *g*_{i}. However, *g*_{i} certainly changes during daytime, with increasing values in rush hours and lower values during off-peak periods. MCM can easily consider evolving values of *g*_{i} provided the time scale to reach the stationary state in the MCM (which is usually of the order of minutes in real traffic systems) is shorter than the rate of change in the evolution of *g*_{i} (which is usually of the order of hours for the daily peaks).

## 3. Microscopic congestion model

Let node *i* denote a road junction, edge *a*_{ij} the road segment between junctions *i* and *j*, *N*^{in}_{i} and *N*^{out}_{i} the sets of ingoing and outgoing neighbouring junctions of junction *i*, respectively, and *S* the number of junctions in the road network of the city. Incoming vehicles to node *i* at each time step can be of two types: those coming from other junctions *N*^{in}_{i} and those that start their trip with node *i* as their first crossed junction. We consider this second type of incoming vehicles as generated in node *i*. Our MCM describes at the stationary state the increment of the vehicles per unit time at every junction *i* of the city, Δ*q*_{i}, as
*σ*_{i} to node *i* in the stationary state as

As vehicles just generated in a certain node are not affected by the congestion in the rest of the network, we separate their contributions in the computation of probabilities *p* and *P*. Thus, we decompose *p*_{i} as
*i* (*p*^{gen}_{i}) whose destination is not *i* (*p*^{loc}_{i}) and the second term accounts for vehicles not generated in *i* whose destination is not *i* (*p*^{ext}_{i}). Supposing trips consist in travelling through two or more junctions, we have that *p*^{loc}_{i}=1. Probability *p*^{gen}_{i} is equal to the fraction of vehicles generated in *i* with respect to the total number of incoming vehicles
*p*^{ext}_{i} can be expressed in terms of the effective node betweenness *i* that arrive to node *i* at each time step)
*i* accounts for the expected number of vehicles each node *i* receives per unit time considering the routing algorithm and the overall congestion of the network. See §3.2 for an extended description and computation of the effective node betweenness

In the same spirit, we decompose the probability *P*_{ji} that a vehicle waiting in node *j* goes to node *i* as
*j* (*p*^{rgen}_{j}) that go to node *i* (*P*^{loc}_{ji}) and the second term to the routed vehicles not generated in *j* that go to node *i* (*P*^{ext}_{ji}). Similarly as before, *p*^{rgen}_{j} can be expressed as the rate between the vehicles generated in *j* and the total number of routed vehicles
*P*^{loc}_{ji} and *P*^{ext}_{ji} can be computed in terms of the normalized effective edge betweenness of the network
*j* and *j*. Equivalently to the effective node betweenness *E*^{loc}_{ji} and *E*^{ext}_{ji} corresponds to the classical edge betweenness. Moreover, *P*_{ji} is an exact expression before and after the onset of congestion.

Eventually, the MCM is composed of a set of *S* equations (Δ*q*_{i}=*g*_{i}+*σ*_{i}−*d*_{i}), one for each junction, and, in principle, a set of 2*S* unknowns, Δ*q*_{i} and *d*_{i} for each junction. To see that the system is indeed determined, we need to note that for congested junctions Δ*q*_{i}>0 and, thus, after the transient state, *d*_{i}=*τ*_{i}. For the non-congested junctions, we have that Δ*q*_{i}=0 and consequently *d*_{i}=*g*_{i}+*σ*_{i}. In conclusion, for any node *i*, either *d*_{i}=*τ*_{i} or *d*_{i}=*g*_{i}+*σ*_{i} which reduces the number of unknowns to *S*.

To solve the model given a fixed generation rate *g*_{i}, we start by considering that no junction is congested and we solve the set of equations equations (3.1)–(3.9) by iteration. It is possible that some nodes exceed their maximum routing rate. If this is the case, we set the node with maximum *d*_{i} as congested and we solve the system again. This process is repeated until no new junction exceeds its maximum routing rate.

### 3.1. Onset of congestion using the microscopic congestion model

Most of the works that consider static routing strategies assume that the generation rate of vehicles is the same for all nodes, *g*_{i}=*ρ*. In that case, it is possible to compute the critical generation rate *ρ*_{c} such that for any generation rate *ρ*>*ρ*_{c} the network will not be able to route or absorb all the traffic [25–30]. After this point is reached, the number of vehicles *Q*(*t*) in the network will grow proportionally with time, *Q*(*t*)∝*t*, as some of the vehicles get stacked in the queues of the nodes. This transition to the congested state is characterized using the following order parameter:
*Q*〉 represents the average increment of vehicles per unit of time in the stationary state. Basically, the order parameter measures the ratio between in-transit and generated vehicles.

In the non-congested phase, the number of incoming and outgoing vehicles for each node can be computed in terms of the node’s algorithmic betweenness *B*_{i} [25]. In particular,
*q*_{i}=0 for all nodes and consequently
*i* becomes congested when it is required to process more vehicles than its maximum processing rate, *d*_{i}>*τ*. Thus, the critical generation rate at which the first node, and so the system, reaches congestion is

This is one of the most important analytical results on transportation networks with static routing strategies. In the following, we show that we can recover equation (3.11) before the onset of congestion using our MCM approach. After substitution of the expression of the probabilities in equation (3.2)
*d*_{j}=*ρ*+*σ*_{j}), it simplifies to
*i* and then go to *i*. Each neighbour contributes with two terms, the paths that go through *j* coming from other nodes, and the paths that start in *j*.

### 3.2. Effective betweenness in congested transportation networks

The effective betweenness *i*, as defined in [25], accounts for the expected number of vehicles each node *i* receives per unit time. When the network is not congested and the vehicle generation rate *g*_{i} is equal for all nodes, *g*_{i}=*ρ*, the number of vehicles each node receives can be obtained using equation (3.11). However, if the network is congested, the traffic dynamics becomes highly nonlinear and the value of *σ*_{i} computed in equation (3.11) becomes a poor approximation.

Suppose we focus on a particular congested node *j** of the network. For *j**, being congested means that it is receiving more vehicles that the ones it can process and route. In particular, from the *σ*_{j*}+*g*_{j*} vehicles that arrive to the node, only *τ*_{j*} can be processed at each time step.

Therefore, the contribution to the effective betweenness *s*,*t*), that traverse the congested node *j** before reaching *i*, must be rescaled by the fraction of processable vehicles
*j**,*k**,…, the remaining fraction of paths that will reach the target node will be the result of the application of the multiple rescalings *s*_{x*}.

The computation of *s*_{j*} is not straightforward. In general, *σ*_{i} is not known after the onset of congestion and depends on the effective betweenness that requires, at the same time, to know the *s*_{j*} fraction for all congested nodes. Thus, an iterative calculation is needed to fit all the parameters at the same time as we do in our MCM.

The effective arrivals *i* that arrive at node *i* at each time step. This value in the non-congested phase can be obtained, considering homogeneous source and destination nodes, as
*e*_{i} and needs to be corrected accordingly using the same procedure presented above.

## 4. Results

To simulate the traffic dynamics of the road network, we assign a first-in-first-out queue to each junction that simulates the blocking time of vehicles before they are allowed to cross it and continue their trip. We suppose these queues have infinite capacity and a maximum processing rate that simulates the physical constraints of the junction. Vehicle origins and destinations may follow any desired distribution. In this work, we have considered two distributions: a random uniform distribution for the synthetic experiments, and one obtained considering the ingoing and outgoing flux of vehicles of the city of Milan. At each time step (of 1 min duration) vehicles are generated and arrive to their first junction. During the following time steps, vehicles navigate towards their destination following any routing strategy. Here, we have used two different routing strategies: shortest-path and random local search.

### 4.1. Congestion on synthetic networks

To evaluate the MCM, we have conducted experiments on several synthetic networks and with two different routing strategies: local search strategy and shortest path strategy. In both routing strategies we assume, for simplicity, that all vehicles randomly choose the starting and ending junctions of their journey uniformly within all junctions of the network. Thus, each junction generates new vehicles with the same rate *g*_{i}=*ρ*. For shortest path strategy, vehicles follow a randomly selected shortest path towards the destination. Without loss of generality, we fix *τ*=1 and analyse the performance of MCM for different values of *ρ*.

Figure 2 shows the accuracy on predicting the values of the order parameter *d*_{i} for shortest paths routing strategy. As in [25,30], this order parameter *η* corresponds to the ratio between in-transit and generated vehicles. In the electronic supplementary material, we extend the evaluation considering local search routing strategies and also evaluate the accuracy on predicting other variables of the model. All experiments show that the MCM achieves high accuracy in predicting the macroscopic and microscopic variables of the stylized transportation dynamics.

### 4.2. Application to real scenarios

INRIX Traffic Scorecard (http://www.inrix.com/) reports the rankings of the most congested countries worldwide in 2014. USA, Canada and most of the European countries are in the top 15, with averages that range from 14 to 50 h per year wasted in congestion, with their corresponding economic and environmental negative consequences. To demonstrate that the MCM model can be applied to real scenarios to obtain real predictions, in the following we apply the MCM model to the nine most congested cities according to the INRIX Traffic Scorecard (table 1).

We first focus on the city of Milan, the city with largest INRIX value. To evaluate the outcome of the MCM model, we first gather data about the road network topology using Open Street Map (OSM). OSM data represent each road (or way) with an ordered list of nodes which can either be road junctions or simply changes of the direction of the road. We have obtained the required abstraction of the road network building a simplified version of the OSM data which only account for road junctions (nodes). Then, for each pair of adjacent junctions we have queried the real travel distance (i.e. following the road path) using the API provided by Google Maps. The resulting network corresponds to a spatial weighed directed network [31] where the driving directions are represented and the weight of each link indicates the expected travelling time between two adjacent junctions (see electronic supplementary material, figure S21).

Second, we build up the dynamics of the model analysing real traffic data provided by Telecom Italia for their Big Data Challenge. The data provide, for every car entering the cordon pricing zone in Milan during November and December 2013, an encoding of the car’s plate number, time and gate of entrance (a total of 9 183 475 records). This allows us to obtain the (hourly) average incoming and outgoing traffic flow, for each gate of the cordon taxed area.

Given the previous topology and traffic information, we generated traffic compatible with the observations, and evaluated the outcome of the MCM model. Specifically, the simulated dynamics is as follows: for each vehicle entering the Area-C we fix a randomly selected location as destination and use the shortest path route towards it. After the vehicle has arrived to its destination, it randomly chooses an exit door and travels to it also using the shortest path route. This is similar to the well-known *Home-to-Work* travel pattern where vehicles arrive from the outskirts of the city, go to the city centre and then return to the outskirts. Specifically, in our simulation, traffic is generated in the peripheral junctions of the network, goes to a randomly selected junction within the city and then returns back to a randomly selected peripheral junction. We do not consider trips with origin and destination inside the city centre because public transportation systems (e.g. train or subway) usually constitute a better alternative than private vehicles for those trips. The maximum crossing rate of each junction *τ*_{i} accounts, among others, for the existence of traffic lights governing the junction, the width of the street as well as its traffic. We have not been able to get this information for the studied cities, and consequently we cannot set to each junction its precise value. Instead, without loss of generality and for the sake of simplicity, we set to all junctions the same maximum crossing rate, *τ*_{i}=15 (an estimation of the average of their real values).

Figures 3 and 4 show the obtained results. Figure 3*b* displays the predicted congestion hotspots on a map of Milan, panel (*a*) of the same figure shows a real traffic situation obtained with Google Maps. We see that the predicted congestion hotspots are located in the circular roads of Milan as well as on the arterial roads of the city; this agrees with the real traffic situation shown in panel (*a*). Figure 4*a* shows the distribution of the mean increments each junction has to deal with. This might be a good indicator to decide about future planning actuations to improve city mobility. However, differently from what is described in [11], the improvement of the throughput of a single junction might not be enough to improve city mobility because this might end up with the collapse of neighbouring junctions (their incoming rate *σ*_{i} will increase). This situation is similar to Braess’ paradox [33]. Figure 4*b* shows the mean increment of vehicles (in vehicles per minute) for each hour of the weekday. The figure clearly shows the morning and evening rush hours as well as the lunch time.

For the other top nine congested cities, we do not have previous traffic information, neither about the real flux of vehicles nor about the vehicle source and destination distributions (to obtain a fair comparison between all the analysed cities, we have not considered the Telecom traffic data for Milan here). Thus, for each city, we consider homogeneously distributed source and destination locations and the required road traffic to obtain an order parameter *η* compatible with the congestion effects recorded by INRIX sensing of real traffic. By relating the INRIX value and *η*, we are assuming that there exists a relation between the fraction of global congestion and the fraction of extra time wasted reported by INRIX. The obtained results are summarized in table 1, which shows that the number of hotspots is correlated with the INRIX value. This shows evidence that the percentage increase in the average travel time to commute between city locations is related to the number of congestion hotspots and the excess of vehicles within the city. The extracted road network topology for all the analysed cities as well as the predicted congestion hotspots is provided in electronic supplementary material, figures S1–S18.

## 5. Discussion

The previous results show that the MCM can be used to predict the local congestion before and beyond the onset of congestion of a transportation network. To the knowledge of the authors, this is the first analytical model that is able to give predictions beyond the onset of congestion where the system is highly nonlinear, large fluctuation exists and the number of vehicles on transit diverge with respect to time. Our model is based on assuming that the growth of vehicles observed in each congested node of the networks is constant, which allowed us to derive a set of balance equations that can accurately predict macroscopic, mesoscopic and microscopic variables of the transportation network.

Traffic congestion is a common and open problem whose negative impacts range from wasted time and unpredictable travel delays to a waste of energy and an uncontrolled increase of air pollution. A first step towards the understanding and fight of congestion and its related consequences is the analytical modelling of the congestion phenomena. Here, we have shown that the MCM model is detailed enough to give real predictions considering real traffic data and topology. These results pave the way to a new generation of stylized physical models of traffic on networks in the congestion regime that could be very valuable to assess and test new traffic policies on urban areas in a computer-simulated scenario.

## Data accessibility

Data available at Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.32mq0 [34].

## Authors' contributions

A.S.-R., S.G and A.A. contributed equally to the research and writing of the manuscript; A.S.-R. performed the experiments.

## Competing interests

We have no competing interests.

## Funding

This work has been supported by Ministerio de Economía y Competitividad (Grant FIS2015-71582-C2-1) and European Comission FET-Proactive Projects MULTIPLEX (grant no. 317532). A.A. also acknowledges partial financial support from the ICREA Academia and the James S. McDonnell Foundation.

## Footnotes

Electronic supplementary material is available online at https://dx.doi.org/10.6084/m9.figshare.c.3491559.

- Received February 8, 2016.
- Accepted September 8, 2016.

- © 2016 The Authors.

Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.