## Abstract

Because of increasing global urbanization and its immediate consequences, including changes in patterns of food demand, circulation and land use, the next century will witness a major increase in the extent of paved roads built worldwide. To model the effects of this increase, it is crucial to understand whether possible self-organized patterns are inherent in the global road network structure. Here, we use the largest updated database comprising all major roads on the Earth, together with global urban and cropland inventories, to suggest that road length distributions within croplands are indistinguishable from urban ones, once rescaled to account for the difference in mean road length. Such similarity extends to road length distributions within urban or agricultural domains of a given area. We find two distinct regimes for the scaling of the mean road length with the associated area, holding in general at small and at large values of the latter. In suitably large urban and cropland domains, we find that mean and total road lengths increase linearly with their domain area, differently from earlier suggestions. Scaling regimes suggest that simple and universal mechanisms regulate urban and cropland road expansion at the global scale. As such, our findings bear implications for global road infrastructure growth based on land-use change and for planning policies sustaining urban expansions.

## 1. Introduction

Modern civilizations developed along with road networks, simple and efficient systems designed to colonize free land, improve human mobility and move goods among locations. Today, the road system grooves and fragments the 19 million hectare surface of the Earth with more than 14 million km of paved surface. At the backbone of human colonization, road expansion embodies a complex blend of economic growth and often unsustainable development [1,2]. From an economic perspective, road expansion has typically been associated with economic growth, poverty reduction [3] and urbanization processes [4]. However, roads cutting through ecosystems may cause severe environmental degradation, like habitat fragmentation and biodiversity loss, facilitating urban sprawl and efficient deforestation [5,6]. Pressure from global population growth with the resulting increase in food demand [7,8] and from the ongoing global urban transition [9,10] will foster a massive road expansion in the upcoming decades. It has been estimated that the total paved length will increase by an additional 25 million km by 2050 [5]. Controlling such an expansion will be of crucial importance for global environmental conservation strategies and sustainable agricultural development. Yet despite the fundamental role of road expansion in global human–environment relations and some attempts made to reconcile their double-edged consequences [5], a quantitative and exhaustive description of the structure of the global road network (GRN) necessary to model this expansion is currently missing.

Statistical laws governing road networks have been extensively explored [11,12]. Such studies investigated issues such as scaling [13–16], road centrality [17], evolution [4,18–20] and urban sprawl [21], within the general domain of complex networks analysis [22]. However, such efforts have focused on urban road networks, neglecting connections among nodes serving areas dedicated to non-urban land uses. Here, by coupling a detailed dataset on the global road network with global land-use inventories, we provide an analysis of the structure of the network of major roads as of year 2015 and examine its dependence on land use. Such analysis is carried out by studying the distributions of road lengths and their scaling relationships.

## 2. Results

The GRN has been extracted from commercial vectorial maps of major roads on the Earth that are used mostly for navigation and cartography (see Material and methods). The GRN contains the major roads network for 2015 organized in four hierarchies: primary roads with limited access (H1), primary roads with non-limited access (H2), secondary roads (H3) and local roads (H4). The GRN does not contain local urban roads but only the major ones; therefore, the GRN can be used to analyse the major road infrastructure but not the urban morphology related to urban block size and form. The total road length deduced from the GRN database (14 522 470 km in 2015) is much larger than that estimated from a dataset recently used for global road environmental impact estimation (gROADS) [5,23], which accounts for a total road length of 7 644 410 km. The gROADS dataset comprises only hierarchies H1, H2 and H3 for the year 2009. Comparing gROADS with the correspondent subset of GRN, one can estimate an annual growth of 5.7% between 2009 and 2015. However, by considering all GRN road hierarchies (H1∪H2∪H3∪H4), up to 30% of the total road length at the global scale was not represented in previous analyses.

Starting from the original database, we produced the GRN as a *primal road network* [24], in which nodes are the road junctions and links are road segments, and each link carries a weight indicating its length (*l*). To extract the final GRN network, we removed street junctions connecting only two roads and split each link, regardless of hierarchy, at any intersection with three or more roads. Defining three layers of major land uses—urban, cropland and seminatural—we labelled each road with the land use it belongs to. These three land-use classes, corresponding to the three main land uses on the Earth, are extensively used in the global land-use literature [25]. We extracted the global urban footprint for 2013 from night-time lights (NTLs) [26] and cropland from a recent global cropland inventory for 2005 [27]. We then assigned each road to three mutually exclusive land-use classes as follows: roads have been labelled as urban (*U*) if they totally belong to urban areas, cropland (*C*) if they totally or partially overlap a cropland area and seminatural (*Sn*) if they are free of any agricultural or urban land use (that is, they are not dominated by direct human presence, e.g. roads crossing remote areas or natural parks). Figure 1*a* shows a global overview of the GRN and the three classes. It is important to note that the adopted labelling methodology allows us to avoid multiple land-use associations and treat croplands differently, as a single road segment can connect more than one cropland unit. A detailed description of the dataset along with an atlas of detailed visualizations and spatial analysis methodologies are provided in Material and methods and electronic supplementary material. Key to our spatial inventory at the global scale [8,25], we show that the threshold used to discriminate land-use classes, such as illumination threshold, does not affect our main results (see electronic supplementary material). Moreover, considering that cropland had very little growth in the last 10 years [8], data from different years should not affect the main results.

We here focused on a seemingly simple yet fundamental road network feature: the lengths (*l*) of road segments [29,30]. The GRN in the year 2015 spanned a total length of 14 522 470 km divided into more than 3 million road segments, with the mean road (segment) length independent of land use 〈*l*〉=4.8 km (with standard deviation (s.d.) 8.9 km).

The *U*, *C* and *Sn* classes cover, respectively, 12%, 37% and 51% of the GRN by road length as shown in figure 1*b*. These percentages depend mildly on the threshold levels used to discriminate between classes (see Material and methods and electronic supplementary material). The mean road lengths in the different land-use classes are 〈*l*〉_{U}=1.2 km (s.d. 1.3 km), 〈*l*〉_{C}=7.4 km (s.d. 9.0 km) and 〈*l*〉_{Sn}=7.0 km (s.d. 12.0 km). The large standard deviation values highlight an important feature of road length distributions: these distributions are heavy-tailed and potentially reminiscent of power laws [31], and thus are not grouped tightly around a typical value. The mean road length in different land-use classes highlights a second feature of these distributions: *C* and *Sn* roads are generally longer those *U* roads. Indeed, the distribution of *U* road lengths is very different from that of *C* and *Sn* segment lengths. Expectedly, cities encompass shorter streets than agricultural or seminatural areas. But apart from these different mean lengths, how fundamentally different are these distributions? Are they possibly rescaled versions of the same universal distribution, obtained by varying only the mean road length? To test this hypothesis, we examined whether road length distributions can be described by a finite-size scaling form [32–34]:
*l*〉 is the mean road length in the land-use class of interest and the function *F*(⋅) is identical across classes. Normalization of the distribution requires that the exponents *γ* and *α* must satisfy *α*=1/(2−*γ*) [34]. Furthermore, *F*(*x*) must satisfy appropriate limiting behaviours as *x*→0 and *p*(*l*) by data collapse [28], i.e. by plotting *l*^{γ}*p*(*l*) versus *l*/〈*l*〉^{α} for each land-use class separately and varying *α* until the curves describing each class collapse onto the same curve, thus providing a plot of the scaling function *F*. Figure 1*c* shows the plot of *p*(*l*) for each land use and figure 1*d* shows the resulting plot with *α*=1 and *γ*=1 (see Material and methods for details on probability distribution collapse). We find that road length distributions associated with urban and cropland classes collapse onto nearly the same curve, whereas the road length distribution associated with seminatural areas fails to do so. Different definitions of urban boundaries and different methods for the classification of roads crossing land-use boundaries led to indistinguishable results (see electronic supplementary material). Therefore, road length distributions in urban and cropland areas share the same fundamental structure at the global scale, despite large differences in the mean road length.

Equation (2.1) describes the ensemble distributions of road lengths obtained by grouping roads belonging to urban and cropland patches of different extents. Following previous approaches [35,36], we fragmented urban land use into separate components by means of spatial contiguity of urbanized and cropland cells, extracting all urban and cropland patches on the Earth (see Material and methods). As in our global analysis, we labelled each road with the land use, as well as the area of the patch it belongs too. We then study the road length distribution as a function of patch area for urban and cropland patches (figure 2*a*,*d*), finding different scaling behaviours in urban patches compared to cropland ones as explained below.

We found that the mean road length 〈*l* | *A*〉_{U} within an urban patch of area *A* increases sublinearly with *A*, 〈*l* | *A*〉_{U}∝*A*^{δ} with *δ*=0.41±0.02 (mean ± s.e. estimated via least squares fit of log-transformed data) for areas below *A*_{UT}≃4×10^{7} m^{2}, above which it remains relatively constant (figure 2*b*, inset). The distribution *p*_{U}(*l* | *A*) of urban road lengths conditional on the urban patch area *A* (figure 2*b*) appears to be well described by the scaling form:
*G*_{U} is a scaling function with suitable properties [34], as verified by data collapse (figure 2*a*,*b*). For each patch, we computed the total road length *L* and found that the mean total road length in urban patches increases sublinearly with *A*, 〈*L*〉_{U}∝*A*^{β} with *β*=0.62±0.01 (again, via least squares fit of log-transformed data) below *A*_{UT}, above which it becomes effectively linear (*β*=1.06±0.02) (figure 2*c*). As 〈*l* | *A*〉_{U} is relatively constant above *A*_{UT}, the linear scaling of 〈*L*〉_{U} implies that the mean total number of roads increases linearly with *A* above *A*_{UT}.

Road length statistics in cropland patches also display multiple scaling regimes, although with different behaviour compared to urban areas. Specifically, the mean road length 〈*l* | *A*〉_{C} associated with a cropland patch of area *A* displays a triphasic behaviour (figure 2*e*, inset). Initially, 〈*l*|*A*〉_{C} increases until *A*≃10^{5} m^{2}, although very few crop patches are found below this scale and we will thus neglect these data in our discussion. Then, 〈*l* | *A*〉_{C} decreases until *A*_{CT}≃10^{9} m^{2} and remains relatively constant above *A*_{CT}. The distribution *p*_{C}(*l* | *A*) of cropland road lengths conditional on the cropland patch area *A* (figure 2*d*) appears to be well described by the scaling form:
*G*_{C} is a scaling function with suitable properties [34], as demonstrated by data collapse in figure 2*d*,*e*. Analogous to the distribution of urban road lengths, the distribution *p*_{C}(*l* | *A*) is invariant for all cropland areas larger than *A*_{CT}. Conversely, the mean total road length in cropland patches (figure 2*f*) is approximately constant below *A*_{CT} and increases linearly above this threshold (exponent *β*=0.95±0.05, estimated via least squares fit of log-transformed data), implying that the mean total number of cropland roads also increases linearly with *A* above *A*_{CT}. We thus found that the average road length versus cropland patch area is well described by 〈*l*|*A*〉_{C} with *A*_{CT}=(7.4±0.5)×10^{7} m^{2}.

The above results lead to a natural question: are the scaling functions *G*_{C} and *G*_{U} related? Superimposing the two scaling functions (figures 2*b*,*e*, as shown in figure 3*a*) suggests that *G*_{C}≃*G*_{U}, such that equations (2.2) and (2.3) approximately coincide, although with different dependencies of 〈*l* | *A*〉_{U} and 〈*l* | *A*〉_{C} on *A*.

The scaling structure of a road network can be also investigated by observing the recursive fragmentation dependent on the hierarchy of roads. The GRN subdivides the surface of the Earth into non-overlapping regions called *faces* (roads networks are planar graphs consisting of a series of land cells suitably surrounded by road segments [4]). We defined ensembles of faces for each road hierarchy (H1–H4) and investigated the probabilistic relationship between the size of any face and the length of the roads within it at each scale of observation. For example, by considering only the coarse-grained view provided by highway faces alone (H1), it is possible to extract a series of large faces and study how they are fragmented at the smaller scale by major roads (H2). Then, similarly, we can study how the faces of major roads (H2) are fragmented by secondary roads (H3) and so on. Figure 4*a* gives a global overview of the hierarchical organization of GRN and figure 4*b*,*c* illustrates the procedure of nested fragmentation. Formally, once all road segments are labelled by their hierarchical class, *E*_{H(i)}, a protocol has been set up to assign each of them to a face of area *A*_{H(i−1)}, on which we speculate that the length distribution of such road segments is conditional. This is equivalent to a coarse-graining procedure. We binned the areas of all faces (i.e. at each scale of observation) into 10 log-binned intervals (*A*_{(k)}, *k*=1,…,10) to account for variable numerosity of the samples [31]. We collected the road lengths {*l*}_{A(k)} (*k*=1,…,10) belonging to the faces whose areas are included in the *k*th face area bin (irrespective of the hierarchical class of these roads) and computed the relative proportion *p*(*l* | *A*)=*p*(*l* | *A*_{(k)}) that measures the probability to find a road of a given length in the area bin *A*_{(k)}. We then tested whether the curves *l*^{γ}*p*(*l* | *A*) plotted against *l*/〈*l* | *A*〉^{α} (where 〈*l* | *A*〉 is the average road length in the set {*l*}_{k}) for each of the area bins collapse onto the same curve, i.e. whether *p*(*l* | *A*) displays finite-size scaling (figure 4*d*,*e*). By using the full dataset, a satisfactory collapse has been found for *α*=1.1 and *γ*=1.1 as shown in figure 4*d*,*e*. Such collapse indicates a universal scaling curve that regulates the fragmentation of the road network at all scales of observation.

## 3. Discussion

Our study provides evidence for general statistical laws describing global road lengths conditional on land use. At the global scale, urban and cropland roads share a universal distribution after rescaling that satisfies (equation (2.1) with *α*=*γ*=1)
*F*(⋅) is independent of the land-use class, with land use instead impacting the mean road length 〈*l*〉. At finer scales, we investigated the dependence of road length distributions on the area of the homogeneous patch under study, separately for each land-use class. We found that road length statistics conditional on patch area closely match a universal scaling law of the form
*G*(⋅) is independent of the land-use class (i.e. *G*=*G*_{U}=*G*_{C}). That is, land use affects *p*(*l* | *A*) solely through the scaling of the mean road length 〈*l* | *A*〉 with patch area *A*.

Equation (3.2) has interesting implications. Because 〈*l* | *A*〉_{U} and 〈*l* | *A*〉_{C} are approximately constant above the respective thresholds, *A*_{UT} and *A*_{CT}, the distributions *p*_{U}(*l* | *A*) and *p*_{C}(*l* | *A*) are invariant for all cities and cropland patches larger than such a threshold. In particular, because *F* and *G* are quite similar in shape and nearly coincide if one neglects small urban and cropland patches, i.e. below the critical sizes *A*_{UT} and *A*_{CT} (see Material and methods). The observation of nearly universal probability distributions of urban and cropland road lengths suggests that the processes that govern road expansion are to a large extent inevitable, regardless of climate, topography and social and economic constraints, echoing general results on self-organized patterns [37,38]. Properties arising from physical constraints imposed by the planarity of the road network may also be at play [19].

These results suggest that local conditions, such as the socio-economic development or the demography of a specific region, may simply accelerate or delay the development of road infrastructure, affecting the characteristic length scale of the road length distribution but not its scaling form. Significantly, we find that road length distributions belonging to seminatural areas do not collapse onto the same distribution, even after rescaling. We argue that such diversity stems from the diverse purpose of roads in natural areas which are not built specifically for direct access to the land and are therefore regulated by diverse, site-specific processes and possibly far from optimized. The linkage of optimality and self-organization is known to induce a variety of scaling phenomena, as shown by studies on network structures derived from selection principles [18,39–42], including small-world constructs [43], fluvial trees [44–46] and the topology of the fittest networks [47]. At this stage, however, attributing the departure of the scaling features of roads serving seminatural areas to transient or non-optimized processes would be premature.

Urban scaling approaches have suggested that larger cities require less infrastructure than bigger ones. Indeed, if the total road length (*L*) scales sublinearly with the total population of a city (*P*) within its boundary, i.e. *L*∝*P*^{η} with 0.7<*η*<0.9 [48,49], such scaling would be directly related to the management of mega-cities [50] which would potentially be more efficient than small towns. Our results put such scaling results in a different perspective. According to the linear scaling shown in figure 2*c*, an urban connected component of 10^{9} m^{2} and ten components of 10^{8} m^{2} together require approximately the same total length of major roads. This holds for all urban patches bigger than *A*_{UT}≃4×10^{7} m^{2}, thus including all major cities. This result is consistent with other recent studies performed for a set of cities in Europe and the USA [51] and for the entire UK [52], showing that the actual sizes of urban patches scale linearly with the total lengths of major roads within them. This then implies that the proposed sublinear scaling between length and population is due to a superlinear scaling between population density and city size, and that road network lengths are not a good approximation of urban population density. However, more analysis is necessary to prove such speculation, because the suggestion of sub-linearity in the scaling relationship between *L* and population may have been an artefact of census-based data, which are built regardless of land use within the political boundary of a given region. Therefore, it seems plausible that roads from different land-use classes, with correspondingly different characteristic lengths, may have been mixed. Moreover, census areas are defined differently among various countries, thus precluding cross-country comparative analyses. Indeed, it has been shown that different methods to define city boundaries, based on tuning urban population density, could drastically affect the scaling of urban measurements [52–54]. We used here a definition of urban patches only based on contiguity of urban land, therefore the proposed results must be interpreted on the basis of used datasets and adopted methodology.

Although the above implications are in need of deeper scrutiny, which should include scaling analyses of different urban measurables (e.g. energy and material flows *sensu* [50] or other infrastructures), our findings can be directly used as an urban planning tool aimed to estimate road infrastructure requirements. For example, the analysis of global urban evolution, carried out by using 16 night-light layers (see electronic supplementary material), reveals that the total number of urban patches is proportional to *A*_{Utot} and thus also to *L*_{Utot}. Therefore, our *ansatz* suggests, for example, that a 10% increase of global urban surface would imply a parallel increase in total urban road length of 10% (see electronic supplementary material for details). As it has been estimated that by 2030 the total urban area, *A*_{Utot}, could potentially grow by 200% [9,10], such urban expansion would entail a total length *L*_{tot} of 3 485 400 km of new paved urban roads.

Our approach also suggests that cropland patches smaller than *A*) require the same investment in road paving as *n* smaller cropland patches of area *A*/*n*, provided that *A*/*n*>*A*_{CT}. From this perspective, larger cropland patches are predicted to be as efficient in terms of road infrastructure requirements as smaller ones (within the linear regime highlighted in figure 2*d*). By contrast, cropland areas smaller than *A*_{CT} require a relatively greater length of roads (figure 2*e*,*f*). Such cropland patches are typically composed of several small and scattered cropland units located in wild or mountainous areas. A typical example is given by scattered agricultural plots in forest areas (see electronic supplementary material; figure 3*a*,*b*); in such configurations, long roads serve tiny cropland units, indicating potentially higher environmental impacts of sparse agricultural expansion. Meanwhile, cropland patches around urban areas are generally more compact and more fragmented, in the linear regime. Because cropland roads can connect more than one cropland patch, however, the proportion *N*_{C} ∝*A*_{Ctot}∝*L*_{Ctot} would overestimate *L*_{Ctot}. Moreover, by considering the competition among cropland patches and seminatural areas and the scarcity of residual free land to colonize [8], future cropland roads might merely partially substitute for the existing seminatural roads, thus partially conserving the existing extent of road length, *L*_{tot}. Such a hypothesis is corroborated by the similarity of the distributions *p*(*l*)_{C} and *p*(*l*)_{Sn} shown in figure 1*c*, which implies a similar level of fragmentation of seminatural areas. An important distinction arises therefore between urban and cropland areas in that the site-specific degree of reuse of existing infrastructure in cropland areas prevents the same kind of strict predictability allowed for urban areas.

## 4. Material and methods

### 4.1. Data preparation and data fusion

The general idea here is to transfer the land-use information, represented by continuous values, onto the road network, which is represented by lines vectorial geometry. To do so, a series of spatial analysis operations have been performed, mainly in ArchMap, Qgis and Python environments. An illustration of the data preparation schema is presented in the electronic supplementary material. Data preparation includes the extraction of the urban footprints, a preparation of the road network and a join phase between land uses and the road network. Below we report details of each phase; in the electronic supplementary material, we report detailed visualization of the final road network.

### 4.2. Urban footprints extraction

Urban footprint data have been extracted from inter-calibrated night-time light (DMSP-OLS) remote imagery using original images and data processing by NOAA’s National Geophysical Data Center and DMSP data collected by the US Air Force Weather Agency (https://www.ngdc.noaa.gov). The NTL data contain continuous values from *D*=1 to *D*=63, representing the intensity of stable light in a grid of 30 arc seconds (approx. 1 km). DMSP-OLS have been widely used to characterize urban footprint and urban evolution at the global scale [23,26,55,56]. After removal of gas flaring, the Jenks cluster algorithm has been applied to extract an urban settlement mask [57]. The Jenks algorithm is an unsupervised classification method which imposes the number of clusters and is widely used in geographical analysis to cluster continuous surface datasets in separated areas. Starting from a set of randomly selected values, the Jenks algorithm works to minimize the variance inside classes while maximizing variance between classes. We thus classified the entire NTL into three classes and took the most illuminated class to be the urban footprint, considering a pixel to be classified as urban if *D*>28. We tested other classification methods as well as different numbers of classes; but given the sharp separation between highly and poorly illuminated places, different values of *D* did not affect the main results of the scaling analysis (see the electronic supplementary material). Using the same procedure, we classified urban footprints from 1997 to 2012.

### 4.3. Cropland

The 1 km global IIASA-IFPRI [27] cropland likelihood map has been developed by integrating a number of individual cropland maps at global to regional to national scales. The individual map products include existing global land cover maps such as GlobCover 2005 and MODIS v. 5, regional maps such as AFRICOVER, and national maps from mapping agencies and other organizations. IIASA-IFPRI is a public dataset that can be downloaded from the Geo-wiky platform (http://cropland.geo-wiki.org/downloads/). Detailed visualizations of the cropland mask are provided in the electronic supplementary material.

### 4.4. Global road network preparation and land-use labelling method

Geo-located vectorial road data are usually produced for GPS navigation and cartography. Our dataset was produced by and acquired from DeLorme (GARMIN, Yarmouth, ME, USA). These data represent a complete and updated dataset of primary and secondary roads at the global scale. These data have commercial restrictions; additional information and request for data samples can be addressed to the contact details at https://developer.garmin.com/datasets/digital-atlas/. The topology of the road network has been corrected using Archmap software and ad hoc Python scripts for the purpose of joining connected roads at junctions with only two roads and to remove small road links representing highway ramps and crossroads intersections which are not representative of any fragmentation process and are potential noise for the statistical analysis. Roads shorter than 100 m cover only ≈0.03% of the total road length; these roads, as confirmed by an extensive and scrupulous inspection of the GRN dataset, appear to be highway ramps or road segments for large road junctions and were therefore excluded from our analyses. Coupling the GRN with two global land-use inventories, roads have been classified into three categories: urban roads, i.e. roads that entirely belong to urban areas; cropland roads, i.e. road that intersect or belong to cropland areas; and seminatural roads, i.e. roads that are completely free of direct urban or cropland use (see electronic supplementary material for a detailed visualization of the final classified road network). Data of road network are released under licence agreement with DeLorme (GARMIN, Yarmouth, ME, USA).

### 4.5. Probability distribution collapse

Normalization of the distribution *p*(*l*)=*l*^{−γ}*F*(*l*/〈*l*〉^{α}) requires that the exponents *γ* and *α* satisfy *α*=1/(2−*γ*) [34]. Furthermore, *F*(*x*) must satisfy appropriate limiting behaviours for *x*→0 and *p*(*l*) by data collapse [28], i.e. by plotting *l*^{γ}*p*(*l*) versus *l*/〈*l*〉^{α} for each land-use class separately and varying *α* until the curves describing each class collapse onto the same curve, thus providing a plot of the scaling function *F*(⋅). We similarly verified our scaling ansätze (equations (2.1)–(2.3)) on the scaling form of the road length distributions via data collapse, as routinely used in statistical physics and beyond [28,34]. We used an objective method [28] to determine the scaling exponent that gives the best data collapse in terms of minimizing the area between the rescaled distributions (figure 1*d*, inset).

### 4.6. Relationship between scaling functions

The relationship between the scaling functions *F* and *G* is mediated by the distribution of patch areas *p*(*A*), via *A*<*A*_{UT}, we find
*l*|*A*〉 on *A* that holds approximately for *A*>*A*_{UT}. Equation (4.1) coincides with equation (3.2) for *A*>*A*_{UT}. Within this approximation, therefore, the scaling functions *F* and *G* coincide. The same result can be derived for croplands, neglecting patches with area *A*<*A*_{CT}. Deviations between *F* and *G* are thus ascribable to urban and cropland patches in the sublinear scaling regimes. The quality of the approximation can be tested by contrasting the ensemble distribution of urban road lengths (red dashed curves in figures 1*c* and 3*b*) with the distribution of urban road lengths conditional on the urban area being *A*>10^{8} m^{2} (solid red curve in figure 3*b*, corresponding to the curves from yellow to red in figure 2*b*). Figure 3*b* shows that the two distributions overlap almost completely, implying that equation (4.1) is an excellent approximation for urban patches. The analogous approximation for croplands is slightly less satisfactory, caused by the fact that the longest roads are found in croplands of smaller area (see the inset of figure 2*e*). This unintuitive result appears to be due to long roads fragmenting wild areas for agricultural purposes. Therefore, the tail of the ensemble distribution of cropland road lengths (green solid line in figure 1*c* and green dashed line in figure 3*b*) is ‘fatter’ than the tail of the cropland road length distribution conditional on cropland area *A*>10^{9} m^{2} (i.e. above the threshold area *A*_{CT}, green solid line in figure 3*b*).

## Data accessibility

The data that support the findings of this study are generated combining three datasets, each of which is either publicly available or available under licence agreement. A detailed description of the data and the availability of each dataset is reported in Material and methods.

## Authors' contributions

E.S., A.G., S.S., E.B., P.J.M. and A.R. designed research. E.S. performed spatial analysis. E.S., A.G., S.S. and E.B. performed research. E.S., A.G., S.S., E.B., P.J.M. and A.R. analysed data. E.S., A.G., S.S., E.B., P.J.M. and A.R. wrote the paper.

## Competing interests

The authors declare no competing interests.

## Funding

E.S., A.G., E.B. and A.R. acknowledge the support provided by ENAC (EPFL) through the *Innovative research on urban dimension* grant. S.S. and P.J.M. acknowledge support from the James S. McDonnell Foundation 21st Century Science Initiative—Complex Systems Scholar Award grant no. 220020315. E.S. has been funded by the Swiss National Science Foundation. A.G. acknowledges funds from the Swiss National Science Foundation Project P2ELP2_168498.

## Acknowledgements

E.S. thanks Francois Golay, Marc Barthelemy, Riccardo di Clemente and Marta Gonzalez.

## Footnotes

Electronic supplementary material is available online at https://dx.doi.org/10.6084/m9.figshare.c.3899419.

- Received June 1, 2017.
- Accepted September 18, 2017.

- © 2017 The Authors.

Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.