## Abstract

Urban morphology has presented significant intellectual challenges to mathematicians and physicists ever since the eighteenth century, when Euler first explored the famous Königsberg bridges problem. Many important regularities and scaling laws have been observed in urban studies, including Zipf's law and Gibrat's law, rendering cities attractive systems for analysis within statistical physics. Nevertheless, a broad consensus on how cities and their boundaries are defined is still lacking. Applying an elementary clustering technique to the street intersection space, we show that growth curves for the maximum cluster size of the largest cities in the UK and in California collapse to a single curve, namely the logistic. Subsequently, by introducing the concept of the condensation threshold, we show that natural boundaries of cities can be well defined in a universal way. This allows us to study and discuss systematically some of the regularities that are present in cities. We show that some scaling laws present consistent behaviour in space and time, thus suggesting the presence of common principles at the basis of the evolution of urban systems.

## 1. Introduction

Since the middle of the twentieth century, universal properties of cities have been identified, including Zipf's and Gibrat's laws [1,2]. City size has been measured most commonly in terms of built area or population since Zipf's seminal book [1], notwithstanding that most of the time city boundaries have been defined in terms of often arbitrary, fixed administrative boundaries.

Many different techniques to define cities have been suggested based on the analysis of urban growth [3–5], and recently a method using demographic and commuting data has been proposed [6]. Clustering techniques such as the City Clustering Algorithm have been applied, mostly to analyse satellite images and demographic data [7–9], but these are rarely parameter-free. What is needed is a bottom-up method that does not rely on highly aggregated census data or on the interpretation of remotely sensed images.

When we define a city, we have to keep in mind that built area and population are strongly correlated [9], but these correlations, as we show in this paper, do not necessarily carry universal exponents. The interpretation of the empirical outcomes obtained with these definitions therefore has to be put into context according to the methodology employed.

As pointed out in [6], a broad range of exponents based on different allometries inferred from urban studies [10,11] can be observed for different boundary definitions. This further supports the urgent need for an operational and context-free definition of the city. It is somewhat astonishing that in spite of the large body of literature about cities, the very concept of city remains in some ways obscure, hidden or assumed.

In this paper, we present some universal properties of cities which emerge when applying an elementary clustering technique to the vertices and edges of street networks. We obtain a logistic growth curve from which the structural fringe of the city can be defined mathematically in a bottom-up approach. This is achieved by obtaining the parameters at the point at which a *condensation* phenomenon is observed, as we explain below. The curves for all cities then collapse to a single curve, and city boundaries are hence defined in a universal way. Such universality in the spatial properties of cities prompts us to look at the spatial and temporal behaviour of important properties of urban street networks, and thus investigate whether some scaling laws could display a general behaviour.

## 2. Results

A city is a complex organism, composed of many superimposing layers, such as transportation networks, the built environment, and different economic, social and information flows [12–14]. Such layers are dynamical by nature and give rise to generic patterns, such as fractal geometries [12,13]. Administrative boundaries overlook these aspects and are not able to measure or record the dynamical aspects of cities in a consistent way across space.

Among these layers, street networks provide a good representation of the morphology of a city, where a street network is defined as the planar graph whose vertices are the street intersections *N* and whose links are the street segments *E*. We consider here street intersections to be a good proxy for the urbanization process. This choice reduces the complexity of the problem to that of a spatial point pattern, which has the value of simplicity. Moreover, it has been positively tested before [15,16], and some correlations between the number of street intersections and built area for urban systems are shown in the electronic supplementary material, §ID.

Considering a spatial window large enough to contain a given city and using an elementary clustering technique [17], we consider two street intersections to belong to the same cluster if they have a distance below a given distance threshold *τ*, where *τ* is measured in metres. Increasing *τ* enlarges the size of the clusters, until eventually a giant component appears, which spans the entire street network.
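The clustering rule described above can be illustrated with a minimal sketch. It assumes that intersections are given as planar coordinates in metres and uses a brute-force union-find over all pairs; the function name `max_cluster_size` and the toy coordinates are hypothetical and are not taken from the study, and a real street network would need a spatial index rather than an all-pairs loop.

```python
from math import hypot


def max_cluster_size(points, tau):
    """Size of the largest cluster when points closer than tau are linked."""
    parent = list(range(len(points)))
    size = [1] * len(points)

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (xi, yi) in enumerate(points):
        for j in range(i + 1, len(points)):
            xj, yj = points[j]
            # Two intersections join the same cluster if their distance
            # is below the threshold tau (strict inequality).
            if hypot(xi - xj, yi - yj) < tau:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Union by size: attach the smaller tree to the larger.
                    if size[ri] < size[rj]:
                        ri, rj = rj, ri
                    parent[rj] = ri
                    size[ri] += size[rj]
    return max(size[find(i)] for i in range(len(points))) if points else 0


# Hypothetical toy pattern (metres): a dense block of four intersections
# and an isolated pair 550 m away from the block.
toy = [(0, 0), (50, 0), (0, 50), (50, 50), (600, 0), (650, 0)]
for tau in (40, 60, 560):
    print(tau, max_cluster_size(toy, tau))
```

In this toy pattern the largest cluster stays at the size of the dense block over a wide range of thresholds and only grows again once the threshold is large enough to bridge the gap to the isolated pair.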

We measure the maximum cluster size *N*_{Max}(*τ*), in terms of number of intersections, as a function of the increasing threshold *τ*. We find that for all the cities *N*_{Max}(*τ*) initially grows exponentially; the growth then slows down and the curve condenses to a certain value (figure 1*a*). This behaviour has been positively tested for all the largest cities in the UK and in California, suggesting that the maximum cluster size behaviour for cities highlights universal properties of urban morphology (see electronic supplementary material, §II, for more details).

### 2.1. The condensation threshold

The function defined by *N*_{Max}(*τ*), i.e. exponential growth followed by condensation, has the characteristics of the logistic growth function:

$$N_{\mathrm{Max}}(\tau) = \frac{C}{1 + e^{-r(\tau - \tau_0)}}, \tag{2.1}$$

where *C* is the carrying capacity, *r* is the growth rate and *τ*_{0} is the inflection point [18].

Following equation (2.1), we show that for cities in the UK and in California, *N*_{Max}(*τ*) grows as e^{*rτ*} until the inflection point *τ*_{0}, and after that it condenses at a constant value given by the carrying capacity *C*. In order to do that, given the transformation

$$\tau' = r(\tau - \tau_0), \qquad N' = \frac{N_{\mathrm{Max}}}{C},$$

we expect that all the measured curves would collapse to a single curve, namely

$$N'(\tau') = \frac{1}{1 + e^{-\tau'}}.$$

We test this hypothesis for the 61 largest cities in the UK and for the 52 largest cities in California (see figure 2 and electronic supplementary material, §I). These results are shown in figure 3, and we can see that for both cases there is a very high correlation (*R*^{2} > 0.99) for the quality of the collapse. This correlation is maintained if the maximum cluster size is measured according to the number of street segments *E*(*τ*) instead of the number of intersections. In this case, we find that the collapse is estimated with an *R*^{2} > 0.98.
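The collapse test can be sketched as follows. The sketch assumes the standard three-parameter logistic form *N* = *C*/(1 + e^{−*r*(*τ* − *τ*_{0})}), rescales each curve by its own parameters, and scores the pooled points against the unit logistic with an ordinary coefficient of determination. The two parameter sets are hypothetical, the curves are synthetic rather than measured, and how *R*^{2} was computed in the study is not stated here, so this is only one plausible reading.

```python
from math import exp


def logistic(tau, C, r, tau0):
    """Three-parameter logistic growth curve."""
    return C / (1.0 + exp(-r * (tau - tau0)))


def rescale(taus, sizes, C, r, tau0):
    """Map (tau, N_max) to the dimensionless pair (r*(tau - tau0), N_max/C)."""
    return [(r * (t - tau0), n / C) for t, n in zip(taus, sizes)]


def r_squared(pairs):
    """R^2 of rescaled points against the unit logistic 1/(1 + e^-x)."""
    ys = [y for _, y in pairs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - 1.0 / (1.0 + exp(-x))) ** 2 for x, y in pairs)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical parameters (C, r, tau0) for two synthetic "cities".
cities = {"A": (1000.0, 0.05, 120.0), "B": (25000.0, 0.02, 300.0)}
taus = list(range(0, 601, 20))

all_pairs = []
for C, r, tau0 in cities.values():
    sizes = [logistic(t, C, r, tau0) for t in taus]
    all_pairs.extend(rescale(taus, sizes, C, r, tau0))

# Exact logistic curves collapse perfectly, so R^2 is 1 up to rounding.
print(r_squared(all_pairs))
```

With measured curves, the parameters would first have to be fitted for each city, and deviations from the unit logistic would lower the score.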

These results indicate that the proposed clustering technique is able to capture generic properties of urban street networks. In order to investigate this further, we look at how the logistic form of equation (2.1) is related to urban morphology and whether it allows us to define in a rigorous way the boundaries of a city.

As the logistic function is associated with the Verhulst model [18], it is interesting to understand how the carrying capacity *C*, which always refers to a reservoir in the system, could be associated with our clustering approach. To understand this, we note that the largest cluster grows in the area where the intersection density is large, i.e. the urban area (see figure 1 as a visual reference). The existence of a condensation phase shows that there exists an abrupt transition between the urban area and the rural area, where the intersection density consistently drops. Hence, the reservoir could be interpreted as the set of intersections belonging to the urban network which are consumed while the maximum cluster grows, and then the carrying capacity represents the city size in terms of street intersections.

Following the clustering analysis introduced above, when *τ* grows after the logistic condensation phase, *N*_{Max}(*τ*) starts to grow again (figure 1). This is because after the maximum cluster reaches the condensation phase, as *τ* grows rural intersections and small towns close by get absorbed by the maximum cluster. In such a way, *N*_{Max}(*τ*) exceeds the carrying capacity *C*.

We define the *city condensation threshold* as the threshold where the measured maximum cluster size *N*_{Max}(*τ*) intersects the carrying capacity of the fitted logistic function, i.e. the value of *τ* at which *N*_{Max}(*τ*) = *C*. The city is thus defined as the maximum cluster at the city condensation threshold, as we show in figure 1 for London. In order to investigate whether the city boundaries obtained in this way bear any resemblance to the urbanized space, we overlap the given contours with land-use satellite images. Figure 2 demonstrates clearly that the city boundaries as defined via the condensation threshold delimit the so-called *urban fringe*, i.e. the spatial pattern related to the city's expansion.
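A minimal sketch of how such a threshold might be read off a sampled curve is given below. It assumes that the measured curve is available at increasing values of *τ* and takes the first sampled threshold at which the measured size reaches the fitted carrying capacity; this discretization, the function name `condensation_threshold` and the numbers are illustrative assumptions, not the procedure of the study.

```python
def condensation_threshold(taus, sizes, C):
    """First sampled tau at which the measured N_max reaches the capacity C.

    taus must be sorted in increasing order; returns None if the curve
    never reaches C on the sampled range.
    """
    for tau, n in zip(taus, sizes):
        if n >= C:
            return tau
    return None


# Hypothetical measured curve: growth, a plateau just below C, then a jump
# when neighbouring settlements are absorbed.
taus = [50, 100, 150, 200, 250, 300]
sizes = [10, 400, 950, 990, 1005, 1600]
print(condensation_threshold(taus, sizes, C=1000))
```

The city would then be taken as the largest cluster obtained at the returned threshold. A finer sampling of *τ*, or interpolation between samples, would locate the crossing more precisely.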

### 2.2. Space and time scaling relations

In this section, we try to understand the meaning of different allometries that are usually found in urban studies and we examine them in spatial and temporal terms. To pursue this, we analyse a few simple global statistical properties of the spatial networks: the network total street length *L*(*N*), measured in metres, which is the sum of the lengths of the street segments for a given network; the network area *A*(*N*), measured in square metres, which is the area embedded by a given street network; the street intersection density *P*(*n*), obtained by imposing a square grid with 400 m sides on top of the street network, and counting the number *n* of intersections falling in each cell.^{1} These quantities are quite sensitive to the structure of the network and some of them have been considered in different studies [11,15,16,19,20].
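The grid count behind *P*(*n*) can be sketched in a few lines. The sketch assumes planar coordinates in metres and a grid aligned with the coordinate origin, and it only records cells that contain at least one intersection; whether empty cells enter the distribution in the study is not stated here. The function names and the sample points are hypothetical.

```python
from collections import Counter


def cell_counts(points, cell=400.0):
    """Number of intersections in each occupied square cell of side `cell`."""
    return Counter((int(x // cell), int(y // cell)) for x, y in points)


def density_histogram(points, cell=400.0):
    """Map n to the number of occupied cells holding exactly n intersections."""
    return Counter(cell_counts(points, cell).values())


# Hypothetical intersections (metres).
pts = [(10, 10), (390, 390), (410, 10), (-5, 10)]
print(dict(cell_counts(pts)))
print(dict(density_histogram(pts)))
```

Normalizing the histogram by the number of cells gives an empirical estimate of *P*(*n*).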

The following analysis shows that urban street networks, as defined via the condensation threshold, display statistical properties which are consistently different from the statistical properties of *rural* street networks.^{2} Moreover, we show that the allometric exponents obtained for the above-mentioned properties are compatible for cities in the UK and for cities in California. Remarkably, we find that these exponents are compatible with the ones found for the growth of London during the last two centuries.

The network total street length *L*(*N*) is a global quantity characterizing the nature of the underlying network. We can write

$$L(N) = E\,\langle l \rangle,$$

where *E* is the number of street segments and ⟨*l*⟩ is the average length of a street segment, if such a quantity can be well defined. Then, considering that the average degree of the network can be written as ⟨*k*⟩ = 2*E*/*N*, we have

$$L(N) = \frac{\langle k \rangle \langle l \rangle}{2}\, N,$$

where the density distributions for both *l* and *k* have finite mean and variance.
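The relation between total length, average degree and average segment length can be checked numerically on a toy network. The sketch uses the identities *L* = *E*⟨*l*⟩ and ⟨*k*⟩ = 2*E*/*N*, which together give *L* = *N*⟨*k*⟩⟨*l*⟩/2; the data layout (a dictionary of node coordinates and a list of node pairs), the straight-line segment lengths and the example network are assumptions made for illustration.

```python
from math import hypot


def network_stats(nodes, edges):
    """Return (L, mean_k, mean_l) for nodes {id: (x, y)} and edges [(u, v)]."""
    lengths = [
        hypot(nodes[u][0] - nodes[v][0], nodes[u][1] - nodes[v][1])
        for u, v in edges
    ]
    total_length = sum(lengths)
    mean_l = total_length / len(edges)    # average street segment length
    mean_k = 2 * len(edges) / len(nodes)  # average degree, 2E/N
    return total_length, mean_k, mean_l


# Hypothetical toy network: a square block with 100 m sides.
nodes = {0: (0, 0), 1: (100, 0), 2: (100, 100), 3: (0, 100)}
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

L, mean_k, mean_l = network_stats(nodes, edges)
# The two printed values agree: L equals N * <k> * <l> / 2.
print(L, len(nodes) * mean_k * mean_l / 2)
```

If ⟨*k*⟩ and ⟨*l*⟩ do not change with *N*, the identity implies that *L* grows linearly with *N*.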

We find (figure 4*a*) that for cities in the UK, the behaviour of *L*(*N*) is consistent with a linear function of *N*. For the rural street network in the UK, on the other hand, the same quantity behaves in a statistically different way (*p*-value = 0.007), scaling sub-linearly with *N*. The linear relation for *L* in urban networks follows from the average degree ⟨*k*⟩ and the average street segment length ⟨*l*⟩ being independent of *N*, while the sub-linear relation for *L* in the rural network stems from a sub-linearity in those networks that is analysed in the electronic supplementary material, §III.

In the case of cities in California (figure 4*b*), we find that the behaviour of *L*(*N*) is consistent with that of the UK, in a slightly super-linear regime. For the rural street network in California, on the other hand, we find that *L*(*N*) is sub-linear, and within the error range it is consistent neither with that of the California urban street network (*p*-value = 0.0003) nor with that of the UK rural street network.

In figure 4*c*,*d*, we see that the exponents for urban network areas *A*(*N*) in the UK and in California are quite similar, following a very mild super-linear relation. On the other hand, super-linearity can be statistically discarded for both exponents for the rural case in the UK (*p*-value = 0.000004) and in California (*p*-value = 0.0004). In addition, it is important to note that for the rural networks, the exponents for the UK and California are notably different. Linearity can be discarded for California, while this is not the case for the UK.
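An allometric exponent of the kind discussed above can be estimated with a short sketch. It assumes a power-law relation *y* ∼ *N*^{*β*} and takes *β* as the ordinary least-squares slope in log-log space. This is a common and simple choice, but the fitting procedure used in the study is not described here, so both the method and the synthetic data are assumptions.

```python
from math import log


def allometric_exponent(ns, ys):
    """Least-squares slope of log(y) against log(n), i.e. beta in y ~ n**beta."""
    xs = [log(n) for n in ns]
    ls = [log(y) for y in ys]
    mean_x = sum(xs) / len(xs)
    mean_l = sum(ls) / len(ls)
    sxy = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, ls))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic data with a known sub-linear exponent of 0.9.
ns = [10, 100, 1000, 10000]
ys = [3.0 * n ** 0.9 for n in ns]
print(allometric_exponent(ns, ys))
```

An exponent below 1 corresponds to a sub-linear regime, an exponent of 1 to a linear one and an exponent above 1 to a super-linear one. Deciding whether a fitted value differs significantly from 1 additionally requires an error estimate, which this sketch does not provide.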

These differences reflect the contrast in the spatial patterns of the street networks covering these two countries. In particular, the nearly linear relations found for the urban areas reflect the fact that street intersections are generally homogeneously distributed within the urban fringes. Such homogeneity can be seen from the street intersection distributions *P*(*n*) shown in figure 4*e*,*f*. In this case, again we find very similar patterns between the UK and California, where *P*(*n*) is well fitted by a logistic distribution in the case of urban street networks. This is a bell-shaped distribution with a well-defined average and variance, while it is ill defined for rural street networks.

The analysis above highlights the fact that urban street networks are characterized by an overall homogeneous texture, which is consistent between the two different countries considered in this work. In the same way, we can observe how rural street networks differ consistently from urban street networks and between different countries, displaying an overall inhomogeneous structure. Hence, we find that for urban conglomerations, a general behaviour emerges in the study of the scaling laws which characterize the global street network structure.

These scaling exponent similarities do not imply that different cities look the same. In fact, different urbanization processes shape cities in very different ways, in terms of morphology and size. Nevertheless, the compatibility between the exponents for the analysed quantities suggests that there might be common principles for the growth of cities. If this is the case, then cities at a specific point in time represent different states of the evolutionary process. We will then expect to find a similar behaviour if we looked at the evolution in time of a specific city. In order to test this hypothesis, we consider a unique dataset recording the evolution of street networks of Greater London between 1786 and 2010, through nine well-spaced temporal intervals defined by the maps shown in figure 5 (see electronic supplementary material, §IC, for more information).

In figure 6, we perform a simple test, by measuring the aforementioned quantities in the contemporary UK urban street networks and in the historical London dataset. Interestingly enough, for *L*(*N*) in the UK, the historical dataset overlaps with the spatial dataset and both allometric fittings are consistent over a linear regime. As we stated above, this means an overall homogeneity in terms of the average connectivity and the average street segment length that is preserved over time. For *A*(*N*), even if the points do not really overlap, the allometric behaviour is consistent between the time and space averages in the slightly super-linear regime.

## 3. Discussion

Two important results can be derived from our study. On the one hand, we provided a methodology to define city boundaries through spatial urban networks in a universal way. On the other, we explored the generality of some scaling laws related to urban street networks. Both of these aspects relate to the quest for methodological advancements in the analysis of spatial urban networks, and they relate to the discussion of important statistical phenomena, such as those described by Zipf's law and Gibrat's law.

Regarding the concept of city boundaries, we discovered universal properties of street networks related to clustering properties in the street intersection space. These properties allow us to distinguish the urban agglomerate with a methodology that is parameter-free and that reduces the problem of extracting city boundaries to a simple clustering process on a spatial point pattern.

The concept of city boundaries is very important to distinguish between urban and rural networks. We show that allometries found in urban street networks consistently differ from the ones found in rural street networks. This means that an ill-posed definition of boundaries, such as arbitrary administrative boundaries, would mix the properties of street networks that are in two distinct phases of their evolution, producing spurious results (see electronic supplementary material, §III, for a direct example).

Regarding our analysis about the generality in space and time of relevant allometries found in urban street networks, we chose two very distinct datasets that present different urbanization paths. While cities in the UK are mostly of Roman or Mediaeval origin and reflect a long line of urban evolution spanning two millennia, cities in California are mostly the result of an urban explosion during the latter half of the nineteenth and the twentieth centuries. In this context, we find that urban street networks display compatible properties, even though the datasets are very different. This highlights how the city is an overall homogeneous structure in terms of its street network quantities (average degree, average street length, etc.). These findings are confirmed by our analysis, which compares the structure of the urban street networks in the UK with the street networks of the historical evolution of London during more than two centuries. Even if these results are not definitive, a general behaviour for the found exponents cannot be excluded at this point and new perspectives of research in this direction are thus opened.

Spatial networks are widespread in nature and it is possible to see how the organization of spatially embedded structures is often similar for a variety of different phenomena. Leaf venation, crack pattern formation, river networks, ant galleries, circulatory systems, soap froths, pipe networks and so on, have been studied in a wide range of disciplines which are often strongly related [21–25]. In particular, brain networks seem to share a number of similarities with the organization of spatial street networks, due to their high modularity and fractal structure [26].

Even though cities present a diverse range of morphological features, we have shown that the boundaries of cities can be identified through universal properties of street networks. This opens up new research perspectives in terms of the analysis of the logistic parameters for each city. As cities undergo different stages of evolution, related either to expansion or to condensation phases, those different evolution phases could be easily recognized and classified from the deviations in the logistic curve related to the clustering process (see electronic supplementary material, §IIA).

Moreover, from our analysis, we can derive a broad picture of the way a city evolves. What we observe is that the street network can be found in two very distinct phases, the rural one, which is not characterized by any distinctive properties, and the urban one which is characterized by high density of intersections which are distributed in patterns that are mostly homogeneous and which carry very similar exponents. In such a picture, a city street network develops as an articulated organism territorializing the sparse rural street network, filling the space with denser residential patterns and then radically changing its morphology.

A key advantage of our method of analysis, compared to other existing approaches, such as those based on data extracted from satellite imagery, is its ease of use. Recent advances in geographic information system technologies have led to the proliferation of street network data generated by public and private entities. Our study demonstrates that these datasets can be deployed in new ways to analyse key properties of cities, enhancing our ability to manage the built environment. A disadvantage of our methodology, as presented in this form, derives from its bottom-up approach. It is best suited to extracting a limited number of cities, because the extraction procedure cannot be completely automated and requires visual inspection (see electronic supplementary material, §IIA). In order to extract a large number of cities, top-down techniques, such as the one presented in [27], are definitely more efficient, even if less precise.

## Authors' contributions

A.P.M. conceived the study; A.P.M., E.A. and M.B. conducted the study; A.P.M., E.A. and M.B. analysed the results; A.P.M. and E.H. wrote the algorithms; K.S. extracted the historical dataset. All authors reviewed the manuscript.

## Competing interests

We declare we have no competing interests.

## Funding

A.P.M. was partially funded by the Engineering and Physical Sciences Research Council (EPSRC) SCALE project (EP/G057737/1); E.A. and M.B. by the European Research Council (ERC) MECHANICITY Project (249393 ERC-2009-AdG); and K.S. by the ESRC TALISMAN Project (ES/I025634/1). E.H. acknowledges support from J. M. Epstein's NIH Director's Pioneer Award, no. DP1OD003874, from the National Institutes of Health.

## Acknowledgements

We acknowledge helpful discussions with Dr Roberto Murcio.

## Footnotes

^{1} The size of the cell is somewhat arbitrary. In France, for example, administrative urban boundaries are set according to a maximum separation threshold of 200 m between buildings. In a highly dense urban system such as the UK, the choice of 400 m seems to be a reasonable scale to ensure that each square contains a fair number of intersections.

^{2} In order to define rural street networks, we delete from the maps all the cities defined by the condensation threshold and then we sample from the resulting maps 1000 random portions of street network from each map (see the electronic supplementary material, §I).

- Received August 25, 2015.
- Accepted September 21, 2015.

- © 2015 The Authors.