## Abstract

We propose a quantitative method to classify cities according to their street pattern. We use the conditional probability distribution of the shape factor of blocks with a given area and define what could constitute the ‘fingerprint’ of a city. Using a simple hierarchical clustering method, these fingerprints can then serve as a basis for a typology of cities. We apply this method to a set of 131 cities in the world, and at an intermediate level of the dendrogram, we observe four large families of cities characterized by different abundances of blocks of a certain area and shape. At a lower level of the classification, we find that most European cities and American cities in our sample fall in their own sub-category, highlighting quantitatively the differences between the typical layouts of cities in both regions. We also show, with the example of New York and its different boroughs, that the fingerprint of a city can be seen as the sum of the ones characterizing the different neighbourhoods inside a city. This method provides a quantitative comparison of urban street patterns, which could be helpful for a better understanding of the causes and mechanisms behind their distinct shapes.

## 1. Introduction

The recent availability of large amounts of data about urban systems has opened the exciting possibility of a new ‘science of cities’, with the aim of understanding and modelling phenomena taking place in the city [1]. Urban morphology and morphogenesis, activity and residence location choice, urban sprawl and the evolution of urban networks, are just a few of the important processes that have been discussed for a long time but that we now hope to understand quantitatively. An important component of cities is their street and road networks. These networks can be thought of as a simplified schematic view of cities, which captures a large part of their structure and organization [2], and contains a large amount of information about underlying and universal mechanisms at play in their formation and evolution. Extracting common patterns between cities is a way towards the identification of these underlying mechanisms. At stake is the question of the processes behind the so-called ‘organic’ patterns—which grow in response to local constraints—and whether they are preferable to the planned patterns which are designed under large-scale constraints. This programme is not new [3,4], but the recent dramatic increase of data availability such as digitized maps, historical or contemporary data [5–8] allows us now to test ideas and models on large-scale cross-sectional and historical data.

Streets and roads form a network (where nodes are intersections and links are road segments) which is planar to a good approximation. This network is now fairly well characterized [9–20]: owing to spatial constraints, the degree distribution is peaked, the clustering coefficient and assortativity are large, and most of the interesting information lies in the spatial distribution of betweenness centrality [21]. An important point is that information about these networks is not contained only in their adjacency matrix. Geometry, encoded in the spatial distribution of nodes, plays a crucial role. A classification of cities according to their street network should then rely on both topology and geometry.

We note that while classifications do not provide any understanding of the objects being classified *per se*, they provide a useful first insight into the different characteristics exhibited by objects of the same nature. Classifying, from a fundamental point of view, is however difficult: finding a typology of street patterns essentially amounts to classifying planar graphs, a non-trivial problem. The classification of street networks has previously been addressed by the space syntax community [22,23] and a good account can be found in the book by Marshall [24]. These works, although based on empirical observations, contain much subjectivity and our goal is to eliminate this subjective part to reach a non-ambiguous, scientific classification of these patterns.

An interesting direction is provided by the study of leaves and their classification according to their venation patterns [25,26], but with a notable difference which prevents us from a direct application to streets: the existence of a hierarchy of veins governed by their diameter (the width of streets is usually absent from datasets). Another enticing idea can be found in the mathematics literature: there exists an exact bijection between planar graphs and trees [27]. Using this bijection, classifying planar graphs would amount to classifying trees, which is a simpler problem. However, this bijection does not take into account the geometrical shape of the planar graph: indeed two street patterns can have the same topology but cells could be of very different areas, leading to visually different patterns and to cities of different structures. It is thus important to take into account not only the topology of the planar graph, as described by the adjacency matrix, but also the position of the nodes. In order to do that, we propose in this article a method to characterize this complex object by extracting the ‘fingerprint’ of a street pattern. These fingerprints allow us to define a measure of the distance between two graphs and to construct a classification of cities.

## 2. Streets versus blocks

A major shortcoming of existing classifications is that they are based on the street network. This is problematic for two different reasons. First, there is no unambiguous, purely geometrical definition of what a street is: we could define it as the road segment between two intersections, as an almost straight line (up to a certain angular tolerance, see [12]), or we could also follow the actual street names. There is a certain degree of arbitrariness in each of these definitions, and it is not clear how robust a classification based on streets would be. Second, it seems that what is perceived by the human eye of a city map is not streets but the distribution of the shapes, area and disposition of blocks (figure 1).

A natural idea when trying to classify cities is thus to focus on blocks (or cells, or faces) rather than streets. A block can usually be defined without ambiguity as being the smallest area delimited by roads (it has then to be distinguished from a parcel, which is a tax-related definition). While the information contained in the blocks and in the streets is equivalent (up to dead-ends), the information related to the visual aspect of the street network seems to be easier to extract from blocks. Blocks are indeed simple geometrical objects (polygons) whose properties are easily measured. The properties of blocks and their arrangement thus seem to be a good starting point for attempting a classification of urban street patterns.

## 3. Characterizing blocks

Blocks are defined as the cells of the planar graph formed by streets, and it is relatively easy to extract them from a map. We have gathered road networks for 131 major cities across the world, spanning all continents (except Antarctica), and their locations are represented on the map (figure 2). The street networks have been obtained from the OpenStreetMap database [5], and restricted to the city centre using the Global Administrative Areas database (or databases provided by the countries' administration). We extracted the blocks from the street network and removed undesired features (aspects that have no real-world counterpart but appear due to the particular way data are encoded in OpenStreetMap). We end up with a set of blocks, each with a geographical position corresponding to its centroid.

Blocks are polygons and as such can be characterized by simple measures. First, the surface area *A* of a block gives a useful indication, and its distribution is an important piece of information about the block pattern. As in [13,28], we find that for different cities, the distributions have different shapes for small areas, but display fat tails that decrease as a power law

$$P(A) \sim \frac{1}{A^{\tau}} \tag{3.1}$$

with an exponent of order *τ* ≈ 2 [6,7,13,21]. Although this seemingly universal behaviour gives a useful constraint on any model that attempts to model the evolution of cities' road networks, it does not allow cities to be distinguished from one another.
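To make the role of the exponent concrete, the following sketch estimates *τ* from a list of block areas with the standard maximum-likelihood estimator for a continuous power-law tail, τ̂ = 1 + n / Σ ln(A_i / A_min). The estimator, the cut-off `a_min` and the function name `estimate_tau` are illustrative assumptions; they are not taken from the analysis reported here.

```python
import math

def estimate_tau(areas, a_min):
    """Maximum-likelihood exponent of a tail P(A) ~ A^(-tau) for A >= a_min.

    Assumption: the tail is a pure continuous power law above a_min.
    """
    tail = [a for a in areas if a >= a_min]
    log_sum = sum(math.log(a / a_min) for a in tail)
    if log_sum <= 0.0:
        raise ValueError("need at least one area strictly above a_min")
    return 1.0 + len(tail) / log_sum

# Synthetic check: deterministic quantiles of a power law with tau = 2,
# obtained by inverting the cumulative distribution (A = a_min / (1 - u)).
n = 1000
synthetic = [1000.0 / (1.0 - (i + 0.5) / n) for i in range(n)]
tau_hat = estimate_tau(synthetic, a_min=1000.0)
```

On this synthetic sample the estimate is very close to 2, which is the order of magnitude quoted above for real cities.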

A second characterization of a block is through its shape, with the form (or shape) factor *Φ*, defined in the geography literature in [29] as the ratio between the area *A* of the block and the area *A*_{C} of the circumscribed circle

$$\Phi = \frac{A}{A_{C}}. \tag{3.2}$$

The quantity *Φ* is always smaller than one, and the smaller its value, the more anisotropic the block is. There is not a unique correspondence between a particular shape and a value of *Φ*, but this measure gives a good indication about the block's shape in real-world data, where most blocks are relatively simple polygons. The distribution of *Φ* displays important differences from one city to another, and a first naive idea would be to classify cities according to the distribution of block shapes given by *P*(*Φ*). The shape itself is however not enough to account for visual similarities and dissimilarities between street patterns. Indeed, we find for example that for cities such as New York and Tokyo, even if we observe similar distributions *P*(*Φ*) (figure 3), the visual similarity between both cities' layouts is not obvious at all. One reason for this is that blocks can have a similar shape but very different areas: if two cities have blocks of the same shape in the same proportion but with totally different areas, they will look different. We thus need to combine the information about both the shape and the area.
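As a minimal illustration of the definition of *Φ*, the following Python sketch computes the shape factor of a polygon given as an ordered list of vertices. It assumes planar (projected) coordinates and takes the circumscribed circle to be the smallest circle enclosing all vertices, found here by brute force over pairs and triples of vertices. The function names are ours, and this is not necessarily the procedure used to obtain the results of this article.

```python
import math
from itertools import combinations

def polygon_area(vertices):
    """Area of a simple polygon (shoelace formula), vertices given in order."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def _circle_from_three(p, q, r):
    """Circumcircle (cx, cy, radius) of three points, or None if collinear."""
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, math.dist((ux, uy), p)

def enclosing_radius(points, eps=1e-9):
    """Radius of the smallest circle containing all points (brute force)."""
    candidates = []
    for p, q in combinations(points, 2):
        centre = ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
        candidates.append((centre[0], centre[1], math.dist(p, q) / 2.0))
    for triple in combinations(points, 3):
        circle = _circle_from_three(*triple)
        if circle is not None:
            candidates.append(circle)
    valid = [r for cx, cy, r in candidates
             if all(math.dist((cx, cy), p) <= r * (1 + eps) + eps for p in points)]
    return min(valid)

def shape_factor(vertices):
    """Phi = block area / area of the enclosing circle."""
    radius = enclosing_radius(vertices)
    return polygon_area(vertices) / (math.pi * radius ** 2)
```

With this definition a square has *Φ* = 2/π ≈ 0.64 and a 1 × 2 rectangle has *Φ* = 8/(5π) ≈ 0.51, so elongated blocks indeed receive smaller values. The brute-force search is only practical for polygons with a modest number of vertices.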

In order to construct a simple representation of cities which integrates both area and shape, we rearrange the blocks according to their area (on the *y*-axis) and display their *Φ* value on the *x*-axis (figure 3). We divide the range of areas in (logarithmic) bins and the colour of a block represents the area category to which it belongs. We describe this pattern quantitatively by plotting the conditional probability distribution *P*(*Φ|A*)*P*(*A*) of shapes, given an area bin (figure 3*b*,*d*). The coloured curves represent the distribution of *Φ* in each area category, and the curve delimiting the grey area is the sum of all of these curves. It is the distribution of *Φ* for all cells, which is simply the translation of the well-known formula for conditional probabilities

$$P(\Phi) = \sum_{A} P(\Phi \mid A)\, P(A). \tag{3.3}$$

These figures give a ‘fingerprint’ of the city which encodes information about both the shape and the area of the blocks. In order to quantify the distribution of blocks inside a city, and thus the visual aspect of the latter, we will then use *P*(*Φ|A*) for different area bins. The comparison between these quantities provides the basis for the classification of street patterns that we propose here.
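The construction of such a fingerprint can be sketched as follows. The code bins blocks by area, then histograms *Φ* within each area bin, normalizing by the total number of blocks so that each entry approximates *P*(*Φ|A*)*P*(*A*). The function name `fingerprint`, the number of *Φ* bins, the half-open area bins and the choice to count blocks outside the area range in the normalization are all assumptions made for illustration.

```python
def fingerprint(blocks, area_edges, n_phi_bins=10):
    """Histogram of shape factors per area bin.

    blocks     : list of (area, phi) pairs, with 0 <= phi <= 1
    area_edges : increasing bin edges, e.g. [1e3, 1e4, 1e5]
    Returns hist[k][j], the fraction of all blocks whose area falls in
    area bin k (half-open, [lo, hi)) and whose phi falls in phi bin j.
    """
    blocks = list(blocks)
    n_area_bins = len(area_edges) - 1
    hist = [[0.0] * n_phi_bins for _ in range(n_area_bins)]
    total = len(blocks)
    for area, phi in blocks:
        for k in range(n_area_bins):
            if area_edges[k] <= area < area_edges[k + 1]:
                # phi = 1 is placed in the last bin
                j = min(int(phi * n_phi_bins), n_phi_bins - 1)
                hist[k][j] += 1.0 / total
                break
    return hist

# Toy example: four blocks, one of them too small to fall in any area bin
toy_blocks = [(2000.0, 0.62), (3000.0, 0.64), (20000.0, 0.5), (50.0, 0.3)]
toy_fp = fingerprint(toy_blocks, [1e3, 1e4, 1e5])
```

Summing `hist[k][j]` over the area index `k` gives the overall distribution of *Φ* for the blocks that fall within the area range, which is the discrete counterpart of the sum over area bins described above.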

## 4. A typology of cities across the world

Two cities display similar patterns if their blocks have both similar area and shape. In other words, the shape distributions for each area bin should be very close, and this simple idea allows us to propose a distance between street patterns of different cities. More precisely, as one can see in figure 3, blocks with an area in the range [10^{3}, 10^{5}] (in square metres) dominate the total number of cells, and we will neglect very small blocks (of area less than 10^{3} m^{2}) and very large ones (of area more than 10^{5} m^{2}). We thus sort the blocks according to their area in two distinct bins

$$\alpha_{1} = \{A \mid A \in [10^{3}, 10^{4}]\} \quad \text{and} \quad \alpha_{2} = \{A \mid A \in [10^{4}, 10^{5}]\}.$$
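We denote by *f*_{α}(*Φ*) the ratio of the number of cells with a form factor *Φ* that lie in the bin *α* over the total number of cells for that city. We then define a distance *d*_{α} between two cities *a* and *b* characterized by their respective *f*_{α}^{a} and *f*_{α}^{b}

$$d_{\alpha}(a,b) = \int_{0}^{1} \left| f_{\alpha}^{a}(\Phi) - f_{\alpha}^{b}(\Phi) \right|^{n} \mathrm{d}\Phi. \tag{4.1}$$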
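In practice the integral over *Φ* has to be discretized. A minimal sketch, assuming that the two distributions are sampled on the same equal-width bins covering [0, 1] and approximating the integral by a Riemann sum, could read as follows (the function name `bin_distance` is hypothetical):

```python
def bin_distance(f_a, f_b, n=1):
    """Discretized distance between two shape distributions in one area bin.

    f_a, f_b : values of f_alpha sampled on identical equal-width phi bins
    n        : exponent applied to the absolute difference (1 or 2 in the text)
    """
    if len(f_a) != len(f_b):
        raise ValueError("both distributions must use the same phi bins")
    d_phi = 1.0 / len(f_a)  # bin width on the interval [0, 1]
    return sum(abs(x - y) ** n for x, y in zip(f_a, f_b)) * d_phi
```

The distance vanishes for identical distributions and is symmetric in the two cities, as expected from the definition.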

We tested different choices (*n* = 1 and *n* = 2) for *d*_{α}(*a*, *b*), and although they might change the position of some cities in the classification, our conclusions are robust. We then construct a global distance *D* between two cities by combining all area bins *α*

$$D(a,b) = \sum_{\alpha} d_{\alpha}(a,b)^{2}. \tag{4.2}$$

At this point, we have a distance between the patterns of two cities, and we measure the distance matrix between all the 131 cities in our dataset and perform a classical hierarchical clustering on this matrix [30]. We obtain the dendrogram represented in figure 4 and at an intermediate level, we can identify four distinct categories of cities, which are easily interpretable in terms of the abundance of blocks with a given shape and with small or large areas. In figure 5, we show the average distribution of *Φ* for each category and show typical street patterns associated with each of these groups. The main features of each group are the following:

— In group 1 (comprising Buenos Aires, Argentina only), we essentially have blocks of medium size (in the bin *α*_{2}) with shapes that are dominated by the square shape and regular rectangles. Small areas (in bin *α*_{1}) are almost exclusively squares.

— Athens, Greece, is a representative element of group 2, which comprises cities with a dominant fraction of small blocks with shapes broadly distributed.

— Group 3 (illustrated here by New Orleans, USA) is similar to group 2 in terms of the diversity of shapes but is more balanced in terms of areas, with a slight predominance of medium-size blocks.

— Group 4, which contains for this dataset the interesting example of Mogadishu, Somalia, displays essentially small, square-shaped blocks, together with a small fraction of small rectangles.

The proportion and location of cities belonging to each group are shown in figure 2. Although one should be wary of sampling bias here, it seems that the type of pattern characteristic of group 3 (various shapes with larger areas) largely dominates among cities in the world. Interestingly, all North American cities (except Vancouver, Canada) are part of group 3, as well as all European cities (except Athens, Greece). The composition of the other continents is more balanced between the different groups. At a smaller scale within group 3 (figure 4), all European cities (except Athens) in our sample belong to the same subgroup of group 3 (the largest one, third from the top in figure 4). Similarly, 15 American cities out of the 22 in our dataset belong to the same subgroup of group 3 (the second largest one, fourth from the top in figure 4). Exceptions are Indianapolis (IN), Portland (OR), Pittsburgh (PA), Cincinnati (OH), Baltimore (MD), Washington (DC) and Boston (MA), which are classified with European cities, confirming the impression that these US cities have a European feel. These results point towards important differences between US and European cities, and could constitute the starting point for the quantitative characterization of these differences [31].
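The classification pipeline used in this section (a global distance between fingerprints, a distance matrix and a hierarchical clustering) can be sketched in a few lines. The sketch below makes several assumptions that should be read as such: the global distance is taken to be the sum over area bins of the squared per-bin distances, the linkage is average linkage (the text only specifies a classical hierarchical clustering), and the toy fingerprints and all function names are invented for illustration.

```python
def global_distance(fp_a, fp_b, n=1):
    """Assumed form: D(a, b) = sum over area bins of d_alpha(a, b)**2."""
    total = 0.0
    for f_a, f_b in zip(fp_a, fp_b):
        d_phi = 1.0 / len(f_a)  # equal-width phi bins on [0, 1]
        d_alpha = sum(abs(x - y) ** n for x, y in zip(f_a, f_b)) * d_phi
        total += d_alpha ** 2
    return total

def distance_matrix(fingerprints):
    """Pairwise distances between all cities, in alphabetical order."""
    names = sorted(fingerprints)
    dist = [[global_distance(fingerprints[a], fingerprints[b]) for b in names]
            for a in names]
    return names, dist

def agglomerate(names, dist, n_clusters):
    """Average-linkage agglomerative clustering, stopped at n_clusters groups."""
    clusters = [[i] for i in range(len(names))]

    def linkage(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # merge the two closest clusters
        _, p, q = min((linkage(clusters[p], clusters[q]), p, q)
                      for p in range(len(clusters))
                      for q in range(p + 1, len(clusters)))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(names[i] for i in c) for c in clusters]

# Toy fingerprints: two area bins, two phi bins each (invented values)
toy = {
    "A": [[1.0, 0.0], [0.0, 0.0]],
    "B": [[0.9, 0.1], [0.0, 0.0]],
    "C": [[0.0, 0.0], [0.0, 1.0]],
    "D": [[0.0, 0.0], [0.1, 0.9]],
}
toy_names, toy_dist = distance_matrix(toy)
toy_groups = agglomerate(toy_names, toy_dist, n_clusters=2)
```

In this toy example, cities A and B (small blocks, similar shapes) end up in one group and C and D in the other. Cutting the merge sequence at different numbers of clusters corresponds to reading the dendrogram at different levels. For a dataset of realistic size, a library routine such as `scipy.cluster.hierarchy.linkage` would be a more efficient replacement for the explicit loop.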

## 5. A local analysis

Cities are complex objects, and it is unlikely that a representation as simple as the fingerprint can capture all their intricacies. Indeed, cities are usually made of different neighbourhoods, which often exhibit different street patterns. In Europe, the division is usually clear between the historical centre and the more recent suburbs (a striking example of such differences is the Eixample neighbourhood in Barcelona, very distinct from other areas of the city). In order to illustrate these differences and to show that they can also be captured with our method, we isolate the different boroughs of New York, NY: the Bronx, Brooklyn, Manhattan, Queens and Staten Island. We extract the fingerprint of each borough, as represented in figure 6. The fingerprint of New York (bottom, figure 3) is indeed the combination of the different fingerprints of the boroughs. While Staten Island and the Bronx have very similar fingerprints, the others are different. Manhattan exhibits two sharp peaks at *Φ* ≈ 0.3 and *Φ* ≈ 0.5, which are the signature of a grid-like pattern with the predominance of two types of rectangles. Brooklyn and Queens each exhibit a sharp peak at a different value of *Φ*, also the signature of grid-like patterns with different rectangles for basic shapes.
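The statement that the fingerprint of a city is the combination of those of its districts can be made explicit: if each district's distribution of *Φ* is normalized by its own number of blocks, the city-wide distribution is their average weighted by the share of blocks in each district. The following sketch illustrates this with invented numbers; the function name and the toy histograms are assumptions, not data from this study.

```python
def combine_fingerprints(parts):
    """City-wide phi histogram from per-district histograms.

    parts : list of (n_blocks, histogram) pairs, where each histogram is
            normalized by the number of blocks of its own district and all
            histograms share the same phi bins.
    """
    total = sum(n_blocks for n_blocks, _ in parts)
    n_bins = len(parts[0][1])
    city = [0.0] * n_bins
    for n_blocks, hist in parts:
        weight = n_blocks / total  # share of the city's blocks in this district
        for j, value in enumerate(hist):
            city[j] += weight * value
    return city

# Two hypothetical districts with sharply peaked, different shapes
district_1 = (300, [0.0, 1.0, 0.0])
district_2 = (100, [1.0, 0.0, 0.0])
city_hist = combine_fingerprints([district_1, district_2])
```

Here the combined histogram shows two peaks whose heights reflect the relative number of blocks in each district, which is how distinct grid patterns in different boroughs can appear together in a single city fingerprint.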

## 6. Discussion and perspectives

We have introduced a new way of representing the road networks of cities, which can be seen as the equivalent of fingerprints for cities. It seems reasonable to think that the possibility of a classification based on these fingerprints hints at common causes behind the shape of the networks of cities in the same categories. Of course, this study has limitations: even if the shape of the blocks alone is good enough for the purpose of giving a rough classification of cities, we miss some aspects of the patterns. In particular, the way the blocks are arranged together locally should also give some information about the visual aspect of the global pattern, since many cities are made of neighbourhoods, built at different times, with different street patterns. What is lacking at this point is a systematic, quantitative way to identify and distinguish different neighbourhoods and to describe the correlation between the positions of the blocks. The New York boroughs, taken as examples in the last section, are administrative, arbitrary definitions of a neighbourhood. The reality is however more complex: similar patterns might span several administrative regions, or a given administrative division might host very distinct neighbourhoods. A further step in the classification would thus be to find a method to extract these neighbourhoods and integrate the spatial correlations between different types of neighbourhoods.

Despite the simplifications that our method entails, we believe that the classification we propose is an encouraging step towards a quantitative and systematic comparison of the street patterns of different cities. This, together with the specific knowledge of architects, urbanists, etc. should lead to a better understanding of the shape of our cities. Further studies are indeed needed in order to relate the various types that we observe to different urban processes. For example, in some cases, small blocks are obtained through a fragmentation process, and their abundance could be related to the age of the city. Consistency of cell shapes could be related to planning, such as in the case of Manhattan for example, but we also know with the example of Paris [7] that a large variety of shapes is also directly related to the effect of urban modification that does not respect the existing geometry.

## Data accessibility

All the data used in this article can be downloaded from the OpenStreetMap database. Information on how to download OpenStreetMap data is available at http://wiki.openstreetmap.org/wiki/downloading_data.

## Acknowledgements

We thank Vincenzo Nicosia for interesting discussions at an early stage of this project. We also thank Anne Bretagnolle, Maurizio Gribaudi, Vito Latora, Thomas Louail, Denise Pumain for stimulating discussions at various stages of this study.

- Received August 18, 2014.
- Accepted September 17, 2014.

- © 2014 The Author(s) Published by the Royal Society. All rights reserved.