Cities are characterized by concentrating population, economic activity and services. However, not all cities are equal and a natural hierarchy at local, regional or global scales spontaneously emerges. In this work, we introduce a method to quantify city influence using geolocated tweets to characterize human mobility. Rome and Paris appear consistently as the cities attracting most diverse visitors. The ratio between locals and non-local visitors turns out to be fundamental for a city to truly be global. Focusing only on urban residents' mobility flows, a city-to-city network can be constructed. This network allows us to analyse centrality measures at different scales. New York and London play a central role on the global scale, while urban rankings suffer substantial changes if the focus is set at a regional level.
Ever since Christaller proposed the central place theory in the 1930s, researchers have worked to understand the relations and competition between cities leading to the emergence of a hierarchy. Christaller envisioned an exclusive area surrounding each city at a regional scale to which it provided services such as markets, hospitals, schools, universities, etc. The services display different levels of specialization, thus inducing a hierarchy among urban areas according to the type of services offered. In addition, this idea naturally brings an equidistant distribution of urban centres of similar category as long as no geographical constraints prevent it. Still, in the present globalized world, relations between cities go beyond mere geographical distance. In order to take this fact into account, it was necessary to introduce the concept of world city. These are cities that concentrate economic centres such as the headquarters of large multinational companies or global financial districts, centres of knowledge and innovation such as cutting-edge technological firms or universities, or political decision centres, and that play an eminent role of dominance over smaller, more local, counterparts. The concept of global city is, nevertheless, vague and in need of further mathematical formalization. This is attained by means of so-called world city networks, in which each pair of cities is linked if they share a common resource or interchange goods or people [3–7]. For instance, a link can be established if two cities share headquarters of the same company [7–9], if both are part of goods production chains, if they interchange financial services or Internet data, or if direct flights or boats connect them [4,13–15]. Centrality measures are then applied to the network and a ranking of the cities naturally emerges.
Due in great part to their geographical locations and traditional roles as trans-Atlantic bridges, New York and London are typically the top rankers in many of these studies [5,9,14]. There are, however, inconsistencies in terms of the meaning and stability of the results obtained from different networks or with different centrality measures [14,16] and a more organic and stable definition is needed.
Here, we use information and communication technologies (ICT) to approach the problem from a different perspective. How long would information originating from a given city take to reach any other city if it were to pass from person to person only through face-to-face conversations? Or, in other words, what is the likelihood that information reaches a certain distance away after a given time period? In this experiment, the most central place in the world would simply be the place from which a message could reach everywhere else in the shortest amount of time. This view allows us to easily define a temporal network of influence.
We perform this analysis by empirically observing how people travel worldwide and using that as a proxy for how quickly our message would be able to spread. The recent popularization and affordability of geolocated ICT services and devices such as mobile phones, credit or transport cards generate a large quantity of real-time data on how people move [17–25]. This information has been used to study questions such as interactions in social networks [26–29], information propagation, city structure and land use [23,31–40], or even road and long-range train traffic. It is bringing a new era in the so-called science of cities by providing a basis for systematic comparison of the structure of urban areas of different sizes or in different countries [37,38,40,42–47]. Data from credit cards and mobile phones are usually constrained to a limited geographical area such as a city or a country, whereas those generated from online social media such as Twitter, Flickr or Foursquare can refer to the whole globe. This is the reason we focus here on geolocated tweets, which have already proven to be a useful tool to analyse mobility between countries and provide the ideal framework for our analysis.
In particular, we select 58 of the most populated cities in the world and analyse their influence in terms of the average radius travelled and the area covered by Twitter users visiting each of them as a function of time. Differences in the mobility for local residents and external visitors are taken into account, in such a way that cities can be ranked according to the extension covered by the diffusion of visitors and residents, taken both together and separately, and by the attractiveness they exhibit towards visitors. Finally, we also consider the interaction between cities, forming a network that provides a framework to study urban communities and the role cities play within their own community (regional) versus a global perspective.
2. Material and methods
2.1. Twitter dataset
Our database contains 21 017 892 tweets geolocated worldwide written by 571 893 users in the temporal period ranging from October 2010 to June 2013 (1000 days). There are on average 36 tweets per user. Non-human behaviour or collective accounts have been excluded from the data by filtering out users travelling faster than a plane (750 km h−1). For this, we have computed the distance and the time spent between two successive geolocated tweets posted by the same user. The geographical distribution of tweets is plotted in figure 1. The distribution matches population density in many countries, although it is important to note that some areas are under-represented such as, for example, most of Africa and China.
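The speed filter described above can be sketched as follows. This is a minimal illustration rather than the code used for the study: the function names, the tuple layout of a tweet (time in hours, latitude, longitude) and the choice of the haversine formula for distances are assumptions; only the 750 km h−1 threshold comes from the text.

```python
import math

EARTH_RADIUS_KM = 6371.0
MAX_SPEED_KMH = 750.0  # plane-speed threshold quoted in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_plausible_user(tweets, max_speed_kmh=MAX_SPEED_KMH):
    """Return False if two successive tweets imply a speed above the threshold.

    tweets: list of (time_hours, lat, lon) tuples (hypothetical layout).
    """
    ordered = sorted(tweets)
    for (t0, la0, lo0), (t1, la1, lo1) in zip(ordered, ordered[1:]):
        dist = haversine_km(la0, lo0, la1, lo1)
        dt = t1 - t0
        if dt == 0:
            # Assumption: simultaneous tweets from two different places are rejected.
            if dist > 0:
                return False
            continue
        if dist / dt > max_speed_kmh:
            return False
    return True


PARIS = (48.8566, 2.3522)
NEW_YORK = (40.7128, -74.0060)
fast_user = [(0.0, *PARIS), (1.0, *NEW_YORK)]    # Paris to New York in one hour
slow_user = [(0.0, *PARIS), (10.0, *NEW_YORK)]   # same trip in ten hours
```

With these toy inputs, `is_plausible_user(fast_user)` is false and `is_plausible_user(slow_user)` is true, so only the second account would be kept.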
We take as reference 58 cities around the world (see electronic supplementary material, table S1, for a detailed account) that are both highly populated (most are among the 100 most populated cities in the world) and have a sufficiently large number of geolocated Twitter users. To avoid distortions imposed by different spatial scales and urban area definitions that can be problematic [49,50], we operationally defined each city to be a circle of radius 50 km around the respective city hall.
In order to assess the influence of a city, we need to characterize how users travel after visiting it. To do so, we consider the tweets posted by user υ Δt days after visiting city c. In figure 2, the locations of geolocated tweets are plotted according to the number of days since the first visit to Paris and New York as an example. Not surprisingly, a large part of the tweets are concentrated around these cities, but one can observe how users eventually diffuse worldwide.
2.2. Definition of the user's place of residence
To identify the Twitter users' place of residence, we start by discretizing the space. To do so, we divide the world using a grid composed of 100 × 100 km2 cells in a cylindrical equal-area projection. In total, there are approximately 5000 inhabited cells in our dataset. The place of residence of a user is a priori given by the cell from which he or she has posted most of his/her tweets. However, to avoid selecting users who did not show enough regularity, we consider only those users who posted at least one-third of their tweets from the place of residence (representing more than 95% of the overall users). For each city, the number of valid users as well as the number of tweets posted from their first passage in the city are provided in electronic supplementary material, table S1.
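A possible implementation of this residence rule is sketched below. The text only specifies a cylindrical equal-area projection with 100 × 100 km2 cells and the one-third threshold; the particular projection used here (Lambert cylindrical equal-area with the equator as standard parallel), the Earth radius and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0
CELL_KM = 100.0  # cell side quoted in the text


def cell_of(lat, lon):
    """Map a point in degrees to a grid cell index (assumed Lambert projection)."""
    x = EARTH_RADIUS_KM * math.radians(lon)
    y = EARTH_RADIUS_KM * math.sin(math.radians(lat))
    return (math.floor(x / CELL_KM), math.floor(y / CELL_KM))


def residence_cell(tweets, min_fraction=1 / 3):
    """Return the most-tweeted cell, or None if the user is not regular enough.

    tweets: list of (lat, lon) tuples for one user.
    """
    if not tweets:
        return None
    counts = Counter(cell_of(lat, lon) for lat, lon in tweets)
    cell, n = counts.most_common(1)[0]
    return cell if n / len(tweets) >= min_fraction else None


PARIS = (48.8566, 2.3522)
NEW_YORK = (40.7128, -74.0060)
TOKYO = (35.6762, 139.6503)
SYDNEY = (-33.8688, 151.2093)

regular_user = [PARIS, PARIS, PARIS, NEW_YORK, TOKYO]  # 3/5 of tweets in one cell
erratic_user = [PARIS, NEW_YORK, TOKYO, SYDNEY]        # at most 1/4 in any cell
```

Here `residence_cell(regular_user)` returns the cell containing Paris, whereas `residence_cell(erratic_user)` returns `None` and the user would be discarded.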
We can now determine for each city whether a user is a resident (local user) or a visitor (non-local user). To do so, we compute the average position of the tweets posted from his/her cell of residence. If this position falls within the city boundaries (circle of radius 50 km around the city hall), the user is considered a local user, and a non-local user otherwise.
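The local versus non-local test can be illustrated with the short sketch below. It assumes that the tweets posted from the cell of residence have already been selected, takes the average position as a plain mean of latitudes and longitudes (reasonable inside a 100 km cell away from the antimeridian) and measures distance with the haversine formula; these choices and the function names are illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0
CITY_RADIUS_KM = 50.0  # city definition quoted in the text


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def mean_position(points):
    """Arithmetic mean of latitudes and longitudes (assumed averaging scheme)."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def is_local(residence_tweets, city_hall, radius_km=CITY_RADIUS_KM):
    """True if the mean position of the residence-cell tweets lies inside the city."""
    return haversine_km(mean_position(residence_tweets), city_hall) <= radius_km


PARIS_CITY_HALL = (48.8566, 2.3522)
near_paris = [(48.85, 2.35), (48.90, 2.40)]
near_orleans = [(47.9030, 1.9093)]  # roughly 110 km from Paris
```

In this toy case the first user is classified as a local of Paris and the second as a non-local visitor.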
2.3. Metrics to assess city influence
We select a fixed number of users u in each city at random and track their displacements in a given period of time Δt since their first tweet from it. As the results might depend on the specific set of users chosen, we average over 100 independent user extractions. As shown in electronic supplementary material, figure S2, the longer Δt is, the lower the population of users who remain active, so we must establish a trade-off between number of users and activity time. Unless otherwise stated, we set u = 300 and Δt = 350 days in the discussion that follows.
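The averaging over independent user extractions can be written generically, as in the sketch below. The defaults u = 300 and 100 extractions come from the text; sampling without replacement within each extraction, the function name and the `metric` callback interface are assumptions.

```python
import random
from statistics import mean


def averaged_metric(users, metric, u=300, n_extractions=100, seed=None):
    """Average a city-level metric over independent random extractions of u users.

    users: list of per-user records; metric: function taking a list of u records
    and returning a number (hypothetical interface).
    """
    if len(users) < u:
        raise ValueError("not enough users for a single extraction")
    rng = random.Random(seed)
    return mean(metric(rng.sample(users, u)) for _ in range(n_extractions))


# Toy example: each user is summarized by one number and the metric is its mean.
toy_users = [float(i % 7) for i in range(1000)]
toy_value = averaged_metric(toy_users, mean, seed=0)
```

Any of the metrics defined in the following subsections could be passed as `metric`, provided it accepts the list of sampled users.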
2.3.1. Average radius
There are different aspects to take into account when trying to define how to properly measure the influence of a city due to human mobility. We start our discussion by considering the average radius travelled by Twitter users since their first tweet from a city c. We track for each user the positions from which he or she tweeted after visiting c, and compute the average distance from these locations to the centre of c. The average radius, R, is then defined as the average over all the u users of their individual radii.
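As an illustration of this definition, the sketch below computes an individual radius per user and then averages over users. The haversine distance and the data layout (one list of (lat, lon) positions per user, restricted to tweets posted after visiting c) are assumptions.

```python
import math
from statistics import mean

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def user_radius(positions, centre):
    """Mean distance from one user's tweet positions to the city centre."""
    return mean(haversine_km(p, centre) for p in positions)


def average_radius(users_positions, centre):
    """R(c): mean of the individual radii over all users with at least one tweet."""
    return mean(user_radius(p, centre) for p in users_positions if p)


PARIS = (48.8566, 2.3522)
LONDON = (51.5074, -0.1278)
stay_at_home = [PARIS, PARIS]   # individual radius 0 km
traveller = [LONDON]            # individual radius = Paris-London distance
R_paris = average_radius([stay_at_home, traveller], PARIS)
```

In this toy case `R_paris` is half the Paris to London distance, because the two individual radii are weighted equally regardless of how many tweets each user posted.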
The average radius is informative but can be biased by the geography. Cities that are in relatively isolated positions such as islands may have a high average radius just because a long trip is the only option to travel to them. To avoid this effect, we define the normalized average radius of a city c as the ratio between R(c) and the average distance of all the Twitter users' places of residence to c (electronic supplementary material, figure S4).
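The two definitions above can be written compactly as follows. The notation is an assumption introduced only for this restatement: d denotes geographical distance, x_{v,k} the k-th position tweeted by user v after visiting c, n_v the number of such positions, h_w the place of residence of user w, N the total number of users with an identified place of residence, and the tilde the normalized quantity.

```latex
R(c) = \frac{1}{u} \sum_{v=1}^{u} \frac{1}{n_v} \sum_{k=1}^{n_v} d\!\left(x_{v,k}, c\right),
\qquad
\tilde{R}(c) = \frac{R(c)}{\dfrac{1}{N} \sum_{w=1}^{N} d\!\left(h_w, c\right)}
```

Under this reading, a normalized value above one means that the users passing through c tweet, on average, from farther away than the typical Twitter user lives from c.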
One possible way to overcome the limitation of the average radius defined above is to discard geographical coherence altogether and simply measure the area covered by those users, regardless of the distance at which it might be located from the originating city. In order to estimate the area covered by the users, the world surface has been divided into cells of 100 × 100 km2, as we have done to identify the users' place of residence. By tracking the movements of the set of users passing through each city, we count the number of cells from which at least one tweet has been posted and define coverage as this number. This metric has the clear advantage of not being sensitive to isolated locations, but it still does not consider that some cells, specifically those corresponding to other important cities, are visited much more often than others.
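Coverage reduces to counting distinct grid cells, as in the sketch below. The grid construction (Lambert cylindrical equal-area projection with the equator as standard parallel) and the function names are assumptions; the text only fixes the 100 × 100 km2 cell size and the cylindrical equal-area family.

```python
import math

EARTH_RADIUS_KM = 6371.0
CELL_KM = 100.0  # cell side quoted in the text


def cell_of(lat, lon):
    """Map a point in degrees to a grid cell index (assumed Lambert projection)."""
    x = EARTH_RADIUS_KM * math.radians(lon)
    y = EARTH_RADIUS_KM * math.sin(math.radians(lat))
    return (math.floor(x / CELL_KM), math.floor(y / CELL_KM))


def coverage(users_positions):
    """Number of distinct cells with at least one tweet from the tracked users.

    users_positions: one list of (lat, lon) positions per user.
    """
    cells = {cell_of(lat, lon) for positions in users_positions for lat, lon in positions}
    return len(cells)


PARIS = (48.8566, 2.3522)
NEW_YORK = (40.7128, -74.0060)
toy_coverage = coverage([[PARIS, PARIS], [NEW_YORK]])
```

Repeated tweets from the same cell count once, so the toy example covers two cells even though three tweets were posted.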
3.1. Comparing the influence of cities
We start by taking the perspective from the city to the world and compare how effective the cities are as starting points for the Twitter users' diffusion. The evolution of the average radius as a function of time is plotted in figure 3 for the 58 cities. The curves of the log–log plot show an initial fast increase followed by a much slower growth after approximately 15–20 days. The presence of these two regimes is mainly due to the presence of non-local users, as can be observed in electronic supplementary material, figure S5. In the initial phase, the radius grows for all the cities at a rhythm faster than the square root of time, which is the classical prediction for 2D Wiener diffusion. This is not fully surprising as the users' mobility is better described by Lévy flights than by a Wiener process. Still, the differences between cities are remarkable. There are two main behaviours: the radius for cities such as Detroit grows slowly, whereas others like Paris show an increase that is close to linear. After this initial transient, the average radius enters a regime of slow growth for all the cities that is even slower than the square root of time. This implies that the long displacements by the users are concentrated in the first month, the period during which the non-local users return home, after which the exploration becomes more localized. Even though the curves of different cities may cross in the first regime, they reach a relatively stable configuration in the second one. We can see that the top ranker in terms of capacity of diffusion is Hong Kong for the whole time window considered and the bottom one is Bandung (West Java, Indonesia).
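One simple way to compare a growth curve with the square-root baseline is to estimate its slope in log–log scale: a slope of 0.5 corresponds to Wiener diffusion and a slope close to 1 to linear growth. The least-squares fit below is an illustrative sketch, not the procedure used to produce figure 3, and the synthetic curves are made up for the example.

```python
import math


def loglog_slope(times, radii):
    """Least-squares slope of log(radius) against log(time)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(r) for r in radii]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


days = [1, 2, 4, 8, 16]
diffusive = [100.0 * math.sqrt(t) for t in days]  # synthetic sqrt(t) growth
ballistic = [100.0 * t for t in days]             # synthetic linear growth

slope_diffusive = loglog_slope(days, diffusive)
slope_ballistic = loglog_slope(days, ballistic)
```

Fitting the two regimes separately, for instance before and after the crossover at approximately 15–20 days, would give one exponent per regime for each city.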
The top 10 cities according to the average radius are plotted in figure 4a. It is worth noting that New York only appears in the last position, in contrast to previously published rankings based on different approaches [5,9,14]. Many cities on the top are in the Pacific Basin (Hong Kong, Sydney, Beijing, Taipei, San Francisco and Shanghai), which is clear evidence for the impact of geography on R. We take geographical effects into account by calculating the normalized radius as shown in figure 4b. With this correction, the top cities are Rome, Paris and Lisbon. These cities are located in densely populated Europe but still manage to send travellers further away than any other, proof of their aptitude as sources for the spread of information as described in the introduction. Indeed, all cities in the top 10 set are also able to attract visitors at a worldwide scale; some are relatively far from other global cities and/or may be the gate to extensive hinterlands (China). The same ranking for the coverage is shown in figure 4c. Even though these two metrics are strongly correlated (see electronic supplementary material, figure S6), there are still some significant differences indicating that they are able to capture different information. The top cities, however, are again Rome, Paris and Lisbon, probably due to a combination of the factors explained above. It should also be noted that even though the user extraction is stochastic and the rankings can vary slightly from one realization to another (see electronic supplementary material, figure S7), the ranking is stable when averaged over several realizations (electronic supplementary material, figure S8).
3.2. Local versus non-local Twitter users
We have yet to take into account that individuals residing in a city might behave differently from visitors. We consider a user to be a resident of a city if most of his/her tweets are posted from it. Otherwise, he/she is seen as an external visitor. Residents of the 58 cities we consider have a significantly lower coverage (about 96) than visitors (about 260). This means that the locals move towards more concentrated locations, such as places of work or the residences of family and friends, while visitors have a comparatively higher diversity of origins and destinations.
The difference between locals and non-locals is even more dramatic when the normalized radius for each city is plotted as a function of the coverage for both types of users in figure 5a. Two clusters clearly emerge, showing that the locals tend to move less than the visitors. This difference between users is likely to be behind the change of behaviour in the temporal evolution of the average radius detected in figure 3, and introduces the ratio of visitors over local users as a relevant parameter to describe the mobility from a city. Indeed, visitors contribute the most to the radius and the area covered (see figure 5b for the coverage), while residents contribute most to the local relevance of a city (figure 5c for the coverage and electronic supplementary material, figure S10a, for the normalized radius). The top rankers in this classification are Hong Kong and San Francisco in the normalized radius and Moscow and Beijing in the coverage. All of them are cities that may act as gates for quite extensive hinterlands. The rankings based on non-locals (figure 5d for the coverage and electronic supplementary material, figure S10b, for the normalized radius) bring back the more common top rankers such as Paris, New York and Lisbon.
3.3. City attractiveness
Thus far, we have considered a city as an origin and analysed how people visiting it diffuse across the planet. We now consider the attractiveness of a city by taking the opposite point of view and analysing the origins of each user seen within the confines of a city. We modify the two metrics defined above to consider the normalized average distance of the users' residences (represented by the centroid of the cell of residence) to the centre of the considered city c and the number of different cells where these users come from. In this case, the two metrics are averaged over 100 independent extractions of u = 1000 Twitter users per city. The resulting rankings depict the attractiveness of each city from the perspective of external visitors: how far are people willing to travel to visit this city? The top 10 cities are shown in figure 6 for the coverage (see electronic supplementary material, figure S11, for the normalized average radius). Rome, Paris and Lisbon are also quite consistently the top rankers in terms of attractiveness to external visitors.
3.4. A network of cities
Finally, we complete our analysis by considering travel between the 58 selected cities. We build a network connecting the 58 cities under consideration where the directed edge from city i to city j has a weight given by the fraction of local Twitter users in city i who were observed at least once in city j. For simplicity, in what follows, we consider only local users who left their city at least once. This network captures the strength of connections between cities, allowing us to analyse the communities that naturally arise due to human mobility. Using the OSLOM clustering detection algorithm [52,53], we find six communities as shown in figure 7. These communities follow approximately the natural boundaries between continents: two communities in North and Central America, one community in South America, another in Europe and two communities in Asia (Japan and the rest of Asia plus Sydney), indicating that they correspond to economic, cultural and geographical proximities. Similar results were obtained using the Infomap cluster detection algorithm, confirming the robustness of the communities detected.
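The edge weights of this network can be computed as in the sketch below. One point is an assumption: a local user is taken to have 'left their city' when he or she was observed in at least one of the other selected cities, and the denominator of the fraction counts only such users. The data layout and function name are hypothetical as well.

```python
from collections import defaultdict


def city_network(local_users):
    """Directed weights w[(i, j)]: fraction of travelling locals of i seen in j.

    local_users: iterable of (home_city, visited_cities) pairs, one per local user.
    """
    travellers = defaultdict(int)  # locals of i observed in at least one other city
    seen = defaultdict(int)        # locals of i observed at least once in j
    for home, visited in local_users:
        others = set(visited) - {home}
        if not others:
            continue  # users never seen in another selected city are ignored
        travellers[home] += 1
        for city in others:
            seen[(home, city)] += 1
    return {(i, j): n / travellers[i] for (i, j), n in seen.items()}


toy_users = [
    ("Paris", {"London", "Rome"}),
    ("Paris", {"London"}),
    ("Paris", set()),              # never left: excluded from the denominator
    ("London", {"Paris"}),
]
weights = city_network(toy_users)
```

Because one user can be observed in several cities, the weights leaving a city need not sum to one; in the toy example the edges from Paris have weights 1.0 (London) and 0.5 (Rome).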
With these empirical communities in hand, we can now place each city into a local as well as a global context. In a network context, the importance of each node can be measured in different ways. Two classical measures are the strength of a node and the weighted betweenness [56,57]. Given the way we defined our network above, these correspond, roughly, to the fraction of local users that travel out of a city and how important that city is in connecting travellers coming from other cities to their final destinations. In the inset of figure 7, we analyse the ranking resulting from these two metrics and identify New York and London as the most central nodes in terms of degree and betweenness and, particularly, New York for the weighted degree at a global scale. However, when we restrict our analysis to just the regional scene of each community, the relative importance of each city quickly changes. The rankings for the regional weighted degree are similar to the global ones, as this metric depends only on the population of each city and not on who it is connected to. The most central cities occupy the same positions except for San Diego, which slipped down three places. On the other hand, the weighted betweenness depends strongly on the network topology, as can be seen from the dramatic shifts we observe when considering only the local community of each city, with most cities moving several positions up or down (see details in table 1 and electronic supplementary material, table S2). For example, San Diego went down nine places, meaning that this city has a global influence because it is a communication hub between the United States and Central America. Dallas went up six places, indicating that its influence is higher at the regional scale than in the international arena.
In the same way, Madrid went down four places, whereas Barcelona stayed at the same place. This means that Madrid is more influential than Barcelona on a global scale, as an international bridge connecting Europe and Central and South America, but not on a regional (European) scale.
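The comparison between global and regional rankings can be illustrated with node strength, here taken as the sum of the weights of the outgoing edges of a city; this reading of 'strength', the toy network and the function names are assumptions, and the weighted betweenness is not sketched. Restricting the sum to edges inside one community gives the regional counterpart.

```python
def out_strength(weights, community=None):
    """Sum of outgoing edge weights per node.

    weights: dict {(i, j): w}; community: optional set of nodes. When given,
    only edges with both endpoints inside the community are counted.
    """
    strength = {}
    for (i, j), w in weights.items():
        if community is not None and (i not in community or j not in community):
            continue
        strength[i] = strength.get(i, 0.0) + w
    return strength


def ranking(scores):
    """Nodes sorted by decreasing score, ties broken alphabetically."""
    return sorted(scores, key=lambda node: (-scores[node], node))


# Hypothetical three-city network; A and B are assumed to share a community.
toy_weights = {
    ("A", "B"): 0.5, ("A", "C"): 0.4,
    ("B", "A"): 0.3,
    ("C", "A"): 0.6, ("C", "B"): 0.5,
}
global_rank = ranking(out_strength(toy_weights))
regional_rank = ranking(out_strength(toy_weights, community={"A", "B"}))
```

In the toy network, city C leads the global ranking but is absent from the regional one, while the relative order of A and B is preserved.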
The study of competition and interactions between cities has a long history in fields such as geography, spatial economics and urbanism. This research has traditionally been based on information from finance exchanges, sharing of firm headquarters, number of passengers transported by air or tons of cargo dispatched from one city to another. One can define a network relying on these data and identify the so-called world cities, those with a higher level of centrality, as the global economic or logistic centres. Here, we have taken a radically different approach to measure quantitatively the influence of a city in the world. Nowadays, geolocated devices generate large quantities of real-time, geolocated data, permitting the characterization of people's mobility. We have used Twitter data to track users and classify cities according to the mobility patterns of their visitors. Top cities as mobility sources or attraction points are identified as central places at a global scale for cultural and information exchanges. This definition of city influence makes possible its direct measurement instead of using indirect information such as firm headquarters or direct flights. Still, the quality of the results depends on the capacity of geolocated tweets to describe local and global mobility. Indeed, observing the world through Twitter data can lead to distortions arising from economic and sociodemographic biases. The Twitter penetration rate may also vary from country to country, leading to an under-representation of the population of, for example, Africa and China. The cities selected for this work are those that, on one hand, concentrate large populations and, on the other, have sufficient tweets to be part of the analysis. There are biases acting against our work, such as the lack of coverage in some areas of the world, and others in favour, such as the fact that younger and wealthier individuals are more likely to both travel and use Twitter.
The estimated mobility patterns are naturally partial as they only refer to selected cities. Still, as long as the users provide a significant sample of the external urban mobility, the flow network is enough for the performed analysis. Furthermore, several recent works have proven the capacity of geolocated tweets to describe human mobility by comparing different data sources, such as information collected from cell phone records, Twitter, traffic measurement techniques and surveys [23,24,41].
More specifically, and assuming data reliability, we consider the users' displacements after visiting each city. The urban areas are ranked according to the area covered and the radius travelled by these users as a function of time. These metrics are inspired by the framework developed for random walks and Lévy flights, which allows us to characterize the evolution of the system with well-defined mathematical tools and with a clear reference baseline in mind. Rankings in the previous literature usually find a hierarchy led by New York and London as the most central world cities. The ranks change dramatically when one takes into account users' mobility. A triplet formed by Rome, Paris and Lisbon consistently appears on the top of the ranking by extension of visitors' mobility but also by their attractiveness to travellers of very diverse origin. A combination of economic activity appealing to tourism and diversity of links to other lands, in some cases the product of recent history, can explain the presence of these cities on the top. These three cities are followed by others such as San Francisco, which without being one of the most populated cities in the US extends its influence over the large Pacific basin, or Hong Kong, Beijing and Shanghai, which replicate that on the other side of the Pacific region. These cities are in some cases gates to broad hinterlands. This is relevant as our metrics take into account the diversity in the visitors' origins.
These results rely on the full user population, discriminating only by the place of residence between locals and non-locals to each city. The influence of cities measured in this way includes their impact on rural as well as on other urban areas. However, the analysis can be restricted to users residing in an urban area and to their displacements towards other cities. In this way, we obtain a weighted directed network between cities, whose link weights represent the (normalized) fluxes of users travelling from one city to another. This network provides the basis for a more traditional centrality analysis, in which we recover London and New York as the most central cities on a global scale. The match between our results and those from previous analyses brings further confidence in the quality of the flow measured from online data. The network framework permits clustering techniques to be applied and divides the world city network into communities or areas of influence. When the centrality is studied only within each community, we obtain a regional perspective that induces a new ranking of cities. The comparison between the global and the regional ranking provides important insights into the change of roles of cities in the hierarchies when passing from global to regional.
In summary, we have introduced a new method to measure the influence of cities based on Twitter user displacements as proxies for mobility flows. The method, despite some possible biases due to the population using online social media, allows for a direct measurement of a city's influence in the World. We proposed three types of rankings capturing different perspectives: rankings based on ‘city-to-world’ and ‘world-to-city’ interactions and rankings based on ‘city-to-city’ interaction. It is interesting to note that the most influential cities are very different according to the perspective and the scale (regional and global). This introduces the possibility of studying relations among cities and between cities and rural areas with unprecedented detail and scale.
M.L. designed the study, analysed the data and wrote the manuscript. B.G., A.T. and J.J.R. designed the study and wrote the manuscript. All authors read, commented and approved the final version of the manuscript.
We declare we have no competing interests.
Partial financial support has been received from the Spanish Ministry of Economy (MINECO) and FEDER (EU) under project INTENSE@COSYP (FIS2012–30634), and from the EU Commission through projects LASAGNE and INSIGHT. The work of M.L. has been funded under the PD/004/2013 project, from the Conselleria de Educación, Cultura y Universidades of the Government of the Balearic Islands and from the European Social Fund through the Balearic Islands ESF operational programme for 2013–2017. J.J.R. acknowledges funding from the Ramón y Cajal program of MINECO. B.G. was partially supported by the French ANR project HarMS-flu (ANR-12-MONU-0018).
- Received May 26, 2015.
- Accepted June 24, 2015.
- © 2015 The Author(s)
Published by the Royal Society. All rights reserved.