
Abstract

The fact that accessibility shapes the geographic distribution of activity needs to be addressed in any long-term policy and planning for urban systems. One major problem is that current accessibility measures rely on the identification and quantification of attractions in the system. We propose that it is possible to devise a network centrality measure that bypasses this reliance and predicts the distribution of urban activity directly from the structure of the infrastructure networks over which interactions take place. From a basis of spatial interaction modelling and eigenvector centrality measures, we develop what we call a preferential centrality measure that recursively and self-consistently integrates activity, attraction and accessibility. Derived from the same logic as Google’s PageRank algorithm, we may describe its operation by drawing a parallel: Google’s PageRank algorithm ranks the importance of networked documents without the need to perform any analysis of their contents. Instead it considers the topological structure of the network and piggybacks thereby on contextualised and deep evaluation of documents by the myriad distributed agents that constructed the network. We do the same thing with regard to networked geographical zones. Our approach opens up new applications of modelling and promises to alleviate a host of recalcitrant problems, associated with integrated modelling, and the need for large volumes of socio-economic data. We present an initial validation of our proposed measure by using land taxation values in the Gothenburg municipality as an empirical proxy of urban activity. The resulting measure shows a promising correlation with the taxation values.

Keywords

Accessibility, urban activity, centrality, eigenvector centrality, preferential attachment, PageRank, transportation, spatial regression, land value, spatial interaction

Introduction

Spatial interaction is essential for urban activity and is ultimately afforded by the transportation network. Can the geographical distribution of urban activity thereby be inferred directly from some measure of centrality derived from the transportation system? In this paper, we combine theories from spatial interaction modelling (e.g. Wilson, 2000) and network centrality (e.g. Newman, 2008) to develop a model to test this hypothesis with encouraging results. As a framing, we begin by subdividing the problems faced by planners and theorists into: a planning problem that carries with it a modelling problem, and a data problem.

The planning problem concerns the need to integrate transport and land use to handle dynamical consequences of change. At its heart, the planning problem stems from the essential unpredictability of complex interactions within and between domains. For example, a newly constructed road may itself increase traffic by inducing new development attracted to improved accessibility along its extent.

Computational models are attractive as tools for studying these dependencies, which leads us to the modelling problem. If we begin unpacking the transportation and land use domains, many levels of fine-grained subsystems appear (e.g. Iacono et al., 2008). To make matters worse, these subsystems are not as internally integrated and externally separated as we may wish. Integrated models are near decomposable (Simon, 1962) in a complicated machine-like manner, while urban systems are wicked (Andersson et al., 2014). Integrated model systems and urban systems are not complex in the same way (Timmermans, 2003).

However, even if we were to solve the modelling problem, we would still be left with a data problem. Attempting to improve realism by integrating as much theoretical and empirical detail as possible (e.g. Waddell et al., 2003) leads to a two-fold problem. First, suitable and consistent data must be obtained. Second, empirical patterns must be expected to remain valid even as planning parameters are changed, which is particularly problematic for long-term forecasts.

Our intention is to strike at the modelling and data problems simultaneously by exploring an alternative approach. We aim to infer the distribution of urban activity, by modelling only the physical characteristics of geographical zones and their interactions, i.e. without reliance on any demographic data. Our centrality measures are derived from the same basis as Google’s PageRank algorithm (Brin and Page, 1998), but in our case the main input is the transportation network, which is used to infer the importance – or centrality – of the zones that it links. Our hypothesis is that this centrality concept is intimately linked with the concept of urban activity. The result is an expandable, scalable and portable model based on new principles that bypasses some of these key modelling and data problems in planning. The model may be reapplied anywhere in the world, and, with regard to data availability, it may be scaled up to the global level, opening up new vistas of possible applications besides those of traditional planning.

The first part of the paper concerns theoretical background and derivation of centrality models for predicting urban activity. We then present our data sources, followed by ‘Methods’ and ‘Results’ sections where the model implementation and empirical validation processes are described.

Theory

Background

From the common wisdom that cities tended, from early on, to be established on trade routes, natural ports or river crossings, stems the fundamental assumption of all spatial economic theories: a location with good accessibility is more attractive than locations with bad access. This is a fundamental assumption that theoretically goes back to Von Thünen (1826). A breakthrough study by Hansen (1959) demonstrated that locations with high accessibility were developed earlier and more densely than less accessible locations. On the same path, Alonso (1964) formulated a theory linking accessibility and land use. Following Krugman (1996) and Fujita et al. (1999), a great part of spatial development can be explained by the interplay between two major driving forces: (i) economies of scale and (ii) spatial factors such as transport costs and land prices.

To take the leap from these concepts towards an urban centrality measure, we propose to use a simplified model of urban economic activity in combination with a much more detailed spatial representation. This makes it possible to view the urban system as a network of interacting locations (Andersson et al., 2006; Barthélemy, 2011; De Montis et al., 2013).

Urban activity

A central concept in this paper is the notion of urban activity (denoted $a_{i}$ , for zone $i$ ). In our definition, urban activity is fundamentally tied to a location and to interactions. We do not differentiate between activity types but leave it as an aggregated intensity measure¹ corresponding to the sum of all interactions between a location and all other locations. Since it includes both social and economic interactions, it cannot be easily measured in total, which means that any modelling and empirical studies must resort to studying some relevant proxies. The monetary part of urban activity can be understood as a concept close to GDP, so that activity can be approximated by the sum of the market value of all (value-adding) production of goods and services taking place at a location at a certain point in time.

Local characteristics

A fundamental property of a location is its capacity to be adapted to human activity, determined by basic usability such as local access to buildable land and infrastructure. These local characteristics (denoted $R_{j}$) correspond to the attractivity of a zone ‘in itself’. Details about how we have calculated the local attractivity characteristics are described in the ‘Methods’ section.

Accessibility and centrality

Consider the accessibility to attractions as defined by Hansen (1959); $A_{i} = \sum_{j} W_{j} f (c_{i j}),$ where $W_{j}$ is the index of attraction of $j$ , $c_{i j}$ is a measure of distance or travel time of moving between $i$ and $j,$ and $f$ is a decreasing function. One way of describing centrality is by stating that a location is central if it has strong accessibility to other central locations, which can be formalised by replacing attraction $W_{j}$ with accessibility $A_{j}$ itself, to arrive at a recursive eigenvector centrality definition, $A_{i} = \sum_{j} A_{j} f (c_{i j}) .$
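The step from Hansen accessibility to the recursive definition can be illustrated with a minimal sketch. The three-zone travel times, the uniform attractions and the power law decay used here are hypothetical choices for illustration only. Since the recursive relation can only hold up to a constant factor, the sketch rescales the vector in every iteration, which is the usual power iteration for an eigenvector.

```python
def hansen_accessibility(W, cost, f):
    """A_i = sum_j W_j * f(c_ij); the j == i term is skipped."""
    n = len(W)
    return [
        sum(W[j] * f(cost[i][j]) for j in range(n) if j != i)
        for i in range(n)
    ]


def recursive_accessibility(cost, f, iterations=200):
    """Replace W_j by A_j and iterate, rescaling so that sum(A) == 1."""
    n = len(cost)
    A = [1.0 / n] * n
    for _ in range(iterations):
        unscaled = hansen_accessibility(A, cost, f)
        total = sum(unscaled)
        A = [x / total for x in unscaled]
    return A


# Hypothetical travel times in minutes between three zones on a line.
cost = [[0, 5, 10],
        [5, 0, 5],
        [10, 5, 0]]
power_decay = lambda c: c ** -2.0  # assumed decreasing function f

A_hansen = hansen_accessibility([1.0, 1.0, 1.0], cost, power_decay)
A_recursive = recursive_accessibility(cost, power_decay)
```

In both variants the middle zone comes out as the most accessible one. The recursive variant needs no attraction index as input.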

This concept is powerful and forms the basis for the measures that we elaborate in this paper. One outcome of such a centrality concept is the famous PageRank algorithm used by Google (Brin and Page, 1998), which enables a ranking of web documents with regard to their importance. Documents on the internet are given a higher ranking if they are linked to from other pages with high ranking. Notably, at no point does the search engine have to analyse the semantic contents of the documents, even though these are exactly what it seeks to rank by importance. This approach has also been applied to physical road networks by, e.g. Jiang (2006) and Chin and Wen (2015), with the main objective of describing human movement. El-Geneidy and Levinson (2011) have tackled the centrality calculation from a different direction, by using data on actual flows as a starting point. Our proposed centrality measures are also based on flows of interactions, but without any requirement for specific travel data. Instead, the computations are performed by modelling these flows using a general interaction function with infrastructure network data as input (although modelling accuracy could likely be improved by using detailed empirical interaction data).

Using centrality measures based on the road network to predict urban flows and activities is not a new idea, see for example Hillier and Hanson (1989), Porta et al. (2009), Sevtsuk and Mekonnen (2012) and Gao et al. (2013). However, the measures that have been mostly in focus (closeness and betweenness centrality) cannot easily be incorporated into a spatial interaction modelling framework, which is our main reason for instead exploring extensions of eigenvector centrality.

Closing the loop from activities to flows and back again to activities

Our modelling approach departs from classical spatial interaction modelling (Batty, 2013; Wilson, 2000), where local activity levels $a_{i}$ are exogenous variables, appearing as specific aspects of local activity, such as population or purchasing power. We then ask whether we may instead infer the distribution of activity from knowledge about the other variables, in particular, the information embodied by infrastructure networks. The causal rationale for this belief is, first, that large-scale infrastructure change is a relatively slow process, which implies that land use, activity levels and interaction flows have enough time to adapt to a semi-static infrastructure network. Second, even to the extent that the timescales of road and land use change do overlap, actual planning practices create links according to ideas of need and geographical importance, so the effect of the reciprocal dynamics also goes in the same direction.

From activity to spatial interaction

Spatial interaction models arise by subjecting the logic of the gravity model to local constraints on the size of flows in the system. Flows of interactions between zones can then be estimated, by distributing economic flows from origins to destinations in proportion to their relative attractions (see Figure 1). As noted by Wilson (2000) such a model formulation will take into account the competition between different locations for attracting incoming flows.

Figure 1.

Deriving flows from activity and attractivity. The flow is shown as unidirectional, but a flow in the opposite direction is also present and can be computed analogously. See supplemental material for a detailed derivation of the interaction model.

From spatial interaction back to activity

In many cases, the distribution of activities in the system is of interest in itself. Salient questions include how infrastructural change affects things like urban extent, patterns of interaction, housing, jobs, and so on. Infrastructural data are considerably more widely available, complete and consistent than demographic and economic data on the nebulous concepts of activity and attraction, which we must approach via its rich flora of expressions such as buildings, land value and population. If we can tease most of the information we need out of the infrastructure of interactions, we are in a much better shape with regard to data supply but also with regard to model design. We may then circumvent the need to figure out how various sub-models interact, and we are at least less exposed to the ontological mismatch between models and reality.

Figure 2.

From spatial interaction to activity modelling.

In Figure 2 we outline the logical sequence in which we develop our preferential centrality model by using a ‘quasi-growth model’ – quasi since it embodies a growth logic but is really used in an iterative process to find a stable equilibrium distribution of activity. First, we assume that activity quasi-growth is proportional to the sum of flows entering the zone. Second, attraction $W_{j}$ is refined into an intrinsic property equal to our measure of local characteristics, $W_{j} = R_{j}$. Now, if we begin with activity uniformly distributed across the system, and we redistribute it according to this logic, we arrive at an iterative algorithm

$a_{j} (t + 1) = C [a_{j} (t) + ϵ \sum_{i} S_{i j} (t)]$ (1)

with the equilibrium distribution

$a_{j} = R_{j} \sum_{i} \frac{a_{i} f (c_{i j})}{\sum_{k} R_{k} f (c_{i k})}$ (2)

independently of the quasi-growth constants $C$ and $ϵ$. See supplemental material for the full derivation of this self-referring equilibrium condition that can be restated as $a_{j} = \sum_{i} a_{i} M_{i j}$, where $M_{i j} = \frac{R_{j} f (c_{i j})}{\sum_{k} R_{k} f (c_{i k})}$. The adjacency matrix $M_{i j}$ corresponds to a transformation of the physical network and the activity will correspond to the eigenvector centrality of this weighted, transformed network. Thus, we can infer the structure of urban activity from the physical linking of places, similar to how the PageRank centrality algorithm can infer the relative importance of pages from the hyperlink structure.
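As a brief sketch of why the quasi-growth constants drop out (the full derivation is in the supplemental material), assume that the flows take the origin-constrained form $S_{i j} = a_{i} M_{i j}$, so that $\sum_{j} S_{i j} = a_{i}$, and that $C$ is fixed by requiring total activity to be conserved. Both are assumptions made here for illustration, chosen to be consistent with equation (2).

$$
\begin{aligned}
a_{j} &= C \Big[ a_{j} + ϵ \sum_{i} S_{i j} \Big] && \text{(fixed point of equation (1))} \\
\sum_{j} a_{j} &= C (1 + ϵ) \sum_{j} a_{j} \;\Rightarrow\; C = \frac{1}{1 + ϵ} && \text{(summing over } j \text{, using } \textstyle\sum_{j} S_{i j} = a_{i} \text{)} \\
a_{j} &= \sum_{i} S_{i j} = R_{j} \sum_{i} \frac{a_{i} f (c_{i j})}{\sum_{k} R_{k} f (c_{i k})} && \text{(equation (2))}
\end{aligned}
$$

The last line follows by inserting $C = 1 / (1 + ϵ)$ into the first: the factors $1 - C$ and $C ϵ$ both equal $ϵ / (1 + ϵ)$ and cancel, so neither constant appears in the equilibrium.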

The model may be substantially improved by positing that activity in itself stimulates attractivity, $W_{j} = a_{j} + α R_{j}$, which results in a modification of the equilibrium formulation

$a_{j} = (a_{j} + α R_{j}) \sum_{i} \frac{a_{i} f (c_{i j})}{\sum_{k} (a_{k} + α R_{k}) f (c_{i k})}$ (3)

We call this new non-linear measure preferential centrality, because the activity-dependent attraction can be thought of as a continuous version of preferential attachment (Barabási and Albert, 1999) for the activity interaction network. The resulting equation can be solved for $a_{j}$ by iteration. However, unique or positive solutions are not guaranteed for low values of $α$.
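Solving equation (3) by iteration can be sketched as a plain fixed-point update. This is a minimal illustration and not the implementation used in the study: the three-zone impedances, the uniform weights, the value $α = 10$ and the stopping rule are all hypothetical choices, with $α$ deliberately chosen large so that the toy system converges.

```python
def preferential_step(a, R, F, alpha):
    """One fixed-point update of equation (3); F[i][j] holds f(c_ij)."""
    n = len(a)
    W = [a[j] + alpha * R[j] for j in range(n)]  # activity-dependent attraction
    denom = [sum(W[k] * F[i][k] for k in range(n)) for i in range(n)]
    return [
        W[j] * sum(a[i] * F[i][j] / denom[i] for i in range(n))
        for j in range(n)
    ]


def preferential_centrality(R, F, alpha, tol=1e-5, max_iter=10_000):
    """Iterate until the relative change of the activity vector is below tol."""
    a = list(R)
    for _ in range(max_iter):
        new = preferential_step(a, R, F, alpha)
        change = sum((x - y) ** 2 for x, y in zip(new, a)) ** 0.5
        size = sum(x ** 2 for x in new) ** 0.5
        a = new
        if change / size < tol:
            return a
    raise RuntimeError("no convergence; alpha may be too low")


# Hypothetical impedances (minutes) for three zones on a line.
cost = [[0, 5, 10], [5, 0, 5], [10, 5, 0]]
F = [[0.0 if i == j else cost[i][j] ** -2.0 for j in range(3)] for i in range(3)]
R = [1.0, 1.0, 1.0]

a_pref = preferential_centrality(R, F, alpha=10.0)
```

Each update conserves total activity, because the shares sent out from every origin zone sum to one. The `RuntimeError` branch reflects the caveat that low values of $α$ may not yield a convergent iteration.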

Interaction function

The most common choices for interaction functions are the exponential function $f_{e} (c_{i j}) = e^{- β c_{i j}}$ and power law decay $f_{p} (c_{i j}) = c_{i j}^{- β}$ . If we were studying a single type of activity, it would be reasonable to assume a specific spatial scale of interaction, which is something that the exponential function captures well. However, our generalised concept of urban activity implies a mix of interactions on all scales which makes it more reasonable to use the power law function. Generally, the choice of interaction function is of course an empirical question.
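The difference in scale behaviour between the two functions can be made concrete with a small sketch. The parameter values below are hypothetical. For the power law, multiplying the impedance by a factor $k$ always changes the interaction strength by $k^{- β}$, whatever the starting impedance. For the exponential function the same ratio depends on where one starts, which reflects its characteristic scale $1 / β$.

```python
import math


def f_exponential(c, beta):
    """Exponential decay, f_e(c) = exp(-beta * c)."""
    return math.exp(-beta * c)


def f_power(c, beta):
    """Power law decay, f_p(c) = c ** (-beta)."""
    return c ** (-beta)


def scale_ratio(f, c, k, beta):
    """Relative drop in interaction when impedance grows from c to k * c."""
    return f(k * c, beta) / f(c, beta)


# Hypothetical values: a tenfold increase from 2 and from 20 minutes.
power_short = scale_ratio(f_power, 2.0, 10.0, 2.0)
power_long = scale_ratio(f_power, 20.0, 10.0, 2.0)
exp_short = scale_ratio(f_exponential, 2.0, 10.0, 0.1)
exp_long = scale_ratio(f_exponential, 20.0, 10.0, 0.1)
```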

Data

The data used for this study are of three kinds: road network, property polygons and land taxation values. The road network is used for three purposes: finding accessible areas within the polygons, finding connections from the polygons onto the road network and finally performing the distance calculations between zones. The property polygons are assigned a land taxation value from the taxations database according to a common identifier. They are thereafter aggregated into zones based on area and type code. In this study, the municipality of Gothenburg is chosen as a prototype area to develop, test and validate the model.

Roads and streets are imported with preserved topology and attributes from OpenStreetMap (OSM). OSM has been subject to questions about its quality, but studies have found that the data quality is on par with other data sources (Dhanani et al., 2012; Haklay, 2010). The reasons for choosing OSM are several: it is readily available to download, it contains the necessary attributes for the calculation, it has worldwide coverage for future expansions of the model and the data are open.

The entire extent of Sweden is partitioned into ‘properties’. Properties are either owned by individuals or juridical entities, or they can be jointly owned in the form of associations. The precision and quality of these data are high, since the purpose is to establish and prove ownership (which needs to be precise and just). Properties are of different types and usages; therefore, they are classified and assigned a type code based on usage by the Swedish taxation authority. The extent and borders of these properties are obtained from the Swedish land survey.

The Swedish taxation authority assigns to all properties a taxation value that should represent about 75% of the market value. This value is arrived at by a procedure that takes several characteristics into consideration such as area, closeness to water, building type, sales values of the neighbouring properties, etc. The quality of these data is also very good in the sense that the valuation is done according to a legal criterion, although the values for industries are somewhat uncertain because industrial properties are seldom sold. Therefore, these few sales have a disproportionately large impact on the taxation values of industrial properties. This has to be taken into account in the regression analysis. All the taxation values and type codes are acquired from the Swedish taxation authority.

Methods

The procedure for model exploration and validation is roughly composed of three steps: (1) data preparation in order to create the input for the activity model as well as preparing the empirical data used in the last step, (2) running the activity model and (3) finally, using the results from the models in a multiple spatial regression analysis with the empirical values.

For the activity model, we compare four different versions: the local model, the monocentric model, the iterative eigenvector model and the iterative preferential model. Our aim is to assess whether or not the more elaborate iterative models provide any additional predictive capabilities compared to the simpler versions. To find out whether the models are capable of capturing all of the spatial dependencies, we have performed spatial testing (Anselin, 1988) in the regression analysis.

Data preparation

Spatial entities

The spatial entities used in the activity model and the multiple regression analysis are zones, each defined as one or more aggregated properties. All properties smaller than 3000 square metres are aggregated into zones by dissolving common borders, if they have the same taxation type code.

Geographical analysis of polygon features is subject to the modifiable areal unit problem, MAUP (Openshaw and Taylor, 1979). The spatial partitioning of land must therefore be carefully chosen. The justifications for using zones as spatial units are that properties are readily available, have a designated usage and can provide useful output in planning applications. Property-based zones also simplify the empirical comparisons, since model and data will have the same spatial representation.

Connection between road network and zones

We do not use detailed data about physical connections between zones and the road network. Instead approximate ‘virtual’ connections are created in the road network model by choosing the shortest Euclidean lines between zonal centroids and connection-permissible roads. Motorways, trunk roads and other roads with high speed limits are not considered permissible for these virtual connections.
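The construction of a virtual connection can be sketched as a nearest-point search over road segments. The sketch assumes planar coordinates in metres and roads given as straight segments with a class label. The labels `motorway` and `trunk` and the function names are hypothetical stand-ins for whatever rule marks a road as not connection-permissible.

```python
import math


def nearest_point_on_segment(p, a, b):
    """Closest point to p on the straight segment from a to b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return a  # degenerate segment
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp the projection to the segment
    return (ax + t * dx, ay + t * dy)


def virtual_connection(centroid, roads, forbidden=frozenset({"motorway", "trunk"})):
    """Shortest straight link from a centroid to a permissible road.

    Returns (distance, connection point, road index), or None if no
    permissible road exists.
    """
    best = None
    for index, (a, b, road_class) in enumerate(roads):
        if road_class in forbidden:
            continue
        q = nearest_point_on_segment(centroid, a, b)
        d = math.hypot(q[0] - centroid[0], q[1] - centroid[1])
        if best is None or d < best[0]:
            best = (d, q, index)
    return best


# Hypothetical example: the motorway is closer but not permissible.
roads = [((0.0, 0.0), (100.0, 0.0), "motorway"),
         ((0.0, 50.0), (100.0, 50.0), "residential")]
link = virtual_connection((50.0, 10.0), roads)
```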

Zonal weights – Local characteristics

A zonal weight ( $R_{i}$ ) is assigned to every zone $i$ based on accessible, buildable and permitted areas. Generally, the weight can also be modified with different types of (physical) attractivity factors.

Accessible areas are here stipulated as land that can be accessed from roads. Therefore, the assumption in the model is that only the area within a certain distance from a road is possible to develop. These areas are created by buffering the roads (30 metres in the baseline case) and performing a union overlay onto the properties.

Buildable areas are here defined as firm ground suitable for buildings. Areas used by (or very close to) road or rail infrastructure are not considered as buildable.

Permitted areas are those that, according to planning restrictions, are allowed for development. In our current model implementation, productive forestry, agricultural land and areas used for special purpose buildings are considered as not permitted.

A basic attractivity factor is closeness to open water, which can have a large effect on land value and land taxation. Since our study area (Gothenburg) is situated by the coast we must include some approximation for this effect. We have chosen to include the water attraction as a multiplicative factor of 1.5 for the zonal weights for zones with centroids within 500 metres of the coastline.
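How the components of a zonal weight combine can be sketched as follows. The sketch assumes that the base weight is the area that is at once accessible, buildable and permitted, and that ‘within 500 metres’ includes the boundary. Both are assumptions for illustration, as is the function name.

```python
def zonal_weight(usable_area_m2, centroid_to_coast_m,
                 water_factor=1.5, water_radius_m=500.0):
    """R_i from usable area, multiplied by the water factor near the coast.

    usable_area_m2: area that is accessible, buildable and permitted
    (assumed here to be the overlap of the three area types).
    """
    weight = usable_area_m2
    if centroid_to_coast_m <= water_radius_m:
        weight *= water_factor
    return weight


# Hypothetical zones with the same usable area, near and far from the coast.
coastal = zonal_weight(2000.0, 300.0)
inland = zonal_weight(2000.0, 800.0)
```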

Implementation of the activity model

To arrive at zone-to-zone impedances $c_{i j}$ , Dijkstra’s algorithm is used to identify the shortest paths in the road network weighted by segment travel times (taking into account speed limits). A constant impedance penalty (comparable to 1 minute in the baseline case) is added to all relations to reflect the cost of starting and ending an interaction. Zones are assumed to not interact with themselves, i.e. $f (c_{i i}) = 0$ . As a baseline interaction function we have used the power law decay, $f (c_{i j}) = c_{i j}^{- β}$ , with $β = 2$ .
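The impedance calculation can be sketched with a textbook Dijkstra search over a road graph whose edges carry travel times. The study itself used OSMnx and NetworkX. The pure-Python version below, the adjacency-dictionary layout and the tiny example graph are hypothetical stand-ins chosen to keep the sketch self-contained.

```python
import heapq


def segment_minutes(length_m, speed_kmh):
    """Travel time of a road segment at its speed limit, in minutes."""
    return length_m * 60.0 / (speed_kmh * 1000.0)


def travel_times_from(graph, source):
    """Dijkstra: shortest travel time from source to every reachable node."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, minutes in graph.get(u, ()):
            candidate = d + minutes
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(queue, (candidate, v))
    return dist


def interaction_matrix(graph, zones, penalty=1.0, beta=2.0):
    """F[i][j] = (c_ij + penalty) ** (-beta), with F[i][i] = 0."""
    n = len(zones)
    F = [[0.0] * n for _ in range(n)]
    for i, origin in enumerate(zones):
        dist = travel_times_from(graph, origin)
        for j, destination in enumerate(zones):
            if i != j and destination in dist:
                F[i][j] = (dist[destination] + penalty) ** (-beta)
    return F


# Hypothetical road graph: node -> list of (neighbour, minutes).
graph = {"A": [("B", 2.0)],
         "B": [("A", 2.0), ("C", 3.0)],
         "C": [("B", 3.0)]}
F = interaction_matrix(graph, ["A", "C"])
```

Unreachable zone pairs keep the value zero in this sketch, which amounts to assuming that they do not interact.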

The eigenvector activity model is implemented by using simple iterative updating of the activity for all zones. Initial activity is chosen to equal local zonal weights, i.e. $a_{i} (t = 0) = R_{i}$ . Zonal weights are then considered static during the iteration. For every iteration a new activity vector is computed using equation (1). Total activity is kept constant in every iteration by a global normalisation. The relative vector norm of activity differences between subsequent iterations is compared to a predefined tolerance value (we have used 10⁻⁵), to determine if a good enough approximation to the equilibrium is found.
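The iterative updating described above can be sketched as follows. This is one possible reading and not the study's code: the relative vector norm is taken as the Euclidean norm of the change divided by the norm of the new vector, $ϵ = 1$ is an arbitrary choice, and the three-zone input is hypothetical. Every zone is assumed to have at least one reachable neighbour, so that no row sum is zero.

```python
def eigenvector_activity(R, F, epsilon=1.0, tol=1e-5, max_iter=10_000):
    """Iterate equation (1) with static weights R until the change is below tol."""
    n = len(R)
    # M[i][j] = R_j f(c_ij) / sum_k R_k f(c_ik): share of i's activity sent to j.
    M = []
    for i in range(n):
        row = [R[j] * F[i][j] for j in range(n)]
        row_sum = sum(row)
        M.append([x / row_sum for x in row])

    total = sum(R)
    a = list(R)  # initial activity equals the zonal weights
    for _ in range(max_iter):
        inflow = [sum(a[i] * M[i][j] for i in range(n)) for j in range(n)]
        grown = [a[j] + epsilon * inflow[j] for j in range(n)]
        C = total / sum(grown)  # global normalisation keeps total activity fixed
        new = [C * g for g in grown]
        change = sum((x - y) ** 2 for x, y in zip(new, a)) ** 0.5
        size = sum(x ** 2 for x in new) ** 0.5
        a = new
        if change / size < tol:
            return a
    raise RuntimeError("iteration did not reach the tolerance")


# Hypothetical impedances (minutes) for three zones on a line.
cost = [[0, 5, 10], [5, 0, 5], [10, 5, 0]]
F = [[0.0 if i == j else cost[i][j] ** -2.0 for j in range(3)] for i in range(3)]
a_eig = eigenvector_activity([1.0, 1.0, 1.0], F)
```

For this toy input the equilibrium can be worked out by hand as $(5/6, 4/3, 5/6)$, which the iteration approaches to within the chosen tolerance.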

The implementation of the preferential model is identical to the eigenvector model in all aspects except for the additional mechanism of activity-dependent attractivity. This mechanism introduces the parameter $α$, for which we have chosen a value as low as possible, but that still results in a convergent iterative process. This principle gives the largest possible difference of activity configuration in comparison to the eigenvector model, since increasing values of $α$ can bring the results of the preferential model arbitrarily close to the eigenvector model. In the baseline case, the application of the principle resulted in $α = 1.625$.

Compared to the iterative models, the monocentric version is simpler. It is derived by assuming that all zones only interact with the most central zone, defined in the implementation as the zone closest to Gothenburg Central Station. For a full description of this model version, see supplemental material.

Zonal weights are mainly used as input to the iterative activity models. However, for comparative purposes we also investigate a local activity model, without any interaction between zones. It is implemented using direct proportionality between zonal weights and activity.

Spatial regression

Preparation of the spatial regression analysis data

The two independent variables are the prediction from the activity model and the amount of industrial area per zone. The reason to include the amount of industrial area in the regression model is that industrial properties have on average a lower taxation value due to the taxation process.

The dependent variable is the property taxation value. For some records in the taxation database, there is not a 1:1 relationship to property polygons. We handle this by aggregation, de-aggregation and filtering. We start from 60,137 property polygons and arrive at 27,628 zones after aggregation. Out of these, we have empirical taxation values for 12,062 zones, hence only they are used in the regression.

Weight matrix creation

In order to specify a regression model with spatial diagnostics, a spatial weights matrix has to be created. The weights matrix in this study is created by using the impedance of the road network between all places and then applying a cut-off value to determine which zones are to be treated as adjacent. We have chosen a cut-off value of 3000 metres. To examine the robustness of the model, a weight matrix based on a Euclidean distance of 600 metres is also tested in the regression.
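A cut-off based weights matrix can be sketched as below. The binary weighting and the row standardisation are assumptions made for illustration, since the text only specifies the cut-off itself, and the distances in the example are hypothetical.

```python
def cutoff_weights(distance, cutoff, row_standardise=True):
    """Zones i and j are neighbours when distance[i][j] <= cutoff (i != j)."""
    n = len(distance)
    W = [
        [1.0 if i != j and distance[i][j] <= cutoff else 0.0 for j in range(n)]
        for i in range(n)
    ]
    if row_standardise:
        for row in W:
            row_sum = sum(row)
            if row_sum > 0:  # zones without neighbours keep an all-zero row
                row[:] = [w / row_sum for w in row]
    return W


# Hypothetical network distances in metres between three zones.
distance = [[0, 1000, 5000],
            [1000, 0, 2500],
            [5000, 2500, 0]]
W = cutoff_weights(distance, cutoff=3000)
```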

Investigating spatial dependencies

To examine the presence of spatial dependence, an analysis of Moran’s I for the model values and empirical values is made (Haining, 2003; Moran, 1950). This test (see Table 1) shows that both preferential model values and taxation values are subject to a rather strong spatial autocorrelation while the local weights are not.
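For reference, Moran’s I for a variable $x$ and spatial weights $w_{i j}$ is $I = \frac{n}{\sum_{i j} w_{i j}} \frac{\sum_{i j} w_{i j} (x_{i} - \bar{x}) (x_{j} - \bar{x})}{\sum_{i} (x_{i} - \bar{x})^{2}}$. The sketch below computes it for two hypothetical patterns on a chain of four zones. It is a plain illustration of the statistic, not the GeoDa procedure used in the study.

```python
def morans_i(values, W):
    """Moran's I of values under the spatial weights matrix W."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    weight_sum = sum(sum(row) for row in W)
    cross = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    variance_sum = sum(x * x for x in z)
    return (n / weight_sum) * (cross / variance_sum)


# Hypothetical chain of four zones: 0 - 1 - 2 - 3 (binary adjacency).
W_chain = [[0, 1, 0, 0],
           [1, 0, 1, 0],
           [0, 1, 0, 1],
           [0, 0, 1, 0]]

i_clustered = morans_i([1.0, 1.0, 5.0, 5.0], W_chain)    # similar values adjacent
i_alternating = morans_i([1.0, 5.0, 1.0, 5.0], W_chain)  # dissimilar values adjacent
```

Clustered values give a positive statistic and alternating values a negative one. Values near zero, as for the local weights in Table 1, indicate little spatial structure.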

Table 1.

Indicators for spatial autocorrelation.

Variable	Moran’s I
Land taxation value (dependent)	0.34
Local weights (independent)	0.04
Preferential model prediction (independent)	0.47
Industrial areas (independent) used as correction factor	0.24

This finding indicates that spatial diagnostics need to be evaluated in the regression analysis, to make sure that all spatial autocorrelation is taken care of. The finding that local weights are virtually not at all spatially autocorrelated tells us that they cannot sufficiently explain the variation in the empirical property taxation values.

Ordinary least squares (OLS) with spatial diagnostics

An OLS with both spatial and non-spatial diagnostics is performed in order to know whether the dependent variable’s spatial autocorrelation is captured by the independent variables (which would mean that an ordinary OLS is sufficient). If not, the diagnostics are used as guidance for the next steps in order to take care of the spatial autocorrelation (Anselin, 1988). This results in a collection of diagnostics that need to be analysed:

Diagnostics for non-normal error distribution, Jarque–Bera (JB) test.

Diagnostics for heteroscedasticity, Breusch–Pagan and Koenker–Bassett tests (B–P and K–B).

Diagnostics for spatial autocorrelation, Lagrange multipliers (LM) tests and Moran’s I on the residuals.

Comparative indicators for model fitness and validity

To evaluate and compare models, $R^{2}$ is commonly used but is not reliable when residual spatial autocorrelation is present. Therefore, the Schwarz information criterion is also used (Anselin and Rey, 2014).

When spatial autocorrelation is present in the residuals, the observations are not independent from each other, hence the regression model is not valid. This is investigated with the LM tests; if they are significant it indicates that some measure like using a spatial lag or spatial error model has to be taken in order to handle the remaining spatial autocorrelation (Anselin, 1988). If the LM (or robust LM) test for spatial error model is significant while the tests for lag model are not, a spatial error model is probably the right way to go, and vice versa. If both tests are significant, the regression analysis is not valid and there is no indication of any spatial model that can make it valid. In that case the model has to be respecified (Anselin and Rey, 2014). This procedure has been used in this study for guidance in the search for a good and valid model.
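The decision procedure described above can be summarised in a short sketch operating on the p-values of the (robust) LM tests. The 0.05 significance level and the returned labels are assumptions for illustration.

```python
def choose_spatial_model(p_lm_error, p_lm_lag, significance=0.05):
    """Map the LM test outcomes to the next modelling step."""
    error_significant = p_lm_error < significance
    lag_significant = p_lm_lag < significance
    if error_significant and lag_significant:
        return "respecify model"  # no spatial model indicated as valid
    if error_significant:
        return "spatial error model"
    if lag_significant:
        return "spatial lag model"
    return "OLS"  # no remaining spatial autocorrelation detected


# Hypothetical p-values for each of the four possible outcomes.
decisions = [
    choose_spatial_model(0.001, 0.60),
    choose_spatial_model(0.40, 0.002),
    choose_spatial_model(0.001, 0.002),
    choose_spatial_model(0.40, 0.60),
]
```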

Software

For the data preparation, cleaning and aggregation, FME was used. The activity models were implemented in Python, using the packages OSMnx (Boeing, 2017) and NetworkX (Hagberg et al., 2008). The spatial statistical analysis was performed in GeoDa (Anselin et al., 2006).

Results

Model validity and fitness

All models except the preferential models have all the LM tests significant, which invalidates them due to untreated spatial autocorrelation. The local and industrial models are included just as control, to see that it is actually the activity model prediction that is responsible for the good results. The other indicators on model fitness shown in Table 2 imply that the preferential model is the best choice, even before considering and applying the spatial error model.

Table 2.

Results from the spatial regression. A better fit is indicated by a lower Schwarz information criterion and a higher R². For Moran’s I, low values indicate low spatial autocorrelation. The pseudo R² value in a spatial error model is computed differently than in a standard OLS, which means that the R² for the preferential spatial error model is not directly comparable to the other R² values in the table.

Model version	R²	Moran’s I on residuals	Schwarz information criterion	Model valid?
Industrial area coverage (as control)	0.00	0.34	20842	No, since all LM tests are significant
Local	0.40	0.42	14644	No, since all LM tests are significant
Monocentric	0.54	0.24	11329	No, since all LM tests are significant
Eigenvector	0.54	0.24	11470	No, since all LM tests are significant
Preferential	0.58	0.16	10297	No, not as non-spatial OLS, since LM tests are significant
Preferential spatial error model	0.66 (pseudo)	Not applicable (none)	7792	Yes, since remaining spatial autocorrelation is taken care of as error term

For the preferential model, the robust version of the LM test for error model was significant (0.00) while the robust version of the LM test for lag model was not (0.83). This suggested that using a spatial error model is the correct approach (Anselin and Rey, 2014). Therefore, only the preferential spatial error model is usable for inference and predictions, although its spatially clustered errors (Anselin, 1995) are hiding some unknown spatial factors (see Figure 3).

Figure 3.

Preferential spatial error model: Predictions (top left), empirical land value (top right) and local weights (bottom left) are normalised with regard to zone area. Spatial residuals (bottom right) show the remaining spatially autocorrelated error term.

Other statistical tests on the preferential spatial error model

The low multicollinearity condition number (12) indicates that there is no problematic multicollinearity among the explanatory variables. Values below 30 are usually considered unproblematic (Anselin and Rey, 2014).

The JB test is significant, which indicates a non-normal distribution of error terms. However, this test is less relevant, since this dataset is large (Anselin and Rey, 2014).

According to the B–P and K–B tests, there is significant heteroskedasticity in the model results. There can be multiple reasons for this, one possible cause being the aggregation of properties (Haining, 2003). The effects are limited in these specific models, since the standard errors are very low on their own. It is therefore not considered crucial for the conclusions of this study.

Sensitivity analysis

We have explored many variations of the key parameters, such as the preferentiality parameter $α$ , and the functional form and parameters of the interaction function. See supplemental material for details on these results. The main finding is that the preferential model seems to be robust with regard to changes in parameter values.

Discussion of results

Comparing the model versions

The eigenvector and monocentric models perform decently; the interpretation of their results has therefore served as a stepping stone in the search for a valid model. The preferential spatial error model, besides being the only valid model, also performs well in absolute terms, with a pseudo $R^{2} = 0.66$. Considering the small number of input data sources used and the simple underlying model assumptions, this level of correlation indicates that the proposed preferential centrality measure is promising.

Remaining challenges

In this paper, we have not aimed to present a full predictive model. Some improvements for moving in that direction are as follows:

To reduce uncertainty in the regression coefficients, heteroskedasticity needs to be adequately addressed. Further parameter variations, as well as different levels of aggregation into zones, might give some clues on how to handle this problem.

The preferential spatial error model still contains unknown spatial variables that are handled as a spatial error term together with standard residuals. Understanding those errors can help further development of the model. Some ideas and suggestions for further investigation are as follows:

○ Different kinds of properties (i.e. commercial versus residential) might not be fully comparable in taxation terms.

○ Other transportation modes, such as pedestrian, bicycle and public transport, are not captured in the current car-oriented implementation of the model.

○ Truncation effects: this study is only investigating areas within the Gothenburg municipality, although the city also acts as a regional centre for a larger surrounding region.

In the preferential model, we have a parameter $α$ for which model fitness improves as it is lowered towards the threshold of iterative divergence. Perhaps the empirical system state corresponds to a non-convergent model outcome? To explore this hypothesis, the convergence criterion in the model can be replaced by a minimisation target.

Conclusions and ways forward

By using a theoretical concept of interaction-based centrality, we have demonstrated that it is possible to create an urban activity model with empirical validity, using only two data sources – road networks and property polygons. The empirical validation is based upon using land taxation values as a proxy for urban activity.

According to the comparative results from the spatial regression, local characteristics are far from enough to explain the geographical variation of land values. The activity intensity is also affected by the geographical ranking of the location: in the city and in the region. Including the distance to the city centre in a monocentric interaction model gives a seemingly better fit, but the spatial statistical tests show this model to be invalid for the geographical area that we study, indicating that a more elaborate model is warranted. With the introduction of our concept of preferential centrality, where initial concentrations of activity are assumed to ignite local feedback mechanisms that attract even more activity, we finally arrive at a valid regression model.

The preferential centrality model has several additional advantages compared to a monocentric approach. First, we avoid the requirement of having to manually identify the most central location. Instead, the centrality model endogenously determines central places and their relative importance. In a polycentric setting this is a crucial model feature. Second, in a planning context, how the location and strength of urban centres are affected by planning interventions, such as new infrastructure, is often an important question in itself. For example, the preferential model can be used to analyse the robustness of a city centre under the influence of suggested new road investments. Such an analysis is clearly not possible within a monocentric model framework.
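
The idea that central places emerge from the network rather than being specified in advance can be illustrated with a plain eigenvector centrality computed by power iteration. This is a sketch only and not the preferential centrality measure itself: the four-zone travel-time matrices, the exponential interaction function and the decay parameter `beta` are all hypothetical choices made for the example.

```python
import math


def eigenvector_centrality(weights, max_iterations=500, tolerance=1e-12):
    """Dominant eigenvector of a non-negative matrix, normalised to sum to 1."""
    n = len(weights)
    centrality = [1.0 / n] * n
    for _ in range(max_iterations):
        updated = [sum(weights[i][j] * centrality[j] for j in range(n)) for i in range(n)]
        total = sum(updated)
        updated = [value / total for value in updated]
        if max(abs(a - b) for a, b in zip(updated, centrality)) < tolerance:
            return updated
        centrality = updated
    return centrality


def interaction_weights(travel_time, beta):
    """Exponential distance decay between distinct zones (assumed functional form)."""
    return [
        [math.exp(-beta * t) if i != j else 0.0 for j, t in enumerate(row)]
        for i, row in enumerate(travel_time)
    ]


# Hypothetical travel times in minutes for four zones along a road: 0 - 1 - 2 - 3
before = [
    [0, 10, 20, 30],
    [10, 0, 10, 20],
    [20, 10, 0, 10],
    [30, 20, 10, 0],
]
# Same zones after a hypothetical 5-minute link between zones 0 and 3,
# with shortest travel times recomputed by hand
after = [
    [0, 10, 15, 5],
    [10, 0, 10, 15],
    [15, 10, 0, 10],
    [5, 15, 10, 0],
]

beta = 0.1  # hypothetical decay parameter (per minute)
centrality_before = eigenvector_centrality(interaction_weights(before, beta))
centrality_after = eigenvector_centrality(interaction_weights(after, beta))
print(centrality_before)
print(centrality_after)
```

In this toy setting the middle zones are the most central before the new link, while the two end zones take over afterwards, without any centre having been designated by hand.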

Regarding data requirements, our approach is somewhat more demanding than a basic monocentric model, since travel times must be computed between all zones and not only to the predefined centre. The number of zones needed (i.e. the spatial resolution) depends on context, and further studies are needed to determine what levels of resolution are adequate for different planning applications.
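
The difference in data requirements can be made concrete with a shortest-path computation: a monocentric model needs travel times to one zone only, whereas an all-zones model repeats the search from every zone. The sketch below uses Dijkstra's algorithm on a made-up four-zone road graph; the zone names, link times and the choice of "B" as the predefined centre are hypothetical.

```python
import heapq


def travel_times_from(graph, source):
    """Shortest travel times from one zone to all others (Dijkstra)."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        elapsed, zone = heapq.heappop(queue)
        if elapsed > best.get(zone, float("inf")):
            continue  # stale queue entry
        for neighbour, minutes in graph[zone].items():
            candidate = elapsed + minutes
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return best


# Hypothetical undirected road graph, link travel times in minutes
graph = {
    "A": {"B": 4.0, "C": 9.0},
    "B": {"A": 4.0, "C": 3.0, "D": 8.0},
    "C": {"A": 9.0, "B": 3.0, "D": 2.0},
    "D": {"B": 8.0, "C": 2.0},
}

# Monocentric model: a single search from the predefined centre
to_centre = travel_times_from(graph, "B")

# All-zones model: one search per zone
all_pairs = {zone: travel_times_from(graph, zone) for zone in graph}
print(to_centre)
print(all_pairs["A"])
```

The number of searches grows with the number of zones, which is why the spatial resolution directly affects the computational cost of the approach.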

Our current model implementation is technically complicated and requires several different pieces of software. This is, however, not a fundamental property of the approach, and in future work we aim to achieve a workflow within a single open source framework, to enable broader testing and practical application.

Before using our modelling approach in a practical planning context, further validation is needed: both cross-sectional, by studying other and larger areas, and longitudinal, by investigating changes in urban activity over a time period where the road network also has changed. For the purpose of this validation, we cannot escape the need to use empirical activity data, such as taxation values or night light data. However, since our sensitivity analyses show that model outcomes are fairly robust, a validated preferential centrality model should be transferable to applications in different geographical settings, without any need for local economic or demographic data.

Supplemental Material

Supplemental Material for Preferential centrality – A new measure unifying urban activity, attraction and accessibility by Alexander Hellervik, Leonard Nilsson and Claes Andersson in EPB: Urban Analytics and City Science

Footnotes

Acknowledgements

The paper has benefitted from discussions and support in the context of the Spatial Morphology Group at Chalmers University of Technology.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was funded by the Norwegian Public Roads Administration, FORMAS – the Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning (Grant Number 2015-124), and the Swedish Transport Administration.

Supplemental material

Supplemental material for this article is available online.

Note

Alexander Hellervik is a PhD student in Complex Systems at the Department of Space, Earth and Environment, Chalmers University of Technology. He also works as a strategic transport planner at the Swedish Transport Administration. His research interests focus on how the complex systems perspective can shed light on the interplay between accessibility and land use, both in theory and in practical planning contexts. He approaches the questions with a combination of theoretical models, computer simulation, data analysis and stakeholder dialogue.

Leonard Nilsson is a PhD student at the division of urban planning, Department of Architecture and Civil Engineering at Chalmers University of Technology. He is funded by the Norwegian Public Roads Administration, in the Ferry-free E39 project. The aim of his research is to investigate the societal impacts of new highway projects, more precisely to understand how cities and parts of them grow, decline or change when the road network is changed. The approach treats the road network as a provider of accessibility, which in turn affects land use, for example through the transformation of labour market regions or changed access to services and amusements. The methods used are mainly simulations and statistical analysis.

Claes Andersson is an Associate Professor and Senior Researcher in Complex Systems at the Department of Space, Earth and Environment, Chalmers University of Technology, and external Fellow at the European Center for Living Technology in Venice. He has a PhD in Complex Systems and has held research positions at the Los Alamos National Laboratory and the University of Modena and Reggio Emilia. Claes’ research is focused on the long-term and large-scale evolution of societal systems, and his research areas currently include the deep origins of human society, urban and regional dynamics, as well as fundamental issues in complex systems.

References

Alonso (1964) Location and Land Use. Cambridge, MA: Harvard University Press.

Andersson, Frenken, Hellervik (2006) A complex network approach to urban growth. Environment and Planning A 38: 1941–1964.

Andersson, Törnberg (2014) Societal systems – Complex or worse? Futures 63: 145–157.

Anselin (1988) Lagrange multiplier test diagnostics for spatial dependence and spatial heterogeneity. Geographical Analysis 20: 1–17.

Anselin (1995) Local indicators of spatial association – LISA. Geographical Analysis 27: 93–115.

Anselin, Syabri, Kho (2006) GeoDa: An introduction to spatial data analysis. Geographical Analysis 38(1): 5–22.

Anselin, Rey (2014) Modern Spatial Econometrics in Practice. Chicago, IL: GeoDa Press LLC.

Barabási A-L, Albert (1999) Emergence of scaling in random networks. Science 286: 509–512.

Barthélemy (2011) Spatial networks. Physics Reports 499: 1–101.

Batty (2013) The New Science of Cities. Cambridge, MA: MIT Press.

Boeing (2017) OSMnx: New methods for acquiring, constructing, analyzing, and visualizing complex street networks. Computers, Environment and Urban Systems 65: 126–139.

Brin, Page (1998) The anatomy of a large scale hypertextual web search engine. Computer Networks and ISDN Systems 30: 107–117.

Chin, Wen (2015) Geographically modified PageRank algorithms: Identifying the spatial concentration of human movement in a geospatial network. PLoS One 10: e0139509.

De Montis, Caschili, Chessa (2013) Recent developments of complex network analysis in spatial planning. In: Scherngell (ed) The Geography of Networks and R&D Collaborations. Cham: Springer International Publishing, pp. 29–47.

Dhanani, Vaughan, Ellul, et al. (2012) From the axial line to the walked line: Evaluating the utility of commercial and user-generated street network datasets in space syntax analysis. In: Proceedings: eighth international space syntax symposium (eds M Greene, J Reyes and A Castro), Santiago de Chile.

El-Geneidy, Levinson (2011) Place rank: Valuing spatial interactions. Networks and Spatial Economics 11: 643–659.

Fujita, Krugman, Venables, et al. (1999) The Spatial Economy: Cities, Regions and International Trade. Wiley Online Library.

Gao, Wang, Gao, et al. (2013) Understanding urban traffic-flow characteristics: A rethinking of betweenness centrality. Environment and Planning B: Planning and Design 40: 135–153.

Hagberg, Swart, Schult (2008) Exploring Network Structure, Dynamics, and Function Using NetworkX. Los Alamos, NM: Los Alamos National Lab.

Haining (2003) Spatial Data Analysis: Theory and Practice. Cambridge, UK: Cambridge University Press.

Haklay (2010) How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets. Environment and Planning B: Planning and Design 37: 682–703.

Hansen (1959) How accessibility shapes land use. Journal of the American Institute of Planners 25: 73–76.

Hillier, Hanson (1989) The Social Logic of Space. Cambridge, UK: Cambridge University Press.

Iacono, Levinson, El-Geneidy (2008) Models of transportation and land use change: A guide to the territory. Journal of Planning Literature 22: 323–340.

Jiang (2006) Ranking spaces for predicting human movement in an urban environment. International Journal of Geographical Information Science 23: 823–837.

Krugman (1996) Urban concentration: The role of increasing returns and transport costs. International Regional Science Review 19: 5–30.

Moran PAP (1950) Notes on continuous stochastic phenomena. Biometrika 37: 17–23.

Newman (2008) The mathematics of networks. The New Palgrave Encyclopedia of Economics 2: 1–12.

Openshaw, Taylor (1979) A million or so correlation coefficients: Three experiments on the modifiable areal unit problem. In: Wrigley (ed) Statistical Applications in Spatial Sciences. London: Pion, pp. 127–144.

Porta, Strano, Iacoviello, et al. (2009) Street centrality and densities of retail and services in Bologna, Italy. Environment and Planning B: Planning and Design 36: 450–465.

Sevtsuk, Mekonnen (2012) Urban network analysis. Revue Internationale de Géomatique–N 287: 305.

Simon (1962) The architecture of complexity. Proceedings of the American Philosophical Society 106: 467–482.

Thünen JHV (1826) Der isolierte Staat. Beziehung auf Landwirtschaft und Nationalökonomie.

Timmermans (2003) The saga of integrated land use-transport modeling: How many more dreams before we wake up? In: Tenth international conference on travel behaviour research, Lucerna.

Waddell, Borning, Noth, et al. (2003) Microsimulation of urban development and location choices: Design and implementation of UrbanSim. Networks and Spatial Economics 3: 43–67.

Wilson (2000) Complex Spatial Systems: The Modelling Foundations of Urban and Regional Analysis. London, UK: Pearson Education.

