Sage Journals

Abstract

Racism, discriminatory practices, institutional bias, and systematic exclusion can take lasting physical form in the built environment. There has been growing attention to the long-term consequences of housing policies and practices on social and economic outcomes, including residential segregation, but comparatively less attention to other aspects of the built environment, such as road networks and spatial (dis)connectivity. In this article, the authors introduce a novel method that constructs counterfactual road networks by identifying missing road segments that would be expected to exist in a city’s road network, given the surrounding infrastructure. The authors demonstrate the empirical application of the method by analyzing differences in racial composition and residential segregation for the observed and counterfactual road networks of five U.S. cities. The authors find that unexpected disconnectivity in a city’s road network is associated with greater differences in the racial composition of nearby areas and higher levels of segregation at the local and city levels. The present findings suggest that road networks warrant more attention as a factor that may contribute to the persistence of segregation.

Keywords

segregation, race, built environment, measurement, spatial analysis, networks

Racism, discriminatory practices, institutional bias, and systematic exclusion can take lasting physical form in the built environment. Prominent examples include the segregationist mortgage lending policies, including redlining, that contributed to White suburbanization and Black ghettoization (Faber 2020; Jackson 1985; Rothstein 2017), and the interstate highway construction and urban renewal projects that razed neighborhoods, displaced Black residents, and systematized the physical divisions between racial groups (Connerly 2002; Schindler 2015). Such practices have lasting power, in part, because they take physical form: rather than requiring shared understandings among actors or the legal enforcement of covenants, racism and inequality can become embedded in the urban infrastructure. Their consequences can persist for generations, resulting in long-standing inequalities and enduring patterns of racial residential segregation (Faber 2020; Rothstein 2017).

In this article, we seek to understand the relationship between residential segregation—the extent to which social groups reside in distinct places—and a key yet often overlooked feature of the built environment, road network connectivity. Connectivity is an important consideration in understanding segregation because two residential areas may be spatially proximate but not well connected by roads, and connectivity may be more important than mere proximity in explaining racial segregation patterns (Grannis 1998). Road disconnectivity and physical barriers, such as fences, walls, railroad tracks, highways, and dead-end streets, have been used as mechanisms to reinforce or exacerbate segregation by facilitating greater separation between ethnoracial groups in nearby areas (Jackson 1985; Mohl 2008; Schindler 2015; Sugrue 2005). Examples of this phenomenon include the selection of routes for interstate highways built during the 1950s and 1960s in cities such as Chicago, Atlanta, and Houston (Feagin 1988; Mohl 2008), as well as more recent instances of using dead-end streets, bollards, and fences in cities such as Baltimore and Detroit (Armborst, D’Oca, and Theodore 2015).

To study the relationship between segregation and (dis)connectivity, we develop a novel computational approach, the counterfactual road networks method, that conceptualizes roads as spatial networks and identifies missing road segments that we would expect to exist in a city’s road network given the surrounding infrastructure. We demonstrate the application of our approach with three analyses in five U.S. cities.

First, we analyze the relationship between residential segregation and road network connectivity at the city level. We find that the missing road segments we would most expect to exist are associated with the largest differences in segregation.

Second, we examine the racial composition of nearby areas that would be connected by missing road segments. We find that unexpected disconnectivity in a city’s road network is associated with greater differences in the racial composition of nearby areas. Road segments we would expect to exist given the surrounding infrastructure are more likely to be missing between areas with different racial compositions.

Third, we construct counterfactual road networks that include missing road segments, and we use the spatial proximity and connectivity (SPC) method (Roberto 2018) to compare the segregation measured using the observed and counterfactual road networks. We find that unexpected disconnectivity is associated with significantly higher levels of segregation in local areas of missing road segments and at the city level. The results suggest these highly likely but nonetheless missing road segments facilitate both social and spatial disconnection.

Our approach does not establish causality or imply the precedence of the road network, nor does it adjudicate which came first: residential segregation that shaped the structure of the road network, or road network connectivity that shaped where people live. We use this approach not to make recommendations about road construction, but to examine the significance of unexpected missingness. Our results underscore the power of the built environment and suggest that infrastructure decisions can have long-term social consequences. Overall, our approach enables further research to uncover the inequalities embedded in urban infrastructure and examine the consequences of racist infrastructure.

Background

Scholarship on the built environment and residential segregation has tended to focus on historical and contemporary policies and processes related to housing. Studies have examined how segregation has changed alongside historical processes of redlining (Faber 2020; Rothstein 2017) and suburbanization (Fischer 2008; Hayden 2003; Logan et al. 2023; Massey and Denton 1988b), as well as contemporary processes of gentrification (Ding, Hwang, and Divringi 2016; Freeman 2009; Hwang 2020; Hwang and Sampson 2014; Lees, Slater, and Wyly 2008; Smith 1979, 1987; Zukin 1987). In particular, there has been growing attention to the consequences of inequalities in housing values, housing location, and mortgage lending for social and economic outcomes, including residential segregation (Faber 2018, 2019; Hwang, Hankinson, and Brown 2015; Hyra et al. 2013; Korver-Glenn 2022; Owens 2019; Rugh, Albright, and Massey 2015; Spielman and Harrison 2013).¹

This scholarship has generated insights on housing policies and processes and their lasting effects on segregation, but other aspects of the built environment remain understudied and undertheorized. In particular, few scholars have considered how roads or road networks are related to segregation (for notable exceptions, see Archer 2020; Bayor 1988; Grannis 1998, 2005; Korver-Glenn et al. 2024; Roberto 2018; Roberto and Korver-Glenn 2021). And when they are considered, studies often focus on particular streets or highway segments, rather than conceptualizing roads as a spatial network. We seek to contribute to this nascent area of scholarship by developing a computational approach for understanding the relationship between residential segregation and road network connectivity.

Road Connectivity and Residential Segregation

In urban policy and planning, roads are typically regarded as connectors that facilitate the movement of goods and people between locations. From this perspective, roads may seem to be neutral elements of the urban landscape. But as Harvey (1973:61) noted, changes in urban form and functions (including building roads and highways) are not simply “natural” manifestations of changing urban demands.

Road networks have important social dimensions. First, the placement of roads may provide better connectivity between some areas and residents than others. Second, disconnected roads, such as dead-end streets, can facilitate separation and division between nearby areas. Third, constructing or expanding roads in urban areas often requires land that is already in use and decisions about which residents and businesses to displace or protect. Fourth, once roads are constructed, changing or removing them can be slow and costly and may require institutional action, making roads among the most static elements of the built environment. Finally, roads can carry symbolic meaning, such as being a commercial destination or a well-known boundary between neighborhoods.

Roads should thus be regarded as both a structural element of the built environment and part of the social fabric of a city. Moreover, roads are an important consideration in understanding segregation patterns and processes. In what follows, we first describe how previous scholarship has considered these social dimensions of roads by conceptualizing them as boundaries (also referred to as edges or dividing lines) or as sites. We then argue for the need to conceptualize roads as networks and explain how attention to road networks and connectivity can contribute to our understanding of the relationship between segregation and the built environment.

Roads as Boundaries, Roads as Sites

Scholarship on residential segregation has considered roads in two distinct ways: roads as boundaries and roads as sites. In conceptualizing roads as boundaries, scholars have pointed out how roads can both connect and divide. More specifically, scholars highlight how highways have cut through neighborhoods and communities, displacing some groups and isolating others (Mohl 2008). In doing so, some highways form the “edges” of neighborhoods, including defining the physical border of “ghettos” and separating them from the rest of the city, further isolating populations that were already among the most marginalized (Wacquant and Wilson 1989). In some cities, such as Birmingham and Atlanta, highways were constructed to mirror the boundaries of racial zoning (Archer 2020).

Although highways created great opportunities for commuting and being “connected” to the city for suburban residents (mostly middle- and upper-class White families), some city residents recognized their detrimental effects on neighborhoods and communities, including creating separation between nearby areas. By the 1960s, residents in many cities began to organize and resist the construction of planned highway routes. Starting in San Francisco and spreading to other cities, these highway (or freeway) revolts affected future highway plans, including the paths of highways and the cancellation of some projects (Mohl 2008).

The construction of highways had other spatial and functional consequences. Not only did highways cut through the social fabric of cities and neighborhoods, they were also part of the complex process of suburbanization and the emergence of malls as the main site of consumer shopping (Hayden 2003). This process destabilized and sometimes completely destroyed commercial streets within cities, turning “soulful” streets at the center of urban life into soulless spaces (Wacquant and Wilson 1989).

In the following decades, in some cities, the disinvestment and neglect of inner-city neighborhoods were somewhat reversed through gentrification and urban programs that sought to reinvent cities. In this process, some streets emerged as sites of investment through which urban governance and private capital tried to redefine land use and social space. In these efforts, the emphasis was not on the vehicular speed and movement efficiency of roads. Instead, a mixed-use conceptualization of roads and a return to the street as the site of commercial activity, social life, and human connection were highlighted. Commercial streets became sites to implement urban policies that hoped to attract new residents, businesses, and tourism (Deener 2007; Zukin 2009). But although some commercial streets became social hubs, others became the symbolic boundaries of residential neighborhoods, dividing residents of different races and classes or separating gentrifiers from old-timers (Anderson 1990; Lloyd 2006; Rabin 1987).

Roads as Networks: A New Approach

Prior work has studied how roads become physical manifestations of the social boundaries between groups. Scholars have considered how roads carry symbolic meaning and influence residents’ perceptions and social relations (Korver-Glenn et al. 2024). Scholars have also studied how roads, as symbolic boundaries, shape residents’ interactions (Anderson 1990; Deener 2007) and how rates of crime or conflict change in areas of transition between neighborhoods (Kim and Hipp 2016; Legewie 2018; Legewie and Schaeffer 2016). Such studies conceptualize roads as both boundaries (e.g., the borders of neighborhoods) and sites (e.g., of crime or conflict). This work recognizes both the physical and symbolic nature of roads and their capacity to influence social relations. Both methodologically and empirically, these studies link the social and physical elements of urban environments.

However, whether on the scale of residential streets or highways, roads are part of a broader network. We argue that analyzing road networks offers insights that go beyond studying roads as boundaries or sites. It allows us to foreground social processes and institutions that create road network connectivity and the mechanisms through which patterns of connection and disconnection are reproduced or reconfigured over time. We reject assumptions that conceive of the road network as neutral or in the background, and we examine, theoretically and empirically, how road connectivity matters in patterns of segregation.

We highlight the importance of road networks as connectors by examining the connectivity provided by a city’s road network. We also consider how road segments that may be missing from the network facilitate disconnection between nearby areas. We show that reduced connectivity in road networks is associated with higher levels of segregation. Through a study of roads as networks, rather than boundaries or sites, this article expands our understanding of the separating and connecting power of roads.

In the following section, we introduce our computational approach for identifying road segments that one would expect to exist in a city’s road network, given the surrounding infrastructure, but are nonetheless missing. We then demonstrate the application of our approach by analyzing the relationship between residential segregation and road network connectivity at the city level. We continue with two local analyses: first, we compare the racial composition of nearby, disconnected areas where there is a missing road segment; we then compare the segregation measured using the observed road network and counterfactual road networks that include missing road segments.

Counterfactual Road Networks

One of the most common problems in network topology inference is that of edge propensity or link prediction, that is, determining which of the missing edges is most likely to materialize given the current structure of the network (Liben-Nowell and Kleinberg 2007; Popescul and Ungar 2003). To answer this question, one must rely on a modeling assumption about how the underlying network grows and on observation of its current state. A technical challenge is that commonly used edge propensity scores (e.g., the Jaccard index or the Adamic and Adar score [Adamic and Adar 2003]) are better suited to social networks than to urban road networks. Such metrics are rooted in underlying social mechanisms of edge formation, such as homophily (McPherson, Smith-Lovin, and Cook 2001) or the strategic bridging of “structural holes” between unconnected nodes (Burt 2009).
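To make this mismatch concrete, the following minimal pure-Python sketch computes the two scores named above on a hypothetical toy network (not data from this study): two dead-end streets, a-b-c and d-e-f, whose ends c and d face each other across a short gap. The Jaccard index divides the number of shared neighbors by the size of the combined neighbor set, and the Adamic and Adar score sums 1/log(degree) over shared neighbors. Because c and d have no neighbors in common, both scores are zero for the pair that would close the gap, whereas the pair a and c, which would only duplicate the existing path through b, receives the maximum Jaccard value.

```python
import math


def build(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, u, v):
    """Shared neighbors divided by the union of neighbors (0 if both are isolated)."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over shared neighbors of two distinct nodes.

    A shared neighbor of two distinct nodes has degree >= 2, so log(degree) > 0.
    """
    shared = adj.get(u, set()) & adj.get(v, set())
    return sum(1.0 / math.log(len(adj[z])) for z in shared)


# Hypothetical toy network: two dead-end streets a-b-c and d-e-f.
ROADS = build([("a", "b"), ("b", "c"), ("d", "e"), ("e", "f")])

for pair in [("c", "d"), ("a", "c")]:
    print(pair, jaccard(ROADS, *pair), round(adamic_adar(ROADS, *pair), 3))
```

Both scores reward closing triangles among neighbors, a pattern tied to social tie formation, and are blind to geometric gaps between dead ends.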

In the absence of existing methods, we developed a novel approach for estimating edge propensity in road networks: the counterfactual road networks (CRN) method. Our goal is to identify road segments (or edges) that, although missing from the network, one would expect to be present given the surrounding infrastructure. A key challenge in this process is that the space of all possible road network configurations is vast, and exhaustively exploring the space would be computationally intractable. Because network growth problems, particularly those involving connectivity optimization, are often NP-hard,² one must use heuristics and approximations to constrain the space of candidate road segments. Our approach leverages structural properties of the observed network to guide this selection, allowing us to construct a counterfactual representation of the road network on the basis of plausibility, rather than a combinatorial enumeration of all possibilities.

To achieve this, we first characterize the existing road network using a variety of features, namely, the maximum straight-line distance (i.e., the maximum distance between two intersections directly connected by a road), shadow angle threshold (i.e., the typical level of collinearity between connected nodes), and neighbor angle threshold (i.e., the typical angles in connected V-shaped motifs). We use these features, along with road classifications, to constrain the set of new edge “candidates” (i.e., road segments that are not currently in the network but could plausibly be added). The nodes (or vertices) in the network are the intersections or end points of roads. We treat the nodes as a fixed attribute of the network—we do not add or remove nodes from the network.

We assess the utility of each candidate edge in terms of how much connectivity it contributes to the network. We measure this as the reduction in the shortest path length between nodes that are nearby if the road segment is added to the network.³ This provides a heuristic for identifying counterfactual edges that are not merely plausible but also functionally impactful. We use this utility as our metric of edge propensity with the assumption that edges with higher utility have a higher likelihood (propensity) of being included in the network, absent additional considerations. We validate this assumption for the cities under study by comparing the utility of existing edges against that of potential candidates that could replace them.

We define three networks for use throughout the following sections:

G₀ includes 10 road types from the full road network defined by the U.S. Census Bureau (see Table 1) and excludes the following road types: walkway or pedestrian trail, stairway, bike path or trail, and bridle path.

G₁ is the largest connected component of G₀. This is the main network used for measuring road network connectivity, edge propensity, and segregation.

G₂ excludes two road types from G₁: primary road and ramp. We use G₂ when calculating edge characteristics, selecting edge candidates and compatible edge candidates, and validating our measure of edge propensity. In each of these cases, we use G₂ because we are concerned with generating counterfactual road segments that mimic local roads by enhancing local connectivity. The local roads included in G₂ provide a more reasonable basis for evaluation and comparison than G₁, which includes highway segments.

Table 1.

Proportion of Road Segments in the Network, G₀, by Type.

	Primary Road (S1100)	Ramp (S1630)	Secondary Road (S1200)	Local Road^a (S1400)	Vehicular Trail (S1500)	Service Drive^b (S1640)	Alley (S1730)	Private Road (S1740)	Private Driveway (S1750)	Parking Lot Road (S1780)	Total Road Segments
Hartford, CT	.06	.03	.04	.86	.001	0	0	.001	0	.003	3,316
Rochester, NY	.03	.03	.07	.86	0	.004	0	.01	.0003	.001	8,592
Cincinnati, OH	.02	.04	.08	.86	0	.0001	.002	.001	0	0	15,398
Baltimore, MD	.02	.01	.07	.90	0	.001	.0003	.0003	0	.0001	38,553
Philadelphia, PA	.02	.02	.12	.84	.0001	.001	.0003	.0004	.0001	.0001	47,052

Note: The proportion of road segments by type is nearly identical for the G₀ and G₁ road networks. The following road types are excluded from the road networks: walkway or pedestrian trail (S1710), stairway (S1720), bike path or trail (S1820), and bridle path (S1830).

^a Local neighborhood road, rural road, or city street.

^b Service drive usually along a limited-access highway.

Observed Road Network

We construct the observed road network data using publicly available geographic data provided in TIGER/Line shapefiles (U.S. Census Bureau 2012). We use the TIGER/Line shapefiles for “edges” to define the path of roads, where we restrict our analysis to edge entries that represent road segments. Each edge is assigned a permanent unique identifier (UID) by the Census Bureau. The two end points of an edge are called nodes, and each node also has a UID. A single node may be associated with multiple edges, such as a node that joins together two road segments. The data record for each edge includes the edge’s UID, the two node UIDs for its end points, and the classification code for the type of road feature (e.g., primary road, local road, alley). (For more information on the data construction, see Roberto 2018.)

Given a city of interest, we represent its road network as a graph G₀, with edges representing road segments and nodes representing road intersections. We consider a subset of 10 road types defined by the Census Bureau in constructing G₀ (see Table 1).⁴ Not all road types contribute equally to the road network, with primary roads (S1100), ramps (S1630), secondary roads (S1200), and local roads (S1400) accounting for 99.7 percent of total road segments in the five cities presented. According to Census Bureau definitions, primary roads are generally divided, limited-access interstate highways that are accessible by ramps, often in the form of a cloverleaf interchange. Secondary roads are main arteries that are usually part of a highway system (U.S., state, or county) and have at-grade intersections with other roads and driveways. Local roads are generally paved, nonarterial roads that typically have a single lane of traffic in each direction; this includes neighborhood roads, rural roads, and city streets (U.S. Census Bureau 2012).

To rule out outliers and avoid the effect of a few road segments that are disconnected from the main network, we keep the largest connected component of G₀, which we denote G₁. This excludes a small proportion of populated nodes, amounting to less than 0.1 percent of a city’s population.
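The construction of G₀ and G₁ can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the authors' code: each edge record is assumed to be a tuple (edge UID, node UID, node UID, road type code), mirroring the fields described above, and the "largest" component is taken to be the one with the most nodes, found by breadth-first search. The records at the bottom are hypothetical.

```python
from collections import deque

# The 10 road types (MTFCC codes) kept in G0, as listed in Table 1.
G0_TYPES = {"S1100", "S1630", "S1200", "S1400", "S1500",
            "S1640", "S1730", "S1740", "S1750", "S1780"}


def build_g0(records):
    """Keep edge records (edge_uid, node_a, node_b, mtfcc) whose type is in G0_TYPES."""
    return [r for r in records if r[3] in G0_TYPES]


def largest_component(edges):
    """Node set of the connected component with the most nodes (breadth-first search)."""
    adj = {}
    for _, a, b, _ in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        if len(component) > len(best):
            best = component
    return best


def build_g1(g0_edges):
    """G1: the edges of G0 that lie in its largest connected component."""
    nodes = largest_component(g0_edges)
    # Both end points share a component, so checking one end point suffices.
    return [r for r in g0_edges if r[1] in nodes], nodes


# Hypothetical records: (edge UID, node UID, node UID, MTFCC road type).
RECORDS = [
    (1, "n1", "n2", "S1400"),
    (2, "n2", "n3", "S1400"),
    (3, "n3", "n4", "S1200"),
    (4, "n2", "n5", "S1710"),  # walkway: excluded from G0
    (5, "n6", "n7", "S1400"),  # isolated segment: dropped from G1
]
G0 = build_g0(RECORDS)
G1, G1_NODES = build_g1(G0)
```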

Computation of Road Network Features

We compute global descriptors of the road network under study to establish characteristics of roads that are already present in a city. Previous research has used a variety of features to characterize road networks (Boeing 2021, 2022; Jiang and Claramunt 2016; Knaap and Rey 2023; Louf and Barthelemy 2014). Such studies have typically classified types of street networks, developed measures of street network design, or created indicators for related concepts, such as sprawl. Our aim is to identify new edge candidates that do not drastically differ from existing roads. As such, we now propose characteristics of road networks that we will later use to create minimum criteria that new edges must satisfy to be considered edge candidates.

We first remove the primary road (S1100) and ramp (S1630) edge types from G₁ to obtain G₂, while keeping the node set unchanged. (See Table 2 for a comparison of the number of edges and nodes included in the full network, G₀, G₁, and G₂.) We remove these road types when calculating edge characteristics to identify candidate edges that provide connectivity within small geographic areas, rather than highway segments that provide longer distance connectivity. We denote the node set and the edge set of G₂ by V₂ and E₂, respectively.
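Deriving G₂ from G₁ is a filter on road type that, following the description above, leaves the node set as it is. A minimal sketch, assuming the same hypothetical (edge UID, node UID, node UID, MTFCC) record layout and a toy G₁:

```python
# Road types removed from G1 to obtain G2: primary roads and ramps.
G2_DROP = {"S1100", "S1630"}


def build_g2(g1_edges, g1_nodes):
    """Drop primary-road and ramp edges; the node set of G1 is kept unchanged."""
    edges = [r for r in g1_edges if r[3] not in G2_DROP]
    return edges, set(g1_nodes)


# Hypothetical G1: a local street, a highway segment, a ramp, and a secondary road.
G1_EDGES = [
    (1, "a", "b", "S1400"),
    (2, "b", "c", "S1100"),
    (3, "c", "d", "S1630"),
    (4, "b", "d", "S1200"),
]
G2_EDGES, V2 = build_g2(G1_EDGES, {"a", "b", "c", "d"})
```

In this toy case node c keeps its place in V₂ even though none of its edges survive, so it can still act as an end point of a candidate edge or as the node v_k in the shadow-angle computation.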

Table 2.

Count of Edges and Nodes in the Road Networks.

		Edges	Nodes
Hartford, CT	Full network	3,319	3,250
	G₀	3,316	3,247
	G₁	3,300	3,231
	G₂	2,972	2,904
Rochester, NY	Full network	8,597	8,527
	G₀	8,592	8,522
	G₁	8,494	8,429
	G₂	7,970	7,912
Cincinnati, OH	Full network	15,404	15,238
	G₀	15,398	15,232
	G₁	15,213	15,052
	G₂	14,327	14,181
Baltimore, MD	Full network	38,631	38,070
	G₀	38,553	37,994
	G₁	38,498	37,944
	G₂	37,221	36,698
Philadelphia, PA	Full network	47,094	46,633
	G₀	47,052	46,595
	G₁	46,871	46,415
	G₂	45,232	44,789

Note: G₀ excludes the following road types from the full network: walkway or pedestrian trail (S1710), stairway (S1720), bike path or trail (S1820), and bridle path (S1830). G₁ is the largest connected component of G₀. G₂ removes the following road types from G₁: primary road (S1100) and ramp (S1630).

We compute the following three global features of G₂:

Maximum straight-line distance d_max: Denote by d(v_i, v_j) the straight-line distance between nodes v_i and v_j. We compute the straight-line distance between every pair of nodes connected by an edge, and identify the maximum value d_max = max{d(v_i, v_j)|(v_i, v_j) ∈ E₂} (measured in meters).

Fifth-percentile shadow angle α: For any three (distinct and ordered) nodes (v_i, v_j, v_k), the shadow angle is defined as the angle (in the range 0°–180°) between the vectors v_i→v_k and v_k→v_j, which we denote by ∠_shad(v_i, v_j, v_k). Figure 1 shows an example of such vectors, where node v_i is directly connected to v_j and v_k, and v_k is closer to v_i than v_j is. For every node v_i in the network and every node v_j connected to v_i, we compute ∠_shad(v_i, v_j, v_k) for every node v_k whose straight-line distance to v_i is less than the straight-line distance between v_i and v_j. All these values can be written into the set {∠_shad(v_i, v_j, v_k)|v_i ∈ V₂, (v_i, v_j) ∈ E₂, 0 < d(v_i, v_k) < d(v_i, v_j)}, and we define α as the fifth percentile of them, that is, the angle value for which only 5 percent of the shadow angles are smaller. Note that a very small (close to zero) shadow angle would imply that v_i, v_k, and v_j are almost collinear, with v_k located between v_i and v_j nearly in line with the direct connection from v_i to v_j. Such small shadow angles should be rare, and α captures, for the network under study, a shadow angle below which values are unlikely to be observed.

Fifth-percentile neighbor angle β: For any three nodes (v_i, v_j, v_k), the neighbor angle is defined as the angle (in the range 0°–180°) between the vectors v_i→v_j and v_i→v_k, which we denote by ∠_neigh(v_i, v_j, v_k) (see Figure 1). We compute ∠_neigh(v_i, v_j, v_k) for every pair of edges (v_i, v_j) and (v_i, v_k) sharing one end point v_i. All these values can be collected into the set {∠_neigh(v_i, v_j, v_k)|v_i ∈ V₂, (v_i, v_j), (v_i, v_k) ∈ E₂}, and we define β as the fifth percentile of them. Road segments meeting at very small angles are atypical in most cities, and β captures a reasonable lower bound on such angles that is specific to the network under study.
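The three features can be computed by brute force on small networks. The sketch below is a minimal illustration, not the authors' implementation. It assumes node coordinates in a projected coordinate system measured in meters (so straight-line distance is Euclidean), an adjacency mapping for G₂, and linear interpolation between sorted values for the fifth percentile, which is one common convention among several. The shadow-angle step scans every node for every edge, so its cost grows with the product of edge and node counts; a spatial index would be needed at city scale. The toy network, a 100 m square with one diagonal spur, is hypothetical.

```python
import math


def angle_between(u, v):
    """Angle in degrees (0-180) between two nonzero 2-D vectors."""
    cos = (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def vec(p, q):
    """Vector from point p to point q."""
    return (q[0] - p[0], q[1] - p[1])


def percentile(values, q):
    """Percentile q (0-100) by linear interpolation between sorted values."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def max_distance(coords, adj):
    """d_max: largest straight-line distance between directly connected nodes."""
    return max(math.dist(coords[i], coords[j]) for i in adj for j in adj[i])


def shadow_alpha(coords, adj, q=5):
    """alpha: q-th percentile of shadow angles between v_i->v_k and v_k->v_j."""
    angles = []
    for i in adj:
        for j in adj[i]:  # every ordered edge (v_i, v_j)
            d_ij = math.dist(coords[i], coords[j])
            for k in coords:  # every node strictly closer to v_i than v_j is
                if 0 < math.dist(coords[i], coords[k]) < d_ij:
                    angles.append(angle_between(vec(coords[i], coords[k]),
                                                vec(coords[k], coords[j])))
    return percentile(angles, q)


def neighbor_beta(coords, adj, q=5):
    """beta: q-th percentile of angles between pairs of edges sharing an end point."""
    angles = []
    for i in adj:
        nbrs = sorted(adj[i])
        for a in range(len(nbrs)):
            for b in range(a + 1, len(nbrs)):
                angles.append(angle_between(vec(coords[i], coords[nbrs[a]]),
                                            vec(coords[i], coords[nbrs[b]])))
    return percentile(angles, q)


# Hypothetical toy G2: a 100 m square a-b-c-d with a spur from corner a to e.
COORDS = {"a": (0, 0), "b": (100, 0), "c": (100, 100), "d": (0, 100), "e": (50, 50)}
ADJ = {}
for u, v in [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "e")]:
    ADJ.setdefault(u, set()).add(v)
    ADJ.setdefault(v, set()).add(u)

D_MAX = max_distance(COORDS, ADJ)
ALPHA = shadow_alpha(COORDS, ADJ)
BETA = neighbor_beta(COORDS, ADJ)
```

In this gridlike toy, every shadow angle is a right angle, while the diagonal spur at corner a creates the smallest neighbor angles, which is what pulls β below 90 degrees.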

Figure 1.

Illustration of different angles considered, centered on v_i.

The maximum distance d_max captures the scale of the road network under study; the other two measures encode how gridlike the network is. These characteristics help us generate new edges that are consistent with those already present in the city.

Selection of Edge Candidates

The vast majority of disconnected pairs of nodes in G₂ would constitute unreasonable edges in a counterfactual road network. Indeed, consider two road intersections that are not directly connected and are two miles apart with many road intersections between them. It would be extremely unlikely to see a road directly connecting these two points in a real-world city. Thus, we leverage the network features computed in the prior section to winnow out unreasonable potential edges and form our set of edge candidates. Specifically, we deem a pair of disconnected nodes (v_i, v_j) in G₂ an edge candidate if it satisfies all the following conditions:

Condition 1: At least one of the nodes v_i or v_j is incident to at least one edge that is a secondary road (S1200) or local road (S1400).

Condition 2: The straight-line distance between v_i and v_j is no greater than d_max, that is, 0 < d(v_i, v_j) ≤ d_max.

Condition 3: The shadow angles ∠_shad(v_i, v_j, v_k) for all nodes v_k satisfying 0 < d(v_i, v_k) < d(v_i, v_j) are no smaller than α, or the shadow angles ∠_shad(v_j, v_i, v_k) for all nodes v_k satisfying 0 < d(v_j, v_k) < d(v_i, v_j) are no smaller than α.

Condition 4: The neighbor angles ∠_neigh(v_i, v_j, v_k) for all existing edges (v_i, v_k) incident to v_i are no smaller than β, and the neighbor angles ∠_neigh(v_j, v_i, v_k) for all existing edges (v_j, v_k) incident to v_j are no smaller than β.

These rules ensure that the selected edge candidates resemble existing edges. The first condition requires at least one of the nodes to be associated with a secondary or local road. These road types are associated with residential streets and are the main sources of connectivity within small geographic areas. The second condition encodes the fact that, in general, we are unlikely to see a road between two nodes that are far from each other. Instead, we tend to observe several road segments connecting these two nodes via some intermediate nodes. The third condition reflects that if there exists a node v_k closer to v_i than v_j and the shadow angle is small, then two road segments (v_i, v_k) and (v_k, v_j) are more likely than a single road segment connecting v_i and v_j. The fourth condition encodes the fact that two road segments (v_i, v_k) and (v_i, v_j) pointing in almost the same direction are unlikely to coexist.
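The four conditions translate directly into a filter over node pairs. This is a minimal brute-force sketch, not the authors' code. It assumes projected coordinates in meters, G₂ given as (node, node, MTFCC) edges, "disconnected" read as "not directly connected by an edge," and thresholds d_max, α, and β passed in by the caller; the threshold values and the toy network below are hypothetical.

```python
import math
from itertools import combinations

# Road types that satisfy Condition 1: secondary (S1200) and local (S1400) roads.
LOCAL_TYPES = {"S1200", "S1400"}


def angle_between(u, v):
    """Angle in degrees (0-180) between two nonzero 2-D vectors."""
    cos = (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def vec(p, q):
    """Vector from point p to point q."""
    return (q[0] - p[0], q[1] - p[1])


def shadow_ok(coords, i, j, alpha):
    """True if every shadow angle from v_i toward v_j is at least alpha."""
    d_ij = math.dist(coords[i], coords[j])
    for k in coords:
        if 0 < math.dist(coords[i], coords[k]) < d_ij:
            if angle_between(vec(coords[i], coords[k]), vec(coords[k], coords[j])) < alpha:
                return False
    return True


def neighbor_ok(coords, adj, i, j, beta):
    """True if (v_i, v_j) meets every existing edge at v_i at an angle of at least beta."""
    return all(angle_between(vec(coords[i], coords[j]), vec(coords[i], coords[k])) >= beta
               for k in adj.get(i, ()))


def edge_candidates(coords, edges, d_max, alpha, beta):
    """Node pairs of G2 that are not directly connected and pass Conditions 1-4."""
    adj, local = {}, set()
    for u, v, road_type in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        if road_type in LOCAL_TYPES:
            local.update((u, v))
    found = []
    for i, j in combinations(sorted(coords), 2):
        if j in adj.get(i, ()):
            continue  # already an edge
        if i not in local and j not in local:
            continue  # Condition 1
        if not 0 < math.dist(coords[i], coords[j]) <= d_max:
            continue  # Condition 2
        if not (shadow_ok(coords, i, j, alpha) or shadow_ok(coords, j, i, alpha)):
            continue  # Condition 3
        if not (neighbor_ok(coords, adj, i, j, beta) and neighbor_ok(coords, adj, j, i, beta)):
            continue  # Condition 4
        found.append((i, j))
    return found


# Hypothetical toy G2: two collinear dead-end local streets separated by a 50 m gap.
COORDS = {"a": (0, 0), "b": (100, 0), "c": (150, 0), "d": (250, 0)}
EDGES = [("a", "b", "S1400"), ("c", "d", "S1400")]
print(edge_candidates(COORDS, EDGES, d_max=200, alpha=10, beta=30))
```

Even with a generous d_max, longer pairs such as (a, c) are pruned because b sits almost exactly on the line between them (a zero shadow angle), leaving only the short segment that bridges the gap between the two dead ends.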

Computation of Edge Utility Scores

Our next objective is to assign a propensity score to each of the candidate edges identified in our previous step. We develop a measure of the propensity, or utility, of a particular edge candidate by computing the (normalized) change of the average shortest path length in the candidate’s local environment when the candidate edge is included in the network. The local environment of each edge includes its end points, as well as all nodes within a particular distance of the end points (here we use a distance of 0.5 km). We use G₁ for these calculations to account for all connectivity provided by local roads as well as primary roads and ramps.

For every edge candidate e = (s, t), we adopt the following procedure to compute its utility score:

Step 1: Identify a subset of nodes (denoted by V_e) in the road network G₁ whose straight-line distance to s or t is no greater than a predefined local environment reach r₁ (V_e includes s and t).

Step 2: Construct the subgraph G_e induced in G₁ by V_e.

Step 3: Add e = (s, t) to G_e to form a new graph Ĝ_e and set the corresponding road length to d(s, t).

Step 4: Compute the shortest path lengths between every pair of nodes in G_e and Ĝ_e, with edges weighted by their lengths. Denote by $Δ_{G_{e}} (v_{i}, v_{j})$ the shortest path length between nodes v_i and v_j in G_e (similar notation for Ĝ_e). If G_e is a connected graph, skip steps 5 and 6.

Step 5: Add e = (s, t) to G₁ to form a new graph Ĝ₁ and set the corresponding road length to d(s, t).

Step 6: If any two nodes v_i and v_j are disconnected in G_e, reset $Δ_{G_{e}} (v_{i}, v_{j})$ = $Δ_{G_{1}} (v_{i}, v_{j})$ and $Δ_{{\hat{G}}_{e}} (v_{i}, v_{j}) = Δ_{{\hat{G}}_{1}} (v_{i}, v_{j})$ .

Step 7: Compute the average shortest path length for G_e as $Δ_{G_{e}} = \frac{\sum_{v_{i} \in V_{e}} \sum_{v_{j} \in V_{e}} Δ_{G_{e}} (v_{i}, v_{j})}{| V_{e} | (| V_{e} | - 1)}$ where |V_e| denotes the number of nodes in V_e. Similarly, compute the average shortest path length $Δ_{{\hat{G}}_{e}}$ for Ĝ_e.

Step 8: The utility score of the candidate e is computed as $δ_{e} = \frac{Δ_{G_{e}} - Δ_{{\hat{G}}_{e}}}{d (s, t)}$ , which is nonnegative.

This procedure quantifies the average reduction in the shortest path length in a local environment achieved by including the candidate edge, normalized by the length of the edge. These steps can also be followed to determine the utility of an existing edge e. To do this, we first remove e from the network G₁ and then treat e as a candidate, computing its utility score as described previously.
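Steps 1 to 8 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It makes several assumptions: the road network is a plain adjacency dictionary (node → {neighbor: road length in km}); node coordinates are planar and in kilometers; d(s, t) is the straight-line distance between s and t; and G₁ is connected. The validation analysis below also excludes edges whose removal would disconnect the network. The helper names `dijkstra`, `with_edge`, and `utility_score` are hypothetical.

```python
import heapq
import math

def dijkstra(adj, source, allowed=None):
    """Shortest road distances from source. adj maps node -> {neighbor: length}.
    If `allowed` is given, only those nodes are traversed (i.e., the induced subgraph)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            if allowed is not None and v not in allowed:
                continue
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def with_edge(adj, s, t, length):
    """Copy of adj with an undirected edge (s, t) of the given length added."""
    new = {u: dict(nbrs) for u, nbrs in adj.items()}
    new[s][t] = length
    new[t][s] = length
    return new

def utility_score(adj, coords, s, t, reach=0.5):
    """Hypothetical sketch of delta_e for candidate e = (s, t); assumes G1 is connected."""
    d_st = math.dist(coords[s], coords[t])  # assumed: d(s, t) is straight-line length (km)
    # Step 1: V_e = nodes within straight-line distance `reach` of s or t
    V_e = {v for v, p in coords.items()
           if math.dist(p, coords[s]) <= reach or math.dist(p, coords[t]) <= reach}
    adj_hat = with_edge(adj, s, t, d_st)  # G1 plus e; restricted to V_e it is G_e-hat (Steps 3, 5)
    total = total_hat = 0.0
    for vi in V_e:
        # Steps 2 and 4: shortest paths inside the induced subgraphs G_e and G_e-hat
        local = dijkstra(adj, vi, V_e)
        local_hat = dijkstra(adj_hat, vi, V_e)
        full = full_hat = None
        for vj in V_e:
            if vj == vi:
                continue
            if vj in local:
                total += local[vj]
                total_hat += local_hat[vj]  # adding e never removes a path, so vj is reachable
            else:
                # Step 6: pair disconnected in G_e, so fall back to distances in G1 and G1-hat
                if full is None:
                    full = dijkstra(adj, vi)
                    full_hat = dijkstra(adj_hat, vi)
                total += full[vj]
                total_hat += full_hat[vj]
    n = len(V_e)
    # Step 7: average shortest path length over ordered pairs of distinct nodes
    avg, avg_hat = total / (n * (n - 1)), total_hat / (n * (n - 1))
    # Step 8: reduction in average path length per unit of candidate length
    return (avg - avg_hat) / d_st
```

To score an existing edge, one would first delete it from `adj` and then call `utility_score` on its end points. Step 6 is handled lazily in this sketch. Full-network distances are computed only for a source node that has at least one pair disconnected inside the local subgraph.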

Figure 2 shows the network produced by the procedure described in the prior sections for the city of Hartford, Connecticut. Edge candidates are grouped by utility. Higher utility edges (blue) are the top 20 percent of the distribution, with a mean of 1.328 (s.d. = 1.532) and a median of 0.831. Lower utility edges are the bottom 80 percent, with a mean of 0.067 (s.d. = 0.078) and a median of 0.035. The higher utility edge candidates tend to be relatively short road segments that substantially increase the local connectivity of the network. Because we normalize by the length of the edge (see step 8), short edges that create the most connectivity yield the highest utility scores.
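Summaries like those reported for Hartford can be reproduced by ranking the utility scores and splitting them at the 20th percentile from the top. The sketch below assumes two things: the split is made by rank, with ties broken arbitrarily, and the reported s.d. is the sample standard deviation. `split_by_utility` is a hypothetical helper.

```python
import statistics

def split_by_utility(scores, top_share=0.2):
    """Split utility scores into the top `top_share` (by rank) and the remainder.
    Each group needs at least two values for the sample standard deviation."""
    ranked = sorted(scores, reverse=True)
    k = max(1, round(top_share * len(ranked)))  # number of "higher utility" candidates
    groups = {"higher": ranked[:k], "lower": ranked[k:]}
    return {
        name: {
            "mean": statistics.mean(g),
            "sd": statistics.stdev(g),  # assumed: sample standard deviation
            "median": statistics.median(g),
        }
        for name, g in groups.items()
    }
```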

Figure 2.

The road network of Hartford, Connecticut.

Identifying Compatible Edge Candidates

Observing the edge candidates identified in Figure 2, we see that several edge candidates share a terminal node. Thus, when augmenting the network, we might include one of these candidates but not all of them, as the realization of one of these edges might invalidate the others as plausible edge candidates. More formally, we say that two edge candidates are compatible with each other if either of two cases holds: (1) they do not share any end points; or (2) they share one end point v_i, as with node pairs (v_i, v_j) and (v_i, v_k), and the neighbor angle ∠_neigh(v_i, v_j, v_k) is no smaller than β (see Figure 1).

With this notion of compatibility defined, we detail our procedure for growing a road network G₁. Given a set of edge candidates (denoted by C), we add them to G₁ according to the following steps:

Step 1: We denote by $\hat{C}$ the set of edge candidates to be selected. Initialize $\hat{C} = C$ .

Step 2: Add the edge candidate in $\hat{C}$ with the highest utility score to G₁ and then remove it from $\hat{C}$ .

Step 3: Remove the edge candidates from $\hat{C}$ that are incompatible with the newly added edge.

Step 4: Repeat steps 2 and 3 until $\hat{C}$ becomes empty.

Put differently, in the above procedure we sort the edge candidates by utility and add them in an ordered fashion to G₁, making sure that any potential candidate that is not compatible with an edge already added is discarded. In the section “Global Analysis of Segregation,” we calculate segregation using counterfactual road networks obtained through this procedure and compare that with segregation measured with the observed road network.
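The greedy procedure in Steps 1 to 4 and the compatibility rule can be sketched as follows. The neighbor angle is defined in Figure 1, which is not reproduced here. The sketch assumes it is the angle, in degrees, at the shared end point v_i between the segments toward v_j and v_k. The caller must supply the threshold `beta`. All function names are hypothetical.

```python
import math

def angle_at(coords, vi, vj, vk):
    """Angle in degrees at vi between segments vi-vj and vi-vk (assumed neighbor angle)."""
    ax, ay = coords[vj][0] - coords[vi][0], coords[vj][1] - coords[vi][1]
    bx, by = coords[vk][0] - coords[vi][0], coords[vk][1] - coords[vi][1]
    cos = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding error

def compatible(e1, e2, coords, beta):
    """Two candidates are compatible if they share no end point, or share one
    end point and the neighbor angle is at least beta."""
    shared = set(e1) & set(e2)
    if not shared:
        return True                      # case (1)
    if len(shared) == 2:
        return False                     # same node pair
    (vi,) = shared
    vj = e1[0] if e1[1] == vi else e1[1]
    vk = e2[0] if e2[1] == vi else e2[1]
    return angle_at(coords, vi, vj, vk) >= beta   # case (2)

def grow_network(candidates, utility, coords, beta):
    """Return the candidates added to G1, in the order they are added."""
    remaining = sorted(candidates, key=lambda e: utility[e], reverse=True)  # Step 1
    added = []
    while remaining:                                         # Step 4
        best = remaining.pop(0)                              # Step 2: highest utility
        added.append(best)
        # Step 3: drop candidates incompatible with the newly added edge
        remaining = [c for c in remaining if compatible(best, c, coords, beta)]
    return added
```

Candidates are popped in descending order of utility, and incompatible ones are filtered out immediately. The result is therefore the same as walking the sorted list once and keeping each candidate that is compatible with every candidate kept so far.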

Comparison between Existing Edges and Edge Candidates

Before implementing the procedure described in the prior sections, we validate our choice of edge utility score. To do this, we consider a measure of edge propensity to be reasonable if existing edges tend to receive higher scores than competing edge candidates. (Edge candidates are considered to be competing with an existing edge if they are nearby and provide similar connectivity, as defined formally below.) In other words, we deem the chosen utility to be meaningful if it can be used to significantly differentiate between the actual edges that exist in the network and competing edge candidates. Thus, in what follows we compare the utility scores of existing edges and edge candidates to see whether they meet such a requirement.

A simple way to evaluate our measure of edge propensity is to compare the average utility score of existing edges (in G₂) and that of edge candidates. In Hartford, for example, the average utility scores of existing edges and edge candidates are 0.941 and 0.320, respectively (see Tables A1 and A2 in the Appendix).

For a more detailed validation procedure, we perform the following experiment. We denote by C the set of edge candidates associated with G₂ as defined in the section “Selection of Edge Candidates.” For every existing edge e ∈ E₂, we undertake the following steps. First, we remove e from G₂ and update the set of edge candidates (denoted by $\hat{C}$) accordingly. To do this, we recheck condition 4 in the section “Selection of Edge Candidates.” In other words, we obtain $\hat{C}$ from C by adding the node pairs that were previously excluded because they were not compatible with e. Second, we identify the subset of edge candidates in $\hat{C}$ that are potential alternatives to e (denoted by C_e). We consider two schemes for defining C_e: (1) candidates that share one end point with e, or (2) candidates that appear in $\hat{C}$ but not in C. Finally, we compute the utility scores of e and of the candidates in C_e according to the section “Computation of Edge Utility Scores” and check whether the utility score of e is greater than the utility scores of all candidates in C_e. We exclude from this analysis the (few) edges e whose deletion from G₁ would result in a disconnected network.

Figure 3 shows the results of this validation process for both selection schemes for the competing edge candidates. Across existing edges, we plot a histogram of the number of competing candidate edges (blue) and a histogram of the cases in which the existing edge has a larger utility than all of its competing edge candidates (orange). A large fraction of existing edges have no competing candidate edges (x-axis = 0), and existing edges tend to have larger utility scores than their alternatives, which supports the proposed edge propensity measure. For instance, in Figure 3a, 732 of the existing edges have a single competing candidate edge. In 513 of these cases (about 70 percent), the utility of the existing edge is greater than that of the alternative. That this proportion is well above 50 percent strongly indicates that the chosen utility carries information about the existence of edges in real-world cities and is a reasonable measure of edge propensity. The other four cities show similar results.
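The tallies behind each panel of Figure 3 can be illustrated with the sketch below. It assumes the utility scores of each existing edge e and of its competitors in C_e have already been computed. `validation_histograms` is a hypothetical helper. For each number of competitors, it counts how many existing edges there are and how many of them beat every competitor. Edges with no competitors beat them vacuously, which may or may not match how the x = 0 bar is drawn in the figure.

```python
from collections import Counter

def validation_histograms(records):
    """records: iterable of (utility of existing edge e, [utilities of candidates in C_e]).
    Returns (sizes, wins): Counters keyed by the number of competing candidates."""
    sizes, wins = Counter(), Counter()
    for u_e, competitors in records:
        n = len(competitors)
        sizes[n] += 1                                # blue histogram
        if all(u_e > u for u in competitors):
            wins[n] += 1                             # orange histogram
    return sizes, wins
```

For the Hartford example above, the share of single-competitor edges that outscore their alternative is wins[1] / sizes[1] = 513 / 732 ≈ 0.701.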

Figure 3.

Utility validation experiment for Hartford, Connecticut: (a) candidate edge shares one end point with e and (b) candidate edge appears in $\hat{C}$ but not in C.

Racial Residential Segregation in Observed and Counterfactual Road Networks

To demonstrate the application of our proposed method, we analyze segregation in five cities in the rust belt region of the United States: the former “industrial belt that extended from New England across New York, Pennsylvania, and West Virginia, through the Midwest to the banks of the Mississippi” (Sugrue 2005:6). Rust belt cities have well-documented histories of residential segregation. Segregation in these cities reached particularly high levels during the mid-twentieth century, when they peaked in urban growth, and it has been more resistant to change than segregation in other regions (Logan 2000). Concurrent with these legacies of segregation, policy and planning decisions that shaped the built environments of these cities, such as where to build interstate highways or public housing, were often made in response to racial tensions (Mohl 2008; Schindler 2015; Sugrue 2005). We selected five cities that capture a range of sizes: Philadelphia, Pennsylvania, and Baltimore, Maryland, are among the largest rust belt cities (population > 500,000); Cincinnati, Ohio, is a medium-sized city (population between 250,000 and 500,000); and Hartford, Connecticut, and Rochester, New York, are smaller cities (population between 100,000 and 250,000). (Table A4 provides a summary description of each city.)

We use the SPC method (Roberto 2018) to measure and analyze segregation. The SPC method incorporates spatial features of the built environment, including the road connectivity and physical barriers between locations, into the measurement of segregation. We use publicly available population data from the 2010 decennial census (U.S. Census Bureau 2011) and the TIGER/Line shapefiles for blocks and roads (U.S. Census Bureau 2012).⁵ Following the SPC method, we use the permanent UIDs assigned by the Census Bureau to establish the relationships between roads (and their associated nodes) and blocks. We then distribute the aggregate population of a block by assigning a portion of the block population to each of the nodes associated with the block. Following this procedure, we estimate the population count and composition for each node in the road network.
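The step that distributes block populations to road-network nodes can be sketched as follows. The text does not specify how a block's population is divided among its nodes, so this sketch assumes an equal split. The SPC method (Roberto 2018) may use a different rule. The input dictionaries are hypothetical stand-ins for the census block counts and the UID-based block-to-node relationships, keyed by block and node identifiers.

```python
from collections import defaultdict

def allocate_block_population(block_pop, block_nodes):
    """block_pop: block_id -> {group: count}; block_nodes: block_id -> [node ids].
    Assumption: each block's population is split equally among its associated nodes."""
    node_pop = defaultdict(lambda: defaultdict(float))
    for block, counts in block_pop.items():
        nodes = block_nodes.get(block, [])
        if not nodes:
            continue  # hypothetical choice: blocks without associated nodes are skipped
        share = 1.0 / len(nodes)
        for node in nodes:
            for group, count in counts.items():
                node_pop[node][group] += count * share
    return {node: dict(groups) for node, groups in node_pop.items()}
```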

We measure distance by calculating the shortest path length (weighted by the length of each road segment) along the road network between all pairs of nodes. We use the road distance measure to construct local environments, or “egocentric neighborhoods” (Lee et al. 2008), around each node.⁶ The reach of local environments (i.e., the distance in each direction from a given node that we denote by r₂) used to measure segregation in this analysis is 0.5 km, which approximates the area of a small pedestrian neighborhood (Donaldson 2013; Lee et al. 2008).⁷ We construct separate sets of local environments using the distance measures derived from the observed and counterfactual road networks. Next, we calculate the population composition in the local environment of each node for each set of local environments. We use this information to measure the level of segregation for each local environment and for the city as a whole, with separate sets of results for the observed and counterfactual road networks.
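Local environments based on road distance rather than straight-line distance can be sketched with Dijkstra's algorithm truncated at the reach r₂. The sketch makes two assumptions: a node's local environment includes every node whose shortest road distance is at most the reach, and its composition is the unweighted sum of those nodes' populations, with no distance decay. The function names are hypothetical. Running the sketch once on the observed network and once on a counterfactual network, with candidate edges added to `adj`, yields the two sets of local environments.

```python
import heapq
import math

def nodes_within(adj, source, reach):
    """Road-network distances from source, stopping at `reach` (Dijkstra with a cutoff)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd <= reach and nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def local_composition(adj, node_pop, source, reach=0.5):
    """Group proportions of the population within road distance `reach` of source
    (assumed unweighted sum of node populations)."""
    totals = {}
    for v in nodes_within(adj, source, reach):
        for group, count in node_pop.get(v, {}).items():
            totals[group] = totals.get(group, 0.0) + count
    n = sum(totals.values())
    return {g: c / n for g, c in totals.items()} if n > 0 else {}
```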

To measure the level of segregation, we use the divergence index (Roberto 2024), which measures the difference between the population composition of each local environment and the city’s overall composition. The values of the divergence index represent how surprising the composition of a local environment is, given the overall population composition of the city. The divergence index equals zero, its minimum value, when there is no difference between the local and overall population composition; greater differences produce higher values and indicate a greater degree of segregation. Local values of the divergence index will reach their maximum value when the smallest group in a city is 100 percent of the local population.

The divergence index for node i’s local environment with a reach of r is

${\tilde{D}}_{ri} = \sum_{m} {\tilde{π}}_{rim} \log \frac{{\tilde{π}}_{rim}}{π_{m}},$

where π_m is group m’s proportion of the city’s overall population, and ${\tilde{π}}_{rim}$ is group m’s proportion of node i’s local environment population with reach r.⁸ The divergence index for the city for a given reach is the population weighted mean of the divergence index for all nodes, calculated as

${\tilde{D}}_{r} = \sum_{i} \frac{τ_{i}}{T} {\tilde{D}}_{ri},$

where T is the city’s overall population count, and τ_i is the population count of node i.
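The two divergence index formulas translate directly into code. The sketch assumes the natural logarithm, because the base is not stated in this section. Groups absent from a local environment contribute zero, following the convention 0 log 0 = 0. The function names are hypothetical.

```python
import math

def local_divergence(local_props, city_props):
    """D_ri = sum over groups m of pi_rim * log(pi_rim / pi_m); zero shares contribute 0."""
    return sum(p * math.log(p / city_props[m]) for m, p in local_props.items() if p > 0)

def city_divergence(local_values, node_pops):
    """D_r = sum over nodes i of (tau_i / T) * D_ri, with T the total population."""
    T = sum(node_pops.values())
    return sum(node_pops[i] / T * d for i, d in local_values.items())
```

Consider a city that is 80 percent group A and 20 percent group B. A local environment that is entirely group B gives log(1/0.2) ≈ 1.609, the maximum. One that is entirely group A gives only log(1/0.8) ≈ 0.223.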

The divergence index measures the same concept of segregation as the dissimilarity index. Both indexes measure the evenness dimension of segregation (Massey and Denton 1988a) by comparing the residential distribution of groups to an even distribution in which groups are distributed proportionally across residential environments (for more details about the divergence index, see Roberto 2024). In the following sections, we focus on the segregation of Black, Hispanic, and White residents in each city.⁹

In summary, the SPC method measures the distance between locations in a city along the road network, rather than using straight-line distance as in previous studies, to represent the connectivity of roads and the excess distance imposed by physical barriers. The SPC method systematically analyzes the prevalence of disconnectivity and physical barriers, their association with segregation, and their variation within and across cities. The SPC approach significantly improves on previous measures of spatial segregation by incorporating the connectivity between locations. To examine the relationship between road network disconnectivity and segregation levels, previous studies have compared segregation measures using road network distance and measures using straight-line distance (e.g., Roberto 2018). This is a reasonable comparison because prior spatial segregation measures have relied on straight-line distance to represent proximity. However, as a counterfactual, it would not be realistic for us to expect all intersections to be connected by straight-line roads. The SPC method also does not resolve questions about where in a city disconnection occurs or where it is most surprising. These observations motivated the development of the CRN method to be used in conjunction with the SPC method.

Global Analysis of Segregation

In this section we consider whether adding connectivity to a road network is associated with changes in a city’s level of segregation, and whether the lack of connection resulting from missing high utility road segments is associated with higher levels of segregation. We do this by measuring segregation with the counterfactual road network and comparing it to a control scenario that adds connectivity to the road network without regard for edge propensity.

We calculate segregation using the observed road network G₁ and compare that with segregation measured for the counterfactual road network, where edge candidates are added in sets of 10 (i.e., 10, 20, 30, etc.). These additions are performed starting with the highest utility edge candidates, according to the procedure described in the section “Identifying Compatible Edge Candidates.” The corresponding results are plotted in blue for the five cities in Figure 4. Note that the blue dot for zero edge candidates added (at x = 0) corresponds to the segregation index of the observed city (without any added edges). As we add edges, the segregation index tends to decrease.

Figure 4.

Comparison of segregation for utility sorted and random edge candidates added to the road network: (a) Philadelphia, Pennsylvania; (b) Cincinnati, Ohio; (c) Baltimore, Maryland; (d) Rochester, New York; and (e) Hartford, Connecticut.

At the city level, the overall decreases in segregation may seem small (see Table A4), and there are three reasons why this may be the case. First, the segregation values are for the whole city. They include places where there are no counterfactual road segments and where we would not expect any difference between observed and counterfactual segregation. Second, even in areas with counterfactual road segments, not every segment connects areas that differ in racial composition. For these segments, the composition of the local environments would change little or not at all with or without the counterfactual road segment.

Third, depending on local compositions, there are cases when including a counterfactual road segment in the network increases segregation. This can occur if, hypothetically, area A has a racial composition similar to the city’s overall composition (low segregation), area B has a racial composition very different from the city’s overall composition (high segregation), and there is a counterfactual road segment between area A and area B. When we measure segregation with the counterfactual road segment included in the network, segregation in area B may decrease, but perhaps not by enough to offset the potential increase for area A. Although the city-level decreases may seem small, these results suggest an association between segregation values and unexpected disconnectivity in the road network.

Next, we would like to determine whether the decreases in segregation in Figure 4 are simply due to the increased connectivity of adding edges to the network, or if the decreases are related to our measure of edge propensity. To test this, we compare the segregation calculated for the counterfactual road network, in which sets of edge candidates are added in order of highest to lowest utility, to a control scenario in which edge candidates are added in sets of randomly selected edges. This control scheme is similar to the procedure in the section “Identifying Compatible Edge Candidates,” but in step 2 we add the candidates randomly rather than based on utility scores.

We consider 10 different random controls for each city and plot the results (average and standard deviation of segregation) in orange in Figure 4. The blue curves are significantly different from the orange ones, which indicates a significant difference in segregation when adding edge candidates from highest to lowest utility compared with randomly ordered edge candidates. Note that the blue and orange curves coincide when no edges are added, as the initial, observed city is the same for both cases. Also, once most edge candidates have been added, the curves tend to converge. However, for intermediate numbers of added edges, we see a significant difference between them, indicating that higher edge propensity is related to larger differences in segregation.
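The comparison in Figure 4 between utility-ordered additions and the random control can be sketched as follows. The callback `segregation_of` is hypothetical. It would recompute the citywide divergence index for the observed network plus a list of added edges. The other names are also hypothetical. Walking the candidates in a given order and keeping each one that is compatible with all previously kept candidates is equivalent to the pop-and-filter steps of the selection procedure. For the control, the order is a random shuffle rather than descending utility. Only batch sizes reached in every random run are averaged, which is an assumption about how the orange curves are summarized.

```python
import random
import statistics

def greedy_select(order, is_compatible):
    """Keep each candidate, in the given order, that is compatible with all kept so far."""
    kept = []
    for cand in order:
        if all(is_compatible(cand, k) for k in kept):
            kept.append(cand)
    return kept

def curve(selected, segregation_of, step=10):
    """Segregation after adding the first 0, step, 2*step, ... selected edges."""
    return [(k, segregation_of(selected[:k])) for k in range(0, len(selected) + 1, step)]

def random_control(candidates, is_compatible, segregation_of, runs=10, step=10, seed=0):
    """Mean and s.d. of segregation across `runs` random orderings (runs >= 2)."""
    rng = random.Random(seed)
    curves = []
    for _ in range(runs):
        order = list(candidates)
        rng.shuffle(order)
        curves.append(curve(greedy_select(order, is_compatible), segregation_of, step))
    n_points = min(len(c) for c in curves)  # batch sizes reached in every run
    summary = []
    for i in range(n_points):
        values = [c[i][1] for c in curves]
        summary.append((curves[0][i][0], statistics.mean(values), statistics.stdev(values)))
    return summary
```

Under the same assumptions, the blue curve would be `curve(greedy_select(sorted(candidates, key=utility.get, reverse=True), is_compatible), segregation_of)`, where `utility` maps each candidate to its score.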

Our computation of edge propensity in the section “Computation of Edge Utility Scores” does not include any information about racial composition. Thus, the significant differences in racial segregation shown in Figure 4 are particularly striking. This finding suggests the high propensity missing road segments—roads that are not present in the city but which one would expect—facilitate disconnection, and their absence is associated with higher levels of segregation.

Local Analysis of Racial Composition and Segregation

In the previous section, we considered city-level patterns and found that high propensity missing road segments facilitate unexpected disconnectivity in the road networks, which contributes to higher levels of segregation in each city. We now shift our focus to the local level and analyze differences in racial composition and segregation for local areas within the cities. We examine whether unexpected disconnectivity (because of the absence of an edge candidate) is associated with differences in racial composition between nearby areas and with higher levels of segregation in nodes’ local environments.¹⁰

To conduct this analysis, we use the set of compatible edge candidates included in the counterfactual road network in the prior section. Taking each compatible edge candidate one at a time, we measure the racial composition and segregation for each of its end points, using local environments with a reach of 0.5 km. We measure the racial composition and segregation with and without the edge candidate included in the observed road network. Segregation values represent the average for nodes associated with each compatible edge candidate.¹¹

Local Analysis of Racial Composition

If road disconnectivity and physical barriers facilitate greater separation between ethnoracial groups in nearby areas, then in the observed road network, where the edge candidate is missing, we should see bigger differences in the local environment composition of its end points than in a counterfactual road network that includes the edge. To examine this, we measure the difference in the racial composition of end points’ local environments with and without the candidate edge included in the road network. We then use a paired t test to analyze whether the observed and counterfactual differences in racial composition differ significantly. In other words, we analyze whether these nearby, disconnected areas are more similar in racial composition when the candidate edge is present than when it is missing.
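The paired comparison for Table 3 can be sketched as follows. The sketch assumes that the “difference in racial composition” for an edge candidate is the absolute difference in a group's share between the local environments of its two end points. `composition_gap` and `paired_t` are hypothetical helpers. The p-value would come from a t distribution with n − 1 degrees of freedom. If SciPy is available, `scipy.stats.ttest_rel(counterfactual, observed)` returns the same statistic.

```python
import math
import statistics

def composition_gap(props_u, props_v, group):
    """Absolute difference in one group's share between two end points' local
    environments (assumed definition of the compositional difference)."""
    return abs(props_u.get(group, 0.0) - props_v.get(group, 0.0))

def paired_t(observed, counterfactual):
    """Mean of (counterfactual - observed), its paired t statistic, and degrees of freedom."""
    diffs = [c - o for o, c in zip(observed, counterfactual)]
    n = len(diffs)
    mean = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean, mean / se, n - 1
```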

We find statistically significant differences for all three ethnoracial groups in all five cities (see Table 3). On average, there are larger differences in the local environment composition of an edge candidate’s end points when the edge is missing from the road network. In other words, racial differences between nearby areas are larger when the areas are disconnected. This trend is particularly pronounced for the Black and White populations of Cincinnati and Baltimore and for the Hispanic and White populations of Hartford. Although some of the differences are relatively small, they represent the average over all edge candidates’ end points. The standard deviation of the difference is quite large, indicating sizable variation in the differences among edge candidates.

Table 3.

Average Differences in the Racial Composition of Local Environments for the End Points of Compatible Candidate Edges.

City	Group	Observed difference	Counterfactual difference	Counterfactual − Observed
Hartford, CT	Black	.062	.035	−.026***
	Hispanic	.078	.040	−.038***
	White	.086	.044	−.041***
Rochester, NY	Black	.053	.029	−.024***
	Hispanic	.029	.017	−.012***
	White	.056	.032	−.024***
Cincinnati, OH	Black	.072	.037	−.035***
	Hispanic	.013	.007	−.006***
	White	.073	.038	−.035***
Baltimore, MD	Black	.061	.031	−.030***
	Hispanic	.018	.009	−.009***
	White	.062	.031	−.030***
Philadelphia, PA	Black	.045	.026	−.019***
	Hispanic	.018	.011	−.007***
	White	.047	.027	−.020***

***p < .001 (paired t test).

The association between disconnectivity and differences in the racial composition of nearby areas suggests that road segments we would expect to exist given the surrounding infrastructure may be more likely to be missing between areas with different racial compositions. To examine this further, we use a series of ordinary least squares models, summarized in Table 4. We analyze whether there is a relationship between edge propensity and differences in the local environment composition of an edge candidate’s end points, using the observed road network. In other words, are the high propensity missing road segments more likely to connect areas with different racial compositions, compared with the lower propensity edge candidates? The models include the end points of all edge candidates, and we run separate models for each city and each ethnoracial group.
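Each model in Table 4 is a bivariate OLS regression with an intercept, so it can be fit in closed form. The hypothetical `simple_ols` helper below returns the quantities reported in the table: coefficients, standard errors, and adjusted R². Here `x` would be edge utility and `y` the observed difference in a group's share between end points, fit separately for each city and group. The same function applies to the models of segregation differences in Table 6.

```python
import statistics

def simple_ols(x, y):
    """Regress y on x with an intercept; returns coefficients, standard errors, R^2, adjusted R^2."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    sigma2 = sse / (n - 2)  # squared residual standard error
    r2 = 1 - sse / sst
    return {
        "intercept": intercept,
        "slope": slope,
        "se_intercept": (sigma2 * (1 / n + mx ** 2 / sxx)) ** 0.5,
        "se_slope": (sigma2 / sxx) ** 0.5,
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - 2),
    }
```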

Table 4.

Difference in Observed Racial Composition for Nodes Associated with Each Compatible Edge Candidate.

	Difference in Observed Racial Composition
	Hartford, CT			Rochester, NY			Cincinnati, OH			Baltimore, MD			Philadelphia, PA
	Black	Hispanic	White	Black	Hispanic	White	Black	Hispanic	White	Black	Hispanic	White	Black	Hispanic	White
Intercept	.053*** (.006)	.065*** (.005)	.068*** (.007)	.048*** (.002)	.027*** (.001)	.050*** (.002)	.070*** (.002)	.012*** (.001)	.072*** (.002)	.055*** (.002)	.017*** (.001)	.056*** (.002)	.045*** (.002)	.018*** (.001)	.047*** (.002)
Edge utility	.027*** (.006)	.041*** (.006)	.057*** (.008)	.027*** (.004)	.009*** (.002)	.031*** (.004)	.004*** (.001)	.001** (.0002)	.004*** (.001)	.023*** (.001)	.004*** (.001)	.023*** (.001)	.001*** (.0002)	.0002 (.0001)	.001*** (.0002)
n (edges)	415			1,223			3,078			4,752			3,431
Adjusted R²	.045	.113	.107	.044	.013	.048	.005	.003	.005	.070	.011	.068	.004	.0002	.004

Note: Measured using local environments with a reach of 0.5 km. If either node associated with an edge has no population in its local environment, the edge is excluded.

**p < .01 and ***p < .001 (paired t test).

We find a significant relationship, with higher utility values associated with larger differences in the racial composition of disconnected nodes’ local environments when the edge candidate is missing from the road network. The relationship holds for all cities and racial groups, except the Hispanic population of Philadelphia. This suggests these highly likely but nonetheless missing road segments facilitate both social and spatial disconnection. This is consistent with prior qualitative findings that physical barriers and disconnectivity have been used as mechanisms to reinforce or exacerbate segregation by facilitating greater separation between ethnoracial groups in nearby areas (Armborst et al. 2015; Feagin 1988; Jackson 1985; Mohl 2008; Schindler 2015; Sugrue 2005).

Figure 5 provides an example of the social and spatial division associated with high propensity missing road segments. The figure shows a map of the Black-Hispanic-White population composition in an area of Hartford. In the center of the map, the dotted line represents a high propensity edge candidate that is missing from the observed road network, and the asterisks represent its end points. The solid gray line bisecting the map from north to south represents railroad tracks that divide this area of Hartford. Residents are predominantly White to the west of the tracks and predominantly Black to the east.

Figure 5.

Black, Hispanic, and White populations in an area of Hartford, Connecticut, in 2010.

When the edge candidate is excluded from the network, the local environments of its end points are nearly monoracial. In contrast, when the edge is added to the observed network, the local environments are more diverse and more representative of the city’s composition. This difference in the racial composition of local environments corresponds to a difference in segregation. When the edge is missing from the network, segregation in the local environments is higher by 0.16 for the node to the east and by 0.21 for the node to the west.

Local Analysis of Segregation

To further consider the implications of these differences in racial composition, we examine the relationship between disconnectivity and segregation. We measure segregation in the local environment of each compatible candidate edge’s end points with and without the candidate edge included in the road network. We then compute the difference between segregation when the road network includes and does not include the edge candidate. We measure this separately for the local environment of each end point and calculate the mean value of the two end points for each edge candidate. If the nodes’ disconnectivity (because of the absence of the edge candidate) helps facilitate segregation in the local area, we should see higher segregation values when the edge candidate is missing from the road network.
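The per-edge quantity analyzed in Table 5 can be sketched as follows. The sketch assumes that the exclusion rule in the Table 5 note is applied using the population of each end point's local environment in the observed network. `edge_segregation_difference` is a hypothetical helper. It returns the mean of (counterfactual − observed) segregation over end points with population. When neither end point has any population, it returns `None` and the edge is excluded.

```python
def edge_segregation_difference(obs, cf, pop):
    """obs, cf: end point -> local divergence without / with the candidate edge.
    pop: end point -> population in its (observed) local environment."""
    usable = [v for v in obs if pop.get(v, 0) > 0]  # nodes with no population are excluded
    if not usable:
        return None  # both end points empty: the edge is excluded
    return sum(cf[v] - obs[v] for v in usable) / len(usable)
```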

On average, we find that local segregation is significantly higher when an edge candidate is missing from the road network than when it is included (p < .05, paired t test). The average differences range in magnitude from 0.004 to 0.022 across the five cities (see Table 5). These results amplify the city-level findings in the previous section: the local differences are about twice the magnitude of the city-level differences. Despite their significance, the differences may seem relatively small. One reason may be that we included all edge candidates in the analysis, regardless of their utility value, and higher propensity edges may have a stronger association with segregation values.

Table 5.

Summary Statistics for Compatible Edge Candidates.

		Mean	s.d.
Hartford, CT	Edge utility	.353	.994
	Observed segregation	.396	.429
	Counterfactual segregation	.378	.424
	Difference in segregation	−.022	.113
Rochester, NY	Edge utility	.215	.723
	Observed segregation	.352	.319
	Counterfactual segregation	.344	.321
	Difference in segregation	−.008	.043
Cincinnati, OH	Edge utility	.483	2.851
	Observed segregation	.384	.293
	Counterfactual segregation	.372	.295
	Difference in segregation	−.012	.053
Baltimore, MD	Edge utility	.301	1.669
	Observed segregation	.561	.472
	Counterfactual segregation	.549	.469
	Difference in segregation	−.013	.098
Philadelphia, PA	Edge utility	.590	7.751
	Observed segregation	.641	.357
	Counterfactual segregation	.637	.360
	Difference in segregation	−.004	.049

Note: Segregation values represent the average for nodes associated with each edge using local environments with a reach of 0.5 km. Nodes with no population in their local environments are excluded. If both nodes associated with an edge have no population in their local environments, the edge is excluded.

To further understand the relationship between segregation and edge propensity, we run a series of ordinary least squares models. Each model regresses the difference in segregation with and without the edge candidate on the propensity for the edge to exist in the network, measured in terms of its utility. Table 6 presents the results. The values for the intercept correspond to the difference in segregation for edge candidates with a utility score of zero. The coefficient for edge utility indicates the change in the segregation difference associated with a one-unit increase in edge utility. We normalize our measure of utility to give it a consistent meaning (i.e., the average reduction in shortest path length per unit of candidate edge length; see the section “Computation of Edge Utility Scores”). Even so, the mean and range of values vary across cities. This is important to keep in mind, because the cities with the smallest edge utility coefficients in magnitude (Cincinnati and Philadelphia) also have the largest mean and maximum utility values (see Table A3).
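As a worked reading of Table 6, the fitted differences at each city's mean utility from Table 5 are shown below for Hartford (.353) and Philadelphia (.590). The notation $\hat{y}$ for the predicted difference in segregation is introduced here only for illustration. An OLS line passes through the sample means, so these values approximately recover the Table 5 mean differences (−.022 and −.004). Any discrepancy reflects rounding and any difference between the two samples.

$\hat{y}_{\text{Hartford}} = -.013 + (-.028)(.353) \approx -.023$

$\hat{y}_{\text{Philadelphia}} = -.004 + (-.0003)(.590) \approx -.004$

Scaling each coefficient by the city's utility s.d. from Table 5 shows why raw coefficients are hard to compare across cities. A one s.d. increase in utility corresponds to a change of about .028 × .994 ≈ .028 in Hartford, but only .0003 × 7.751 ≈ .002 in Philadelphia.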

Table 6.

Models of the Relationship between Edge Utility and Difference in Counterfactual and Observed Segregation.

	Difference in Segregation
	Hartford, CT	Rochester, NY	Cincinnati, OH	Baltimore, MD	Philadelphia, PA
	(1)	(2)	(3)	(4)	(5)
Intercept	−.013* (.005)	−.006*** (.001)	−.012*** (.001)	−.011*** (.001)	−.004*** (.001)
Edge utility	−.028*** (.005)	−.012*** (.002)	−.001* (.0003)	−.006*** (.001)	−.0003*** (.0001)
Observations	446	1,247	3,133	5,001	3,527
R²	.062	.044	.002	.011	.002
Adjusted R²	.059	.043	.002	.011	.002
Residual standard error	.109	.042	.053	.098	.049
F	29.110***	57.190***	6.153**	54.165***	8.351***

*p < .05, **p < .01, and ***p < .001 (paired t test).

We find a significant relationship between edge propensity and segregation differences (see Table 6). Higher propensity edge candidates are associated with larger (negative) differences in segregation. These results indicate that local segregation is higher when an edge candidate is missing from the road network. It is even higher when the edge candidate is expected to exist but is missing. In other words, the more unexpected it is for a road segment to be missing, the higher segregation is when it is, in fact, missing. This suggests that these unexpectedly missing road segments are the sources of disconnectivity that contribute the most to higher levels of segregation. This relationship is statistically significant in all five cities.

Conclusions

In this article, we developed and demonstrated a novel approach for measuring and analyzing residential segregation, the CRN method, which enables a deeper understanding of the relationship between road network connectivity and segregation patterns. Our method identifies missing road segments one would expect to exist in a city’s road network, given the surrounding infrastructure. We demonstrated the application of this approach with a global analysis of segregation and two local analyses in five U.S. cities.

First, at the city level, we compared changes in segregation when counterfactual road segments are added to the road network from highest to lowest edge propensity, to a control scenario with edges added without regard for their propensity. We found that the highest utility missing road segments are associated with the largest differences in segregation. The disconnection created by these missing road segments seems to facilitate higher levels of segregation.

Second, at the local level, we examined the racial composition of nearby areas that would be connected by missing road segments. We found that compositional differences are associated with unexpected patterns of disconnectivity: road segments one would expect to exist are more likely to be missing between areas with different racial compositions.

Third, we compared segregation measured using the observed road network and counterfactual road networks that include missing road segments. We found that unexpected disconnectivity was associated with significantly higher segregation in the local areas of missing road segments and at the city level. Missing road segments that are most likely to exist are the sources of disconnectivity that contribute the most to higher levels of segregation. Our findings suggest these highly likely but nonetheless missing road segments facilitate both social and spatial disconnection. This is consistent with prior qualitative findings that disconnectivity and physical barriers have been used as mechanisms to reinforce or exacerbate segregation by facilitating greater separation between ethnoracial groups in nearby areas (Armborst et al. 2015; Feagin 1988; Jackson 1985; Mohl 2008; Schindler 2015; Sugrue 2005).

Some of the differences between observed and counterfactual segregation may seem small, particularly at the city level (see Table A4). To highlight the range of differences, we can compare the differences in segregation for compatible candidate edges in the top 5 percent and bottom 95 percent of utility values (see Table A5). Three notable patterns emerge from this comparison. First, local differences in segregation are substantially larger than citywide differences. Second, the differences in segregation are larger in magnitude for the top 5 percent of utility values than for the bottom 95 percent, as we would expect from our models (see Table 6). Third, there is considerable variation both across and within cities. For example, in Hartford, the average difference for the top 5 percent of utility values is large: −0.183. In Philadelphia, the average difference is small regardless of the utility value. In both cities, the standard deviations are sizable, especially for the top 5 percent. We would not expect all counterfactual road segments to be associated with patterns of racial segregation equally within or across cities. However, they are an important consideration in understanding the spatial structure of segregation and may help explain why segregation persists in some areas and not others.
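The top-versus-bottom comparison summarized in Table A5 can be sketched as follows, assuming each compatible candidate edge is represented as a (utility, difference) pair, with the difference taken to be counterfactual minus observed local segregation (consistent with the negative values in Table A5). The rank-based 5 percent cut-off and the helper names are assumptions for illustration, and the data are hypothetical, so the sketch does not reproduce the table.

```python
import math
import statistics


def summarize(values):
    """Mean, sample standard deviation, and median, the statistics reported in Table A5."""
    sd = statistics.stdev(values) if len(values) > 1 else float("nan")
    return statistics.mean(values), sd, statistics.median(values)


def compare_top_and_bottom(edges, top_share=0.05):
    """Summarize segregation differences for the highest-utility edges versus the rest.

    `edges` is a list of (utility, difference) pairs; both groups are assumed non-empty.
    The rank-based cut below is an assumption, not the paper's stated rule.
    """
    ranked = sorted(edges, key=lambda edge: edge[0], reverse=True)
    k = math.ceil(top_share * len(ranked))  # size of the top group
    top, bottom = ranked[:k], ranked[k:]
    return {
        "top": summarize([diff for _, diff in top]),
        "bottom": summarize([diff for _, diff in bottom]),
    }


# Hypothetical data: 40 candidate edges with utilities 1..40; only the two
# highest-utility edges change local segregation (negative = lower counterfactual segregation).
edges = [(float(u), -0.2 if u == 40 else -0.1 if u == 39 else 0.0) for u in range(1, 41)]
result = compare_top_and_bottom(edges)
```

Here the top group is the two highest-utility edges, whose mean difference is about −0.15, while the remaining 38 edges summarize to zero; with real data, large standard deviations in the top group would signal the within-city variation described above.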

In this article, we do not examine the historical processes that made some roads possible and prevented others. Future research using archival methods could extend our approach by shedding light on these decision-making processes. Moreover, our research suggests further questions about the social processes and institutions that create road networks, the mechanisms through which patterns of connection and disconnection are reproduced or reconfigured over time, and the consequences of such patterns. Future research can also explore how spatial patterns of connection and disconnection become a source of information for individuals and institutions and influence residential mobility decisions and housing market processes.

The CRN method does not establish causality or imply the precedence of the road network. Nor does it adjudicate which came first: residential segregation that shaped the structure of the road network, or road network connectivity that shaped where people live. The CRN method is also not a tool for suggesting where new road segments should be built. Rather, it is designed to provide a deeper understanding of complex processes that may result from existing (or missing) infrastructural elements.

Indeed, because of the complex nature of networked systems, we use abstraction and approximation to construct a counterfactual representation of road networks on the basis of plausibility rather than a combinatorial enumeration of all possibilities. We do not account for all physical or geographic features, such as bodies of water or changes in elevation, that might affect decisions about the feasibility of roads. We use this approach not to make recommendations about road construction, but to examine the significance of unexpected missingness.

Overall, our development of the CRN method makes four key contributions. First, we draw attention to the possibilities that emerge by analyzing roads as networks. By situating roads within their networked context, our approach expands our understanding of the separating and connecting power of roads.

Second, rather than focusing only on the observed built environment, we consider what is missing. We developed a set of measures and criteria for identifying a set of plausible and compatible edge candidates for a given road network.

Third, we developed and validated a method to quantify the propensity of missing road segments, which evaluates individual edge candidates and how surprising it is that they are missing, on the basis of how much connectivity they would provide in the nearby area. This allows us to examine the significance of unexpected missingness.
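The exact propensity measure is not restated in this section. As a hypothetical illustration of scoring a candidate edge by the connectivity it would add nearby, the sketch below sums the reductions in shortest-path distance among all node pairs within network distance `radius` of either endpoint if the edge were added. The scoring rule, the parameters, and the function names are assumptions, not the CRN definition.

```python
import heapq
import itertools
import math


def dijkstra(adj, source, cutoff=math.inf):
    """Shortest network distances from `source` to every node within `cutoff`."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, length in adj[node].items():
            nd = d + length
            if nd <= cutoff and nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def connectivity_gain(adj, u, v, length, radius):
    """Total shortest-path savings among node pairs near u or v if edge (u, v) were added."""
    nearby = set(dijkstra(adj, u, radius)) | set(dijkstra(adj, v, radius))
    dist = {a: dijkstra(adj, a) for a in nearby}  # unrestricted distances from each nearby node
    gain = 0.0
    for a, b in itertools.combinations(nearby, 2):
        before = dist[a].get(b, math.inf)
        # Best route that uses the new edge, in either direction (undirected graph assumed).
        after = min(dist[a].get(u, math.inf) + length + dist[b].get(v, math.inf),
                    dist[a].get(v, math.inf) + length + dist[b].get(u, math.inf))
        if after < before:
            gain += before - after  # becomes math.inf if the edge joins disconnected pieces
    return gain


# Hypothetical five-node path 0-1-2-3-4 with unit-length segments.
path = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0, 3: 1.0},
        3: {2: 1.0, 4: 1.0}, 4: {3: 1.0}}
gain_ends = connectivity_gain(path, 0, 4, length=1.0, radius=10)   # 5.0
gain_inner = connectivity_gain(path, 1, 3, length=1.0, radius=10)  # 4.0
```

On this path, the edge joining the two ends yields a larger gain than the shortcut between nodes 1 and 3, in line with the intuition that a missing link is more surprising when its absence forces longer detours for more nearby pairs.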

Fourth, the CRN method foregrounds the role of the built environment in understanding residential segregation. Road networks are a major component of the built environment and a durable form of infrastructure that cannot be easily modified.

The CRN method enables further research to uncover the inequalities embedded in this infrastructure and examine the consequences of racist infrastructure. In these ways, this article contributes to understanding the interconnectedness of the spatial and social dimensions of cities.

Appendix

Table A5. Summary Statistics for Compatible Candidate Edges among the Top 5 Percent and Bottom 95 Percent of Utility Values.

	Top 5 Percent: Mean	Top 5 Percent: s.d.	Top 5 Percent: Median	Bottom 95 Percent: Mean	Bottom 95 Percent: s.d.	Bottom 95 Percent: Median
Hartford, CT (top 5 percent: n = 21; bottom 95 percent: n = 425)
Edge utility	3.969	2.366	2.876	.174	.304	.049
Observed segregation	.506	.473	.302	.390	.426	.226
Counterfactual segregation	.323	.318	.202	.381	.429	.221
Difference in segregation	−.183	.323	−.056	−.015	.084	0
Rochester, NY (top 5 percent: n = 61; bottom 95 percent: n = 1,186)
Edge utility	2.442	2.255	1.456	.101	.154	.037
Observed segregation	.462	.331	.413	.346	.318	.282
Counterfactual segregation	.434	.346	.396	.339	.319	.260
Difference in segregation	−.030	.081	−.008	−.007	.039	0
Cincinnati, OH (top 5 percent: n = 157; bottom 95 percent: n = 2,976)
Edge utility	6.102	11.323	2.650	.186	.284	.061
Observed segregation	.435	.286	.356	.381	.293	.325
Counterfactual segregation	.396	.299	.320	.370	.295	.302
Difference in segregation	−.044	.083	−.011	−.011	.051	0
Baltimore, MD (top 5 percent: n = 239; bottom 95 percent: n = 4,762)
Edge utility	4.263	6.433	1.987	.103	.175	.031
Observed segregation	.704	.528	.520	.554	.468	.438
Counterfactual segregation	.643	.535	.449	.545	.465	.433
Difference in segregation	−.066	.216	0	−.011	.088	0
Philadelphia, PA (top 5 percent: n = 161; bottom 95 percent: n = 3,366)
Edge utility	10.921	34.797	1.931	.096	.169	.025
Observed segregation	.673	.471	.601	.640	.352	.666
Counterfactual segregation	.669	.475	.597	.636	.354	.663
Difference in segregation	−.006	.101	0	−.004	.045	0

Acknowledgements

We are grateful for the feedback from seminar participants at the University of Washington Center for Studies in Demography and Ecology and the Institute for Analytical Sociology, and the attendees of sessions at the American Sociological Association and Population Association meetings. We also benefited from conversations with Maria Riolo and comments from the editors and anonymous reviewers.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported in part by a Rice University InterDisciplinary Excellence Award and resources from the Center for Research Computing at Rice University.

ORCID iDs

Elizabeth Roberto

Santiago Segarra

Jaleh Jalili

Author Biographies

Elizabeth Roberto is an assistant professor in the Department of Sociology and the founding codirector of the Center for Computational Insight on Inequality and Society at Rice University. She has broad research interests in social and spatial inequality, a substantive focus on residential segregation, and methodological expertise in computational social science and quantitative methods. Her research uses innovative methods to examine the complex relationship between the social and built environment of cities. Her research has been supported by the James S. McDonnell Foundation, the American Sociological Association, and the National Academies, and she is the recipient of a CAREER Award from the National Science Foundation.

Yu Zhu is a machine learning scientist at PayPal. She was previously a postdoctoral researcher in the Department of Computer Science at Purdue University and received her PhD in electrical and computer engineering from Rice University. Her research interests include network science, machine learning, and graph signal processing.

Santiago Segarra is a W. M. Rice Trustee Associate Professor of Electrical and Computer Engineering at Rice University, where he also holds courtesy appointments in computer science and statistics. He earned his PhD in electrical and systems engineering from the University of Pennsylvania in 2016. Prior to joining Rice in 2018, he was a postdoctoral researcher at the Massachusetts Institute of Technology. His research focuses on network theory, machine learning, graph signal processing, and data analysis. His honors include early career awards from the National Science Foundation, Army Research Office, and Army Research Institute, and six best paper awards.

Jaleh Jalili is an assistant professor in the Department of Sociology at Rice University. Her research interests include urban sociology, social movements, space and place, inequality, cultural sociology, and gender. Her work explores different aspects of social, political, and cultural change in urban contexts. She is the author of Tehran’s Borderlines: Urban Development and Public Life in Contemporary Iran, and her articles have appeared in Social Problems, Sociological Perspectives, Sociology Compass, Middle East Journal, Frontiers, and as book chapters in edited volumes. She received her doctoral degree in sociology from Brandeis University in 2018.

References

Adamic, Lada A., and Eytan Adar. 2003. “Friends and Neighbors on the Web.” Social Networks 25(3):211–30.

Anderson, Elijah. 1990. Streetwise: Race, Class, and Change in an Urban Community. Chicago: University of Chicago Press.

Archer, Deborah. 2020. “‘White Men’s Roads through Black Men’s Homes’: Advancing Racial Equity through Highway Reconstruction.” Vanderbilt Law Review 73:1259–1330.

Armborst, Tobias, Daniel D’Oca, and Georgeen Theodore, eds. 2015. The Arsenal of Exclusion/Inclusion. New York: Actar.

Bayor, Ronald H. 1988. “Roads to Racial Segregation.” Journal of Urban History 15(1):3–21.

Bivand, Roger S., Tim Keitt, and Barry Rowlingson. 2023. “rgdal: Bindings for the Geospatial Data Abstraction Library.” R Package Version 1.6-7. Retrieved September 29, 2025. https://CRAN.R-project.org/package=rgdal.

Bivand, Roger S., and Colin Rundel. 2023. “rgeos: Interface to Geometry Engine—Open Source (GEOS).” R Package Version 0.6-4. Retrieved September 29, 2025. https://CRAN.R-project.org/package=rgeos.

Boeing, Geoff. 2021. “Off the Grid . . . and Back Again? The Recent Evolution of American Street Network Planning and Design.” Journal of the American Planning Association 87(1):123–37.

Boeing, Geoff. 2022. “Street Network Models and Indicators for Every Urban Area in the World.” Geographical Analysis 54(3):519–35.

Burt, Ronald S. 2009. Structural Holes: The Social Structure of Competition. Cambridge, MA: Harvard University Press.

Connerly, C. E. 2002. “From Racial Zoning to Community Empowerment: The Interstate Highway System and the African American Community in Birmingham, Alabama.” Journal of Planning Education and Research 22(2):99–114.

Csardi, Gabor, and Tamas Nepusz. 2006. “The igraph Software Package for Complex Network Research.” InterJournal, Complex Systems 1695:1–9.

Deener, Andrew. 2007. “Commerce as the Structure and Symbol of Neighborhood Life: Reshaping the Meaning of Community in Venice, California.” City & Community 6(4):291–314.

Ding, Lei, Jackelyn Hwang, and Eileen Divringi. 2016. “Gentrification and Residential Mobility in Philadelphia.” Regional Science and Urban Economics 61:38–51.

Donaldson, Kwame. 2013. “How Big Is Your Neighborhood? Using the AHS and GIS to Determine the Extent of Your Community” [Working paper]. Washington, DC: U.S. Census Bureau.

Faber, Jacob William. 2018. “Segregation and the Geography of Creditworthiness: Racial Inequality in a Recovered Mortgage Market.” Housing Policy Debate 28(2):215–47.

Faber, Jacob William. 2019. “Segregation and the Cost of Money: Race, Poverty, and the Prevalence of Alternative Financial Institutions.” Social Forces 98(2):819–48.

Faber, Jacob W. 2020. “We Built This: Consequences of New Deal Era Intervention in America’s Racial Geography.” American Sociological Review 85(5):739–75.

Feagin, Joe R. 1988. The Free Enterprise City: Houston in Political and Economic Perspective. New Brunswick, NJ: Rutgers University Press.

Federal Financial Institutions Examination Council. 2023. “Home Mortgage Disclosure Act (HMDA): Loan Application Register (LAR) Raw Data” [Data set]. Washington, DC: Consumer Financial Protection Bureau. Retrieved September 29, 2025. https://ffiec.cfpb.gov.

Federal Housing Finance Agency. 2022. “Uniform Appraisal Dataset Aggregate Statistics, 2013–2021” [Data set]. Retrieved September 29, 2025. https://www.fhfa.gov/data/uniform-appraisal-dataset-aggregate-statistics.

Fischer, Mary J. 2008. “Shifting Geographies: Examining the Role of Suburbanization in Blacks’ Declining Segregation.” Urban Affairs Review 43(4):475–96.

Freeman, Lance. 2009. “Neighbourhood Diversity, Metropolitan Segregation and Gentrification: What Are the Links in the US?” Urban Studies 46(10):2079–2101.

Garey, Michael R. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. San Francisco, CA: W. H. Freeman.

Grannis, Rick. 1998. “The Importance of Trivial Streets: Residential Streets and Residential Segregation.” American Journal of Sociology 103(6):1530–64.

Grannis, Rick. 2005. “T-Communities: Pedestrian Street Networks and Residential Segregation in Chicago, Los Angeles, and New York.” City & Community 4(3):295–321.

Hagberg, Aric A., Daniel A. Schult, and Pieter J. Swart. 2008. “Exploring Network Structure, Dynamics, and Function Using NetworkX.” Pp. 11–15 in Proceedings of the 7th Python in Science Conference (SciPy2008), edited by Varoquaux, Vaught, and Millman. Pasadena, CA: SciPy.

Harris, Charles R., K. Jarrod Millman, Stefan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, et al. 2020. “Array Programming with NumPy.” Nature 585(7825):357–62.

Harvey, David. 1973. Social Justice and the City. Baltimore, MD: Johns Hopkins University Press.

Hayden, Dolores. 2003. Building Suburbia: Green Fields and Urban Growth, 1820–2000. New York: Pantheon.

Hunter, John D. 2007. “Matplotlib: A 2D Graphics Environment.” Computing in Science & Engineering 9(3):90–95.

Hwang, Jackelyn. 2020. “Gentrification without Segregation? Race, Immigration, and Renewal in a Diversifying City.” City & Community 19(3):538–72.

Hwang, Jackelyn, Michael Hankinson, and Kreg Steven Brown. 2015. “Racial and Spatial Targeting: Segregation and Subprime Lending within and across Metropolitan Areas.” Social Forces 93(3):1081–1108.

Hwang, Jackelyn, and Robert J. Sampson. 2014. “Divergent Pathways of Gentrification: Racial Inequality and the Social Order of Renewal in Chicago Neighborhoods.” American Sociological Review 79(4):726–51.

Hyra, Derek S., Gregory D. Squires, Robert N. Renner, and David S. Kirk. 2013. “Metropolitan Segregation and the Subprime Lending Crisis.” Housing Policy Debate 23(1):177–98.

Jackson, Kenneth T. 1985. Crabgrass Frontier: The Suburbanization of the United States. Oxford, UK: Oxford University Press.

Jiang, Bin, and Christophe Claramunt. 2016. “Topological Analysis of Urban Street Networks.” Environment and Planning B: Planning and Design 31(1):151–62.

Kane, Michael J., John W. Emerson, and Stephen Weston. 2013. “Scalable Strategies for Computing with Massive Data.” Journal of Statistical Software 55(14):1–19.

Kim, Young-An, and John R. Hipp. 2016. “Physical Boundaries and City Boundaries: Consequences for Crime Patterns on Street Segments?” Crime & Delinquency 64(2):227–54.

Knaap, Elijah, and Sergio Rey. 2023. “Segregated by Design? Street Network Topological Structure and the Measurement of Urban Segregation.” Environment and Planning B: Urban Analytics and City Science 51(7):1408–29.

Korver-Glenn, Elizabeth. 2022. Race Brokers: Housing Markets and Racial Segregation in 21st Century Urban America. New York: Oxford University Press.

Korver-Glenn, Elizabeth, Elizabeth Roberto, Leah Binkovitz, and Sarah Mayorga. 2024. “Barriers and Boundaries: How Residents Make Meaning of Segregating Built Environments.” Sociological Perspectives 67(4–6):261–88.

Lee, Barrett A., Sean F. Reardon, Glenn Firebaugh, Chad R. Farrell, Stephen A. Matthews, and David O’Sullivan. 2008. “Beyond the Census Tract: Patterns and Determinants of Racial Segregation at Multiple Geographic Scales.” American Sociological Review 73(5):766–91.

Lees, Loretta, Tom Slater, and Elvin Wyly. 2008. Gentrification. New York: Routledge.

Legewie, Joscha. 2018. “Living on the Edge: Neighborhood Boundaries and the Spatial Dynamics of Violent Crime.” Demography 55(5):1957–77.

Legewie, Joscha, and Merlin Schaeffer. 2016. “Contested Boundaries: Explaining Where Ethnoracial Diversity Provokes Neighborhood Conflict.” American Journal of Sociology 122(1):125–61.

Liben-Nowell, David, and Jon Kleinberg. 2007. “The Link-Prediction Problem for Social Networks.” Journal of the American Society for Information Science and Technology 58(7):1019–31.

Lloyd, Richard. 2006. Neo-Bohemia: Art and Commerce in the Post-industrial City. New York: Routledge.

Logan, John R. 2000. “Ethnic Diversity Grows, Neighborhood Integration Lags.” Pp. 235–51 in Redefining Urban and Suburban America: Evidence from Census 2000, edited by Katz and Lang. Washington, DC: Brookings Institution Press.

Logan, John R., Samuel Kye, H. Jacob Carlson, Elisabeta Minca, and Daniel Schleith. 2023. “The Role of Suburbanization in Metropolitan Segregation After 1940.” Demography 60(1):281–301.

Louf, Rémi, and Marc Barthelemy. 2014. “A Typology of Street Patterns.” Journal of The Royal Society Interface 11(101):20140924.

Massey, Douglas S., and Nancy A. Denton. 1988a. “The Dimensions of Residential Segregation.” Social Forces 67(2):281–315.

Massey, Douglas S., and Nancy A. Denton. 1988b. “Suburbanization and Segregation in US Metropolitan Areas.” American Journal of Sociology 94(3):592–626.

McPherson, Miller, Lynn Smith-Lovin, and James M. Cook. 2001. “Birds of a Feather: Homophily in Social Networks.” Annual Review of Sociology 27:415–44.

Microsoft, and Steve Weston. 2022. “foreach: Foreach Looping Construct for R.” R Package Version 1.5.2. Retrieved September 29, 2025. https://CRAN.R-project.org/package=foreach.

Mohl, Raymond A. 2008. “The Interstates and the Cities: The U.S. Department of Transportation and the Freeway Revolt, 1966–1973.” Journal of Policy History 20(2):193–226.

Moritz, Philipp, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, et al. 2018. “Ray: A Distributed Framework for Emerging AI Applications.” arXiv. Retrieved September 29, 2025. https://arxiv.org/abs/1712.05889.

Nelson, Robert K., LaDale Winling, et al. 2023. “Mapping Inequality: Redlining in New Deal America.” Retrieved September 29, 2025. https://dsl.richmond.edu/panorama/redlining.

Neuwirth, Erich. 2022. “RColorBrewer: ColorBrewer Palettes.” R Package Version 1.1-3. Retrieved September 29, 2025. https://CRAN.R-project.org/package=RColorBrewer.

Owens, Ann. 2019. “Building Inequality: Housing Segregation and Income Segregation.” Sociological Science 6(19):497–525.

Pebesma, Edzer J., and Roger S. Bivand. 2005. “Classes and Methods for Spatial Data in R.” R News 5(2).

Popescul, Alexandrin, and Lyle H. Ungar. 2003. “Statistical Relational Learning for Link Prediction.” Presented at IJCAI Workshop on Learning Statistical Models from Relational Data; August 9–11, 2003; Acapulco, Mexico.

Rabin, Yale. 1987. “The Roots of Segregation in the Eighties: The Role of Local Government Actions.” Pp. 208–26 in Divided Neighborhoods: Changing Patterns of Racial Segregation, edited by G. A. Tobin. Newbury Park, CA: Sage.

Roberto, Elizabeth. 2018. “The Spatial Proximity and Connectivity Method for Measuring and Analyzing Residential Segregation.” Sociological Methodology 48(1):182–224.

Roberto, Elizabeth. 2024. “The Divergence Index: A Decomposable Measure of Segregation and Inequality.” arXiv. Retrieved September 29, 2025. https://arxiv.org/abs/1508.01167.

Roberto, Elizabeth, and Elizabeth Korver-Glenn. 2021. “The Spatial Structure and Local Experience of Residential Segregation.” Spatial Demography 9(3):277–307.

Rothstein, Richard. 2017. The Color of Law: A Forgotten History of How Our Government Segregated America. New York: Liveright.

Rugh, Jacob S., Len Albright, and Douglas S. Massey. 2015. “Race, Space, and Cumulative Disadvantage: A Case Study of the Subprime Lending Collapse.” Social Problems 62(2):186–218.

Schindler, Sarah. 2015. “Architectural Exclusion: Discrimination and Segregation Through Physical Design of the Built Environment.” Yale Law Journal 124:1934–2024.

Smith, Neil. 1979. “Toward a Theory of Gentrification: A Back to the City Movement by Capital, Not People.” Journal of the American Planning Association 45(4):538–48.

Smith, Neil. 1987. “Gentrification and the Rent Gap.” Annals of the Association of American Geographers 77(3):462–65.

Spielman, Seth, and Patrick Harrison. 2013. “The Co-evolution of Residential Segregation and the Built Environment at the Turn of the 20th Century: A Schelling Model.” Transactions in GIS 18(1):25–45.

Sugrue, Thomas J. 2005. The Origins of the Urban Crisis: Race and Inequality in Postwar Detroit. Princeton, NJ: Princeton University Press.

U.S. Census Bureau. 2011. “2010 Census Summary File 1—United States” [Machine-readable data files]. Washington, DC: U.S. Census Bureau.

U.S. Census Bureau. 2012. “2010 TIGER/Line Shapefiles” [Machine-readable data files]. Washington, DC: U.S. Census Bureau.

Wacquant, Loïc J. D., and William J. Wilson. 1989. “The Cost of Racial and Class Exclusion in the Inner City.” Annals of the American Academy of Political and Social Science 501(1):8–25.

Zukin, Sharon. 1987. “Gentrification: Culture and Capital in the Urban Core.” Annual Review of Sociology 13:129–47.

Zukin, Sharon. 2009. Naked City: The Death and Life of Authentic Urban Places. Oxford, UK: Oxford University Press.