Introduction
The analysis of spatial patterns uses one of the three fundamental forms of spatial data: points, lines, or areas. In the social sciences, the most common forms of spatial analysis are conducted on points and areas, with the latter accounting for the majority of such studies: understanding spatial patterns across census units, neighborhoods, counties, provinces/states, or countries, for example. However, the analysis of spatial point patterns is of interest to a number of disciplines including criminology (Ratcliffe, 2005), ecology (Perry et al., 2006), and epidemiology (Elliott et al., 2000).
One of the most commonly asked questions in the analysis of spatial point patterns is whether or not the spatial pattern follows one of the theoretical distributions: uniform, random, or clustered. Although this may be of some interest, it has long been known that human activities and their conditions are clustered in space (Gatrell et al., 1996). As such, a spatial point pattern test that can identify differences in the forms of clustering would be particularly instructive: is a phenomenon more clustered than expected, for example? Researchers may be interested in knowing how similar two crime types, or two disease types, are with regard to their spatial patterns. Identifying differences in the spatial patterns of crime or disease types could help researchers identify unique risk factors that may be addressed. Alternatively, if clusters of both can be identified, a common intervention may be possible.
When analyzing spatial (point) patterns, there are two forms of analysis: global and local. Global spatial analyses investigate the spatial pattern and provide an overall statistic for the entire study area; local spatial analyses investigate the same spatial pattern, but provide statistics for all areal units involved such that the output of the analyses can be mapped. Moreover, it is now well-understood that global spatial analyses can mask local spatial patterns (Anselin, 1995; Getis and Ord, 1992, 1996; Ord and Getis, 1995), and consequently, both forms of analysis should be considered if data are available.
In this article, a relatively recent spatial point pattern test is discussed. This test identifies the similarity of two spatial point patterns; as such, it is not concerned with the theoretical distributions of uniform, random, and clustered. The test, its applications, and directions for future research are all covered. The test is area-based (e.g. census tracts and neighborhoods), as opposed to point-based (based on measured actual and expected distances between points), nonparametric, and provides both global and local output. Because it is nonparametric, the test does not rely on any underlying data-generating process or distribution, and it does not require that particular assumptions, which usually involve specific statistical distributions for the data, be met. Instead, it is a permutation test that generates its own distribution for the purposes of statistical inference.
All that is required are two spatial point pattern files and an area-based file; the graphical user interface that performs the test, discussed below, can generate a grid of any size to act as the area-based file for analysis. The global output is an index of similarity,
Spatial point pattern test
The spatial point pattern test developed by Andresen (2009) compares the similarity of two point patterns without being concerned with the statistical concepts of randomness, uniformity, and clustering. The test is conceptually simple but computationally intense. With the development of a graphical user interface (see Figure 1), applying the test is straightforward. To convey the nature of the test itself, an illustration is provided here. The steps necessary to undertake the test are outlined below and illustrated in Figure 2. The steps are not discussed specifically in terms of Figure 2; rather, the figure is included as a visual aid to the discussion.

Figure 1. Screenshot of the graphical user interface.

Figure 2. Steps for the spatial point pattern test.
The first step for the test is to identify the necessary data for analysis. Although this may seem obvious, some data conditions must be met to conduct the test. At this time, two geo-referenced point-based data sets are necessary for comparison (future developments are discussed further below). The geo-referencing is necessary because each point must be assigned to an individual areal unit of analysis. The areal unit data may take any number of forms: a grid placed over the study area, neighborhood boundaries, census tracts, or some other form of census boundary area (census block groups, dissemination areas, output areas, etc.). Although all of the examples just listed have complete coverage over the study area, this is not necessary for the test. Rather, all of the geo-referenced point data need to be assigned to an area; this becomes particularly important for some research questions, described in detail below. For example, street segments have been used in applications of this test; street segments are polyline files that, by definition, do not have complete coverage over a study area. During the geocoding process, points are often placed on the appropriate side of the street based on the address locator, so the points do not lie directly on the polyline. This can be overcome by placing a small-area buffer around each street segment to capture the geo-referenced points, although empty spaces will remain between the buffers.
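As a minimal sketch of the point-to-area assignment, consider the simplest case of a regular square grid placed over the study area. The function name `assign_to_grid`, the coordinates, and the cell size are hypothetical and are not taken from the graphical user interface; irregular units such as census tracts or buffered street segments would require a point-in-polygon operation instead.

```python
import math

def assign_to_grid(points, x_min, y_min, cell_size):
    """Return the (column, row) index of the grid cell containing each (x, y) point."""
    cells = []
    for x, y in points:
        col = math.floor((x - x_min) / cell_size)
        row = math.floor((y - y_min) / cell_size)
        cells.append((col, row))
    return cells

# Hypothetical coordinates in arbitrary map units.
points = [(0.5, 0.5), (1.2, 0.1), (2.9, 3.4)]
cells = assign_to_grid(points, x_min=0.0, y_min=0.0, cell_size=1.0)
print(cells)
```

Each point receives exactly one cell identifier, which is the only property the remainder of the test relies on.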
Once all of the necessary data are collected, the first decision is which geo-referenced point data set is deemed the base data set and which the test data set; the base data are considered the baseline. From the perspective of undertaking the test, this choice may be of little consequence, but there may be “natural” choices. For example, if the spatial point patterns of a phenomenon are being compared over time, say 2014 and 2015, a natural choice is to have the 2014 data as the base data set and the 2015 data as the test data set. In this situation, the implicit question is whether the 2015 data have a similar spatial pattern to the 2014 data: has something changed over the year?
The base data set is manipulated the least during the test. Each geo-referenced point is assigned to an areal unit, the number of points within each areal unit is aggregated, and the percentage of the points within each areal unit is calculated. This allows for the comparison of data sets with different numbers of points. This set of percentages will then be used to compare to the test data set that undergoes a Monte Carlo simulation process.
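The base data calculation can be sketched as follows, assuming each point has already been assigned the identifier of its areal unit. The function name `unit_percentages` and the unit labels are hypothetical.

```python
from collections import Counter

def unit_percentages(unit_ids, all_units):
    """Percentage of all points that fall within each areal unit."""
    counts = Counter(unit_ids)
    total = len(unit_ids)
    # Units without any points are kept, with a percentage of zero.
    return {u: 100.0 * counts[u] / total for u in all_units}

# Five hypothetical base points assigned to areal units A-D.
base_units = ["A", "A", "B", "C", "A"]
base_pct = unit_percentages(base_units, ["A", "B", "C", "D"])
print(base_pct)
```

Because the values are percentages rather than counts, they remain comparable with a test data set that contains a different number of points.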
Similar to the base data set, the geo-referenced points within the test data set are assigned to an areal unit. At this stage, however, the process for the two data sets diverges. Rather than calculating a single percentage for each areal unit, a Monte Carlo simulation is performed to create a confidence interval for each areal unit. A random sample (with replacement) of the test data set is drawn, selecting 85% of the entire test data set; this value is based on Ratcliffe (2004), but can be modified in the graphical user interface. The percentages of points falling within each areal unit of this randomly sampled data set are then calculated and stored. This process is repeated a number of times in order to calculate a confidence interval for subsequent statistical testing (200 repetitions are used because they give convenient cut-offs for a 95% confidence interval). All of these percentages within each areal unit are then ranked, and the top and bottom 2.5% of percentages are removed to create a 95% nonparametric confidence interval. The percentage for the base data and the lower and upper estimates for the test data are then compared. If the percentage of points for an areal unit in the base data set is within the corresponding confidence interval for the test data set, this areal unit is considered similar. This is repeated for each of the individual areal units of analysis.
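The resampling procedure for the test data can be sketched as below. This is an illustration under stated assumptions, not the code of the graphical user interface: the function name `simulate_intervals`, the unit labels, the base percentages, and the fixed random seed are all hypothetical.

```python
import random
from collections import Counter

def simulate_intervals(test_units, all_units, n_iter=200, frac=0.85, seed=0):
    """Nonparametric 95% interval for the percentage of points in each areal unit."""
    rng = random.Random(seed)
    k = round(frac * len(test_units))          # size of each resample (85% by default)
    draws = {u: [] for u in all_units}
    for _ in range(n_iter):
        sample = rng.choices(test_units, k=k)  # sampling with replacement
        counts = Counter(sample)
        for u in all_units:
            draws[u].append(100.0 * counts[u] / k)
    trim = round(n_iter * 0.025)               # 5 values from each tail when n_iter = 200
    intervals = {}
    for u, values in draws.items():
        kept = sorted(values)[trim:n_iter - trim]
        intervals[u] = (kept[0], kept[-1])
    return intervals

# Hypothetical test data: 100 points already assigned to areal units A-C; D is empty.
test_units = ["A"] * 50 + ["B"] * 30 + ["C"] * 20
intervals = simulate_intervals(test_units, ["A", "B", "C", "D"])

# Hypothetical base percentages for the same areal units.
base_pct = {"A": 60.0, "B": 20.0, "C": 20.0, "D": 0.0}
similar = {u: intervals[u][0] <= base_pct[u] <= intervals[u][1] for u in base_pct}
print(intervals)
print(similar)
```

An areal unit with no test points always yields an interval of 0% to 0%, so it is classified as similar only when the base data set also has no points there.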
This information can then be applied to calculate a global Index of Similarity,
where
The last decision to be made is whether the two spatial point patterns are to be considered similar, or not. This is done using the global
The general underlying principle for this spatial point pattern test is that the spatial pattern of the base data set is given (the percentages for the individual areal units are just calculated), but the spatial pattern for the test data set is one possible realization of the actual spatial point pattern. The randomization process maintains the underlying spatial pattern (data-generating process) of the test data set while creating sampling variation allowing for nonparametric statistical inference through the calculation of confidence intervals.
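The step from local results to a global index can be sketched as follows, assuming the global index is the proportion of areal units classified as similar. That definition, the function name `similarity_index`, and all values shown are assumptions made for illustration.

```python
def similarity_index(base_pct, intervals):
    """Return (global index, per-unit similarity flags).

    Assumption: the global index is the share of areal units whose base
    percentage lies within the test data set's confidence interval.
    """
    flags = {
        u: intervals[u][0] <= base_pct[u] <= intervals[u][1]
        for u in base_pct
    }
    return sum(flags.values()) / len(flags), flags

# Hypothetical base percentages and test-data confidence intervals.
base_pct = {"A": 60.0, "B": 20.0, "C": 20.0, "D": 0.0}
intervals = {"A": (40.0, 62.0), "B": (25.0, 40.0), "C": (12.0, 28.0), "D": (0.0, 0.0)}

s_index, flags = similarity_index(base_pct, intervals)
print(s_index, flags)
```

The per-unit flags are what would be mapped as local output, while the single proportion summarizes the whole study area.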
Applications of the spatial point pattern test
Previous applications of the spatial point pattern test
The first application of the spatial point pattern test is in the paper that initially develops the test (Andresen, 2009). In this first application, Andresen (2009) shows that the spatial point patterns of theft of vehicle, burglary (commercial and residential), and an aggregate of violent crime all appear to have similar spatial patterns. However, the analysis shows, using both census tracts and dissemination areas (the Canadian equivalent to the census block group in the United States), that their similarities are actually quite low:
In an investigation of crime concentrations and the importance of considering spatial scale when trying to understand spatial crime patterns, Andresen and Malleson (2011) consider the stability of spatial crime patterns through an application of the spatial point pattern test over time: same crime type, but investigating changes in its spatial patterns over years. Andresen and Malleson (2011) analyzed the crime types of assault, burglary, robbery, sexual assault, theft, theft of vehicle, and theft from vehicle considering census tracts, dissemination areas, and street segments. Overall, they found that when considering census tracts and dissemination areas, the spatial crime patterns were only relatively stable over time for the rare events of robbery and sexual assault. Only when the authors analyzed the stability of spatial crime patterns at the street segment level did stability/similarity over time emerge. However, this result was driven by the large percentage of street segments that did not have any crime; the
In an extension of the original work done by Andresen (2009), Andresen and Linning (2012) used the spatial point pattern test to investigate the similarity of spatial patterns of different crime types. The primary research question was whether it was appropriate to aggregate different crime types to their aggregate forms when the research is concerned with spatial patterns of crime: property crime and violent crime, for example. Andresen and Linning (2012) also considered three different spatial units of analysis (census tracts, dissemination areas, and street segments), but included crime data from Ottawa (commercial burglary, residential burglary, commercial robbery, individual robbery, other robbery, theft of vehicle, and a number of different aggregations) and Vancouver (the same crime types as Andresen and Malleson, 2011). Overall, they found that spatial crime patterns were not similar across all crime types for census tracts and dissemination areas—different forms of robbery in Ottawa were the exception. Only when street segments were used as the areal unit of analysis did the
Andresen and Malleson (2013b) explicitly investigate the impact of the modifiable areal unit problem by applying the spatial point pattern test to data from Vancouver, Canada, and Leeds, England. Specifically, they consider the change in the spatial patterns of crime over time using two different areal units of analysis in each city. They then compare the results for the different spatial scales to investigate the percentage of the smaller areal units within the larger areal units that exhibit the same change. Generally speaking, these authors find that approximately one-half of the smaller areal units exhibit the same spatial change as the larger areal units within which they fall. Clearly, any results found for larger areal units are being driven by one-half of the smaller areal units. This shows the relationship between the modifiable areal unit problem and the ecological fallacy in this context: the ecological fallacy is committed when the researcher makes inferences at a finer scale of resolution than is being analyzed (making inferences regarding individuals when neighborhoods are the unit of analysis, for example). Moreover, because of the nature of statistical testing, Andresen and Malleson (2013b) found that in some cases, none of the smaller areal units of analysis exhibited the same spatial change as the larger areal units. This was investigated in detail, and it was found that (marginal) statistical significance was obtained only when the data were aggregated to the larger areal units. As such, care must be taken when making statistical inferences with spatial data.
In another consideration of (in)appropriate aggregation of crime data, Andresen and Malleson (2013a) investigated the similarity of spatial crime patterns across the different seasons of the year. Their research question was concerned with the value of yearly counts of crime, and corresponding rates, when considering any explanation of the respective patterns. If the spatial patterns of crime changed from season to season, are the spatial patterns of a yearly aggregate meaningful? Considering both census tracts and dissemination areas, Andresen and Malleson (2013a) found that the spatial patterns of crime were rather dissimilar for seven of the eight crime types investigated. Only robbery and sexual assault had
The next analysis of spatial point pattern similarity was undertaken by Andresen and Malleson (2014) in an analysis of crime displacement from a police foot patrol initiative. Most investigations of crime displacement (the movement of criminal activity because of a crime prevention initiative, such as a police foot patrol) have a treatment area and a catchment area that immediately surrounds the treatment area; this is done because it is assumed that crime may be displaced and move around the corner (Weisburd et al., 2006). Although there are still many questions to be answered regarding this process, research in this sub-field of crime prevention finds evidence for a diffusion of benefits: criminal activity decreases in both the treatment area and the catchment area (Guerette and Bowers, 2009; Johnson et al., 2014; Telep et al., 2014; Weisburd et al., 2006).
Andresen and Malleson (2014) ask the following question: in the presence of a diffusion of crime prevention benefits, does the spatial pattern of criminal activity change, or does criminal activity go down in the same way in all places? For example, suppose the levels of crime were the same in the treatment area with a crime prevention initiative and the catchment area without that crime prevention initiative: crime goes down everywhere to the same degree. However, crime could go down everywhere but at different rates in different places. Such a situation could be illustrated with a balloon: crime goes down everywhere such that the balloon now has less air in it, but it goes down more in the treatment area with the crime prevention initiative. This can be visualized by pushing down in the center of a deflated balloon. Crime has gone down everywhere, but the spatial pattern has changed such that there is now relatively more crime in the surrounding area. This situation may occur when offenders do move around the corner from the crime prevention initiative, but are less active there than before.
In their analyses, Andresen and Malleson (2014) found moderate support for a change in the spatial pattern of crime (in the presence of criminal activity decreasing in both the treatment area and the surrounding catchment area), indicating that the distribution of criminal activity shifts toward the surrounding area although that criminal activity decreases everywhere. This has important implications for further crime prevention initiatives because it shows a rational response from offenders: offenders decrease criminal activity, but shift their remaining criminal activity away from where the police are known to be.
Although they consider multiple methodologies to address their research question, Tompson et al. (2015) investigate the similarity of open-source crime data in the United Kingdom to actual police-recorded crime data. The issue is that in order to make criminal event data available to the public while simultaneously considering privacy and confidentiality, the developers of the open-source data undertake a process of geomasking that adds “noise” to the data such that actual locations of criminal events cannot be known. Tompson et al. (2015) were concerned whether this geomasking changed the underlying (and true) spatial crime patterns. Overall, these authors found that the similarity of these two data sources was not particularly high when considering small areas of analysis such as postal codes and output areas. However, when considering larger geographic areas, Tompson et al. (2015) found a high degree of similarity between the two different data sources for most crime types (lower layer super output areas and middle layer super output areas).
In another consideration of temporal aggregation, Andresen and Malleson (2015) investigated the similarity of spatial crime patterns for different days of the week for assault, burglary, robbery, sexual assault, theft, theft of vehicle, and theft from vehicle considering census tracts and dissemination areas. Similar to the seasonality research of Andresen and Malleson (2013a), this research found that the spatial patterns of crime changed significantly for different days of the week; the only exceptions were robbery and sexual assault with dissemination areas.
Extending the work of Andresen and Linning (2012), Melo et al. (2015) investigated the appropriateness of aggregating crime types in Campinas, Brazil. In their analysis, Melo et al. (2015) not only considered a number of spatial scales (ponderation areas, census tracts, and street segments) but considered the impact of zero-values beyond what has been discussed above. Melo et al. (2015) undertook a sensitivity analysis that only considered non-zero street segments to test the similarity of spatial point patterns in two ways; first, they considered a street segment non-zero if it had
In an extension of Andresen and Malleson (2013a), Linning (2015) investigated the spatial patterns of seasonality in Vancouver and Ottawa considering the micro-spatial unit of analysis, the street segment. Despite the presence of temporal seasonal patterns, Linning (2015) found no evidence for changing spatial patterns of crime in either city. This result shows the importance of the areal unit of analysis and the modifiable areal unit problem.
And most recently in the context of crime, Pereira et al. (in press) investigated the spatial patterns of homicide in Recife, Brazil. There has been a significant drop in homicides in Recife in recent years, and Pereira et al. (in press) found that these homicides were highly concentrated. Moreover, Pereira et al. (in press) found a high degree of spatial similarity from year to year while the homicide rate dropped almost 50% because so much of the city does not have any homicides. Even when considering non-zero areal units of analysis (census tracts and street segments), there was a moderately high degree of spatial similarity. Specifically, the use of the spatial point pattern test in conjunction with other analyses indicates that homicides simply stopped occurring in particular places in Recife.
At this time, Andresen (2010) is the only application of this spatial point pattern test that is outside of a criminological context—this potential is discussed further below. In this article, Andresen (2010) investigates the changing spatial patterns of interprovincial and international trade after the North American Free Trade Agreement entered into force in 1994. International trade theory predicts that when barriers to trade, such as tariffs, are removed, spatial patterns of interprovincial and international trade may change. Andresen’s (2010) data, however, are not points but values assigned to Canadian provinces and US states. In an effort to apply the spatial point pattern test to these data, Andresen (2010) uses the concept of a quasi-point: if British Columbia exports CAD5 million of goods to Texas, place 5 points within Texas for an analysis of changing trading patterns for British Columbia. With provinces and states being the units of analysis, as long as no inference is made at larger cartographic scales (smaller areas), issues regarding the ecological fallacy will not emerge (Openshaw, 1984a). In any event, the quasi-point proved to be instructive for the use of the spatial point pattern test. As expected, Andresen (2010) found that the spatial patterns of interprovincial and international trade changed substantially after the North American Free Trade Agreement entered into force. Moreover, the mapped output from the spatial point pattern test was able to show that the spatial pattern of trade shifted toward the United States, away from Canadian provinces, after the removal of trade barriers.
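The quasi-point idea can be sketched as a conversion from areal values to repeated unit labels. The scale of one quasi-point per CAD1 million follows the British Columbia to Texas example; the function name `quasi_points`, the rounding rule, and the Ontario figure are hypothetical.

```python
def quasi_points(flows, unit_value=1_000_000):
    """Convert a value per destination into a list with one label per quasi-point."""
    points = []
    for destination, value in flows.items():
        # Assumption: values are rounded to the nearest whole quasi-point.
        points.extend([destination] * round(value / unit_value))
    return points

# Exports from British Columbia in CAD; the Ontario value is made up for illustration.
flows = {"Texas": 5_000_000, "Ontario": 3_000_000}
points = quasi_points(flows)
print(points.count("Texas"), points.count("Ontario"))
```

The resulting list plays the same role as a set of points already assigned to areal units, so the remainder of the test proceeds unchanged.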
An example of the test and its output
Although instructive and containing interesting information for the spatial analysis of crime, the above discussions are limited without a complete presentation of the results. However, a full detailed account of all of these analyses is far beyond the scope of this article. As such, a detailed account of the seasonality study undertaken by Andresen and Malleson (2013a) is discussed here, with a map of the output from the spatial point pattern test.
As stated above, Andresen and Malleson (2013a) found that spatial crime patterns were not similar from season to season, an indication that aggregating criminal events to a yearly aggregate may be problematic. As shown by Linning (2015), this may not be the case at the micro-spatial unit of analysis, but this result may have important implications for understanding spatial crime patterns at more spatially aggregated levels.
Considering all crime types, Andresen and Malleson (2013a) found

Figure 3. Spatial point pattern test local output, all crimes, yearly aggregate versus summer: (a) Vancouver and points of interest and (b) spatial point pattern test local output.
Potential applications of the spatial point pattern test
The discussions above have shown the utility of this spatial point pattern test in the context of understanding the spatial patterns of crime, and one application within the international trade literature. However, the potential applications of this spatial point pattern test are in any context when the similarity between two spatial point patterns is of interest. In particular, this test is useful when the researcher is not particularly concerned whether the spatial pattern of their data is considered uniform, random, or clustered.
This is similar to the issue of how to calculate a crime rate and choose the appropriate denominator (Andresen, 2011). If one is interested in the spatial patterning of traffic collisions, the base consideration is not the theoretical distribution of a clustered phenomenon, but the underlying distribution of traffic overall: is the clustering of traffic collisions different from the overall clustering of traffic? If so, where? This has a number of applications within the social sciences, but also the health sciences, environmental studies, and urban planning.
Directions and considerations for the future
Thus far, the research on this spatial point pattern test has generated one methodological issue that must always be considered, a number of extensions to the spatial point pattern test and its software, and some theoretical implications for testing. Each is discussed in turn.
The methodological issue relates to the modifiable areal unit problem (Fotheringham and Wong, 1991; Openshaw, 1984b) and non-zero street segments. In the context of street segments, many social phenomena will have some degree of clustering. In Vancouver, there are just over 10,000 street segments that an analysis must consider. If the phenomenon under study is a relatively rare event, concentrations will necessarily appear in global statistics: a crime type that has 1000 events in a particular year will have a minimum level of concentration at 10% (1000 events divided by 10,000 street segments). In this situation, even if all of these criminal events move to a completely different set of locations within the city, the
As such, when considering the similarity of two spatial point patterns, one must be aware of this potential when analyzing data using a large number of small areal units of analysis. This is a particular form of the modifiable areal unit problem when the change in the
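A short worked calculation illustrates the concern with many small areal units and rare events. It assumes, following the logic of the test, that an areal unit with no points in either data set is classified as similar, because a base percentage of 0% lies within a test interval of 0% to 0%. The function name `min_share_zero_in_both` is hypothetical; the 10,000 segments and 1000 events are the round figures used above.

```python
def min_share_zero_in_both(n_units, n_base_events, n_test_events):
    """Smallest possible share of areal units that are empty in both data sets."""
    # Each event can occupy at most one previously empty unit.
    max_nonzero = min(n_units, n_base_events) + min(n_units, n_test_events)
    return max(0, n_units - max_nonzero) / n_units

# 10,000 street segments and 1000 events in each of the two years compared.
share = min_share_zero_in_both(10_000, 1_000, 1_000)
print(share)  # 0.8
```

Under these assumptions, at least 80% of street segments would be classified as similar even if every event moved to a different location, which is why analyses restricted to non-zero areal units are a useful sensitivity check.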
The first extension to the spatial point pattern test relates to the concept of the quasi-point introduced by Andresen (2010) in the context of international and interprovincial trade pattern changes. It is a relatively simple task to have the program that runs the spatial point pattern test create quasi-points if the data are organized appropriately. For instance, if the data were organized such that areal unit of analysis
Second, the graphical user interface could be further developed to describe the differences and similarities between the two point patterns analyzed. For example, some brief descriptive statistics could be provided. The graphical user interface calculates the
Third, in case the standard distributions of uniform or random are of interest to the researcher, the graphical user interface could generate a random or uniform set of points within the study area (with an option for how many points). These points could then be used as the base data set to investigate how similar the spatial pattern of interest is to a random or uniform spatial pattern.
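Generating such base data sets could look like the sketch below, which assumes a rectangular study area for simplicity; a real study area would need points constrained to its boundary. The function names `random_points` and `uniform_points` and all parameters are hypothetical.

```python
import random

def random_points(n, x_min, y_min, x_max, y_max, seed=0):
    """n points placed independently at random within the rectangle."""
    rng = random.Random(seed)
    return [(rng.uniform(x_min, x_max), rng.uniform(y_min, y_max)) for _ in range(n)]

def uniform_points(nx, ny, x_min, y_min, x_max, y_max):
    """nx * ny evenly spaced points (cell centres of a regular lattice)."""
    dx = (x_max - x_min) / nx
    dy = (y_max - y_min) / ny
    return [
        (x_min + (i + 0.5) * dx, y_min + (j + 0.5) * dy)
        for i in range(nx)
        for j in range(ny)
    ]

rand_pts = random_points(1000, 0.0, 0.0, 10.0, 10.0)
unif_pts = uniform_points(4, 5, 0.0, 0.0, 10.0, 10.0)
print(len(rand_pts), len(unif_pts))
```

Either set of points can then be assigned to areal units and used as the base data set in the same way as observed data.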
Fourth, further investigations into the threshold values of the
And fifth, a natural extension of the spatial point pattern test is to move beyond “bivariate” comparisons. At this time, the program for the spatial point pattern test is set up to compare two spatial point patterns. However, there is no reason why this test cannot be expanded into a multivariate context. In such a testing situation, one could find the degree of similarity across three or more spatial point patterns. Figure 4 shows four different possibilities for such a multivariate spatial point pattern test. The dashed horizontal line represents the base percentage of points for the test, and the vertical lines represent the confidence intervals for the various test data sets: comparing the spatial pattern of crime in 2000 (base) versus a number of subsequent years (test data sets). In these scenarios, the base percentage is always within the confidence intervals (a), initially within the confidence intervals but then the confidence intervals fall to zero representing the crime drop (b), sporadically similar for the base and subsequent test data sets (c), and never similar over time (d).

Figure 4. Multivariate spatial point pattern test.
Such a comparison would be particularly instructive for the crime and place literature, which has been concerned with the stability of crime concentrations over time (Weisburd et al., 2012). Although pairs of years can be compared to investigate stability (Andresen and Malleson, 2011), this does not reveal whether places are consistently stable over time. In a multivariate context, this spatial point pattern test could investigate the stability of spatial crime patterns over the 12–16 years typically studied in that literature. Such an extension could prove to be important. It is possible, for example, that a spatial pattern may appear to be stable over time when undertaking pair-wise comparisons but be quite different when considering the entire study period. Such a situation could emerge when year-to-year patterns are similar, but similar in different places in each pair-wise comparison. Only when considering the entire study period within a single test could spatial stability be truly confirmed or denied.
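One way such a multivariate comparison might classify a single areal unit is sketched below, following the four scenarios described for Figure 4. Because this extension has not been implemented, the function name `classify_unit`, the category labels, and the example values are all hypothetical.

```python
def classify_unit(base_pct, yearly_intervals):
    """Classify one areal unit from its base percentage and a sequence of
    (lower, upper) confidence intervals, one per subsequent test year."""
    inside = [lo <= base_pct <= hi for lo, hi in yearly_intervals]
    if all(inside):
        return "always similar"            # scenario (a)
    if not any(inside):
        return "never similar"             # scenario (d)
    first_outside = inside.index(False)
    if not any(inside[first_outside:]):
        return "similar, then dissimilar"  # scenario (b)
    return "sporadically similar"          # scenario (c)

# Hypothetical base percentage and intervals for successive test years.
a = classify_unit(5.0, [(4.0, 6.0), (4.5, 6.0), (3.0, 5.5)])
b = classify_unit(5.0, [(4.0, 6.0), (4.0, 6.0), (0.0, 0.0), (0.0, 0.0)])
c = classify_unit(5.0, [(4.0, 6.0), (0.0, 1.0), (4.0, 6.0)])
d = classify_unit(5.0, [(0.0, 1.0), (6.0, 7.0)])
print(a, b, c, d, sep="\n")
```

Classifying every areal unit against the whole study period at once addresses the concern that pair-wise similarity may occur in different places in each comparison.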
Finally, the development of different testing methodologies has implications for testing theories. For example, tests of theories within environmental criminology (place-based theories) often consider random and uniform distributions as the baseline. Are criminal events clustered, relative to a random distribution? However, we know that human activities, both criminal and non-criminal, are clustered in space. As such, testing to see whether a specific human activity is clustered is essentially tautological. A more sensible baseline distribution for the clustering of crime would be a representation of the road network. Road networks tend to have greater density in the central business district than the suburbs, so randomly placed points on a road network will appear clustered. A point-based representation of the street segments in a road network could be used as the base data set with crime data as the test data set to investigate any greater concentrations of crime in particular areas. This information could then be used when conducting theoretical testing, rather than solely relying upon census data.
Conclusion
This article has reviewed the development, applications, and future for a relatively recent spatial statistical test used in the social sciences. This is a spatial point pattern test that is nonparametric and finds the degree of similarity between two spatial point data sets. The output from this test is a global statistic of similarity, the
Although this spatial point pattern test is still in its infancy, its utility beyond identifying uniform, random, and clustered spatial point patterns has been shown here. Most human, social, and environmental processes are not random or uniform across space. Consequently, a test that allows for comparisons of spatial point patterns without a concern for these theoretical distributions is instructive for the social and environmental sciences.
There is still a lot of work to be done to make this spatial point pattern test more practical for use in other contexts, but these are programming tasks with no need for further development of the test itself. As with the current graphical user interface (https://github.com/nickmalleson/spatialtest), future developments of the test will remain freely available and open source. In addition to the flexibility enhancements discussed above, it is hoped that this new spatial point pattern test will soon be available as a library within the R Project for Statistical Computing (http://www.r-project.org/).
