We need a number of packages for this.
All our data files are in a folder called Data
at the same level as the current folder:
Datadir = "../Data" Datafile = function(f) { file.path(Datadir, f) }
You should make sure Datadir
is set to the place you’ve put the data files. If you have the tcltk
package, then you can use as.character(tcltk::tkchooseDirectory())
. Don’t put a trailing slash or backslash at the end of Datadir
or files might not be found.
Read The Street-level Crime Data
The crime data at the street level is supplied in a CSV file each month. We’ll read in one month and plot it. The coordinates are Ordnance Survey of Great Britain grid references, which is EPSG code 27700.
## Month Reported.by Falls.within Easting Northing Location Crime.type Context ## 1 2012-03 Cumbria Constabulary Cumbria Constabulary 351534 492353 On or near Dowker'S Lane Burglary NA ## 2 2012-03 Cumbria Constabulary Cumbria Constabulary 299210 528515 On or near Supermarket Burglary NA ## 3 2012-03 Cumbria Constabulary Cumbria Constabulary 351183 492741 On or near Serpentine Road Burglary NA ## 4 2012-03 Cumbria Constabulary Cumbria Constabulary 301277 524712 On or near Hallwood Road Burglary NA ## 5 2012-03 Cumbria Constabulary Cumbria Constabulary 299416 527779 On or near Sports/Recreation Area Burglary NA ## 6 2012-03 Cumbria Constabulary Cumbria Constabulary 361096 478662 On or near Chapel Lane Burglary NA
coordinates(streetCrime) = ~Easting + Northing proj4string(streetCrime) = CRS("+init=epsg:27700") plot(streetCrime) title("Street-level crime locations")
OSM Background
Can we do a nicer map using the OpenStreetMap
package? This gives us a function to download and print map tiles from Open Street Map, and other tile servers. These tiles use the Google Mercator coordinate system, so we have to transform our points to that. The osm()
function returns the CRS string for the tiles.
Using ggplot
If we add the coordinates back to the data frame we can use ggplot
to produce some nice plots:
library("ggplot2") streetCrime@data = cbind(streetCrime@data, coordinates(streetCrime)) ggplot(streetCrime@data) + geom_point(aes(x = Easting, y = Northing, colour = Crime.type)) + coord_equal() + opts(title = "Street Crime By Type")
Relative Rates
It is clear from the maps that there are more crimes in urban areas, because that’s where people live. What we really want to do is look at rates of crime relative to some baseline. We could use population if we had a good measure of baseline population, or we can compare one crime rate against all crimes.
The sparr
package provides functions for computing estimates of densities of point patterns using kernel smoothing. It has several methods for computing the kernel bandwidth, including adaptive methods. To use it we have to convert our points into an ppp
object from the spatstat
package. For the window containing the points we’ll use the convex hull. We’ll write a function that does the smoothing for a particular crime type. The function will convert the output from bivariate.density
into a raster
object. We’ll also read in some town locations and get the largest of them for some context when mapping.
## Warning message: RGL: unable to open X11 display
## Warning message: error in rgl_init
library("raster") coords = coordinates(streetCrime) cHull = as.matrix(coords[rev(chull(coords)), ]) data = ppp(x = coords[, 1], y = coords[, 2], window = owin(poly = cHull), marks = streetCrime$Crime.type)
## Warning message: data contain duplicated points
makeIntensity <- function(data, type) { d = bivariate.density(data = data, ID = type, intensity = TRUE, comment = FALSE, pilotH = diff(range(data$x))/10) raster(list(x = d$X, y = d$Y, z = d$Zm)) } allCrime = makeIntensity(data, NULL) asbos = makeIntensity(data, "Anti-social behaviour")
## Warning message: more than two distinct ID values detected in 'data'
drugs = makeIntensity(data, "Drugs")
## Warning message: more than two distinct ID values detected in 'data'
ukgaz = readOGR(Datadir, "ukgaz", verbose = FALSE) topTowns = ukgaz[order(ukgaz$pop, decreasing = TRUE)[1:8], ] colours = colorRampPalette(c("blue", "gray", "red"))(20)
ASBO ratio
The regional rate of ASBO crime to all crime is 0.5248
. We can see that Kendal seems higher and Workington lower than this average value.
Drugs Ratio
This shows almost the opposite pattern. The average drug crime ratio is 0.0333
, with Workington over double this, and Kendal much lower.
These kind of ‘heat maps’ are aften used to show crime maps, house fires, or other point-based data, but without reference to some kind of background population as a denominator they can’t display meaningful risk. The sparr
package has functions for computing relative risk with error calculations. Read the package documentation for more information.
Police Neighbourhood Areas
UK crime data is organised by areas known as neighbourhoods. The boundaries of the neighbourhoods are available as a shapefile. The coordinates are lat-long, so we convert to the same CRS as the street crime data. In this section we will aggregate the cases to neighbourhood level.
neighbourhoods = spTransform(readOGR(Datadir, "neighbourhoods"), CRS(proj4string(streetCrime)))
## OGR data source with driver: ESRI Shapefile ## Source: "../Data", layer: "neighbourhoods" ## with 46 features and 3 fields ## Feature type: wkbPolygon with 2 dimensions
Topological Colouring
Plain regions like in the previous map never look too interesting, but colouring everything one colour isn’t much better. Can we colour regions so that two regions with a common boundary don’t have the same colour? The theory tells us that we only need at most four colours to do this for a planar map. However, the algorithm to do that is very hard, and our map might not be strictly planar, with islands and holes here and there.
The following function uses poly2nb
from the spdep
package to compute the adjacency of regions, and then executes a ‘greedy’ algorithm. For each region it finds the smallest colour not present in any of its neighbours. If all the colours are currently being used by its neighbours then it has to start using a new one.
library("spdep") library("RColorBrewer") colourmap <- function(polys, snap) { colouring = list() length(colouring) = length(polys) nbl = poly2nb(polys, snap = snap) nextFreeColour = 1 for (node in 1:length(nbl)) { nbrColours = unlist(colouring[nbl[[node]]]) freeColour = min(setdiff((1:nextFreeColour), nbrColours)) colouring[[node]] = freeColour if (freeColour == nextFreeColour) { nextFreeColour = nextFreeColour + 1 } } unlist(colouring) } colouring = colourmap(neighbourhoods, snap = 100) palette(brewer.pal(max(colouring), "Set3")) plot(neighbourhoods, col = colouring) title("Topology colouring")
This map shouldn’t have any two adjacent regions with the same colour. The snap
had to be adjusted to make this work, however,because the map has some problems…
Digitizing Problems
Unfortunately the borders are very poorly digitized, and the edges between neighbourhoods do not match up very well. You can see this by zooming in on Carlisle and plotting with semi-transparent colours. This shows up the overlaps and the gaps.
Point overlays
We can use the over
function to count the number of crimes in each neighbourhood. This function does various things depending on the exact class of object passed to it. It is also fussy about making sure its arguments have the same coordinate system.
Because we know that our polygons aren’t well constructed, we’ll return the list of polygons that each point is in. We can then count how many by computing the length of the list element. We’ll then plot Carlisle and show those points that are not over exactly one region.
crimeN = over(SpatialPoints(streetCrime), SpatialPolygons(neighbourhoods@polygons), returnList = TRUE) nPoly = unlist(lapply(crimeN, length)) table(nPoly)
## nPoly ## 0 1 2 ## 6 4287 57
plot(neighbourhoods, xlim = c(336000, 344000), ylim = c(552000, 560000), col = rainbow(10, alpha = 0.5), border = NA) plot(streetCrime, cex = 0.25, pch = 19, add = TRUE, col = "black") plot(streetCrime[nPoly > 1, ], add = TRUE, pch = 19, col = "red") plot(streetCrime[nPoly < 1, ], add = TRUE, pch = 19, col = "blue") title("Points not in a single polygon")
Points in polygons
Conversely we can overlay polygons on points and get a list of the same length as the number of polygons. Each element contains the indices of points in that polygon. This way we will double-count points that are in more than one polygon, and drop those outside all the polygons.
crimeL = over(SpatialPolygons(neighbourhoods@polygons), SpatialPoints(streetCrime), returnList = TRUE) # how many crimes per area unlist(lapply(crimeL, length))
## [1] 399 129 120 93 66 100 33 31 86 145 156 244 54 62 62 47 35 126 90 115 74 205 41 65 270 122 190 40 92 127 93 81 ## [33] 7 38 43 53 32 21 24 152 33 142 195 20 38 10
We can then assign this vector to the neighbourhood object and use it to produce some maps. The default colours in spplot
aren’t particularly good, so we define a new palette from gray to red and set that.
colours2 = colorRampPalette(c("gray90", "red"))(20) sp.theme(set = TRUE, regions = list(col = colours2)) neighbourhoods$TotalCrime = unlist(lapply(crimeL, length)) neighbourhoods$TotalCrimeRateK = 1000 * neighbourhoods$TotalCrime/neighbourhoods$Population spplot(neighbourhoods, "TotalCrimeRateK", main = "Crimes/1000ppl/month from street data")
neighbourhoods@data[order(neighbourhoods$TotalCrimeRateK, decreasing = TRUE)[1:5], c("Name", "Population", "TotalCrimeRateK")]
## Name Population TotalCrimeRateK ## 24 Carlisle City Centre 2042 132.22 ## 23 Botcherby And Durranhill 1334 48.73 ## 0 Barrow Central 13706 29.11 ## 26 St. Aidans 6813 27.89 ## 31 Raffles / Newtown / Willowholme 5172 15.66
You’ll notice that most of the county is gray – this is because the distribution is very skew. Try taking a log-transform and mapping that.
Area Data
The monthly area crime data is supplied in a CSV file. We can read this in, join it to the polygons by matching the ID and Neighbourhood columns from each object. The map from this data looks broadly similar to the data from the street-level file.
## [1] 46 15
monthCrime[1:10, c("Neighbourhood", "All.crime.and.ASB")]
## Neighbourhood All.crime.and.ASB ## 1 GARS07 33 ## 2 GARS06 99 ## 3 GARS05 66 ## 4 GARS04 88 ## 5 GARS03 119 ## 6 GARS02 125 ## 7 GARS01 407 ## 8 GARY01 7 ## 9 GARS09 85 ## 10 GARS08 30
neighbourhoods$MonthlyCrime = monthCrime[match(neighbourhoods$ID, monthCrime$Neighbourhood), "All.crime.and.ASB"] neighbourhoods$MonthlyCrimeRate = 1000 * neighbourhoods$MonthlyCrime/neighbourhoods$Population sp.theme(set = TRUE, regions = list(col = colours2)) spplot(neighbourhoods, "MonthlyCrimeRate", main = "Crimes/1000ppl/month from monthly data")
Land Use/Cover Data
The land use data comes from the EU Corine Land Cover data set. This uses a coordinate system designed for use across Europe. We’ll transform our neighbourhoods to this system, and plot them, using two colours for contrast. The land use is a GeoTIFF file, and it comes with a text file describing the land use categories for creating legends.
The land use classification is hierarchical at three levels. We can read in the legend file and display the top level classes.
Then each of those levels divides down into subcategories. For example, the agricultural areas divides into:
unique(legend$LABEL2[legend$LABEL1 == "Agricultural areas"])
## [1] "Arable land" "Permanent crops" "Pastures" ## [4] "Heterogeneous agricultural areas"
and then those classifications are further subdivided, for example:
unique(legend$LABEL3[legend$LABEL2 == "Heterogeneous agricultural areas"])
## [1] "Annual crops associated with permanent crops" ## [2] "Complex cultivation patterns" ## [3] "Land principally occupied by agriculture, with significant areas of natural vegetation" ## [4] "Agro-forestry areas"
We’ll now read in the land use file itself and plot it with the raster
package.
landUse = raster(Datafile("cumbriaLandUse.tif")) plot(landUse) neighEU = spTransform(neighbourhoods, CRS(projection(landUse))) plot(neighEU, add = TRUE, border = "white", lwd = 3) plot(neighEU, add = TRUE, border = "black", lwd = 1)
The colours correspond to the land use categories of LABEL3
above.
Extracting Values
The extract
function is the workhorse for getting information out of rasters. We will get the land use code for each of our street crime points and add the information to the object. We use as.character
in order to look up from the row names of the legend data frame.
luCrime = unlist(extract(landUse, spTransform(streetCrime, CRS(projection(landUse))))) luCrime[luCrime == "0"] <- NA streetCrime$landUse = luCrime streetCrime$landUseCode3 = legend[as.character(luCrime), "LABEL3"] streetCrime$landUseCode2 = legend[as.character(luCrime), "LABEL2"] streetCrime$landUseCode1 = legend[as.character(luCrime), "LABEL1"] plot(streetCrime, col = legend[streetCrime$landUse, "Colour"], pch = 19) title("Street level crime by land use")
We can also use ggplot
to produce maps by crime type, coloured by broad land use code
ggplot(streetCrime@data) + geom_point(aes(x = Easting, y = Northing, col = landUseCode1)) + coord_equal() + facet_wrap(~Crime.type)
Land Use Classification
Some neighbourhoods are clearly more rural than others. By considering the pixels that each polygon covers we can come up with a simple classification scheme.
Polygon-Raster Overlay
The extract
function considers all polygons in a Spatial Polygons Data Frame and returns a list where the elements are the pixels in each polygon. We’ll then process this list to get a table of land use categories for each neighbourhood.
The lutable
function first removes any pixels in the ‘missing’ category, and then computes the fraction of the pixels in each category. We then make the list into a matrix with row and column names so we can then do a stacked barplot. Note there’s not enough room for all the labels, so only a few are shown.
We can show the legend in a separate plot.
This shows our regions are either mostly red (urban) or yellow/green (rural). For our simple classification scheme we’ll look at the number of pixels in the region that are in the broad classes defined in LABEL2
, and take the class with the most pixels. The classify
function here takes as its argument a vector of land use class codes and returns the label of the most common value of LABEL2
for those codes. We’ll then run that on the list of pixel values to get a vector classification that we’ll add to the spatial polygons.
classify <- function(x) { ### x is vector of raster values ## remove missing values inR = x[x != 0] tableValues = sort(table(legend[as.character(inR), "LABEL2"]), decreasing = TRUE) return(names(tableValues)[1]) } # example classify(c(1, 1, 1, 22, 22, 43, 22, 22))
## [1] "Heterogeneous agricultural areas"
Now we’ll plot the neighbourhoods with the new classes. To get colours for each of the classes we’ll average the colours for the levels in the class:
meanColour <- function(cs) { # convert colours to red-green-blue, and average components rgbc = apply(col2rgb(as.character(cs)), 1, mean) # convert r-g-b back to colours rgb(rgbc[1], rgbc[2], rgbc[3], max = 256) } # use tapply to compute mean colours within groups defined by LABEL2 colourTable = tapply(as.character(legend$Colour), as.character(legend$LABEL2), meanColour) colours3 = colourTable[levels(neighEU$Class)] sp.theme(set = TRUE, regions = list(col = colours3)) spplot(neighEU, "Class", main = "Classification system")
This map shows us a possible classification of urban and two types of rural neighbourhoods. Notice that the regions on the west coast, although they have some urban content, have more rural land area. The one industrial area in the south contains the shipyards at Barrow.
Crimes and Roads
In this section we’ll look at crime locations and roads. We have a shapefile of roads with classification in lat-long coordinates which we’ll convert to Ordnance Survey grid reference so we can plot thestreetCrime
points on it. We’ll also define some colours and widths so the plot looks a bit like the OpenStreetMap maps.
mainroads = spTransform(readOGR(Datadir, "mainroads"), CRS("+init=epsg:27700"))
## OGR data source with driver: ESRI Shapefile ## Source: "../Data", layer: "mainroads" ## with 3486 features and 3 fields ## Feature type: wkbLineString with 2 dimensions
summary(mainroads$type)
## motorway motorway_link primary primary_link road secondary secondary_link tertiary ## 198 75 610 9 193 629 3 945 ## tertiary_link trunk trunk_link ## 11 757 56
Next we can compute the distance from each point to the nearest point on the major roads of the road network. First we extract the motorways, trunk roads, and primary roads, then work with that.
We can use ggplot
to produce some graphics showing how the road distance varies for different crimes. A box plot shows the outliers well.
x45 = opts(axis.text.x = theme_text(angle = -45, hjust = 0, vjust = 1)) ggplot(streetCrime@data, aes(x = Crime.type, y = dRoad)) + x45 + geom_boxplot()
What can we get from this? Perhaps only that shoplifting happens close to major roads, but thats because shops tend to be close to major roads.
Space-Time Regional Data
We also have a data file of the crime counts in neighbourhoods for a number of months (We would have more months of data but the neighbourhood boundaries changed, so we can only go back so far). The file has one for each month for each neighbourhood, and lists the number of crimes in each category. This form is ideal for feeding into ggplot
to get a quick time-series plot.
## Month Force Neighbourhood All.crime.and.ASB Burglary ## 1 2011-09 Cumbria Constabulary GARS07 41 3 ## 2 2011-09 Cumbria Constabulary GARS06 141 3 ## 3 2011-09 Cumbria Constabulary GARS05 56 0 ## 4 2011-09 Cumbria Constabulary GARS04 82 2 ## 5 2011-09 Cumbria Constabulary GARS03 119 4 ## 6 2011-09 Cumbria Constabulary GARS02 90 2
To plot maps of criminal damage and arson over time, we’ll reformat the data into a matrix of neighbourhood by month and attach it to the neighbourhood map data object. We have to do some work with the months to get them into something that can be used in a formula for the spplot
function. We’ll just plot the first four months.
library("reshape2") criminalDamage = dcast(allCrime, Neighbourhood ~ Month, value.var = "Criminal.damage.and.Arson") months = names(criminalDamage)[-1] names(criminalDamage)[-1] = format(as.Date(sprintf("%s-01", months)), "%b.%Y") criminalDamage = criminalDamage[match(neighbourhoods$ID, criminalDamage$Neighbourhood), ] neighbourhoods@data = cbind(neighbourhoods@data, criminalDamage) names(neighbourhoods) <- make.names(names(neighbourhoods)) sp.theme(set = TRUE, regions = list(col = colours2, lwd = 2)) spplot(neighbourhoods, c("Sep.2011", "Oct.2011", "Nov.2011", "Dec.2011"), main = "Criminal Damage")
Static plots like that are fine for publications in dead-tree format, but for online work we can use technology to make animations.
Animation
The animation
package provides functions for creating frame-based animations. All you need to do is write the code that produces a plot as each frame of the animation.
library("animation") months = names(criminalDamage)[-1] ani.options(verbose = FALSE, outdir = getwd()) saveHTML({ ani.options(interval = 0.5, verbose = FALSE) for (month in months) { sp.theme(set = TRUE, regions = list(col = colours2, lwd = 2)) print(spplot(neighbourhoods, month, at = seq(0, 55, len = 20), main = month)) ani.pause() } }, img.name = "crime_ani", single.opts = "'utf8': false", autoplay = FALSE, interval = 0.5, imgdir = "crime_dir", htmlfile = "crimeAni.html", ani.height = 400, ani.width = 600, title = "Crime animation")

Lessons Learnt
- Creating spatial point data sets from scratch
- Making maps with base graphics,
ggplot
, andspplot
- Kernel Smoothing with
package:sparr
- Reading in vector data with
readOGR
- Reading in raster data with
raster
- Analysing raster data with
extract
- Transforming coordinates with
spTransform
- Making animations with the
animation
package