The scenario: a pollution release from a research plant on the coast has resulted in a plume over the county. Scientists have a model for the pollution density and want to know which towns will be exposed to a level of pollution over 9000 ng (nanograms) per square metre.
We need a number of packages for this.
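The original list of `library()` calls is not reproduced here; based on the functions used later in this walkthrough, a plausible set is sketched below (individual calls also appear in the code as we go):

```r
## packages used later in this walkthrough (reconstructed list, not the original chunk)
library("sp")           # Spatial* classes, spplot, spsample, sp.theme
library("raster")       # raster, extract, rasterFromXYZ
library("rgdal")        # readOGR, writeOGR, spTransform support
library("rgeos")        # gIntersection
library("gstat")        # idw (also attached as a dependency of automap)
library("automap")      # autoKrige
library("lattice")      # underlies spplot
library("RColorBrewer") # colour palettes
```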
We’ll compute a colour palette for plots and set it as the default for sp graphics:
library("RColorBrewer") colours = colorRampPalette(brewer.pal(7, "OrRd")[-1])(50) sp.theme(set = TRUE, regions = list(col = colours))
All our data files are in a folder called `Data` at the same level as the current folder:
```r
Datadir = "../Data"
Datafile = function(f) {
    file.path(Datadir, f)
}
```
You should make sure `Datadir` is set to the place you’ve put the data files. If you have the tcltk package, then you can use `as.character(tcltk::tkchooseDirectory())`. Don’t put a trailing slash or backslash at the end of `Datadir` or files might not be found.
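For example, a small optional snippet to pick the folder interactively (just an illustration of the call mentioned above, guarded so it only runs in an interactive session with tcltk installed):

```r
## optional: choose the data folder with a dialog instead of hard-coding the path
if (interactive() && requireNamespace("tcltk", quietly = TRUE)) {
    Datadir = as.character(tcltk::tkchooseDirectory())
}
```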
Plume Model
We will use a simple model for the spread of pollution. The total pollution at a location at distance $d$ and angle $\theta$ from the point source is given by:

$$P(d, \theta) = \alpha \exp\left( -\frac{d^2 \, e^{-2\kappa \cos(\theta - \phi)}}{\beta^2} \right)$$
where α is the pollution level at distance zero, β is a distance scale factor, ϕ is the major angle of the pollution plume, and κ is the eccentricity. If κ is 0 the plume becomes a circularly symmetric decay, and as κ increases the plume becomes more stretched in the direction of the ϕ parameter. You can consider κ and ϕ as the wind strength and direction respectively.
We can create a plume function that computes the pollution for a given source `src` at a number of destinations `dst`, given a set of parameters:
```r
pfunc = function(d2, ang, a, b, k, phi) {
    ## plume value for distance-squared d2
    a * exp(-(d2 * exp(-k * cos(ang - phi))^2 / b^2))
}

plume <- function(src, dst, a, b, k, phi) {
    ## plume value for src at dst
    src = coordinates(src)
    dst = coordinates(dst)
    d2 = (dst[, 1] - src[, 1])^2 + (dst[, 2] - src[, 2])^2
    ang = atan2(dst[, 2] - src[, 2], dst[, 1] - src[, 1])
    pfunc(d2, ang, a, b, k, phi)
}
```
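As a quick, purely illustrative check of the κ behaviour described above (the parameter values are the ones used later in this document):

```r
## with k = 0 the value depends only on distance, not angle: circularly symmetric
pfunc(1000^2, 0,    a = 12000, b = 400, k = 0, phi = pi/4)
pfunc(1000^2, pi/2, a = 12000, b = 400, k = 0, phi = pi/4)

## with k = 5 the decay is much slower along the phi direction than against it
pfunc(1000^2, pi/4,      a = 12000, b = 400, k = 5, phi = pi/4)
pfunc(1000^2, pi/4 + pi, a = 12000, b = 400, k = 5, phi = pi/4)
```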
Load Data
We will use three data sets: the location of the pollution source, the location and population of towns from a UK gazetteer source, and a polygon boundary of Cumbria obtained from the gadm web site.
Two of the datasets are in lat-long format, so we use `spTransform`, with a little wrapper function, to convert them to Ordnance Survey grid references in metres. We then make a quick map with the source as a red dot and the towns as crosses.
```r
toOSGB = function(s) {
    spTransform(s, CRS("+init=epsg:27700"))
}
cumbria = toOSGB(readOGR(Datadir, "cumbria", verbose = FALSE))
nuclear = toOSGB(readOGR(Datadir, "nuclear", verbose = FALSE))
ukgaz = readOGR(Datadir, "ukgaz", verbose = FALSE)

plot(cumbria)
plot(nuclear, add = TRUE, pch = 19, col = "red")
plot(ukgaz, add = TRUE)
title("Base map")
```
Compute the Plume
To map the plume we create a 500×500 raster that spans the polygon and then fill it with values computed by the `plume` function. The call to `SpatialPoints(plevel)` creates a set of coordinates of all the grid cells at which to compute the plume values. The plume parameters have been given to us by some scientists, who tell us that’s what they are and that’s what we should use. You will notice that the wind has been blowing at an angle of π/4, for example. We plot the plume and the other data on a map.
```r
plevel = raster(extent(cumbria), nrows = 500, ncols = 500)
plevel[] = plume(nuclear, SpatialPoints(plevel), 12000, 400, 5, pi/4)
projection(plevel) = proj4string(cumbria)

plot(plevel, col = colours)
plot(cumbria, add = TRUE)
plot(nuclear, add = TRUE, pch = 19, col = "red")
plot(ukgaz, add = TRUE)
title("Plume Spread")
```
Urban Exposure
We could compute the exposure at each town by working out the distance and angle of each town from the source and calling the plume function. Instead we’ll use the `extract` function to sample from the raster grid at the town locations. Then we can extract the towns that have an exposure over 9000 ng/m² and put them in a table.
```r
ukgaz$exposure = extract(plevel, ukgaz)
over9000 = ukgaz$exposure > 9000
dangerTowns = data.frame(ukgaz@data[over9000, c("Name", "pop", "exposure")])
dangerTowns
```
```
##                 Name  pop exposure
## 4            Keswick 5508     9063
## 28        Borrowdale  438     9355
## 43            Lorton  250     9158
## 46        Loweswater  209    10350
## 50        Buttermere  127    10713
## 97          Gosforth 1230    11135
## 100         Lamplugh  763     9323
## 109 Ennerdale Bridge  240    10536
## 110            Haile  237    10921
## 112         Ponsonby   92    11961
```
River Exposure
The authorities want to know which rivers have been exposed to a level over 9000 ng/m² of pollution. To compute this we will work out the 9000 ng/m² contour from the plume and intersect it with a river dataset. The following functions compute the polygonal contour of the plume function for a given value.
```r
invpfunc <- function(ang, a, b, k, phi, f) {
    ### inverse plume function - compute distance given f
    d2 = -log(f/a) * b^2 / exp(-k * cos(ang - phi))^2
    sqrt(d2)
}

contourPlume <- function(src, a, b, k, phi, f) {
    ### return a SpatialPolygon of a plume function at value f
    ### by sweeping round 360 degrees and computing the distance
    ang = seq(0, 2 * pi, len = 360)
    d = invpfunc(ang, a, b, k, phi, f = f)
    xy = coordinates(src)
    polys = SpatialPolygons(list(Polygons(list(Polygon(
        cbind(xy[, 1] + d * sin(ang), xy[, 2] + d * cos(ang)))), ID = 1)))
    proj4string(polys) = proj4string(src)
    polys
}
```
First we’ll load the rivers and lakes data, then compute the 9000 ng/m² contour and intersect it with the rivers.
```r
rivers = toOSGB(readOGR(Datadir, "rivers", verbose = FALSE))
lakes = toOSGB(readOGR(Datadir, "naturalwater", verbose = FALSE))
nineK = contourPlume(nuclear, 12000, 400, 5, pi/4, 9000)
rivers9k = gIntersection(nineK, rivers, byid = TRUE)

plot(rivers9k, col = "blue", lwd = 3)
plot(rivers, col = "blue", lty = 2, add = TRUE)
plot(nineK, add = TRUE)
plot(lakes, add = TRUE, col = "blue")
title("River sections with >9000 exposure")
```
Lake Exposure
This region of England contains not only its highest mountain but also its deepest lake. We want to compute the total pollution that lands on the lakes. For this we use polygons from the natural water dataset.
The `extract` function can now be used to sum the plume within polygons. We multiply by the grid-cell area in square metres and divide by 10⁹ to get the total pollution in grams that has gone into each lake.
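The code for this step is not shown above; a minimal sketch of how it might look, using the `lakes` object read in earlier (the variable names and exact call here are assumptions, not the original chunk):

```r
## sketch: sum the plume raster over each lake polygon, convert from ng per cell
## to grams per lake (cell values are ng/m^2, so multiply by cell area in m^2
## and divide by 1e9 ng per gram)
cellArea = prod(res(plevel))    # grid cell area in square metres
lakes$pollution = extract(plevel, lakes, fun = sum, na.rm = TRUE) * cellArea / 1e9
```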
We can then plot the ten most polluted lakes by total pollution amount. Notice the use of the `RColorBrewer` and `lattice` packages to modify the default behaviour of the `spplot` function. We add `sp.text` to the panel function to label the lakes; the default label position sits right over the feature, so we adjust its position slightly.
library("lattice") library("RColorBrewer") top10 = order(lakes$pollution, decreasing = TRUE)[1:10] spplot(lakes[top10, ], "pollution", panel = function(...) { panel.polygonsplot(...) sp.text(coordinates(lakes[top10, ]), lakes$name[top10], adj = 0) }, main = "Lake exposure")
```r
lakes@data[top10, ]
```
```
##     osm_id               name  type pollution
## 34 4580520      Derwent Water water    49.000
## 35 4580524 Bassenthwaite Lake water    45.264
## 44 4582862          Ullswater water    35.408
## 39 4580535    Ennerdale Water water    32.499
## 37 4580533     Crummock Water water    27.251
## 40 4580536         Wast Water water    23.733
## 38 4580534         Buttermere water    10.498
## 36 4580532         Loweswater water     5.870
## 46 4583133      Floutern Tarn water     2.703
## 60 4583896     Greendale Tarn water     2.053
```
Simulating Sensor Samples
In this section we will briefly ‘break the fourth wall’ and talk about how we simulate the sample measurements for the next part.
Since there hasn’t really been a major pollution incident, we will have to simulate the results of sensor sampling. We could just take the values from the `plevel` raster, but that would be exact, and statistics is never exact. We could add some independent random noise to the plume raster data for each sensor, but that’s not very satisfactory either. That would be equivalent to saying our measurement process is very poor – and if we took the reading again at the same point we’d get a different value. This is true in reality – no instrument has infinite precision – but it shouldn’t swamp the rest of the variability.
What we’ll do instead is create a smoothly varying raster that is the value of a measurement taken at that point. The reason this varies from the plume value could be topography, or land cover, or rainfall, or some other spatially-continuous process. We could also add further independent random noise to these numbers to represent the imprecision of the device.
This function creates a number of points over a grid and then does a crude smoothing using the `idw` function (of which more later). This is then added to the input raster to create a smoothly varying version of the input. The various parameters can change aspects of the output, but have been set at useful values for the case at hand. After this section of code, the values in the raster `sensorValues` will be the reading of a sensor placed at those locations.
library("automap") noisyRaster <- function(plevel, crs, npts = 300, idp = 4, a = 3000) { pts = spsample(SpatialPoints(plevel), npts, type = "random") pts = SpatialPointsDataFrame(pts, data.frame(Z = rnorm(npts, 0, 1))) proj4string(pts) = CRS(proj4string(ukgaz)) fk = idw(Z ~ 1, pts, SpatialPoints(plevel, proj4string = crs), idp = idp) fkp = plevel fkp[] = fk$var1.pred pmax = max(values(plevel)) return(fkp * a * (plevel/pmax) + plevel) } sensorValues = noisyRaster(plevel, CRS(proj4string(ukgaz)))
```
## [inverse distance weighted interpolation]
```
Now we can take the coordinates of the villages, sample from the simulated sensor values at those locations, create a `SpatialPointsDataFrame` and save it to disk:
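The original chunk is not reproduced here; a plausible sketch is given below. The object name `villageSamples` and the layer name `"samples"` are assumptions, chosen to match the `samples` object that is read back in later.

```r
## sketch: sample the simulated sensor surface at the village locations and
## write the result out as a point Shapefile (layer name "samples" is assumed)
villageSamples = ukgaz
villageSamples$sample = extract(sensorValues, ukgaz)
writeOGR(villageSamples[, "sample"], Datadir, "samples", "ESRI Shapefile",
         overwrite_layer = TRUE)
```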
Great! Now, forget you did all that, and pretend that someone has gone to every village, taken a sensor reading, and put it all into a Shapefile for you. Read on…
Sampling Stations
At each town and village location a sampling system was set up to measure the pollution. The data was supplied as a point Shapefile with the measurements in an attribute called `sample`. We’ll read this in and plot it.
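The reading-and-plotting chunk is not shown above; a minimal sketch, assuming the Shapefile layer is called `"samples"` (to match the object used in the code that follows):

```r
## sketch: read the supplied point Shapefile of sample measurements and map it
samples = readOGR(Datadir, "samples", verbose = FALSE)
plot(cumbria)
plot(samples, add = TRUE, pch = 19)
title("Sampling stations")
```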
Inverse Distance Weighting
The `automap` package has some functions for producing smoothed maps from point-sampled data. Perhaps the simplest method is inverse distance weighting. In this, the value at any point is a weighted sum of the sampled values, where the weight is inversely related to the distance, so that further points have little influence.
The `idw` function performs inverse distance weighting, and a few more lines (sketched after the output below) convert the predictions to a raster data set.
library("automap") rIDW = raster(extent(cumbria), ncol = 100, nrow = 100, crs = projection(samples)) sampleFitIDW = idw(sample ~ 1, samples, SpatialPoints(rIDW, proj4string = CRS(proj4string(samples))))
```
## [inverse distance weighted interpolation]
```
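The conversion and plotting lines are not reproduced above; a minimal sketch, reusing the `rIDW` raster (the plotting choices here are assumptions):

```r
## sketch: copy the IDW predictions into the raster (cells are in the same order
## as the SpatialPoints used for prediction) and map the result
rIDW[] = sampleFitIDW$var1.pred
plot(rIDW, col = colours)
plot(cumbria, add = TRUE)
plot(samples, add = TRUE)
title("IDW surface")
```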
This looks rather spotty. We could try adjusting some of the parameters or we could use a more statistically principled method.
Kriging
Kriging can be thought of as similar to inverse distance weighting, but the effect of distance is derived from the data, by inspecting how the similarity of sample values changes with their separation. This enables the construction of a “variogram”, and from this both an estimate and a variance of the sample measurement can be made over a grid.
The `autoKrige` function returns an object that contains the prediction estimates and variances, and the variogram parameters. We first set up an empty raster, `rKrige`, to define the prediction locations, and call the `autoKrige` function. Plotting the `autoKrige` object shows the prediction, the standard error, and the variogram. The variogram is usually an increasing function of distance, showing that points further apart are less likely to be similar. Fitting variograms can be a fine art, with sometimes crucial selections of function parameters, ranges, and binning.
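The kriging chunk itself is not reproduced above; a sketch of what it plausibly looks like, mirroring the earlier IDW call (the 100×100 grid size is an assumption):

```r
## sketch: krige the settlement samples onto a grid covering Cumbria
rKrige = raster(extent(cumbria), ncol = 100, nrow = 100, crs = projection(samples))
sampleFit = autoKrige(sample ~ 1, samples,
    SpatialPoints(rKrige, proj4string = CRS(proj4string(samples))))
plot(sampleFit)   # shows prediction, standard error and fitted variogram
```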
Exceedence Mapping
Because kriging gives us a prediction and a standard error, we can make meaningful statements about the predictions. For example, we can compute the probability that the underlying value is over 9000 units by using the `pnorm` function with the given means and standard deviations.
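As a quick illustration of that `pnorm` call with made-up numbers (not values from the kriging output):

```r
## illustrative only: probability the true value exceeds 9000 when the kriged
## prediction is 8500 with a standard error of 400
pnorm(9000, mean = 8500, sd = 400, lower.tail = FALSE)
```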
Then we can map areas that are definitely (p > 0.95) above 9000 in red, those that are definitely (p < 0.05) below 9000 in blue, and the rest of the area in gray, showing this is a ‘gray area’ where we cannot decide with confidence whether the value is over 9000 units.
First we’ll create two functions to compute these exceedences as a raster and to plot them. We’ll code significantly low as zero, significantly high as two, and intermediate as one.
```r
pOver = function(thresh, krigeFit) {
    ### convert kriging output to exceedence
    ### pnorm computes probabilities from a Normal/Gaussian distribution
    pValue = pnorm(thresh, krigeFit$krige_output$var1.pred,
                   krigeFit$krige_output$var1.stdev, lower.tail = FALSE)
    ### create a raster object from the kriging coordinates and the probability
    pExceed = rasterFromXYZ(cbind(coordinates(krigeFit$krige_output),
                                  (pValue > 0.05) + (pValue > 0.95)))
    attr(pExceed, "thresh") = thresh
    return(pExceed)
}

eColours = c("#4040FF", "grey", "red")

plotExceed = function(pe, overlay, region) {
    ### plot the exceedence with a couple of additional overlays
    thresh = attr(pe, "thresh")
    plot(pe, col = eColours, legend = FALSE)
    legend("topright", c(paste("Under ", thresh), "Gray Area", paste("Over ", thresh)),
           fill = eColours, bg = "white")
    plot(overlay, add = TRUE)
    plot(region, add = TRUE)
}
```
Now we’ll compute the exceedence map from the kriging and plot it.
```r
pK = pOver(9000, sampleFit)
plotExceed(pK, nineK, cumbria)
title("Exceedence Map")
```
Additional Sampling Locations
Now we have been given the capability to deploy 25 new sensors. Where do we put them? There’s little point putting them in the areas where we are certain of the values, so we’ll put them within the convex hull of the grey area from the kriging. The `spsample` function can generate simple point sampling schemes, including random, gridded and, the one we’ll use here, hexagonal. The hexagonal scheme might not generate exactly 25 sensor points, so we’ll add in some extra ones randomly to make up the numbers.
```r
# grey cells are == 1
greyPts = SpatialPoints(xyFromCell(pK, 1:length(pK))[values(pK) == 1 & !is.na(values(pK)), ])
ch = chull(coordinates(greyPts))
ch = c(ch[length(ch)], ch)
nNew = 25
newPts = spsample(Polygon(coordinates(greyPts[ch, ])), nNew, "hexagonal")
extra = nNew - length(newPts)
if (extra > 0) {
    more = spsample(Polygon(coordinates(greyPts[ch, ])), extra, "random")
    newPts = rbind(newPts, more)
}

plot(pK, col = eColours, legend = FALSE)
points(newPts, pch = 19)
title("New Sensor Locations")
```
Faking The Sensor Measurements
Now we have to fake the sensor measurements by sampling from that raster we created.
```r
newPts2 = SpatialPointsDataFrame(newPts, data.frame(sample = extract(sensorValues, newPts)))
proj4string(newPts2) = CRS(proj4string(samples))
writeOGR(newPts2, Datadir, "sensors", "ESRI Shapefile", overwrite = TRUE)
rm(newPts2)
```
Now we have a Shapefile of simulated measurements at the new sensor locations that we can pretend someone gave us. Read on…
Kriging With The New Data
The new sensor data has been supplied to us in a Shapefile. We’ll read this in and merge it with the original settlement measurements. We run the kriging function again. We then re-use our functions to compute and plot the exceedence.
```r
sensors = readOGR(Datadir, "sensors")
```
```
## OGR data source with driver: ESRI Shapefile
## Source: "../Data", layer: "sensors"
## with 25 features and 1 fields
## Feature type: wkbPoint with 2 dimensions
```
```r
newData = rbind(sensors[, "sample"], samples[, "sample"])
sampleFit2 = autoKrige(sample ~ 1, newData)
```
```
## Warning message: Removed 1 duplicate observation(s) in input_data:
##         coordinates sample
## 118 (301671, 507141)   3414
## 129 (301671, 507141)   3414
## [using ordinary kriging]
```
```r
pK2 = pOver(9000, sampleFit2)
plotExceed(pK2, nineK, cumbria)
title("Exceedence Map")
```
From this second map you should be able to see that the grey area is smaller. We can create a table of the pixel count in each category for the two exceedence maps, which shows this change:
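The table itself is not reproduced here; one way to compute it (a sketch, not the original code):

```r
## sketch: count cells per category (0 = under, 1 = grey area, 2 = over)
## for the original and updated exceedence maps
table(values(pK))
table(values(pK2))
```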
Further Ideas
Instead of getting the sensor locations from the `spsample` function, you could try manually placing the sensors using the `locator()` function. Another possibility might be that sensors can only be placed on main roads – `spsample` can take points from lines too!
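For example, a small sketch of sampling along line geometries (using the `rivers` layer as a stand-in, since no roads dataset appears in this document):

```r
## sketch: spsample also accepts SpatialLines, so candidate sensor sites could be
## drawn along a line layer (rivers used here as a stand-in for main roads)
roadPts = spsample(rivers, 25, type = "random")
```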
Lessons Learnt
- Reading vector data with `readOGR()`
- Reading raster data with `raster()`
- Transforming vectors with `spTransform()`
- Mapping with `plot()` and `spplot()`
- Point- and area-sampling rasters with `extract()`
- Vector-vector intersection with `gIntersection()`
- Simple kriging with the `automap` package
- Converting kriged output to raster format
Coda
Because we simulated the sensor measurements, we can actually plot them. This is the underlying surface that the kriging was predicting:
```r
plot(sensorValues)
```