-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathapplication.Rmd
363 lines (313 loc) · 16.5 KB
/
application.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
---
title: "Application: evaluating variation in connected bikeability against labour market 'demand'"
output:
github_document
---
## Introduction
This document describes code that underpins an applied data analysis of our connected bikeability scores.
Please cite:
Beecham, R., Yang, Y., Tait, C. and Lovelace, R. (2023) "Connected bikeability in London: which localities are better connected by bike and does this matter?", *Environment & Planning B: Urban Analytics and City Science*. DOI: [10.1177/23998083231165122](https://doi.org/10.1177/23998083231165122).
## Setup
### Required libraries
Required packages can be installed individually with `install.packages(<package_name>)`. Core packages are imported into the session with `library(<package_name>)`. Occasional use of packages is made with the `<package-name>::<function-name>()` syntax so as to avoid polluting the workspace.
```{r, load-packages, eval=FALSE}
pkgs <- c("tidyverse","sf", "here", "rmapshaper")
# If not already installed.
# install.packages(pkgs)
# Core packages
library(tidyverse)
library(sf)
# ggplot theme for paper
source(here("code","theme_paper.R"))
```
## Load Census OD data
The data analysis aims to explore how well London's cycle facilities, and the trips it supports as quantified by connected bikeability, meet the needs of its occupants. To do this we use 2011 Census OD commute data, disaggregating OD commutes according to occupation type. Data are released at middle layer super output area level (MSOA) -- [UK Data Service](https://www.ukdataservice.ac.uk) Table WU07AUK. Additionally, we load a boundary file for MSOAs from [ONS Open Geography](https://geoportal.statistics.gov.uk/search?collection=Dataset&sort=name&tags=all(BDY_MSOA%2CDEC_2011)).
```{r, load-census, eval=FALSE}
# Origin-Desination data by occupation published via UK Data Service.
commute_data <- read_csv(here("data", "wicid_output_occupation.csv"))
# MSOA boundary data (from https://geoportal.statistics.gov.uk/s)
# Simplified using rmapshaper
msoa_boundaries <- st_read(here("data", "msoa_boundaries.geojson"))
# Filter on all MSOAs within London.
region_boundaries <- st_read(here("data", "regions.geojson")) |>
filter(RGN20NM=="London")
temp <- msoa_boundaries |> st_filter(region_boundaries %>% select(RGN20NM))
# Not very simplified as City of London in so small.
temp <- temp |> rmapshaper::ms_simplify(keep=.1)
# Cast to OSGB.
temp <- temp %>% st_transform(crs=27700)
```
## Match MSOAs with bikeshare villages
![MSOAs and bikeshare villages](./figs/geogs.svg)
Next we identify all MSOAs that intersect with the LCHS boundary and filter the commute data on these MSOAs, both origin and destination.
```{r, match-msoas, eval=FALSE}
# Read in villages.
grid_real_sf <- st_read(here("data","grid_real_sf.geojson")) |> unique()
# Buffer around bikeshare scheme villages.
temp_buffer <- grid_real_sf |> filter(type=="real") |>
st_buffer(dist=0, .predictate=st_intersects) |>
summarise()
temp_filtered <- temp |> st_filter(temp_buffer, .predictate=st_intersects())
# Plot msoa and village geometries.
ggplot() +
geom_sf(data=temp_filtered, colour="#737373", size=.1, fill="transparent") +
geom_sf(data=temp_buffer, colour="#252525", fill="#67000d", size=0, alpha=.2)+
geom_sf(data=grid_real_sf %>% filter(type=="real"), colour="#252525", fill="#d9d9d9", size=.2, alpha=.4)+
labs(title="MSOAs allocated to villages")
# Filter all Census msoas with origins *and* destinations within the buffer.
commute_data_filtered <- commute_data |> rowwise() |>
mutate(
all_jobs=sum(all),
prof_jobs=sum(`1_managers_senior`,`2_professional`, `3_associate_professional`, na.rm=TRUE),
non_prof_jobs=all-prof_jobs
) |> ungroup() |>
select(origin_msoa, destination_msoa, prof_jobs, non_prof_jobs) |>
filter( origin_msoa %in% (temp_filtered |> pull(MSOA11CD)) & destination_msoa %in% (temp_filtered |> pull(MSOA11CD)) )
```
## Assign individual commutes to villages
The two geographies, MSOAs and villages, intersect in many different ways and cannot be easily reconciled. Our solution is to generate individual records for each commute -- to detach commutes from their aggregated MSOA-MSOA geography. For each observation (commuter record) we generate estimated point locations by random spatial sampling within the polygon area of that commute’s origin and destination MSOA. These estimated point locations are then used to assign commute origin and destination locations to the bikeshare villages in which they are contained.
We define a function for the random spatial sampling, using [`st_sample`](https://r-spatial.github.io/sf/reference/st_sample.html).
```{r, geo-sample, eval=FALSE}
# Random spatial sample within polygon.
# geo An sf MULTIPOLYGON object.
# n Desired sample size.
# Returns a tidy data frame (tibble) with ST_POINT geometry defining point locations.
geo_sample <- function(geo, n) {
return(
sf::st_sample(x=geo, size=n) |>
st_coordinates()
)
}
```
We then generate a large set ($n=2000$) of sampled points for each MSOA, held in `sampled_msoas`.
```{r, sample-msoas, eval=FALSE}
# For quick searching, generate sampled point locations for each MSOA.
# Resample from these locations to then generate origin and destination points
# for each commute.
sampled_msoas <- temp_filtered %>% select(msoa=MSOA11CD) |>
nest(data=-c(msoa)) |>
mutate( sampled_points=map(data,~geo_sample(geo=.x,n=2000) |> as_tibble(.name_repair=~c("east", "north"))) ) |>
unnest(-data) |> select(-data)
```
For each commute we then search in `sampled_msoas` to attach point locations -- and do this separately for professional and non-professional commutes. This is achieved by taking each MSOA-MSOA OD pair and sampling MSOA point locations by origin and destination according to the commute count (`non_prof_jobs` / `prof_jobs`) of that OD pair. Caution: may take c.15 mins to execute.
```{r, sample-commutes, eval=FALSE}
# Generate points for non-profs.
start <- Sys.time()
non_prof_points <- commute_data_filtered |> mutate(od_pair=paste0(origin_msoa,"-",destination_msoa)) |>
filter(non_prof_jobs>0) |>
nest(data=-c(od_pair)) |>
mutate(o_non_prof=
map( data, ~sample_n(sampled_msoas |> filter(msoa==.x |> pull(origin_msoa)), size=.x |> pull(non_prof_jobs)) |> as_tibble(.name_repair=~c("o_msoa","o_east", "o_north")) ),
d_non_prof=
map( data, ~sample_n( sampled_msoas |> filter(msoa==.x |> pull(destination_msoa)), size=.x |> pull(non_prof_jobs)) |> as_tibble(.name_repair=~c("d_msoa","d_east", "d_north")) ),
) |>
unnest(-data) |> select(-data)
print( Sys.time() - start )
start <- Sys.time()
prof_points <- commute_data_filtered |>
mutate(od_pair=paste0(origin_msoa,"-",destination_msoa)) |>
filter(prof_jobs>0) |>
nest(data=-c(od_pair)) |>
mutate(
o_non_prof=
map( data, ~sample_n(sampled_msoas |> filter(msoa==.x |> pull(origin_msoa)), size=.x |> pull(prof_jobs)) |> as_tibble(.name_repair=~c("o_msoa","o_east", "o_north")) ),
d_non_prof=
map( data, ~sample_n(sampled_msoas |> filter(msoa==.x |> pull(destination_msoa)), size=.x |> pull(prof_jobs)) |> as_tibble(.name_repair=~c("d_msoa","d_east", "d_north")) )
) |>
unnest(-data) |> select(-data)
print( Sys.time() - start )
```
Finally, commutes are assigned to the bikeshare village in which they are contained using [`st_join`](https://r-spatial.github.io/sf/reference/st_join.html), and we then summarise over village-village OD pairs.
```{r, assign-commutes, eval=FALSE}
ods_non_prof <- non_prof_points %>% select(-c(od_pair,o_msoa, d_msoa)) |>
st_as_sf(coords=c("o_east", "o_north"), crs=27700) |>
st_join(grid_real_sf |> filter(type=="real") |> select(o_village=name_agg), .predicate=st_intersects()) |>
st_drop_geometry() |>
st_as_sf(coords=c("d_east", "d_north"), crs=27700) |>
st_join(grid_real_sf |> filter(type=="real") |> select(d_village=name_agg), .predicate=st_intersects()) |>
st_drop_geometry() |>
filter(!is.na(o_village) & !is.na(d_village)) |>
group_by(o_village, d_village) |>
summarise(count=n())
ods_prof <- prof_points %>% select(-c(od_pair,o_msoa, d_msoa)) |>
st_as_sf(coords=c("o_east", "o_north"), crs=27700) |>
st_join(grid_real_sf |> filter(type=="real") %>% select(o_village=name_agg), .predicate=st_intersects()) |>
st_drop_geometry() |>
st_as_sf(coords=c("d_east", "d_north"), crs=27700) |>
st_join(grid_real_sf |> filter(type=="real") |> select(d_village=name_agg), .predicate=st_intersects()) |>
st_drop_geometry() |>
filter(!is.na(o_village) & !is.na(d_village)) |>
group_by(o_village, d_village) |>
summarise(count=n())
ods_joined <-
ods_prof |> rename(prof=count) |>
full_join(ods_non_prof |> rename(non_prof=count)) |>
group_by(d_village) |>
mutate(
expected=((prof+non_prof+0.0001)*(sum(prof, na.rm=TRUE)+0.0001))/(sum(prof, na.rm=TRUE)+sum(non_prof, na.rm=TRUE)+0.0001),
pearson=(prof-expected)/(sqrt(expected))) |>
ungroup()
write_csv(ods_joined, here("data", "ods_joined.csv"))
```
## Analyse alongside bikeability
![](./figs/focus.png)
```{r, eval=FALSE}
ods_full <- tibble(
o_village=rep(grid_real_sf |> filter(type=="real") |> pull(name_agg), times=66),
d_village=rep(grid_real_sf |> filter(type=="real") |> pull(name_agg), times=1, each=66)
)
simulated_data <- read_csv(here("data", "simulated_data.csv"))
plot_data_temp <- grid_real_sf |> filter(type=="real") |> select(-c(col, row)) |>
right_join(
ods_full |> left_join(ods_joined),
ods_joined, by=c("name_agg"="o_village")) |>
mutate(o_village=name_agg) |> rename(o_x=x, o_y=y) |>
left_join(grid_real_sf |> filter(type=="grid") %>% st_drop_geometry() |>
select(name_agg,col,row), by=c("d_village"="name_agg")
) |>
rename(d_x=col, d_y=row) |>
# Identify village in focus (edit this to switch between D-OD and O-DO matrix).
mutate(label=if_else(o_village==d_village,d_village,""),
focus=if_else(o_village==d_village,1,0)) |>
mutate(commute_count=prof+non_prof)
# TfL release of key cycle routes data
url <- "https://cycling.data.tfl.gov.uk/CycleRoutes/CycleRoutes.json"
infra <- st_read(url) |> st_transform(crs=27700)
infra_scheme <- infra |>
filter(Status=="Open") |> st_intersection(grid_real |> filter(type=="real"), st_crs(grid_real))
rivers <- st_read(here("data", "river_buffer.geojson"))
bikeability <- plot_data_temp |>
left_join(simulated_data) |>
filter(d_village %in% c("Covent Garden | Strand")) |>
ggplot() +
geom_sf(data= grid_real_sf %>% filter(type=="real") %>% summarise(), colour="#616161", fill="transparent", size=.4)+
geom_sf(aes(fill=index), colour="#ffffff", size=.05)+
geom_sf(data=. %>% filter(d_village==o_village), colour="#616161", size=.1)+
geom_sf(data=infra_scheme, colour="#252525", alpha=.9, size=.2) +
geom_sf(data=rivers, size=0, fill="#bdbdbd", alpha=1)+
geom_text(data=. %>% filter(focus==1),
aes(x=o_x, y=o_y, label=str_sub(label,1,1)),
colour="#252525", alpha=0.9, size=4, show.legend=FALSE,
hjust="centre", vjust="middle", family="Avenir Next")+
scale_fill_distiller(palette="PuBu", direction=1, na.value="#f7f7f7") +
labs(subtitle="bikeability") +
guides(fill="none")+
theme(
legend.position = "bottom",
axis.text=element_blank(), axis.title.x=element_blank(), axis.title.y=element_blank(),
legend.key.size = unit(.45, 'cm'),
legend.title = element_text(size=12),
legend.text = element_blank()
)
jobs <- plot_data_temp |>
left_join(simulated_data) |>
filter(d_village %in% c("Covent Garden | Strand")) |>
ggplot() +
geom_sf(data=grid_real %>% filter(type=="real") %>% summarise(), colour="#616161", fill="transparent", size=.4)+
geom_sf(aes(fill=commute_count), colour="#ffffff", size=.05)+
geom_sf(data=. %>% filter(d_village==o_village), colour="#616161", size=.1)+
geom_sf(data=infra_scheme, colour="#252525", alpha=.9, size=.2) +
geom_sf(data=rivers, size=0, fill="#bdbdbd", alpha=1)+
geom_text(data=. %>% filter(focus==1),
aes(x=o_x, y=o_y, label=str_sub(label,1,1)),
colour="#252525", alpha=0.9, size=4, show.legend=FALSE,
hjust="centre", vjust="middle", family="Avenir Next")+
scale_fill_distiller(palette="PuBu", direction=1, na.value="#f7f7f7") +
labs(subtitle="commute counts") +
guides(fill="none")+
theme(
legend.position = "bottom",
axis.text=element_blank(), axis.title.x=element_blank(), axis.title.y=element_blank(),
legend.key.size = unit(.45, 'cm'),
legend.title = element_text(size=12),
legend.text = element_blank()
)
pearson <- plot_data_temp |>
left_join(simulated_data) |>
filter(d_village %in% c("Covent Garden | Strand")) |>
ggplot() +
geom_sf(data=grid_real %>% filter(type=="real") %>% summarise(), colour="#616161", fill="transparent", size=.4)+
geom_sf(aes(fill=pearson), colour="#ffffff", size=.05)+
geom_sf(data=. %>% filter(d_village==o_village), colour="#616161", size=.1)+
geom_sf(data=infra_scheme, colour="#252525", alpha=.9, size=.2) +
geom_sf(data=rivers, size=0, fill="#bdbdbd", alpha=1)+
geom_text(data=. %>% filter(focus==1),
aes(x=o_x, y=o_y, label=str_sub(label,1,1)),
colour="#252525", alpha=0.9, size=4, show.legend=FALSE,
hjust="centre", vjust="middle", family="Avenir Next")+
scale_fill_distiller(
palette="PuOr", direction=1,
limits=c(-max(abs(plot_data_temp$pearson)),max(abs(plot_data_temp$pearson))),
guide = "colourbar", na.value="#f7f7f7"
)+
labs(subtitle="signed chi-scores") +
guides(fill="none")+
theme(
legend.position = "bottom",
axis.text=element_blank(), axis.title.x=element_blank(), axis.title.y=element_blank(),
legend.key.size = unit(.45, 'cm'),
legend.title = element_text(size=12),
legend.text = element_blank()
)
count_hist <- plot_data_temp |>
filter(d_village %in% c("Covent Garden | Strand")) |>
ggplot(aes(commute_count)) +
geom_histogram(aes(fill = ..x..), colour="#616161", size=.2) +
scale_fill_distiller(
palette="PuBu", direction=1,
limits=c(-max(abs(plot_data_temp$commute_count)),max(abs(plot_data_temp$commute_count))),
guide = "none", na.value="#f7f7f7"
)+
facet_wrap(~d_village, nrow=2) +
labs(x="commute count", y="village OD count") +
theme(
panel.spacing=unit(0.015, "lines"),
axis.title.x=element_blank(),axis.title.y=element_blank(),axis.text.x = element_blank(),
strip.text.x = element_blank(), strip.text.y = element_blank(), axis.text.y = element_blank(),
panel.background = element_rect(fill="#ffffff", colour="#ffffff")
)
diff_hist <- plot_data_temp |>
filter(d_village %in% c("Covent Garden | Strand")) |>
filter(o_village!=d_village) |>
ggplot(aes(pearson)) +
geom_histogram(aes(fill = ..x..), colour="#616161", size=.2) +
scale_fill_distiller(
palette="PuOr", direction=1,
limits=c(-max(abs(plot_data_temp$pearson)),max(abs(plot_data_temp$pearson))),
guide = "none", na.value="#f7f7f7"
)+
theme(
panel.spacing=unit(0.015, "lines"),
axis.title.x=element_blank(),axis.title.y=element_blank(),axis.text.x = element_blank(),
strip.text.x = element_blank(), strip.text.y = element_blank(), axis.text.y = element_blank(),
panel.background = element_rect(fill="#ffffff", colour="#ffffff")
)
max_index <- plot_data_temp %>%
left_join(simulated_data |> group_by(o_village, d_village) |> summarise(index=mean(index))) |>
filter(d_village %in% c("Covent Garden | Strand")) |>
filter(o_village!=d_village) |>
summarise(max=max(index), min=min(index)) |> st_drop_geometry()
index_hist <- plot_data_temp |>
left_join(simulated_data |> group_by(o_village, d_village) |> summarise(index=mean(index))) |>
filter(d_village %in% c("Covent Garden | Strand")) |>
filter(o_village!=d_village) |>
ggplot(aes(index)) +
geom_histogram(aes(fill = ..x..), colour="#616161", size=.2) +
scale_fill_distiller(
palette="PuBu", direction=1,
limits=c(max_index$min,max_index$max+.001),
guide = "none", na.value="#f7f7f7"
)+
facet_wrap(~d_village, nrow=2) +
labs(x="commute count", y="village OD count") +
theme(
panel.spacing=unit(0.015, "lines"),
axis.title.x=element_blank(),axis.title.y=element_blank(),axis.text.x = element_blank(),
strip.text.x = element_blank(), strip.text.y = element_blank(), axis.text.y = element_blank(),
panel.background = element_rect(fill="#ffffff", colour="#ffffff")
)
maps <- bikeability + jobs + pearson
hists <- index_hist + count_hist + diff_hist
ggsave(here("figs", "focus_maps.png"), plot=maps,width=10, height=3, dpi=300)
ggsave(here("figs", "focus_hists.png"), plot=hists,width=10, height=2, dpi=300)
```