Skip to content

Commit

Permalink
Merge branch 'master' into 27-count-subscribers-single-region
Browse files Browse the repository at this point in the history
  • Loading branch information
jc-harrison authored Apr 14, 2020
2 parents 6ca1476 + 627f204 commit 9d8be92
Show file tree
Hide file tree
Showing 5 changed files with 100 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,8 @@ This repository currently contains SQL code and descriptions for the following a
- [Aggregate 7: Total number of calls per region per day](aggregate_7.md)
- [Aggregate 8: Home location counts per region](aggregate_8.md)
- [Count of subscribers that are seen only in one region per region per day](count_subscribers_single_region.md)
- [Trips between consecutive locations per day](od_matrix_directed_consecutive_pairs.md)
- [Static resident counts per region per day](count_subscribers_home_region_per_day.md)

## Privacy

Expand Down
21 changes: 21 additions & 0 deletions count_subscribers_home_region_per_day.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# Static residents count per region per day

## What is this?

This is the total number of subscribers who are not seen outside of their 'home location' in each region. See [intermediate_queries.md](intermediate_queries.md) for a definition of 'home location'.

## How to produce the aggregate

You can find the SQL code for producing this aggregate in [count_subscribers_home_region_per_day.sql](count_subscribers_home_region_per_day.sql).

To produce this aggregate, you need to run a sequence of queries in the following order. These are:

1. Home locations for all subscribers - [`home_locations`](intermediate_queries.sql#L5-L44)
See description under [Intermediate queries](intermediate_queries.md), and SQL code in [intermediate_queries.sql](intermediate_queries.sql).

2. Count of static residents per region per day - [`count_subscribers_home_region_per_day`](count_subscribers_home_region_per_day.sql#L5-L22)
_Description_: This query counts the number of subscribers who are only seen at their home location (as calculated in the previous query) in each region on each day.

## Usage and interpretation

This can be useful in combination with other aggregates. For example, counts of static subscribers per region can be scaled using home location counts to get proportions of residents who are adhering to mobility restrictions per region.
26 changes: 26 additions & 0 deletions count_subscribers_home_region_per_day.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
-- This Source Code Form is subject to the terms of the Mozilla Public
-- License, v. 2.0. If a copy of the MPL was not distributed with this
-- file, You can obtain one at http://mozilla.org/MPL/2.0/.

CREATE TABLE count_subscribers_home_region_per_day
AS SELECT region,
call_date AS count_date,
count(*) AS subscriber_count
FROM (SELECT msisdn,
call_date
FROM home_locations -- See intermediate_queries.sql for code to create the home_locations table
INNER JOIN (SELECT calls.msisdn,
calls.call_date,
cells.region
FROM calls
INNER JOIN cells ON calls.location_id = cells.cell_id
WHERE (calls.call_date >= '2020-02-01')
AND (calls.call_date <= CURRENT_DATE)
GROUP BY call_date, msisdn, region) AS locs USING (msisdn)
GROUP BY msisdn, call_date
HAVING sum(((locs.region <> home_locations.region))::integer) = 0) AS at_home
INNER JOIN home_locations USING (msisdn)
GROUP BY region, call_date
HAVING count(*) >= 15
ORDER BY call_date,
region;
19 changes: 19 additions & 0 deletions od_matrix_directed_consecutive_pairs.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
# Trips between consecutive locations per day

## What is this?

This is the number of ‘_consecutive_ directional connections’ between each pair of regions, each day. This is defined for each pair of regions as the number of unique subscribers that have visited the first region within a day, and next visited the second region that same day.

## How to produce the aggregate

You can find the SQL code for producing this aggregate in [od_matrix_directed_consecutive_pairs.sql](od_matrix_directed_consecutive_pairs.sql).

The query [`od_matrix_directed_consecutive_pairs`](od_matrix_directed_consecutive_pairs.sql#L5-30) is a standalone query which can be run by itself to produce the aggregate.

The first time you run this, you will need to include a timespan of data that includes the period before any mobility restrictions were enforced in your country, or before the first cases of COVID-19 were reported in your country. This is so that you can establish what ‘normal’ baseline behaviour looks like, and then see how this behaviour changed. We recommend that you include at least two weeks of ‘normal’ baseline data (i.e. the two weeks immediately before the announcement of restrictions or the outbreak), and preferably four weeks.

Once you have this baseline data, you can then run the query once every day, only looking at a single day’s data (yesterday).

## Usage and interpretation

This is similar to [Aggregate 6](aggregate_6.md), but includes the number of subscribers who 'stayed' within a region (i.e. they remained there long enough make more than one call). For example, the count for `region_from=A` and `region_to=A` is the number of unique subscribers who had a stay in region A on that day. In addition, and unlike [Aggregate 6](aggregate_6.md), a subscriber who called from region A, then B, then C would be counted in movements from A->B, and B->C but _not_ A->C.
32 changes: 32 additions & 0 deletions od_matrix_directed_consecutive_pairs.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
-- This Source Code Form is subject to the terms of the Mozilla Public
-- License, v. 2.0. If a copy of the MPL was not distributed with this
-- file, You can obtain one at http://mozilla.org/MPL/2.0/.

CREATE TABLE od_matrix_directed_consecutive_pairs
AS WITH located AS (SELECT msisdn,
region,
call_date,
row_number() OVER (PARTITION BY msisdn, call_date
ORDER BY call_datetime ASC) AS rank
FROM calls
INNER JOIN cells ON calls.location_id = cells.cell_id
WHERE (calls.call_date >= '2020-03-01')
AND (calls.call_date <= CURRENT_DATE))

SELECT call_date,
region_from,
region_to,
count(*)
FROM (SELECT source.msisdn,
source.call_date,
source.region AS region_from,
sink.region AS region_to
FROM located AS source
INNER JOIN (SELECT msisdn,
region,
call_date,
rank - 1 AS rank
FROM located) AS sink USING (msisdn, call_date, rank)
GROUP BY msisdn, call_date, region_from, region_to) AS joined
GROUP BY call_date, region_from, region_to
HAVING count(*) >= 15;

0 comments on commit 9d8be92

Please sign in to comment.