I noticed this morning that one of my Datasette installations had the Geopoly SQLite extension enabled. I don't know how it got there - it has to be compiled specifically - but since it was there I decided to try it out.
The clue that it was enabled was seeing ENABLE_GEOPOLY
listed in compile options on the /-/versions
page.
Geopoly supports "a small subset of GeoJSON". It can only handle polygons, and they have to be "simple" polygons for which the boundary does not intersect itself.
It took me a bit of work to populate a database table. I started by exporting GeoJSON for North and South America from this page:
Here's the data I exported as a Gist.
To load that into SQLite I started by installing the datasette-write plugin and starting Datasette as root:
datasette install datasette-write
datasette --create data.db --root -o
Then I visited http://localhost:8001/-/write
and created a raw_data
table and inserted the GeoJSON:
create table raw_data (text);
I had to replace any single '
characters with ''
in the JSON to escape them, then I pasted in the following:
insert into raw_data (text) values ('{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",...');
This gave me a populated raw_data
table with a blob of JSON I could start playing with.
This next bit was really tricky.
I started by creating a new table for my countries:
create virtual table countries using geopoly(name, properties);
My GeoJSON data contained two types of object: polygons and multipolygons.
Neither of these can be inserted directly into a Geopoly table.
GeoJSON Polygons exist as a list, where the first item in the list is the shape and subsequent items are "holes" within that shape (lakes and suchlike).
Multipolygons are a list of polygons, for example the USA has separate polygons for Alaska and Hawaii and more.
Geopoly can only handle a simple polygon - so I needed to pick a single polygon for each of my countries.
After some experimenting, I came up with the following write query to populate the table with valid polygons:
insert into countries (name, properties, _shape)
select
json_extract(value, '$.properties.name') as name,
json_extract(value, '$.properties') as properties,
json_extract(value, '$.geometry.coordinates[0]') as _shape
from json_each(text, '$.features'), raw_data
where geopoly_blob(_shape) is not null
There are a few tricks going on here.
from json_each(text, '$.features'), raw_data
loops through every feature in the$.features
list inside thetext
field in thatraw_data
table I created earlier. The join there is a little unintuitive but it works for processing JSON in a table with a single row. Each item in that array is available asvalue
in the rest of the query.json_extract(value, '$.properties.name')
extracts thename
property from theproperties
object inside the GeoJSON feature - this is the name of the country.json_extract(value, '$.properties')
extracts the fullproperties
object as a JSON string.json_extract(value, '$.geometry.coordinates[0]')
extracts the first item from thecoordinates
array inside the GeoJSON geometry object. As discussed earlier, I decided to ignore the "holes" in each polygon and just store the outer-most shape.where geopoly_blob(_shape) is not null
is a trick I found to filter out invalid polygons - without this the entire query failed with an unhelpful SQL logic error.
This query turned out to do the right thing for all of the countries with a single polygon - but it skipped countries like the USA and Canada which used a multipolygon.
I eventually figured out I could import those countries as well with a second query:
insert into countries (name, properties, _shape)
select
json_extract(value, '$.properties.name') as name,
json_extract(value, '$.properties') as properties,
json_extract(value, '$.geometry.coordinates[0][0]') as _shape
from json_each(text, '$.features'), raw_data
where json_extract(value, '$.geometry.type') = 'MultiPolygon';
Here I'm using $.geometry.coordinates[0][0]
to extract that outer shape from the first polygon in the multipolygon.
This worked! Having run the above two queries I had a fully populated countries
table with 31 rows.
I can now use the geopoly_contains_point()
function to find which country a point is in:
select
name, properties
from
countries
where
geopoly_contains_point(_shape, -84.1067362, 9.9314029)
The latitude and longitude of San José, Costa Rica is 9.9314029, -84.1067362. This query returns Costa Rica.
I installed the datasette-geojson-map plugin by Chris Amico and used it to plot my countries on a map:
datasette install datasette-geojson-map
Then in the SQL interface:
select
name,
'{"coordinates": ['
|| geopoly_json(_shape) ||
'], "type": "Polygon"}' as geometry
from
countries
Here I'm using the geopoly_json(_shape)
function to turn the binary representation of the shape in the database back into a GeoJSON polygon. Then I'm concatenating that with a little bit of wrapping JSON to create a full GeoJSON feature that looks like this (coordinates truncated):
{"coordinates": [[[-82.5462,9.56613],[-82.9329,9.47681]]], "type": "Polygon"}
The plugin renders any GeoJSON in a column called geometry
. The output looked like this:
I just spotted that Argentina is missing from that map, presumably because it's a multipolygon and the first polygon in that sequence isn't the largest polygon covering that country.
The obvious problem with this approach is that I've over-simplified the polygons: the USA is missing two whole states and a bunch of territories, and other countries have been simplified in similar ways.
I think there's a reasonable way to handle this, with a bit more work. The trick would be to represent each country as multiple rows in the database, each row corresponding to one of the polygons in the multipolygon.
This could even be extended to handle holes in regular polygons, by running queries that can consider multiple rows and identify if a point falls within a hole and hence should not be considered as part of a country.