Skip to content

Latest commit

 

History

History
179 lines (154 loc) · 7.35 KB

related-rows-single-query.md

File metadata and controls

179 lines (154 loc) · 7.35 KB

Returning related rows in a single SQL query using JSON

When building database-backed applications you'll often find yourself wanting to return a row from the database along with its related rows.

A few examples:

  • Retrieving a list of congressional legislators and their terms, following a foreign key relationship
  • Return blog entries and their tags in one go, via a many-to-many table

You can do this in SQLite using the json_group_array() aggregation function. A couple of examples.

Legislators and their terms, via a foreign key

Simplified schema for this database:

CREATE TABLE [legislators] (
   [id] TEXT PRIMARY KEY,
   [name] TEXT,
   [bio_birthday] TEXT
);
CREATE TABLE [legislator_terms] (
   [legislator_id] TEXT REFERENCES [legislators]([id]),
   [type] TEXT,
   [state] TEXT,
   [start] TEXT,
   [end] TEXT,
   [party] TEXT
);

Here's a query that returns each legislator along with a JSON array of their terms:

select
  legislators.id,
  legislators.name,
  json_group_array(json_object(
    'type', legislator_terms.type,
    'state', legislator_terms.state,
    'start', legislator_terms.start,
    'end', legislator_terms.end,
    'party', legislator_terms.party
   )) as terms,
   count(*) as num_terms
from
  legislators join legislator_terms on legislator_terms.legislator_id = legislators.id
  group by legislators.id
order by
  id
limit
  10

And the result:

Screenshot of a query result. There is a terms column containing a JSON list of terms.

Note that this query does group by legislators.id which is allowed in SQLite but may not work in other databases, which might require group by legislators.id, legislators.name instead.

Tags on blog entries, via a many-to-many table

Simplified schema:

CREATE TABLE [blog_entry] (
   [id] INTEGER PRIMARY KEY,
   [title] TEXT
);

CREATE TABLE [blog_tag] (
   [id] INTEGER PRIMARY KEY,
   [tag] TEXT
);

CREATE TABLE [blog_entry_tags] (
   [id] INTEGER PRIMARY KEY,
   [entry_id] INTEGER,
   [tag_id] INTEGER,
   FOREIGN KEY([entry_id]) REFERENCES [blog_entry]([id]),
   FOREIGN KEY([tag_id]) REFERENCES [blog_tag]([id])
);

Query to retrieve entries with their tags:

select
  blog_entry.id,
  blog_entry.title,
  json_group_array(json_object('tag', blog_tag.tag)) as tags
from
  blog_entry
  join blog_entry_tags on blog_entry.id = blog_entry_tags.entry_id
  join blog_tag on blog_tag.id = blog_entry_tags.tag_id
group by
  blog_entry.id
order by
  blog_entry.id desc

Result:

id title tags
8191 I don't know how to solve prompt injection [{"tag":"ai"},{"tag":"security"},{"tag":"openai"}]
8190 Weeknotes: Datasette Lite, s3-credentials, shot-scraper, datasette-edit-templates and more [{"tag":"shotscraper"},{"tag":"datasette"},{"tag":"plugins"},{"tag":"datasettelite"},{"tag":"projects"},{"tag":"s3credentials"},{"tag":"weeknotes"}]
8189 Prompt injection attacks against GPT-3 [{"tag":"ai"},{"tag":"gpt3"},{"tag":"security"},{"tag":"openai"}]

There's a subtle bug in the above: if an entry has no tags at all it will be excluded from the query results entirely.

You can fix that using left joins like this:

select
  blog_entry.id,
  blog_entry.title,
  json_group_array(json_object('tag', blog_tag.tag)) as tags
from
  blog_entry
  left join blog_entry_tags on blog_entry.id = blog_entry_tags.entry_id
  left join blog_tag on blog_tag.id = blog_entry_tags.tag_id
where blog_entry.id < 4
group by
  blog_entry.id
order by
  blog_entry.id desc

This almost works, but it outputs the following returning {"tag": null} for entries with no tags:

id title tags
3 Todo list [{"tag":null}]
2 Blogging aint easy [{"tag":null}]
1 WaSP Phase II [{"tag":null}]

David Fetter showed me the solution:

select
  blog_entry.id,
  blog_entry.title,
  json_group_array(
    json_object('tag', blog_tag.tag) 
  ) filter (
    where
      blog_tag.tag is not null
  ) as tags
from
  blog_entry
  left join blog_entry_tags on blog_entry.id = blog_entry_tags.entry_id
  left join blog_tag on blog_tag.id = blog_entry_tags.tag_id
group by
  blog_entry.id
order by
  blog_entry.id

That extra filter on the aggregation does the trick!

Other databases

Other databases are capable of the same thing, but using different functions. PostgreSQL has json_agg() for example, which is also available in Django as JSONBAgg.

Here's an equivalent query in PostgreSQL syntax:

select
  blog_entry.id,
  title,
  slug,
  created,
  coalesce(json_agg(json_build_object(blog_tag.id, blog_tag.tag)) filter (
    where
      blog_tag.tag is not null
  ), json_build_array()) as tags
from
  blog_entry
  left join blog_entry_tags on blog_entry.id = blog_entry_tags.entry_id
  left join blog_tag on blog_entry_tags.tag_id = blog_tag.id
group by
  blog_entry.id
order by
  blog_entry.id

See that running here in django-sql-dashboard.