Skip to content

Latest commit

 

History

History
54 lines (40 loc) · 4.1 KB

sort-by-number-of-json-intersections.md

File metadata and controls

54 lines (40 loc) · 4.1 KB

Sort by number of JSON intersections

This post on Reddit asked how to run a query that takes a list of items (in this case ingredients) as the input and returns all rows with at least one of those items in a JSON list, ordered by the most matches.

I jumped on this as a great opportunity to demonstrate Datasette Lite as a tool for answering SQL questions.

Creating an example database in Datasette Lite

I started with the following SQL to create the demo table:

create table recipes (id integer primary key, name text, ingredients text);
insert into recipes values (1, 'Cake', '["flour", "sugar", "eggs"]');
insert into recipes values (2, 'Pancakes', '["flour", "eggs", "butter"]');
insert into recipes values (3, 'Waffles', '["flour", "milk", "eggs"]');
insert into recipes values (4, 'Pizza', '["flour", "sugar", "eggs", "cheese"]');

I actually got GitHub Copilot to write most of that for me:

In GitHub Copilot, I start with:  create table recipes (id integer primary key, name text, ingredients text);  I type "insert into" and it autocompletes to:  insert into recipes values (1, 'Pizza', 'Tomato Sauce, Cheese, and Pizza');  I  edit that to 'Cake' instead and it fills in "flour, sugar, eggs, butter'  Then I edit that to be a JSON array instead  It completes (with a few tweaks from me) with:  insert into recipes values (2, 'Pancakes', '["flour", "sugar", "eggs", "butter"]'); insert into recipes values (3, 'Pizza', '["vegetables", "cheese", "tomatoes", "sauce"]'); insert into recipes values (4, 'Spaghetti', '["pasta", "tomatoes", "sauce"]');

I saved that to a gist and opened it in Datasette Lite with this URL:

https://lite.datasette.io/?sql=https://gist.githubusercontent.com/simonw/1f8a91123ccefd8844187225b1832d7a/raw/5069075b86aa79358fbab3d4482d1d269077d632/recipes.sql

Solving the problem

Here's the SQL recipe I came up with to solve the question:

select id, name, ingredients, (
  select json_group_array(value) from json_each(ingredients)
  where value in (select value from json_each(:p0))
) as matching_ingredients
from recipes
where json_array_length(matching_ingredients) > 0
order by json_array_length(matching_ingredients) desc

Example of that query with an input for p0 of ["sugar", "cheese"], which returns:

id name ingredients matching_ingredients
4 Pizza ["flour", "sugar", "eggs", "cheese"] ["sugar","cheese"]
1 Cake ["flour", "sugar", "eggs"] ["sugar"]

The key trick here is the bit that creates that matching_ingredients column, which uses a sub-select like this:

select json_group_array(value) from json_each(ingredients)
where value in (select value from json_each(:p0))

json_group_array(value) is an aggregation function that turns the results into a JSON array.

where value in (select value from json_each('["sugar", "cheese"]') is the bit that calculates the intersection of the two JSON arrays.

Then at the end I use json_array_length() to remove rows with no overlap and then sort with the most matching ingredients first.