Engine: Add an API to allow adding database indexes #1944

ekigamba · 2023-03-31T08:40:51Z

Is your feature request related to a problem? Please describe.
Some FHIR Engine search queries, performed on the local database, become slow when there's a lot of data. An example is a query that needs to sort data by chronological order before returning the first X from offset Y and there are 8,000 records matching the query.
The search param tables have indexes formed from multiple columns which allows for faster filtering during search but have as much impact where only one field is used such as a sort.

Describe the solution you'd like
An API that allows adding a database index. This can be considered as a migration so that it's only run once and before the database is loaded. I'm open to suggestions on the design of this API

Describe alternatives you've considered
The team has tried rewriting some of the queries to better use the indexes in the filters and sort by multiple field/columns declared as indexes. This did not provide the performance improvements The bottle neck is sorting the records and this is needed before paging the results. Performing this operation at the database level is more efficient in terms of time and space.

Additional context

P2P used to share FHIR data between devices which are offline sends this data chronologically to allow resumption and updates and therefore needs to sort the data by the _lastUpdated. Some of these resource types have grown in numbers and queries to fetch this data have slowed down. An example is the DateTimeIndexEntity table which stores the _lastUpdated
Patient records shown on registers sometimes need to be shown in a specific order. This requires sorting and might slow down when applications scale and have more data.

The text was updated successfully, but these errors were encountered:

ekigamba · 2023-03-31T08:41:21Z

@jingtang10 Kindly review this and get back in case you have any questions or suggestions

santosh-pingle · 2023-04-03T06:05:54Z

@aditya-07 Can you please add this issue to Android Fhir Sdk project?

jingtang10 · 2023-04-03T14:27:34Z

@ekigamba can you share some exampel queries?

Is _lastUpdated the only index that you need to add? What resources do you need indexed by _lastUpdated?

An alternative solution could be, if _lastUpdated is the only index you need, we could create a special cased index for last update -- I can imagine this to be very useful for many use cases. And by doing that we might avoid the need to create a whole new totally custom API.

Also, does this work #1778 allow you to add a custom search param (which we'll use in indexing)? @aditya-07 FYI.

ekigamba · 2023-04-04T12:08:22Z

@jingtang10 So far this is the query that we used on p2p

CREATE INDEX IF NOT EXISTS `index_DateTimeIndexEntity_index_from` ON `DateTimeIndexEntity` (`index_from`)

I believe this is one use-case where adding a database index will speed up queries and it would mean that there might be other areas where individual implementers might get value from this

On the custom search param, the FHIR Resource properties that we filter by are already defined FHIR search parameter

aditya-07 · 2023-04-27T07:42:23Z

I believe this is one use-case where adding a database index will speed up queries and it would mean that there might be other areas where individual implementers might get value from this

@ekigamba Can you share the raw query that you use in P2P sync ?

ekigamba · 2023-04-27T14:00:08Z

@aditya-07

Here is the query

SELECT a.serializedResource
FROM ResourceEntity a
LEFT JOIN DateIndexEntity b
ON a.resourceType = b.resourceType AND a.resourceUuid = b.resourceUuid 
LEFT JOIN DateTimeIndexEntity c
ON a.resourceType = c.resourceType AND a.resourceUuid = c.resourceUuid
WHERE a.resourceUuid IN (
SELECT resourceUuid FROM DateTimeIndexEntity
WHERE resourceType = 'Task' AND index_name = '_lastUpdated' AND index_to >= '0'
)
AND (b.index_name = '_lastUpdated' OR c.index_name = '_lastUpdated')
ORDER BY b.index_from ASC, c.index_from ASC, a.id ASC
LIMIT 25 OFFSET 0

The query plan doesn't seem to change for this specific query but based on the idea that sorting uses indexes. The query plan changes for the query below

EXPLAIN QUERY PLAN
SELECT * FROM DateTimeIndexEntity 
ORDER BY index_from ASC
LIMIT 25 OFFSET 0

Here is the first query in action

Query performance before adding the index

Adding the index

Query performance after adding the index

santosh-pingle added the type:enhancement New feature or request label Apr 3, 2023

jingtang10 added this to Android FHIR SDK Apr 3, 2023

github-project-automation bot moved this to New in Android FHIR SDK Apr 3, 2023

jingtang10 added the P1 High priority issue label Apr 3, 2023

ellykits mentioned this issue Apr 12, 2023

Add index to DateTimeIndexEntity table index_from column #1964

Merged

7 tasks

jingtang10 closed this as completed in #1964 May 30, 2023

github-project-automation bot moved this from New to Complete in Android FHIR SDK May 30, 2023

ekigamba mentioned this issue Jul 6, 2023

[P2P] Possible p2p sync slow regression opensrp/fhircore#2536

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Engine: Add an API to allow adding database indexes #1944

Engine: Add an API to allow adding database indexes #1944

ekigamba commented Mar 31, 2023

ekigamba commented Mar 31, 2023

santosh-pingle commented Apr 3, 2023

jingtang10 commented Apr 3, 2023

ekigamba commented Apr 4, 2023

aditya-07 commented Apr 27, 2023

ekigamba commented Apr 27, 2023

Engine: Add an API to allow adding database indexes #1944

Engine: Add an API to allow adding database indexes #1944

Comments

ekigamba commented Mar 31, 2023

ekigamba commented Mar 31, 2023

santosh-pingle commented Apr 3, 2023

jingtang10 commented Apr 3, 2023

ekigamba commented Apr 4, 2023

aditya-07 commented Apr 27, 2023

ekigamba commented Apr 27, 2023