Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Engine: Add an API to allow adding database indexes #1944

Closed
ekigamba opened this issue Mar 31, 2023 · 6 comments · Fixed by #1964
Closed

Engine: Add an API to allow adding database indexes #1944

ekigamba opened this issue Mar 31, 2023 · 6 comments · Fixed by #1964
Labels
P1 High priority issue type:enhancement New feature or request

Comments

@ekigamba
Copy link
Contributor

Is your feature request related to a problem? Please describe.
Some FHIR Engine search queries, performed on the local database, become slow when there's a lot of data. An example is a query that needs to sort data by chronological order before returning the first X from offset Y and there are 8,000 records matching the query.
The search param tables have indexes formed from multiple columns which allows for faster filtering during search but have as much impact where only one field is used such as a sort.

Describe the solution you'd like
An API that allows adding a database index. This can be considered as a migration so that it's only run once and before the database is loaded. I'm open to suggestions on the design of this API

Describe alternatives you've considered
The team has tried rewriting some of the queries to better use the indexes in the filters and sort by multiple field/columns declared as indexes. This did not provide the performance improvements The bottle neck is sorting the records and this is needed before paging the results. Performing this operation at the database level is more efficient in terms of time and space.

Additional context

  • P2P used to share FHIR data between devices which are offline sends this data chronologically to allow resumption and updates and therefore needs to sort the data by the _lastUpdated. Some of these resource types have grown in numbers and queries to fetch this data have slowed down. An example is the DateTimeIndexEntity table which stores the _lastUpdated
  • Patient records shown on registers sometimes need to be shown in a specific order. This requires sorting and might slow down when applications scale and have more data.
@ekigamba
Copy link
Contributor Author

@jingtang10 Kindly review this and get back in case you have any questions or suggestions

@santosh-pingle santosh-pingle added the type:enhancement New feature or request label Apr 3, 2023
@santosh-pingle
Copy link
Collaborator

@aditya-07 Can you please add this issue to Android Fhir Sdk project?

@jingtang10
Copy link
Collaborator

@ekigamba can you share some exampel queries?

Is _lastUpdated the only index that you need to add? What resources do you need indexed by _lastUpdated?

An alternative solution could be, if _lastUpdated is the only index you need, we could create a special cased index for last update -- I can imagine this to be very useful for many use cases. And by doing that we might avoid the need to create a whole new totally custom API.

Also, does this work #1778 allow you to add a custom search param (which we'll use in indexing)? @aditya-07 FYI.

@jingtang10 jingtang10 added the P1 High priority issue label Apr 3, 2023
@ekigamba
Copy link
Contributor Author

ekigamba commented Apr 4, 2023

@jingtang10 So far this is the query that we used on p2p

CREATE INDEX IF NOT EXISTS `index_DateTimeIndexEntity_index_from` ON `DateTimeIndexEntity` (`index_from`)

I believe this is one use-case where adding a database index will speed up queries and it would mean that there might be other areas where individual implementers might get value from this

On the custom search param, the FHIR Resource properties that we filter by are already defined FHIR search parameter

@aditya-07
Copy link
Collaborator

I believe this is one use-case where adding a database index will speed up queries and it would mean that there might be other areas where individual implementers might get value from this

@ekigamba Can you share the raw query that you use in P2P sync ?

@ekigamba
Copy link
Contributor Author

@aditya-07

Here is the query

SELECT a.serializedResource
FROM ResourceEntity a
LEFT JOIN DateIndexEntity b
ON a.resourceType = b.resourceType AND a.resourceUuid = b.resourceUuid 
LEFT JOIN DateTimeIndexEntity c
ON a.resourceType = c.resourceType AND a.resourceUuid = c.resourceUuid
WHERE a.resourceUuid IN (
SELECT resourceUuid FROM DateTimeIndexEntity
WHERE resourceType = 'Task' AND index_name = '_lastUpdated' AND index_to >= '0'
)
AND (b.index_name = '_lastUpdated' OR c.index_name = '_lastUpdated')
ORDER BY b.index_from ASC, c.index_from ASC, a.id ASC
LIMIT 25 OFFSET 0

The query plan doesn't seem to change for this specific query but based on the idea that sorting uses indexes. The query plan changes for the query below

EXPLAIN QUERY PLAN
SELECT * FROM DateTimeIndexEntity 
ORDER BY index_from ASC
LIMIT 25 OFFSET 0

Here is the first query in action

Query performance before adding the index
Screenshot from 2023-04-27 16-52-04

Adding the index
Screenshot from 2023-04-27 16-52-09

Query performance after adding the index
Screenshot from 2023-04-27 16-52-13

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 High priority issue type:enhancement New feature or request
Projects
Status: Complete
Development

Successfully merging a pull request may close this issue.

4 participants