Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(#9237): add functionality of getting people with pagination in cht-datasource #9266

Conversation

sugat009
Copy link
Member

@sugat009 sugat009 commented Jul 16, 2024

Description

Add functionality of getting people with pagination in cht-datasource.

Issue: #9237

Usage:

Person.v1.getPage(limit, skip); // limit and skip are optional with defaults as 100 and 0 respectively

Code review checklist

  • Readable: Concise, well named, follows the style guide, documented if necessary.
  • Documented: Configuration and user documentation on cht-docs
  • Tested: Unit and/or e2e where appropriate

Compose URLs

If Build CI hasn't passed, these may 404:

License

The software is provided under AGPL-3.0. Contributions to this project are accepted under the same license.

@sugat009 sugat009 added the Type: Feature Add something new label Jul 16, 2024
@sugat009 sugat009 requested a review from jkuester July 16, 2024 08:43
@sugat009 sugat009 self-assigned this Jul 16, 2024
@sugat009 sugat009 changed the title feat(#9237): Add functionality of getting people with pagination in cht-datasource feat(#9237): add functionality of getting people with pagination in cht-datasource Jul 16, 2024
Copy link
Contributor

@jkuester jkuester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was not able to make it through everything today (just ran out of time), but want to share my thoughts so far. Will continue going through this tomorrow!

shared-libs/cht-datasource/src/local/libs/doc.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/src/local/libs/doc.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/src/local/libs/doc.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/src/local/libs/lineage.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/src/local/person.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/src/person.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/src/person.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/src/index.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/src/remote/person.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/src/local/person.ts Outdated Show resolved Hide resolved
@sugat009
Copy link
Member Author

@jkuester I did not add unit tests for this function(

export const getResources = (context: RemoteDataContext, path: string) => async <T>(
) as NodeJS 20+ has an issue in which mocking fetch is not possible. Also, existing tests that have mocked fetched are failing as well. For eg:
export const getResource = (context: RemoteDataContext, path: string) => async <T>(

References: here and here.

@sugat009 sugat009 requested a review from jkuester July 19, 2024 15:27
@jkuester
Copy link
Contributor

NodeJS 20+ has an issue in which mocking fetch is not possible. Also, existing tests that have mocked fetched are failing as well. For eg:

🤔 These existing tests are working fine for me with node 20.11.0 (and are running in CI with that same version). What version of Node are you using?

Copy link
Contributor

@jkuester jkuester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great! Got a couple other nit-picks and suggestions, but all in all we are close to done here.

shared-libs/cht-datasource/src/local/libs/lineage.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/test/local/libs/lineage.spec.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/test/local/libs/doc.spec.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/test/local/libs/doc.spec.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/src/person.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/src/index.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/test/index.spec.ts Outdated Show resolved Hide resolved
@sugat009
Copy link
Member Author

NodeJS 20+ has an issue in which mocking fetch is not possible. Also, existing tests that have mocked fetched are failing as well. For eg:

🤔 These existing tests are working fine for me with node 20.11.0 (and are running in CI with that same version). What version of Node are you using?

I'm on Node version v20.12.2

@jkuester
Copy link
Contributor

I'm on Node version v20.12.2

😓 I can reproduce the error locally when I switched to 20.12.2. Going to do some more digging to see what the best response is here....

@jkuester
Copy link
Contributor

Actually, digging in more to the issues you linked it looks like this was caused by Node changes introduced in 20.12.0 and then subsequently fixed in 20.13.1.

Since this issue is localized to a few minor Node versions and is fixed in the latest releases of 20.x and 22.x, I think it is fine to not worry about it and @sugat009 you just need to update your Node version to 20.13.1+. 😅 (I confirmed it works fine on v20.15.1.)

Copy link
Contributor

@jkuester jkuester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like CI is failing for a simple linting error. Should be an easy fix. I added a few more nit-pick comments below, but otherwise LGTM! 🚀

shared-libs/cht-datasource/src/person.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/src/person.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/src/person.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/src/person.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/test/local/libs/lineage.spec.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/test/local/libs/lineage.spec.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/test/local/person.spec.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/src/local/libs/lineage.ts Outdated Show resolved Hide resolved

const docs = await getDocsByPage([personType.contactType], limit, skip);

return docs.filter((doc): doc is Person.v1.Person => isPerson(settings, doc, doc?._id));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@m5r @sugat009 I have realized a complication here while looking at the code over in #9281. Basically, we pass the limit through to the Couch view query and so we know docs.length === limit or we have reached the last page of results. However, before this function returns the persons, we are filtering the array. In theory it is possible that some of the results returned from queryDocsByKey could get filtered out (with our current validation logic I cannot come up with a feasible scenario where this could happen, but if/when we add more validations, this will become an issue).

The reason I think this filtering is a problem is because it could result in us giving back an array for the page that is < limit when there are still additional results to retrieve. Of course, this is bad for our getAll logic, but it will also probably be confusing to any other consumers of this code. (How would anyone know when to stop calling getPage?).

The only feasible option I can come up with here is to change the return type to Promise<Nullable<Person.v1.Person>[]> and just pass through null when a returned doc fails the isPerson check. If we try to do anything more fancy, like calling the queryDocsByKey again until we get the full limit of persons, it is going to mess up the skip value for any subsequent calls from the consumer... The only comfort I have here is that we do not really want folks directly calling this function anyway (and the getAll can still abstract away the Nullable). The REST api is going to be a bit janky, but I just don't see any other way around it.

Do either of you have other thoughts here? Ideas on better options?

Copy link
Member Author

@sugat009 sugat009 Jul 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I'm surprised that we even need a isPerson validator since we explicitly pass a valid personType to the database. Anyways, @jkuester is right, if in some way an entry is filtered by the isPerson function then the sanctity of the pagination would be disturbed, so let's have a null value in the place of the non-person doc.

As an alternative approach, instead of setting the document entry to null, we could introduce an invalid property or similar indicator. This would signify that the person document contains incorrect or problematic data, without completely removing the entry and let the consumers decide what to do with it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised that we even need a isPerson validator since we explicitly pass a valid personType to the database

This is true. I have gone back and forth on if we should keep the isPerson check since all it is doing currently is validating the contact type (which, in theory should always be correct since that is what we are querying one...). However, regardless of if we need to do any checks like this currently, I do think it is very important that we design these API patterns in such a way that we can filter data we get from Couch before returning it to the user. One of the main points of having cht-datasource is that not everything needs to be modeled directly in the db.... With that in mind, and seeing as the performance cost of isPerson is minimal, it makes sense to me to keep it for the sake of code consistency (and better support for future validations).

the sanctity of the pagination would be disturbed

Quote of the day! 🔥

we could introduce an invalid property or similar indicator

This is an interesting idea! 🤔 Similar to the Pouch API which returns the row objects with id value, etc, but may or may not actually have a doc. Ultimately, though, it seems like it might not actually be useful to consumers of this API (and anyone trying to do data validation will probably want to connect directly to the database).


Another approach we could take here would be to return some kind of page object like this:

Promise<{ entries: Person.v1.Person[], cursor: string }>

This is similar to the paging in GitHub's GraphQL api, where each request returns a different cursor that you can pass back in on subsequent requests to get the next page. In our case we could just return the proper skip value for the next Couch query, but the meaning/value of the cursor should be completely abstracted away from consumers of the cht-datasource api. Consumers can just treat it as a token string for getting the next page. That would give us flexibility in the future if we needed to change up some underlying details of how things work. In the context of the above discussion regarding isPerson, this approach would let us run followup queries to populate a full list of persons to return so that entries.length === limit while returning the proper skip value for the next page request as the cursor.

I don't know guys, the more I think about this approach, the more I like it! It decouples us more from the underlying Couch implementation, which I think is a win, giving us more flexibility for future innovation....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One variation on this new page object proposal would be to include a hasNextPage: boolean in the returned page object so that consumers know when more data is available. Then, we could clarify that the provided limit would be the max value possible for entries, (but some pages might be returned with less). This would free us from having to implement any kind of followup queries in the getPage logic (at least at this time). We could run the Couch query, do any necessary filtering, and return what is left.

Copy link
Member Author

@sugat009 sugat009 Jul 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkuester Cursor-based pagination sounds great and would help if any record is filtered by the isPerson or any future filters. As for the implementation, should we do it via:

  1. overfetching method where we fetch more docs than the limit specifies such that if records are filtered out then, the over-fetched docs can fill in
  2. re-fetching method where we re-query the database with the amount of docs that were filtered out.

Usually, it is said to be better to minimize the no. of queries to the database which is even more true in our case. 😃 So, my vote is in the first method. What do you think?

Looping in @m5r .

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read through the linked sites by @m5r and tried a bit on our couchdb view(medic-client/contacts_by_type). The startkey as a cursor approach does not seem to work for the view as all the keys are person types. We probably need to create a new view designed specifically for pagination.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not tried it yet, but at the end of that Couch guide, it notes that for views where the key is not unique, you will need to use a combination of startkey, startkey_docid, and limit. The startkey_docid is what will actual do the heavy lifting for our contacts_by_type query since we are not actually looking for a range of keys...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point Josh, the fact that the key is the same is what enables couch to do the pagination here. I gave it a try locally and it should do exactly what we need:

Querying the view for persons:

$ curl "http://admin:password@localhost:5984/medic/_design/medic-client/_view/contacts_by_type?key=\[\"person\"\]"
{"total_rows":6,"offset":3,"rows":[
{"id":"3e93a66c-06c9-4766-b68b-eaec896ee935","key":["person"],"value":"false false 3 grootchw"},
{"id":"6892b0eb-7398-47a9-8e92-d40d3012f9f6","key":["person"],"value":"false false 3 grootvisor"},
{"id":"de324069-da24-4ffa-89f3-f7f67b859d70","key":["person"],"value":"false false 3 grootpatient"}
]}

Then using the person with id 6892b0eb-7398-47a9-8e92-d40d3012f9f6 as cursor:

$ curl "http://admin:password@localhost:5984/medic/_design/medic-client/_view/contacts_by_type?key=\[\"person\"\]&startkey_docid=6892b0eb-7398-47a9-8e92-d40d3012f9f6"
{"total_rows":6,"offset":4,"rows":[
{"id":"6892b0eb-7398-47a9-8e92-d40d3012f9f6","key":["person"],"value":"false false 3 grootvisor"},
{"id":"de324069-da24-4ffa-89f3-f7f67b859d70","key":["person"],"value":"false false 3 grootpatient"}
]}

Then limiting the number of rows that I get back:

$ curl "http://admin:password@localhost:5984/medic/_design/medic-client/_view/contacts_by_type?key=\[\"person\"\]&startkey_docid=6892b0eb-7398-47a9-8e92-d40d3012f9f6&limit=1"
{"total_rows":6,"offset":4,"rows":[
{"id":"6892b0eb-7398-47a9-8e92-d40d3012f9f6","key":["person"],"value":"false false 3 grootvisor"}
]}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh! Gotcha. I was under the impression that the key also needed to be a uuid. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so just recapping further discussion from the Slack Thread, it turns out that Pouch does not have any support for the startkey_docid parameter. 😓 Also from reading the Couch docs it seems like the performance for skip significantly improved after Couch 1.2 (I guess the guide linked above was old?). The Pouch Docs still note that skip has "poor performance on IndexedDB/LevelDB", but there does not seem to be any other way to actually page queries with Pouch, so... 🤷

So the decision is to stick with just using skip to page the queries for now. We should still do the refactor to return the skip value as a generic cursor to consumers instead of actually exposing skip. That will let us passively refactor our implementation once pouchdb/pouchdb#7326 is completed.

@sugat009 sugat009 merged commit b20dc22 into 9193-api-endpoints-for-getting-contacts-by-type Aug 9, 2024
18 of 40 checks passed
@sugat009 sugat009 deleted the 9237-add-functionality-of-getting-people-with-pagination-in-cht-datasource branch August 9, 2024 13:33
sugat009 added a commit that referenced this pull request Aug 12, 2024
sugat009 added a commit that referenced this pull request Aug 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Feature Add something new
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants