-
-
Notifications
You must be signed in to change notification settings - Fork 219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(#9237): add functionality of getting people with pagination in cht-datasource #9266
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was not able to make it through everything today (just ran out of time), but want to share my thoughts so far. Will continue going through this tomorrow!
Co-authored-by: Joshua Kuestersteffen <[email protected]>
@jkuester I did not add unit tests for this function(
fetch is not possible. Also, existing tests that have mocked fetched are failing as well. For eg:
References: here and here. |
🤔 These existing tests are working fine for me with node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great! Got a couple other nit-picks and suggestions, but all in all we are close to done here.
I'm on Node version v20.12.2 |
Co-authored-by: Joshua Kuestersteffen <[email protected]>
Co-authored-by: Joshua Kuestersteffen <[email protected]>
Co-authored-by: Joshua Kuestersteffen <[email protected]>
Co-authored-by: Joshua Kuestersteffen <[email protected]>
Co-authored-by: Joshua Kuestersteffen <[email protected]>
Co-authored-by: Joshua Kuestersteffen <[email protected]>
😓 I can reproduce the error locally when I switched to |
Actually, digging in more to the issues you linked it looks like this was caused by Node changes introduced in Since this issue is localized to a few minor Node versions and is fixed in the latest releases of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like CI is failing for a simple linting error. Should be an easy fix. I added a few more nit-pick comments below, but otherwise LGTM! 🚀
|
||
const docs = await getDocsByPage([personType.contactType], limit, skip); | ||
|
||
return docs.filter((doc): doc is Person.v1.Person => isPerson(settings, doc, doc?._id)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@m5r @sugat009 I have realized a complication here while looking at the code over in #9281. Basically, we pass the limit
through to the Couch view query and so we know docs.length === limit
or we have reached the last page of results. However, before this function returns the persons, we are filtering the array. In theory it is possible that some of the results returned from queryDocsByKey
could get filtered out (with our current validation logic I cannot come up with a feasible scenario where this could happen, but if/when we add more validations, this will become an issue).
The reason I think this filtering is a problem is because it could result in us giving back an array for the page that is < limit
when there are still additional results to retrieve. Of course, this is bad for our getAll
logic, but it will also probably be confusing to any other consumers of this code. (How would anyone know when to stop calling getPage
?).
The only feasible option I can come up with here is to change the return type to Promise<Nullable<Person.v1.Person>[]>
and just pass through null
when a returned doc fails the isPerson
check. If we try to do anything more fancy, like calling the queryDocsByKey
again until we get the full limit of persons, it is going to mess up the skip
value for any subsequent calls from the consumer... The only comfort I have here is that we do not really want folks directly calling this function anyway (and the getAll
can still abstract away the Nullable
). The REST api is going to be a bit janky, but I just don't see any other way around it.
Do either of you have other thoughts here? Ideas on better options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, I'm surprised that we even need a isPerson
validator since we explicitly pass a valid personType
to the database. Anyways, @jkuester is right, if in some way an entry is filtered by the isPerson
function then the sanctity of the pagination would be disturbed, so let's have a null
value in the place of the non-person doc.
As an alternative approach, instead of setting the document entry to null, we could introduce an invalid property or similar indicator. This would signify that the person
document contains incorrect or problematic data, without completely removing the entry and let the consumers decide what to do with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised that we even need a isPerson validator since we explicitly pass a valid personType to the database
This is true. I have gone back and forth on if we should keep the isPerson
check since all it is doing currently is validating the contact type (which, in theory should always be correct since that is what we are querying one...). However, regardless of if we need to do any checks like this currently, I do think it is very important that we design these API patterns in such a way that we can filter data we get from Couch before returning it to the user. One of the main points of having cht-datasource is that not everything needs to be modeled directly in the db.... With that in mind, and seeing as the performance cost of isPerson
is minimal, it makes sense to me to keep it for the sake of code consistency (and better support for future validations).
the sanctity of the pagination would be disturbed
Quote of the day! 🔥
we could introduce an invalid property or similar indicator
This is an interesting idea! 🤔 Similar to the Pouch API which returns the row objects with id
value, etc, but may or may not actually have a doc
. Ultimately, though, it seems like it might not actually be useful to consumers of this API (and anyone trying to do data validation will probably want to connect directly to the database).
Another approach we could take here would be to return some kind of page object like this:
Promise<{ entries: Person.v1.Person[], cursor: string }>
This is similar to the paging in GitHub's GraphQL api, where each request returns a different cursor
that you can pass back in on subsequent requests to get the next page. In our case we could just return the proper skip
value for the next Couch query, but the meaning/value of the cursor
should be completely abstracted away from consumers of the cht-datasource api. Consumers can just treat it as a token string for getting the next page. That would give us flexibility in the future if we needed to change up some underlying details of how things work. In the context of the above discussion regarding isPerson
, this approach would let us run followup queries to populate a full list of persons to return so that entries.length === limit
while returning the proper skip
value for the next page request as the cursor
.
I don't know guys, the more I think about this approach, the more I like it! It decouples us more from the underlying Couch implementation, which I think is a win, giving us more flexibility for future innovation....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One variation on this new page object proposal would be to include a hasNextPage: boolean
in the returned page object so that consumers know when more data is available. Then, we could clarify that the provided limit
would be the max value possible for entries
, (but some pages might be returned with less). This would free us from having to implement any kind of followup queries in the getPage
logic (at least at this time). We could run the Couch query, do any necessary filtering, and return what is left.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkuester Cursor-based pagination sounds great and would help if any record is filtered by the isPerson
or any future filters. As for the implementation, should we do it via:
overfetching
method where we fetch more docs than the limit specifies such that if records are filtered out then, the over-fetched docs can fill inre-fetching
method where we re-query the database with the amount of docs that were filtered out.
Usually, it is said to be better to minimize the no. of queries to the database which is even more true in our case. 😃 So, my vote is in the first method. What do you think?
Looping in @m5r .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read through the linked sites by @m5r and tried a bit on our couchdb view(medic-client/contacts_by_type
). The startkey
as a cursor approach does not seem to work for the view as all the keys are person types. We probably need to create a new view designed specifically for pagination.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not tried it yet, but at the end of that Couch guide, it notes that for views where the key is not unique, you will need to use a combination of startkey
, startkey_docid
, and limit
. The startkey_docid
is what will actual do the heavy lifting for our contacts_by_type query since we are not actually looking for a range of keys...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point Josh, the fact that the key
is the same is what enables couch to do the pagination here. I gave it a try locally and it should do exactly what we need:
Querying the view for persons:
$ curl "http://admin:password@localhost:5984/medic/_design/medic-client/_view/contacts_by_type?key=\[\"person\"\]"
{"total_rows":6,"offset":3,"rows":[
{"id":"3e93a66c-06c9-4766-b68b-eaec896ee935","key":["person"],"value":"false false 3 grootchw"},
{"id":"6892b0eb-7398-47a9-8e92-d40d3012f9f6","key":["person"],"value":"false false 3 grootvisor"},
{"id":"de324069-da24-4ffa-89f3-f7f67b859d70","key":["person"],"value":"false false 3 grootpatient"}
]}
Then using the person with id 6892b0eb-7398-47a9-8e92-d40d3012f9f6
as cursor:
$ curl "http://admin:password@localhost:5984/medic/_design/medic-client/_view/contacts_by_type?key=\[\"person\"\]&startkey_docid=6892b0eb-7398-47a9-8e92-d40d3012f9f6"
{"total_rows":6,"offset":4,"rows":[
{"id":"6892b0eb-7398-47a9-8e92-d40d3012f9f6","key":["person"],"value":"false false 3 grootvisor"},
{"id":"de324069-da24-4ffa-89f3-f7f67b859d70","key":["person"],"value":"false false 3 grootpatient"}
]}
Then limiting the number of rows that I get back:
$ curl "http://admin:password@localhost:5984/medic/_design/medic-client/_view/contacts_by_type?key=\[\"person\"\]&startkey_docid=6892b0eb-7398-47a9-8e92-d40d3012f9f6&limit=1"
{"total_rows":6,"offset":4,"rows":[
{"id":"6892b0eb-7398-47a9-8e92-d40d3012f9f6","key":["person"],"value":"false false 3 grootvisor"}
]}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh! Gotcha. I was under the impression that the key
also needed to be a uuid. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so just recapping further discussion from the Slack Thread, it turns out that Pouch does not have any support for the startkey_docid
parameter. 😓 Also from reading the Couch docs it seems like the performance for skip
significantly improved after Couch 1.2
(I guess the guide linked above was old?). The Pouch Docs still note that skip
has "poor performance on IndexedDB/LevelDB", but there does not seem to be any other way to actually page queries with Pouch, so... 🤷
So the decision is to stick with just using skip
to page the queries for now. We should still do the refactor to return the skip value as a generic cursor
to consumers instead of actually exposing skip
. That will let us passively refactor our implementation once pouchdb/pouchdb#7326 is completed.
Co-authored-by: Joshua Kuestersteffen <[email protected]>
Co-authored-by: Joshua Kuestersteffen <[email protected]>
Co-authored-by: Joshua Kuestersteffen <[email protected]>
Co-authored-by: Joshua Kuestersteffen <[email protected]>
b20dc22
into
9193-api-endpoints-for-getting-contacts-by-type
Description
Add functionality of getting people with pagination in
cht-datasource
.Issue: #9237
Usage:
Code review checklist
Compose URLs
If Build CI hasn't passed, these may 404:
License
The software is provided under AGPL-3.0. Contributions to this project are accepted under the same license.