Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(#9237): add functionality of getting people with pagination in cht-datasource #9266
feat(#9237): add functionality of getting people with pagination in cht-datasource #9266
Changes from 10 commits
08e2b27
ab92f0b
de63197
105d5db
2b92b4c
ba2e27a
15068fa
a83a259
29a3ad5
8fbc245
a99de72
d87c258
eceeb75
1aab0e3
54d04cb
dcc3758
e800b1a
34b466c
4facbad
33d13fd
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@m5r @sugat009 I have realized a complication here while looking at the code over in #9281. Basically, we pass the
limit
through to the Couch view query and so we knowdocs.length === limit
or we have reached the last page of results. However, before this function returns the persons, we are filtering the array. In theory it is possible that some of the results returned fromqueryDocsByKey
could get filtered out (with our current validation logic I cannot come up with a feasible scenario where this could happen, but if/when we add more validations, this will become an issue).The reason I think this filtering is a problem is because it could result in us giving back an array for the page that is
< limit
when there are still additional results to retrieve. Of course, this is bad for ourgetAll
logic, but it will also probably be confusing to any other consumers of this code. (How would anyone know when to stop callinggetPage
?).The only feasible option I can come up with here is to change the return type to
Promise<Nullable<Person.v1.Person>[]>
and just pass throughnull
when a returned doc fails theisPerson
check. If we try to do anything more fancy, like calling thequeryDocsByKey
again until we get the full limit of persons, it is going to mess up theskip
value for any subsequent calls from the consumer... The only comfort I have here is that we do not really want folks directly calling this function anyway (and thegetAll
can still abstract away theNullable
). The REST api is going to be a bit janky, but I just don't see any other way around it.Do either of you have other thoughts here? Ideas on better options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, I'm surprised that we even need a
isPerson
validator since we explicitly pass a validpersonType
to the database. Anyways, @jkuester is right, if in some way an entry is filtered by theisPerson
function then the sanctity of the pagination would be disturbed, so let's have anull
value in the place of the non-person doc.As an alternative approach, instead of setting the document entry to null, we could introduce an invalid property or similar indicator. This would signify that the
person
document contains incorrect or problematic data, without completely removing the entry and let the consumers decide what to do with it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true. I have gone back and forth on if we should keep the
isPerson
check since all it is doing currently is validating the contact type (which, in theory should always be correct since that is what we are querying one...). However, regardless of if we need to do any checks like this currently, I do think it is very important that we design these API patterns in such a way that we can filter data we get from Couch before returning it to the user. One of the main points of having cht-datasource is that not everything needs to be modeled directly in the db.... With that in mind, and seeing as the performance cost ofisPerson
is minimal, it makes sense to me to keep it for the sake of code consistency (and better support for future validations).Quote of the day! 🔥
This is an interesting idea! 🤔 Similar to the Pouch API which returns the row objects with
id
value, etc, but may or may not actually have adoc
. Ultimately, though, it seems like it might not actually be useful to consumers of this API (and anyone trying to do data validation will probably want to connect directly to the database).Another approach we could take here would be to return some kind of page object like this:
This is similar to the paging in GitHub's GraphQL api, where each request returns a different
cursor
that you can pass back in on subsequent requests to get the next page. In our case we could just return the properskip
value for the next Couch query, but the meaning/value of thecursor
should be completely abstracted away from consumers of the cht-datasource api. Consumers can just treat it as a token string for getting the next page. That would give us flexibility in the future if we needed to change up some underlying details of how things work. In the context of the above discussion regardingisPerson
, this approach would let us run followup queries to populate a full list of persons to return so thatentries.length === limit
while returning the properskip
value for the next page request as thecursor
.I don't know guys, the more I think about this approach, the more I like it! It decouples us more from the underlying Couch implementation, which I think is a win, giving us more flexibility for future innovation....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One variation on this new page object proposal would be to include a
hasNextPage: boolean
in the returned page object so that consumers know when more data is available. Then, we could clarify that the providedlimit
would be the max value possible forentries
, (but some pages might be returned with less). This would free us from having to implement any kind of followup queries in thegetPage
logic (at least at this time). We could run the Couch query, do any necessary filtering, and return what is left.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkuester Cursor-based pagination sounds great and would help if any record is filtered by the
isPerson
or any future filters. As for the implementation, should we do it via:overfetching
method where we fetch more docs than the limit specifies such that if records are filtered out then, the over-fetched docs can fill inre-fetching
method where we re-query the database with the amount of docs that were filtered out.Usually, it is said to be better to minimize the no. of queries to the database which is even more true in our case. 😃 So, my vote is in the first method. What do you think?
Looping in @m5r .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read through the linked sites by @m5r and tried a bit on our couchdb view(
medic-client/contacts_by_type
). Thestartkey
as a cursor approach does not seem to work for the view as all the keys are person types. We probably need to create a new view designed specifically for pagination.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not tried it yet, but at the end of that Couch guide, it notes that for views where the key is not unique, you will need to use a combination of
startkey
,startkey_docid
, andlimit
. Thestartkey_docid
is what will actual do the heavy lifting for our contacts_by_type query since we are not actually looking for a range of keys...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point Josh, the fact that the
key
is the same is what enables couch to do the pagination here. I gave it a try locally and it should do exactly what we need:Querying the view for persons:
Then using the person with id
6892b0eb-7398-47a9-8e92-d40d3012f9f6
as cursor:Then limiting the number of rows that I get back:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh! Gotcha. I was under the impression that the
key
also needed to be a uuid. Thanks!There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so just recapping further discussion from the Slack Thread, it turns out that Pouch does not have any support for the
startkey_docid
parameter. 😓 Also from reading the Couch docs it seems like the performance forskip
significantly improved after Couch1.2
(I guess the guide linked above was old?). The Pouch Docs still note thatskip
has "poor performance on IndexedDB/LevelDB", but there does not seem to be any other way to actually page queries with Pouch, so... 🤷So the decision is to stick with just using
skip
to page the queries for now. We should still do the refactor to return the skip value as a genericcursor
to consumers instead of actually exposingskip
. That will let us passively refactor our implementation once pouchdb/pouchdb#7326 is completed.