-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Elastic search search_after pagination #4473
Conversation
950d3ae
to
970c92f
Compare
course_discovery/apps/course_metadata/search_indexes/constants.py
Outdated
Show resolved
Hide resolved
course_discovery/apps/edx_elasticsearch_dsl_extensions/viewsets.py
Outdated
Show resolved
Hide resolved
course_discovery/apps/edx_elasticsearch_dsl_extensions/viewsets.py
Outdated
Show resolved
Hide resolved
03fc7da
to
1563bcc
Compare
1563bcc
to
aa2cfcd
Compare
40044f3
to
162887f
Compare
162887f
to
fa954e2
Compare
86cf6bf
to
61452a1
Compare
--------- | ||
ElasticSearch enforces a strict limit on the number of records that can be retrieved in a single query. | ||
This limit is controlled by the `MAX_RESULT_WINDOW` setting, which defaults to 10,000. | ||
When this limit is exceeded, data loss occurs in responses retrieved from the `api/v1/search/all/` endpoint. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure the issue was restricted to search/all only, any endpoint using ES (like catalog endpoints) would also face the issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated ✅
|
||
To address this issue, we need a more efficient way to paginate large query results. | ||
The solution must allow for seamless and reliable pagination without imposing excessive resource demands on the system. | ||
Furthermore, it should ensure that the existing search functionality and search responses remain unaffected in the current version of the endpoint. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This particular line serves no purpose w.r.t context.
course_discovery/apps/edx_elasticsearch_dsl_extensions/viewsets.py
Outdated
Show resolved
Hide resolved
course_discovery/apps/edx_elasticsearch_dsl_extensions/viewsets.py
Outdated
Show resolved
Hide resolved
@@ -99,6 +101,41 @@ class CustomPageNumberPagination(PageNumberPagination): | |||
page_size_query_param = 'page_size' | |||
|
|||
|
|||
class SearchAfterPagination(PageNumberPagination): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we can move this to pagination.py
file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already an existing CustomPageNumberPagination
class in viewsets.py
. Ideally, both classes should be moved to pagination.py
. However, following the current pattern, I created the new pagination class in the same file for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, i think we can move both of these classes to pagination.py
@@ -77,6 +77,9 @@ class ProgramDocument(BaseDocument, OrganizationsMixin): | |||
def prepare_aggregation_key(self, obj): | |||
return 'program:{}'.format(obj.uuid) | |||
|
|||
def prepare_aggregation_uuid(self, obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[thinking] Can we add prepare_aggregation_uuid
to BaseDocument since it is the same across all the Model Documents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not the same, as the model names are different; each aggregation UUID includes the model name followed by the UUID.
9de1d21
to
d559188
Compare
course_discovery/apps/course_metadata/search_indexes/serializers/common.py
Outdated
Show resolved
Hide resolved
d559188
to
8ed1875
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a one comment to address, Overall very nice work on this 🎉
course_discovery/apps/course_metadata/search_indexes/serializers/common.py
Outdated
Show resolved
Hide resolved
8ed1875
to
a051fb2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, functionality wise. Great work 👏
@@ -0,0 +1,131 @@ | |||
from course_discovery.apps.course_metadata.search_indexes import documents |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Add Docstring please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little consistency among the docstrings in classes would be nice i.e. add docstrings for all classes except for meta. 🫡
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added ✅
@@ -0,0 +1,113 @@ | |||
import json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Please add docstring in all the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added ✅
|
||
|
||
@ddt.ddt | ||
class AggregateSearchViewSetV2Tests(mixins.LoginMixin, ElasticsearchTestMixin, mixins.APITestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little docstring with updated descriptions for classes and tests would be really nice since the unit tests are a little hard to understand by the first look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added ✅
if search_after: | ||
query_params["search_after"] = search_after | ||
response = self.client.get(self.list_path, data=query_params) | ||
assert response.status_code == 200 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optionally, please update normal assert ==
to assertEqual to make it look cool.
|
||
class Meta(CourseRunSearchDocumentSerializer.Meta): | ||
document = CourseRunDocument | ||
fields = CourseRunSearchDocumentSerializer.Meta.fields + SEARCH_INDEX_ADDITIONAL_FIELDS_V2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optionally, you can turn these into a mixin like:
class V2SerializerMixin:
"""
Mixin to extend the fields attribute in the Meta class of serializers.
"""
@staticmethod
def extend_fields(base_fields):
return base_fields + SEARCH_INDEX_ADDITIONAL_FIELDS_V2
and use it like
fields = V2SerializerMixin.extend_fields(CourseRunSearchDocumentSerializer.Meta.fields)
@@ -1,3 +1,5 @@ | |||
import json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
File Docstring plis.
a051fb2
to
7d731c9
Compare
Testing instructions:
GET /api/v2/search/all/
page_size
parameter limits the number of results returned.search_after
parameter enables pagination by returning results that appear after a specified document.Example:
GET http://localhost:18381/api/v2/search/all/?page_size=1&search_after=[1728521062000,%22program:0b41be9d-10eb-4e10-ba2e-b2e47479e7b9%22]