Elastic search search_after pagination #4473

hamza-56 · 2024-10-23T09:16:46Z

Testing instructions:

GET /api/v2/search/all/

The page_size parameter limits the number of results returned per request.
The search_after parameter enables pagination by returning the results that follow a specified document in the sort order.

Example:

GET http://localhost:18381/api/v2/search/all/?page_size=1&search_after=[1728521062000,%22program:0b41be9d-10eb-4e10-ba2e-b2e47479e7b9%22]
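For anyone scripting the test request, here is a minimal sketch of building that URL from Python. It assumes the endpoint accepts search_after as a JSON-encoded array, as the example above suggests. The helper name build_search_url is hypothetical and not part of this PR.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(base_url, page_size, search_after=None):
    """Return the URL for one page of results (hypothetical helper)."""
    params = {"page_size": page_size}
    if search_after is not None:
        # Assumption: the server expects a compact JSON array here.
        # urlencode percent-encodes the brackets, quotes, commas and colons.
        params["search_after"] = json.dumps(search_after, separators=(",", ":"))
    return f"{base_url}?{urlencode(params)}"


cursor = [1728521062000, "program:0b41be9d-10eb-4e10-ba2e-b2e47479e7b9"]
url = build_search_url("http://localhost:18381/api/v2/search/all/", 1, cursor)

# Round-trip check: decoding the query string recovers the original cursor.
query = parse_qs(urlsplit(url).query)
decoded_cursor = json.loads(query["search_after"][0])
```

The generated URL percent-encodes more characters than the hand-written example, but it decodes to the same two sort values.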

DawoudSheraz · 2024-12-13T17:36:48Z

docs/decisions/0030-use-elasticsearch-search-after.rst

+---------
+ElasticSearch enforces a strict limit on the number of records that can be retrieved in a single query. 
+This limit is controlled by the `MAX_RESULT_WINDOW` setting, which defaults to 10,000. 
+When this limit is exceeded, data loss occurs in responses retrieved from the `api/v1/search/all/` endpoint. 


I'm not sure the issue was restricted to search/all only; any endpoint using ES (such as the catalog endpoints) would face the same issue.

Updated ✅
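To make the limit discussed in this thread concrete, here is a rough sketch of when a page-number request crosses the result window. It assumes 1-indexed pages and the Elasticsearch default of 10,000 for the window. The function name exceeds_result_window is illustrative only and does not appear in the PR.

```python
MAX_RESULT_WINDOW = 10_000  # Elasticsearch default for index.max_result_window


def exceeds_result_window(page, page_size, max_result_window=MAX_RESULT_WINDOW):
    """Return True when a page-number request would reach past the window."""
    offset = (page - 1) * page_size  # the `from` value for a 1-indexed page
    # Elasticsearch rejects requests where from + size exceeds the window.
    return offset + page_size > max_result_window


last_allowed = exceeds_result_window(page=100, page_size=100)   # from=9900, size=100
first_rejected = exceeds_result_window(page=101, page_size=100)  # from=10000, size=100
```

With a page size of 100, page 100 is the last page that offset pagination can serve. Any ES-backed endpoint paginating this way hits the same ceiling.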

DawoudSheraz · 2024-12-13T17:38:30Z

docs/decisions/0030-use-elasticsearch-search-after.rst

+
+To address this issue, we need a more efficient way to paginate large query results. 
+The solution must allow for seamless and reliable pagination without imposing excessive resource demands on the system. 
+Furthermore, it should ensure that the existing search functionality and search responses remain unaffected in the current version of the endpoint.


This particular line serves no purpose in this context.
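For readers unfamiliar with the technique the ADR proposes, below is a small in-memory sketch of search_after (keyset) pagination. It is not the PR's implementation. It assumes every document carries a "sort" pair of (timestamp, unique identifier) and that results are sorted ascending. The sample documents and the function name search_after_page are made up for illustration.

```python
def search_after_page(documents, page_size, search_after=None):
    """Return (page, next_cursor) using keyset pagination over doc["sort"]."""
    ordered = sorted(documents, key=lambda doc: tuple(doc["sort"]))
    if search_after is not None:
        cursor = tuple(search_after)
        # Keep only documents that sort strictly after the cursor.
        ordered = [doc for doc in ordered if tuple(doc["sort"]) > cursor]
    page = ordered[:page_size]
    # The sort values of the last hit become the cursor for the next request.
    next_cursor = list(page[-1]["sort"]) if page else None
    return page, next_cursor


# Hypothetical documents; two share a timestamp, so the identifier breaks the tie.
docs = [
    {"title": "A", "sort": [300, "course:a"]},
    {"title": "B", "sort": [200, "program:b"]},
    {"title": "C", "sort": [200, "course:c"]},
    {"title": "D", "sort": [100, "program:d"]},
]

collected = []
cursor = None
while True:
    page, cursor = search_after_page(docs, page_size=2, search_after=cursor)
    if not page:
        break
    collected.extend(doc["title"] for doc in page)
```

Because each request is anchored on the last document's sort values rather than an offset, the cost of fetching a page does not grow with how deep the client has paged. The unique identifier as the second sort value keeps the order stable when timestamps collide.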

AfaqShuaib09 · 2024-12-16T11:20:23Z

course_discovery/apps/edx_elasticsearch_dsl_extensions/viewsets.py

@@ -99,6 +101,41 @@ class CustomPageNumberPagination(PageNumberPagination):
    page_size_query_param = 'page_size'


+class SearchAfterPagination(PageNumberPagination):


nit: we can move this to the pagination.py file

There is already an existing CustomPageNumberPagination class in viewsets.py. Ideally, both classes should be moved to pagination.py. However, following the current pattern, I created the new pagination class in the same file for now.

Hmm, I think we can move both of these classes to pagination.py

AfaqShuaib09 · 2024-12-16T11:21:20Z

course_discovery/apps/course_metadata/search_indexes/documents/program.py

@@ -77,6 +77,9 @@ class ProgramDocument(BaseDocument, OrganizationsMixin):
    def prepare_aggregation_key(self, obj):
        return 'program:{}'.format(obj.uuid)

+    def prepare_aggregation_uuid(self, obj):


[thinking] Can we add prepare_aggregation_uuid to BaseDocument since it is the same across all the Model Documents?

It is not the same, as the model names are different; each aggregation UUID includes the model name followed by the UUID.
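A quick sketch of the point made in this reply, using stand-in classes rather than the real document classes. The "program:" prefix follows the prepare_aggregation_key method shown in the diff. The "course:" prefix and the class names are assumptions for illustration, and the actual body of prepare_aggregation_uuid is not visible in this thread.

```python
from types import SimpleNamespace


class ProgramDocumentSketch:
    """Stand-in for a program document; not the real ProgramDocument."""

    def prepare_aggregation_uuid(self, obj):
        return "program:{}".format(obj.uuid)


class CourseDocumentSketch:
    """Stand-in for a course document; the 'course' prefix is assumed."""

    def prepare_aggregation_uuid(self, obj):
        return "course:{}".format(obj.uuid)


# The same UUID yields a different value per model, so one shared
# implementation would need the prefix supplied by each subclass.
obj = SimpleNamespace(uuid="0b41be9d-10eb-4e10-ba2e-b2e47479e7b9")
program_value = ProgramDocumentSketch().prepare_aggregation_uuid(obj)
course_value = CourseDocumentSketch().prepare_aggregation_uuid(obj)
```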

AfaqShuaib09

Just one comment to address. Overall, very nice work on this 🎉

Ali-D-Akbar

Looks good overall, functionality-wise. Great work 👏

Ali-D-Akbar · 2024-12-19T20:12:29Z

course_discovery/apps/api/v2/serializers.py

@@ -0,0 +1,131 @@
+from course_discovery.apps.course_metadata.search_indexes import documents


nit: Please add a docstring.

A little consistency among the docstrings in classes would be nice, i.e. add docstrings for all classes except Meta. 🫡

Ali-D-Akbar · 2024-12-19T20:14:12Z

course_discovery/apps/api/v2/tests/test_views/test_search.py

@@ -0,0 +1,113 @@
+import json


nit: Please add docstrings throughout the file.

Ali-D-Akbar · 2024-12-23T20:15:33Z

course_discovery/apps/api/v2/tests/test_views/test_search.py

+
+
+@ddt.ddt
+class AggregateSearchViewSetV2Tests(mixins.LoginMixin, ElasticsearchTestMixin, mixins.APITestCase):


A short docstring with updated descriptions for the classes and tests would be really nice, since the unit tests are a little hard to understand at first glance.

Ali-D-Akbar · 2024-12-23T20:17:21Z

course_discovery/apps/api/v2/tests/test_views/test_search.py

+        if search_after:
+            query_params["search_after"] = search_after
+        response = self.client.get(self.list_path, data=query_params)
+        assert response.status_code == 200


Optionally, please change the plain assert == to assertEqual to make it look cool.

Ali-D-Akbar · 2024-12-23T20:20:33Z

course_discovery/apps/api/v2/serializers.py

+
+    class Meta(CourseRunSearchDocumentSerializer.Meta):
+        document = CourseRunDocument
+        fields = CourseRunSearchDocumentSerializer.Meta.fields + SEARCH_INDEX_ADDITIONAL_FIELDS_V2


Optionally, you can turn these into a mixin like:

    class V2SerializerMixin:
        """
        Mixin to extend the fields attribute in the Meta class of serializers.
        """

        @staticmethod
        def extend_fields(base_fields):
            return base_fields + SEARCH_INDEX_ADDITIONAL_FIELDS_V2

and use it like
fields = V2SerializerMixin.extend_fields(CourseRunSearchDocumentSerializer.Meta.fields)

Ali-D-Akbar · 2024-12-23T20:28:17Z

course_discovery/apps/edx_elasticsearch_dsl_extensions/viewsets.py

@@ -1,3 +1,5 @@
+import json


File docstring, please.

chore: elastic search search_after pagination

864c81d

hamza-56 self-assigned this Oct 23, 2024

hamza-56 added 5 commits October 31, 2024 14:39

chore: enabled 'sort' in search response

438988c

chore: fix searchafter pagination

68b975e

refactor: search after search

fc36e37

chore: add missing items in get_paginated_response

197c9bb

refactor: AggregateSearchViewSet v2

970c92f

hamza-56 force-pushed the hamza/PROD-4012 branch from 950d3ae to 970c92f Compare November 28, 2024 10:53

DawoudSheraz reviewed Nov 28, 2024

course_discovery/apps/course_metadata/search_indexes/constants.py (outdated)

DawoudSheraz reviewed Nov 28, 2024

course_discovery/apps/edx_elasticsearch_dsl_extensions/search.py

DawoudSheraz reviewed Nov 28, 2024

course_discovery/apps/edx_elasticsearch_dsl_extensions/viewsets.py (outdated)

hamza-56 marked this pull request as ready for review November 29, 2024 00:46

refactor: SearchAfterSearchh

0d3017e

hamza-56 requested review from zawan-ila, AfaqShuaib09 and Ali-D-Akbar November 29, 2024 00:57

DawoudSheraz reviewed Dec 2, 2024

course_discovery/apps/api/v2/urls.py

DawoudSheraz reviewed Dec 2, 2024

course_discovery/apps/api/v2/views/search.py

DawoudSheraz reviewed Dec 2, 2024

course_discovery/apps/edx_elasticsearch_dsl_extensions/viewsets.py

DawoudSheraz reviewed Dec 2, 2024

course_discovery/apps/edx_elasticsearch_dsl_extensions/viewsets.py (outdated)

hamza-56 force-pushed the hamza/PROD-4012 branch from 03fc7da to 1563bcc Compare December 2, 2024 11:23

fix: ci_quality checks

aa2cfcd

hamza-56 force-pushed the hamza/PROD-4012 branch from 1563bcc to aa2cfcd Compare December 2, 2024 13:12

hamza-56 and others added 3 commits December 2, 2024 18:14

Merge branch 'master' into hamza/PROD-4012

68a942b

fix: ci_quality checks

6291fba

feat: v2 search document serializers

34df2bd

hamza-56 force-pushed the hamza/PROD-4012 branch from 40044f3 to 162887f Compare December 3, 2024 00:28

fix: ci_quality checks

fa954e2

hamza-56 force-pushed the hamza/PROD-4012 branch from 162887f to fa954e2 Compare December 3, 2024 10:19

chore: add ADR

2e3bb77

hamza-56 and others added 5 commits December 12, 2024 05:26

chore: add docstrings

c52d9b1

Merge branch 'master' into hamza/PROD-4012

1dad603

chore: add more details in ADR

691bafe

chore: move new serializers to v2

02d2691

fix: ci_quality checks

61452a1

hamza-56 force-pushed the hamza/PROD-4012 branch from 86cf6bf to 61452a1 Compare December 12, 2024 13:49

DawoudSheraz reviewed Dec 13, 2024

docs/decisions/0030-use-elasticsearch-search-after.rst (outdated)

DawoudSheraz reviewed Dec 13, 2024

docs/decisions/0030-use-elasticsearch-search-after.rst (outdated)

DawoudSheraz reviewed Dec 13, 2024

docs/decisions/0030-use-elasticsearch-search-after.rst

DawoudSheraz reviewed Dec 13, 2024

docs/decisions/0030-use-elasticsearch-search-after.rst (outdated)

DawoudSheraz reviewed Dec 13, 2024

docs/decisions/0030-use-elasticsearch-search-after.rst (outdated)

DawoudSheraz reviewed Dec 13, 2024

docs/decisions/0030-use-elasticsearch-search-after.rst

AfaqShuaib09 reviewed Dec 16, 2024

docs/decisions/0030-use-elasticsearch-search-after.rst (outdated)

hamza-56 and others added 2 commits December 18, 2024 07:08

chore: add more details in ADR

db3e10b

Merge branch 'master' into hamza/PROD-4012

c317af8

hamza-56 force-pushed the hamza/PROD-4012 branch from 9de1d21 to d559188 Compare December 19, 2024 03:05

AfaqShuaib09 reviewed Dec 19, 2024

hamza-56 force-pushed the hamza/PROD-4012 branch from d559188 to 8ed1875 Compare December 19, 2024 13:57

hamza-56 requested a review from AfaqShuaib09 December 19, 2024 13:57

AfaqShuaib09 approved these changes Dec 23, 2024

course_discovery/apps/course_metadata/search_indexes/serializers/common.py (outdated)

hamza-56 force-pushed the hamza/PROD-4012 branch from 8ed1875 to a051fb2 Compare December 23, 2024 10:15

Ali-D-Akbar approved these changes Dec 23, 2024

chore: fix coverage

7d731c9

hamza-56 force-pushed the hamza/PROD-4012 branch from a051fb2 to 7d731c9 Compare December 26, 2024 09:52

Merge branch 'master' into hamza/PROD-4012

b9f8f7d

hamza-56 merged commit c3261d4 into master Dec 26, 2024
14 checks passed

hamza-56 deleted the hamza/PROD-4012 branch December 26, 2024 15:59
