Add performance improvement blog #2522

kolchfa-aws · 2024-01-04T00:41:29Z

Description

Adds the OpenSearch performance improvements blog

Issues Resolved

Closes #2477

Check List

Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.

Signed-off-by: Fanit Kolchina <[email protected]>

getsaurabh02 · 2024-01-05T19:26:33Z

_posts/2024-01-03-opensearch-performance-improvements.md

+
+* Queries for ascending and descending sort-after-timestamp saw a significant performance improvement of up to 70x overall. The optimizations introduced (such as [#6424](https://github.com/opensearch-project/OpenSearch/pull/6424) and [#8167](https://github.com/opensearch-project/OpenSearch/issues/8167)) extend across various numeric types, including but not limited to `int`, `short`, `float`, `double`, `date`, and others.
+
+* Other popular queries such as `search_after` saw an about 60x reduction in latency, attributed to the improvements made in the area involving optimally skipping segments during search (see [#7453](https://github.com/opensearch-project/OpenSearch/pull/7453)). The `search_after` queries can be used as the recommended alternative to scroll queries for a better search experience.


Can we add a line item after:

Support for the `match_only_text` field, intended to optimize storage and indexing/search latency for text queries, is in progress (#11039).

getsaurabh02 · 2024-01-05T19:27:17Z

_posts/2024-01-03-opensearch-performance-improvements.md

+
+* Hourly aggregations and multi-term aggregations also demonstrated improvement, varying from 5% to 35%, attributed to similar time-series improvements discussed previously.
+
+* `date_histograms` and `date_histogram_agg` queries exhibited either comparable or slightly decreased performance, ranging from 5% to around 20% in multi-node environments. These issues are actively being addressed as part of the ongoing project efforts (see [#11083](https://github.com/opensearch-project/OpenSearch/pull/11083)).


Also a line item under Time Series:

For date histogram aggregations, upcoming changes aim to improve performance by rounding dates down to the nearest interval (such as year, quarter, month, week, or day) using SIMD (#11194).
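To make "rounding down to the nearest interval" concrete, here is a minimal scalar sketch in Python. It only illustrates the bucketing idea and is not the SIMD implementation proposed in #11194; the function name `round_down` is hypothetical, and weeks are assumed to start on Monday.

```python
from datetime import datetime, timedelta


def round_down(ts: datetime, interval: str) -> datetime:
    """Round ts down to the start of the given calendar interval."""
    # Every supported interval starts at midnight, so truncate the time first.
    day_start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if interval == "day":
        return day_start
    if interval == "week":
        # Assumption: weeks start on Monday (weekday() returns 0 for Monday).
        return day_start - timedelta(days=ts.weekday())
    if interval == "month":
        return day_start.replace(day=1)
    if interval == "quarter":
        # Quarters start in months 1, 4, 7, and 10.
        first_month = 3 * ((ts.month - 1) // 3) + 1
        return day_start.replace(month=first_month, day=1)
    if interval == "year":
        return day_start.replace(month=1, day=1)
    raise ValueError(f"unsupported interval: {interval}")
```

All timestamps that round down to the same value fall into the same histogram bucket, which is why this rounding step sits on the hot path of a date histogram aggregation.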

getsaurabh02 · 2024-01-05T22:07:56Z

_posts/2024-01-03-opensearch-performance-improvements.md

+
+OpenSearch is a community-driven, open source search and analytics suite used by developers to ingest, search, visualize, and analyze data. [Introduced in January 2021](https://aws.amazon.com/blogs/opensource/stepping-up-for-a-truly-open-source-elasticsearch/), the OpenSearch Project originated as an open source fork of Elasticsearch 7.10.2. OpenSearch 1.0 was released for production usage in [July 2021](https://opensearch.org/blog/opensearch-general-availability-announcement/) and is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0) (ALv2), with the complete codebase [published to GitHub](https://github.com/opensearch-project).  The project has consistently focused on improving performance of its core open source engine for high volume indexing and low latency search operations. OpenSearch aims to provide the best experience for every user through driving down latency and improving efficiency.
+
+In this blog, we'll share a comprehensive view of strategic enhancements and features in performance that OpenSearch has delivered to date. Additionally, we'll provide a forward look at the [planned roadmap](https://github.com/orgs/opensearch-project/projects/153/views/1) of improvements in open source. We'll compare the core engine performance of the latest OpenSearch version (OpenSearch 2.11) to the state just before the OpenSearch fork, Elasticsearch 7.10.2. We'll highlight continuous advancements made in the OpenSearch core engine, ongoing feature enhancements centered around the popular log analytics and search use cases, and plans to drive improvements for which we are seeking community collaboration.


Suggesting the rewrite below for this paragraph, which also credits the community for the advancements:

"In this blog, we'll share a comprehensive view of strategic enhancements and features in performance that OpenSearch has delivered to date. Additionally, we'll provide a forward look at the planned roadmap of improvements in open source. We’ll compare the core engine performance of the latest OpenSearch version (OpenSearch 2.11) with a specific focus on its advancements, to the state just before the OpenSearch fork. For this purpose, we have chosen Elasticsearch 7.10.2 to represent the baseline where OpenSearch was forked from, allowing us to measure all changes that were delivered after the fork (OpenSearch 1.0-2.11). These progressions were realized through collaborative efforts with the community, and OpenSearch is actively seeking to enhance community engagement, specifically in the field of improving performance."

pajuric · 2024-01-09T23:02:53Z

_posts/2024-01-03-opensearch-performance-improvements.md

+categories:
+    - technical-posts
+    - community
+meta_keywords: 


Please add the following meta:

meta_keywords: OpenSearch performance improvements, OpenSearch roadmap, high volume indexing, low latency search
meta_description: Learn more about the OpenSearch Project roadmap and how the project improved the performance of its core open source engine to drive down latency and improve efficiency.

natebower

@kolchfa-aws @getsaurabh02 Editorial review complete. Please see my comments and changes and let me know if you have any questions. Thanks!

_authors/dagney.markdown

_authors/pallp.markdown

_authors/rishabhsi.markdown

_authors/sisurab.markdown

natebower · 2024-01-10T12:23:40Z

_authors/sisurab.markdown

@@ -6,4 +6,4 @@ github: getsaurabh02
 linkedin: getsaurabh02
 ---

-**Saurabh** is a Senior Software Engineer working on OpenSearch at Amazon Web Services. He is passionate about solving problems in the large-scale distributed systems. He is an active contributor to OpenSearch.
+**Saurabh Singh** is an Engineering Lead working on OpenSearch at Amazon Web Services, leading the core search performance space. He is passionate about solving problems in the large-scale distributed systems. He is an active OpenSearch contributor.


Is the last sentence necessary here?

@getsaurabh02 What do you think?

I am fine with either.

natebower · 2024-01-10T14:29:23Z

_posts/2024-01-03-opensearch-performance-improvements.md

+
+**Setup**: OpenSearch 2.11.0 single node (r5.2xlarge) with 64 GB RAM and 32 GB heap. Index settings: 1 shard and 0 replicas.
+
+**`nyc_taxis` workload results:** The following table illustrates a benchmark comparison of the `nyc_taxis` workload for OpenSearch 2.11 with concurrent search disabled and enabled (with 0 slices and with 4 slices). It includes the 90th percentile of `took` time latency measurements for each (p90) and the observed percentage improvements.


Suggested change

-**`nyc_taxis` workload results:** The following table illustrates a benchmark comparison of the `nyc_taxis` workload for OpenSearch 2.11 with concurrent search disabled and enabled (with 0 slices and with 4 slices). It includes the 90th percentile of `took` time latency measurements for each (p90) and the observed percentage improvements.
+**`nyc_taxis` workload results**: The following table provides a benchmark comparison of the `nyc_taxis` workload for OpenSearch 2.11 with concurrent search disabled and enabled (with 0 slices and with 4 slices). It includes the 90th percentile of `took` time latency measurements for each (p90) and the observed percentage improvements.
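For reference, the two quantities named in this paragraph (p90 latency and percentage improvement) can be sketched as follows. This is a minimal illustration that assumes the nearest-rank percentile method and made-up latency values; the thread does not state how the benchmark tooling computes its percentiles, and the function names are hypothetical.

```python
def p90(latencies_ms):
    """Return the 90th percentile of the samples (nearest-rank method, an assumption)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Integer form of ceil(0.9 * n), which avoids floating-point rounding.
    rank = (9 * len(ordered) + 9) // 10
    return ordered[rank - 1]


def percent_improvement(baseline_ms, candidate_ms):
    """Percentage by which candidate latency is lower than baseline latency."""
    return (baseline_ms - candidate_ms) / baseline_ms * 100.0
```

With this definition, a positive result means the candidate configuration is faster than the baseline and a negative result means it regressed.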

natebower · 2024-01-10T14:29:58Z

_posts/2024-01-03-opensearch-performance-improvements.md

+    </tr>
+</table>
+
+**`http_logs` workload results:** The following table illustrates a benchmark comparison of the `http_logs` workload for OpenSearch 2.11 with concurrent search disabled and enabled (with 0 slices and with 4 slices). It includes the 90th percentile of latency measurements for each (p90) and the observed percentage improvements.


Suggested change

-**`http_logs` workload results:** The following table illustrates a benchmark comparison of the `http_logs` workload for OpenSearch 2.11 with concurrent search disabled and enabled (with 0 slices and with 4 slices). It includes the 90th percentile of latency measurements for each (p90) and the observed percentage improvements.
+**`http_logs` workload results**: The following table provides a benchmark comparison of the `http_logs` workload for OpenSearch 2.11 with concurrent search disabled and enabled (with 0 slices and with 4 slices). It includes the 90th percentile of latency measurements for each (p90) and the observed percentage improvements.

natebower · 2024-01-10T14:31:56Z

_posts/2024-01-03-opensearch-performance-improvements.md

+
+* * *
+
+*We would like to take this opportunity to thank the OpenSearch core developers for their contributions to the technical roadmap. We sincerely appreciate all the suggestions from Michael Froh, Andriy Redko, Jonah Kowall, Amitai Stern, Jon Handler, Prabhakar Sithanandam, Mike McCandles, Anandhi Bumstead, Eli Fisher, Carl Meadows, and Mukul Karnik towards writing this blog. Credits to Fanit Kolchina and Nathan Bower for editing and Carlos Canas for creating the graphics.*


Suggested change

-*We would like to take this opportunity to thank the OpenSearch core developers for their contributions to the technical roadmap. We sincerely appreciate all the suggestions from Michael Froh, Andriy Redko, Jonah Kowall, Amitai Stern, Jon Handler, Prabhakar Sithanandam, Mike McCandles, Anandhi Bumstead, Eli Fisher, Carl Meadows, and Mukul Karnik towards writing this blog. Credits to Fanit Kolchina and Nathan Bower for editing and Carlos Canas for creating the graphics.*
+*We would like to take this opportunity to thank the OpenSearch core developers for their contributions to the technical roadmap. We sincerely appreciate all the suggestions from Michael Froh, Andriy Redko, Jonah Kowall, Amitai Stern, Jon Handler, Prabhakar Sithanandam, Mike McCandles, Anandhi Bumstead, Eli Fisher, Carl Meadows, and Mukul Karnik in writing this blog post. Credits to Fanit Kolchina and Nathan Bower for editing and Carlos Canas for creating the graphics.*

natebower · 2024-01-10T14:32:42Z

_posts/2024-01-03-opensearch-performance-improvements.md

+
+* * *
+
+*We would like to take this opportunity to thank the OpenSearch core developers for their contributions to the technical roadmap. We sincerely appreciate all the suggestions from Michael Froh, Andriy Redko, Jonah Kowall, Amitai Stern, Jon Handler, Prabhakar Sithanandam, Mike McCandles, Anandhi Bumstead, Eli Fisher, Carl Meadows, and Mukul Karnik towards writing this blog. Credits to Fanit Kolchina and Nathan Bower for editing and Carlos Canas for creating the graphics.*


Confirm that "McCandles" shouldn't be "McCandless".

natebower · 2024-01-10T14:39:19Z

_posts/2024-01-03-opensearch-performance-improvements.md

+
+* An increase in performance with aggregate queries on workloads such as `nyc_taxis` workload, showcasing an improvement ranging between 50% to 70% over the default configuration.
+* The log analytics use cases for range queries demonstrated an improvement of around 65%.
+* Aggregation queries with hourly data aggregations, such as those for the `http_logs` `hourly_agg` workload, demonstrated a boost of up to 50% in performance.


"such as those for the hourly_agg operation on the http_logs workload" (hourly_agg is not a workload)?

@getsaurabh02 Could you confirm that we can make this change?

Looks good to me, or "such as those for the http_logs workload".
(Yes, hourly_agg is not a workload.)

Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: kolchfa-aws <[email protected]>

Signed-off-by: Fanit Kolchina <[email protected]>

kolchfa-aws · 2024-01-10T17:30:56Z

@pajuric The blog is ready to publish. Thanks!

pajuric · 2024-01-10T17:45:34Z

@nateynateynate @dtaivpp @krisfreedain - This blog is ready to push live. If possible, can we get this out by 12 PM PST today, please?

nateynateynate

Looks good to me! Great job!

getsaurabh02 · 2024-01-10T18:52:08Z

_posts/2024-01-03-opensearch-performance-improvements.md

+
+## Appendix: Detailed execution and results
+
+If you're interested in the details of the performance benchmarks we used, exploring the methodologies behind their execution, or examining the comprehensive results, keep reading. For OpenSearch users interested in establishing benchmarks and replicating these runs, we've provided comprehensive setup details alongside each result. This section provides the core engine performance comparison between the latest OpenSearch version (OpenSearch 2.11) and the state just before the OpenSearch fork, Elasticsearch 7.10.2, with a mid-point performance measurement on OpenSearch 2.3. 


Can we add a line at the end of this paragraph:

"Also, we've identified items in the performance roadmap that require active enhancements due to observed regressions in specific areas."

Oh, looks like it's merged! 👍

Add performance improvement blog

7810baf

Signed-off-by: Fanit Kolchina <[email protected]>

kolchfa-aws self-assigned this Jan 4, 2024

kolchfa-aws requested review from elfisher, AMoo-Miki, nknize, krisfreedain, peterzhuamazon, CEHENKLE, dtaivpp and nateynateynate as code owners January 4, 2024 00:41

kolchfa-aws added 9 commits January 4, 2024 08:02

Fix typo

1738c6b

Signed-off-by: Fanit Kolchina <[email protected]>

Add Rishab as an author and style changes

2cacb12

Signed-off-by: Fanit Kolchina <[email protected]>

Modify background color placement

c6d9520

Signed-off-by: Fanit Kolchina <[email protected]>

Update Pallavi bio

888bb9f

Signed-off-by: Fanit Kolchina <[email protected]>

Update infographic

6c1708d

Signed-off-by: Fanit Kolchina <[email protected]>

Add Palavi linkedin

d8ed11e

Signed-off-by: Fanit Kolchina <[email protected]>

Fit tables and give image padding

8e10865

Signed-off-by: Fanit Kolchina <[email protected]>

Added contributor section

3922a7b

Signed-off-by: Fanit Kolchina <[email protected]>

Change acknowledgement format

e82323a

Signed-off-by: Fanit Kolchina <[email protected]>

getsaurabh02 reviewed Jan 5, 2024

kolchfa-aws and others added 2 commits January 5, 2024 15:19

Update infographic

47c9048

Signed-off-by: Fanit Kolchina <[email protected]>

Delete _config_temp.yml

bcd8e36

getsaurabh02 reviewed Jan 5, 2024

kolchfa-aws added 3 commits January 5, 2024 17:24

Revise Saurabh's bio

407bf40

Signed-off-by: Fanit Kolchina <[email protected]>

Rewrite intro paragraph

14db443

Signed-off-by: Fanit Kolchina <[email protected]>

Add more contributors and Dagney's linkedin

7ef5b35

Signed-off-by: Fanit Kolchina <[email protected]>

pajuric reviewed Jan 9, 2024

natebower reviewed Jan 10, 2024

kolchfa-aws and others added 2 commits January 10, 2024 09:51

Apply suggestions from code review

a4243af

Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: kolchfa-aws <[email protected]>

Address editorial comments

a85be30

Signed-off-by: Fanit Kolchina <[email protected]>

kolchfa-aws and others added 5 commits January 10, 2024 12:09

More editorial comments

2811a85

Signed-off-by: Fanit Kolchina <[email protected]>

Update _posts/2024-01-03-opensearch-performance-improvements.md

9f6c640

Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: kolchfa-aws <[email protected]>

Changed Oracle storage name

8d1fd9f

Signed-off-by: Fanit Kolchina <[email protected]>

Remove last sentence from Saurabh's bio

1057626

Signed-off-by: Fanit Kolchina <[email protected]>

Change blog date

8f57d0f

Signed-off-by: Fanit Kolchina <[email protected]>

nateynateynate approved these changes Jan 10, 2024

nateynateynate merged commit 4409204 into opensearch-project:main Jan 10, 2024
3 of 4 checks passed

getsaurabh02 reviewed Jan 10, 2024
