Add performance improvement blog #2522
Conversation
Signed-off-by: Fanit Kolchina <[email protected]>
* Queries that sort on a timestamp field, in both ascending and descending order, saw a significant performance improvement of up to 70x overall. The optimizations introduced (such as [#6424](https://github.com/opensearch-project/OpenSearch/pull/6424) and [#8167](https://github.com/opensearch-project/OpenSearch/issues/8167)) extend across various numeric types, including `int`, `short`, `float`, `double`, `date`, and others.
* Other popular queries, such as `search_after`, saw an approximately 60x reduction in latency, attributed to improvements in optimally skipping segments during search (see [#7453](https://github.com/opensearch-project/OpenSearch/pull/7453)). `search_after` queries are the recommended alternative to scroll queries for a better search experience.
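As an illustration of the `search_after` pattern mentioned in this item (not part of the post's benchmarks), here is a minimal sketch using the `opensearch-py` client; the index name, field name, and connection details are hypothetical:

```python
# Minimal sketch of timestamp-sorted pagination with search_after,
# assuming a hypothetical "logs-sample" index with an "@timestamp" field.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

query = {
    "size": 100,
    "query": {"match_all": {}},
    # Descending sort on a timestamp field -- the sort-optimized query shape
    # discussed above. A unique tiebreaker field is usually added in practice.
    "sort": [{"@timestamp": "desc"}],
}

page = client.search(index="logs-sample", body=query)
hits = page["hits"]["hits"]

while hits:
    # Pass the sort values of the last hit to fetch the next page,
    # instead of keeping a scroll context open.
    query["search_after"] = hits[-1]["sort"]
    page = client.search(index="logs-sample", body=query)
    hits = page["hits"]["hits"]
```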
Can we add a line item after:
- Implementation support for the `match_only_text` field to optimize storage and indexing/search latency for text queries is in progress (#11039).
* Hourly aggregations and multi-term aggregations also demonstrated improvements, ranging from 5% to 35%, attributed to the time-series improvements discussed previously.
* `date_histograms` and `date_histogram_agg` queries exhibited either comparable or slightly decreased performance, ranging from 5% to around 20% in multi-node environments. These issues are actively being addressed as part of the ongoing project efforts (see [#11083](https://github.com/opensearch-project/OpenSearch/pull/11083)).
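For context on the date histogram measurements above, the following is a minimal sketch of this kind of aggregation using `opensearch-py`; the index and field names are hypothetical:

```python
# Minimal sketch of an hourly date_histogram aggregation, assuming a
# hypothetical "logs-sample" index with an "@timestamp" field.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

response = client.search(
    index="logs-sample",
    body={
        "size": 0,  # aggregation-only request; no hits returned
        "aggs": {
            "requests_per_hour": {
                "date_histogram": {
                    "field": "@timestamp",
                    "calendar_interval": "hour",
                }
            }
        },
    },
)

for bucket in response["aggregations"]["requests_per_hour"]["buckets"]:
    print(bucket["key_as_string"], bucket["doc_count"])
```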
Also a line item under Time Series:
- For date histogram aggregations, there are upcoming changes aiming to improve performance by rounding down dates to the nearest interval (such as year, quarter, month, week, or day) using SIMD (#11194).
Done
OpenSearch is a community-driven, open source search and analytics suite used by developers to ingest, search, visualize, and analyze data. [Introduced in January 2021](https://aws.amazon.com/blogs/opensource/stepping-up-for-a-truly-open-source-elasticsearch/), the OpenSearch Project originated as an open source fork of Elasticsearch 7.10.2. OpenSearch 1.0 was released for production usage in [July 2021](https://opensearch.org/blog/opensearch-general-availability-announcement/) and is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0) (ALv2), with the complete codebase [published to GitHub](https://github.com/opensearch-project). The project has consistently focused on improving the performance of its core open source engine for high-volume indexing and low-latency search operations. OpenSearch aims to provide the best experience for every user by driving down latency and improving efficiency.
In this blog, we'll share a comprehensive view of strategic enhancements and features in performance that OpenSearch has delivered to date. Additionally, we'll provide a forward look at the [planned roadmap](https://github.com/orgs/opensearch-project/projects/153/views/1) of improvements in open source. We'll compare the core engine performance of the latest OpenSearch version (OpenSearch 2.11) to the state just before the OpenSearch fork, Elasticsearch 7.10.2. We'll highlight continuous advancements made in the OpenSearch core engine, ongoing feature enhancements centered around the popular log analytics and search use cases, and plans to drive improvements for which we are seeking community collaboration.
Suggesting the below rewrite of this paragraph, which also attributes the community for advancements:
"In this blog, we'll share a comprehensive view of strategic enhancements and features in performance that OpenSearch has delivered to date. Additionally, we'll provide a forward look at the planned roadmap of improvements in open source. We’ll compare the core engine performance of the latest OpenSearch version (OpenSearch 2.11) with a specific focus on its advancements, to the state just before the OpenSearch fork. For this purpose, we have chosen Elasticsearch 7.10.2 to represent the baseline where OpenSearch was forked from, allowing us to measure all changes that were delivered after the fork (OpenSearch 1.0-2.11). These progressions were realized through collaborative efforts with the community, and OpenSearch is actively seeking to enhance community engagement, specifically in the field of improving performance."
categories:
- technical-posts
- community
meta_keywords:
Please add the following meta:
meta_keywords: OpenSearch performance improvements, OpenSearch roadmap, high volume indexing, low latency search
meta_description: Learn more about the OpenSearch Project roadmap and how the project improved the performance of its core open source engine to drive down latency and improve efficiency.
@kolchfa-aws @getsaurabh02 Editorial review complete. Please see my comments and changes and let me know if you have any questions. Thanks!
_authors/sisurab.markdown (outdated)
@@ -6,4 +6,4 @@ github: getsaurabh02
linkedin: getsaurabh02
---

**Saurabh** is a Senior Software Engineer working on OpenSearch at Amazon Web Services. He is passionate about solving problems in large-scale distributed systems. He is an active contributor to OpenSearch.
**Saurabh Singh** is an Engineering Lead working on OpenSearch at Amazon Web Services, leading the core search performance space. He is passionate about solving problems in large-scale distributed systems. He is an active OpenSearch contributor.
Is the last sentence necessary here?
@getsaurabh02 What do you think?
I am fine with either.
**Setup**: OpenSearch 2.11.0 on a single node (r5.2xlarge) with 64 GB RAM and a 32 GB heap. Index settings: 1 shard and 0 replicas.
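For readers who want to reproduce the concurrent search comparison described below, here is a sketch of toggling the relevant dynamic cluster settings. The setting names (`search.concurrent_segment_search.enabled` and `search.concurrent.max_slice_count`) are assumed from the OpenSearch 2.x concurrent segment search documentation; verify them against your version before use:

```python
# Sketch of enabling concurrent segment search for a benchmark run.
# Setting names are assumptions based on the OpenSearch 2.x docs;
# confirm them for your specific version.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Enable concurrent segment search cluster-wide.
client.cluster.put_settings(
    body={"persistent": {"search.concurrent_segment_search.enabled": True}}
)

# 0 lets Lucene choose the slice count; 4 targets four slices, matching
# the "4 slices" configuration benchmarked below.
client.cluster.put_settings(
    body={"persistent": {"search.concurrent.max_slice_count": 4}}
)
```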
**`nyc_taxis` workload results:** The following table illustrates a benchmark comparison of the `nyc_taxis` workload for OpenSearch 2.11 with concurrent search disabled and enabled (with 0 slices and with 4 slices). It includes the 90th percentile of `took` time latency measurements for each (p90) and the observed percentage improvements.
Suggested change:
**`nyc_taxis` workload results**: The following table provides a benchmark comparison of the `nyc_taxis` workload for OpenSearch 2.11 with concurrent search disabled and enabled (with 0 slices and with 4 slices). It includes the 90th percentile of `took` time latency measurements for each (p90) and the observed percentage improvements.
**`http_logs` workload results:** The following table illustrates a benchmark comparison of the `http_logs` workload for OpenSearch 2.11 with concurrent search disabled and enabled (with 0 slices and with 4 slices). It includes the 90th percentile of latency measurements for each (p90) and the observed percentage improvements.
Suggested change:
**`http_logs` workload results**: The following table provides a benchmark comparison of the `http_logs` workload for OpenSearch 2.11 with concurrent search disabled and enabled (with 0 slices and with 4 slices). It includes the 90th percentile of latency measurements for each (p90) and the observed percentage improvements.
* * *
*We would like to take this opportunity to thank the OpenSearch core developers for their contributions to the technical roadmap. We sincerely appreciate all the suggestions from Michael Froh, Andriy Redko, Jonah Kowall, Amitai Stern, Jon Handler, Prabhakar Sithanandam, Mike McCandles, Anandhi Bumstead, Eli Fisher, Carl Meadows, and Mukul Karnik towards writing this blog. Credits to Fanit Kolchina and Nathan Bower for editing and Carlos Canas for creating the graphics.*
Suggested change:
*We would like to take this opportunity to thank the OpenSearch core developers for their contributions to the technical roadmap. We sincerely appreciate all the suggestions from Michael Froh, Andriy Redko, Jonah Kowall, Amitai Stern, Jon Handler, Prabhakar Sithanandam, Mike McCandles, Anandhi Bumstead, Eli Fisher, Carl Meadows, and Mukul Karnik in writing this blog post. Credits to Fanit Kolchina and Nathan Bower for editing and Carlos Canas for creating the graphics.*
Confirm that "McCandles" shouldn't be "McCandless".
* An increase in performance with aggregate queries on workloads such as the `nyc_taxis` workload, showcasing an improvement ranging from 50% to 70% over the default configuration.
* The log analytics use cases for range queries demonstrated an improvement of around 65%.
* Aggregation queries with hourly data aggregations, such as those for the `http_logs` `hourly_agg` workload, demonstrated a boost of up to 50% in performance.
"such as those for the hourly_agg
operation on the http_logs
workload" (hourly_agg
is not a workload)?
@getsaurabh02 Could you confirm that we can make this change?
Looks good to me, OR "such as those for the `http_logs` workload" (yes, `hourly_agg` is not a workload).
Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: kolchfa-aws <[email protected]>
@pajuric The blog is ready to publish. Thanks!
@nateynateynate @dtaivpp @krisfreedain - This blog is ready to push live. If possible, can we get this out by 12 PM PST today, please?
Looks good to me! Great job!
## Appendix: Detailed execution and results
If you're interested in the details of the performance benchmarks we used, exploring the methodologies behind their execution, or examining the comprehensive results, keep reading. For OpenSearch users interested in establishing benchmarks and replicating these runs, we've provided comprehensive setup details alongside each result. This section provides the core engine performance comparison between the latest OpenSearch version (OpenSearch 2.11) and the state just before the OpenSearch fork, Elasticsearch 7.10.2, with a mid-point performance measurement on OpenSearch 2.3.
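As a rough sketch of how such a run might be replicated, the following drives OpenSearch Benchmark against an already-running cluster (invoked from Python for consistency with the other examples). The target host is a placeholder, and the `opensearch-benchmark` flags should be checked against the OpenSearch Benchmark documentation for your version:

```python
# Hypothetical example of launching an OpenSearch Benchmark test against an
# existing cluster; "localhost:9200" is a placeholder target host.
import subprocess

subprocess.run(
    [
        "opensearch-benchmark", "execute-test",
        "--workload=nyc_taxis",
        "--pipeline=benchmark-only",     # benchmark an already-running cluster
        "--target-hosts=localhost:9200",
    ],
    check=True,
)
```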
Can we add a line at the end of this paragraph:
"Also, we've identified items in the performance roadmap that require active enhancements due to observed regressions in specific areas."
Ohh, looks like it's merged! 👍
Description
Adds the OpenSearch performance improvements blog
Issues Resolved
Closes #2477
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.