[BUG] BWC Tests get failed with different shard number for text chunking processor #690

yuye-aws · 2024-04-12T02:09:33Z

What is the bug?

The BWC tests fail after changing the number of shards in the index. The error is "index not found", even though we assert that the index-creation response is true.

How can one reproduce the bug?

Apply the code changes in #684 and #685 to change the index settings, then run the BWC tests.
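
For context, here is a minimal sketch of what a changed shard number could look like in a create-index request body. The helper function is hypothetical and is not the code from #684 or #685; the only thing taken as given is the standard `index.number_of_shards` index setting.

```python
import json


def build_index_body(number_of_shards: int) -> dict:
    """Build a create-index request body with an explicit shard count.

    Hypothetical helper for illustration only; it is not the code used
    by the BWC tests.
    """
    if number_of_shards < 1:
        raise ValueError("number_of_shards must be at least 1")
    return {"settings": {"index": {"number_of_shards": number_of_shards}}}


if __name__ == "__main__":
    # Body for a create-index request with three primary shards.
    print(json.dumps(build_index_body(3), indent=2))
```

Varying only the argument (for example 1 versus 3) keeps the rest of the request identical, which makes it easier to attribute a failure to the shard count alone.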

What is the expected behavior?

The BWC tests should behave consistently regardless of the index settings.

What is your host/environment?

Linux environment on GitHub.

Do you have any screenshots?

Error log in these two issues: #684 #685

Do you have any additional context?

No

yuye-aws · 2024-04-17T02:22:31Z

Changing the index setting may not affect other bwc tests. I will close the PR: #685

yuye-aws · 2024-04-23T06:09:44Z

Reopened PR for debugging: #685

yuye-aws · 2024-04-23T06:10:14Z

I will look into this issue in the next few days. Feel free to assign it to me.

yuye-aws · 2024-05-07T04:19:34Z

Latest observation: BWC tests pass for 2.14.0-SNAPSHOT but still fail for 2.13.0.

yuye-aws · 2024-05-07T04:39:56Z

BWC tests are failing in version 2.13.0 because the neural search code is fetched from this URL: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch. Here are a few options to resolve this issue:

1. Disable text chunking BWC tests for version 2.13.
2. Fetch artifacts for 2.13 BWC tests from https://artifacts.opensearch.org/releases/bundle/opensearch.
3. Simply close this issue and make no code changes, because the BWC tests already pass with a shard number of three.
4. Replace the bundle in https://ci.opensearch.org/ci/dbc/distribution-build-opensearch.
5. Change the version from "2.13.0" to "2.13.0-SNAPSHOT" in backwards_compatibility_tests_workflow.yml.

yuye-aws · 2024-05-07T08:39:58Z

I personally prefer Options 1 and 3.

vibrantvarun · 2024-05-09T16:25:23Z

@martin-gaievski and @navneet1v what are your thoughts on this? Shall we go for 2.13.1?

martin-gaievski · 2024-05-09T16:46:20Z

I think we need a proper fix for this issue, BWC should be executed otherwise we're flying blind. Sounds like options 1 and 3 are essentially ignoring the failure.
What are the differences between the artifacts in https://artifacts.opensearch.org/releases/bundle/opensearch and https://ci.opensearch.org/ci/dbc/distribution-build-opensearch? For the 2.13 release, which was finalized a couple of months ago, they should be the same in both locations.

yuye-aws · 2024-05-10T01:06:48Z

https://artifacts.opensearch.org/releases/bundle/opensearch is for unreleased versions. These snapshots are updated to reflect our latest code.

https://ci.opensearch.org/ci/dbc/distribution-build-opensearch is for released versions. They are fixed once a given version is released.

yuye-aws · 2024-05-10T01:08:42Z

@martin-gaievski wrote: "I think we need a proper fix for this issue, BWC should be executed otherwise we're flying blind. Sounds like options 1 and 3 are essentially ignoring the failure."

Actually, the failure has been fixed since 2.14. Only open-source 2.13 hits this failure, and only when users misconfigure their index with an improper shard number.

yuye-aws · 2024-05-14T03:11:12Z

Hi @martin-gaievski and @vibrantvarun, what's your suggestion on the "proper fix" for this issue?

vibrantvarun · 2024-05-15T16:26:24Z

Releasing 2.13.1 is the only way for the BWC tests to pick up the artifact from the CI URL. However, you can try running the tests once with 2.13.0-SNAPSHOT. If they pass with 2.13.0-SNAPSHOT, I have no issue with closing this.

cc: @martin-gaievski

martin-gaievski · 2024-05-15T23:19:10Z

+1 to @vibrantvarun's comment; having a successful test run for 2.13.0 should be enough.

yuye-aws · 2024-05-16T00:36:28Z

@vibrantvarun wrote: "Releasing 2.13.1 is the only way for the BWC tests to pick up the artifact from the CI URL. However, you can try running the tests once with 2.13.0-SNAPSHOT. If they pass with 2.13.0-SNAPSHOT, I have no issue with closing this. cc: @martin-gaievski"

I can raise a PR changing the version to 2.13.0-SNAPSHOT, but I am not sure whether it makes sense to change the workflow because of a single test, especially when the test actually passes with the current setting.

yuye-aws · 2024-05-20T15:24:53Z

@vibrantvarun wrote: "Releasing 2.13.1 is the only way for the BWC tests to pick up the artifact from the CI URL. However, you can try running the tests once with 2.13.0-SNAPSHOT. If they pass with 2.13.0-SNAPSHOT, I have no issue with closing this. cc: @martin-gaievski"

I am not sure whether you mean we need to update our CI workflow with this PR: #752

yuye-aws · 2024-05-20T15:26:00Z

From my observation, CI still fails in https://github.com/opensearch-project/neural-search/actions/runs/9157136803/job/25172907185?pr=684 with the updated CI workflow. We can dive deeper into this error during the meeting.

martin-gaievski · 2024-05-21T20:48:04Z

I've restarted that run for 2.13.0-SNAPSHOT and this time it's successful: https://github.com/opensearch-project/neural-search/actions/runs/9157136803/job/25247195168?pr=684. The same version on a different platform is also green: https://github.com/opensearch-project/neural-search/actions/runs/9157136803/job/25247196478?pr=684.

It looks OK at first glance; I understand we may have flaky tests. @vibrantvarun, do you have context? Is this a flaky test issue, or was it failing consistently?

yuye-aws · 2024-05-22T00:54:54Z

Here is the failing link: https://github.com/opensearch-project/neural-search/actions/runs/9157136803/job/25172907185?pr=684. This is the first time this issue has been flaky. I am wondering whether it is related to the BWC workflow. Does @vibrantvarun know more details?

martin-gaievski · 2024-05-22T04:21:29Z

I've made a few test runs for #684 and got mixed results: https://github.com/opensearch-project/neural-search/actions/runs/9157136803/job/25255088325?pr=684. I think it now looks more like a flaky test; as I understand it, this is different from what we had initially, when the test always failed.

jdk11, linux, PASS
jdk11, win, PASS
jdk17, linux, PASS
jdk17, win, FAIL
jdk21, linux, FAIL
jdk21, win, PASS
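
To make the mixed results easier to read, here is a small sketch that tallies the six outcomes listed above by JDK and by platform. The helper names are illustrative; the data is copied from the list as-is.

```python
from collections import Counter

# Results copied from the list above: (JDK, platform, outcome).
RESULTS = [
    ("jdk11", "linux", "PASS"),
    ("jdk11", "win", "PASS"),
    ("jdk17", "linux", "PASS"),
    ("jdk17", "win", "FAIL"),
    ("jdk21", "linux", "FAIL"),
    ("jdk21", "win", "PASS"),
]


def failures_by(results, field):
    """Count FAIL outcomes grouped by tuple position (0 = JDK, 1 = platform)."""
    return Counter(row[field] for row in results if row[2] == "FAIL")


def always_failing(results, field):
    """Return the values of the given field for which every run failed."""
    totals = Counter(row[field] for row in results)
    fails = failures_by(results, field)
    return sorted(key for key, total in totals.items() if fails[key] == total)


if __name__ == "__main__":
    print("failures by JDK:", dict(failures_by(RESULTS, 0)))
    print("failures by platform:", dict(failures_by(RESULTS, 1)))
    print("JDKs failing every run:", always_failing(RESULTS, 0))
    print("platforms failing every run:", always_failing(RESULTS, 1))
```

No single JDK and no single platform fails in every run, which is consistent with reading this as a flaky test rather than a configuration-specific failure.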

yuye-aws · 2024-05-22T04:26:52Z

Thanks @martin-gaievski for providing more results. Can we check the snapshots in https://artifacts.opensearch.org/releases/bundle/opensearch?

yuye-aws · 2024-05-23T01:17:31Z

Our conclusion:
Try running the 2.13.0-SNAPSHOT BWC tests on a local machine. If the tests pass with shard number 1, I will paste the results and @vibrantvarun will close this issue. Also try the 2.14.0 BWC tests.

yuye-aws · 2024-06-11T03:55:10Z

Latest update: 2.13.0-SNAPSHOT does not include the PR fixing the bug.

yuye-aws · 2024-06-11T03:55:55Z

Revisiting the options from my 2024-05-07 comment. BWC tests are failing in version 2.13.0 because the neural search code is fetched from this URL: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch. The options were:

1. Disable text chunking BWC tests for version 2.13.
2. Fetch artifacts for 2.13 BWC tests from https://artifacts.opensearch.org/releases/bundle/opensearch.
3. Simply close this issue and make no code changes, because the BWC tests already pass with a shard number of three.
4. Replace the bundle in https://ci.opensearch.org/ci/dbc/distribution-build-opensearch.
5. Change the version from "2.13.0" to "2.13.0-SNAPSHOT" in backwards_compatibility_tests_workflow.yml.

Option 5 is not valid.

martin-gaievski · 2024-06-11T21:11:25Z

Please try the following steps:

1. Build a custom tarball of OpenSearch that includes only the required components: OpenSearch, common-utils, k-NN, ml-commons, and neural-search. For that you need a setup of https://github.com/opensearch-project/opensearch-build/.
2. Change build.gradle to point to that tarball; the easiest way is to use a local file URL: https://github.com/opensearch-project/neural-search/blob/main/qa/build.gradle#L108
3. Run the BWC tests locally and share the results in this PR.
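
The second step relies on a local file URL. As a rough sketch, this is how an absolute path to the custom tarball maps to a file:// URL. The tarball path below is a made-up example, and where exactly the URL goes in qa/build.gradle is not shown here.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def local_tarball_url(tarball_path: str) -> str:
    """Turn an absolute POSIX path into a file:// URL.

    Hypothetical helper; characters that are not URL-safe are
    percent-encoded, while "/" separators are kept as-is.
    """
    if not PurePosixPath(tarball_path).is_absolute():
        raise ValueError("tarball path must be absolute")
    return "file://" + quote(tarball_path)


# Made-up output location of the custom build, for illustration only.
EXAMPLE_PATH = "/tmp/opensearch-build/opensearch-2.13.0-linux-x64.tar.gz"

if __name__ == "__main__":
    print(local_tarball_url(EXAMPLE_PATH))
```

Rejecting relative paths up front avoids producing a URL that resolves differently depending on the directory Gradle is run from.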

yuye-aws · 2024-06-17T10:47:46Z

Hi @martin-gaievski and @vibrantvarun! I have double-checked that the 2.13 branch fixes the BWC test failure in the PR. First, here is the error log from running the BWC tests directly: bwc_test_error_logs.txt. Then I followed your steps: I built a 2.13.0 jar file for the neural search plugin and replaced the existing one with it. Here is the successful log:

./gradlew ':qa:restart-upgrade:testAgainstNewCluster' --tests "org.opensearch.neuralsearch.bwc.TextChunkingProcessorIT.testTextChunkingProcessor_E2EFlow" -Dtests.bwc.version=2.13.0-SNAPSHOT
=======================================
OpenSearch Build Hamster says Hello!
  Gradle Version        : 8.4
  OS Info               : Linux 5.10.218-186.862.amzn2int.x86_64 (amd64)
  JDK Version           : 17 (Amazon Corretto JDK)
  JAVA_HOME             : /usr/lib/jvm/java-17-amazon-corretto.x86_64
  Random Testing Seed   : D094611A8B47EBAB
  In FIPS 140 mode      : false
=======================================

> Task :generatePomFileForNebulaPublication
Maven publication 'nebula' pom metadata warnings (silence with 'suppressPomMetadataWarningsFor(variant)'):
  - Variant testFixturesApiElements:
      - Declares capability org.opensearch:neural-search-test-fixtures:3.0.0.0-SNAPSHOT which cannot be mapped to Maven
  - Variant testFixturesRuntimeElements:
      - Declares capability org.opensearch:neural-search-test-fixtures:3.0.0.0-SNAPSHOT which cannot be mapped to Maven
These issues indicate information that is lost in the published 'pom' metadata file, which may be an issue if the published library is consumed by an old Gradle version or Apache Maven.
The 'module' metadata file, which is used by Gradle 6+ is not affected.

Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

For more on this, please refer to https://docs.gradle.org/8.4/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.

BUILD SUCCESSFUL in 2m 37s
38 actionable tasks: 14 executed, 24 up-to-date
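
For anyone re-running this locally, the invocation above can be parameterized by BWC version. This is a minimal sketch that assumes the same Gradle task and test filter as in the log; the helper name is hypothetical, and the script only builds and prints the command without running it.

```python
import shlex

# Task and test filter copied from the command in the log above.
DEFAULT_TASK = ":qa:restart-upgrade:testAgainstNewCluster"
TEXT_CHUNKING_TEST = (
    "org.opensearch.neuralsearch.bwc."
    "TextChunkingProcessorIT.testTextChunkingProcessor_E2EFlow"
)


def bwc_test_command(bwc_version, test_filter=TEXT_CHUNKING_TEST, task=DEFAULT_TASK):
    """Build the gradlew argument list for one BWC version (hypothetical helper)."""
    if not bwc_version:
        raise ValueError("bwc_version must not be empty")
    return [
        "./gradlew",
        task,
        "--tests",
        test_filter,
        f"-Dtests.bwc.version={bwc_version}",
    ]


COMMAND = bwc_test_command("2.13.0-SNAPSHOT")

if __name__ == "__main__":
    # Print a shell-ready form; pass COMMAND to subprocess.run to execute it.
    print(shlex.join(COMMAND))
```

Keeping the command as a list rather than a single string avoids shell-quoting mistakes if it is later passed to `subprocess.run`.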

yuye-aws · 2024-06-17T10:49:03Z

I have also tested by running ./gradlew publishToMavenLocal and then running the BWC tests. The BWC tests pass after publishing to the local .m2 repository.

yuye-aws · 2024-06-17T10:49:33Z

Since the 2.13.0-SNAPSHOT BWC tests pass, can we close this issue?

yuye-aws · 2024-06-18T03:03:07Z

Closing this issue as it has been resolved.

yuye-aws added the bug and untriaged labels Apr 12, 2024

yuye-aws mentioned this issue Apr 12, 2024 in "Test: bwc test for text chunking processor" #661 (merged)

vibrantvarun assigned yuye-aws Apr 24, 2024

zhichao-aws removed the untriaged label Apr 24, 2024

yuye-aws changed the title from "[BUG] BWC Tests get failed with different index setting" to "[BUG] BWC Tests get failed with different index setting in text chunking processor" Apr 27, 2024

yuye-aws mentioned this issue Apr 27, 2024 in "Fix: text chunking processor ingestion bug on multi-node cluster" #713 (merged)

yuye-aws changed the title from "[BUG] BWC Tests get failed with different index setting in text chunking processor" to "[BUG] BWC Tests get failed with different shard number for text chunking processor" May 7, 2024

yuye-aws mentioned this issue May 20, 2024 in "Fix: update github workflow fow text chunking BWC test" #752 (closed)

yuye-aws closed this as completed Jun 18, 2024
