-
Notifications
You must be signed in to change notification settings - Fork 297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Backward compatibility tests do not run with the security plugin enabled #3056
Comments
I think this is the general idea for core: opensearch-project/OpenSearch#8900. I am not sure about actually inputting the setting however since I don't see analogs for the type of setting I am creating here. I will have to look into it more. Note: This would be the last checkbox Update 7/27: Still cannot figure out how to apply the configuration to the tests. I reached out to Sarat and Vacha since they authored this blog in 2021: https://opensearch.org/blog/bwc-testing-for-opensearch/. |
[Triage] @scrawfor99 is handling this issue. |
opensearch-project/OpenSearch#8900 addresses part 1 of this issue. https://github.com/scrawfor99/security/tree/testClusterChanges outlines some of the changes likely required for the security side of things. To run test copy the certs from the linked branch in the BWC resources and then use |
Not sure how to diagnose this yet:
And
|
@scrawfor99 Do you have a pull request/branch where you get that reproduction when attempting to start the bwc tests? Let me know and I'll take a look |
Hi @peternied, I do it is linked here: #3056 (comment) |
Here is how I set up my machine, and I can see those errors...
|
Managed to get some of the scenarios running, still sees some errors, here is the commit that does the most to get that unblocked stephen-crawford@568c468 Functional branch - basic readme on how to get started https://github.com/peternied/security/blob/testClusterChanges/bwc-test/README.md Unblocked
Still broken
|
Thanks @peternied for making some changes and getting the first parts unstuck. Based on my understanding, I think we need to get opensearch-project/OpenSearch#8900 merged and backported to 2.x until certain tasks including mixedClusterTask will pass. When I looked at the
Judging on this, I would expect that we need the main changes that allow running with security to be backported to the old version we wanted to come from before this will work. For example to have mixed start at 2.9 we will need 2.9 to be able to run a test cluster with Security installed first. Otherwise we will encounter failures on the old version. Running the tests from the same version to itself (i.e. mixed cluster from 3.0.0.0 -> 3.0.0.0) causes issues because the security plugin directory will already exist and cause
to be thrown |
@scrawfor99 Since we have some of the tests working could you file a separate issue to track the fixing mixed cluster task? Seems like the backporting the test harness fixes into 2.X might be needed, or maybe there is something else wrong. I didn't dive in more, but cluster not starting up could be due to an issue with the nodes connecting to the cluster or something else that needs more digging. |
Hi @peternied, no problem. I am not sure what this will mean for this issue though... At what point will we be able to consider it done do you think? I also determined that any time we run |
I'm not worried about the meta issue vs actual issues so long as the we can unblock #2802 I was thinking of having a separate issue because maybe there is some work that can happen in parallel before we've got everything in place. If your instincts say to keep it one issue lets do it. |
Currently looking into how normal upgrades work because they must handle plugins somehow and we want to mimic that in the test handling in core. The code of interest is: https://github.com/opensearch-project/OpenSearch/tree/main/distribution/tools/upgrade-cli/src/main/java/org/opensearch/upgrade. I then asked a couple of the build team about how it works and basically, we create a completely separate cluster and plugin of the new version then the data gets moved over. This means that during a rolling upgrade we always have a cluster of a single node which is the old node we are replacing. Looks like we can get the installed plugins with
|
Failure in mixedClusterTest is actually from this:
The first and second nodes update fine but the final node fails to join the cluster. |
I noticed in the failing mixedClusterTest that we run into what looks like a node trying to join itself:
The target and source node are both Seems like this is expected however? Issue occurs after version upgrade where node does not properly initialize:
|
First issue in updated node occurs before update:
This seems standard however as I see it appear elsewhere on the non-upgrading nodes as well. After that we transition into some basic config logging
Again, this appears correct.
Other node components are then loaded, an address is published, and the node starts.
Node then loads configuration and looks for dynamic host lists. It repeats that it is waiting for the cluster be available several times. |
Then the node starts joining others to form a cluster:
For some reason 0-2 is listed twice while 0-1 is not listed. The node then states that it successfully joined itself:
BackendRegistry continually states that cluster is not initialized while
|
I opened a manual backport against 2.x in core. |
I was able to get all tests passing on a change from 2.8 to 2.9. Here are the steps to reproduce: Use this build of 2.9: https://github.com/scrawfor99/OpenSearch/tree/2.9.0 For each of these, I cherry picked the commits to my backport in the 2.x line which can be found here: opensearch-project/OpenSearch#9444. I then ran Finally I ran:
Which grabbed the localDistro of the earlier version file which I had created. I am skeptical that part is truly necessary however. Here are the test results parsed from the output:
This appears to me that all tests pass since the assertions of the tests are all based around calling the SecurityBackwardsCompatabilityIT:
|
Blocked pending direction forward -- do we want to work on fixing whatever is wrong with main or prioritize merging (#2802)? |
Going to be splitting this into a 2.x fix and a main fix for now. That will mean that the changes in the working 2.x will be submitted as a PR to the 2.x line and then we will follow up on main from a separate issue. |
### Description Opening up a PR to describe the issues faced with BWC tests with the security plugin installed and solicit feedback. I plan to forward port this change to main, but first wanted to show this working for 2.9 -> 2.10 tests (as of the time of this writing). Thanks to the work that @scrawfor99 did in [core](opensearch-project/OpenSearch#8900) to supply security settings to testClusters to be able to run the initial wait for cluster yellow checks with a URL that includes the right protocol (`https` when security is enabled) along with a username and password to authenticate the request. I ran into 4 hurdles to get this to run: 1. Initially the cluster didn't form. After a lot of frustration, I ended up finding that by supplying `network.bind_host` and `network.publish_host` to both 127.0.0.1 it resolved the issue. These could probably be combined into a single `network.host`, but I chose to keep them separated. 2. I had issue testing changes to the gradle build-tools after making changes locally. This was the most frustrating hurdle, but ultimately the solution was to change the [`opensearch.version` setting in `bwc-test/build.gradle`](https://github.com/opensearch-project/security/blob/2.x/bwc-test/build.gradle#L47) to `2.10.0-SNAPSHOT`. This value is specifically used as the version of the gradle build-tools that the [BWC tests use](https://github.com/opensearch-project/security/blob/main/bwc-test/build.gradle#L58). The changes I made locally didn't reflect because I was publishing to maven local from the 2.x branch (currently 2.10) and it was looking for 2.9.0-SNAPSHOT artifacts. After updating the value it found my maven local snapshots. For this artifact you can produce maven local snapshots using `./gradlew :build-tools:publishToMavenLocal` from the respective branch in the core repo. 3. After the waitForYellow checks were able to run successfully, the REST Client in the SecurityBackwardsCompatibilityIT was also having problems connecting to the cluster because it didn't recognize the certificates of the server. I ended up using the overly trustworthy route where there is no SSL verification for the REST Client used in this test. I borrowed this implementation from [k-NN's ODFERestTestCase](https://github.com/opensearch-project/k-NN/blob/2.x/src/testFixtures/java/org/opensearch/knn/ODFERestTestCase.java#L118-L141) which is widely used in the plugin ecosystem. There is an open issue to abstract this class into common-utils. More work can be done here to ensure the rest-high-level-client runs with a truststore with the root certificate. 4. The last hurdle I faced was a WarningFailureException where the REST Client could not deserialize the cluster health response because of a warning that was returned with the response about the request including system indices. According to this [comment](opensearch-project/OpenSearch#1108 (comment)), this may only be enabled in snapshots. To fix this, I set preserve cluster to true which [bypasses the method](https://github.com/opensearch-project/OpenSearch/blob/main/test/framework/src/main/java/org/opensearch/test/rest/OpenSearchRestTestCase.java#L364) where the error was thrown. * Category (Enhancement, New feature, Bug fix, Test fix, Refactoring, Maintenance, Documentation) Enhancement ### Issues Resolved #3056 ### Check List - [ ] New functionality includes testing - [ ] New functionality has been documented - [ ] Commits are signed per the DCO using --signoff By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check [here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin). --------- Signed-off-by: Craig Perkins <[email protected]>
Opening up a PR to describe the issues faced with BWC tests with the security plugin installed and solicit feedback. I plan to forward port this change to main, but first wanted to show this working for 2.9 -> 2.10 tests (as of the time of this writing). Thanks to the work that @scrawfor99 did in [core](opensearch-project/OpenSearch#8900) to supply security settings to testClusters to be able to run the initial wait for cluster yellow checks with a URL that includes the right protocol (`https` when security is enabled) along with a username and password to authenticate the request. I ran into 4 hurdles to get this to run: 1. Initially the cluster didn't form. After a lot of frustration, I ended up finding that by supplying `network.bind_host` and `network.publish_host` to both 127.0.0.1 it resolved the issue. These could probably be combined into a single `network.host`, but I chose to keep them separated. 2. I had issue testing changes to the gradle build-tools after making changes locally. This was the most frustrating hurdle, but ultimately the solution was to change the [`opensearch.version` setting in `bwc-test/build.gradle`](https://github.com/opensearch-project/security/blob/2.x/bwc-test/build.gradle#L47) to `2.10.0-SNAPSHOT`. This value is specifically used as the version of the gradle build-tools that the [BWC tests use](https://github.com/opensearch-project/security/blob/main/bwc-test/build.gradle#L58). The changes I made locally didn't reflect because I was publishing to maven local from the 2.x branch (currently 2.10) and it was looking for 2.9.0-SNAPSHOT artifacts. After updating the value it found my maven local snapshots. For this artifact you can produce maven local snapshots using `./gradlew :build-tools:publishToMavenLocal` from the respective branch in the core repo. 3. After the waitForYellow checks were able to run successfully, the REST Client in the SecurityBackwardsCompatibilityIT was also having problems connecting to the cluster because it didn't recognize the certificates of the server. I ended up using the overly trustworthy route where there is no SSL verification for the REST Client used in this test. I borrowed this implementation from [k-NN's ODFERestTestCase](https://github.com/opensearch-project/k-NN/blob/2.x/src/testFixtures/java/org/opensearch/knn/ODFERestTestCase.java#L118-L141) which is widely used in the plugin ecosystem. There is an open issue to abstract this class into common-utils. More work can be done here to ensure the rest-high-level-client runs with a truststore with the root certificate. 4. The last hurdle I faced was a WarningFailureException where the REST Client could not deserialize the cluster health response because of a warning that was returned with the response about the request including system indices. According to this [comment](opensearch-project/OpenSearch#1108 (comment)), this may only be enabled in snapshots. To fix this, I set preserve cluster to true which [bypasses the method](https://github.com/opensearch-project/OpenSearch/blob/main/test/framework/src/main/java/org/opensearch/test/rest/OpenSearchRestTestCase.java#L364) where the error was thrown. * Category (Enhancement, New feature, Bug fix, Test fix, Refactoring, Maintenance, Documentation) Enhancement opensearch-project#3056 - [ ] New functionality includes testing - [ ] New functionality has been documented - [ ] Commits are signed per the DCO using --signoff By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check [here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin). --------- Signed-off-by: Craig Perkins <[email protected]>
Issue
When BWC tests are executing they call
GET _node/{NODE_ID}/plugins
and check to see if the security plugin is in the list. This doesn't check that the securty plugin is operational. After inspection, it is not operational. The BWC tests should run with all features of the security plugin enabled otherwise we cannot determine if backwards incompatible changes are being assessed.no handler found for uri
#3053Sub issues to address
Additional Context
I'm seeing the following line that seems troublesome, this would imply that the security plugin features won't be available, doing more digging
After looking at the configuration settings, (Thanks for paired debugging with me @parasjain1) these test don't actually start up the security plugin at all, the following configuration line needs to be switched from false -> true.
security/bwc-test/build.gradle
Line 128 in 8063e1b
However, after attempting to startup the node, copying configuration from SecureRestClientBuilder.java, there are still many errors coming in from the cluster:
These errors are happening because the cluster startup check is hardcoded to http, need to make this parameterized
Branch that can be used to kickstart this effort main...peternied:security:bwc-ssl
The text was updated successfully, but these errors were encountered: