Backward compatibility tests do not run with the security plugin enabled #3056

Closed
3 of 6 tasks
peternied opened this issue Jul 26, 2023 · 21 comments
Labels: backwards-compatibility, bug, triaged

Comments

@peternied
Member

peternied commented Jul 26, 2023

Issue

When the BWC tests execute, they call GET _nodes/{NODE_ID}/plugins and check whether the security plugin is in the list. This does not verify that the security plugin is operational and, after inspection, it is not. The BWC tests should run with all features of the security plugin enabled; otherwise we cannot determine whether backwards-incompatible changes are being assessed.

Sub issues to address

  • Allow test clusters to run with TLS OpenSearch#8900
  • Enable security plugin on BWC tests
  • Correct audit logging source for tests
  • Modify both the client and the admin client to use an SSL connection
  • Update the test case to allow specifying the user that is used to make the connection
  • (Recommended) Add a test case that calls 'whoami' and confirms the identity of the caller
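
The recommended 'whoami' check in the last item could look roughly like the sketch below. It only builds the authenticated request and does not send it. The endpoint path `/_plugins/_security/whoami`, the class name `WhoAmIRequestSketch`, and the host, port, and credentials passed in are assumptions for illustration, not something confirmed by this issue.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class WhoAmIRequestSketch {
    // Assumed endpoint path; verify against the security plugin version under test.
    static final String WHOAMI_PATH = "/_plugins/_security/whoami";

    /** Encodes user:password as an HTTP Basic Authorization header value. */
    static String basicAuth(String user, String password) {
        String raw = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    /** Builds (but does not send) a GET request that asks the cluster who the caller is. */
    static HttpRequest build(String protocol, String host, int port, String user, String password) {
        URI uri = URI.create(protocol + "://" + host + ":" + port + WHOAMI_PATH);
        return HttpRequest.newBuilder(uri)
            .header("Authorization", basicAuth(user, password))
            .GET()
            .build();
    }
}
```

A test could send such a request through the test client and assert that the response names the expected user. With the plugin disabled there is presumably no authenticated identity to report, so the check would fail, which is the signal the plugin-list check cannot give.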

Additional Context

I'm seeing the following line, which seems troublesome: it implies that the security plugin features won't be available. Doing more digging.

» WARN ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster1-0] OpenSearch Security plugin installed but disabled. This can expose your configuration (including passwords) to the public.

After looking at the configuration settings (thanks for paired debugging with me, @parasjain1), these tests don't actually start up the security plugin at all. The value in the following configuration line needs to be switched from "true" to "false".

node.setting("plugins.security.disabled", "true")

However, after attempting to start up the node with configuration copied from SecureRestClientBuilder.java, there are still many errors coming from the cluster:

» ERROR][o.o.s.s.h.n.SecuritySSLNettyHttpServerTransport] [securityBwcCluster1-0] Exception during establishing a SSL connection: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 4...0a
» io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 4..a
» at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1225) ~[netty-handler-4.1.94.Final.jar:4.1.94.Final]
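
One way to read the `4...0a` fragment: Netty prints the offending bytes as hex, and a plaintext HTTP request such as `GET / HTTP/1.1` starts with `47` and its lines end with `0a`, whereas a TLS handshake record starts with `16 03`. The sketch below illustrates that distinction. The class and method names are hypothetical, and the sample request line is an assumption, since the log elides the actual bytes.

```java
import java.nio.charset.StandardCharsets;

final class TlsRecordSniffer {
    /** A TLS handshake record starts with content type 0x16 followed by major version 0x03. */
    static boolean looksLikeTlsHandshake(byte[] firstBytes) {
        return firstBytes.length >= 2 && firstBytes[0] == 0x16 && firstBytes[1] == 0x03;
    }

    /** Lower-case hex dump, similar in spirit to what appears in the exception message. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Hypothetical plaintext request line, as an http:// health check might send it. */
    static byte[] samplePlaintextRequest() {
        return "GET / HTTP/1.1\r\n".getBytes(StandardCharsets.US_ASCII);
    }
}
```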

These errors happen because the cluster startup check is hardcoded to http; this needs to be parameterized:

WaitForHttpResource wait = new WaitForHttpResource("http", getFirstNode().getHttpSocketURI(), nodes.size());
[Source]
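
A minimal sketch of what parameterizing the protocol might look like, assuming it is supplied through the `tests.opensearch.http.protocol` system property that appears in the Gradle commands later in this thread. `ProtocolResolver` and its methods are hypothetical names; this is not the change that was made in core.

```java
import java.util.Locale;
import java.util.Properties;

final class ProtocolResolver {
    static final String PROPERTY = "tests.opensearch.http.protocol";

    /** Returns "http" or "https"; defaults to "http" when the property is unset. */
    static String resolve(Properties props) {
        String value = props.getProperty(PROPERTY, "http").trim().toLowerCase(Locale.ROOT);
        if (!value.equals("http") && !value.equals("https")) {
            throw new IllegalArgumentException("Unsupported protocol: " + value);
        }
        return value;
    }

    /** Builds the base URL a startup check would poll, given a socket URI such as 127.0.0.1:9200. */
    static String baseUrl(Properties props, String httpSocketUri) {
        return resolve(props) + "://" + httpSocketUri;
    }
}
```

The resolved value would then take the place of the hard-coded "http" argument in the WaitForHttpResource constructor call quoted above, for example via `ProtocolResolver.resolve(System.getProperties())`.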

Branch that can be used to kickstart this effort: main...peternied:security:bwc-ssl

@github-actions github-actions bot added the untriaged Require the attention of the repository maintainers and may need to be prioritized label Jul 26, 2023
@peternied peternied removed the untriaged Require the attention of the repository maintainers and may need to be prioritized label Jul 26, 2023
@peternied peternied added bug Something isn't working backwards-compatibility labels Jul 26, 2023
@stephen-crawford
Contributor

stephen-crawford commented Jul 26, 2023

I think this is the general idea for core: opensearch-project/OpenSearch#8900. I am not sure about actually inputting the setting however since I don't see analogs for the type of setting I am creating here. I will have to look into it more.

Note: This would be the last checkbox

Update 7/27: Still cannot figure out how to apply the configuration to the tests. I reached out to Sarat and Vacha since they authored this blog in 2021: https://opensearch.org/blog/bwc-testing-for-opensearch/.

@stephen-crawford
Contributor

[Triage] @scrawfor99 is handling this issue.

@stephen-crawford stephen-crawford added the triaged Issues labeled as 'Triaged' have been reviewed and are deemed actionable. label Jul 31, 2023
@stephen-crawford
Contributor

stephen-crawford commented Aug 11, 2023

opensearch-project/OpenSearch#8900 addresses part 1 of this issue.

https://github.com/scrawfor99/security/tree/testClusterChanges outlines some of the changes likely required for the security side of things.

To run the tests, copy the certs from the linked branch into the BWC resources and then use ./gradlew bwcTestSuite -Dtests.opensearch.http.protocol=https -Dtests.opensearch.username=admin -Dtests.opensearch.password=admin -PcustomDistributionUrl="/Users/steecraw/OpenSearch/build/distribution/local/opensearch-3.0.0-SNAPSHOT.tar.gz" -i

@stephen-crawford
Contributor

stephen-crawford commented Aug 11, 2023

Not sure how to diagnose this yet:

java.security.AccessControlException: access denied ("java.io.FilePermission" "/Users/steecraw/security/bwc-test/build/classes/java/opensearch-node.pem" "read")
	at java.base/java.security.AccessControlContext.checkPermission(AccessControlContext.java:485)
	at java.base/java.security.AccessController.checkPermission(AccessController.java:1068)
	at java.base/java.lang.SecurityManager.checkPermission(SecurityManager.java:416)
	at java.base/java.lang.SecurityManager.checkRead(SecurityManager.java:756)
	at java.base/sun.nio.fs.UnixPath.checkRead(UnixPath.java:780)
	at java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:49)
	at java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:148)
	at java.base/java.nio.file.Files.readAttributes(Files.java:1851)
	at java.base/java.nio.file.Files.isDirectory(Files.java:2322)
	at org.opensearch.commons.rest.SecureRestClientBuilder.resolve(SecureRestClientBuilder.java:282)
	at org.opensearch.commons.rest.SecureRestClientBuilder.createSSLContext(SecureRestClientBuilder.java:247)
	at org.opensearch.commons.rest.SecureRestClientBuilder.createRestClientBuilder(SecureRestClientBuilder.java:202)
	at org.opensearch.commons.rest.SecureRestClientBuilder.build(SecureRestClientBuilder.java:159)
	at org.opensearch.security.bwc.SecurityBackwardsCompatibilityIT.buildClient(SecurityBackwardsCompatibilityIT.java:108)
	at org.opensearch.test.rest.OpenSearchRestTestCase.initClient(OpenSearchRestTestCase.java:211)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:972)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
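
The trace shows the Java SecurityManager rejecting a read of the PEM file. As a diagnostic aid, `java.io.FilePermission.implies` can show whether a candidate grant would cover the denied path. The sketch below is an assumption about how one might investigate, not the eventual fix; `PermissionProbe` is a hypothetical helper, and the paths in the usage note are shortened placeholders.

```java
import java.io.FilePermission;

final class PermissionProbe {
    /** True if a read grant on grantPath would cover a read of targetPath. */
    static boolean readGrantCovers(String grantPath, String targetPath) {
        FilePermission granted = new FilePermission(grantPath, "read");
        FilePermission requested = new FilePermission(targetPath, "read");
        return granted.implies(requested);
    }
}
```

For example, a grant ending in `/-` is recursive, so `readGrantCovers("/tmp/bwc/classes/-", "/tmp/bwc/classes/java/opensearch-node.pem")` is true, while a grant ending in `/*` only covers direct children of that directory and would not cover the same file.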
	

And

java.lang.NullPointerException: Cannot invoke "org.opensearch.client.RestClient.performRequest(org.opensearch.client.Request)" because the return value of "org.opensearch.test.rest.OpenSearchRestTestCase.adminClient()" is null
	at org.opensearch.test.rest.OpenSearchRestTestCase.ensureNoInitializingShards(OpenSearchRestTestCase.java:964)
	at org.opensearch.test.rest.OpenSearchRestTestCase.cleanUpCluster(OpenSearchRestTestCase.java:364)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)

@peternied
Member Author

@scrawfor99 Do you have a pull request/branch where you get that reproduction when attempting to start the bwc tests? Let me know and I'll take a look

@stephen-crawford
Contributor

Hi @peternied, I do; it is linked here: #3056 (comment)

@peternied
Member Author

Here is how I set up my machine, and I can see those errors...

cd ~/git/OpenSearch/
git remote add scrawfor99 https://github.com/scrawfor99/OpenSearch.git
git fetch scrawfor99
git checkout scrawfor99/bwcFix
git merge origin/main
./gradlew build-tools:publishToMavenLocal
./gradlew distribution:archives:linux-tar:assemble

cd ../security
git remote add scrawfor99 https://github.com/scrawfor99/security.git
git fetch scrawfor99
git checkout scrawfor99/testClusterChanges
git merge origin/main
./gradlew assemble
mkdir -p ${GIT_PROJECT_ROOT}/security/bwc-test/src/test/resources/3.0.0.0
cp ${GIT_PROJECT_ROOT}/security/build/distributions/opensearch-security-3.0.0.0-SNAPSHOT.zip ${GIT_PROJECT_ROOT}/security/bwc-test/src/test/resources/3.0.0.0/opensearch-security-3.0.0.0-SNAPSHOT.zip
./gradlew -p bwc-test bwcTestSuite -Dtests.opensearch.http.protocol=https -Dtests.opensearch.username=admin -Dtests.opensearch.password=admin -PcustomDistributionUrl="${GIT_PROJECT_ROOT}/opensearch/distribution/archives/linux-tar/build/distributions/opensearch-min-3.0.0-SNAPSHOT-linux-x64.tar.gz" -i

@peternied
Member Author

peternied commented Aug 14, 2023

Managed to get some of the scenarios running, though I still see some errors. Here is the commit that does the most to get that unblocked: stephen-crawford@568c468

Functional branch - basic readme on how to get started https://github.com/peternied/security/blob/testClusterChanges/bwc-test/README.md

Unblocked

  • securityBwcCluster#fullRestartClusterTask
  • securityBwcCluster#oldVersionClusterTask0
  • securityBwcCluster#oldVersionClusterTask1

Still broken

  • securityBwcCluster#mixedClusterTask
Execution failed for task ':securityBwcCluster#mixedClusterTask'.
> `cluster{::securityBwcCluster0}` failed to wait for cluster health yellow after 40 SECONDS
    IO error while waiting cluster
    503 Service Unavailable

@stephen-crawford
Contributor

stephen-crawford commented Aug 15, 2023

Thanks @peternied for making some changes and getting the first parts unstuck. Based on my understanding, I think we need to get opensearch-project/OpenSearch#8900 merged and backported to 2.x before certain tasks, including mixedClusterTask, will pass.

When I looked at the mixedClusterTask, the description reads:

// Upgrades one node of the old cluster to new OpenSearch version with upgraded plugin version
// This results in a mixed cluster with 2 nodes on the old version and 1 upgraded node.
// This is also used as a one third upgraded cluster for a rolling upgrade.

Judging by this, I would expect that the main changes that allow running with security need to be backported to the old version we want to upgrade from before this will work. For example, to have the mixed cluster start at 2.9, we first need 2.9 to be able to run a test cluster with Security installed. Otherwise we will encounter failures on the old version.

Running the tests from the same version to itself (i.e. mixed cluster from 3.0.0.0 -> 3.0.0.0) causes issues because the security plugin directory already exists, which causes the following error to be thrown:

ERROR: plugin directory [/Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster1-0/distro/3.0.0-INTEG_TEST/plugins/opensearch-security] already exists; if you need to update the plugin, uninstall it first using command 'remove opensearch-security'
Waiting for reaper to exit normally

@peternied
Member Author

@scrawfor99 Since we have some of the tests working, could you file a separate issue to track fixing the mixed cluster task? It seems like backporting the test harness fixes into 2.x might be needed, or maybe there is something else wrong. I didn't dive in further, but the cluster not starting up could be due to an issue with the nodes connecting to the cluster or something else that needs more digging.

@stephen-crawford
Contributor

stephen-crawford commented Aug 15, 2023

Hi @peternied, no problem. I am not sure what this will mean for this issue though... At what point will we be able to consider it done do you think?

I also determined that any time we run upgradeNodeAndPluginToNextVersion we need to be able to first uninstall the security plugin. So that will require further changes in core.
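
The ordering described here can be sketched as a small planning step: if the plugin directory already exists, remove the plugin before installing the new version. `opensearch-plugin remove` and `install` are the standard plugin CLI subcommands (the error message quoted earlier names `remove opensearch-security`). The `PluginUpgradePlanner` class, its method, and the zip location argument are hypothetical and do not reflect how the test-cluster code in core is actually structured.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class PluginUpgradePlanner {
    /**
     * Returns the opensearch-plugin invocations needed to put the given plugin
     * zip in place, removing a previously installed copy first if one is present.
     */
    static List<List<String>> plan(Path distroHome, String pluginName, String pluginZipUri) {
        List<List<String>> commands = new ArrayList<>();
        Path pluginDir = distroHome.resolve("plugins").resolve(pluginName);
        if (Files.isDirectory(pluginDir)) {
            // Installing over an existing plugin directory fails, so uninstall first.
            commands.add(List.of("bin/opensearch-plugin", "remove", pluginName));
        }
        // --batch suppresses the interactive permissions prompt.
        commands.add(List.of("bin/opensearch-plugin", "install", "--batch", pluginZipUri));
        return commands;
    }
}
```

On a fresh distro directory the plan contains only the install step; once `plugins/opensearch-security` exists, a remove step comes first.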

@peternied
Member Author

I'm not worried about the meta issue vs actual issues so long as we can unblock #2802

I was thinking of having a separate issue because maybe there is some work that can happen in parallel before we've got everything in place. If your instincts say to keep it one issue, let's do it.

@stephen-crawford
Contributor

stephen-crawford commented Aug 16, 2023

Currently looking into how normal upgrades work because they must handle plugins somehow and we want to mimic that in the test handling in core.

The code of interest is: https://github.com/opensearch-project/OpenSearch/tree/main/distribution/tools/upgrade-cli/src/main/java/org/opensearch/upgrade.

I then asked a couple of members of the build team about how it works. Basically, we create a completely separate cluster and plugin of the new version, then the data gets moved over. This means that during a rolling upgrade we always have a cluster of a single node, which is the old node we are replacing.

Looks like we can get the installed plugins with

@SuppressForbidden(reason = "Retrieve information on installed plugins.")
private List<String> fetchPluginsFromUrl(final String url) {
    final List<String> plugins = new ArrayList<>();
    try {
        final URL esUrl = new URL(url + "/_cat/plugins?format=json&local=true");
        final HttpURLConnection conn = (HttpURLConnection) esUrl.openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(1000);
        conn.connect();
        if (conn.getResponseCode() == 200) {
            final StringBuilder json = new StringBuilder();
            final Scanner scanner = new Scanner(esUrl.openStream());
            while (scanner.hasNext()) {
                json.append(scanner.nextLine());
            }
            scanner.close();
            final ObjectMapper mapper = new ObjectMapper();
            final Map<String, String>[] response = mapper.readValue(json.toString(), Map[].class);
            for (Map<String, String> plugin : response) {
                plugins.add(plugin.get("component"));
            }
        }
        return plugins;
    } catch (IOException e) {
        throw new RuntimeException("Error retrieving elasticsearch plugin details, " + e);
    }
}

@stephen-crawford
Contributor

Failure in mixedClusterTest is actually from this:

[2023-08-16T16:51:10,047][INFO ][o.o.s.a.i.AuditLogImpl   ] [securityBwcCluster1-2] .opendistro_security is used as internal security index.
[2023-08-16T16:51:10,047][INFO ][o.o.s.a.i.AuditLogImpl   ] [securityBwcCluster1-2] Internal index used for posting audit logs is null
[2023-08-16T16:51:10,048][INFO ][o.o.s.c.ConfigurationRepository] [securityBwcCluster1-2] Hot-reloading of audit configuration is enabled
[2023-08-16T16:51:10,051][INFO ][o.o.s.c.ConfigurationRepository] [securityBwcCluster1-2] Node 'securityBwcCluster1-2' initialized
[2023-08-16T16:51:12,409][DEBUG][o.o.c.c.PublicationTransportHandler] [securityBwcCluster1-2] received diff cluster state version [39] with uuid [7NdxpHS5TT6jDYFoTa66_g], diff size [196]
[2023-08-16T16:51:14,680][DEBUG][o.o.c.c.LeaderChecker    ] [securityBwcCluster1-2] leader [{securityBwcCluster1-0}{UrHguxEtQG-IFR6e3IwHBw}{MXMAI8_LQ9aoCB-eBMtXRw}{127.0.0.1}{127.0.0.1:63821}{dimr}{testattr=test, upgraded=true, shard_indexing_pressure_enabled=true}] disconnected
[2023-08-16T16:51:14,682][INFO ][o.o.c.c.Coordinator      ] [securityBwcCluster1-2] cluster-manager node [{securityBwcCluster1-0}{UrHguxEtQG-IFR6e3IwHBw}{MXMAI8_LQ9aoCB-eBMtXRw}{127.0.0.1}{127.0.0.1:63821}{dimr}{testattr=test, upgraded=true, shard_indexing_pressure_enabled=true}] failed, restarting discovery
org.opensearch.transport.NodeDisconnectedException: [securityBwcCluster1-0][127.0.0.1:63821][disconnected] disconnected
[2023-08-16T16:51:14,683][DEBUG][o.o.c.c.Coordinator      ] [securityBwcCluster1-2] onLeaderFailure: coordinator becoming CANDIDATE in term 4 (was FOLLOWER, lastKnownLeader was [Optional[{securityBwcCluster1-0}{UrHguxEtQG-IFR6e3IwHBw}{MXMAI8_LQ9aoCB-eBMtXRw}{127.0.0.1}{127.0.0.1:63821}{dimr}{testattr=test, upgraded=true, shard_indexing_pressure_enabled=true}]])
[2023-08-16T16:51:14,685][INFO ][o.o.c.s.ClusterApplierService] [securityBwcCluster1-2] cluster-manager node changed {previous [{securityBwcCluster1-0}{UrHguxEtQG-IFR6e3IwHBw}{MXMAI8_LQ9aoCB-eBMtXRw}{127.0.0.1}{127.0.0.1:63821}{dimr}{testattr=test, upgraded=true, shard_indexing_pressure_enabled=true}], current []}, term: 4, version: 39, reason: becoming candidate: onLeaderFailure
[2023-08-16T16:51:14,691][WARN ][o.o.c.NodeConnectionsService] [securityBwcCluster1-2] failed to connect to {securityBwcCluster1-0}{UrHguxEtQG-IFR6e3IwHBw}{MXMAI8_LQ9aoCB-eBMtXRw}{127.0.0.1}{127.0.0.1:63821}{dimr}{testattr=test, upgraded=true, shard_indexing_pressure_enabled=true} (tried [1] times)
org.opensearch.transport.ConnectTransportException: [securityBwcCluster1-0][127.0.0.1:63821] connect_exception
	at org.opensearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:1074) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.core.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:215) ~[opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:57) ~[opensearch-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[?:?]
	at org.opensearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:72) ~[opensearch-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.transport.netty4.Netty4TcpChannel.lambda$addListener$0(Netty4TcpChannel.java:81) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:583) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:559) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:629) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:118) ~[?:?]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321) ~[?:?]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[?:?]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[?:?]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: 127.0.0.1/127.0.0.1:63821
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
	at sun.nio.ch.Net.pollConnectNow(Net.java:672) ~[?:?]
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946) ~[?:?]
	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337) ~[?:?]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) ~[?:?]
	... 7 more
[2023-08-16T16:51:14,708][ERROR][o.o.s.s.t.SecuritySSLNettyTransport] [securityBwcCluster1-2] Exception during establishing a SSL connection: java.net.SocketException: Connection reset
java.net.SocketException: Connection reset
	at sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:394) ~[?:?]
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:426) ~[?:?]
	at org.opensearch.transport.CopyBytesSocketChannel.readFromSocketChannel(CopyBytesSocketChannel.java:155) ~[transport-netty4-client-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.transport.CopyBytesSocketChannel.doReadBytes(CopyBytesSocketChannel.java:140) ~[transport-netty4-client-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:151) [netty-transport-4.1.96.Final.jar:4.1.96.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) [netty-transport-4.1.96.Final.jar:4.1.96.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) [netty-transport-4.1.96.Final.jar:4.1.96.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) [netty-transport-4.1.96.Final.jar:4.1.96.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) [netty-transport-4.1.96.Final.jar:4.1.96.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.96.Final.jar:4.1.96.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.96.Final.jar:4.1.96.Final]
	at java.lang.Thread.run(Thread.java:833) [?:?]
[2023-08-16T16:51:14,708][ERROR][o.o.s.s.t.SecuritySSLNettyTransport] [securityBwcCluster1-2] Exception during establishing a SSL connection: java.net.SocketException: Connection reset
java.net.SocketException: Connection reset
	at sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:394) ~[?:?]
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:426) ~[?:?]
	at org.opensearch.transport.CopyBytesSocketChannel.readFromSocketChannel(CopyBytesSocketChannel.java:155) ~[transport-netty4-client-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.transport.CopyBytesSocketChannel.doReadBytes(CopyBytesSocketChannel.java:140) ~[transport-netty4-client-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:151) [netty-transport-4.1.96.Final.jar:4.1.96.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) [netty-transport-4.1.96.Final.jar:4.1.96.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) [netty-transport-4.1.96.Final.jar:4.1.96.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) [netty-transport-4.1.96.Final.jar:4.1.96.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) [netty-transport-4.1.96.Final.jar:4.1.96.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.96.Final.jar:4.1.96.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.96.Final.jar:4.1.96.Final]
	at java.lang.Thread.run(Thread.java:833) [?:?]

The first and second nodes update fine but the final node fails to join the cluster.

@stephen-crawford
Contributor

stephen-crawford commented Aug 17, 2023

I noticed in the failing mixedClusterTest that we run into what looks like a node trying to join itself:


[2023-08-17T17:07:17,088][INFO ][o.o.c.c.JoinHelper       ] [securityBwcCluster0-0] failed to join {securityBwcCluster0-0}{nIIK5MlvS9KOJF81nv9v8w}{2qkpyY1iROGFeyQY4y9hIQ}{127.0.0.1}{127.0.0.1:53701}{dimr}{testattr=test, shard_indexing_pressure_enabled=true} with JoinRequest{sourceNode={securityBwcCluster0-0}{nIIK5MlvS9KOJF81nv9v8w}{2qkpyY1iROGFeyQY4y9hIQ}{127.0.0.1}{127.0.0.1:53701}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, minimumTerm=1, optionalJoin=Optional[Join{term=2, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={securityBwcCluster0-0}{nIIK5MlvS9KOJF81nv9v8w}{2qkpyY1iROGFeyQY4y9hIQ}{127.0.0.1}{127.0.0.1:53701}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, targetNode={securityBwcCluster0-0}{nIIK5MlvS9KOJF81nv9v8w}{2qkpyY1iROGFeyQY4y9hIQ}{127.0.0.1}{127.0.0.1:53701}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}}]}
org.opensearch.transport.RemoteTransportException: [securityBwcCluster0-0][127.0.0.1:53701][internal:cluster/coordination/join]

The target and source node are both 0-0 suggesting that the node tries to join itself.

Seems like this is expected however?

Issue occurs after version upgrade where node does not properly initialize:

[2023-08-17T17:07:23,317][DEBUG][o.o.c.c.PublicationTransportHandler] [securityBwcCluster0-0] received diff cluster state version [25] with uuid [2454OcENRFKitANW060m2Q], diff size [449]
[2023-08-17T21:07:25.067123Z] [BUILD] Stopping node
[2023-08-17T21:07:25.085545Z] [BUILD] Switch version from 2.9.0 to 3.0.0-SNAPSHOT
[2023-08-17T21:07:25.085922Z] [BUILD] Configuring custom cluster specific distro directory: /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/distro/3.0.0-INTEG_TEST
[2023-08-17T21:07:25.181954Z] [BUILD] Setting up 6 additional config files
[2023-08-17T21:07:25.183565Z] [BUILD] Copying additional config files from distro [/Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/distro/3.0.0-INTEG_TEST/config/jvm.options.d, /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/distro/3.0.0-INTEG_TEST/config/opensearch.yml, /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/distro/3.0.0-INTEG_TEST/config/log4j2.properties, /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/distro/3.0.0-INTEG_TEST/config/jvm.options]
[2023-08-17T21:07:25.184620Z] [BUILD] installing 1 plugins in a single transaction
[2023-08-17T21:07:26.461Z] [BUILD] installed plugins
[2023-08-17T21:07:26.461187Z] [BUILD] Creating opensearch keystore with password set to []
[2023-08-17T21:07:27.076738Z] [BUILD] Installing 0modules
[2023-08-17T21:07:27.076896Z] [BUILD] Starting OpenSearch process
[2023-08-17T17:07:28,457][WARN ][o.o.b.Natives            ] [securityBwcCluster0-0] unable to load JNA native support library, native methods will be disabled.
java.lang.UnsatisfiedLinkError: Can't load library: /Users/steecraw/Library/Caches/JNA/temp/jna2795729481167693804.tmp
	at java.lang.ClassLoader.loadLibrary(ClassLoader.java:2393) ~[?:?]
	at java.lang.Runtime.load0(Runtime.java:755) ~[?:?]
	at java.lang.System.load(System.java:1953) ~[?:?]
	at com.sun.jna.Native.loadNativeDispatchLibraryFromClasspath(Native.java:1018) ~[jna-5.5.0.jar:5.5.0 (b0)]
	at com.sun.jna.Native.loadNativeDispatchLibrary(Native.java:988) ~[jna-5.5.0.jar:5.5.0 (b0)]
	at com.sun.jna.Native.<clinit>(Native.java:195) ~[jna-5.5.0.jar:5.5.0 (b0)]
	at java.lang.Class.forName0(Native Method) ~[?:?]
	at java.lang.Class.forName(Class.java:375) ~[?:?]
	at org.opensearch.bootstrap.Natives.<clinit>(Natives.java:60) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:123) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.bootstrap.Bootstrap.setup(Bootstrap.java:191) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.bootstrap.Bootstrap.init(Bootstrap.java:404) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.bootstrap.OpenSearch.init(OpenSearch.java:180) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.bootstrap.OpenSearch.execute(OpenSearch.java:171) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:104) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cli.Command.mainWithoutErrorHandling(Command.java:138) [opensearch-cli-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cli.Command.main(Command.java:101) [opensearch-cli-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.bootstrap.OpenSearch.main(OpenSearch.java:137) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.bootstrap.OpenSearch.main(OpenSearch.java:103) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
[2023-08-17T17:07:28,461][WARN ][o.o.b.Natives            ] [securityBwcCluster0-0] cannot check if running as root because JNA is not available
[2023-08-17T17:07:28,461][WARN ][o.o.b.Natives            ] [securityBwcCluster0-0] cannot install system call filter because JNA is not available
[2023-08-17T17:07:28,461][WARN ][o.o.b.Natives            ] [securityBwcCluster0-0] cannot register console handler because JNA is not available
[2023-08-17T17:07:28,462][WARN ][o.o.b.Natives            ] [securityBwcCluster0-0] cannot getrlimit RLIMIT_NPROC because JNA is not available
[2023-08-17T17:07:28,462][WARN ][o.o.b.Natives            ] [securityBwcCluster0-0] cannot getrlimit RLIMIT_AS because JNA is not available
[2023-08-17T17:07:28,462][WARN ][o.o.b.Natives            ] [securityBwcCluster0-0] cannot getrlimit RLIMIT_FSIZE because JNA is not available
[2023-08-17T17:07:28,523][INFO ][o.o.n.Node               ] [securityBwcCluster0-0] version[3.0.0-SNAPSHOT], pid[6503], build[zip/c32055911fd2118c25fe3bf88b56894a95b69235/2023-08-16T20:54:17.859586Z], OS[Mac OS X/13.5/aarch64], JVM[Eclipse Adoptium/OpenJDK 64-Bit Server VM/17.0.5/17.0.5+8]
[2023-08-17T17:07:28,524][INFO ][o.o.n.Node               ] [securityBwcCluster0-0] JVM home [/Users/steecraw/.sdkman/candidates/java/17.0.5-tem]
[2023-08-17T17:07:28,527][INFO ][o.o.n.Node               ] [securityBwcCluster0-0] JVM arguments [-Xshare:auto, -Dopensearch.networkaddress.cache.ttl=60, -Dopensearch.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -XX:+ShowCodeDetailsInExceptionMessages, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dio.netty.allocator.numDirectArenas=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.locale.providers=SPI,COMPAT, -Xms1g, -Xmx1g, -XX:+UseG1GC, -XX:G1ReservePercent=25, -XX:InitiatingHeapOccupancyPercent=30, -Djava.io.tmpdir=/Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/tmp, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=logs, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Djava.util.concurrent.ForkJoinPool.common.threadFactory=org.opensearch.secure_sm.SecuredForkJoinWorkerThreadFactory, -Xms512m, -Xmx512m, -ea, -esa, -DtestKey=testValue, -XX:MaxDirectMemorySize=268435456, -Dopensearch.path.home=/Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/distro/3.0.0-INTEG_TEST, -Dopensearch.path.conf=/Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config, -Dopensearch.distribution.type=zip, -Dopensearch.bundled_jdk=false]
[2023-08-17T17:07:28,527][WARN ][o.o.n.Node               ] [securityBwcCluster0-0] version [3.0.0-SNAPSHOT] is a pre-release version of OpenSearch and is not suitable for production
[2023-08-17T17:07:28,790][INFO ][o.o.s.s.t.SSLConfig      ] [securityBwcCluster0-0] SSL dual mode is disabled
[2023-08-17T17:07:28,791][INFO ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] OpenSearch Config path is /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config
[2023-08-17T17:07:28,890][INFO ][o.o.s.s.DefaultSecurityKeyStore] [securityBwcCluster0-0] JVM supports TLSv1.3
[2023-08-17T17:07:28,891][INFO ][o.o.s.s.DefaultSecurityKeyStore] [securityBwcCluster0-0] Config directory is /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config/, from there the key- and truststore files are resolved relatively
[2023-08-17T17:07:29,209][INFO ][o.o.s.s.DefaultSecurityKeyStore] [securityBwcCluster0-0] TLS Transport Client Provider : JDK
[2023-08-17T17:07:29,209][INFO ][o.o.s.s.DefaultSecurityKeyStore] [securityBwcCluster0-0] TLS Transport Server Provider : JDK
[2023-08-17T17:07:29,209][INFO ][o.o.s.s.DefaultSecurityKeyStore] [securityBwcCluster0-0] TLS HTTP Provider             : JDK
[2023-08-17T17:07:29,209][INFO ][o.o.s.s.DefaultSecurityKeyStore] [securityBwcCluster0-0] Enabled TLS protocols for transport layer : [TLSv1.3, TLSv1.2]
[2023-08-17T17:07:29,210][INFO ][o.o.s.s.DefaultSecurityKeyStore] [securityBwcCluster0-0] Enabled TLS protocols for HTTP layer      : [TLSv1.3, TLSv1.2]
[2023-08-17T17:07:29,219][INFO ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] Clustername: securityBwcCluster0
[2023-08-17T17:07:29,223][WARN ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] Directory /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config has insecure file permissions (should be 0700)
[2023-08-17T17:07:29,223][WARN ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] File /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config/kirk.pem has insecure file permissions (should be 0600)
[2023-08-17T17:07:29,223][WARN ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] File /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config/esnode-key.pem has insecure file permissions (should be 0600)
[2023-08-17T17:07:29,223][WARN ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] File /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config/root-ca.pem has insecure file permissions (should be 0600)
[2023-08-17T17:07:29,223][WARN ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] Directory /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config/jvm.options.d has insecure file permissions (should be 0700)
[2023-08-17T17:07:29,224][WARN ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] File /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config/kirk-key.pem has insecure file permissions (should be 0600)
[2023-08-17T17:07:29,224][WARN ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] File /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config/kirk-keystore.jks has insecure file permissions (should be 0600)
[2023-08-17T17:07:29,224][WARN ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] File /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config/opensearch.yml has insecure file permissions (should be 0600)
[2023-08-17T17:07:29,224][WARN ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] File /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config/esnode.pem has insecure file permissions (should be 0600)
[2023-08-17T17:07:29,227][INFO ][o.o.p.PluginsService     ] [securityBwcCluster0-0] loaded module [transport-netty4]
[2023-08-17T17:07:29,227][INFO ][o.o.p.PluginsService     ] [securityBwcCluster0-0] loaded plugin [opensearch-security]
[2023-08-17T17:07:29,237][INFO ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] Disabled https compression by default to mitigate BREACH attacks. You can enable it by setting 'http.compression: true' in opensearch.yml
[2023-08-17T17:07:29,238][INFO ][o.o.e.ExtensionsManager  ] [securityBwcCluster0-0] ExtensionsManager initialized
[2023-08-17T17:07:29,250][INFO ][o.o.e.NodeEnvironment    ] [securityBwcCluster0-0] using [1] data paths, mounts [[/System/Volumes/Data (/dev/disk3s5)]], net usable_space [271.2gb], net total_space [460.4gb], types [apfs]
[2023-08-17T17:07:29,251][INFO ][o.o.e.NodeEnvironment    ] [securityBwcCluster0-0] heap size [512mb], compressed ordinary object pointers [true]
[2023-08-17T17:07:29,306][INFO ][o.o.n.Node               ] [securityBwcCluster0-0] node name [securityBwcCluster0-0], node ID [nIIK5MlvS9KOJF81nv9v8w], cluster name [securityBwcCluster0], roles [ingest, remote_cluster_client, data, cluster_manager]
[2023-08-17T17:07:29,790][WARN ][o.o.s.c.Salt             ] [securityBwcCluster0-0] If you plan to use field masking pls configure compliance salt e1ukloTsQlOgPquJ to be a random string of 16 chars length identical on all nodes
[2023-08-17T17:07:29,811][INFO ][o.o.s.a.i.AuditLogImpl   ] [securityBwcCluster0-0] Message routing enabled: true
[2023-08-17T17:07:29,832][INFO ][o.o.s.f.SecurityFilter   ] [securityBwcCluster0-0] <NONE> indices are made immutable.
[2023-08-17T17:07:29,962][INFO ][o.o.t.NettyAllocator     ] [securityBwcCluster0-0] creating NettyAllocator with the following configs: [name=unpooled, suggested_max_allocation_size=256kb, factors={opensearch.unsafe.use_unpooled_allocator=null, g1gc_enabled=true, g1gc_region_size=1mb, heap_size=512mb}]
[2023-08-17T17:07:30,018][INFO ][o.o.d.DiscoveryModule    ] [securityBwcCluster0-0] using discovery type [zen] and seed hosts providers [settings, file]
[2023-08-17T17:07:30,141][WARN ][o.o.g.DanglingIndicesState] [securityBwcCluster0-0] gateway.auto_import_dangling_indices is disabled, dangling indices will not be automatically detected or imported and must be managed manually
[2023-08-17T17:07:30,225][INFO ][o.o.n.Node               ] [securityBwcCluster0-0] initialized
[2023-08-17T17:07:30,225][INFO ][o.o.n.Node               ] [securityBwcCluster0-0] starting ...
[2023-08-17T17:07:30,277][INFO ][o.o.t.TransportService   ] [securityBwcCluster0-0] publish_address {127.0.0.1:53882}, bound_addresses {[::1]:53881}, {127.0.0.1:53882}
[2023-08-17T17:07:30,277][INFO ][o.o.t.TransportService   ] [securityBwcCluster0-0] Remote clusters initialized successfully.
[2023-08-17T17:07:30,438][WARN ][o.o.b.BootstrapChecks    ] [securityBwcCluster0-0] system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
[2023-08-17T17:07:30,438][INFO ][o.o.c.c.Coordinator      ] [securityBwcCluster0-0] cluster UUID [YTTotJadT1O-5eH62k-y9g]
[2023-08-17T17:07:30,440][DEBUG][o.o.c.c.Coordinator      ] [securityBwcCluster0-0] startInitialJoin: coordinator becoming CANDIDATE in term 3 (was null, lastKnownLeader was [Optional.empty])
[2023-08-17T17:07:30,441][WARN ][o.o.d.FileBasedSeedHostsProvider] [securityBwcCluster0-0] expected, but did not find, a dynamic hosts list at [/Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config/unicast_hosts.txt]
[2023-08-17T17:07:30,450][INFO ][o.o.h.AbstractHttpServerTransport] [securityBwcCluster0-0] publish_address {127.0.0.1:53884}, bound_addresses {[::1]:53883}, {127.0.0.1:53884}
[2023-08-17T17:07:30,451][INFO ][o.o.n.Node               ] [securityBwcCluster0-0] started
[2023-08-17T17:07:30,451][INFO ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] Node started
[2023-08-17T17:07:30,451][INFO ][o.o.s.c.ConfigurationRepository] [securityBwcCluster0-0] Will attempt to create index .opendistro_security and default configs if they are absent
[2023-08-17T17:07:30,452][INFO ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] 0 OpenSearch Security modules loaded so far: []
[2023-08-17T17:07:30,452][INFO ][o.o.s.c.ConfigurationRepository] [securityBwcCluster0-0] Background init thread started. Install default config?: true
[2023-08-17T17:07:30,452][INFO ][o.o.s.c.ConfigurationRepository] [securityBwcCluster0-0] Wait for cluster to be available ...
[2023-08-17T17:07:30,722][ERROR][o.o.s.a.BackendRegistry  ] [securityBwcCluster0-0] Not yet initialized (you may need to run securityadmin)
[2023-08-17T17:07:30,836][ERROR][o.o.s.a.BackendRegistry  ] [securityBwcCluster0-0] Not yet initialized (you may need to run securityadmin)
[2023-08-17T17:07:30,949][ERROR][o.o.s.a.BackendRegistry  ] [securityBwcCluster0-0] Not yet initialized (you may need to run securityadmin)
[2023-08-17T17:07:31,060][ERROR][o.o.s.a.BackendRegistry  ] [securityBwcCluster0-0] Not yet initialized (you may need to run securityadmin)
[2023-08-17T17:07:31,174][ERROR][o.o.s.a.BackendRegistry  ] [securityBwcCluster0-0] Not yet initialized (you may need to run securityadmin)
[2023-08-17T17:07:31,199][ERROR][o.o.s.a.BackendRegistry  ] [securityBwcCluster0-0] Not yet initialized (you may need to run securityadmin)
[2023-08-17T17:07:31,313][ERROR][o.o.s.a.BackendRegistry  ] [securityBwcCluster0-0] Not yet initialized (you may need to run securityadmin)
[2023-08-17T17:07:31,433][ERROR][o.o.s.a.BackendRegistry  ] [securityBwcCluster0-0] Not yet initialized (you may need to run securityadmin)
[2023-08-17T17:07:31,454][INFO ][o.o.s.c.ConfigurationRepository] [securityBwcCluster0-0] Wait for cluster to be available ...
``

@stephen-crawford
Contributor

The first issue in the updated node's log occurs before the update:

[2023-08-18T12:38:30,543][WARN ][o.o.b.Natives            ] [securityBwcCluster0-0] unable to load JNA native support library, native methods will be disabled.
java.lang.UnsatisfiedLinkError: Can't load library: /Users/steecraw/Library/Caches/JNA/temp/jna11807200050257941166.tmp
	at java.lang.ClassLoader.loadLibrary(ClassLoader.java:2393) ~[?:?]
	at java.lang.Runtime.load0(Runtime.java:755) ~[?:?]
	at java.lang.System.load(System.java:1953) ~[?:?]
	at com.sun.jna.Native.loadNativeDispatchLibraryFromClasspath(Native.java:1018) ~[jna-5.5.0.jar:5.5.0 (b0)]
	at com.sun.jna.Native.loadNativeDispatchLibrary(Native.java:988) ~[jna-5.5.0.jar:5.5.0 (b0)]
	at com.sun.jna.Native.<clinit>(Native.java:195) ~[jna-5.5.0.jar:5.5.0 (b0)]
	at java.lang.Class.forName0(Native Method) ~[?:?]
	at java.lang.Class.forName(Class.java:375) ~[?:?]
	at org.opensearch.bootstrap.Natives.<clinit>(Natives.java:60) [opensearch-2.9.0-SNAPSHOT.jar:2.9.0-SNAPSHOT]
	at org.opensearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:123) [opensearch-2.9.0-SNAPSHOT.jar:2.9.0-SNAPSHOT]
	at org.opensearch.bootstrap.Bootstrap.setup(Bootstrap.java:191) [opensearch-2.9.0-SNAPSHOT.jar:2.9.0-SNAPSHOT]
	at org.opensearch.bootstrap.Bootstrap.init(Bootstrap.java:404) [opensearch-2.9.0-SNAPSHOT.jar:2.9.0-SNAPSHOT]
	at org.opensearch.bootstrap.OpenSearch.init(OpenSearch.java:180) [opensearch-2.9.0-SNAPSHOT.jar:2.9.0-SNAPSHOT]
	at org.opensearch.bootstrap.OpenSearch.execute(OpenSearch.java:171) [opensearch-2.9.0-SNAPSHOT.jar:2.9.0-SNAPSHOT]
	at org.opensearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:104) [opensearch-2.9.0-SNAPSHOT.jar:2.9.0-SNAPSHOT]
	at org.opensearch.cli.Command.mainWithoutErrorHandling(Command.java:138) [opensearch-cli-2.9.0-SNAPSHOT.jar:2.9.0-SNAPSHOT]
	at org.opensearch.cli.Command.main(Command.java:101) [opensearch-cli-2.9.0-SNAPSHOT.jar:2.9.0-SNAPSHOT]
	at org.opensearch.bootstrap.OpenSearch.main(OpenSearch.java:137) [opensearch-2.9.0-SNAPSHOT.jar:2.9.0-SNAPSHOT]
	at org.opensearch.bootstrap.OpenSearch.main(OpenSearch.java:103) [opensearch-2.9.0-SNAPSHOT.jar:2.9.0-SNAPSHOT]
[2023-08-18T12:38:30,547][WARN ][o.o.b.Natives            ] [securityBwcCluster0-0] cannot check if running as root because JNA is not a

This seems standard, however, as I see it appear on the non-upgrading nodes as well.

After that, we transition into some basic config logging:

[2023-08-18T12:38:30,968][INFO ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] OpenSearch Config path is /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config
[2023-08-18T12:38:31,102][INFO ][o.o.s.s.DefaultSecurityKeyStore] [securityBwcCluster0-0] JVM supports TLSv1.3
[2023-08-18T12:38:31,103][INFO ][o.o.s.s.DefaultSecurityKeyStore] [securityBwcCluster0-0] Config directory is /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config/, from there the key- and truststore files are resolved relatively
[2023-08-18T12:38:31,440][INFO ][o.o.s.s.DefaultSecurityKeyStore] [securityBwcCluster0-0] TLS Transport Client Provider : JDK
[2023-08-18T12:38:31,440][INFO ][o.o.s.s.DefaultSecurityKeyStore] [securityBwcCluster0-0] TLS Transport Server Provider : JDK
[2023-08-18T12:38:31,441][INFO ][o.o.s.s.DefaultSecurityKeyStore] [securityBwcCluster0-0] TLS HTTP Provider             : JDK
[2023-08-18T12:38:31,441][INFO ][o.o.s.s.DefaultSecurityKeyStore] [securityBwcCluster0-0] Enabled TLS protocols for transport layer : [TLSv1.3, TLSv1.2]
[2023-08-18T12:38:31,441][INFO ][o.o.s.s.DefaultSecurityKeyStore] [securityBwcCluster0-0] Enabled TLS protocols for HTTP layer      : [TLSv1.3, TLSv1.2]
[2023-08-18T12:38:31,459][INFO ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] Clustername: securityBwcCluster0
[2023-08-18T12:38:31,465][WARN ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] Directory /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config has insecure file permissions (should be 0700)
[2023-08-18T12:38:31,465][WARN ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] File /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config/kirk.pem has insecure file permissions (should be 0600)
[2023-08-18T12:38:31,465][WARN ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] File /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config/esnode-key.pem has insecure file permissions (should be 0600)
[2023-08-18T12:38:31,465][WARN ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] File /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config/root-ca.pem has insecure file permissions (should be 0600)
[2023-08-18T12:38:31,466][WARN ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] Directory /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config/jvm.options.d has insecure file permissions (should be 0700)
[2023-08-18T12:38:31,466][WARN ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] File /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config/kirk-key.pem has insecure file permissions (should be 0600)
[2023-08-18T12:38:31,466][WARN ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] File /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config/kirk-keystore.jks has insecure file permissions (should be 0600)
[2023-08-18T12:38:31,466][WARN ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] File /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config/opensearch.yml has insecure file permissions (should be 0600)
[2023-08-18T12:38:31,466][WARN ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] File /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config/esnode.pem has insecure file permissions (should be 0600)

Again, this appears correct.
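
For reference, here is a minimal sketch of a check that would list the same kind of paths the warnings above name. It assumes an entry is flagged whenever it carries permission bits beyond 0600 (files) or 0700 (directories); the plugin's real check may differ. `find_insecure_entries` is a hypothetical helper, not part of the plugin, and the sketch assumes a POSIX file system.

```python
import os
import stat


def find_insecure_entries(config_dir):
    """Return (path, mode) pairs for entries more permissive than expected.

    Assumption: files should be at most 0600 and directories at most 0700,
    matching the values printed in the warnings.
    """
    flagged = []
    for root, _dirs, files in os.walk(config_dir):
        # os.walk yields every directory as `root`, so checking `root`
        # covers the top-level directory and all subdirectories.
        candidates = [(root, 0o700)]
        candidates += [(os.path.join(root, name), 0o600) for name in files]
        for path, allowed in candidates:
            mode = stat.S_IMODE(os.stat(path).st_mode)
            if mode & ~allowed:  # any bit set outside the allowed mask
                flagged.append((path, oct(mode)))
    return flagged
```

Pointing it at a test cluster's `config` directory would be expected to report the `.pem`, `.jks` and `opensearch.yml` files if they were created with default permissions.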

[2023-08-18T12:38:31,469][INFO ][o.o.p.PluginsService     ] [securityBwcCluster0-0] loaded module [transport-netty4]
[2023-08-18T12:38:31,470][INFO ][o.o.p.PluginsService     ] [securityBwcCluster0-0] loaded plugin [opensearch-security]
[2023-08-18T12:38:31,482][INFO ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] Disabled https compression by default to mitigate BREACH attacks. You can enable it by setting 'http.compression: true' in opensearch.yml
[2023-08-18T12:38:31,486][INFO ][o.o.e.ExtensionsManager  ] [securityBwcCluster0-0] ExtensionsManager initialized
[2023-08-18T12:38:31,505][INFO ][o.o.e.NodeEnvironment    ] [securityBwcCluster0-0] using [1] data paths, mounts [[/System/Volumes/Data (/dev/disk3s5)]], net usable_space [266.9gb], net total_space [460.4gb], types [apfs]
[2023-08-18T12:38:31,506][INFO ][o.o.e.NodeEnvironment    ] [securityBwcCluster0-0] heap size [512mb], compressed ordinary object pointers [true]
[2023-08-18T12:38:31,540][INFO ][o.o.n.Node               ] [securityBwcCluster0-0] node name [securityBwcCluster0-0], node ID [mWKwveN4RhmQ60MfGRLb6w], cluster name [securityBwcCluster0], roles [ingest, remote_cluster_client, data, cluster_manager]
[2023-08-18T12:38:32,135][WARN ][o.o.s.c.Salt             ] [securityBwcCluster0-0] If you plan to use field masking pls configure compliance salt e1ukloTsQlOgPquJ to be a random string of 16 chars length identical on all nodes
[2023-08-18T12:38:32,160][INFO ][o.o.s.a.i.AuditLogImpl   ] [securityBwcCluster0-0] Message routing enabled: true
[2023-08-18T12:38:32,179][INFO ][o.o.s.f.SecurityFilter   ] [securityBwcCluster0-0] <NONE> indices are made immutable.
[2023-08-18T12:38:32,345][INFO ][o.o.t.NettyAllocator     ] [securityBwcCluster0-0] creating NettyAllocator with the following configs: [name=unpooled, suggested_max_allocation_size=256kb, factors={opensearch.unsafe.use_unpooled_allocator=null, g1gc_enabled=true, g1gc_region_size=1mb, heap_size=512mb}]
[2023-08-18T12:38:32,406][INFO ][o.o.d.DiscoveryModule    ] [securityBwcCluster0-0] using discovery type [zen] and seed hosts providers [settings, file]
[2023-08-18T12:38:32,592][WARN ][o.o.g.DanglingIndicesState] [securityBwcCluster0-0] gateway.auto_import_dangling_indices is disabled, dangling indices will not be automatically detected or imported and must be managed manually
[2023-08-18T12:38:32,704][INFO ][o.o.n.Node               ] [securityBwcCluster0-0] initialized
[2023-08-18T12:38:32,704][INFO ][o.o.n.Node               ] [securityBwcCluster0-0] starting ...
[2023-08-18T12:38:32,767][INFO ][o.o.t.TransportService   ] [securityBwcCluster0-0] publish_address {127.0.0.1:54466}, bound_addresses {[::1]:54465}, {127.0.0.1:54466}
[2023-08-18T12:38:32,768][INFO ][o.o.t.TransportService   ] [securityBwcCluster0-0] Remote clusters initialized successfully.
[2023-08-18T12:38:32,912][WARN ][o.o.b.BootstrapChecks    ] [securityBwcCluster0-0] system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
[2023-08-18T12:38:32,915][DEBUG][o.o.c.c.Coordinator      ] [securityBwcCluster0-0] startInitialJoin: coordinator becoming CANDIDATE in term 0 (was null, lastKnownLeader was [Optional.empty])
[2023-08-18T12:38:32,917][WARN ][o.o.d.FileBasedSeedHostsProvider] [securityBwcCluster0-0] expected, but did not find, a dynamic hosts list at [/Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config/unicast_hosts.txt]
[2023-08-18T12:38:32,928][INFO ][o.o.h.AbstractHttpServerTransport] [securityBwcCluster0-0] publish_address {127.0.0.1:54468}, bound_addresses {[::1]:54467}, {127.0.0.1:54468}
[2023-08-18T12:38:32,929][INFO ][o.o.n.Node               ] [securityBwcCluster0-0] started

Other node components are then loaded, an address is published, and the node starts.

[2023-08-18T12:38:32,930][INFO ][o.o.s.c.ConfigurationRepository] [securityBwcCluster0-0] Will attempt to create index .opendistro_security and default configs if they are absent
[2023-08-18T12:38:32,931][INFO ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-0] 0 OpenSearch Security modules loaded so far: []
[2023-08-18T12:38:32,931][INFO ][o.o.s.c.ConfigurationRepository] [securityBwcCluster0-0] Background init thread started. Install default config?: true
[2023-08-18T12:38:32,931][INFO ][o.o.s.c.ConfigurationRepository] [securityBwcCluster0-0] Wait for cluster to be available ...
[2023-08-18T12:38:33,921][WARN ][o.o.d.FileBasedSeedHostsProvider] [securityBwcCluster0-0] expected, but did not find, a dynamic hosts list at [/Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config/unicast_hosts.txt]
[2023-08-18T12:38:33,932][INFO ][o.o.s.c.ConfigurationRepository] [securityBwcCluster0-0] Wait for cluster to be available ...

The node then loads its configuration and looks for a dynamic hosts list. It repeats several times that it is waiting for the cluster to be available.
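
To see how often a message such as "Wait for cluster to be available ..." repeats without scrolling through the whole log, a small script along these lines could help. This is only a sketch: it assumes every log line follows the `[timestamp][LEVEL][logger] [node] message` layout seen above, and `count_repeated_messages` is a hypothetical helper name. The sample lines are copied from the log in this comment.

```python
import re
from collections import Counter

# Assumed layout: [timestamp][LEVEL][logger] [node] message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>[A-Z]+)\s*\]\[(?P<logger>[^\]]+)\]\s+"
    r"\[(?P<node>[^\]]+)\]\s+(?P<msg>.*)$"
)


def count_repeated_messages(lines):
    """Count how often each (level, message) pair occurs in the log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # stack-trace frames and truncated lines are skipped
        counts[(match["level"], match["msg"])] += 1
    return counts


SAMPLE = [
    "[2023-08-18T12:38:32,931][INFO ][o.o.s.c.ConfigurationRepository] "
    "[securityBwcCluster0-0] Background init thread started. Install default config?: true",
    "[2023-08-18T12:38:32,931][INFO ][o.o.s.c.ConfigurationRepository] "
    "[securityBwcCluster0-0] Wait for cluster to be available ...",
    "[2023-08-18T12:38:33,932][INFO ][o.o.s.c.ConfigurationRepository] "
    "[securityBwcCluster0-0] Wait for cluster to be available ...",
]

if __name__ == "__main__":
    # Print the most frequent messages first.
    for (level, msg), n in count_repeated_messages(SAMPLE).most_common():
        print(n, level, msg)
```

Run against a full node log (for example `count_repeated_messages(open(path))`), the same function would also surface how many times the `BackendRegistry` "Not yet initialized" error was emitted.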

@stephen-crawford
Contributor

Then the node starts joining others to form a cluster:

[securityBwcCluster0-0] scheduling scheduleNextElection{gracePeriod=0s, thisAttempt=0, maxDelayMillis=100, delayMillis=82, ElectionScheduler{attempt=1, ElectionSchedulerFactory{initialTimeout=100ms, backoffTime=100ms, maxTimeout=10s}}}
[2023-08-18T12:38:37,264][ERROR][o.o.s.a.BackendRegistry  ] [securityBwcCluster0-0] Not yet initialized (you may need to run securityadmin)
[2023-08-18T12:38:37,290][DEBUG][o.o.c.c.ElectionSchedulerFactory] [securityBwcCluster0-0] scheduleNextElection{gracePeriod=0s, thisAttempt=0, maxDelayMillis=100, delayMillis=82, ElectionScheduler{attempt=1, ElectionSchedulerFactory{initialTimeout=100ms, backoffTime=100ms, maxTimeout=10s}}} starting election
[2023-08-18T12:38:37,290][DEBUG][o.o.c.c.ElectionSchedulerFactory] [securityBwcCluster0-0] scheduling scheduleNextElection{gracePeriod=500ms, thisAttempt=1, maxDelayMillis=200, delayMillis=671, ElectionScheduler{attempt=2, ElectionSchedulerFactory{initialTimeout=100ms, backoffTime=100ms, maxTimeout=10s}}}
[2023-08-18T12:38:37,293][DEBUG][o.o.c.c.PreVoteCollector ] [securityBwcCluster0-0] PreVotingRound{preVotesReceived={}, electionStarted=false, preVoteRequest=PreVoteRequest{sourceNode={securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, currentTerm=0}, isClosed=false} requesting pre-votes from [{securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, {securityBwcCluster0-1}{frpm-K-xSwmPuqdYsz8U0A}{c9ImaN6YQXOTHENLOPhj7A}{127.0.0.1}{127.0.0.1:54472}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, {securityBwcCluster0-2}{ID4lsFuCRu6IfHjxbGddTg}{zkPSLl4SSSCNnOl7qKHllw}{127.0.0.1}{127.0.0.1:54477}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}]
[2023-08-18T12:38:37,310][DEBUG][o.o.c.c.PreVoteCollector ] [securityBwcCluster0-0] PreVotingRound{preVotesReceived={{securityBwcCluster0-2}{ID4lsFuCRu6IfHjxbGddTg}{zkPSLl4SSSCNnOl7qKHllw}{127.0.0.1}{127.0.0.1:54477}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}=PreVoteResponse{currentTerm=0, lastAcceptedTerm=0, lastAcceptedVersion=0}, {securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}=PreVoteResponse{currentTerm=0, lastAcceptedTerm=0, lastAcceptedVersion=0}}, electionStarted=false, preVoteRequest=PreVoteRequest{sourceNode={securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, currentTerm=0}, isClosed=false} added PreVoteResponse{currentTerm=0, lastAcceptedTerm=0, lastAcceptedVersion=0} from {securityBwcCluster0-2}{ID4lsFuCRu6IfHjxbGddTg}{zkPSLl4SSSCNnOl7qKHllw}{127.0.0.1}{127.0.0.1:54477}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, no quorum yet
[2023-08-18T12:38:37,310][DEBUG][o.o.c.c.PreVoteCollector ] [securityBwcCluster0-0] PreVotingRound{preVotesReceived={{securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}=PreVoteResponse{currentTerm=0, lastAcceptedTerm=0, lastAcceptedVersion=0}}, electionStarted=false, preVoteRequest=PreVoteRequest{sourceNode={securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, currentTerm=0}, isClosed=false} added PreVoteResponse{currentTerm=0, lastAcceptedTerm=0, lastAcceptedVersion=0} from {securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, no quorum yet
[2023-08-18T12:38:37,335][DEBUG][o.o.c.c.PreVoteCollector ] [securityBwcCluster0-0] PreVotingRound{preVotesReceived={{securityBwcCluster0-2}{ID4lsFuCRu6IfHjxbGddTg}{zkPSLl4SSSCNnOl7qKHllw}{127.0.0.1}{127.0.0.1:54477}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}=PreVoteResponse{currentTerm=0, lastAcceptedTerm=0, lastAcceptedVersion=0}, {securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}=PreVoteResponse{currentTerm=0, lastAcceptedTerm=0, lastAcceptedVersion=0}, {securityBwcCluster0-1}{frpm-K-xSwmPuqdYsz8U0A}{c9ImaN6YQXOTHENLOPhj7A}{127.0.0.1}{127.0.0.1:54472}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}=PreVoteResponse{currentTerm=0, lastAcceptedTerm=0, lastAcceptedVersion=0}}, electionStarted=true, preVoteRequest=PreVoteRequest{sourceNode={securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, currentTerm=0}, isClosed=false} added PreVoteResponse{currentTerm=0, lastAcceptedTerm=0, lastAcceptedVersion=0} from {securityBwcCluster0-1}{frpm-K-xSwmPuqdYsz8U0A}{c9ImaN6YQXOTHENLOPhj7A}{127.0.0.1}{127.0.0.1:54472}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, starting election
[2023-08-18T12:38:37,335][DEBUG][o.o.c.c.Coordinator      ] [securityBwcCluster0-0] starting election with StartJoinRequest{term=1,node={securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}}
[2023-08-18T12:38:37,336][DEBUG][o.o.c.c.Coordinator      ] [securityBwcCluster0-0] joinLeaderInTerm: for [{securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}] with term 1
[2023-08-18T12:38:37,336][DEBUG][o.o.c.c.CoordinationState] [securityBwcCluster0-0] handleStartJoin: leaving term [0] due to StartJoinRequest{term=1,node={securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}}
[2023-08-18T12:38:37,369][DEBUG][o.o.c.c.JoinHelper       ] [securityBwcCluster0-0] attempting to join {securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true} with JoinRequest{sourceNode={securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, minimumTerm=0, optionalJoin=Optional[Join{term=1, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, targetNode={securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}}]}
[2023-08-18T12:38:37,373][ERROR][o.o.s.a.BackendRegistry  ] [securityBwcCluster0-0] Not yet initialized (you may need to run securityadmin)
[2023-08-18T12:38:37,371][DEBUG][o.o.c.c.CoordinationState] [securityBwcCluster0-0] handleJoin: added join Join{term=1, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, targetNode={securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}} from [{securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}] for election, electionWon=false lastAcceptedTerm=0 lastAcceptedVersion=0
[2023-08-18T12:38:37,370][DEBUG][o.o.c.c.JoinHelper       ] [securityBwcCluster0-0] successful response to StartJoinRequest{term=1,node={securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}} from {securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}
[2023-08-18T12:38:37,386][ERROR][o.o.s.a.BackendRegistry  ] [securityBwcCluster0-0] Not yet initialized (you may need to run securityadmin)
[2023-08-18T12:38:37,388][DEBUG][o.o.c.c.JoinHelper       ] [securityBwcCluster0-0] successful response to StartJoinRequest{term=1,node={securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}} from {securityBwcCluster0-1}{frpm-K-xSwmPuqdYsz8U0A}{c9ImaN6YQXOTHENLOPhj7A}{127.0.0.1}{127.0.0.1:54472}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}
[2023-08-18T12:38:37,405][DEBUG][o.o.c.c.CoordinationState] [securityBwcCluster0-0] handleJoin: added join Join{term=1, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={securityBwcCluster0-1}{frpm-K-xSwmPuqdYsz8U0A}{c9ImaN6YQXOTHENLOPhj7A}{127.0.0.1}{127.0.0.1:54472}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, targetNode={securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}} from [{securityBwcCluster0-1}{frpm-K-xSwmPuqdYsz8U0A}{c9ImaN6YQXOTHENLOPhj7A}{127.0.0.1}{127.0.0.1:54472}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}] for election, electionWon=true lastAcceptedTerm=0 lastAcceptedVersion=0
[2023-08-18T12:38:37,407][DEBUG][o.o.c.c.CoordinationState] [securityBwcCluster0-0] handleJoin: election won in term [1] with VoteCollection{votes=[mWKwveN4RhmQ60MfGRLb6w, frpm-K-xSwmPuqdYsz8U0A], joins=[Join{term=1, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={securityBwcCluster0-1}{frpm-K-xSwmPuqdYsz8U0A}{c9ImaN6YQXOTHENLOPhj7A}{127.0.0.1}{127.0.0.1:54472}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, targetNode={securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}}, Join{term=1, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, targetNode={securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}}]}
[2023-08-18T12:38:37,411][DEBUG][o.o.c.c.Coordinator      ] [securityBwcCluster0-0] handleJoinRequest: coordinator becoming LEADER in term 1 (was CANDIDATE, lastKnownLeader was [Optional.empty])
[2023-08-18T12:38:37,422][DEBUG][o.o.c.c.JoinHelper       ] [securityBwcCluster0-0] received a join request for an existing node [{securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}]
[2023-08-18T12:38:37,429][INFO ][o.o.c.s.MasterService    ] [securityBwcCluster0-0] elected-as-cluster-manager ([2] nodes joined)[{securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true} elect leader, {securityBwcCluster0-1}{frpm-K-xSwmPuqdYsz8U0A}{c9ImaN6YQXOTHENLOPhj7A}{127.0.0.1}{127.0.0.1:54472}{dimr}{testattr=test, shard_indexing_pressure_enabled=true} elect leader, _BECOME_CLUSTER_MANAGER_TASK_, _FINISH_ELECTION_], term: 1, version: 1, delta: cluster-manager node changed {previous [], current [{securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}]}, added {{securityBwcCluster0-1}{frpm-K-xSwmPuqdYsz8U0A}{c9ImaN6YQXOTHENLOPhj7A}{127.0.0.1}{127.0.0.1:54472}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}}
[2023-08-18T12:38:37,439][DEBUG][o.o.c.c.PublicationTransportHandler] [securityBwcCluster0-0] received full cluster state version [1] with size [429]
[2023-08-18T12:38:37,457][DEBUG][o.o.c.c.JoinHelper       ] [securityBwcCluster0-0] successful response to StartJoinRequest{term=1,node={securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}} from {securityBwcCluster0-2}{ID4lsFuCRu6IfHjxbGddTg}{zkPSLl4SSSCNnOl7qKHllw}{127.0.0.1}{127.0.0.1:54477}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}
[2023-08-18T12:38:37,498][ERROR][o.o.s.a.BackendRegistry  ] [securityBwcCluster0-0] Not yet initialized (you may need to run securityadmin)
[2023-08-18T12:38:37,498][INFO ][o.o.c.c.CoordinationState] [securityBwcCluster0-0] cluster UUID set to [Vw4CIyDYRpaLD1d9WnpxQw]

For some reason 0-2 is listed twice while 0-1 is not listed.

The node then states that it successfully joined itself:

[securityBwcCluster0-0] cluster-manager node changed {previous [], current [{securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}]}, added {{securityBwcCluster0-1}{frpm-K-xSwmPuqdYsz8U0A}{c9ImaN6YQXOTHENLOPhj7A}{127.0.0.1}{127.0.0.1:54472}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}}, term: 1, version: 1, reason: Publication{term=1, version=1}
[2023-08-18T12:38:37,565][DEBUG][o.o.c.c.C.CoordinatorPublication] [securityBwcCluster0-0] publication ended successfully: Publication{term=1, version=1}
[2023-08-18T12:38:37,569][DEBUG][o.o.c.c.JoinHelper       ] [securityBwcCluster0-0] successfully joined {securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true} with JoinRequest{sourceNode={securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, minimumTerm=0, optionalJoin=Optional[Join{term=1, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, targetNode={securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}}]}
[2023-08-18T12:38:37,569][INFO ][o.o.d.PeerFinder         ] [securityBwcCluster0-0] setting findPeersInterval to [1s] as node commission status = [true] for local node [{securityBwcCluster0-0}{mWKwveN4RhmQ60MfGRLb6w}{BKgovVncSY2hgwndsgq7sQ}{127.0.0.1}{127.0.0.1:54466}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}]
[2023-08-18T12:38:37,570][INFO ][o.o.c.s.MasterService    ] [securityBwcCluster0-0] node-join[{securityBwcCluster0-2}{ID4lsFuCRu6IfHjxbGddTg}{zkPSLl4SSSCNnOl7qKHllw}{127.0.0.1}{127.0.0.1:54477}{dimr}{testattr=test, shard_indexing_pressure_enabled=true} join existing leader], term: 1, version: 2, delta: added {{securityBwcCluster0-2}{ID4lsFuCRu6IfHjxbGddTg}{zkPSLl4SSSCNnOl7qKHllw}{127.0.0.1}{127.0.0.1:54477}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}}
[2023-08-18T12:38:37,571][DEBUG][o.o.c.c.PublicationTransportHandler] [securityBwcCluster0-0] received full cluster state version [2] with size [478]
[2023-08-18T12:38:37,573][INFO ][o.o.c.r.a.DiskThresholdMonitor] [securityBwcCluster0-0] skipping monitor as a check is already in progress
[2023-08-18T12:38:37,607][ERROR][o.o.s.a.BackendRegistry  ] [securityBwcCluster0-0] Not yet initialized (you may need to run securityadmin)
[2023-08-18T12:38:37,646][INFO ][o.o.c.s.ClusterApplierService] [securityBwcCluster0-0] added {{securityBwcCluster0-2}{ID4lsFuCRu6IfHjxbGddTg}{zkPSLl4SSSCNnOl7qKHllw}{127.0.0.1}{127.0.0.1:54477}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}}, term: 1, version: 2, reason: Publication{term=1, version=2}
[2023-08-18T12:38:37,647][DEBUG][o.o.c.c.C.CoordinatorPublication] [securityBwcCluster0-0] publication ended successfully: Publication{term=1, version=2}
[2023-08-18T12:38:37,648][DEBUG][o.o.c.c.PublicationTransportHandler] [securityBwcCluster0-0] received full cluster state version [3] with size [480]
[2023-08-18T12:38:37,718][ERROR][o.o.s.a.BackendRegistry  ] [securityBwcCluster0-0] Not yet initialized (you may need to run securityadmin)
[2023-08-18T12:38:37,768][DEBUG][o.o.c.c.C.CoordinatorPublication] [securityBwcCluster0-0] publication ended successfully: Publication{term=1, version=3}
[2023-08-18T12:38:37,774][DEBUG][o.o.c.c.PublicationTransportHandler] [securityBwcCluster0-0] received full cluster state version [4] with size [403]
[2023-08-18T12:38:37,808][DEBUG][o.o.c.c.C.CoordinatorPublication] [securityBwcCluster0-0] publication ended successfully: Publication{term=1, version=4}
[2023-08-18T12:38:37,808][INFO ][o.o.g.GatewayService     ] [securityBwcCluster0-0] recovered [0] indices into cluster_state

BackendRegistry repeatedly reports that it is not yet initialized while the .opendistro_security index is being created and configured:

Setting replication.type: DOCUMENT will be used for Index until Segment Replication supports System and Hidden indices
[2023-08-18T12:38:37,962][INFO ][o.o.p.PluginsService     ] [securityBwcCluster0-0] PluginService:onIndexModule index:[.opendistro_security/OdEfyqiYRI-9sXGvFapsFQ]
[2023-08-18T12:38:37,966][DEBUG][o.o.c.c.ElectionSchedulerFactory] [securityBwcCluster0-0] scheduleNextElection{gracePeriod=500ms, thisAttempt=1, maxDelayMillis=200, delayMillis=671, ElectionScheduler{attempt=2, ElectionSchedulerFactory{initialTimeout=100ms, backoffTime=100ms, maxTimeout=10s}}} not starting election
[2023-08-18T12:38:38,009][INFO ][o.o.c.m.MetadataCreateIndexService] [securityBwcCluster0-0] [.opendistro_security] creating index, cause [api], templates [], shards [1]/[1]
[2023-08-18T12:38:38,015][INFO ][o.o.c.r.a.AllocationService] [securityBwcCluster0-0] updating number_of_replicas to [2] for indices [.opendistro_security]
[2023-08-18T12:38:38,027][DEBUG][o.o.c.c.PublicationTransportHandler] [securityBwcCluster0-0] received diff cluster state version [5] with uuid [8c_0gAWoSS2fVbHVwsofgg], diff size [460]
[2023-08-18T12:38:38,053][ERROR][o.o.s.a.BackendRegistry  ] [securityBwcCluster0-0] Not yet initialized (you may need to run securityadmin)
[2023-08-18T12:38:38,163][ERROR][o.o.s.a.BackendRegistry  ] [securityBwcCluster0-0] Not yet initialized (you may need to run securityadmin)
[2023-08-18T12:38:38,202][DEBUG][o.o.c.c.C.CoordinatorPublication] [securityBwcCluster0-0] publication ended successfully: Publication{term=1, version=5}
[2023-08-18T12:38:38,267][ERROR][o.o.s.a.BackendRegistry  ] [securityBwcCluster0-0] Not yet initialized (you may need to run securityadmin)
[2023-08-18T12:38:38,329][DEBUG][o.o.c.c.PublicationTransportHandler] [securityBwcCluster0-0] received diff cluster state version [6] with uuid [OWHQUrogQoino33iAS3ayA], diff size [462]
[2023-08-18T12:38:38,372][ERROR][o.o.s.a.BackendRegistry  ] [securityBwcCluster0-0] Not yet initialized (you may need to run securityadmin)
[2023-08-18T12:38:38,394][INFO ][o.o.s.c.ConfigurationRepository] [securityBwcCluster0-0] Index .opendistro_security created?: true
[2023-08-18T12:38:38,394][DEBUG][o.o.c.c.C.CoordinatorPublication] [securityBwcCluster0-0] publication ended successfully: Publication{term=1, version=6}
[2023-08-18T12:38:38,397][INFO ][o.o.s.c.ConfigurationRepository] [securityBwcCluster0-0] Node started, try to initialize it. Wait for at least yellow cluster state....
[2023-08-18T12:38:38,402][DEBUG][o.o.c.c.PublicationTransportHandler] [securityBwcCluster0-0] received diff cluster state version [7] with uuid [gkvmh1GoQly6eZpSTAfEiQ], diff size [321]
[2023-08-18T12:38:38,403][INFO ][o.o.s.s.ConfigHelper     ] [securityBwcCluster0-0] Will update 'config' with /Users/steecraw/security/bwc-test/build/testclusters/securityBwcCluster0-0/config/opensearch-security/config.yml and populate it with empty doc if file missing and populateEmptyIfFileMissing=false
[2023-08-18T12:38:38,433][INFO ][o.o.p.PluginsService     ] [securityBwcCluster0-0] PluginService:onIndexModule index:[.opendistro_security/OdEfyqiYRI-9sXGvFapsFQ]
[2023-08-18T12:38:38,470][DEBUG][o.o.c.c.C.CoordinatorPublication] [securityBwcCluster0-0] publication ended successfully: Publication{term=1, version=7}
[2023-08-18T12:38:38,475][DEBUG][o.o.c.c.PublicationTransportHandler] [securityBwcCluster0-0] received diff cluster state version [8] with uuid [Hx22rt6pR26uKZscyRNQ3Q], diff size [341]
[2023-08-18T12:38:38,491][ERROR][o.o.s.a.BackendRegistry  ] [securityBwcCluster0-0] Not yet initialized (you may need to run securityadmin)
[2023-08-18T12:38:38,502][ERROR][o.o.s.a.BackendRegistry  ] [securityBwcCluster0-0] Not yet initialized (you may need to run securityadmin)
[2023-08-18T12:38:38,613][ERROR][o.o.s.a.BackendRegistry  ] [securityBwcCluster0-0] Not yet initialized (you may need to run securityadmin)
[2023-08-18T12:38:38,624][DEBUG][o.o.c.c.C.CoordinatorPublication] [securityBwcCluster0-0] publication ended successfully: Publication{term=1, version=8}
[2023-08-18T12:38:38,629][INFO ][o.o.p.PluginsService     ] [securityBwcCluster0-0] PluginService:onIndexModule index:[.opendistro_security/OdEfyqiYRI-9sXGvFapsFQ]
[2023-08-18T12:38:38,640][INFO ][o.o.c.m.MetadataMappingService] [securityBwcCluster0-0] [.opendistro_security/OdEfyqiYRI-9sXGvFapsFQ] create_mapping
[2023-08-18T12:38:38,645][DEBUG][o.o.c.c.PublicationTransportHandler] [securityBwcCluster0-0] received diff cluster state version [9] with uuid [vecMUuvrQ9SE97jvsrpWbA], diff size [532]
[2023-08-18T12:38:38,724][ERROR][o.o.s.a.BackendRegistry  ] [securityBwcCluster0-0] Not yet initialized (you may need to run securityadmin)
[2023-08-18T12:38:38,742][DEBUG][o.o.c.c.C.CoordinatorPublication] [securityBwcCluster0-0] publication ended successfully: Publication{term=1, version=9}

@stephen-crawford
Contributor

I opened a manual backport against 2.x in core.

@stephen-crawford
Contributor

I was able to get all tests passing on a change from 2.8 to 2.9.

Here are the steps to reproduce:

Use this build of 2.9: https://github.com/scrawfor99/OpenSearch/tree/2.9.0
and this build of 2.8: https://github.com/scrawfor99/OpenSearch/tree/2.8.0

For each of these, I cherry-picked the commits from my 2.x backport, which can be found here: opensearch-project/OpenSearch#9444.

I then ran ./gradlew localDistro and ./gradlew publishToMavenLocal for each.
I then went to the security branch here: https://github.com/scrawfor99/security/tree/working2BWC and followed the same steps as before, but in the BWC test build.gradle file I replaced the earlier version with 2.8 and the later version with 2.9.

Finally I ran:

./gradlew -p bwc-test bwcTestSuite -Dtests.opensearch.secure=true -Dtests.opensearch.username=admin -Dtests.opensearch.password=admin -PcustomDistributionUrl="${GIT_PROJECT_ROOT}/OpenSearch/distribution/archives/linux-tar/build/distributions/opensearch-min-2.8.0-SNAPSHOT-linux-x64.tar.gz" -i

This picked up the localDistro archive of the earlier version that I had built. I am skeptical that this part is truly necessary, however.
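
As an illustration only, the `-Dtests.opensearch.*` flags from the command above could be consumed on the test side roughly as follows. This is a minimal sketch, assuming the build forwards these system properties to the test JVM; the class name `BwcClusterSettings` and its defaults are hypothetical and are not taken from the security plugin's test code.

```java
/** Hypothetical helper: resolves connection settings from the -Dtests.opensearch.* properties. */
final class BwcClusterSettings {
    final boolean secure;
    final String username;
    final String password;

    BwcClusterSettings(boolean secure, String username, String password) {
        this.secure = secure;
        this.username = username;
        this.password = password;
    }

    /** Reads the three properties used on the command line; the defaults are assumptions. */
    static BwcClusterSettings fromSystemProperties() {
        boolean secure = Boolean.parseBoolean(System.getProperty("tests.opensearch.secure", "false"));
        String username = System.getProperty("tests.opensearch.username", "");
        String password = System.getProperty("tests.opensearch.password", "");
        return new BwcClusterSettings(secure, username, password);
    }

    /** A secured cluster is reached over https, an unsecured one over http. */
    String scheme() {
        return secure ? "https" : "http";
    }

    public static void main(String[] args) {
        BwcClusterSettings settings = fromSystemProperties();
        System.out.println("Connecting over " + settings.scheme() + " as user '" + settings.username + "'");
    }
}
```

Keeping the lookup in one place means both the regular client and the admin client would pick the same protocol and credentials.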

Here are the test results parsed from the output:

> Task :securityBwcCluster#mixedClusterTask
Caching disabled for task ':securityBwcCluster#mixedClusterTask' because:
  Build cache is disabled
1 test completed

> Task :securityBwcCluster#fullRestartClusterTask
Caching disabled for task ':securityBwcCluster#fullRestartClusterTask' because:
  Build cache is disabled
1 test completed

> Task :securityBwcCluster#rollingUpgradeClusterTask
Finished generating test XML results (0.0 secs) into: /Users/steecraw/security/bwc-test/build/test-results/securityBwcCluster#rollingUpgradeClusterTask
Generating HTML test report...
1 test completed

It appears to me that all tests pass, since the assertions all come from running SecurityBackwardsCompatibilityIT:

org.opensearch.security.bwc.SecurityBackwardsCompatibilityIT STANDARD_OUT
    [2023-08-18T12:38:22,482][WARN ][o.o.b.Natives            ] [[SUITE-SecurityBackwardsCompatibilityIT-seed#[1934D36967F83B97]]] unable to load JNA native support library, native methods will be disabled.

@stephen-crawford
Contributor

Blocked pending a decision on how to move forward: do we want to fix whatever is wrong with main, or prioritize merging #2802?

@stephen-crawford
Contributor

I am going to split this into a 2.x fix and a main fix for now. The working 2.x changes will be submitted as a PR against the 2.x line, and we will follow up on main in a separate issue.

cwperks added a commit that referenced this issue Aug 30, 2023
### Description

Opening up a PR to describe the issues faced with BWC tests with the
security plugin installed and solicit feedback.

I plan to forward port this change to main, but first wanted to show
this working for 2.9 -> 2.10 tests (as of the time of this writing).

Thanks to the work that @scrawfor99 did in
[core](opensearch-project/OpenSearch#8900), security settings can now be
supplied to testClusters so that the initial wait-for-yellow cluster
checks run against a URL that includes the right protocol (`https` when
security is enabled) along with a username and password to authenticate
the request.
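
To make the shape of that check concrete, here is a minimal JDK-only sketch of a wait-for-yellow request that switches protocol and adds basic authentication when security is enabled. It is not the testClusters implementation; the class name `WaitForYellowRequest`, the host and port, and the timeouts are assumptions chosen for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Base64;

/** Hypothetical helper that builds the cluster health request used to wait for yellow status. */
final class WaitForYellowRequest {

    static HttpRequest build(String hostAndPort, boolean secure, String username, String password) {
        // https when the security plugin is enabled, plain http otherwise
        String scheme = secure ? "https" : "http";
        URI uri = URI.create(scheme + "://" + hostAndPort
                + "/_cluster/health?wait_for_status=yellow&timeout=60s");

        HttpRequest.Builder builder = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(70)) // client timeout slightly above the server-side wait
                .GET();

        if (secure) {
            // HTTP basic authentication: base64("username:password")
            String token = Base64.getEncoder()
                    .encodeToString((username + ":" + password).getBytes(StandardCharsets.UTF_8));
            builder.header("Authorization", "Basic " + token);
        }
        return builder.build();
    }

    public static void main(String[] args) {
        // Address and credentials are placeholders for a local test cluster.
        HttpRequest request = build("127.0.0.1:9200", true, "admin", "admin");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending this request to a cluster with self-signed certificates would additionally require a client that trusts those certificates.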

I ran into 4 hurdles to get this to run:

1. Initially the cluster didn't form. After a lot of frustration, I
found that setting both `network.bind_host` and `network.publish_host`
to 127.0.0.1 resolved the issue. These could probably be combined into
a single `network.host`, but I chose to keep them separate.
2. I had issues testing changes to the gradle build-tools after making
them locally. This was the most frustrating hurdle, but ultimately
the solution was to change the [`opensearch.version` setting in
`bwc-test/build.gradle`](https://github.com/opensearch-project/security/blob/2.x/bwc-test/build.gradle#L47)
to `2.10.0-SNAPSHOT`. This value is specifically used as the version of
the gradle build-tools that the [BWC tests
use](https://github.com/opensearch-project/security/blob/main/bwc-test/build.gradle#L58).
My local changes were not picked up because I was publishing to
maven local from the 2.x branch (currently 2.10) while the build was
looking for 2.9.0-SNAPSHOT artifacts. After updating the value, it found
my maven local snapshots. For this artifact you can produce maven local
snapshots using `./gradlew :build-tools:publishToMavenLocal` from the
respective branch in the core repo.
3. After the waitForYellow checks were able to run successfully, the
REST Client in the SecurityBackwardsCompatibilityIT also had
problems connecting to the cluster because it didn't recognize the
server's certificates. I ended up taking the overly trusting
route, where the REST Client used in this test performs no SSL
verification. I borrowed this implementation from [k-NN's
ODFERestTestCase](https://github.com/opensearch-project/k-NN/blob/2.x/src/testFixtures/java/org/opensearch/knn/ODFERestTestCase.java#L118-L141)
which is widely used in the plugin ecosystem. There is an open issue to
abstract this class into common-utils. More work can be done here to
ensure the rest-high-level-client runs with a truststore that contains
the root certificate.
4. The last hurdle I faced was a WarningFailureException where the REST
Client could not deserialize the cluster health response because of a
warning that was returned with the response about the request including
system indices. According to this
[comment](opensearch-project/OpenSearch#1108 (comment)),
this may only be enabled in snapshots. To fix this, I set preserve
cluster to true which [bypasses the
method](https://github.com/opensearch-project/OpenSearch/blob/main/test/framework/src/main/java/org/opensearch/test/rest/OpenSearchRestTestCase.java#L364)
where the error was thrown.
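
For readers unfamiliar with the "no SSL verification" approach from hurdle 3, the sketch below shows the general idea using only the JDK. It is a stand-in for illustration and is not the code borrowed from k-NN's ODFERestTestCase; the class name `TrustAllSslContext` is hypothetical. An `SSLContext` like this accepts any server certificate, so it is only appropriate for local test clusters.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

/** Hypothetical test-only helper: an SSLContext that performs no certificate validation. */
final class TrustAllSslContext {

    static SSLContext create() {
        // Trust manager whose checks never throw, so every certificate chain is accepted.
        TrustManager[] trustAll = { new X509TrustManager() {
            @Override
            public void checkClientTrusted(X509Certificate[] chain, String authType) { }

            @Override
            public void checkServerTrusted(X509Certificate[] chain, String authType) { }

            @Override
            public X509Certificate[] getAcceptedIssuers() {
                return new X509Certificate[0];
            }
        } };

        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, trustAll, new SecureRandom());
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not create trust-all SSLContext", e);
        }
    }

    public static void main(String[] args) {
        SSLContext context = create();
        System.out.println("Created SSLContext for protocol " + context.getProtocol());
    }
}
```

Hostname verification is handled separately by whichever HTTP client consumes the context. The stricter alternative mentioned above would load the root certificate into a truststore and initialize the context from that instead.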

* Category (Enhancement, New feature, Bug fix, Test fix, Refactoring,
Maintenance, Documentation)

Enhancement

### Issues Resolved

#3056

### Check List
- [ ] New functionality includes testing
- [ ] New functionality has been documented
- [ ] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and
signing off your commits, please check
[here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

---------

Signed-off-by: Craig Perkins <[email protected]>
cwperks added a commit to cwperks/security that referenced this issue Aug 30, 2023