[AZURE] [SNAPSHOT] [Repository] "Unable to find client with name [default] " #8245

Closed
kapendra-sentieo opened this issue Nov 19, 2024 · 3 comments

kapendra-sentieo commented Nov 19, 2024

I am getting an 'Unable to find client with name [default]' error when configuring an Azure snapshot repository in Elasticsearch 8.14 with ECK Operator 0.14.

Details:

Elasticsearch: Version: 8.14
ECK Stack Chart Version: 2.15
ECK Operator Version: 0.14

Snapshot Repository Configuration: We are using a secret setting to configure an Azure repository for Elasticsearch snapshots, as per Elastic's documentation on snapshots.

Issue Encountered:

When attempting to configure the Azure repository for snapshots, the following error occurred:

{
 "error": {
   "root_cause": [
     {
       "type": "repository_exception",
       "reason": "[my_azure_repository] Could not determine repository generation from root blobs"
     }
   ],
   "type": "repository_exception",
   "reason": "[my_azure_repository] Could not determine repository generation from root blobs",
   "caused_by": {
     "type": "i_o_exception",
     "reason": "Unable to list blobs by prefix [index-] for path ",
     "caused_by": {
       "type": "settings_exception",
       "reason": "Unable to find client with name [default]"
     }
   }
 },
 "status": 500
}

Potential Cause:

The error indicates a missing or misconfigured client setting, specifically for the default client in the Azure repository configuration.

Attempted fix:
We created and applied a secret for the Azure client, making sure it has the format the Azure repository expects:

kubectl create secret generic es-cluster-test-default-account \
  --from-literal=azure.client.default.account=${STORAGE_ACCOUNT_NAME}

Configuration review:
We verified that the Azure client name (default) and the storage account name are correctly referenced in both the secret and the Elasticsearch configuration.

After I added the secure setting, I ran into another issue:

The Elasticsearch pods in my cluster are stuck in a crash loop because the Azure repository plugin fails to load.

Configuration Snippet

eck-elasticsearch:
  fullnameOverride: es-cluster-test
  labels: 
    app: es-cluster
  secureSettings:
    - secretName: es-cluster-test-default-account
  nodeSets:
    - name: master
      count: 2
      config:
        node.roles: ["master"]
      podTemplate:
        metadata:
          labels:
            nodeSet: master
            azure.workload.identity/use: "true"
.....
.....
            serviceAccountName: "es-cluster-test-storage-sa"
            containers:
            - name: elasticsearch
    - name: master-voting-only
      count: 1
      config:
        node.roles: ["master", "voting_only"]
      podTemplate:
        metadata:
          labels:
            nodeSet: master-voting-only
            azure.workload.identity/use: "true"
.....
.....
            serviceAccountName: "es-cluster-test-backup-storage-sa"
            containers:
            - name: elasticsearch

Operator Logs:

{"log.level":"info","@timestamp":"2024-11-19T16:12:03.859Z","log.logger":"elasticsearch-controller","message":"Elasticsearch cannot be reached yet, re-queuing","service.version":"2.15.0+61f99ee5","service.type":"eck","ecs.version":"1.4.0","iteration":"1244","namespace":"as-elasticsearch","es_name":"es-cluster-test"}
{"log.level":"info","@timestamp":"2024-11-19T16:12:03.860Z","log.logger":"elasticsearch-controller","message":"Ending reconciliation run","service.version":"2.15.0+61f99ee5","service.type":"eck","ecs.version":"1.4.0","iteration":"1244","namespace":"as-elasticsearch","es_name":"es-cluster-test"}

ES Pod Logs

{"@timestamp":"2024-11-19T16:01:01.348Z", "log.level":"ERROR", "message":"fatal exception while booting Elasticsearch", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.bootstrap.Elasticsearch","elasticsearch.node.name":"es-cluster-test-es-master-0","elasticsearch.cluster.name":"es-cluster-test","error.type":"java.lang.IllegalStateException","error.message":"failed to load plugin class [org.elasticsearch.repositories.azure.AzureRepositoryPlugin]","error.stack_trace":"java.lang.IllegalStateException: failed to load plugin class [org.elasticsearch.repositories.azure.AzureRepositoryPlugin]\n\tat [email protected]/org.elasticsearch.plugins.PluginsService.loadPlugin(PluginsService.java:688)...\nCaused by: org.elasticsearch.common.settings.SettingsException: Neither a secret key nor a shared access token was set.\n"}

Issue Details

The Elasticsearch pods enter a crash loop with the following error in the logs:

"fatal exception while booting Elasticsearch... failed to load plugin class [org.elasticsearch.repositories.azure.AzureRepositoryPlugin]"

It appears to be related to the Azure repository plugin configuration, which may fail due to missing Azure credentials (secret key or shared access token).

Steps Taken

  • Verified that the secureSettings field contains the correct secretName.
  • Ensured that azure.workload.identity/use is set to "true" in the pod metadata labels.
  • Confirmed that the correct service account is attached.
  • Verified access from a sample pod: with the service account es-cluster-test-backup-storage-sa and the label azure.workload.identity/use: "true" attached, I can list the Azure container:
    az storage blob list \
    --container-name es-cluster-test-backup \
    --account-name <REDACTED> \
    --query "[].name" -o tsv
  • If the secure setting is removed, the pods run successfully again, but registering the repository fails with the same error as shown above.

Expected Outcome

Elasticsearch pods should initialize correctly with access to the Azure repository plugin without entering a crash loop.

@botelastic botelastic bot added the triage label Nov 19, 2024

pebrc commented Nov 21, 2024

As stated in the documentation, Azure workload identity is only supported as of Elasticsearch 8.16. You state that you are using 8.14, so you have to explicitly configure credentials with account and key information; see https://www.elastic.co/guide/en/cloud-on-k8s/2.14/k8s-snapshots.html#k8s-basic-snapshot-azure
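
For readers on a version before 8.16, here is a minimal sketch of what explicit credentials could look like as a Kubernetes Secret referenced from secureSettings. The secret name is the one used earlier in this thread; the two placeholder values are hypothetical and must be replaced with your own storage account name and access key. ECK adds each key of the Secret to the Elasticsearch keystore under the same name.

```yaml
# Sketch only: the placeholder values below are assumptions, not real credentials.
apiVersion: v1
kind: Secret
metadata:
  name: es-cluster-test-default-account   # name taken from the secureSettings entry in this thread
stringData:
  # Storage account used by the "default" Azure client
  azure.client.default.account: "<STORAGE_ACCOUNT_NAME>"
  # Access key for that storage account (this is the setting the 8.14 error reports as missing)
  azure.client.default.key: "<STORAGE_ACCOUNT_KEY>"
```

A shared access signature can be supplied instead of the key through azure.client.default.sas_token, which matches the "Neither a secret key nor a shared access token was set" message in the pod logs.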

@pebrc pebrc closed this as completed Nov 21, 2024
kapendra-sentieo commented

As suggested, I’ve updated the Elasticsearch version to 8.16.1 and ensured the Azure account details are added to the keystore. However, I’m unsure if the client keys need to be included as well. Currently, I have the client ID, tenant ID, and Azure token mounted via web identity.

REF:

  1. Elastic's documentation on snapshots
  2. Create automated snapshots

Here’s my setup snippet:

apiVersion: v1
kind: Pod
spec:
  containers:
.............................
    - command:
        - sh
        - -c
        - |
          bin/elasticsearch-plugin remove --batch repository-azure || true
          bin/elasticsearch-plugin install --batch repository-azure
      env:
        - name: AZURE_CLIENT_ID
          valueFrom:
            secretKeyRef:
              name: es-test-backup-storage-azure-01-identity
              key: AZURE_CLIENT_ID
        - name: AZURE_TENANT_ID
          valueFrom:
            secretKeyRef:
              name: es-test-backup-storage-azure-01-identity
              key: AZURE_TENANT_ID
        - name: AZURE_STORAGE_ACC_NAME
          value: <REDACTED>
        - name: AZURE_FEDERATED_TOKEN_FILE
          value: /var/run/secrets/azure/tokens/azure-identity-token
        - name: AZURE_AUTHORITY_HOST
          value: https://login.microsoftonline.com/
      volumeMounts:
        - mountPath: /var/run/secrets/azure/tokens
          name: azure-identity-token
          readOnly: true
  volumes:
    - name: azure-identity-token
      projected:
        sources:
          - serviceAccountToken:
              audience: api://AzureADTokenExchange
              expirationSeconds: 3600
              path: azure-identity-token

I’ve verified that the token is present at the specified location, and using the same setup, I can access Azure Blob Storage to view the content. However, when attempting to interact with Elasticsearch, I encounter the following error:

Repository registration request

POST _snapshot/my_azure_repository
{
  "type": "azure",
  "settings": {
    "container": "es-test-backup-azure-<random>"
  }
}

Error

{
  "error": {
    "root_cause": [
      {
        "type": "credential_unavailable_exception",
        "reason": """credential_unavailable_exception: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.To mitigate this issue, please refer to the troubleshooting guidelines here at https://aka.ms/azsdk/java/identity/environmentcredential/troubleshoot
Managed Identity authentication is not available.
Managed Identity authentication is not available.
SharedTokenCacheCredential authentication unavailable. No accounts were found in the cache.
access denied ("java.io.FilePermission" "/usr/share/elasticsearch/AzureToolsForIntelliJ/AuthMethodDetails.json" "read")
access denied ("java.io.FilePermission" "/bin/sh" "execute")
Azure Powershell authentication failed. Error Details: access denied ("java.io.FilePermission" "/bin/bash" "execute"). To mitigate this issue, please refer to the troubleshooting guidelines here at https://aka.ms/azsdk/java/identity/powershellcredential/troubleshoot
access denied ("java.io.FilePermission" "/bin/sh" "execute")To mitigate this issue, please refer to the troubleshooting guidelines here at https://aka.ms/azure-identity-java-default-azure-credential-troubleshoot""",
        "suppressed": [
          {
            "type": "exception",
            "reason": "exception: #block terminated with an error"
          }
        ]
      }
    ],
    "type": "repository_verification_exception",
    "reason": "[my_azure_repository] path  is not accessible on master node",
    "caused_by": {
      "type": "credential_unavailable_exception",
      "reason": """credential_unavailable_exception: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.To mitigate this issue, please refer to the troubleshooting guidelines here at https://aka.ms/azsdk/java/identity/environmentcredential/troubleshoot
Managed Identity authentication is not available.
Managed Identity authentication is not available.
SharedTokenCacheCredential authentication unavailable. No accounts were found in the cache.
access denied ("java.io.FilePermission" "/usr/share/elasticsearch/AzureToolsForIntelliJ/AuthMethodDetails.json" "read")
access denied ("java.io.FilePermission" "/bin/sh" "execute")
Azure Powershell authentication failed. Error Details: access denied ("java.io.FilePermission" "/bin/bash" "execute"). To mitigate this issue, please refer to the troubleshooting guidelines here at https://aka.ms/azsdk/java/identity/powershellcredential/troubleshoot
access denied ("java.io.FilePermission" "/bin/sh" "execute")To mitigate this issue, please refer to the troubleshooting guidelines here at https://aka.ms/azure-identity-java-default-azure-credential-troubleshoot""",
      "suppressed": [
        {
          "type": "exception",
          "reason": "exception: #block terminated with an error"
        }
      ]
    }
  },
  "status": 500
}

Questions
Do I need to include additional client keys or configurations in the keystore for Elasticsearch?

kapendra-sentieo commented Nov 28, 2024

Never mind, I found the fix:
/var/run/secrets/azure/tokens/azure-identity-token had to be in the Elasticsearch config.
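
For anyone landing here later, one way to read this fix is sketched below as Helm values. It assumes the Elasticsearch config directory is /usr/share/elasticsearch/config (the default in the official image) and that a volume named azure-identity-token already exists on the pod, either injected by the workload identity webhook or defined explicitly as in the pod spec above. The nodeSet and service account names are copied from the earlier snippet; the author did not post the final configuration, so treat this as an illustration rather than the confirmed change.

```yaml
# Sketch only: paths and structure are assumptions based on the thread, not the author's final config.
eck-elasticsearch:
  nodeSets:
    - name: master
      podTemplate:
        metadata:
          labels:
            azure.workload.identity/use: "true"
        spec:
          serviceAccountName: "es-cluster-test-storage-sa"
          containers:
            - name: elasticsearch
              env:
                # Point the Azure SDK at a token file located under the Elasticsearch config directory
                - name: AZURE_FEDERATED_TOKEN_FILE
                  value: /usr/share/elasticsearch/config/azure/tokens/azure-identity-token
              volumeMounts:
                # Mount the projected token volume under the config directory
                # instead of /var/run/secrets/azure/tokens
                - name: azure-identity-token
                  mountPath: /usr/share/elasticsearch/config/azure/tokens
                  readOnly: true
```

The "access denied (java.io.FilePermission ...)" lines in the error above suggest why the location matters: Elasticsearch restricts which paths plugin code may read, and the config directory is among the permitted ones.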
