OpenSearch sink not rolling over the index after upgrading from 2.7 to 2.10 #5258

Open
dmossakowski opened this issue Dec 12, 2024 · 4 comments

@dmossakowski

Since migrating from 2.7.0 to 2.10.1, the index number has stayed at 000056 and the index is no longer rolling over. As the listing below shows, indexes previously rolled over at around 500mb, but the latest one is now at 108.5gb. The document count used to be around 300,000 per index; now it is more than 73 million. Did I miss some configuration?

  | otel-v1-apm-span-000056 | green | Yes | open | 108.5gb | 54.1gb | 73591140 | 4806948 | 1 | 1
  | otel-v1-apm-span-000055 | green | Yes | open | 471mb | 235.5mb | 231704 | 24212 | 1 | 1
  | otel-v1-apm-span-000054 | green | Yes | open | 478.3mb | 239.1mb | 299849 | 20501 | 1 | 1
  | otel-v1-apm-span-000053 | green | Yes | open | 504mb | 252mb | 310190 | 26766 |  
..

Data prepper config:

raw-pipeline:
  workers: 2
  delay: "3000"
  source:
    pipeline:
      name: "otel-trace-pipeline"
  buffer:
    bounded_blocking:
      buffer_size: 10240
      batch_size: 160
  processor:
    - delete_entries:
        with_keys: ['command_args']
    - otel_traces:
    - otel_trace_group:
        hosts: ["https://opensearch-node1:9200"]
        username: admin
        password: ------
        insecure: true
  sink:
    - opensearch:
        hosts: ["https://opensearch-node1:9200"]
        index_type: trace-analytics-raw
        username: admin
        password: --------
        insecure: true
@dmossakowski
Author

After looking at bug #3506, I checked the rollover alias setting in the index template, and it is set:

{
  "otel-v1-apm-span-index-template": {
    "order": 0,
    "version": 1,
    "index_patterns": [
      "otel-v1-apm-span-*"
    ],
    "settings": {
      "index": {
        "opendistro": {
          "index_state_management": {
            "rollover_alias": "otel-v1-apm-span"
          }
        }
      }
    },

The last index also has it:

GET otel-v1-apm-span-000056/_settings

{
  "otel-v1-apm-span-000056": {
    "settings": {
      "index": {
        "replication": {
          "type": "DOCUMENT"
        },
        "opendistro": {
          "index_state_management": {
            "rollover_alias": "otel-v1-apm-span"
          }
        },
        "number_of_shards": "1",
        "provided_name": "otel-v1-apm-span-000056",
        "creation_date": "1718207944075",
        "number_of_replicas": "1",
        "uuid": "9oLapWNhTo-e1WCZ-Eiu6w",
        "version": {
          "created": "136347827",
          "upgraded": "136387927"
        }
..
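
The same check can be scripted when there are many indexes to inspect. The following is a minimal Python sketch, not part of Data Prepper: `rollover_alias` is a hypothetical helper that reads the setting from a `GET <index>/_settings` response shaped like the one above. The lookup under `plugins` is an assumption, covering clusters that store the setting under that namespace instead of `opendistro`.

```python
from typing import Optional


def rollover_alias(settings_response: dict, index_name: str) -> Optional[str]:
    """Return the ISM rollover alias recorded in an index's settings, or None."""
    index_settings = (
        settings_response.get(index_name, {}).get("settings", {}).get("index", {})
    )
    # Assumption: the setting lives under "opendistro" (as in this thread)
    # or under "plugins" on clusters using that namespace.
    for namespace in ("opendistro", "plugins"):
        ism = index_settings.get(namespace, {}).get("index_state_management", {})
        if "rollover_alias" in ism:
            return ism["rollover_alias"]
    return None


# Trimmed copy of the settings response quoted above.
response = {
    "otel-v1-apm-span-000056": {
        "settings": {
            "index": {
                "opendistro": {
                    "index_state_management": {"rollover_alias": "otel-v1-apm-span"}
                },
                "number_of_shards": "1",
            }
        }
    }
}

alias = rollover_alias(response, "otel-v1-apm-span-000056")
print(alias)
```

This only confirms that the setting is present. It does not show whether the alias currently points at a write index, which is a separate check.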

I then started a completely new cluster with empty indexes, and there I do see the rollover, so the problem seems to affect only the cluster that was upgraded. What can I do on the existing cluster to get the indexes rolling over again? The same index just keeps growing:

otel-v1-apm-span-000056 | green | Yes | open | 108.5gb | 54.4gb | 73904379 | 4806948

@chenqi0805
Collaborator

@dmossakowski Thanks for reaching out and reporting the issue! I have two questions:

  1. Is otel-v1-apm-span-000056 the current write index?
  2. If yes, have you tried manually calling the rollover api?

Reference: https://opensearch.org/docs/latest/api-reference/index-apis/rollover/#rolling-over-an-index-alias-with-a-write-index
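
Both questions can be answered from the alias response. The sketch below is illustrative only: `write_indices` and `rollover_path` are hypothetical helper names, and the sample response is made up to match the shape returned by `GET /_alias/otel-v1-apm-span`, where each index lists its aliases and an optional `is_write_index` flag. A manual rollover is a `POST` to the path that `rollover_path` builds.

```python
from typing import List


def write_indices(alias_response: dict, alias: str) -> List[str]:
    """List every index that the alias marks as a write index."""
    return sorted(
        index
        for index, body in alias_response.items()
        if body.get("aliases", {}).get(alias, {}).get("is_write_index") is True
    )


def rollover_path(alias: str) -> str:
    """Path for a manual rollover request (sent with POST)."""
    return f"/{alias}/_rollover"


# Hypothetical response in the shape of GET /_alias/otel-v1-apm-span.
sample = {
    "otel-v1-apm-span-000055": {
        "aliases": {"otel-v1-apm-span": {"is_write_index": False}}
    },
    "otel-v1-apm-span-000056": {
        "aliases": {"otel-v1-apm-span": {"is_write_index": True}}
    },
}

current = write_indices(sample, "otel-v1-apm-span")
print(current, rollover_path("otel-v1-apm-span"))
```

Exactly one entry in the returned list is the healthy case; an empty list or more than one entry means the alias has no usable rollover target.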

@dmossakowski
Author

Hi, yes, that was the current write index at the time, and yes, I did try to roll it over manually. The rollover request actually failed with the error below, but the index has started rolling over since. Index 56 is in green state and the latest one is 62.

{
    "shard_failures": [
        "CircuitBreakingException[[parent] Data too large, data for [indices:monitor/stats[n]] would be [512131372/488.4mb], which is larger than the limit of [510027366/486.3mb], real usage: [512130968/488.4mb], new bytes reserved: [404/404b], usages [request=0/0b, fielddata=21434324/20.4mb, in_flight_requests=33284/32.5kb]]"
    ],
    "message": "Failed to evaluate conditions for rollover [index=otel-v1-apm-span-000056]"
}
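
The numbers in the exception can be read directly. The sketch below parses the byte counts from the message and works out how far the request exceeded the parent circuit breaker limit. The heap estimate rests on an assumption that is not confirmed in this thread: that the parent breaker is set to 95% of the JVM heap.

```python
import re

# Relevant part of the shard failure message quoted above.
message = (
    "CircuitBreakingException[[parent] Data too large, data for "
    "[indices:monitor/stats[n]] would be [512131372/488.4mb], which is larger "
    "than the limit of [510027366/486.3mb], real usage: [512130968/488.4mb]"
)

match = re.search(
    r"would be \[(\d+)/[^\]]*\], which is larger than the limit of \[(\d+)/",
    message,
)
would_be = int(match.group(1))
limit = int(match.group(2))

# How many bytes over the breaker limit the stats request would have gone.
overshoot = would_be - limit

# Assumption: the parent breaker limit is 95% of the JVM heap.
implied_heap_mib = limit / 0.95 / 2**20

print(overshoot, round(implied_heap_mib, 1))
```

Under that assumption the limit corresponds to a heap of roughly 512 MiB, and the node was already about 2 MB over the limit before the stats call that evaluates the rollover conditions could run.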

Is there a way to fix this? Maybe to break the index apart?

image

Thank you for your help.

@chenqi0805
Collaborator

@dmossakowski It seems there are two write indices at the moment. Data Prepper cannot resolve this automatically. Could you check https://github.com/opensearch-project/index-management to see if there is a way to recover from this index state?
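
One possible way to get back to a single write index is the `POST /_aliases` API, which can set `is_write_index` on several indexes in one atomic request. This is a sketch only, not a fix confirmed by the maintainers: `single_write_index_actions` is a hypothetical helper, and the choice of 000056 and 000062 as the two indexes involved is an assumption based on the numbers mentioned earlier in the thread.

```python
import json
from typing import List


def single_write_index_actions(alias: str, indices: List[str], keep: str) -> dict:
    """Build a POST /_aliases body that leaves `keep` as the only write index."""
    if keep not in indices:
        raise ValueError(f"{keep!r} is not one of the listed indices")
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": alias,
                    # Only the index to keep stays writable through the alias.
                    "is_write_index": index == keep,
                }
            }
            for index in indices
        ]
    }


# Assumed index names; verify against GET /_alias/otel-v1-apm-span first.
body = single_write_index_actions(
    alias="otel-v1-apm-span",
    indices=["otel-v1-apm-span-000056", "otel-v1-apm-span-000062"],
    keep="otel-v1-apm-span-000062",
)
print(json.dumps(body, indent=2))
```

The printed body is what would be sent to `POST /_aliases`. Checking the alias state before and after the request is advisable, since the index-management plugin may also be acting on the same alias.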
