Sentry fails to catch up when QPS is higher than 1000 #3471

Open

ZLBillShaw opened this issue Dec 13, 2024 · 1 comment


Self-Hosted Version

24.9.0

CPU Architecture

x86_64

Docker Version

kubernetes

Docker Compose Version

kubernetes

Machine Specification

  • My system meets the minimum system requirements of Sentry

Steps to Reproduce

When the number of errors captured by my SDK is high, data delays occur. I suspect this is due to Kafka consumption not keeping up.
I’ve increased the maxBatchSize for postProcessForwardErrors, workerEvents, ingestConsumerEvents, outcomesConsumer, and replacer to 10,000. However, I’m still facing delays. Should I adjust the number of partitions for the topic? Below are my configurations:

topics:
      - name: events
        # Number of partitions for this topic
        partitions: 30
        config:
          "message.timestamp.type": LogAppendTime
      - name: event-replacements
        partitions: 10
      - name: snuba-commit-log
        partitions: 10
        config:
          "cleanup.policy": "compact,delete"
          "min.compaction.lag.ms": "3600000"
      - name: cdc
      - name: transactions
        partitions: 20
        config:
          "message.timestamp.type": LogAppendTime
      - name: snuba-transactions-commit-log
        partitions: 10
        config:
          "cleanup.policy": "compact,delete"
          "min.compaction.lag.ms": "3600000"
      - name: snuba-metrics
        config:
          "message.timestamp.type": LogAppendTime
      - name: outcomes
        partitions: 20
      - name: outcomes-billing
        partitions: 20
      - name: ingest-sessions
      - name: snuba-sessions-commit-log
        config:
          "cleanup.policy": "compact,delete"
          "min.compaction.lag.ms": "3600000"
      - name: snuba-metrics-commit-log
        config:
          "cleanup.policy": "compact,delete"
          "min.compaction.lag.ms": "3600000"
      - name: scheduled-subscriptions-events
        partitions: 20
      - name: scheduled-subscriptions-transactions
        partitions: 10
      - name: scheduled-subscriptions-sessions
      - name: scheduled-subscriptions-metrics
      - name: scheduled-subscriptions-generic-metrics-sets
      - name: scheduled-subscriptions-generic-metrics-distributions
      - name: scheduled-subscriptions-generic-metrics-counters
      - name: events-subscription-results
        partitions: 20
      - name: transactions-subscription-results
        partitions: 10
      - name: sessions-subscription-results
      - name: metrics-subscription-results
      - name: generic-metrics-subscription-results
      - name: snuba-queries
        partitions: 20
        config:
          "message.timestamp.type": LogAppendTime
      - name: processed-profiles
        config:
          "message.timestamp.type": LogAppendTime
      - name: profiles-call-tree
      - name: ingest-replay-events
        config:
          "message.timestamp.type": LogAppendTime
          "max.message.bytes": "15000000"
      - name: snuba-generic-metrics
        config:
          "message.timestamp.type": LogAppendTime
      - name: snuba-generic-metrics-sets-commit-log
        config:
          "cleanup.policy": "compact,delete"
          "min.compaction.lag.ms": "3600000"
      - name: snuba-generic-metrics-distributions-commit-log
        config:
          "cleanup.policy": "compact,delete"
          "min.compaction.lag.ms": "3600000"
      - name: snuba-generic-metrics-counters-commit-log
        config:
          "cleanup.policy": "compact,delete"
          "min.compaction.lag.ms": "3600000"
      - name: generic-events
        partitions: 20
        config:
          "message.timestamp.type": LogAppendTime
      - name: snuba-generic-events-commit-log
        partitions: 20
        config:
          "cleanup.policy": "compact,delete"
          "min.compaction.lag.ms": "3600000"
      - name: group-attributes
        partitions: 20
        config:
          "message.timestamp.type": LogAppendTime
      - name: snuba-attribution
        partitions: 20
      - name: snuba-dead-letter-metrics
      - name: snuba-dead-letter-sessions
      - name: snuba-dead-letter-generic-metrics
      - name: snuba-dead-letter-replays
      - name: snuba-dead-letter-generic-events
        partitions: 10
      - name: snuba-dead-letter-querylog
        partitions: 10
      - name: snuba-dead-letter-group-attributes
        partitions: 10
      - name: ingest-attachments
        partitions: 20
      - name: ingest-transactions
        partitions: 20
      - name: ingest-events
        ## If the number of exceptions increases, it is recommended to increase the number of partitions for ingest-events
        partitions: 30
      - name: ingest-replay-recordings
      - name: ingest-metrics
      - name: ingest-performance-metrics
      - name: ingest-monitors
      - name: profiles
      - name: ingest-occurrences
        partitions: 25
      - name: snuba-spans
      - name: shared-resources-usage
      - name: snuba-metrics-summaries 

Expected Result

The most recently processed event matches the one I captured in real-time.

Actual Result

The delay of the latest errors increases as the QPS grows higher.

Event ID

No response

aldy505 (Collaborator) commented Jan 11, 2025

When the number of errors captured by my SDK is high, data delays occur. I suspect this is due to Kafka consumption not keeping up.

Yes, most of the time this is true.
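You can verify this on the broker itself with the stock Kafka CLI before touching any settings. A minimal sketch, assuming you can exec into a broker pod and that your consumers use a default group name like snuba-consumers (substitute your actual group):

# List every consumer group the broker knows about:
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list

# Describe one group; the LAG column shows how far each partition is behind:
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group snuba-consumers

If LAG keeps growing on the error-ingestion topics while traffic is high, the consumers are indeed the bottleneck.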

I’ve increased the maxBatchSize for postProcessForwardErrors, workerEvents, ingestConsumerEvents, outcomesConsumer, and replacer to 10,000. However, I’m still facing delays.

I wouldn't set maxBatchSize that high. 500 would be a good maximum, since every worker needs to process its batch sequentially.
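Since you're on Kubernetes, here is a rough values.yaml sketch of what I mean. The key names follow the consumers you listed and the replica counts are illustrative, so verify both against your chart version:

sentry:
  ingestConsumerEvents:
    replicas: 4            # scale out with more consumers instead of inflating batches
    maxBatchSize: "500"    # each worker processes its batch sequentially
  postProcessForwardErrors:
    replicas: 4
    maxBatchSize: "500"

Smaller batches with more replicas keep per-worker latency bounded while the consumer group as a whole still drains the topic in parallel.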

Should I adjust the number of partitions for the topic?

Yes, but setting your events topic to 30 partitions is a bit too much, I guess. I also wonder why you don't have the post-process-forwarder-errors consumer. Besides the ingest-events and errors topics, these are the other consumers required to process errors:

post-process-forwarder-errors:
  <<: *sentry_defaults
  command: run consumer --no-strict-offset-reset post-process-forwarder-errors --consumer-group post-process-forwarder --synchronize-commit-log-topic=snuba-commit-log --synchronize-commit-group=snuba-consumers
subscription-consumer-events:
  <<: *sentry_defaults
  command: run consumer events-subscription-results --consumer-group query-subscription-consumer
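
For reference, the stock Kafka CLI can also inspect and grow a topic's partition count (partitions can only ever be increased, never reduced). A sketch with an illustrative count:

# Show the current partition layout for a topic:
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic ingest-events

# Raise the partition count (only upward changes are allowed):
kafka-topics.sh --bootstrap-server localhost:9092 \
  --alter --topic ingest-events --partitions 32

Keep in mind that extra partitions only help if you also run enough consumer replicas to read them in parallel.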
