Sentry fails to catch up when QPS is higher than 1000 #3471

Open

ZLBillShaw opened this issue Dec 13, 2024 · 1 comment


Self-Hosted Version

24.9.0

CPU Architecture

x86_64

Docker Version

kubernetes

Docker Compose Version

kubernetes

Machine Specification

  • My system meets the minimum system requirements of Sentry

Steps to Reproduce

When the number of errors captured by my SDK is high, data delays occur. I suspect this is due to Kafka consumption not keeping up.
I’ve increased the maxBatchSize for postProcessForwardErrors, workerEvents, ingestConsumerEvents, outcomesConsumer, and replacer to 10,000. However, I’m still facing delays. Should I adjust the number of partitions for the topic? Below are my configurations:

topics:
      - name: events
        # Number of partitions for this topic
        partitions: 30
        config:
          "message.timestamp.type": LogAppendTime
      - name: event-replacements
        partitions: 10
      - name: snuba-commit-log
        partitions: 10
        config:
          "cleanup.policy": "compact,delete"
          "min.compaction.lag.ms": "3600000"
      - name: cdc
      - name: transactions
        partitions: 20
        config:
          "message.timestamp.type": LogAppendTime
      - name: snuba-transactions-commit-log
        partitions: 10
        config:
          "cleanup.policy": "compact,delete"
          "min.compaction.lag.ms": "3600000"
      - name: snuba-metrics
        config:
          "message.timestamp.type": LogAppendTime
      - name: outcomes
        partitions: 20
      - name: outcomes-billing
        partitions: 20
      - name: ingest-sessions
      - name: snuba-sessions-commit-log
        config:
          "cleanup.policy": "compact,delete"
          "min.compaction.lag.ms": "3600000"
      - name: snuba-metrics-commit-log
        config:
          "cleanup.policy": "compact,delete"
          "min.compaction.lag.ms": "3600000"
      - name: scheduled-subscriptions-events
        partitions: 20
      - name: scheduled-subscriptions-transactions
        partitions: 10
      - name: scheduled-subscriptions-sessions
      - name: scheduled-subscriptions-metrics
      - name: scheduled-subscriptions-generic-metrics-sets
      - name: scheduled-subscriptions-generic-metrics-distributions
      - name: scheduled-subscriptions-generic-metrics-counters
      - name: events-subscription-results
        partitions: 20
      - name: transactions-subscription-results
        partitions: 10
      - name: sessions-subscription-results
      - name: metrics-subscription-results
      - name: generic-metrics-subscription-results
      - name: snuba-queries
        partitions: 20
        config:
          "message.timestamp.type": LogAppendTime
      - name: processed-profiles
        config:
          "message.timestamp.type": LogAppendTime
      - name: profiles-call-tree
      - name: ingest-replay-events
        config:
          "message.timestamp.type": LogAppendTime
          "max.message.bytes": "15000000"
      - name: snuba-generic-metrics
        config:
          "message.timestamp.type": LogAppendTime
      - name: snuba-generic-metrics-sets-commit-log
        config:
          "cleanup.policy": "compact,delete"
          "min.compaction.lag.ms": "3600000"
      - name: snuba-generic-metrics-distributions-commit-log
        config:
          "cleanup.policy": "compact,delete"
          "min.compaction.lag.ms": "3600000"
      - name: snuba-generic-metrics-counters-commit-log
        config:
          "cleanup.policy": "compact,delete"
          "min.compaction.lag.ms": "3600000"
      - name: generic-events
        partitions: 20
        config:
          "message.timestamp.type": LogAppendTime
      - name: snuba-generic-events-commit-log
        partitions: 20
        config:
          "cleanup.policy": "compact,delete"
          "min.compaction.lag.ms": "3600000"
      - name: group-attributes
        partitions: 20
        config:
          "message.timestamp.type": LogAppendTime
      - name: snuba-attribution
        partitions: 20
      - name: snuba-dead-letter-metrics
      - name: snuba-dead-letter-sessions
      - name: snuba-dead-letter-generic-metrics
      - name: snuba-dead-letter-replays
      - name: snuba-dead-letter-generic-events
        partitions: 10
      - name: snuba-dead-letter-querylog
        partitions: 10
      - name: snuba-dead-letter-group-attributes
        partitions: 10
      - name: ingest-attachments
        partitions: 20
      - name: ingest-transactions
        partitions: 20
      - name: ingest-events
        ## If the number of exceptions increases, it is recommended to increase the number of partitions for ingest-events
        partitions: 30
      - name: ingest-replay-recordings
      - name: ingest-metrics
      - name: ingest-performance-metrics
      - name: ingest-monitors
      - name: profiles
      - name: ingest-occurrences
        partitions: 25
      - name: snuba-spans
      - name: shared-resources-usage
      - name: snuba-metrics-summaries 

Expected Result

The most recently processed event matches the one I captured in real-time.

Actual Result

The delay of the latest errors increases as the QPS grows higher.

Event ID

No response

aldy505 (Collaborator) commented Jan 11, 2025

When the number of errors captured by my SDK is high, data delays occur. I suspect this is due to Kafka consumption not keeping up.

Yes, most of the time this is true.
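You can verify this on the broker itself with the stock Kafka CLI before touching any settings. A minimal sketch, assuming you can exec into a broker pod and that your consumers use a default group name like snuba-consumers (substitute your actual group):

# List every consumer group the broker knows about:
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list

# Describe one group; the LAG column shows how far each partition is behind:
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group snuba-consumers

If LAG keeps growing on the error-ingestion topics while traffic is high, the consumers are indeed the bottleneck.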

I’ve increased the maxBatchSize for postProcessForwardErrors, workerEvents, ingestConsumerEvents, outcomesConsumer, and replacer to 10,000. However, I’m still facing delays.

I wouldn't set maxBatchSize that high. 500 would be a good maximum, since every worker needs to process its batch sequentially.
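Since you're on Kubernetes, here is a rough values.yaml sketch of what I mean. The key names follow the consumers you listed and the replica counts are illustrative, so verify both against your chart version:

sentry:
  ingestConsumerEvents:
    replicas: 4            # scale out with more consumers instead of inflating batches
    maxBatchSize: "500"    # each worker processes its batch sequentially
  postProcessForwardErrors:
    replicas: 4
    maxBatchSize: "500"

Smaller batches with more replicas keep per-worker latency bounded while the consumer group as a whole still drains the topic in parallel.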

Should I adjust the number of partitions for the topic?

Yes, but setting your events topic to 30 partitions is a bit too much, I guess. I also wonder why you don't have the post-process-forwarder-errors consumer. Besides the ingest-events and errors topics, these are the other consumers required to process errors:

post-process-forwarder-errors:
  <<: *sentry_defaults
  command: run consumer --no-strict-offset-reset post-process-forwarder-errors --consumer-group post-process-forwarder --synchronize-commit-log-topic=snuba-commit-log --synchronize-commit-group=snuba-consumers
subscription-consumer-events:
  <<: *sentry_defaults
  command: run consumer events-subscription-results --consumer-group query-subscription-consumer
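
For reference, the stock Kafka CLI can also inspect and grow a topic's partition count (partitions can only ever be increased, never reduced). A sketch with an illustrative count:

# Show the current partition layout for a topic:
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic ingest-events

# Raise the partition count (only upward changes are allowed):
kafka-topics.sh --bootstrap-server localhost:9092 \
  --alter --topic ingest-events --partitions 32

Keep in mind that extra partitions only help if you also run enough consumer replicas to read them in parallel.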
