Migrate Rotation Manager to Use the New Job Queue System #4244

Open
mastercactapus opened this issue Jan 15, 2025 · 0 comments
Labels: enhancement (New feature or request), River

Comments

@mastercactapus
Member

What problem would you like to solve? Please describe:
The rotation manager currently processes all rotations in a single transaction every 5 seconds, so a problem with just one rotation can block or delay updates for every rotation. There is a batch size limit, but all rotations are still fetched on each cycle. Tuning the interval isn’t practical either, since all legacy engine modules share the same loop. As GoAlert grows, this “all-or-nothing” approach becomes less efficient and harder to manage.

Describe the solution you’d like:

  • Job Queue Integration: Switch rotation updates to the fine-grained River Queue system.
    • On-Demand Updates: When a rotation changes (e.g., adding participants), enqueue a job specifically for that rotation.
    • Future Scheduling: Each job schedules the rotation’s next update based on its interval.
    • Missed Updates Fallback: Include a mechanism to detect and recover from missed updates or failures (e.g., legacy versions, DB restores, crashes), ensuring rotations never become stuck.
  • Isolated Transactions: Each rotation update runs in its own transaction, so one failure does not block all updates.
  • Scalable & Resilient: Remove the need to process every rotation every 5 seconds, reducing both transaction size and performance bottlenecks.

Describe alternatives you’ve considered:

  1. Continue Interval-Based Updates: Tuning intervals or splitting batches still relies on one shared loop and scanning all rotations each time.
  2. Hybrid Batching: Partially batching updates reduces transaction size but continues fetching all rotations, leaving potential for blocking on failures.
  3. Event-Driven + Periodic Sweeps: The job queue already handles event-driven updates; additional sweeps would add complexity without the benefits of the proven queue approach.

Additional context:

  • Rotation Manager: Governs on-call handoff by updating rotation_state to the next user at set intervals.
  • Previously Migrated Modules: Cleanup manager, signal manager, and status update manager are already using the job queue successfully.
  • Concurrency & Multiple Engine Instances:
    • The old interval-based approach runs on every active engine instance, causing duplicate work and potential contention (e.g., two instances each fetch all rotations every 5 seconds).
    • The new job queue ensures that only one instance handles a job at a time, which avoids the duplicate work and contention described above. Multiple engine instances will be more practical once all modules are on the queue system.
  • Value: By isolating each rotation update, we minimize the risk of widespread failures and enable large-scale rotation schedules to run more smoothly.
mastercactapus added the enhancement and River labels on Jan 15, 2025