Migrate Rotation Manager to Use the New Job Queue System #4244

mastercactapus · 2025-01-15T20:14:43Z

What problem would you like to solve? Please describe:
The rotation manager currently processes all rotations in a single transaction every 5 seconds, which can block or delay updates for every rotation if just one rotation encounters an issue. There is a batch size limit, but the manager still fetches all rotations each time. Additionally, tuning the interval isn’t practical since all legacy engine modules share the same loop. As GoAlert grows, this “all-or-nothing” approach becomes less efficient and harder to manage.

Describe the solution you’d like:

- Job Queue Integration: Switch rotation updates to the fine-grained River Queue system.
  - On-Demand Updates: When a rotation changes (e.g., adding participants), enqueue a job specifically for that rotation.
  - Future Scheduling: Each job schedules the rotation’s next update based on its interval.
  - Missed Updates Fallback: Include a mechanism to detect and recover from missed updates or failures (e.g., legacy versions, DB restores, crashes), ensuring rotations never become stuck.
- Isolated Transactions: Each rotation update runs in its own transaction, so one failure does not block all updates.
- Scalable & Resilient: Remove the need to process every rotation every 5 seconds, reducing both transaction size and performance bottlenecks.

Describe alternatives you’ve considered:

- Continue Interval-Based Updates: Tuning intervals or splitting batches still relies on one shared loop and scanning all rotations each time.
- Hybrid Batching: Partially batching updates reduces transaction size but continues fetching all rotations, leaving potential for blocking on failures.
- Event-Driven + Periodic Sweeps: The job queue already handles event-driven updates; additional sweeps would add complexity without the benefits of the proven queue approach.

Additional context:

- Rotation Manager: Governs on-call handoff by updating rotation_state to the next user at set intervals.
- Previously Migrated Modules: Cleanup manager, signal manager, and status update manager are already using the job queue successfully.
- Concurrency & Multiple Engine Instances:
  - The old interval-based approach runs on every active engine instance, causing duplicate work and potential contention (e.g., two instances each fetch all rotations every 5 seconds).
  - The new job queue ensures that only one instance handles a job at a time, vastly improving resilience and performance. Multiple engine instances will be more practical once all modules are on the queue system.
- Value: By isolating each rotation update, we minimize the risk of widespread failures and enable large-scale rotation schedules to run more smoothly.
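
The "only one instance handles a job at a time" point under Concurrency & Multiple Engine Instances can be illustrated with a small sketch. This is an in-memory stand-in using a mutex, not how the queue itself is implemented; a database-backed queue would typically rely on row locking instead. `Claims`, `TryClaim`, and `HandleAll` are hypothetical names.

```go
package main

import "sync"

// Claims records which engine instance owns each job (hypothetical).
type Claims struct {
	mu    sync.Mutex
	owner map[string]string
}

// NewClaims returns an empty claim table.
func NewClaims() *Claims {
	return &Claims{owner: make(map[string]string)}
}

// TryClaim reports whether instance became the owner of jobID.
// Only the first caller for a given job succeeds.
func (c *Claims) TryClaim(jobID, instance string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, taken := c.owner[jobID]; taken {
		return false
	}
	c.owner[jobID] = instance
	return true
}

// HandleAll simulates several engine instances racing for the same jobs
// and returns how many times each job was actually handled.
func HandleAll(instances, jobIDs []string) map[string]int {
	claims := NewClaims()
	handled := make(map[string]int)
	var mu sync.Mutex // guards handled
	var wg sync.WaitGroup
	for _, inst := range instances {
		wg.Add(1)
		go func(inst string) {
			defer wg.Done()
			for _, id := range jobIDs {
				if claims.TryClaim(id, inst) {
					mu.Lock()
					handled[id]++
					mu.Unlock()
				}
			}
		}(inst)
	}
	wg.Wait()
	return handled
}
```

However many instances race, each job is handled once, in contrast to the legacy loop, where every instance fetches every rotation.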

mastercactapus added the enhancement (New feature or request) and River labels on Jan 15, 2025
