Edited cpu limits for workflows #4181

Merged
merged 1 commit into from
Jan 8, 2025

Conversation

Psalmz777
Contributor

@Psalmz777 Psalmz777 commented Jan 8, 2025

Description

[Provide a brief description of the changes made in this PR]

Related Issues

Changes Made

  • Brief description of change 1
  • Brief description of change 2
  • Brief description of change 3

Testing

  • Tested locally
  • Tested against staging environment
  • Relevant tests passed: [List test names]

Affected Services

  • Which services were modified:
    • Service 1
    • Service 2
    • Other...

Endpoints Ready for Testing

  • New endpoints ready for testing:
    • Endpoint 1
    • Endpoint 2
    • Other...

API Documentation Updated?

  • Yes, API documentation was updated
  • No, API documentation does not need updating

Additional Notes

[Add any additional notes or comments here]

Summary by CodeRabbit

  • Chores
    • Updated Kubernetes configuration resource limits for multiple components:
      • Increased webserver CPU limit to 1 CPU
      • Set scheduler CPU limit to 1 CPU
      • Expanded celery CPU limit to 2.5 CPUs
      • Defined redis CPU limit at 1 CPU
    • Minor documentation formatting update in README file

Contributor

coderabbitai bot commented Jan 8, 2025

📝 Walkthrough

This pull request introduces modifications to the Kubernetes configuration file values-stage.yaml, specifically updating resource limits for key components including webserver, scheduler, celery, and redis. The changes involve setting explicit CPU limits for these services, with the most notable increase being for the celery component, which now has a 2.5 CPU limit. Additionally, a minor formatting change was made to the workflows README.md file, removing a period from the header.

Changes

| File | Change Summary |
|------|----------------|
| k8s/workflows/values-stage.yaml | Updated CPU limits for multiple components: Webserver: 1000m; Scheduler: 1000m; Celery: 2500m; Redis: 1000m |
| src/workflows/README.md | Removed period from header formatting |
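
As a quick check on the notation, the millicore values in the change summary can be converted to whole CPUs. This is a minimal sketch; the helper name `millicores_to_cores` is hypothetical and not part of the repository.

```python
# CPU limits as listed in the change summary (Kubernetes millicore notation).
CPU_LIMITS = {
    "webserver": "1000m",
    "scheduler": "1000m",
    "celery": "2500m",
    "redis": "1000m",
}


def millicores_to_cores(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as '2500m' or '1' to cores."""
    if quantity.endswith("m"):
        # The 'm' suffix means thousandths of one CPU.
        return int(quantity[:-1]) / 1000
    return float(quantity)


for component, limit in CPU_LIMITS.items():
    print(f"{component}: {limit} = {millicores_to_cores(limit)} CPU")
```

The result for each component agrees with the "1 CPU" and "2.5 CPUs" wording in the CodeRabbit summary.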

Assessment against linked issues

| Objective | Addressed | Explanation |
|-----------|-----------|-------------|
| Github PR Template [#123] | | No direct implementation of PR template visible |
| Exceedance Calculation [#456] | | No direct evidence of exceedance calculation in these changes |

Possibly related PRs

Suggested labels

ready for review

Suggested reviewers

  • NicholasTurner23
  • Baalmart

Poem

🌟 Kubernetes configs dance and sway,
Resource limits finding their way,
Celery stretches, CPUs take flight,
A digital ballet of computing might! 🚀



@Psalmz777 Psalmz777 requested review from BenjaminSsempala, NicholasTurner23 and Baalmart and removed request for BenjaminSsempala January 8, 2025 08:37
@Psalmz777 Psalmz777 self-assigned this Jan 8, 2025

codecov bot commented Jan 8, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 11.97%. Comparing base (ac75b0c) to head (b545f0f).

Additional details and impacted files


@@           Coverage Diff            @@
##           staging    #4181   +/-   ##
========================================
  Coverage    11.97%   11.97%           
========================================
  Files          121      121           
  Lines        15877    15877           
  Branches       329      329           
========================================
  Hits          1902     1902           
  Misses       13975    13975           
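
The headline figure can be reproduced from the Hits and Lines rows of the diff. The sketch below assumes that no partial hits are involved (none are listed) and that the percentage is rounded down to two decimals; the rounding mode is an assumption, not something the report states.

```python
# Figures copied from the coverage diff above.
HITS = 1902
MISSES = 13975
LINES = HITS + MISSES  # 15877, matching the "Lines" row


def coverage_percent_rounded_down(hits: int, lines: int, places: int = 2) -> float:
    """Coverage percentage truncated (not rounded to nearest) to `places` decimals."""
    scale = 10 ** places
    # Integer floor division avoids floating-point surprises before the final scaling.
    return (hits * 100 * scale // lines) / scale


print(coverage_percent_rounded_down(HITS, LINES))
```

Rounding to nearest would give 11.98 instead, which is why the round-down assumption matters here.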

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
k8s/workflows/values-stage.yaml (1)

23-24: Review resource allocation strategy

The CPU limits have been increased across all components, but there are some concerns with the current configuration:

  1. The request-to-limit ratios are quite high:

    • Webserver and Scheduler: 1:8 (125m:1000m)
    • Celery and Redis: 1:20 (125m:2500m and 50m:1000m)
  2. Such high ratios could lead to:

    • Severe CPU throttling during peak usage
    • Unpredictable performance
    • Potential pod evictions
    • Resource contention issues
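
The ratios quoted above follow directly from the request and limit values. A minimal sketch of the arithmetic, using only the figures in this comment (the dictionary layout and function name are illustrative, not the structure of values-stage.yaml):

```python
from fractions import Fraction

# (request, limit) in millicores, as quoted in the review comment.
CPU = {
    "webserver": (125, 1000),
    "scheduler": (125, 1000),
    "celery": (125, 2500),
    "redis": (50, 1000),
}


def limit_per_request(request_m: int, limit_m: int) -> Fraction:
    """How many times larger the limit is than the request."""
    return Fraction(limit_m, request_m)


for name, (request_m, limit_m) in CPU.items():
    print(f"{name}: 1:{limit_per_request(request_m, limit_m)}")
```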

Consider the following recommendations:

  1. Adjust request-to-limit ratios to be closer to 1:2 or 1:3 for better resource utilization:
resources:
  webserver:
    requests:
      cpu: 500m
    limits:
      cpu: 1000m
  scheduler:
    requests:
      cpu: 500m
    limits:
      cpu: 1000m
  celery:
    requests:
      cpu: 1000m
    limits:
      cpu: 2500m
  redis:
    requests:
      cpu: 500m
    limits:
      cpu: 1000m
  2. Consider implementing Horizontal Pod Autoscaling (HPA) for individual components to handle varying loads more efficiently.

  3. Monitor actual resource usage in staging to fine-tune these values.
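
The request values and the HPA suggestion interact, because HPA measures CPU utilization as a percentage of the pod's request. The sketch below applies the documented HPA scaling formula to one pod; the 400m usage and 70% target are hypothetical numbers chosen for illustration, and the default tolerance and stabilization behaviour of a real HPA are ignored.

```python
import math


def cpu_utilization_percent(usage_m: float, request_m: float) -> float:
    """Utilization as HPA sees it: usage relative to the CPU request."""
    return usage_m / request_m * 100


def desired_replicas(current_replicas: int, current_util: float, target_util: float) -> int:
    """desiredReplicas = ceil(currentReplicas * currentMetric / desiredMetric)."""
    return math.ceil(current_replicas * current_util / target_util)


USAGE_M = 400        # hypothetical observed usage, in millicores
TARGET_PERCENT = 70  # hypothetical HPA target utilization

for label, request_m in (("current request (125m)", 125), ("suggested request (500m)", 500)):
    util = cpu_utilization_percent(USAGE_M, request_m)
    replicas = desired_replicas(1, util, TARGET_PERCENT)
    print(f"{label}: {util:.0f}% utilization -> {replicas} replicas")
```

Under these assumptions the same load asks for 5 replicas with a 125m request but only 2 with a 500m request, so request values are worth settling before an autoscaler is tuned against them.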

Also applies to: 30-31, 37-38, 44-45

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac75b0c and b545f0f.

📒 Files selected for processing (2)
  • k8s/workflows/values-stage.yaml (1 hunks)
  • src/workflows/README.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/workflows/README.md
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Analyze (javascript)
🔇 Additional comments (1)
k8s/workflows/values-stage.yaml (1)

37-38: Verify Celery's high CPU requirement

The Celery worker's CPU limit of 2.5 cores is significantly higher than other components. While this might be justified for computation-heavy tasks, it's important to verify this requirement.

Let's verify the Celery worker's CPU usage patterns:

✅ Verification successful

CPU allocation of 2.5 cores is appropriate for Celery workload

The high CPU allocation is justified due to:

  • Computationally intensive tasks (collocation analysis, data exports)
  • Use of scientific computing libraries (numpy, pandas, sklearn)
  • Regular task scheduling with potential concurrent operations
  • Complex data processing for analytics and monitoring
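
For context on what a 2.5-core limit means at runtime: on Linux nodes a CPU limit is typically enforced as a CFS quota per scheduling period. The sketch below assumes the default 100 ms period; the count of four CPU-bound worker processes is hypothetical and not taken from the repository's Celery configuration.

```python
# Assumption: default CFS period of 100 ms (100_000 microseconds).
DEFAULT_CFS_PERIOD_US = 100_000


def cfs_quota_us(limit_millicores: int, period_us: int = DEFAULT_CFS_PERIOD_US) -> int:
    """CPU time in microseconds a container may consume per enforcement period."""
    return limit_millicores * period_us // 1000


def run_time_before_throttle_us(limit_millicores: int, busy_threads: int,
                                period_us: int = DEFAULT_CFS_PERIOD_US) -> float:
    """Wall-clock time into each period before the quota is spent, assuming every
    thread is CPU-bound on its own core. Capped at the period length."""
    quota = cfs_quota_us(limit_millicores, period_us)
    return min(period_us, quota / busy_threads)


print(f"2500m -> {cfs_quota_us(2500)} us of CPU time per {DEFAULT_CFS_PERIOD_US} us period")
# Hypothetical: 4 CPU-bound Celery worker processes sharing the limit.
print(f"4 busy workers run {run_time_before_throttle_us(2500, 4)} us, then wait for the next period")
```

With two busy workers the quota is never exhausted, while four would be throttled for part of every period, so worker concurrency is one of the things to check when monitoring the new limit.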
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for Celery task definitions to understand CPU requirements

# Look for Celery task definitions and their complexity
rg -g '*.py' -A 5 '@celery.task' 

# Look for CPU-intensive operations in task definitions
rg -g '*.py' -l 'numpy|pandas|scipy|sklearn|tensorflow|torch'

Length of output: 5289


Script:

#!/bin/bash
# Search for Celery configuration and task scheduling patterns
rg -g '*.py' -A 5 'CELERY_|celery.conf' 

# Look for concurrent task execution settings
rg -g '*.{py,yaml,json}' -l 'concurrency|worker_concurrency|pool_size'

Length of output: 1394

Comment on lines +44 to 45
cpu: 1000m
memory: 2000Mi
Contributor


🛠️ Refactor suggestion

Optimize Redis resource allocation

The Redis instance has a high memory limit (2000Mi) and CPU limit (1000m) for what is typically a lightweight in-memory store.

Consider reducing the limits unless there's a specific requirement:

redis:
  requests:
    cpu: 250m
    memory: 500Mi
  limits:
    cpu: 500m
    memory: 1000Mi

This should be sufficient for most Redis workloads while maintaining good performance.
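
To make the size of the suggested change concrete, the sketch below compares the limits under review (1000m CPU, 2000Mi memory) with the suggested values. The parsing helpers are hypothetical and only handle the `m` and `Mi` suffixes that appear in this comment.

```python
def cpu_millicores(quantity: str) -> int:
    """Parse a CPU quantity written in millicores, e.g. '500m'."""
    if not quantity.endswith("m"):
        raise ValueError(f"expected a millicore quantity, got {quantity!r}")
    return int(quantity[:-1])


def memory_mib(quantity: str) -> int:
    """Parse a memory quantity written in mebibytes, e.g. '1000Mi'."""
    if not quantity.endswith("Mi"):
        raise ValueError(f"expected a Mi quantity, got {quantity!r}")
    return int(quantity[:-2])


def percent_change(old: float, new: float) -> float:
    return (new - old) / old * 100


current_limits = {"cpu": "1000m", "memory": "2000Mi"}
suggested = {
    "requests": {"cpu": "250m", "memory": "500Mi"},
    "limits": {"cpu": "500m", "memory": "1000Mi"},
}

cpu_change = percent_change(cpu_millicores(current_limits["cpu"]),
                            cpu_millicores(suggested["limits"]["cpu"]))
mem_change = percent_change(memory_mib(current_limits["memory"]),
                            memory_mib(suggested["limits"]["memory"]))
cpu_ratio = cpu_millicores(suggested["limits"]["cpu"]) / cpu_millicores(suggested["requests"]["cpu"])
mem_ratio = memory_mib(suggested["limits"]["memory"]) / memory_mib(suggested["requests"]["memory"])

print(f"CPU limit change: {cpu_change:.0f}%, memory limit change: {mem_change:.0f}%")
print(f"Suggested limit-to-request ratio: CPU {cpu_ratio:g}x, memory {mem_ratio:g}x")
```

Both limits are halved, and the suggested block keeps limits at twice the requests. It also raises the CPU request well above the 50m quoted for Redis earlier in the review, so it is not purely a reduction.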

@Baalmart Baalmart merged commit c223ba1 into staging Jan 8, 2025
52 checks passed
@Baalmart Baalmart deleted the worflow-cpu-limits branch January 8, 2025 08:47
@Baalmart Baalmart mentioned this pull request Jan 8, 2025
1 task