Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Integration][GCP] Added Rate Limiting for ProjectsV3GetRequestsPerMinutePerProject Quota #1304

Open
wants to merge 44 commits into
base: main
Choose a base branch
from

Conversation

oiadebayo
Copy link
Member

Description

What

  • Implemented rate limiting for the ProjectsV3GetRequestsPerMinutePerProject quota.
  • Adjusted real-time event handling to respect the newly implemented rate limits.

Why

  • To address issues reported where real-time event handling exceeded GCP quota limits, resulting in 429 Quota Exceeded errors.
  • Improve system reliability and data consistency by adhering to quota limits.

How

  • Introduced a rate limiting mechanism for the ProjectsV3GetRequestsPerMinutePerProject quota using the existing rate limiter
  • Added support in the resolve_request_controllers method to fetch and apply this specific quota.
  • Integrated the rate limiter into real-time event processing methods, particularly for project-related operations.

Type of change

Please leave one option from the following and delete the rest:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • New Integration (non-breaking change which adds a new integration)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Non-breaking change (fix of existing functionality that will not change current behavior)
  • Documentation (added/updated documentation)

All tests should be run against the port production environment(using a testing org).

Core testing checklist

  • Integration able to create all default resources from scratch
  • Resync finishes successfully
  • Resync able to create entities
  • Resync able to update entities
  • Resync able to detect and delete entities
  • Scheduled resync able to abort existing resync and start a new one
  • Tested with at least 2 integrations from scratch
  • Tested with Kafka and Polling event listeners
  • Tested deletion of entities that don't pass the selector

Integration testing checklist

  • Integration able to create all default resources from scratch
  • Resync able to create entities
  • Resync able to update entities
  • Resync able to detect and delete entities
  • Resync finishes successfully
  • If new resource kind is added or updated in the integration, add example raw data, mapping and expected result to the examples folder in the integration directory.
  • If resource kind is updated, run the integration with the example data and check if the expected result is achieved
  • If new resource kind is added or updated, validate that live-events for that resource are working as expected
  • Docs PR link here

Preflight checklist

  • Handled rate limiting
  • Handled pagination
  • Implemented the code in async
  • Support Multi account

Screenshots

Include screenshots from your environment showing how the resources of the integration will look.

API Documentation

Provide links to the API documentation used for this integration.

@oiadebayo oiadebayo requested a review from a team as a code owner January 8, 2025 10:23
@github-actions github-actions bot added the size/M label Jan 8, 2025
@oiadebayo oiadebayo marked this pull request as draft January 8, 2025 10:28
@oiadebayo oiadebayo marked this pull request as ready for review January 9, 2025 10:13
oiadebayo and others added 4 commits January 9, 2025 12:08
Copy link
Member

@mk-armah mk-armah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great work... 👏🏽 left some comments

integrations/gcp/gcp_core/search/resource_searches.py Outdated Show resolved Hide resolved
@@ -181,10 +187,36 @@ def get_service_account_project_id() -> str:


async def get_quotas_for_project(
project_id: str, kind: str
project_id: str, kind: str, **kwargs: Any
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
project_id: str, kind: str, **kwargs: Any
project_id: str, kind: str, quota_id: Optional[str] = None

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets be more precise here

) -> Tuple["AsyncLimiter", "BoundedSemaphore"]:
try:
match kind:
case AssetTypesWithSpecialHandling.PROJECT:
method = kwargs.get("method")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
method = kwargs.get("method")

) -> Tuple["AsyncLimiter", "BoundedSemaphore"]:
try:
match kind:
case AssetTypesWithSpecialHandling.PROJECT:
method = kwargs.get("method")
if method == "search":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if method == "search":
if quota_id == "apiSearchAllResourcesQpmPerProject":

@@ -106,7 +109,10 @@ async def resync_organizations(kind: str) -> ASYNC_GENERATOR_RESYNC_TYPE:

@ocean.on_resync(kind=AssetTypesWithSpecialHandling.PROJECT)
async def resync_projects(kind: str) -> ASYNC_GENERATOR_RESYNC_TYPE:
async for batch in search_all_projects():
resync_projects_rate_limiter, _ = await resolve_request_controllers(
kind, method="search"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjust

@@ -62,7 +62,10 @@ async def _resolve_resync_method_for_resource(
case AssetTypesWithSpecialHandling.ORGANIZATION:
return search_all_organizations()
case AssetTypesWithSpecialHandling.PROJECT:
return search_all_projects()
project_rate_limiter, _ = await resolve_request_controllers(
kind, method="search"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjust

oiadebayo and others added 4 commits January 14, 2025 03:34
…cover-projects-v-3-get-requests-per-minute-per-project
…cover-projects-v-3-get-requests-per-minute-per-project
…cover-projects-v-3-get-requests-per-minute-per-project
integrations/gcp/gcp_core/utils.py Outdated Show resolved Hide resolved
integrations/gcp/gcp_core/search/resource_searches.py Outdated Show resolved Hide resolved
@mk-armah
Copy link
Member

Lets test on the different selector configurations we have and ensure real time works as expected

Comment on lines 266 to 271
async def set_get_project_limiter() -> None:
global GET_PROJECT_LIMITER
get_project_quota_id = "ProjectV3GetRequestsPerMinutePerProject"
GET_PROJECT_LIMITER, _ = await resolve_request_controllers(
AssetTypesWithSpecialHandling.PROJECT, quota_id=get_project_quota_id
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not set this if the integration is meant to run ONCE

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@mk-armah mk-armah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments



@ocean.on_start()
async def set_get_project_limiter() -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
async def set_get_project_limiter() -> None:
async def on_start() -> None:

Comment on lines 265 to 266
@ocean.on_start()
async def set_get_project_limiter() -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to main.py

Comment on lines 180 to 186
asset_resource_data = await feed_event_to_resource(
asset_type,
asset_name,
asset_project,
asset_data,
config,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pass the get project rate limiter from here

integrations/gcp/main.py Outdated Show resolved Hide resolved
Copy link
Member

@mk-armah mk-armah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, few changes requested

integrations/gcp/main.py Outdated Show resolved Hide resolved
Copy link
Member

@mk-armah mk-armah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uploading image.png…

Copy link
Member

@matan84 matan84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Don't create multiple places where we handle the limiter, always get it from the same place
  2. Keep the changelog as accurate as possible- You've changed a core part of the integration, make sure to keep the changelog updated correctly.
  3. Explicit is always better than implicit
  4. The dependency injection in the folder with the limiter looks abit redundant to me, I think that get_single_project should be the only one to compute it's rate_limiter.

integrations/gcp/gcp_core/search/resource_searches.py Outdated Show resolved Hide resolved
integrations/gcp/gcp_core/search/resource_searches.py Outdated Show resolved Hide resolved
integrations/gcp/gcp_core/utils.py Outdated Show resolved Hide resolved
asset_name,
asset_project,
asset_data,
PROJECT_V3_GET_REQUESTS_RATE_LIMITER,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain in a comment why are you giving a static value and not dynamic value based on asset_type

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation instantiated this quota handling class once on_start and created the AsyncLimiter and then subsequently uses this limiter for every real time event. So they can be shared by this events and the limit can hold

integrations/gcp/main.py Outdated Show resolved Hide resolved
@@ -38,6 +39,8 @@
resolve_request_controllers,
)

PROJECT_V3_GET_REQUESTS_RATE_LIMITER: AsyncLimiter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this

Copy link
Member Author

@oiadebayo oiadebayo Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how we initialized the limiter to ensured that even when event context cycle has been completed the limiter persist and it is aware of the used quota and the available quota

global PROJECT_V3_GET_REQUESTS_RATE_LIMITER
if not ocean.event_listener_type == "ONCE":
get_project_quota_id = "ProjectV3GetRequestsPerMinutePerProject"
PROJECT_V3_GET_REQUESTS_RATE_LIMITER, _ = await resolve_request_controllers(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part of the code instantiates the realtime event rate limiter once on_start so that it can be subsequently used for every realtime event.

asset_name,
asset_project,
asset_data,
PROJECT_V3_GET_REQUESTS_RATE_LIMITER,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
PROJECT_V3_GET_REQUESTS_RATE_LIMITER,
project_get_requests_per_minute_per_project.limiter(asset_project),

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a dynamic implementation, which means every call to the endpoint creates a new instance of the limiter. This implementation doesn't seem to work well with realtime event

@@ -214,15 +216,21 @@ async def search_all_organizations() -> ASYNC_GENERATOR_RESYNC_TYPE:


async def get_single_project(
project_name: str, config: Optional[ProtoConfig] = None
project_name: str,
rate_limiter: AsyncLimiter,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why inject the rate_limiter? you should just use the project_get_requests_per_minute_per_project always, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

integrations/gcp/gcp_core/search/resource_searches.py Outdated Show resolved Hide resolved
@github-actions github-actions bot added size/L and removed size/M labels Jan 24, 2025
oiadebayo and others added 7 commits January 24, 2025 17:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants