Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: split metadata/val in objects query #2512

Merged
merged 30 commits into from
Oct 1, 2024

Conversation

gtarpenning
Copy link
Member

@gtarpenning gtarpenning commented Sep 27, 2024

#2496 but better

Description

This pr:

  • removes the val_dump read from the initial select objects query
  • adds a second query for val dump conditioned on digest, and combines the two query results then returns

This should improve performance on objects with many versions (or lots of data)

  1. list all objects and calculate the version_count (unused afaict), version_index which must see all objects.
  2. make a second query for the val of the object, using the found digests if not provided.

Performance:

Existing queries that crash due to massive memory requirements (50GB+) now can be resolved in around 1 second, with a memory requirement less than 100MB total.

Testing

  • existing tests
  • manual clickhouse testing, query here

@circle-job-mirror
Copy link

circle-job-mirror bot commented Sep 27, 2024

@gtarpenning gtarpenning marked this pull request as ready for review September 27, 2024 22:18
@gtarpenning gtarpenning requested a review from a team as a code owner September 27, 2024 22:18
@gtarpenning gtarpenning changed the title Griffin/object query two parts general perf: split metadata/val in objects query Sep 27, 2024
@@ -1578,7 +1600,7 @@ def _query_stream(
query: str,
parameters: Dict[str, Any],
column_formats: Optional[Dict[str, Any]] = None,
) -> Iterator[QueryResult]:
) -> Iterator[tuple]:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not return a QueryResult! its a tuple...

],
row,
list(row) + ["{}"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: might be nice to comment what this +['{}'] does

@gtarpenning gtarpenning force-pushed the griffin/object-query-two-parts-general branch from adfdb67 to 5661b89 Compare October 1, 2024 21:08
@gtarpenning gtarpenning requested a review from tssweeney October 1, 2024 21:53
@gtarpenning gtarpenning merged commit e8fd886 into master Oct 1, 2024
77 checks passed
@gtarpenning gtarpenning deleted the griffin/object-query-two-parts-general branch October 1, 2024 22:00
@github-actions github-actions bot locked and limited conversation to collaborators Oct 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants