Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid long heavy tasks in the GraphQL service #2340

Merged
merged 37 commits into from
Oct 14, 2024

Conversation

xgreenx
Copy link
Collaborator

@xgreenx xgreenx commented Oct 13, 2024

This PR adds chunking to the stream in GraphQL and requests data in batches.

The current implementation is simple and executes batches in the same runtime. But at the end of batch fetching, it yields, allowing other tasks to be processed.

Checklist

  • New behavior is reflected in tests

Before requesting review

  • I have reviewed the code myself

@xgreenx xgreenx requested a review from a team October 13, 2024 16:27
@xgreenx xgreenx self-assigned this Oct 13, 2024
crates/fuel-core/src/graphql_api/database.rs Outdated Show resolved Hide resolved
@@ -134,7 +141,7 @@ impl ReadView {
pub fn transaction(&self, tx_id: &TxId) -> StorageResult<Transaction> {
let result = self.on_chain.transaction(tx_id);
if result.is_not_found() {
if let Some(tx) = self.old_transaction(tx_id)? {
if let Some(tx) = self.off_chain.old_transaction(tx_id)? {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the old helper... although the meaning of "old" here isn't clear to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2024-10-14 at 2 31 18 PM

crates/fuel-core/src/graphql_api/database.rs Outdated Show resolved Hide resolved
@@ -12,6 +12,10 @@ pub struct GraphQLArgs {
#[clap(long = "port", default_value = "4000", env)]
pub port: u16,

/// The size of the batch fetched from the database by GraphQL service.
#[clap(long = "graphql-database-batch-size", default_value = "100", env)]
pub database_batch_size: usize,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this have to be configurable? can we just set it to a global now to reduce the config surface? we have too many options rn :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to make it configurable for now to be able to adjust our performance on the fly. Later we can remove it when we have better ideas about the best value

Base automatically changed from feature/async-pagination-queries to feature/prepare-graphql-for-async October 14, 2024 06:37
@xgreenx xgreenx force-pushed the feature/prepare-graphql-for-async branch from 7cb4231 to ab5e940 Compare October 14, 2024 06:48
@xgreenx xgreenx changed the base branch from feature/prepare-graphql-for-async to feature/async-pagination-queries October 14, 2024 06:51
@xgreenx xgreenx requested review from rymnc and MitchTurner October 14, 2024 07:05
xgreenx and others added 4 commits October 14, 2024 10:09
Co-authored-by: Mårten Blankfors <[email protected]>
…re/avoid-long-heavy-tasks

# Conflicts:
#	CHANGELOG.md
#	crates/fuel-core/src/query/balance/asset_query.rs
#	crates/fuel-core/src/query/coin.rs
#	crates/fuel-core/src/query/message.rs
#	crates/fuel-core/src/query/tx.rs
netrome
netrome previously approved these changes Oct 14, 2024
Copy link
Contributor

@netrome netrome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bunch of nits, but otherwise looks good to me.

crates/fuel-core/src/database/block.rs Show resolved Hide resolved
crates/fuel-core/src/graphql_api/database.rs Show resolved Hide resolved
crates/fuel-core/src/graphql_api/database.rs Show resolved Hide resolved
crates/fuel-core/src/graphql_api/database.rs Outdated Show resolved Hide resolved
crates/fuel-core/src/graphql_api/database.rs Outdated Show resolved Hide resolved
})
});

futures::stream::StreamExt::chunks(stream, database.batch_size)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could just refer to self.database here

Suggested change
futures::stream::StreamExt::chunks(stream, database.batch_size)
futures::stream::StreamExt::chunks(stream, self.database.batch_size)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or add a getter for batch_size :)

})
.try_filter_map(move |chunk| async move {
let chunk = database
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here

Suggested change
let chunk = database
let chunk = self.database

crates/fuel-core/src/query/coin.rs Outdated Show resolved Hide resolved
crates/fuel-core/src/query/block.rs Outdated Show resolved Hide resolved
crates/fuel-core/src/query/message.rs Outdated Show resolved Hide resolved
Copy link
Member

@MitchTurner MitchTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one outstanding comment, but that's mostly a nit. Otherwise LGTM. Will wait until there is another review to approve.

Base automatically changed from feature/async-pagination-queries to master October 14, 2024 19:53
@xgreenx xgreenx dismissed netrome’s stale review October 14, 2024 19:53

The base branch was changed.

# Conflicts:
#	CHANGELOG.md
#	crates/fuel-core/src/coins_query.rs
#	crates/fuel-core/src/graphql_api/database.rs
#	crates/fuel-core/src/query/balance.rs
#	crates/fuel-core/src/query/balance/asset_query.rs
#	crates/fuel-core/src/query/block.rs
#	crates/fuel-core/src/query/coin.rs
#	crates/fuel-core/src/query/message.rs
#	crates/fuel-core/src/query/tx.rs
#	crates/fuel-core/src/schema/block.rs
#	crates/fuel-core/src/schema/tx.rs
Copy link
Contributor

@netrome netrome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice stuff! Just one question about fusing the underlying stream in YieldStream.

crates/services/src/yield_stream.rs Show resolved Hide resolved
.map(|tx_id| self.transaction(tx_id))
.collect::<Vec<_>>();
// Give a chance to other tasks to run.
tokio::task::yield_now().await;
Copy link
Member

@Voxelot Voxelot Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the point of this yield? This task has already finished using the database by now.

Should we be applying the same chunk batching to the iterator above?

Copy link
Collaborator Author

@xgreenx xgreenx Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that in the future, we will have an async caching mechanism where we will wait for notification when the cache fetches a new desired value(which potentially can be used by several queries in the last block with transactions).

This yield_now imitates this behavior, also allowing Tokio to work with other tasks.

})
});

futures::stream::StreamExt::chunks(stream, database.batch_size)
Copy link
Member

@Voxelot Voxelot Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this all be cleaned up with the new yield_each extension to avoid the need to map chunks and flatten results?

Copy link
Collaborator Author

@xgreenx xgreenx Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, here we actually send a batch request to the database like: "Please fetch the all coins for these list of UtxoIds".

Later we can optimize it with multi get + caching(next follow up PR will add caching)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oic, since it might be multiget later?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

@xgreenx xgreenx merged commit 87c9579 into master Oct 14, 2024
38 checks passed
@xgreenx xgreenx deleted the feature/avoid-long-heavy-tasks branch October 14, 2024 21:58
@xgreenx xgreenx mentioned this pull request Oct 14, 2024
xgreenx added a commit that referenced this pull request Oct 14, 2024
## Version v0.40.0

### Added
- [2347](#2347): Add GraphQL
complexity histogram to metrics.
- [2350](#2350): Added a new
CLI flag `graphql-number-of-threads` to limit the number of threads used
by the GraphQL service. The default value is `2`, `0` enables the old
behavior.
- [2335](#2335): Added CLI
arguments for configuring GraphQL query costs.

### Fixed
- [2345](#2345): In PoA
increase priority of block creation timer trigger compare to txpool
event management

### Changed
- [2334](#2334): Prepare the
GraphQL service for the switching to `async` methods.
- [2310](#2310): New metrics:
"The gas prices used in a block" (`importer_gas_price_for_block`), "The
total gas used in a block" (`importer_gas_per_block`), "The total fee
(gwei) paid by transactions in a block" (`importer_fee_per_block_gwei`),
"The total number of transactions in a block"
(`importer_transactions_per_block`), P2P metrics for swarm and protocol.
- [2340](#2340): Avoid long
heavy tasks in the GraphQL service by splitting work into batches.
- [2341](#2341): Updated all
pagination queries to work with the async stream instead of the sync
iterator.
- [2350](#2350): Limited the
number of threads used by the GraphQL service.

#### Breaking
- [2310](#2310): The `metrics`
command-line parameter has been replaced with `disable-metrics`. Metrics
are now enabled by default, with the option to disable them entirely or
on a per-module basis.
- [2341](#2341): The maximum
number of processed coins from the `coins_to_spend` query is limited to
`max_inputs`.

## What's Changed
* fix(gas_price_service): service name and unused trait impl by @rymnc
in #2317
* Do not require build of docker images to pass CI by @xgreenx in
#2342
* Prepare the GraphQL service for the switching to `async` methods by
@xgreenx in #2334
* Limited the number of threads used by the GraphQL service by @xgreenx
in #2350
* Increase priority of timer over txpool event by @xgreenx in
#2345
* Disable flaky `test_poa_multiple_producers` test by @rafal-ch in
#2353
* feat: CLI arguments for configuring GraphQL query costs. by @netrome
in #2335
* Add graphql query complexity histogram metric by @AurelienFT in
#2349
* Updated all pagination queries to work with the `Stream` instead of
`Iterator` by @xgreenx in
#2341
* Avoid long heavy tasks in the GraphQL service by @xgreenx in
#2340
* Add more metrics by @rafal-ch in
#2310


**Full Changelog**:
v0.39.0...v0.40.0

---------

Co-authored-by: Rafał Chabowski <[email protected]>
Co-authored-by: acerone85 <[email protected]>
Co-authored-by: rymnc <[email protected]>
Co-authored-by: Rafał Chabowski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants