Docs for pruning and some internal renaming #4505

Merged · 7 commits · Mar 29, 2023
4 changes: 4 additions & 0 deletions NEWS.md
@@ -2,6 +2,10 @@

## Unreleased

+ - the behavior for `graphman prune` has changed: running just `graphman
+   prune` will mark the subgraph for ongoing pruning in addition to
+   performing an initial pruning. To avoid ongoing pruning, use `graphman
+   prune --once` ([docs](./docs/implementation/pruning.md))
- the materialized views in the `info` schema (`table_sizes`, `subgraph_sizes`, and `chain_sizes`) that provide information about the size of various database objects are now automatically refreshed every 6 hours. [#4461](https://github.com/graphprotocol/graph-node/pull/4461)

### Fixes
16 changes: 8 additions & 8 deletions docs/environment-variables.md
@@ -227,14 +227,14 @@ those.
1.1 means that the subgraph will be pruned every time it contains 10%
more history (in blocks) than its history limit. The default value is 1.2
and the value must be at least 1.01
- - `GRAPH_STORE_HISTORY_COPY_THRESHOLD`,
-   `GRAPH_STORE_HISTORY_DELETE_THRESHOLD`: when pruning, prune by copying the
-   entities we will keep to new tables if we estimate that we will remove
-   more than a factor of `COPY_THRESHOLD` of the deployment's history. If we
-   estimate to remove a factor between `COPY_THRESHOLD` and
-   `DELETE_THRESHOLD`, prune by deleting from the existing tables of the
+ - `GRAPH_STORE_HISTORY_REBUILD_THRESHOLD`,
+   `GRAPH_STORE_HISTORY_DELETE_THRESHOLD`: when pruning, prune by copying
+   the entities we will keep to new tables if we estimate that we will
+   remove more than a factor of `REBUILD_THRESHOLD` of the deployment's
+   history. If we estimate to remove a factor between `REBUILD_THRESHOLD`
+   and `DELETE_THRESHOLD`, prune by deleting from the existing tables of the
  deployment. If we estimate to remove less than `DELETE_THRESHOLD`
  entities, do not change the table. Both settings are floats, and default
- to 0.5 for the `COPY_THRESHOLD` and 0.05 for the `DELETE_THRESHOLD`; they
- must be between 0 and 1, and `COPY_THRESHOLD` must be bigger than
+ to 0.5 for the `REBUILD_THRESHOLD` and 0.05 for the `DELETE_THRESHOLD`;
+ they must be between 0 and 1, and `REBUILD_THRESHOLD` must be bigger than
  `DELETE_THRESHOLD`.
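
For illustration only (this is not graph-node's code), a minimal Rust sketch of how the two thresholds partition the estimated fraction of a table's data that pruning would remove, using the defaults of 0.5 and 0.05:

```rust
/// Minimal sketch, not graph-node's implementation: maps the estimated
/// fraction of a table that pruning would remove to the action the docs
/// above describe, using the default thresholds.
fn pick_action(removed_fraction: f64) -> &'static str {
    const REBUILD_THRESHOLD: f64 = 0.5; // GRAPH_STORE_HISTORY_REBUILD_THRESHOLD
    const DELETE_THRESHOLD: f64 = 0.05; // GRAPH_STORE_HISTORY_DELETE_THRESHOLD
    if removed_fraction > REBUILD_THRESHOLD {
        "rebuild: copy the entities we keep into new tables"
    } else if removed_fraction > DELETE_THRESHOLD {
        "delete: remove old versions from the existing tables"
    } else {
        "skip: leave the table unchanged"
    }
}
```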
1 change: 1 addition & 0 deletions docs/implementation/README.md
@@ -9,3 +9,4 @@ the code should go into comments.
* [Time-travel Queries](./time-travel.md)
* [SQL Query Generation](./sql-query-generation.md)
* [Adding support for a new chain](./add-chain.md)
+ * [Pruning](./pruning.md)
99 changes: 99 additions & 0 deletions docs/implementation/pruning.md
@@ -0,0 +1,99 @@
## Pruning deployments

Subgraphs, by default, store a full version history for entities, allowing
consumers to query the subgraph as of any historical block. Pruning is an
operation that deletes entity versions older than a certain block from a
deployment, so that it is no longer possible to query the deployment as of
prior blocks. In GraphQL, those are only queries with a constraint
`block: { number: <n> }`, or a similar constraint by block hash, where `n`
is before the block to which the deployment is pruned. Queries run at a
block height greater than that are not affected by pruning, and there is
no difference between running such queries against an unpruned and a
pruned deployment.

Because pruning reduces the amount of data in a deployment, it reduces the
amount of storage needed for that deployment, and is beneficial for both
query performance and indexing speed. Compared to the default of keeping
all history, pruning can often shrink a deployment's data dramatically and
speed up queries considerably. See [caveats](#caveats) below for the
downsides.

The block `b` to which a deployment is pruned is controlled by
`history_blocks`, the number of blocks of history to retain; `b` is
calculated internally from `history_blocks` and the latest block of the
deployment when the
prune operation is performed. When pruning finishes, it updates the
`earliest_block` for the deployment. The `earliest_block` can be retrieved
through the `index-node` status API, and `graph-node` will return an error
for any query that tries to time-travel to a point before
`earliest_block`. The value of `history_blocks` must be greater than
`ETHEREUM_REORG_THRESHOLD` to make sure that reverts can never conflict
with pruning.
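
A sketch of the arithmetic (identifiers here are illustrative, not
graph-node internals):

```rust
/// Illustrative sketch; the actual calculation happens inside graph-node.
fn prune_target(
    latest_block: i32,
    history_blocks: i32,
    reorg_threshold: i32,
) -> Option<i32> {
    // `history_blocks` must exceed the reorg threshold so that a revert
    // can never reach blocks that pruning has already removed.
    if history_blocks <= reorg_threshold {
        return None;
    }
    // Everything before this block becomes unavailable to queries.
    Some(latest_block - history_blocks)
}
```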

Pruning is started by running `graphman prune`. That command will perform
an initial prune of the deployment and set the subgraph's `history_blocks`
setting, which is used to periodically check whether the deployment has
accumulated more history than that. Whenever the deployment does contain
more history than `history_blocks`, it is automatically repruned. If
ongoing pruning is not desired, pass the `--once` flag to `graphman
prune`. Ongoing pruning can be turned off by setting `history_blocks` to a
very large value with the `--history` flag.

> **Contributor:** Is that initial prune now async (i.e., it doesn't block
> indexing)?
>
> **Author:** Good point, added a paragraph for that. It blocks indexing
> with the rebuild strategy while it copies nonfinal entities. I also
> added another paragraph explaining what log output to look for.

> **Contributor:** To check my understanding: if you pruned once with,
> say, 10,000 blocks (setting `history_blocks` to 10,000) and want to turn
> off pruning, you might call `graphman prune --history 1000000000`, so 1B
> blocks, which is effectively no pruning?
>
> **Author:** Yes, that's exactly what I meant here.

Repruning is performed whenever the deployment has more than
`history_blocks * GRAPH_STORE_HISTORY_SLACK_FACTOR` blocks of history. The
environment variable `GRAPH_STORE_HISTORY_SLACK_FACTOR` therefore controls
how often repruning is performed: with
`GRAPH_STORE_HISTORY_SLACK_FACTOR=1.5` and `history_blocks` set to 10,000,
a reprune will happen every 5,000 blocks. After the initial pruning, a
reprune therefore happens every `history_blocks *
(GRAPH_STORE_HISTORY_SLACK_FACTOR - 1)` blocks. This factor should be set
high enough that repruning occurs relatively infrequently and does not
cause too much database work.
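
A sketch of the trigger implied by this (again with illustrative names):

```rust
/// Illustrative: reprune once accumulated history exceeds the slack bound.
fn reprune_due(history: u64, history_blocks: u64, slack_factor: f64) -> bool {
    (history as f64) > (history_blocks as f64) * slack_factor
}

// With history_blocks = 10_000 and slack_factor = 1.5, repruning kicks in
// once history exceeds 15_000 blocks, i.e. every 5_000 blocks after a
// prune brings the deployment back down to 10_000 blocks of history.
```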

Pruning uses two different strategies to remove unneeded data: rebuilding
tables and deleting old entity versions. Deleting old entity versions is
straightforward: this strategy deletes rows from the underlying tables.
Rebuilding tables copies the data that should be kept from the existing
tables into new tables and then replaces the existing tables with these
much smaller tables. Which strategy to use is determined for each table
individually, governed by the settings for
`GRAPH_STORE_HISTORY_REBUILD_THRESHOLD` and
`GRAPH_STORE_HISTORY_DELETE_THRESHOLD`, both numbers between 0 and 1: if
we estimate that we will remove more than a fraction of
`REBUILD_THRESHOLD` of the table, the table will be rebuilt. If we
estimate that we will remove a fraction between `REBUILD_THRESHOLD` and
`DELETE_THRESHOLD` of the table, unneeded entity versions will be
deleted. If we estimate to remove less than `DELETE_THRESHOLD`, the table
is not changed at all. With both strategies, operations are broken into
batches that should each take `GRAPH_STORE_BATCH_TARGET_DURATION` seconds
to avoid causing very long-running transactions.

> **Contributor:** Are these thresholds 0-1 (i.e., 0.5 is 50%)? Or 0-100?
>
> **Author:** Yes, it's between 0 and 1, added that to the text.
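
A simplified restatement in Rust of how this decision is made from the
statistics; it mirrors `PruneRequest::strategy` in the diff to
`graph/src/components/store/mod.rs` further down, with abridged names:

```rust
enum PruningStrategy {
    Rebuild,
    Delete,
}

/// Simplified restatement of `PruneRequest::strategy` (see the diff
/// below); parameter names are abridged for the sketch.
fn strategy(
    history_pct: f64,      // fraction of blocks that pruning removes
    version_ratio: f64,    // entities-to-versions ratio from Postgres stats
    rebuild_threshold: f64,
    delete_threshold: f64,
) -> Option<PruningStrategy> {
    // Tables where most rows are historical versions have a low
    // version_ratio, so pruning stands to remove more of the table.
    let removal_ratio = history_pct * (1.0 - version_ratio);
    if removal_ratio >= rebuild_threshold {
        Some(PruningStrategy::Rebuild)
    } else if removal_ratio >= delete_threshold {
        Some(PruningStrategy::Delete)
    } else {
        None
    }
}
```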

Pruning, in most cases, runs in parallel with indexing and does not block
it. When the rebuild strategy is used, pruning does block indexing while it
copies non-final entities from the existing table to the new table.

The initial prune started by `graphman prune` prints a progress report on
the console. For the ongoing prune runs that are periodically performed,
the following information is logged: a message `Start pruning historical
entities` which includes the earliest and latest block, a message `Analyzed
N tables`, and a message `Finished pruning entities` with details about how
much was deleted or copied and how long that took. Pruning analyzes tables
when that seems necessary, since its estimates of how much of a table is
likely not needed are based on Postgres statistics.

### Caveats

Pruning is a user-visible operation and does affect some of the things that
can be done with a deployment:

* because it removes history, it restricts how far back time-travel
  queries can be performed. This will only be an issue for entities that
  keep lifetime statistics about some object (e.g., a token) and are used
  to produce time series: after pruning, it is only possible to produce a
  time series that goes back no more than `history_blocks`. Pruning is
  very beneficial, though, for entities that keep daily or similar
  statistics about some object, as it removes data that is not needed once
  the time period is over, and does not affect how far back time series
  based on these objects can be retrieved.
* it restricts how far back a graft can be performed. Because pruning
  removes history, it becomes impossible to graft more than
  `history_blocks` before the current deployment head.

> **Contributor:** Maybe worth linking to the time-travel docs page?
>
> **Author:** Just looked at the time-travel doc, and it's super low-level
> about how rows in the db are manipulated. Seems we're missing more of a
> user-level explanation of it.
43 changes: 22 additions & 21 deletions graph/src/components/store/mod.rs
@@ -1208,7 +1208,7 @@ pub enum PrunePhase {
impl PrunePhase {
pub fn strategy(&self) -> PruningStrategy {
match self {
- PrunePhase::CopyFinal | PrunePhase::CopyNonfinal => PruningStrategy::Copy,
+ PrunePhase::CopyFinal | PrunePhase::CopyNonfinal => PruningStrategy::Rebuild,
PrunePhase::Delete => PruningStrategy::Delete,
}
}
@@ -1247,9 +1247,9 @@ pub trait PruneReporter: Send + 'static {
/// Select how pruning should be done
#[derive(Clone, Copy, Debug, Display, PartialEq)]
pub enum PruningStrategy {
- /// Copy the data we want to keep to new tables and swap them out for
- /// the existing tables
- Copy,
+ /// Rebuild by copying the data we want to keep to new tables and swap
+ /// them out for the existing tables
+ Rebuild,
/// Delete unneeded data from the existing tables
Delete,
}
@@ -1270,12 +1270,12 @@ pub struct PruneRequest {
pub final_block: BlockNumber,
/// The latest block, i.e., the subgraph head
pub latest_block: BlockNumber,
- /// Use the copy strategy when removing more than this fraction of
- /// history. Initialized from `ENV_VARS.store.copy_threshold`, but can
- /// be modified after construction
- pub copy_threshold: f64,
+ /// Use the rebuild strategy when removing more than this fraction of
+ /// history. Initialized from `ENV_VARS.store.rebuild_threshold`, but
+ /// can be modified after construction
+ pub rebuild_threshold: f64,
/// Use the delete strategy when removing more than this fraction of
- /// history but less than `copy_threshold`. Initialized from
+ /// history but less than `rebuild_threshold`. Initialized from
/// `ENV_VARS.store.delete_threshold`, but can be modified after
/// construction
pub delete_threshold: f64,
@@ -1293,11 +1293,11 @@ impl PruneRequest {
first_block: BlockNumber,
latest_block: BlockNumber,
) -> Result<Self, StoreError> {
- let copy_threshold = ENV_VARS.store.copy_threshold;
+ let rebuild_threshold = ENV_VARS.store.rebuild_threshold;
let delete_threshold = ENV_VARS.store.delete_threshold;
- if copy_threshold < 0.0 || copy_threshold > 1.0 {
+ if rebuild_threshold < 0.0 || rebuild_threshold > 1.0 {
return Err(constraint_violation!(
- "the copy threshold must be between 0 and 1 but is {copy_threshold}"
+ "the rebuild threshold must be between 0 and 1 but is {rebuild_threshold}"
));
}
if delete_threshold < 0.0 || delete_threshold > 1.0 {
@@ -1331,19 +1331,20 @@ impl PruneRequest {
earliest_block,
final_block,
latest_block,
- copy_threshold,
+ rebuild_threshold,
delete_threshold,
})
}

/// Determine what strategy to use for pruning
///
- /// We are pruning `history_pct` of the blocks from a table that has a ratio
- /// of `version_ratio` entities to versions. If we are removing more than
- /// `copy_threshold` percent of the versions, we prune by copying, and if we
- /// are removing more than `delete_threshold` percent of the versions, we
- /// prune by deleting. If we would remove less than `delete_threshold`
- /// percent of the versions, we don't prune.
+ /// We are pruning `history_pct` of the blocks from a table that has a
+ /// ratio of `version_ratio` entities to versions. If we are removing
+ /// more than `rebuild_threshold` percent of the versions, we prune by
+ /// rebuilding, and if we are removing more than `delete_threshold`
+ /// percent of the versions, we prune by deleting. If we would remove
+ /// less than `delete_threshold` percent of the versions, we don't
+ /// prune.
pub fn strategy(&self, stats: &VersionStats) -> Option<PruningStrategy> {
// If the deployment doesn't have enough history to cover the reorg
// threshold, do not prune
Expand All @@ -1356,8 +1357,8 @@ impl PruneRequest {
// that `history_pct` will tell us how much of that data pruning
// will remove.
let removal_ratio = self.history_pct(stats) * (1.0 - stats.ratio);
- if removal_ratio >= self.copy_threshold {
- Some(PruningStrategy::Copy)
+ if removal_ratio >= self.rebuild_threshold {
+ Some(PruningStrategy::Rebuild)
} else if removal_ratio >= self.delete_threshold {
Some(PruningStrategy::Delete)
} else {
14 changes: 7 additions & 7 deletions graph/src/env/store.rs
@@ -85,11 +85,11 @@ pub struct EnvVarsStore {
pub batch_target_duration: Duration,

/// Prune tables where we will remove at least this fraction of entity
- /// versions by copying. Set by `GRAPH_STORE_HISTORY_COPY_THRESHOLD`.
- /// The default is 0.5
- pub copy_threshold: f64,
+ /// versions by rebuilding the table. Set by
+ /// `GRAPH_STORE_HISTORY_REBUILD_THRESHOLD`. The default is 0.5
+ pub rebuild_threshold: f64,
/// Prune tables where we will remove at least this fraction of entity
- /// versions, but fewer than `copy_threshold`, by deleting. Set by
+ /// versions, but fewer than `rebuild_threshold`, by deleting. Set by
/// `GRAPH_STORE_HISTORY_DELETE_THRESHOLD`. The default is 0.05
pub delete_threshold: f64,
/// How much history a subgraph with limited history can accumulate
@@ -134,7 +134,7 @@ impl From<InnerStore> for EnvVarsStore {
connection_idle_timeout: Duration::from_secs(x.connection_idle_timeout_in_secs),
write_queue_size: x.write_queue_size,
batch_target_duration: Duration::from_secs(x.batch_target_duration_in_secs),
- copy_threshold: x.copy_threshold.0,
+ rebuild_threshold: x.rebuild_threshold.0,
delete_threshold: x.delete_threshold.0,
history_slack_factor: x.history_slack_factor.0,
}
@@ -180,8 +180,8 @@ pub struct InnerStore {
write_queue_size: usize,
#[envconfig(from = "GRAPH_STORE_BATCH_TARGET_DURATION", default = "180")]
batch_target_duration_in_secs: u64,
#[envconfig(from = "GRAPH_STORE_HISTORY_COPY_THRESHOLD", default = "0.5")]
copy_threshold: ZeroToOneF64,
#[envconfig(from = "GRAPH_STORE_HISTORY_REBUILD_THRESHOLD", default = "0.5")]
rebuild_threshold: ZeroToOneF64,
#[envconfig(from = "GRAPH_STORE_HISTORY_DELETE_THRESHOLD", default = "0.05")]
delete_threshold: ZeroToOneF64,
#[envconfig(from = "GRAPH_STORE_HISTORY_SLACK_FACTOR", default = "1.2")]
12 changes: 6 additions & 6 deletions node/src/bin/manager.rs
@@ -253,12 +253,12 @@ pub enum Command {
Prune {
/// The deployment to prune (see `help info`)
deployment: DeploymentSearch,
- /// Prune by copying when removing more than this fraction of
- /// history. Defaults to GRAPH_STORE_HISTORY_COPY_THRESHOLD
+ /// Prune by rebuilding tables when removing more than this fraction
+ /// of history. Defaults to GRAPH_STORE_HISTORY_REBUILD_THRESHOLD
#[clap(long, short)]
- copy_threshold: Option<f64>,
+ rebuild_threshold: Option<f64>,
/// Prune by deleting when removing more than this fraction of
- /// history but less than copy_threshold. Defaults to
+ /// history but less than rebuild_threshold. Defaults to
/// GRAPH_STORE_HISTORY_DELETE_THRESHOLD
#[clap(long, short)]
delete_threshold: Option<f64>,
@@ -1390,7 +1390,7 @@ async fn main() -> anyhow::Result<()> {
Prune {
deployment,
history,
- copy_threshold,
+ rebuild_threshold,
delete_threshold,
once,
} => {
@@ -1400,7 +1400,7 @@
primary_pool,
deployment,
history,
- copy_threshold,
+ rebuild_threshold,
delete_threshold,
once,
)
6 changes: 3 additions & 3 deletions node/src/manager/commands/prune.rs
@@ -161,7 +161,7 @@ pub async fn run(
primary_pool: ConnectionPool,
search: DeploymentSearch,
history: usize,
- copy_threshold: Option<f64>,
+ rebuild_threshold: Option<f64>,
delete_threshold: Option<f64>,
once: bool,
) -> Result<(), anyhow::Error> {
@@ -198,8 +198,8 @@
status.earliest_block_number,
latest,
)?;
- if let Some(copy_threshold) = copy_threshold {
- req.copy_threshold = copy_threshold;
+ if let Some(rebuild_threshold) = rebuild_threshold {
+ req.rebuild_threshold = rebuild_threshold;
}
if let Some(delete_threshold) = delete_threshold {
req.delete_threshold = delete_threshold;
8 changes: 4 additions & 4 deletions store/postgres/src/deployment_store.rs
@@ -1290,10 +1290,10 @@ impl DeploymentStore {
site: Arc<Site>,
req: PruneRequest,
) -> Result<(), StoreError> {
- let logger = logger.cheap_clone();
- retry::forever_async(&logger, "prune", move || {
+ let logger2 = logger.cheap_clone();
+ retry::forever_async(&logger2, "prune", move || {
let store = store.cheap_clone();
- let reporter = OngoingPruneReporter::new(store.logger.cheap_clone());
+ let reporter = OngoingPruneReporter::new(logger.cheap_clone());
let site = site.cheap_clone();
async move { store.prune(reporter, site, req).await.map(|_| ()) }
})
@@ -1969,7 +1969,7 @@ impl PruneReporter for OngoingPruneReporter {

fn prune_batch(&mut self, _table: &str, rows: usize, phase: PrunePhase, _finished: bool) {
match phase.strategy() {
- PruningStrategy::Copy => self.rows_copied += rows,
+ PruningStrategy::Rebuild => self.rows_copied += rows,
PruningStrategy::Delete => self.rows_deleted += rows,
}
}