Docs for pruning and some internal renaming #4505
Looks good to me, just some ideas in the comments!
docs/implementation/pruning.md
Outdated
@@ -0,0 +1,82 @@
## Pruning deployments

Pruning is an operation that deletes data from a deployment that is only
I might start with the higher-level context, maybe something like...
By default, subgraphs store a full version history for entities, allowing consumers to query the subgraph as of any historical block. Pruning is an operation that deletes entity versions from a deployment older than a certain block, so it is no longer possible to query that deployment as of prior blocks. In GraphQL...
Nice, incorporated.
accumulated more history than that. Whenever the deployment does contain
more history than that, the deployment is automatically repruned. If
ongoing pruning is not desired, pass the `--once` flag to `graphman
prune`. Ongoing pruning can be turned off by setting `history_blocks` to a
To check my understanding: the pointer about turning off pruning is saying that if you pruned once with, say, 10,000 blocks (setting `history_blocks` to 10,000) and you later want to turn off pruning, you might call `graphman prune --history 1000000000` (1B blocks, which is effectively no pruning)?
yes, that's exactly what I meant here
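To make the exchange above concrete, here is a minimal sketch of the repruning condition as the doc text describes it: a deployment is repruned once it holds more history than `history_blocks`. The function names and the way history is measured (latest block minus earliest retained block) are assumptions for illustration, not graph-node's actual code, which may apply additional factors.

```rust
/// Hypothetical helper: number of blocks of history a deployment holds,
/// measured from its earliest retained block to its latest block.
fn history_held(earliest_block: u64, latest_block: u64) -> u64 {
    latest_block.saturating_sub(earliest_block)
}

/// Hypothetical check: reprune when the deployment holds more history
/// than `history_blocks` allows.
fn needs_reprune(earliest_block: u64, latest_block: u64, history_blocks: u64) -> bool {
    history_held(earliest_block, latest_block) > history_blocks
}
```

Under this sketch, a deployment pruned with `history_blocks` of 10,000 that has accumulated 15,000 blocks would be repruned, while the same deployment with `history_blocks` set to 1,000,000,000 would not be, which is why a very large value effectively turns ongoing pruning off.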
existing tables into new tables and then replaces the existing tables with
these much smaller tables. Which strategy to use is determined for each
table individually, and governed by the settings for
`GRAPH_STORE_HISTORY_REBUILD_THRESHOLD` and
Are these thresholds 0-1 (i.e. 0.5 is 50%)? Or 0-100?
Yes, it's between 0 and 1, added that to the text
docs/implementation/pruning.md
Outdated
`GRAPH_STORE_HISTORY_DELETE_THRESHOLD`: if we estimate that we will remove
more than `REBUILD_THRESHOLD` of the table, the table will be rebuilt. If
we estimate that we will remove a fraction between `REBUILD_THRESHOLD` and
`DELETE_THRESHOLD` of the table, unneeded entity versions will be
Are there checks that REBUILD_THRESHOLD is greater than DELETE_THRESHOLD (does it matter?)
Just checked the code - there are actually no checks; you could use that to make REBUILD_THRESHOLD lower than DELETE_THRESHOLD which would disable rebuilding
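For readers following the threshold discussion, a minimal sketch of the per-table strategy choice as the doc text describes it. The type and function names, the example threshold values, and the "do nothing below `DELETE_THRESHOLD`" branch are assumptions for illustration; the order of the checks is also an assumption, so this sketch does not claim to reproduce how the real code behaves when the two thresholds are misordered.

```rust
/// Hypothetical strategy type; 'rebuild' copies the rows to keep into a new
/// table, 'delete' removes unneeded entity versions in place.
#[derive(Debug, PartialEq)]
enum Strategy {
    Rebuild,
    Delete,
}

/// Thresholds are fractions between 0 and 1 (0.5 means 50% of the table).
struct Thresholds {
    rebuild: f64,
    delete: f64,
}

/// Pick a strategy for one table from the estimated fraction of rows
/// that pruning would remove.
fn choose_strategy(removal_fraction: f64, t: &Thresholds) -> Option<Strategy> {
    if removal_fraction > t.rebuild {
        Some(Strategy::Rebuild)
    } else if removal_fraction > t.delete {
        Some(Strategy::Delete)
    } else {
        // Assumption: too little to remove, so the table is left alone.
        None
    }
}
```

With illustrative thresholds of `rebuild: 0.5` and `delete: 0.05`, an estimated removal of 80% of a table selects `Rebuild`, 20% selects `Delete`, and 1% selects nothing.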
Pruning is a user-visible operation and does affect some of the things that
can be done with a deployment:

* because it removes history, it restricts how far back time-travel queries
maybe worth linking to the time travel docs page?
Just looked at the time-travel doc, and it's super low-level about how rows in the db are manipulated. Seems we miss more of a user-level explanation of it.
with pruning.

Pruning is started by running `graphman prune`. That command will perform
an initial prune of the deployment and set the subgraph's `history_blocks`
Is that initial prune now async (i.e. it doesn't block indexing?)
Good point, added a paragraph for that. It blocks indexing with the rebuild strategy while it copies nonfinal entities. I also added another paragraph explaining what log output to look for.
We need to use the logger that adds information about the subgraph
This PR provides some user-level explanation of how pruning works and renames the copy strategy for pruning to 'rebuild'. Since we already have a 'copy' operation in `graph-node`, calling the strategy 'rebuild' reduces the risk for confusion between the two very different operations.