Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

block-aware API for deleting StorageNodes #7405

Open
turadg opened this issue Apr 12, 2023 · 9 comments · May be fixed by #8063
Open

block-aware API for deleting StorageNodes #7405

turadg opened this issue Apr 12, 2023 · 9 comments · May be fixed by #8063
Assignees
Labels
enhancement New feature or request performance Performance related issues

Comments

@turadg
Copy link
Member

turadg commented Apr 12, 2023

What is the Problem Being Solved?

Once a StorageNode is made, there's no clear way to delete it from RPC. #5945 describes setValue(undefined) but that will only delete if sequence: set whereas the contracts all use sequence: append presently.

The sequence value is on the StorageNode object, not the stored path so one work-around is to make a StorageNode object with sequence:false and then call setValue(''), but that wipe out any data written within the current block.

Description of the Design

An API method to delete the node.

When a contract marks a sequence:true node for deletion, it should not affect data already written. That entails waiting for the next block to actually delete it.

A way to trigger that from a wrapping Recorder.

Security Considerations

Scaling Considerations

Test Plan

@turadg turadg added enhancement New feature or request performance Performance related issues labels Apr 12, 2023
@ivanlei ivanlei added the vaults_triage DO NOT USE label Apr 27, 2023
@gibson042
Copy link
Member

Question: Is there a need for recursive delete, or just a single storage node?

@mhofman
Copy link
Member

mhofman commented Sep 20, 2023

@michaelfig @gibson042, thinking about this, should a delete of a storage node with sequence: true just set a marker for deletion in the current block's cell, and (for optimization) we maintain a list of path that can be removed in the next block?

@gibson042
Copy link
Member

Ugh, that would bring in all kinds of complexity (e.g., write then delete then write again). I'd really prefer to keep stomping on any writes that precede delete().

@mhofman
Copy link
Member

mhofman commented Sep 20, 2023

I'm really concerned that contract code does not have the concept of what the block boundary is, and they may expect the write performed in a previous delivery, but executed in the same block, to be visible off-chain.

@michaelfig
Copy link
Member

Keep in mind that basically every write is accompanied by a Tendermint CometBFT event containing the write's data, which will be indexed even if the state is deleted in that block. We may need to update off-chain client libraries to use these events correctly, but then the worst that can happen is only if we relied on provable queries, which we don't.

I may be convinced to handle delete specially, but only if somebody presents clear requirements and semantics for how it should behave in the face of "write then delete then write again".

@mhofman
Copy link
Member

mhofman commented Sep 26, 2023

First, a clarification. My understanding is that vstorage has 2 features:

  • Provable update events
  • hierarchical storage with support for breadth first iteration

vstorage accomplishes the first by offering a sequence: true mode where all updates within a block are written combined (with tracking of the block height in which the updates were made). If multiple updates in a block are generated for a sequence: false node, the events may not be provable (is there a use case for generating events from sequence: false? maybe for any feature implementing its own concatenation?)

I believe that a node should not switch semantics over time. If a node has sequence: true, I assume the consumers would like to be able to always prove all update events. If we introduce delete for those nodes, it should similarly be provable.

I would expect that a deletion of a sequence: true node would actually record the deletion as a special value. My understanding is that currently an empty string value is currently not allowed for an append, so we could use the empty string value as a deletion marker for append nodes, or we could do the right thing and change the definition of StreamCell to Values []*string so that each value entry is null or a string when JSON encoded. Any existing cell will parse just fine with this new definition for the golang/chain side, but I'm less sure how casting clients would handle new cells with a null value.

Then there is the question of intent for a delete of an append node. I think that's where my previous observation comes in that we could optimize our storage cost by actually removing the cell in the subsequent block (if the last value of the cell is a deletion). From what I understand of the vstorage model, we could theoretically generalize this and always rewrite all cell nodes that were just updated the previous block to the latest value from the cell only (low level write, without events generated), but given how an IAVL DB works, that may actually be counter productive.

@mhofman
Copy link
Member

mhofman commented Dec 6, 2024

A potential workaround for a publisher, until a deletion API is available:

  • On sequence true (append) mode node: set value to some sentinel value indication pending deletion
    • empty value is currently not allowed for append mode nodes. Maybe an empty object JSON stringified?
  • Record in a durable map, keyed by current timestamp, the list of node paths pending deletion
  • Iterate over entries in pending deletion map, and for any entry where timestamp < current timestamp, iterate over list of node path:
    • Create a storage node with sequence false for the given path and set its value to an empty string (this causes deletion of the node)

The advantage of the above is that it delays the actual deletion of the node to a subsequent block so that external consumers can consume any final value. The drawback is that it puts the deletion book keeping burden onto the publisher.

Update: The potential users of this deletion API do not currently have a timer, and don't want to gain one, so this workaround doesn't work.

@mhofman
Copy link
Member

mhofman commented Dec 13, 2024

We discussed this more today with @gibson042. Some quick observations:

  • lib-chainStorage.js currently considers empty strings as normal values for sequence / append nodes and passes it to the golang side, which will simply append an empty string value.
    • We want to stop this behavior difference between sequence and not sequence mode.
    • We only have 3 places that currently do not JSON stringify before write:
    • As such we can safely change the behavior of the current setValue("") to always mean deletion
    • Worst case we can handle on the golang side empty string values in append mode to mean deletion
  • Changing lib-chainStorage (adding new API or changing behavior of setValue) requires a bridge vat upgrade, which really should include Report bridge inbound result to cosmic-swingset #8441 or some mechanism to make sure core proposals succeed (and to make the bridge vat vow aware).
  • We need some place in the Store to keep paths that are tombstoned
    • All paths are currently prefixed with the depth as ascii digits. We could chose another prefix that isn't a digit to use for internal bookkeeping
    • Alternative is to use another store altogether
  • We could (should?) change Set to verify that it does not overwrite a StreamCell. However technically a non sequence mode node allows raw access to the storage value, and StreamCell just appear as a value with a certain format.

@michaelfig
Copy link
Member

All paths are currently prefixed with the depth as ascii digits. We could chose another prefix that isn't a digit to use for internal bookkeeping

I prefer this approach. Using "another store altogether" goes against the Cosmos convention of having just one main store per module, and dividing it up by subprefixes. The current vstorage store just happens to reserve '0' through '9' as subprefixes... choosing another byte value would be reasonable.

@turadg turadg changed the title API for deleting StorageNodes block-aware API for deleting StorageNodes Dec 14, 2024
mergify bot added a commit that referenced this issue Dec 17, 2024
refs: #10598

## Description
The dashboard will requires data for each transaction. In the course of designing that we explored how to design the storage node paths without creating too much cost in IAVL. (The data in HEAD carry a burden of magnitude O(k+v) for all network nodes (including non-validators) into perpetuity.)

Examining [the trade-offs](https://alpha.decidedly.co/d/b27cbb6a-df71-4136-aa27-3e947015e84b/view) we arrived at:
- store one node per transaction
  - push all changes through it
  - rely on indexer to demux
- delete on demand
  - when we have [an API for block-safe deletion](#7405) it can delete in the operation
  - until then keep track of what to delete
  - a core-eval can be run to cull data whenever necessary


### Security Considerations
none

### Scaling Considerations
One storage path per transaction, which persists until the deletion job is called.

One entry in a set to keep track of transactions to delete.

### Documentation Considerations
Not developer facing

### Testing Considerations
CI

### Upgrade Considerations
not yet deployed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request performance Performance related issues
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants