Data storage and backup #37

Open
szynwelski opened this issue Jun 22, 2023 · 3 comments
@szynwelski
Contributor

If Cosmos generates a block every second, the disk space required to store the entire blockchain will be substantial.

The goal of the task is to estimate how much space will be needed in a year under different scenarios:

  • The number of transactions remains at the current level.
  • The number of transactions significantly increases.

The second goal is to find a way to reduce the storage required. Possible solutions to consider, along with their consequences, are:

  • Not generating empty blocks (see option create_empty_blocks), or generating them only at a set interval (see option create_empty_blocks_interval).
  • Not keeping the entire blockchain on all nodes (see option min-retain-blocks). However, the entire blockchain is necessary for new nodes to join the network, so "archive nodes" and potential data backups must be established.
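As a rough illustration of the first option, the sketch below counts how many blocks a year would be produced with and without empty blocks. It is simplified arithmetic, not a model of the consensus engine: `busy_fraction` (the share of one-second slots that carry transactions) and the 60-second interval are assumptions chosen for illustration, and the function name is hypothetical.

```python
from typing import Optional

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def blocks_per_year(block_time_s: float,
                    busy_fraction: float,
                    empty_interval_s: Optional[float]) -> int:
    """Estimate the yearly block count under a simplified model.

    block_time_s:     normal block time (one second in this issue).
    busy_fraction:    ASSUMED share of slots that contain transactions.
    empty_interval_s: None means empty blocks are always produced;
                      otherwise one empty block per this many idle seconds.
    """
    busy_blocks = (SECONDS_PER_YEAR / block_time_s) * busy_fraction
    idle_seconds = SECONDS_PER_YEAR * (1 - busy_fraction)
    if empty_interval_s is None:
        empty_blocks = idle_seconds / block_time_s
    else:
        empty_blocks = idle_seconds / empty_interval_s
    return round(busy_blocks + empty_blocks)


if __name__ == "__main__":
    # Assumed: 10% of slots are busy.
    print(blocks_per_year(1, 0.1, None))  # empty blocks every second
    print(blocks_per_year(1, 0.1, 60))    # empty blocks at most once a minute
```

Under these assumptions most blocks are empty ones, so spacing them out removes the bulk of the yearly block count. The real saving depends on the measured share of empty blocks.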

Security considerations must also be taken into account. If the network cannot return data for the entire blockchain, the ability to verify that the data on Arweave originates from the sequencer is limited.

@szynwelski szynwelski added this to the First phase milestone Jun 22, 2023
@janekolszak
Contributor

Another possible solution:

  • Store whole data items only for blocks that have not yet been confirmed to be on Arweave. Once the data is confirmed as saved, validators would prune those blocks and keep only the data item IDs.
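A minimal sketch of this pruning idea, assuming a block is pruned only once all of its data items are confirmed. The `Block` and `DataItem` types, the `prune_confirmed` function, and the `confirmed_ids` set are hypothetical names for illustration. How confirmation on Arweave is obtained is outside the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class DataItem:
    id: str
    payload: Optional[bytes]  # None once the content has been pruned


@dataclass
class Block:
    height: int
    items: List[DataItem] = field(default_factory=list)


def prune_confirmed(blocks: List[Block], confirmed_ids: Set[str]) -> int:
    """Drop payloads from fully confirmed blocks; return the bytes freed.

    A block is pruned only when every one of its items is in
    confirmed_ids (ASSUMED to come from some Arweave confirmation check).
    The item IDs always stay in place.
    """
    freed = 0
    for block in blocks:
        if not all(item.id in confirmed_ids for item in block.items):
            continue  # keep whole data items until the block is confirmed
        for item in block.items:
            if item.payload is not None:
                freed += len(item.payload)
                item.payload = None
    return freed
```

For example, with two blocks of which only the first is fully confirmed, the first keeps its IDs but loses its payloads, while the second is left untouched.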

@szynwelski szynwelski modified the milestones: First phase, Second phase Jun 23, 2023
@szynwelski
Contributor Author

Preliminary estimates show that Cosmos blocks, published approximately every second, take up about 1GB per month. As for data items, the largest volume we have seen so far is around 750MB per month. However, we need to prepare for larger volumes in the future.

In summary, for now we can assume that we will need a total of 2GB per month (1GB for Cosmos data + 1GB for DataItems), with an upward trend expected in the future.
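The estimate above can be turned into a yearly projection for the two scenarios from the issue description. The sketch below takes the 1GB + 1GB monthly figures from this comment. The `monthly_growth` rate is purely an assumption for the "transactions significantly increase" scenario, and applying it to both Cosmos data and data items is a simplification.

```python
def projected_storage_gb(months: int,
                         cosmos_gb: float = 1.0,
                         data_items_gb: float = 1.0,
                         monthly_growth: float = 0.0) -> float:
    """Cumulative storage after `months`, in GB.

    cosmos_gb / data_items_gb: monthly volumes taken from the estimate above.
    monthly_growth: ASSUMED compound growth per month (0.10 means +10%),
                    applied to the combined monthly volume.
    """
    total = 0.0
    monthly = cosmos_gb + data_items_gb
    for _ in range(months):
        total += monthly
        monthly *= 1 + monthly_growth
    return total


if __name__ == "__main__":
    # Scenario 1: traffic stays at the current level.
    print(projected_storage_gb(12))  # 24.0
    # Scenario 2: hypothetical 10% month-over-month growth.
    print(round(projected_storage_gb(12, monthly_growth=0.10), 1))
```

With flat traffic this gives 24GB per year. Any growth figure should be replaced with a measured trend once one is available.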

@szynwelski
Contributor Author

First version

In the first version, we will launch the network with at least two archival nodes, meaning nodes that hold the complete history from the beginning of the blockchain. The remaining nodes will keep only recent history (the exact number of retained blocks is to be determined). Additionally, archival nodes will have a data backup mechanism.

Next stage

If the size of data stored in archival nodes becomes too costly, the following mechanism can be implemented:

  1. Nodes should have the StateSync mode enabled, which allows joining the network without pulling the entire history (which BlockSync requires).
  2. If it turns out that data items occupy a significant portion of the space, a mechanism can be considered where archival nodes serve smaller blocks that only store the ID and owner of each data item. The owner is needed to create accounts from the public keys that signed the data items and to calculate their sequence (also known as nonce, indicating how many interactions that account has sent so far). Then, a sync mechanism similar to BlockSync could be implemented, but based on these smaller blocks. This is possible because the sequencer does not analyze the content of data items (except for the owner) and does not store state dependent on that content. It is only interested in their order. Such stripped-down data will occupy significantly less space, and the entire backup can be restored by retrieving the data item contents from Arweave.
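The second point can be sketched as follows: a node replays the smaller blocks in height order and rebuilds each account's sequence from the owners alone, without the data item contents. `SlimItem`, `SlimBlock`, and both functions are hypothetical names, and counting one sequence increment per data item is an assumption based on the description above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class SlimItem:
    id: str     # data item ID, enough to fetch the content from Arweave
    owner: str  # public key that signed the data item


@dataclass
class SlimBlock:
    height: int
    items: List[SlimItem]


def replay_sequences(blocks: Iterable[SlimBlock]) -> Dict[str, int]:
    """Rebuild per-owner sequence numbers from slim blocks.

    ASSUMES each data item increments its owner's sequence by one.
    """
    sequences: Counter = Counter()
    for block in sorted(blocks, key=lambda b: b.height):
        for item in block.items:
            sequences[item.owner] += 1
    return dict(sequences)


def ordered_item_ids(blocks: Iterable[SlimBlock]) -> List[str]:
    """Return data item IDs in block order, for a full restore from Arweave."""
    return [item.id
            for block in sorted(blocks, key=lambda b: b.height)
            for item in block.items]
```

Because only the order and the owner matter to the sequencer, these two pieces of information are enough to reconstruct account state. The ordered ID list tells a restoring node which contents to fetch from Arweave.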
