
2.) Running Polkadot Archive

Andrew Plaza edited this page Jan 26, 2022 · 16 revisions

Polkadot Archive is a simple CLI created for the express purpose of trying out the functionality of Substrate-Archive. It supports Westend, Kusama, and Polkadot out-of-the-box.

Compiling

git clone https://github.com/paritytech/substrate-archive.git
cd substrate-archive && cargo build --release --bin polkadot-archive

The finished binary will be available at ./target/release/polkadot-archive.

Note

Substrate Archive uses compile-time type checks for its SQL queries, and those checks require a database that already contains its tables. It is therefore best to build the project first with cargo build --release without DATABASE_URL set. Once it is compiled, set DATABASE_URL and run ./polkadot-archive [OPTIONS]. This lets the polkadot-archive binary run its migrations on the database you created.
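One way to follow this build-then-run order is to make sure DATABASE_URL is unset only for the build step, even if it is already exported in your shell. The sketch below uses a hypothetical helper, run_without_database_url, which is not part of Substrate Archive. It runs a command in a subshell with DATABASE_URL removed, so your interactive shell keeps the variable for the run step.

```shell
#!/bin/sh
# Hypothetical helper (not part of substrate-archive): run a command in a
# subshell with DATABASE_URL unset. The parentheses around the body make the
# function run in a subshell, so the caller's environment is left untouched.
run_without_database_url() (
    unset DATABASE_URL
    "$@"
)

# Example, from the substrate-archive checkout:
#   run_without_database_url cargo build --release --bin polkadot-archive
#   export DATABASE_URL="postgres://postgres:123@localhost/your_db"
#   ./target/release/polkadot-archive [OPTIONS]
```

Using a subshell keeps the helper POSIX-portable, so it behaves the same on Linux and macOS shells.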


Using

Polkadot Archive needs three things to run: a TOML configuration file (which can be replaced by environment variables in a minimal setup), a PostgreSQL database, and RabbitMQ. Setting up these services is described in Setup. You can always check the available CLI flags with ./polkadot-archive --help.

Minimal Configuration without a TOML file

polkadot-archive can be run in a minimal configuration without a TOML file at all. It only requires two environment variables to be set:

  • CHAIN_DATA_DB: The path to the RocksDB database where chain data is stored. This is generally /home/$USER/.local/share/polkadot on Linux and /Users/$USER/Library/Application Support/polkadot/ on macOS. More information about this is in the section below.
  • DATABASE_URL: The URL of the PostgreSQL instance described in Setup.

export DATABASE_URL="postgres://postgres:123@localhost/your_db"
export CHAIN_DATA_DB="/home/$USER/.local/share/polkadot" # must be an absolute path
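Before starting the binary, it can help to verify these two variables. The sketch below is a hypothetical pre-flight check, check_archive_env, which is not part of polkadot-archive. It assumes only what is stated above: both variables must be set, and CHAIN_DATA_DB must be an absolute path. It additionally checks that the directory exists.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of polkadot-archive): verify that the
# two variables required by the minimal configuration are set, and that
# CHAIN_DATA_DB is an absolute path to an existing directory.
check_archive_env() {
    if [ -z "${DATABASE_URL:-}" ]; then
        echo "DATABASE_URL is not set" >&2
        return 1
    fi
    if [ -z "${CHAIN_DATA_DB:-}" ]; then
        echo "CHAIN_DATA_DB is not set" >&2
        return 1
    fi
    case "$CHAIN_DATA_DB" in
        /*) ;;  # absolute path, as required
        *)
            echo "CHAIN_DATA_DB must be an absolute path: $CHAIN_DATA_DB" >&2
            return 1
            ;;
    esac
    if [ ! -d "$CHAIN_DATA_DB" ]; then
        echo "CHAIN_DATA_DB does not exist: $CHAIN_DATA_DB" >&2
        return 1
    fi
    return 0
}

# Example:
#   check_archive_env && ./polkadot-archive
```

Because the macOS path contains a space ("Application Support"), keep the value quoted when exporting CHAIN_DATA_DB, as in the export lines above.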

Full Example Config File

This file can also be found in substrate-archive/bin/polkadot-archive/archive.toml

[chain]
# Must be an absolute path to the folder where Polkadot, Kusama, or Westend chain data is stored
# Can also be specified via the `CHAIN_DATA_DB` environment variable
data_path = "/.local/share/polkadot/chains/polkadot/db/full"

# How much the read-only database should keep in cache (MB)
# Optional, default: 128
cache_size = 128

# RocksDB secondary directory
# Optional, default: /<local>/substrate_archive/rocksdb_secondary/
rocksdb_secondary_path = "./substrate_archive/rocksdb_secondary"

[runtime]
# Specification of different methods of executing the runtime Wasm code.
# Optional, "Interpreted" or "Compiled", default: "Interpreted"
#exec_method = "Interpreted"

# Number of threads to dedicate for executing blocks
# Optional, default: the number of logical system threads
# More block workers require that you also increase the number of WASM pages,
# or else the WASM executor will run out of memory.
# This also increases substrate-archive's memory usage.
# Generally, you want 32 pages per block worker
block_workers = 8

# Number of 64KB Heap Pages to allocate for WASM execution
# Optional, default: 1024.
wasm_pages = 512

[database]
# Database url.
# Each chain needs its own PostgreSQL database
# Can also be specified via the `DATABASE_URL` environment variable.
# For production use, using `DATABASE_URL` is preferable.
# More info on the wiki: https://github.com/paritytech/substrate-archive/wiki/1.)-Requirements.
url = "postgres://postgres:123@localhost:5432/polkadot-archive"

[log]
# Optional log level of stdout, default: "DEBUG"
std = "DEBUG"

# Optional file log.
#[log.file]
# Optional log level of file, default: "DEBUG"
#level = "DEBUG"
# Optional log file directory path, default: "/<local>/substrate_archive/"
#dir = "./output/"
# Optional log file name, default: current time in UTC + ".log"
#name = "archive.log"

# Advanced options
#
# Changing these may lead to unexpected results.
[control]
# Whether to index storage via re-executing historical blocks.
# storage_indexing = true

# Timeout to wait for a task to start execution.
# Optional, default: 20 seconds
task_timeout = 20

# Maximum number of blocks to load and insert into the database at a time.
# Useful for controlling memory usage.
# Optional, default: 100,000
max_block_load = 100000

# URL for RabbitMQ. Default is localhost:5672
# task_url = "amqp://localhost:5672"

[wasm_tracing]
# Targets for tracing.
targets = '''wasm_tracing,pallet,frame,state'''

# Folder where tracing-enabled WASM binaries are kept.
#folder = ""
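The [runtime] comments in this file tie wasm_pages to block_workers: roughly 32 WASM pages per block worker, with each page being 64 KB. The sketch below does that arithmetic. Reading the rule of thumb literally is an assumption, and the helper names suggest_wasm_pages and wasm_heap_mib are hypothetical, so treat the result as a starting point rather than a hard minimum.

```shell
#!/bin/sh
# Hypothetical helpers: size wasm_pages from block_workers using the rule of
# thumb in archive.toml (about 32 WASM pages per block worker).
PAGES_PER_WORKER=32
PAGE_SIZE_KIB=64   # one WASM page is 64 KiB

# Suggested wasm_pages for a given number of block workers.
suggest_wasm_pages() {
    workers=$1
    echo $((workers * PAGES_PER_WORKER))
}

# Heap size in MiB for a given number of pages (pages * 64 KiB / 1024).
wasm_heap_mib() {
    pages=$1
    echo $((pages * PAGE_SIZE_KIB / 1024))
}

# Example, using the values from the sample config:
#   suggest_wasm_pages 8   -> 256
#   wasm_heap_mib 512      -> 32
#   wasm_heap_mib 1024     -> 64  (the default wasm_pages)
```

By this rule, the sample's wasm_pages = 512 (32 MiB) would cover up to 16 block workers, which leaves headroom over the 8 workers it configures.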

You can place the configuration file anywhere you want. Once it is set up, run the CLI with: ./polkadot-archive -c ~/some_dir/archive.toml