Hydra (simplified) Tail Simulation + Chain Analysis #16
Conversation
This doesn't yet impact the server behavior much because the server multiplexer isn't aware of whether a client is online or offline. So we need to make sure that the server can notice a client disconnection and act accordingly. We could perhaps model a client disconnection / reconnection with a 0-size network message?
- Now go offline _immediately_ after waking up
- Now have a non-zero probability of not sending any transaction when they go online
…er for offline clients.
…mulation duration.
…cessing it into 'events'
…t at the right time. This speeds up the pipeline from ~21 hours down to ~3 minutes. I also generated new datasets with higher compression rates (1:10000 & 1:100000).
We did a call to go through the code and review it.
Great work, thank you Matthias!
…nect server handlers. Also renamed 'subscribers' to 'recipients' to make it a bit clearer. There was initially a concept of subscriptions understood from the draft Tail paper, but this is still a blurry concept and we ended up associating transactions with their recipients directly.
Corrected a few points as discussed in this morning's review & in yesterday's call with the research team:
- Add a '--concurrency' option to the command-line
- Measure actual network usage (read / write) in the simulation analysis
- Some renaming (real -> actual) + moved numberOfTransactions from the analysis to the simulation summary
(Simplified) Tail Protocol Simulation
See exe/tail.
The tail protocol simulation works in two steps: preparation and run.
Preparation
The preparation step generates client events from a set of parameters. That is, one can create a simulation over a certain period of time, for a certain number of clients with a certain behavior.
The events can then be fed into the simulation for execution. Having both steps separated allows for creating events from other sources (e.g. from a real network), while the `prepare` command can be used to establish a baseline from simple patterns. Note that the `prepare` command is fully deterministic: the same options yield exactly the same events.

Execution
To run a simulation, simply provide an events dataset, possibly with some custom options for the server (e.g. `--concurrency`).
The simulation outputs two numbers: a maximum throughput and a real throughput. The real throughput is calculated from the (simulated) time it took to run the simulation, whereas the maximum throughput is the best the server could achieve given the inputs (or, said differently, the actual traffic generated by all clients).
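As a rough illustration of how the two figures relate (this is an interpretation of the description above, not the simulation's actual code; `events`, `slot` and `simulatedDuration` are hypothetical names):

```js
// Sketch only: relate the two throughput figures reported by the simulation.
// 'events' is the prepared dataset (assumed sorted by slot); 'simulatedDuration' is
// the simulated time the server needed to process it.
function throughputs(events, simulatedDuration) {
  const transactions = events.filter((e) => e.event === 'new-tx');
  const offeredDuration = events[events.length - 1].slot - events[0].slot;
  return {
    // best case: the server keeps up with the traffic generated by all clients
    maxThroughput: transactions.length / offeredDuration,
    // what the simulated server actually sustained
    realThroughput: transactions.length / simulatedDuration,
  };
}
```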
Hydra Tail Simulation Scripts
This folder contains a few scripts that can be used to generate datasets to inject into the simulation. They work as a pipeline of Node.js streams using real blockchain data obtained from the Mainnet.
Why Node.js? Because JavaScript and JSON are quite convenient to rapidly prototype something and transform data on the fly.
How to use
The first argument given to the pipeline corresponds to the number of clients considered for generating events, whereas the second corresponds to the compression rate of the chain (10 means that we only count 1 slot every 10 slots).
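For illustration only, the overall wiring could look roughly like the sketch below. The four step functions are the ones described under "Steps Overview"; the module path and the output filename scheme are assumptions, not the actual script:

```js
const fs = require('fs');
const { pipeline } = require('stream');

// Hypothetical import of the four steps described in the "Steps Overview" section.
const { downloadChain, viewViaStakeKeys, createEvents, lineSeparatedFile } = require('./steps');

const [numberOfClients, compressionRate] = process.argv.slice(2).map(Number);

pipeline(
  downloadChain(),                                 // blocks, streamed (and saved to disk)
  viewViaStakeKeys(),                              // blocks -> transactions keyed by stake key
  createEvents(numberOfClients, compressionRate),  // transactions -> client events
  lineSeparatedFile(),                             // events -> one compact CSV line each
  fs.createWriteStream(`events-${numberOfClients}-${compressionRate}.csv`), // filename scheme is illustrative
  (err) => { if (err) { console.error(err); process.exit(1); } }
);
```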
NOTE (1): If you haven't downloaded the chain locally, you'll need to install and set up an Ogmios server to download blocks from the chain. The script assumes a local instance up-and-running with the default configuration.
NOTE (2): The entire Cardano chain since the beginning of Shelley spreads across ~1.2M blocks. The various intermediate representations are quite voluminous, but the final output is quite compact (for it is a CSV file). On a decent CPU, it takes about 3 minutes to run the whole pipeline with a new set of parameters, assuming the blockchain has already been downloaded.
NOTE (3): The pipeline is single-cored, but multiple pipelines can be run at once to help generate multiple datasets with different parameters. The output filenames are automatically generated from the script's arguments.
Steps Overview
`downloadChain` (~1.2M blocks)
Downloads the blockchain from a certain point (by default, from the first Shelley block onwards). It downloads the chain both into a file and into a readable stream that is passed to the rest of the pipeline, so that (a) the script runs with roughly constant memory usage and (b) the pipeline produces data immediately.
The file produced is rather voluminous (4.5GB+) and contains line-separated JSON blocks, one block per line.
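A minimal sketch of that tee behaviour, assuming the blocks arrive on some readable stream (all names are illustrative):

```js
const fs = require('fs');
const { PassThrough } = require('stream');

// Sketch: persist every chunk to disk while forwarding it downstream, so the
// pipeline starts producing data immediately and memory usage stays bounded.
function teeToFile(source, path) {
  const downstream = new PassThrough();
  source.pipe(fs.createWriteStream(path)); // full copy on disk (several GB)
  source.pipe(downstream);                 // same data, handed to the next pipeline step
  return downstream;
}
```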
`viewViaStakeKeys` (~5M transactions & ~700K wallets)
Extracts transactions from each block and transforms them so that inputs and outputs are directly associated with their corresponding stake keys. Indeed, since the beginning of the Shelley era, most wallets in Cardano use full delegation addresses containing both a payment part and a delegation part, but use a single stake key per wallet. Thus, by looking at stake key hashes from addresses, it is possible to track down (Shelley) wallets with quite good accuracy. This second step does exactly that, while also trimming out information that isn't useful for the simulation. This stream transformer produces chunks of line-separated JSON "transactions", one per line.
Inputs or outputs marked as `null` correspond to either Byron addresses or Shelley addresses with no stake part whatsoever.
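Purely as an illustration of that grouping idea, and assuming the upstream JSON already exposes a `stakeKeyHash` per address (the real script derives it from the address itself), such a transformer could be sketched as:

```js
const { Transform } = require('stream');

// Sketch: reduce each transaction to the stake keys behind its inputs and outputs.
// Byron addresses or Shelley addresses without a stake part end up as null.
const viewViaStakeKeys = () => new Transform({
  objectMode: true,
  transform(block, _encoding, callback) {
    for (const tx of block.transactions || []) {
      this.push(JSON.stringify({
        inputs: tx.inputs.map((i) => i.stakeKeyHash || null),
        outputs: tx.outputs.map((o) => ({ wallet: o.stakeKeyHash || null, value: o.value })),
      }) + '\n');
    }
    callback();
  },
});
```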
`createEvents` (~ XXX client events)
This step consumes the stream of transactions created by `viewViaStakeKeys` and creates Hydra Tail Simulation client events by assigning client ids to each stake key. The number of client ids is however limited and rotates: the effect is a long stream of transactions spread across a vastly smaller set of wallets / clients (the main chain has about ~700,000 wallets identified by stake keys, and this pipeline step compresses them down to ~1000). It also gets rid of unknown inputs / outputs and keeps transactions even simpler.
It generates line-separated JSON events; note that there is always a 'Pull' event added for every 'NewTx'.
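The rotation of client ids could be sketched like this (a simplified illustration of the idea, not the script's actual assignment logic):

```js
// Sketch: map ~700K stake keys onto a fixed pool of client ids (e.g. 1000).
// Keys are assigned in order of first appearance and wrap around once the pool
// is exhausted, so the whole stream of transactions ends up spread over a small
// set of simulated clients.
function makeClientIdAssigner(numberOfClients) {
  const assigned = new Map();
  let next = 0;
  return (stakeKey) => {
    if (!assigned.has(stakeKey)) {
      assigned.set(stakeKey, next % numberOfClients);
      next += 1;
    }
    return assigned.get(stakeKey);
  };
}

// e.g. const clientIdOf = makeClientIdAssigner(1000); clientIdOf(someStakeKey) // -> 0..999
```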
`lineSeparatedFile`
This final step formats events as CSV and puts one event per line in a rather compact format. Note that it also drops the transaction reference to save space, because a unique identifier can be derived from a simple counter / the line number of the corresponding event.
The final format looks like the following:
The last column, `recipients`, contains a space-separated list of recipients (or subscribers) of a particular transaction. It may contain 0, 1 or many elements. The `size`, `amount` and `recipients` columns are always empty for `pull` events.
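To make that description concrete, such a formatting step could be sketched as follows; the column order and the `new-tx` label are assumptions, not necessarily the script's actual layout:

```js
const { Transform } = require('stream');

// Sketch: one CSV line per event. The transaction reference is intentionally dropped;
// a unique identifier can be re-derived from the line counter.
const lineSeparatedFile = () => new Transform({
  objectMode: true,
  transform(event, _encoding, callback) {
    const row = event.event === 'pull'
      ? [event.slot, event.clientId, 'pull', '', '', '']  // size, amount and recipients left empty
      : [event.slot, event.clientId, 'new-tx', event.size, event.amount, event.recipients.join(' ')];
    callback(null, row.join(',') + '\n');
  },
});
```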