Hydra (simplified) Tail Simulation + Chain Analysis #16

Merged
merged 27 commits into master from Ktorz/hydra-tail-simulation
May 21, 2021

Conversation

KtorZ
Contributor

@KtorZ KtorZ commented May 18, 2021

(Simplified) Tail Protocol Simulation

See exe/tail.

The tail protocol simulation works in two steps: preparation and run.

Preparation

The preparation generates client events from a set of parameters. That is, one can create a simulation over a certain period of time, for a certain number of clients with a certain behavior. For example:

$ hydra-tail-simulation prepare \
  --number-of-clients 1000 \
  --duration 60 \
  --client-online-likelihood 50%100 \
  --client-submit-likelihood 10%100 \
  events.csv

PrepareOptions
    { numberOfClients = 1000
    , duration = SlotNo 60
    , clientOptions = ClientOptions
        { onlineLikelihood = 1 % 2
        , submitLikelihood = 1 % 10
        }
    }

The events can then be fed into the simulation for execution. Keeping both steps separate allows creating events from other sources (e.g. from a real network), while the prepare command can be used to establish a baseline on simple patterns. Note that the prepare command is fully deterministic: the same options yield exactly the same events.
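To illustrate that determinism, here is a minimal sketch of a seeded event generator. The mulberry32 PRNG, the fixed seed, and the event shape are assumptions for illustration only, not the tool's actual implementation; likelihoods are given as plain fractions (0.5 for 50%100):

```javascript
// mulberry32: a small seeded PRNG, so the same seed always yields the
// same sequence of "random" numbers (hence deterministic preparation).
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Sketch of `prepare`: for each slot and each client, draw whether the
// client comes online (emits a Pull) and, if online, whether it also
// submits a transaction (emits a NewTx).
function prepare({ numberOfClients, duration, onlineLikelihood, submitLikelihood }, seed = 42) {
  const rand = mulberry32(seed);
  const events = [];
  for (let slot = 0; slot < duration; slot++) {
    for (let clientId = 1; clientId <= numberOfClients; clientId++) {
      if (rand() < onlineLikelihood) {
        events.push({ slot, from: clientId, msg: "Pull" });
        if (rand() < submitLikelihood) {
          events.push({ slot, from: clientId, msg: "NewTx" });
        }
      }
    }
  }
  return events;
}
```

Running this twice with the same options produces byte-for-byte identical event lists, which is the property the real prepare command guarantees.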

Execution

To run a simulation, simply provide an events dataset with possibly some custom options for the server:

$ hydra-tail-simulation run \
  --slot-length 1s \
  --server-region LondonAWS \
  --server-read-capacity 102400 \
  --server-write-capacity 102400 \
  events.csv

RunOptions
    { slotLength = 1 s
    , serverOptions = ServerOptions
        { region = LondonAWS
        , readCapacity = 102400 KBits/s
        , writeCapacity = 102400 KBits/s
        }
    }
SimulationSummary
    { numberOfClients = 1000
    , numberOfEvents = 33567
    , lastSlot = SlotNo 60
    }
Analyze
    { realThroughput = 50.5261490853439
    , maxThroughput = 50.583333333333336
    , numberOfTransactions = 12491
    }

The simulation outputs two numbers: a maximum throughput and a real throughput. The real throughput is computed from the (simulated) time it took to run the simulation, whereas the maximum throughput is the best the server could achieve given the inputs (said differently, the actual traffic generated by all clients).
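The two numbers can be sketched as simple ratios. This is a hedged illustration: `realDuration` and `idealDuration` are assumed input names, not the simulator's actual fields:

```javascript
// Throughput as transactions per (simulated) second.
// realDuration:  how long the simulated run actually took, in seconds.
// idealDuration: the best-case duration given the input traffic alone.
function throughputs({ numberOfTransactions, realDuration, idealDuration }) {
  return {
    realThroughput: numberOfTransactions / realDuration,
    maxThroughput: numberOfTransactions / idealDuration,
  };
}
```

Since the real run can only be as fast as or slower than the ideal one, the real throughput is always bounded above by the maximum throughput, as in the sample output above (50.53 vs 50.58).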


Hydra Tail Simulation Scripts

This folder contains a few scripts that can be used to generate datasets to inject into the simulation. It works as a pipeline of Node.js streams using real blockchain data obtained from the Cardano mainnet.
Why Node.js? Because JavaScript and JSON are quite convenient to rapidly prototype something and transform data on the fly.

How to use

$ yarn install
$ yarn pipeline 1000 10

The first argument given to pipeline corresponds to the number of clients considered for generating events, whereas the second corresponds to the compression rate of the chain (10 means that only 1 slot out of every 10 is counted).
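Assuming the compression rate simply rescales slot numbers (a sketch; the actual pipeline step may differ), the effect on a slot can be pictured as:

```javascript
// With rate 10, ten consecutive chain slots collapse into one simulated
// slot, so the timeline is compressed by a factor of `rate`.
function compressSlot(slot, rate) {
  return Math.floor(slot / rate);
}
```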

NOTE (1): If you haven't downloaded the chain locally, you'll need to install and set up an Ogmios server to download blocks from the chain. The script assumes a local instance up and running with the default configuration.

NOTE (2): The entire Cardano chain since the beginning of Shelley spreads across ~1.2M blocks. The various intermediate representations are quite voluminous, but the final output is quite compact (it is a CSV file). On a decent CPU, it takes about 3 minutes to run the whole pipeline with a new set of parameters, assuming the blockchain has already been downloaded.

NOTE (3): The pipeline is single-core, but multiple pipelines can be run at once to help generate multiple datasets with different parameters. The output filenames are automatically generated from the script's arguments.

Steps Overview

  1. downloadChain (~1.2M blocks)

    Downloads the blockchain from a certain point (by default, from the first Shelley block onwards). It downloads the chain both into a file and into a readable stream passed to the rest of the pipeline, so that (a) the script runs in roughly constant memory, and (b) the pipeline produces data immediately.

    The file produced is rather voluminous (4.5GB+) and will contain line-separated JSON blocks like the following (formatted over multiple lines for readability):

    {
        "headerHash": "b51b1605cc27b0be3a1ab07dfcc2ceb0b0da5e8ab5d0cb944c16366edba92e83",
        "header": {
            "blockHeight": 4490515,
            "slot": 4492900,
            "prevHash": "23fd3b638e8f286978681567d52597b73f7567e18719cef2cbd66bba31303d98",
            "issuerVk": "5fddeedade2714d6db2f9e1104743d2d8d818ecddc306e176108db14caadd441",
            "issuerVrf": "axwYeh90N9B55BQwtqn8eymybovJxGco5VE6kwTyIm8=",
            "blockSize": 1053,
            "blockHash": "f8ffe66aeeac127f30b8672857c4f6b8cb29c9ed24267104619a985105e22ba0"
        },
        "body": [
            {
                "id": "79acf08126546b68d0464417af9530473b8c56c63b2a937bf6451e96e55cb96a",
                "body": {
                    "inputs": [
                        {
                            "txId": "397eb970e7980e6ac1eb17fcb26a8df162db4e101f776138d74bbd09ad1a9dee",
                            "index": 0
                        },
                        ...
                    ],
                    "outputs": [
                        {
                            "address": "addr1qx2kd28nq8ac5prwg32hhvudlwggpgfp8utlyqxu6wqgz62f79qsdmm5dsknt9ecr5w468r9ey0fxwkdrwh08ly3tu9sy0f4qd",
                            "value": 402999781127
                        },
                        ...
                    ],
                    "certificates": [],
                    "withdrawals": {},
                    "fee": 218873,
                    "timeToLive": 4500080,
                    "update": null
                },
                "metadata": {
                    "hash": null,
                    "body": null
                }
            }
        ]
    }
  2. viewViaStakeKeys (~5M transactions & ~700K wallets)

    Extracts transactions from each block and transforms them so that inputs and outputs are directly associated with their corresponding stake keys. Since the beginning of the Shelley era,
    most wallets in Cardano use full delegation addresses containing both a payment part and a delegation part, but use a single stake key per wallet. Thus, by looking at stake key hashes from
    addresses, it is possible to track down (Shelley) wallets with fairly good accuracy. This second step does exactly that, while also trimming out information that isn't useful for the simulation. This stream transformer produces chunks of line-separated JSON "transactions" which look like the following (formatted over multiple lines for readability):

    {
        "ref": "79acf08126546b68d0464417af9530473b8c56c63b2a937bf6451e96e55cb96a",
        "size": 1443,
        "inputs": [
            null,
            null,
            null,
            null,
            null
        ],
        "outputs": [
            {
                "wallet": "f79qsdmm5dsknt9ecr5w468r9ey0fxwkdrwh08ly3tu9s",
                "value": 402999781127
            },
            {
                "wallet": "f79qsdmm5dsknt9ecr5w468r9ey0fxwkdrwh08ly3tu9s",
                "value": 39825492736
            },
            {
                "wallet": "f79qsdmm5dsknt9ecr5w468r9ey0fxwkdrwh08ly3tu9s",
                "value": 1999822602
            },
            {
                "wallet": "f79qsdmm5dsknt9ecr5w468r9ey0fxwkdrwh08ly3tu9s",
                "value": 1000000
            },
            {
                "wallet": "f79qsdmm5dsknt9ecr5w468r9ey0fxwkdrwh08ly3tu9s",
                "value": 100000000000
            }
        ],
        "slot": 4492900
    }

    Inputs or outputs marked as null correspond either to Byron addresses or to Shelley addresses with no stake part whatsoever.
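    The wallet ids in the example above are, in effect, the stretch of bech32 characters encoding the address's delegation part. A hedged sketch of the extraction (a character-level approximation for standard base addresses; a real implementation would bech32-decode the address and extract the 28-byte stake key hash):

```javascript
// Map an address to its (approximate) wallet, or null when no stake
// part can be recovered.
function walletOf(address) {
  // Byron addresses don't use bech32 "addr1..."; enterprise addresses
  // (no delegation part) are shorter than full base addresses.
  if (!address.startsWith("addr1") || address.length < 103) return null;
  // ASSUMPTION: slice out the bech32 characters covering the stake
  // credential, skipping the prefix/payment part and the 6-char checksum.
  return address.slice(52, address.length - 6);
}
```

Applied to the sample output above, this yields the same 45-character wallet id shown for each output.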

  3. createEvents (~ XXX client events)

This step consumes the stream of transactions created by viewViaStakeKeys and creates Hydra Tail Simulation client events by assigning client ids to each stake key.
The number of client ids is limited, however, and rotates: the effect is a long stream of transactions spread across a vastly smaller set of wallets / clients (the
main chain has about ~700,000 wallets identified by stake keys, and this pipeline step compresses them down to ~1,000). It also gets rid of unknown inputs / outputs and
keeps transactions even simpler.

It generates line-separated JSON events such as the following (note that there's always a 'Pull' event added for every 'NewTx'):

{"slot":0,"from":986,"msg":"Pull"}
{"slot":0,"from":986,"msg":{"NewTx":{"ref":"f746a18d6a17acf111109ff9a35a8c4bd130f73697188edd2d367cea5efe98a2","size":297,"recipients":[987],"amount":1002000000}}}
{"slot":1,"from":732,"msg":"Pull"}
{"slot":1,"from":732,"msg":{"NewTx":{"ref":"9e383d78de88fed8e222480f2f24766aa919038e3d238afb40d383e3e5069675","size":297,"recipients":[733],"amount":10000000}}}
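The rotating client-id assignment can be sketched as follows. The wrap-around scheme is an assumption; the real step may assign ids differently:

```javascript
// Map ~700K stake keys down to a fixed pool of client ids. Each newly
// seen wallet takes the next id, wrapping around, so many wallets end
// up sharing each client id while a given wallet keeps a stable id.
function makeClientIdAssigner(numberOfClients) {
  const known = new Map();
  let next = 0;
  return function clientIdOf(wallet) {
    if (!known.has(wallet)) {
      known.set(wallet, next % numberOfClients);
      next += 1;
    }
    return known.get(wallet);
  };
}
```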
  4. lineSeparatedFile

This final step formats events as CSV, one event per line, in a rather compact format. Note that it also drops the transaction reference to save space, since a unique
identifier can be derived from a simple counter / the line number of the corresponding event.

The final format looks like the following:

slot  , clientId , event  , size , amount      , recipients
63025 , 28       , pull   ,      ,             ,
63025 , 28       , new-tx , 297  , 2000000     , 632
63031 , 156      , pull   ,      ,             ,
63031 , 156      , new-tx , 5212 , 1391209719  ,
63031 , 157      , pull   ,      ,             ,
63031 , 157      , new-tx , 232  , 148834411   , 158
63034 , 942      , pull   ,      ,             ,
63034 , 942      , new-tx , 320  , 23000000000 ,
63037 , 772      , pull   ,      ,             ,
63037 , 772      , new-tx , 287  , 5455000055  ,

The last column, recipients, contains a space-separated list of recipients (or subscribers) for a particular transaction. It may contain zero, one, or many elements.
The size, amount and recipients columns are always empty for pull events.
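The serialization step can be sketched from the column layout above (the helper name and the JSON event shape are taken from the earlier createEvents examples; this is an illustration, not the script's actual code):

```javascript
// One event per CSV line. Pull events leave size/amount/recipients
// empty; the transaction reference is dropped, since a counter or the
// line number can serve as a unique identifier.
function toCsvLine(event) {
  if (event.msg === "Pull") {
    return [event.slot, event.from, "pull", "", "", ""].join(",");
  }
  const { size, amount, recipients } = event.msg.NewTx;
  return [event.slot, event.from, "new-tx", size, amount, recipients.join(" ")].join(",");
}
```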

KtorZ added 21 commits May 7, 2021 15:32
  This doesn't yet impact the server behavior much because the server multiplexer isn't aware of whether a client is online or offline. So we need to make sure that the server can notice a client disconnection and act accordingly. We could perhaps model a client disconnection / reconnection with a 0-size network message?
  - Now go offline _immediately_ after waking up
  - Now have a non-zero probability of not sending any transaction when they go online
…t at the right time

  This speeds up the pipeline from ~21 hours down to ~3 minutes. I also generated new datasets with higher compression rates (1:10000 & 1:100000)
@KtorZ KtorZ requested a review from kantp May 18, 2021 13:57
@KtorZ KtorZ self-assigned this May 18, 2021
Contributor

@kantp kantp left a comment
We did a call to go through the code and review it.

Great work, thank you Matthias!

KtorZ added 4 commits May 19, 2021 10:34
…nect server handlers.

  Also renamed 'subscribers' to 'recipients' to make it a bit clearer.
  There was initially a concept of subscriptions understood from the
  draft Tail paper, but this is still a blurry concept and we ended up
  associating transactions with their recipients directly.
@KtorZ
Contributor Author

KtorZ commented May 19, 2021

Corrected a few points as discussed in this morning's review & in yesterday's call with the research team:

  • 40a0cd9
    📍 Add missing simulated lookup computations in Pull, Connect and Disconnect server handlers.
    Also renamed 'subscribers' to 'recipients' to make it a bit clearer.
    There was initially a concept of subscriptions understood from the
    draft Tail paper, but this is still a blurry concept and we ended up
    associating transactions with their recipients directly.

  • 82d9f14
    📍 Plot transaction volume in USD using conversion rates from CoinGecko

  • e2274b3
    📍 Fix off-by-one error on time calculation.

  • 35b5f7c
    📍 discard self made or byron transactions from the datasets.

  - Add a '--concurrency' option to the command-line
  - Measure actual network usage (read / write) in the simulation analysis
  - Some renaming (real -> actual) + moved numberOfTransactions from the analysis to the simulation summary
@KtorZ KtorZ merged commit 07058a9 into master May 21, 2021
@KtorZ KtorZ deleted the Ktorz/hydra-tail-simulation branch May 21, 2021 07:24