QA Process report for v0.37.x (and baseline for v0.34.x) (backport #9499) (#9578)

* QA Process report for v0.37.x (and baseline for v0.34.x) (#9499)

* 1st version. 200 nodes. Missing rotating node

* Small fixes

* Addressed @jmalicevic's comment

* Explain in the method how to set the Tendermint version to test. Improve the results section

* 1st version of how to run the 'rotating node' testnet

* Apply suggestions from @williambanfield

Co-authored-by: William Banfield <[email protected]>

* Addressed @williambanfield's comments

* Added reference to Unix load metric

* Added total TXs

* Fixed some 'png's that got swapped. Excluded '.*-node-exporter' processes from memory plots

* Report for rotating node

* Addressed remaining comments from @williambanfield

* Cosmetic

* Addressed some of @thanethomson's comments

* Re-executed the 200 node tests and updated the corresponding sections of the report

* Ignore Python virtualenv directories

Signed-off-by: Thane Thomson <[email protected]>

* Add latency vs throughput script

Signed-off-by: Thane Thomson <[email protected]>

* Add README for latency vs throughput script

Signed-off-by: Thane Thomson <[email protected]>

* Fix local links to folders

Signed-off-by: Thane Thomson <[email protected]>

* v034: only have one level-1 heading

Signed-off-by: Thane Thomson <[email protected]>

* Adjust headings

Signed-off-by: Thane Thomson <[email protected]>

* v0.37.x: add links to issues/PRs

Signed-off-by: Thane Thomson <[email protected]>

* v0.37.x: add note about bug being present in v0.34

Signed-off-by: Thane Thomson <[email protected]>

* method: adjust heading depths

Signed-off-by: Thane Thomson <[email protected]>

* Show data points on latency vs throughput plot

Signed-off-by: Thane Thomson <[email protected]>

* Add latency vs throughput plots

Signed-off-by: Thane Thomson <[email protected]>

* Correct mentioning of v0.34.21 and add heading

Signed-off-by: Thane Thomson <[email protected]>

* Refactor latency vs throughput script

Update the latency vs throughput script to rather generate plots from
the "raw" CSV output from the loadtime reporting tool as opposed to the
separated CSV files from the experimental method.

Also update the relevant documentation, and regenerate the images from
the raw CSV data (resulting in pretty much the same plots as the
previous ones).

Signed-off-by: Thane Thomson <[email protected]>

* Remove unused default duration const

Signed-off-by: Thane Thomson <[email protected]>

* Adjust experiment start time to be more accurate and re-plot latency vs throughput

Signed-off-by: Thane Thomson <[email protected]>

* Addressed @williambanfield's comments

* Apply suggestions from code review

Co-authored-by: William Banfield <[email protected]>

* Apply suggestions from code review

Co-authored-by: William Banfield <[email protected]>

* scripts: Update latency vs throughput readme for clarity

Signed-off-by: Thane Thomson <[email protected]>

Signed-off-by: Thane Thomson <[email protected]>
Co-authored-by: William Banfield <[email protected]>
Co-authored-by: Thane Thomson <[email protected]>
(cherry picked from commit b06e1cea5495dc4557d805dcc433a0f771c0fc1c)

* Remove v037 dir

* Removed reference to v0.37 testnets

Co-authored-by: Sergio Mena <[email protected]>
tnasu and sergio-mena committed Jul 20, 2023
1 parent 9353d3b commit c94737c
Showing 4 changed files with 186 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -69,3 +69,5 @@ test/fuzz/**/*.zip
*.pdf
*.gz
*.dvi
# Python virtual environments
.venv
3 changes: 3 additions & 0 deletions scripts/qa/reporting/README.md
@@ -0,0 +1,3 @@
# Reporting Scripts

Basically, see the [Tendermint v0.34 Reporting Scripts](https://github.com/tendermint/tendermint/blob/v0.34.x/scripts/qa/reporting/README.md).
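For orientation, here is a minimal usage sketch of the script added below. The column names come from the script's parsing code; the sample values and the `/tmp/report.csv` path are made up for illustration:

```shell
# Create a tiny raw CSV in the shape the loadtime report tool emits
# (column names taken from the parsing code; values are illustrative).
cat > /tmp/report.csv <<'EOF'
experiment_id,block_time,duration_ns,connections,rate
exp-1,1666000000000000000,250000000,1,200
EOF

# Then render the plot (requires the packages in requirements.txt):
# python3 latency_throughput.py -t 'QA run' out.png /tmp/report.csv
head -n 1 /tmp/report.csv
```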
170 changes: 170 additions & 0 deletions scripts/qa/reporting/latency_throughput.py
@@ -0,0 +1,170 @@
#!/usr/bin/env python3
"""
A simple script to parse the CSV output from the loadtime reporting tool (see
https://github.com/Ostracon/finschia/tree/main/test/loadtime/cmd/report).
Produces a plot of average transaction latency vs total transaction throughput
according to the number of load testing tool WebSocket connections to the
Tendermint node.
"""

import argparse
import csv
import logging
import sys
import matplotlib.pyplot as plt
import numpy as np

DEFAULT_TITLE = "Ostracon latency vs throughput"


def main():
    parser = argparse.ArgumentParser(
        description="Renders a latency vs throughput diagram "
        "for a set of transactions provided by the loadtime reporting tool",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('-t',
                        '--title',
                        default=DEFAULT_TITLE,
                        help='Plot title')
    parser.add_argument('output_image',
                        help='Output image file (in PNG format)')
    parser.add_argument(
        'input_csv_file',
        nargs='+',
        help="CSV input file from which to read transaction data "
        "- must have been generated by the loadtime reporting tool")
    args = parser.parse_args()

    logging.basicConfig(format='%(levelname)s\t%(message)s',
                        stream=sys.stdout,
                        level=logging.INFO)
    plot_latency_vs_throughput(args.input_csv_file,
                               args.output_image,
                               title=args.title)


def plot_latency_vs_throughput(input_files, output_image, title=DEFAULT_TITLE):
    avg_latencies, throughput_rates = process_input_files(input_files)

    fig, ax = plt.subplots()

    connections = sorted(avg_latencies.keys())
    for c in connections:
        tr = np.array(throughput_rates[c])
        al = np.array(avg_latencies[c])
        label = '%d connection%s' % (c, '' if c == 1 else 's')
        ax.plot(tr, al, 'o-', label=label)

    ax.set_title(title)
    ax.set_xlabel('Throughput rate (tx/s)')
    ax.set_ylabel('Average transaction latency (s)')

    plt.legend(loc='upper left')
    plt.savefig(output_image)


def process_input_files(input_files):
    # Experimental data from which we will derive the latency vs throughput
    # statistics
    experiments = {}

    for input_file in input_files:
        logging.info('Reading %s...' % input_file)

        with open(input_file, 'rt') as inf:
            reader = csv.DictReader(inf)
            for tx in reader:
                experiments = process_tx(experiments, tx)

    return compute_experiments_stats(experiments)


def process_tx(experiments, tx):
    exp_id = tx['experiment_id']
    # Block time is nanoseconds from the epoch - convert to seconds
    block_time = float(tx['block_time']) / (10**9)
    # Duration is also in nanoseconds - convert to seconds
    duration = float(tx['duration_ns']) / (10**9)
    connections = int(tx['connections'])
    rate = int(tx['rate'])

    if exp_id not in experiments:
        experiments[exp_id] = {
            'connections': connections,
            'rate': rate,
            'block_time_min': block_time,
            # We keep track of the latency associated with the minimum block
            # time to estimate the start time of the experiment
            'block_time_min_duration': duration,
            'block_time_max': block_time,
            'total_latencies': duration,
            'tx_count': 1,
        }
        logging.info('Found experiment %s with rate=%d, connections=%d' %
                     (exp_id, rate, connections))
    else:
        # Validation
        for field in ['connections', 'rate']:
            val = int(tx[field])
            if val != experiments[exp_id][field]:
                raise Exception(
                    'Found multiple distinct values for field '
                    '"%s" for the same experiment (%s): %d and %d' %
                    (field, exp_id, val, experiments[exp_id][field]))

        if block_time < experiments[exp_id]['block_time_min']:
            experiments[exp_id]['block_time_min'] = block_time
            experiments[exp_id]['block_time_min_duration'] = duration
        if block_time > experiments[exp_id]['block_time_max']:
            experiments[exp_id]['block_time_max'] = block_time

        experiments[exp_id]['total_latencies'] += duration
        experiments[exp_id]['tx_count'] += 1

    return experiments


def compute_experiments_stats(experiments):
    """Compute average latency vs throughput rate statistics from the given
    experiments"""
    stats = {}

    # Compute average latency and throughput rate for each experiment
    for exp_id, exp in experiments.items():
        conns = exp['connections']
        avg_latency = exp['total_latencies'] / exp['tx_count']
        exp_start_time = exp['block_time_min'] - exp['block_time_min_duration']
        exp_duration = exp['block_time_max'] - exp_start_time
        throughput_rate = exp['tx_count'] / exp_duration
        if conns not in stats:
            stats[conns] = []

        stats[conns].append({
            'avg_latency': avg_latency,
            'throughput_rate': throughput_rate,
        })

    # Sort stats for each number of connections in order of increasing
    # throughput rate, and then extract average latencies and throughput rates
    # as separate data series.
    conns = sorted(stats.keys())
    avg_latencies = {}
    throughput_rates = {}
    for c in conns:
        stats[c] = sorted(stats[c], key=lambda s: s['throughput_rate'])
        avg_latencies[c] = []
        throughput_rates[c] = []
        for s in stats[c]:
            avg_latencies[c].append(s['avg_latency'])
            throughput_rates[c].append(s['throughput_rate'])
            logging.info('For %d connection(s): '
                         'throughput rate = %.6f tx/s\t'
                         'average latency = %.6fs' %
                         (c, s['throughput_rate'], s['avg_latency']))

    return (avg_latencies, throughput_rates)


if __name__ == "__main__":
    main()
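As a quick sanity check of the statistics computed above, here is a minimal, self-contained sketch of how average latency and throughput fall out of the per-transaction rows. The field names mirror the parsing code; the three transactions are synthetic values invented for illustration:

```python
NS = 10**9  # nanoseconds per second, as in the script's conversions

# Three synthetic transactions from one experiment: block_time and
# duration_ns are in nanoseconds, as in the raw loadtime report CSV.
rows = [
    {'block_time': 5 * NS, 'duration_ns': 1 * NS},
    {'block_time': 7 * NS, 'duration_ns': 2 * NS},
    {'block_time': 9 * NS, 'duration_ns': 1 * NS},
]

block_times = [r['block_time'] / NS for r in rows]
durations = [r['duration_ns'] / NS for r in rows]

avg_latency = sum(durations) / len(durations)  # 4/3 s on this data

# The experiment start time is estimated by subtracting the latency of
# the earliest-committed tx from the minimum block time, matching the
# block_time_min / block_time_min_duration bookkeeping above.
i_min = block_times.index(min(block_times))
start = min(block_times) - durations[i_min]          # 5 - 1 = 4 s
throughput = len(rows) / (max(block_times) - start)  # 3 tx / 5 s = 0.6 tx/s

print(avg_latency, throughput)
```

On this toy data the window runs from 4 s to 9 s, so three transactions yield 0.6 tx/s at an average latency of about 1.33 s.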
11 changes: 11 additions & 0 deletions scripts/qa/reporting/requirements.txt
@@ -0,0 +1,11 @@
contourpy==1.0.5
cycler==0.11.0
fonttools==4.37.4
kiwisolver==1.4.4
matplotlib==3.6.1
numpy==1.23.4
packaging==21.3
Pillow==9.2.0
pyparsing==3.0.9
python-dateutil==2.8.2
six==1.16.0
