-
Notifications
You must be signed in to change notification settings - Fork 130
Conversation
* add blockchain_height gauge * add blockchain_difficulty_total gauge * add blockchain_announcedBlock_ingest histogram This involved some deep pluming such that the metrics system needs to be created in the PantheonCommand, along with trickle down effects into other consensus engines. This is likely where it should live anyway. Testing necessitated turning the PrometheusMetricsSystem::init method into a singleton accessed via PrometheusMetricsSystem::instance.
I'm all ears for different names for the metrics. We may want to see how they evolve and go on a renaming binge at some point. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only real concern is making PrometheusMetricsSystem
a singleton as that breaks isolation of Pantheon instances running in the same JVM. The description indicated that was required for testing, but I don't see any usages of it outside of production code so not sure why it was needed. What did I miss?
MetricCategory.BLOCKCHAIN, | ||
"difficulty_total", | ||
"Total difficulty of the chainhead", | ||
() -> (double) this.getChainHead().getTotalDifficulty().toLong()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cast to double is proving to be an annoyingly common pattern. Should we introduce a LongSupplier
version of gauges? Even if it just did the cast internally it would tidy code up. Also, in hind-sight I should have used DoubleSupplier
. Might be a separate PR...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because I also see the potential of a UInt256Supplier.
metricsSystem.createTimer( | ||
MetricCategory.BLOCKCHAIN, | ||
"pantheon_blockchain_announcedBlock_ingest", | ||
"Time to ingest a single announced block"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it meaningful to track importing an announced block separately to just importing a block? ie should we just time how long BlockImporter
takes as that will capture all block imports? I could see it being useful to track how long it takes from first hearing about a block to it being on-chain, but this metric doesn't quite capture that since it only starts the timer after the new block has been retrieved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was timing the amount of time to process the block. Its separate code paths IIRc between download and announced blocks, and different tasks (download I see being split into at least three non-adjacent tasks : headers, bodies, receipts).
I could push this down into the persistBlock task, but I was hoping to have the number for the logging statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the timing in the log message is probably justification enough for this. I think at some point having a timer inside BlockImporter
itself which will capture every block imported makes sense, but there are going to be a number of different timings related to block import (e.g. timing import of each batch of blocks when synchronizing).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pluming it into tasks is a much larger PR. I'll leave it here for now, but it may move.
LOG.info( | ||
"Successfully imported announced block {} ({}).", | ||
"Successfully imported announced block {} ({}) in {}ms.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Include the time in the log message is a nice touch.
@@ -48,9 +49,16 @@ | |||
private static final String PANTHEON_PREFIX = "pantheon_"; | |||
private final Map<MetricCategory, Collection<Collector>> collectors = new ConcurrentHashMap<>(); | |||
|
|||
private static final Supplier<MetricsSystem> INSTANCE = | |||
Suppliers.memoize(PrometheusMetricsSystem::init); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is wrong as it makes the metrics system JVM-wide whereas we want to be able to run two separate Pantheon instances in the same JVM and have them entirely isolated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My Prometheus Server code introduced the test breakage. So this is reverted.
@@ -422,6 +426,10 @@ public void run() { | |||
Configurator.setAllLevels("", logLevel); | |||
} | |||
|
|||
if (metricsSystem == null) { | |||
metricsSystem = PrometheusMetricsSystem.instance(); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would metricsSystem
not be null here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was an early attempt to address my server test breakage.
# Conflicts: # acceptance-tests/src/test/java/tech/pegasys/pantheon/tests/acceptance/dsl/node/ThreadPantheonNodeRunner.java # pantheon/src/main/java/tech/pegasys/pantheon/RunnerBuilder.java # pantheon/src/main/java/tech/pegasys/pantheon/cli/PantheonCommand.java # pantheon/src/main/java/tech/pegasys/pantheon/cli/PantheonControllerBuilder.java # pantheon/src/test/java/tech/pegasys/pantheon/RunnerTest.java # pantheon/src/test/java/tech/pegasys/pantheon/cli/CommandTestAbstract.java # pantheon/src/test/java/tech/pegasys/pantheon/cli/PantheonCommandTest.java
@@ -95,6 +95,7 @@ public void initMocks() throws Exception { | |||
when(mockRunnerBuilder.webSocketConfiguration(any())).thenReturn(mockRunnerBuilder); | |||
when(mockRunnerBuilder.dataDir(any())).thenReturn(mockRunnerBuilder); | |||
when(mockRunnerBuilder.bannedNodeIds(any())).thenReturn(mockRunnerBuilder); | |||
when(mockRunnerBuilder.metricsSystem(any())).thenReturn(mockRunnerBuilder); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a whole lot more lines to add a new object to the builder before the builder remodel.
# Conflicts: # pantheon/src/main/java/tech/pegasys/pantheon/RunnerBuilder.java # pantheon/src/main/java/tech/pegasys/pantheon/cli/PantheonCommand.java
…in other commands.
PR description
This involved some deep pluming such that the metrics system needs to be
created in the PantheonCommand, along with trickle down effects into other
consensus engines. This is likely where it should live anyway.
Testing necessitated turning the PrometheusMetricsSystem::init method
into a singleton accessed via PrometheusMetricsSystem::instance.