Add new inventory monitor #4665

chimp1984 · 2020-10-17T16:37:05Z

New inventory monitor.

@jmacxx and I will work on it, and we will use this issue to define the spec.
Current URL: http://46.101.179.224/
Source: https://github.com/chimp1984/bisq/tree/add-InventoryMonitor-module

High level concept:

Instead of requesting the data and then checking whether all seeds deliver the expected data, we add a new request message, and the seeds tell us how many objects per data type they have (as well as some other data). This reduces the load from 8 MB (we had to exclude the largest data, as it would have been much more otherwise) to a few kB. It also does not require the monitor to run the full Bisq code base, only a Tor node, and the monitor only needs to understand the messages, which do not contain domain-specific dependencies. So it is very lightweight for both the monitor and the seeds.

Goal:

The goal is a reliable monitor that can be used to alert operators if a seed is not in the state it should be.
To achieve that, we try to be lightweight and keep things as simple as possible. Flexibility to add new metric types is a goal as well. The UI should provide a quick overview, so that one can see at a glance whether all is ok or whether there are issues with any seed.

With the https://monitor.bisq.network/ project that goal was never met, as the false positive rate and the instability of the monitor made it impossible to use for that purpose.

This project does not aim to compete with the feature richness and sophisticated UI of https://monitor.bisq.network. It is intended for devs and operators, not for users. It is public, so users can see it as well, but it is not a goal to make it user-friendly for people who are not familiar with the context.

Current state:

Currently we write HTML data and serve it via a simple HTTP server inside the monitor app (in Java). In parallel, we write JSON data for each response. The data is a hash map of string keys and string values, to stay flexible for future changes and updates. Type conversion from string to integer or long needs to be done per key type. Flexibility is preferred over type safety here.

Example json:

{
  "requestStartTime": 1602948574936,
  "responseTime": 1602948576319,
  "inventory": {
    "blindVoteHash": "3f3b46ecd254e6d3739f8ef76ca1b2e5db92dc19",
    "BlindVotePayload": "316",
    "proposalHash": "dad7456b93944c10f93325b1f78817a92d579ee9",
    "usedMemory": "890",
    "sentMessagesPerSec": "2.24",
    "TempProposalPayload": "66",
    "numConnections": "30",
    "MailboxStoragePayload": "585",
    "AccountAgeWitness": "64253",
    "jvmStartTime": "1602932539817",
    "TradeStatistics3": "76325",
    "receivedMessagesPerSec": "12.86",
    "numBsqBlocks": "81436",
    "daoStateHash": "3fbc3417575aa125c191d69d4ee00b25910d44a2",
    "RefundAgent": "1",
    "Filter": "2",
    "sentData": "639.861 MB",
    "ProposalPayload": "514",
    "receivedData": "939.049 MB",
    "Mediator": "3",
    "Alert": "1",
    "OfferPayload": "437",
    "SignedWitness": "4588",
    "daoStateChainHeight": "653182"
  }
}
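
The per-key type conversion mentioned under "Current state" could look roughly like the sketch below. It is only an illustration for a consumer of the JSON files (not the Java monitor code): the key groupings, the function names and the assumption that data sizes always use the "MB" unit are mine, based only on the example above.

```python
from typing import Dict, Union

# Assumed key groups, derived from the example response only.
FLOAT_KEYS = {"sentMessagesPerSec", "receivedMessagesPerSec"}
DATA_SIZE_KEYS = {"sentData", "receivedData"}

Value = Union[int, float, str]


def parse_data_size_mb(value: str) -> float:
    """Parse a string like '639.861 MB' into a float number of MB."""
    number, unit = value.split()
    if unit != "MB":
        # Other units are not shown in the example, so we refuse to guess.
        raise ValueError(f"unexpected unit: {unit}")
    return float(number)


def convert_inventory(inventory: Dict[str, str]) -> Dict[str, Value]:
    """Convert the string values of one inventory map per key type."""
    converted: Dict[str, Value] = {}
    for key, value in inventory.items():
        if key.endswith("Hash"):
            converted[key] = value  # hashes stay as hex strings
        elif key in FLOAT_KEYS:
            converted[key] = float(value)
        elif key in DATA_SIZE_KEYS:
            converted[key] = parse_data_size_mb(value)
        else:
            try:
                converted[key] = int(value)
            except ValueError:
                converted[key] = value  # unknown future key, keep the string
    return converted
```

Keeping unknown keys as strings preserves the flexibility the string map is meant to provide: a seed can add a new key without breaking older readers.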

Currently there are 7 seed nodes updated to provide those data and we request every 5 minutes.

Open tasks:

Rethink the file name strategy for JSON files. JSON files are currently written with the timestamp in ms as the file name. A better approach for dealing with historical data would be to use a global persisted counter as the file name.
Add checks for data deviations (take the average of all seeds per request and compare how far each individual seed is from that average; maybe use past requests as well for certain data?). Apply a warning/alert level.
Add notification via Keybase for alerts. First use a new custom channel to avoid spamming ops while developing. Once the false positive rate is low enough, point it to ops.
Add a web app that reads the JSON data and displays recent request results.
Add a sub view (on top) with condensed warnings/alerts info. It should be empty most of the time (e.g. "all seeds are ok").
Add support for displaying historical data. Showing the warnings/alerts summary seems to be the most important part.
Add a button to zoom into a request cycle to see detailed data.
Remove the HTTP server from the Java app once it is no longer needed.
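
To make the first open task more concrete, here is a minimal sketch of the persisted-counter file naming. The file name `counter.txt`, the directory layout and the function names are hypothetical; nothing like this exists in the branch yet.

```python
import json
from pathlib import Path


def next_request_id(counter_file: Path) -> int:
    """Read the persisted counter, increment it and write it back."""
    current = int(counter_file.read_text()) if counter_file.exists() else 0
    next_id = current + 1
    counter_file.write_text(str(next_id))
    return next_id


def write_response(data_dir: Path, response: dict) -> Path:
    """Write one response as '<counter>.json' instead of '<timestamp>.json'."""
    data_dir.mkdir(parents=True, exist_ok=True)
    request_id = next_request_id(data_dir / "counter.txt")  # hypothetical name
    target = data_dir / f"{request_id}.json"
    target.write_text(json.dumps(response, indent=2))
    return target
```

With sequential names a reader can walk backwards from the latest counter value to get the last N responses without listing and sorting the directory. The sketch is not safe for concurrent writers; it assumes a single monitor process.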

Priorities per data types

Prio 1:

blindVoteHash and proposalHash need to be the same for all seed nodes at the blocks when those get set (I need to look up when that is, and it would be good to add those blocks to the hash map).
daoStateHash needs to be the same for all seeds at the same block. It changes with each block.
If any of those values do not match, it is a severe failure and the op needs to be alerted.
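
As a rough sketch of the daoStateHash rule: group the seeds by block height and flag any height for which more than one hash is reported. This assumes that daoStateChainHeight is the block the hash refers to, and that the responses are available as a map from seed address to inventory map; the function name is made up. blindVoteHash and proposalHash are left out because the blocks at which they are set are not in the data yet.

```python
from collections import defaultdict
from typing import Dict, List


def find_dao_state_hash_conflicts(
    inventories: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, List[str]]]:
    """Return {height: {hash: [seeds]}} for heights with conflicting hashes."""
    seeds_by_height_and_hash = defaultdict(lambda: defaultdict(list))
    for seed, inventory in inventories.items():
        height = inventory["daoStateChainHeight"]
        dao_hash = inventory["daoStateHash"]
        seeds_by_height_and_hash[height][dao_hash].append(seed)
    # More than one distinct hash at the same height is a severe failure.
    return {
        height: dict(by_hash)
        for height, by_hash in seeds_by_height_and_hash.items()
        if len(by_hash) > 1
    }
```

Seeds that are one block behind are not compared against seeds at the newer height, so a short lag does not produce a false hash alert.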

The deviation of numBsqBlocks and daoStateChainHeight must be in a low range. It is super rare that more than 3 blocks are created in a very short time, so I would suggest that a deviation of > 3 blocks is an alert. It could still be a valid case, but a lookup in block explorers will resolve that for ops.
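
The block deviation check could be sketched as follows. One assumption that is not in the text above: the deviation is measured against the highest value reported by any seed in the same request cycle. The function name and the input shape (seed address to inventory map) are illustrative only.

```python
from typing import Dict


def lagging_seeds(
    inventories: Dict[str, Dict[str, str]],
    key: str = "daoStateChainHeight",
    max_lag: int = 3,
) -> Dict[str, int]:
    """Return {seed: lag} for seeds more than max_lag blocks behind the best seed."""
    heights = {seed: int(inventory[key]) for seed, inventory in inventories.items()}
    best = max(heights.values())
    return {
        seed: best - height
        for seed, height in heights.items()
        if best - height > max_lag
    }
```

The same function can be called with `key="numBsqBlocks"` for the second value.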

Prio 2:

Mediator and RefundAgent must not be 0 (though RefundAgent theoretically could be). If either is 0, it is a severe error.

Prio 3:

Mediator and RefundAgent should be the same on all seeds most of the time. Only when a mediator revokes or gets added might there be a difference, as some seeds might get it earlier than others. Those events are very rare.
The same is true for Filter and Alert. They should be the same most of the time; only when new ones get published is a deviation expected, and even then it is rather rare.
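
Prio 2 and Prio 3 together could be checked with something like the sketch below. The severity labels ("severe", "warning"), the function name and the tuple layout of a finding are placeholders, not an agreed format.

```python
from typing import Dict, List, Optional, Tuple

Finding = Tuple[str, Optional[str], str]  # (level, seed or None, message)


def check_agents_and_filters(inventories: Dict[str, Dict[str, str]]) -> List[Finding]:
    findings: List[Finding] = []
    # Prio 2: a seed without any mediator or refund agent is a severe error.
    for key in ("Mediator", "RefundAgent"):
        for seed, inventory in inventories.items():
            if int(inventory[key]) == 0:
                findings.append(("severe", seed, f"{key} is 0"))
    # Prio 3: these counts should be identical across seeds most of the time.
    for key in ("Mediator", "RefundAgent", "Filter", "Alert"):
        values = {inventory[key] for inventory in inventories.values()}
        if len(values) > 1:
            findings.append(
                ("warning", None, f"{key} differs between seeds: {sorted(values)}")
            )
    return findings
```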

Prio 4:

ProposalPayload, BlindVotePayload and TempProposalPayload should be the same most of the time. Here it is a bit more complex, as after a certain block those data can no longer be added in a way that is valid for the DAO, though technically they can still be added. I would suggest accepting a low level of deviation (e.g. 10%), but the UI should show some color if the data is not the same, as identical data is the 95% case.

Prio 5:

SignedWitness, AccountAgeWitness, MailboxStoragePayload, TradeStatistics3: Those get added all the time, but at a low pace. Deviations of < 10% are normal. > 30% should be considered an error.

Prio 6:

OfferPayload gets added and removed all the time. If a big market maker goes online/offline, it is expected that 100 offers or more are different. We have about 300-550 offers. As far as I have observed, it is rare that the deviation is > 100. I would suggest that a deviation < 10% is normal. 10 - 30% is a light warning but can still be a valid case. 30 - 50% should get a severe warning but still no alert. > 50% should send an alert to the op.
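
The suggested OfferPayload tiers could be expressed like this. The sketch measures each seed against the average of all seeds in one request cycle, as proposed in the open tasks; the level names and the function name are placeholders.

```python
from statistics import mean
from typing import Dict


def offer_deviation_levels(
    inventories: Dict[str, Dict[str, str]], key: str = "OfferPayload"
) -> Dict[str, str]:
    """Classify each seed by its percentage deviation from the average count."""
    counts = {seed: int(inventory[key]) for seed, inventory in inventories.items()}
    average = mean(list(counts.values()))
    levels = {}
    for seed, count in counts.items():
        deviation = abs(count - average) / average * 100 if average else 0.0
        if deviation < 10:
            levels[seed] = "ok"
        elif deviation < 30:
            levels[seed] = "light warning"
        elif deviation <= 50:
            levels[seed] = "severe warning"
        else:
            levels[seed] = "alert"
    return levels
```

Note that a single outlier pulls the average towards itself, so healthy seeds can end up with a light warning as well. Whether the median would be a better reference is something to decide once we have more data.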

Others:

jvmStartTime: seeds restart once a day. If the time since jvmStartTime is > 1 day and 2 hours, send an alert.
usedMemory: So far 500 MB - 1 GB seems to be normal. If > 1 GB, send a warning to the op.
numConnections: depends on the maxConnections set by the op (we should probably add the maxConnections param; currently they use 30, but they could use different values per seed). If numConnections > 2 x maxConnections, send an alert.
sentMessagesPerSec, receivedMessagesPerSec, sentData, receivedData: Let's observe normal values for a while and then add alerts if the deviation gets larger than usual. Also, recent changes in the P2P network should lower receivedMessagesPerSec until it is as low as sentMessagesPerSec; it might take a while until most users have updated.
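
A sketch of the three concrete rules above (jvmStartTime, usedMemory, numConnections). It assumes that usedMemory is reported in MB, that 1 GB means 1024 MB, and that maxConnections is passed in by the caller because it is not part of the response yet. The function name and the finding format are made up.

```python
from typing import Dict, List, Tuple

MAX_UPTIME_MS = (24 + 2) * 60 * 60 * 1000  # 1 day and 2 hours
MAX_MEMORY_MB = 1024  # assumed: usedMemory is in MB


def check_runtime_stats(
    inventory: Dict[str, str], now_ms: int, max_connections: int = 30
) -> List[Tuple[str, str]]:
    """Return (level, message) findings for one seed's inventory map."""
    findings: List[Tuple[str, str]] = []
    uptime_ms = now_ms - int(inventory["jvmStartTime"])
    if uptime_ms > MAX_UPTIME_MS:
        findings.append(("alert", "no restart for more than 1 day and 2 hours"))
    if int(inventory["usedMemory"]) > MAX_MEMORY_MB:
        findings.append(("warning", "usedMemory is above 1 GB"))
    if int(inventory["numConnections"]) > 2 * max_connections:
        findings.append(("alert", "numConnections is above 2 x maxConnections"))
    return findings
```

For `now_ms` the responseTime of the same response can be used, so the check does not depend on the clock of whoever reads the JSON files later.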


chimp1984 · 2020-10-17T19:14:30Z

@wiz @ripcurlx Is there budget for @jmacxx's work? I don't know the estimated costs yet, but they should stay in a rather low range, as we want to keep everything as simple as possible. I will delay all my compensation requests until Bisq is more profitable, so from my side there is no budget needed.

@jmacxx Could you add a very rough estimate for your efforts?

wiz · 2020-10-17T21:45:39Z

Yeah this sounds like ops budget, do you have any cost estimate? Also I think this GitHub issue should be moved to the projects repo

chimp1984 · 2020-10-18T03:34:17Z

Moved to bisq-network/projects#45

chimp1984 changed the title ~~Inventory monitor spec~~ Add new inventory monitor Oct 17, 2020

chimp1984 mentioned this issue Oct 18, 2020

Provide a reliable lightweight monitor with notifications bisq-network/projects#45

Closed

chimp1984 closed this as completed Oct 18, 2020
