Rippled seemingly not aware of its own state, reports incorrectly #2428
Comments
ouch |
What version of rippled are you running? Might help them with debugging |
Good point! 0.90.0 |
This is on a stock installation of Ubuntu 16.04. Only nodejs and rippled are on this box - rippled was installed using alien (as per the guide). I've located the debug.log from a couple of minutes before and after this occurred:
The "missing node" hashes are, respectively, 37142507 (the ledger in question above), and 37142528 (21 ledgers later). |
I'm not sure if this will help, but it might close the gaps in your reported state: |
That's very useful, thanks. I have since rebooted with online_delete disabled. It seemed to forget all ledgers and start acquiring them over again from that point, which struck me as odd. |
@professorhantzen have you experienced this issue since restarting? |
@bachase No. I haven't done extensive testing, but it seems to be working as expected now. |
I just grep'd debug.log for ERR. It turned up that a few hours later the second missing node error happened again:
I also specifically queried the full contents of that ledger and it seemed to work fine. |
@professorhantzen thank you for submitting the issue. The short answer is complete ledgers in server_info is only an estimate and does not accurately represent what is stored in the database. I have answered your questions below in more detail.
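For anyone who wants to check a specific ledger index against the reported value, here is a minimal sketch. It assumes `complete_ledgers` is a comma-separated list of `first-last` ranges (or the string `empty` when the server has none). The function names and the sample value are hypothetical. As noted above, a match only means the server claims the ledger, not that the data is actually in the node store.

```python
def parse_complete_ledgers(value):
    """Parse a range string such as "100-200,205,300-400" into (first, last) tuples."""
    if value.strip() in ("", "empty"):
        return []
    ranges = []
    for part in value.split(","):
        first, _, last = part.strip().partition("-")
        # A single index like "205" has no "-", so last is "" and we reuse first.
        ranges.append((int(first), int(last or first)))
    return ranges


def claims_ledger(ranges, seq):
    """True if the reported ranges include ledger index seq."""
    return any(first <= seq <= last for first, last in ranges)


# Hypothetical value for illustration only, not taken from the server in this issue.
reported = "37100000-37142506,37142508-37142600"
ranges = parse_complete_ledgers(reported)
print(claims_ledger(ranges, 37142507))
print(claims_ledger(ranges, 37142528))
```

To confirm that a ledger is really available, it is still necessary to request it, as was done above by querying the full contents of the ledger.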
|
@sublimator Excellent point, thank you. |
Welcome Micky :)
|
@miguelportilla Thanks for your great explanation. How is "complete_ledgers" populated? If all of the mechanisms by which a ledger could be added to or deleted from the db are defined, shouldn't it be possible in principle for "complete_ledgers" to remain accurate without affecting performance? If "complete_ledgers" were only updated at the point of each ledger addition or deletion, there should be no need for an iterative scan. Or is it limited by how asynchrony is handled? |
The challenge is server startup. Consider a server that has full history. It can't serve queries until it has confirmed that it has tens of millions of ledgers. But it's not practical to check every entry in every ledger. The server compromises: it checks the SQLite databases and assumes that the back end contains a ledger if the SQLite database says it should. If the back end database (the one that holds the actual ledger nodes) is corrupt or was deleted (without deleting the SQLite databases), then the server will be lied to. Note that the server will not update its complete ledgers unless it has some reason to. Say your server happens to have ledgers from 10,000,000 up in its SQLite databases. It will detect this on startup and report that in complete ledgers. But if you deleted your node back end database, then it won't actually have those ledgers. When it discovers that it doesn't have them, it will update its tracking of what ledgers are complete. But it won't update complete ledgers to add them even if it happens to retrieve them, because it wasn't asked to. During operation, your server acquires all ledgers that it sees on the network and updates its tracking. But if it's only configured to have, say, 100,000 ledgers, it won't add them to its tracking of complete ledgers later during the run. The discovery of ledgers we "happen to have" only occurs on startup. |
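The startup behaviour described above can be illustrated with a toy model. This is only a sketch of the idea, not rippled's code: `ToyServer`, `index_says` and `backend_has` are hypothetical names, with a Python set standing in for the SQLite index and another for the node back end.

```python
class ToyServer:
    """Toy model: trust the index at startup, correct the tracking lazily."""

    def __init__(self, index_says, backend_has):
        # Startup: believe the index (standing in for the SQLite databases)
        # without verifying that the back end really holds each ledger.
        self.backend = set(backend_has)
        self.complete = set(index_says)

    def fetch(self, seq):
        """Try to read a ledger; stop claiming it if it is not really there."""
        if seq in self.backend:
            return True
        self.complete.discard(seq)  # discovered missing: update the tracking
        return False


# The index claims 100..109, but the back end lost most of them.
server = ToyServer(index_says=range(100, 110), backend_has=[100, 101, 102, 104])
print(103 in server.complete)  # still claimed right after startup
server.fetch(103)              # the first real access reveals the gap
print(103 in server.complete)  # no longer claimed
```

In this model the reported set is only as good as the index until something actually tries to read the ledger, which matches the "missing node" errors appearing some time after startup.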
good as it gets ?
|
Taking off my rose-coloured glasses, the present situation looks like this: every rippled node is restricted to having only 0.2% knowledge (1/500th) of which ledgers it has available. The minimum reason for this is that one rippled node may have one missing ledger. If it might further the discussion, would something like the following work (apologies if my naive view of rippled limits its usefulness):
In all cases, the goal is that if a bit is set to on in complete_ledgers, this ledger can reliably be expected to be available on disk, and as the value is normally updated during ordinary ledger close events or backfill retrieval, this shouldn't slow things down. Users could be advised if they want their server to run at optimum, they should not "manually delete" database files to free up space, and instead a delete_ledgers RPC command could be provided for the purpose (which would update complete_ledgers as it goes). |
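A rough sketch of the "update only on add or delete" idea, for discussion. It is not how rippled tracks ledgers: the class and method names are hypothetical, and a plain Python set is used instead of the bit-per-ledger structure mentioned above to keep the example short.

```python
class CompleteLedgers:
    """Ledger indexes tracked purely by store/delete events, with no scanning."""

    def __init__(self):
        self._have = set()

    def on_stored(self, seq):
        # Called when a ledger close or backfill has been written to disk.
        self._have.add(seq)

    def on_deleted(self, seq):
        # Called by whatever removes a ledger (e.g. a hypothetical delete_ledgers command).
        self._have.discard(seq)

    def __str__(self):
        # Collapse consecutive indexes into "first-last" ranges for reporting.
        seqs = sorted(self._have)
        if not seqs:
            return "empty"
        parts = []
        start = prev = seqs[0]
        for seq in seqs[1:]:
            if seq != prev + 1:
                parts.append((start, prev))
                start = seq
            prev = seq
        parts.append((start, prev))
        return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in parts)


tracker = CompleteLedgers()
for seq in range(10, 16):
    tracker.on_stored(seq)
tracker.on_deleted(12)
print(tracker)
```

Each event is a constant-time update here. The open question is how such a structure would be persisted and trusted across restarts.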
The current "complete_ledgers" does what it's designed to do, which as I understand it is to provide a reasonable enough estimate such that rippled can start up and get to work quickly. This made me think the existing problem is partly due to nomenclature. The existing value is not the complete_ledgers. It's an estimate. Perhaps it would be better if "complete_ledgers" were called "complete_ledgers_estimate", or, if something that provides an accurate measure is also implemented, "complete_ledgers" could be split into two:
Thus, any agent that relies on the value - such as ripple-lib - can switch to "complete_ledgers.actual", whereas rippled could continue to operate as normal on "complete_ledgers.estimate". |
Keep it seriously ... ssssssssss
|
Well, anything more thorough than the current method has the potential to take up to 500 times longer. At present, the fastest the above server appears to do the 1/500th indexing is about 31250 ledgers/sec (so 62.5 actual ledger checks per second) - this is on a pretty decent machine with SSDs benchmarked at 2-3Gb/sec reads. At that rate it might take 20 minutes to "index" a complete set of ledgers. Multiply that by 500 to check each one and it's looking like a one-week startup time. Obviously it's fair to assume any decent solution won't take 500 times as long, but it's also fair to assume any solution will necessarily fall somewhere between the existing time and that. |
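The estimate above can be reproduced with a few lines of arithmetic. The total ledger count is an assumption: it takes the ledger index quoted earlier in the thread (37142507) as roughly the size of a full history.

```python
TOTAL_LEDGERS = 37_142_507  # assumption: approximate full-history size at the time
INDEX_RATE = 31_250         # ledgers "indexed" per second, as observed above
SAMPLE_FACTOR = 500         # one actual check per 500 ledgers

actual_checks_per_sec = INDEX_RATE / SAMPLE_FACTOR
sampled_minutes = TOTAL_LEDGERS / INDEX_RATE / 60
full_days = sampled_minutes * SAMPLE_FACTOR / 60 / 24

print(f"{actual_checks_per_sec:.1f} actual checks/sec")          # 62.5
print(f"{sampled_minutes:.1f} minutes for the sampled pass")     # 19.8
print(f"{full_days:.1f} days if every ledger were checked")      # 6.9
```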
Closing this issue - I think we had a sufficient explanation. If there are additional issues please submit a new ticket. |
Just started running a stock rippled server (other than a switch to a nudb backend and an increased node size, it's "out-of-the-box").
After running for a few hours, I did the following (all requests over local ws and with a few ledgers-worth of delay in between):
So: