Reorg handling may be broken in 0.20 #3940
Comments
It also looks like whatever state the process is in is blocking shutdown. After the first SIGTERM, it logs that it's gracefully shutting down and that you can force shutdown by pressing Ctrl-C again. A second SIGTERM usually kills the process at this point, but here it's not responding. A SIGKILL also does nothing, though I don't think SIGKILLs are currently handled. |
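The two-stage shutdown described above (first signal requests a graceful stop, second forces exit) can be sketched as follows. This is a minimal illustration in Python, not ord's implementation: `TwoStageShutdown` and `can_install_handler` are hypothetical names introduced here. The second helper shows a general POSIX fact relevant to the SIGKILL remark: a process cannot install a handler for SIGKILL, so there is nothing for an application to "handle".

```python
import signal


class TwoStageShutdown:
    """Hypothetical helper: counts shutdown requests and maps them to an action."""

    def __init__(self):
        self.requests = 0

    def handle(self, signum=None, frame=None):
        # Signature matches what signal.signal() expects from a handler.
        self.requests += 1

    @property
    def action(self):
        if self.requests == 0:
            return "run"
        if self.requests == 1:
            return "graceful"  # finish the current work, then exit
        return "force"  # second request: exit immediately


def can_install_handler(signum):
    """Return True if this process may install a handler for signum."""
    try:
        previous = signal.signal(signum, signal.SIG_DFL)
    except (OSError, ValueError):
        # The OS rejects handlers for SIGKILL (and SIGSTOP).
        return False
    if previous is not None:
        signal.signal(signum, previous)  # restore the earlier handler
    return True
```

In a real process, `handle` would be registered with `signal.signal(signal.SIGTERM, shutdown.handle)` and the main loop would poll `shutdown.action`. If the main loop is blocked and never polls, neither stage takes effect, which is one possible reading of the hang described above.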
Hey, thanks for opening this issue! Unfortunately our
|
I just deliberately put testnet ord into recovery mode and SIGKILL worked for me. Weird that it doesn't work for you. I'll try simulating reorgs on regtest next and see what happens. |
Unfortunately, I already reverted the instance back to 0.19. I'll start a new 0.20 one and try to simulate a reorg, but it may take some time. As for SIGKILL, I just tried sending one to a normally running instance and it's ignored completely 👀 Maybe this is due to it running inside a container, but that shouldn't affect the signal, as I'm sending it from a bash terminal inside the container ( |
On regtest it seems to recover without a problem.
|
Hmm, I guess there may have been something else wrong. It's just really strange that it happened across multiple instances. I'll try upgrading again and see if it happens. Will close this ticket for now and reopen if I can reproduce it and get more info. Thanks for looking into it 🙌 |
Mainnet reorg is crashing on my side as well. Could we reopen this to see if we can get to the root of the issue? 🙏 |
What block was the reorg at and do you have any log outputs? |
I have no logs, sorry, but the block was 863888. Will report back if anything comes up. |
Got the above while testing regtest:
I'm not able to consistently reproduce this error with the steps described above. ord version: 0.20.0 |
Version |
Yep, this happened to me on testnet3 as well, on 0.21.2. Now it cannot update.
|
I merged #2365 yesterday. This should fix that issue. I've been running it on https://testnet4.ordinals.com (which has very frequent reorgs) and it has been going smoothly. Please try it out and let me know if it actually fixes this. |
@raphjaph Tested this with 0.22.1 and reorg logic is still broken but in a different way.
Opened a PR to fix this: #4169 |
Good catch! That PR will solve a few issues. The other big one that has caught people out before and required a full reindex is when they use Ord in command-line mode rather than as a server, so the index is only updated when they run a command. If they happen to run a command just as a reorg is happening, they will get a save point on a reorged block with the current prod logic. Another issue it would solve is something that happened with the testnet3 attack and the constant reorgs. We had an instance of Ord running on testnet3 and there was a reorg. It rolled back the 30 or so blocks and reindexed, but immediately hit another reorg before it could create a save point, so it rolled back to the previous save point, now about 50 blocks back. Once it caught up, the same thing happened, and it eventually ended up in a state where it would roll back about 1000 blocks with the testnet3 craziness going on. This also ended up filling the volume, and we had to scale the volume to 1500GB and change our blocks indexed per run to 1 so that it could create save points reliably. |
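The growing rollback depth described above can be illustrated with a toy model. This is a sketch under stated assumptions, not ord's or redb's actual logic: it assumes each rollback restores and consumes the newest remaining save point, that no new save point is created between consecutive reorgs, and that the chain tip stays fixed. `rollback_depths` and the heights used are hypothetical.

```python
def rollback_depths(savepoints, tip, consecutive_reorgs):
    """Depth of each rollback when reorgs arrive before a new save point exists.

    savepoints: block heights at which save points were taken (toy data).
    tip: current chain height, held constant for simplicity.
    consecutive_reorgs: number of reorgs hitting before any new save point.
    """
    remaining = sorted(savepoints)
    depths = []
    for _ in range(consecutive_reorgs):
        if not remaining:
            # Nothing left to restore: in this model a full reindex is needed.
            break
        restored = remaining.pop()  # newest remaining save point
        depths.append(tip - restored)
    return depths


# Two save points, 30 and 50 blocks behind the tip, and two back-to-back reorgs.
print(rollback_depths([950, 970], tip=1000, consecutive_reorgs=2))  # [30, 50]
```

Under these assumptions each reorg that arrives before a fresh save point pushes the rollback further back, which is why indexing fewer blocks per run (so save points are created more often) helped in the scenario above.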
It would also be good to find out the cause of the volume leak. I'm not sure why the db size keeps on growing and it never recovers the space once it passes a point where it has recovered from a reorg. |
On space reclaim: in the above scenario it unfortunately never recovered from the reorg; it was stuck in an infinite loop trying to recover. Maybe it's related to #3856. I tried committing twice in the reorg logic, but it did not seem to help. |
There was a reorg on Testnet around block 2904360. Ord 0.20 successfully picked it up and executed the rollback, but the rollback has been stuck for over 2 hours where it used to be close to instant before. This is on multiple instances of the indexer, not just one, so it's not an outlier issue.
Could something have broken in a new version of redb?