Migrations permanently stuck if gitea is restarted during the migration #13513
Comments
This definitely should get fixed permanently, but I'm also open to any manual workarounds that don't involve deleting each of the 74 mirrors I just created and re-initializing all of them manually...
Just delete the repository from the admin panel and then migrate it again.
I will spend another 5 hours re-initializing all of them >.> Is there a way to kick off the migrations manually?
You could use the API?
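A minimal sketch of what scripting this through the API might look like, in Go. It assumes Gitea's `POST /api/v1/repos/migrate` endpoint with the `clone_addr`, `repo_name`, `uid`, and `mirror` body fields and token authentication; the host, token, and helper names are hypothetical, and the exact schema should be checked against the API documentation of the running instance. The request is only built here, not sent.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// migrateOptions covers a subset of the JSON body assumed for
// POST /api/v1/repos/migrate. Field names are an assumption; verify them
// against the instance's own API documentation.
type migrateOptions struct {
	CloneAddr string `json:"clone_addr"`
	RepoName  string `json:"repo_name"`
	UID       int64  `json:"uid"` // owner (user or organization) ID
	Mirror    bool   `json:"mirror"`
}

// newMigrateRequest builds, but does not send, the migration request.
func newMigrateRequest(baseURL, token string, opts migrateOptions) (*http.Request, error) {
	body, err := json.Marshal(opts)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/v1/repos/migrate", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "token "+token)
	return req, nil
}

func main() {
	// Hypothetical host, token, and repository.
	req, err := newMigrateRequest("https://gitea.example.com", "YOUR_TOKEN", migrateOptions{
		CloneAddr: "https://example.com/upstream/project.git",
		RepoName:  "project",
		UID:       1,
		Mirror:    true,
	})
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	// To actually start the migration: http.DefaultClient.Do(req)
}
```

Looping this over a list of clone URLs would avoid re-entering each mirror by hand in the web UI.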
Why is your gitea being restarted so much?
(This also leads to the question as to why the migration isn't being cancelled when the machine is restarted, and why the migration stuff isn't restartable...)
Well, it froze and prevented anyone from SSHing in and had to be force killed. 🙃 So 0 for 2 now. Why is gitea not fault tolerant is a question-as-a-response, lol.
Yes, I'm fully aware how OSS works (check my profile). The sort of silly question why I'm restarting gitea (faults happen in production...) deserved an answer in kind. Gitea is not critical for us, I'm not demanding anything, etc. Thank you for the links, I'll patiently await 1.13 then 🙂
@Qix- gitea is trying to be tolerant - just SQLite is very limited ... so if you don't use it for your ~5 repos but mirror 74+ repos and more, you really should consider moving to MySQL
@6543 Why? SQLite is very robust if used correctly. It's been around for decades and is used successfully in production (see: android) every day by billions of users. That's a weak argument. I'm not trying to debate here, I was simply reporting a bug. There's no reason, however, to insinuate that lack of fault tolerance is somehow my fault. It's a bug, it's nobody's fault, and I'm grateful for the project of course. I was simply filing a bug.
I have nothing against you, I just want to point out that SQLite easily deadlocks when it is used by multiple actors (yes we are trying to get rid of it). And thanks for bug-reporting, without them we would not be aware of many bugs 👍
@Qix- I'm sorry if you thought that #13513 (comment) was an inappropriate question. It isn't inappropriate, because repos should get deleted if the migration is cancelled because of shutdown. The deadlock explains why they weren't and is the root cause of the problems you are seeing.
I merely insinuated that a web service would be more robust if it could survive unexpected shutdowns. Gitea being force-killed put it into a corrupted state that cannot be resumed or error-corrected, which is a dist-sys problem. I'm a dist-sys architect; asking me "why are you restarting [a web service]" is like asking me "why did you make your server's power go out during a thunderstorm?". I didn't want that to happen, but it happens. A robust service would be fault tolerant of that. With a single instance running, I highly doubt this is purely SQLite's fault (there are not multiple actors here). Perhaps I'm missing implementation details, but it seems like maybe something could be improved to increase the robustness against failures. That's all I was implying. 🙂 I wasn't trying to put anyone down, but I didn't see how the question fit the bug report at all.
(@Qix- your replies are reading very aggressively - I'm sorry if mine are reading in the same way. I'm not trying to be aggressive or defensive here.) There already is code to clean up a migration if it fails or gitea is shut down during a migration - however, this relies on the db not being totally deadlocked at that point. Clearly, that is not a completely robust solution: we probably cannot rely on the connection to the db being OK at shutdown, so we need something that can look at in-progress tasks and allow them to be cancelled. It's worth noting, however, that if SQLite has gone down like this we're in serious trouble - the goroutines block until the db context is killed at hammer - by which time all git operations have to die too. The migration as a whole could and should have a context which is cancelled at shutdown, but xorm does not provide a way for us to make a db request with a specific context (AFAIK), so I don't think there is a way. (Edit: OK, it looks like this is actually possible; we just need to set the session context, which would mean propagating the context down to the models package.) Sequencing these things is not simple - and the answer is that sqlite deadlocks are IMO critical security issues to be solved as soon as possible. Now it would be helpful to provide some way of cancelling migrations - which has been discussed on a different issue and is also not simple. Tasks can run on different gitea instances, so the request to cancel a migration would have to be published somewhere and then caught by the gitea instance reading it before the migration could be cancelled. But of course that would not solve the issue you were having, as it was due to a deadlock. I hope that now you see why asking why you were stopping and starting gitea so much is relevant. If you're having to stop and start a web service constantly because of a problem with it, the bug that is forcing you to restart may be the actual cause of what you're seeing.
I'm not being aggressive, I just seem to have a different viewpoint than you about software robustness. A fault tolerant web service has the property that, in the event of a failure of any kind, it is able to error-correct and resume operations without manual intervention. There could be a new cron-job; pseudo-code:
I don't see how what I'm saying is "aggressive", I apologize if you've perceived it that way. However, I'm not going to pretend the current behavior is correct or that it's not a bug. If you're not interested in fixing it, that's fine - I can find another solution, it's not a problem. However, I wanted to let you know that this is indeed an issue and that I simply wanted to express that the two responses - "why are you restarting?" and "It's SQLite's problem" - don't make much sense to me as they do not address the fault tolerance point. If SQLite makes it easy for gitea to fail, then gitea should probably have error-correcting logic to correct any errors SQLite might cause. That's all.
I had to restart it once. I don't know where you got the idea that I was just constantly bringing it up and down. It froze the entire external sshd instance once and that was enough for it to ignore all of the migrations.
Locking as this issue has been closed and whenever a comment is made 400+ people get an email.
Description
#8812 (comment)
Same as mentioned there. Forcefully restarting gitea while a migration is happening will cause any unfinished/pending migrations to hang indefinitely. Manually running the cron tasks in the administration panel does nothing.
I just spent about 5 hours scouring the web for clone links for a bunch of dependencies we need to mirror; I would really prefer not to have to do that again.