Migrations permanently stuck if gitea is restarted during the migration #13513

Closed · 2 of 6 tasks
Qix- opened this issue Nov 11, 2020 · 19 comments

Qix- commented Nov 11, 2020

  • Gitea version (or commit ref): 1.12.5
  • Git version: 2.20.1
  • Operating system: Debian 10, used the "getting started on linux" instructions from the main site
  • Database (use [x]):
    • [ ] PostgreSQL
    • [ ] MySQL
    • [ ] MSSQL
    • [x] SQLite
  • Can you reproduce the bug at https://try.gitea.io:
    • [ ] Yes (provide example URL)
    • [x] No, cannot restart try.gitea.io manually
  • Log gist:

Description

#8812 (comment)

Same as mentioned there. Forcefully restarting gitea while a migration is happening will cause any unfinished/pending migrations to hang indefinitely. Manually running the cron tasks in the administration panel does nothing.

I just spent about 5 hours scouring the web for clone links for a bunch of dependencies we need to mirror; I would really prefer not to have to do that again.

Screenshots

[screenshot omitted]

Qix- (Author) commented Nov 11, 2020

This definitely should get fixed permanently, but I'm also open to any manual workarounds that don't involve deleting each of the 74 mirrors I just created and re-initializing all of them manually...

Qix- changed the title from "Migrations permanently stuck if gitea restarting during the migration" to "Migrations permanently stuck if gitea is restarted during the migration" Nov 11, 2020
lunny added the type/bug label Nov 11, 2020
lunny (Member) commented Nov 11, 2020

Just delete the repository from the admin panel and then migrate it again.

Qix- (Author) commented Nov 11, 2020

> don't involve deleting each of the 74 mirrors I just created and re-initializing all of them manually

I will spend another 5 hours re-initializing all of them >.> Is there a way to kick off the migrations manually?

zeripath (Contributor) commented:

You could use the API?
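
For example, re-kicking a single mirror migration through the API might look roughly like the following Go sketch (this assumes Gitea's `POST /api/v1/repos/migrate` endpoint and a personal access token; the exact request fields can vary between Gitea versions, so treat it as illustrative rather than a verified recipe):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// migrateRequest mirrors (a subset of) the JSON body accepted by
// POST /api/v1/repos/migrate. Field availability varies by Gitea version.
type migrateRequest struct {
	CloneAddr string `json:"clone_addr"`
	UID       int64  `json:"uid"` // ID of the owning user or organisation
	RepoName  string `json:"repo_name"`
	Mirror    bool   `json:"mirror"`
}

func main() {
	body, err := json.Marshal(migrateRequest{
		CloneAddr: "https://github.com/example/dependency.git", // hypothetical upstream
		UID:       1,
		RepoName:  "dependency",
		Mirror:    true,
	})
	if err != nil {
		panic(err)
	}

	req, err := http.NewRequest(http.MethodPost,
		"https://gitea.example.com/api/v1/repos/migrate", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "token <personal-access-token>")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```

Given a list of clone URLs, the same call can be looped over to re-create the 74 mirrors without clicking through the UI.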

zeripath (Contributor) commented:

Why is your Gitea being restarted so much?

zeripath (Contributor) commented:

(This also leads to the question as to why the migration isn't being cancelled when the machine is restarted, and why the migration stuff isn't restartable...)

Qix- (Author) commented Nov 11, 2020

Well, it froze and prevented anyone from SSHing in, and had to be force-killed. 🙃 So 0 for 2 now.

"Why is Gitea not fault tolerant?" is a question-as-a-response, lol.

zeripath (Contributor) commented:

OK, so you're running SQLite in production and you've hit #13271.

That was fixed by #13505 and should be fixed in 1.13 by #13507.

Bugs happen. No one is paying any of us to work on Gitea.

Qix- (Author) commented Nov 11, 2020

> Bugs happen. No one is paying any of us to work on Gitea.

Yes, I'm fully aware of how OSS works (check my profile). The somewhat silly question of why I'm restarting Gitea (faults happen in production...) deserved an answer in kind. Gitea is not critical for us, I'm not demanding anything, etc.

Thank you for the links, I'll patiently await 1.13 then 🙂

Qix- closed this as completed Nov 11, 2020
6543 (Member) commented Nov 11, 2020

@Qix- Gitea is trying to be tolerant - it's just that SQLite is very limited ... so if you aren't using it for just ~5 repos but are mirroring 74+ repos and more, you really should consider moving to MySQL.

Qix- (Author) commented Nov 11, 2020

@6543 Why? SQLite is very robust if used correctly. It's been around for decades and is used successfully in production (see: Android) every day by billions of users.

That's a weak argument. I'm not trying to debate here; I was simply reporting a bug. There's no reason, however, to insinuate that the lack of fault tolerance is somehow my fault. It's a bug, it's nobody's fault, and I'm grateful for the project, of course.

I was simply filing a bug.

6543 (Member) commented Nov 11, 2020

I have nothing against you; I just want to point out that SQLite easily deadlocks when it is used by multiple actors (yes, we are trying to get rid of it).

And thanks for bug-reporting - without it we would not be aware of many bugs 👍

zeripath (Contributor) commented:

@Qix- I'm sorry if you thought that #13513 (comment) was an inappropriate question.

It isn't inappropriate, because repos should get deleted if the migration is cancelled because of a shutdown. The deadlock explains why they weren't, and is the root cause of the problems you are seeing.

Qix- (Author) commented Nov 11, 2020

I merely insinuated that a web service would be more robust if it could survive unexpected shutdowns. Gitea being force-killed put it into a corrupted state that cannot be resumed or error-corrected, which is a dist-sys problem.

I'm a dist-sys architect; asking me "why are you restarting [a web service]" is like asking me "why did you make your server's power go out during a thunderstorm?". I didn't want that to happen, but it happens. A robust service would be fault tolerant of that.

With a single instance running, I highly doubt this is purely SQLite's fault (there are not multiple actors here). Perhaps I'm missing implementation details, but it seems like maybe something could be improved to increase the robustness against failures.

That's all I was implying. 🙂 I wasn't trying to put anyone down, but I didn't see how the question fit the bug report at all.

zeripath (Contributor) commented Nov 11, 2020

(@Qix- your replies are reading very aggressively - I'm sorry if mine are reading in the same way. I'm not trying to be aggressive or defensive here.)

There already is code to clean up a migration if it fails or Gitea is shut down during a migration - however, this relies on the db not being totally deadlocked at that point.

Clearly that is not a completely robust solution: assuming that the connection to the db was OK at shutdown is probably not something we can rely on, and instead we need something that can look at in-progress tasks and allow them to be cancelled. It's worth noting, however, that if SQLite has gone down like this we're in serious trouble - the goroutines block until the db context is killed at hammer, by which time all git operations have to die too.

The migration as a whole could and should have a context which is cancelled at shutdown, but xorm does not provide a way for us to make a db request with a specific context (AFAIK), so I don't think there is a way. <- OK, it looks like this is actually possible - you just need to set the session context - but this would mean propagating the context down to the models package.
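
As a rough illustration only, attaching a shutdown context to an xorm session so that a pending query dies when the context is cancelled could look like this (hypothetical function and model names, assuming an xorm version that exposes `Session.Context`):

```go
package models

import (
	"context"

	"xorm.io/xorm"
)

// Repository is a stand-in for Gitea's repository model (hypothetical fields).
type Repository struct {
	ID   int64
	Name string
}

// insertMigratedRepo shows how a shutdown context could be propagated down to
// the models package: once ctx is cancelled (e.g. at graceful shutdown), the
// pending statement is aborted instead of blocking on a deadlocked connection.
func insertMigratedRepo(ctx context.Context, e *xorm.Engine, repo *Repository) error {
	sess := e.NewSession()
	defer sess.Close()

	_, err := sess.Context(ctx).Insert(repo)
	return err
}
```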

Sequencing these things is not simple - and the answer is that SQLite deadlocks are, IMO, critical security issues to be solved as soon as possible.

Now, it would be helpful to provide some way of cancelling migrations - which has been discussed on a different issue and is also not simple. Tasks can run on different Gitea instances, so the request to cancel a migration would have to be published somewhere - and then picked up by the Gitea instance running the task before being cancelled. But of course that would not solve the issue you were having, as it was due to a deadlock.
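
A rough sketch of that publish-and-pick-up flow, assuming a shared task row that any instance can flag as cancelled and that only the instance actually running the migration polls (hypothetical schema and names, not Gitea's real task model):

```go
package migrations

import (
	"context"
	"time"

	"xorm.io/xorm"
)

// Task is a hypothetical shared record: any instance may set CancelRequested,
// and only the instance running the migration acts on it.
type Task struct {
	ID              int64
	CancelRequested bool
}

// watchForCancel polls the shared task row and cancels the migration's
// context once another instance has requested cancellation.
func watchForCancel(ctx context.Context, cancel context.CancelFunc, e *xorm.Engine, taskID int64) {
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			var t Task
			has, err := e.ID(taskID).Get(&t)
			if err == nil && has && t.CancelRequested {
				cancel() // stop the in-flight migration
				return
			}
		}
	}
}
```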


I hope that now you see why asking why you were stopping and starting Gitea so much is relevant. If you're having to stop and start a web service constantly because of a problem with it, the bug that is forcing you to restart may be the actual cause of what you're seeing.

Qix- (Author) commented Nov 11, 2020

I'm not being aggressive; I just seem to have a different viewpoint than you about software robustness.

A fault tolerant web service has the property that, in the event of a failure of any kind, it is able to error-correct and resume operations without manual intervention.

There could be a new cron-job; pseudo-code:

IF (number_of_migrations_running < migration_concurrency)
AND (query_number_of_unstarted_migrations > 0)
THEN
    start_migration
END
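
In Go-flavoured terms, a recovery job along those lines might look something like this (purely illustrative; the interface and names are hypothetical, not Gitea's actual internals):

```go
package cron

import (
	"context"
	"log"
)

// MigrationQueue is a hypothetical view over the migration task table.
type MigrationQueue interface {
	CountRunning() (int64, error)
	NextUnstarted() (taskID int64, ok bool, err error)
	Start(ctx context.Context, taskID int64) error
}

// resumeStalledMigrations is the pseudo-code above spelled out: whenever there
// is spare concurrency and unstarted (or orphaned) migrations exist, kick one
// off again instead of leaving it stuck forever.
func resumeStalledMigrations(ctx context.Context, q MigrationQueue, maxConcurrency int64) {
	running, err := q.CountRunning()
	if err != nil {
		log.Printf("resume migrations: %v", err)
		return
	}
	for running < maxConcurrency {
		taskID, ok, err := q.NextUnstarted()
		if err != nil || !ok {
			return
		}
		if err := q.Start(ctx, taskID); err != nil {
			log.Printf("restart migration %d: %v", taskID, err)
			return
		}
		running++
	}
}
```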

I don't see how what I'm saying is "aggressive"; I apologize if you've perceived it that way. However, I'm not going to pretend the current behavior is correct or that it's not a bug. If you're not interested in fixing it, that's fine - I can find another solution; it's not a problem. However, I wanted to let you know that this is indeed an issue, and to express that the two responses - "why are you restarting?" and "it's SQLite's problem" - don't make much sense to me, as they do not address the fault tolerance point.

If SQLite makes it easy for gitea to fail, then gitea should probably have error-correcting logic to correct any errors SQLite might cause.

That's all.

6543 (Member) commented Nov 11, 2020

@Qix- Since what you suggest is a new topic, I have created a new issue ... #13515

Keep bugs and requests separated ...

Qix- (Author) commented Nov 11, 2020

> If you're having to stop and start a web service constantly because of a problem with it

I had to restart it once. I don't know where you got the idea that I was just constantly bringing it up and down. It froze the entire external sshd instance once and that was enough for it to ignore all of the migrations.

techknowlogick (Member) commented:

Locking, as this issue has been closed and whenever a comment is made, 400+ people get an email.

go-gitea locked and limited conversation to collaborators Nov 11, 2020