Migrations permanently stuck if gitea is restarted during the migration #13513

Closed · 2 of 6 tasks
Qix- opened this issue Nov 11, 2020 · 19 comments

Qix- commented Nov 11, 2020

  • Gitea version (or commit ref): 1.12.5
  • Git version: 2.20.1
  • Operating system: Debian 10, used the "getting started on linux" instructions from the main site
  • Database (use [x]):
    • [ ] PostgreSQL
    • [ ] MySQL
    • [ ] MSSQL
    • [x] SQLite
  • Can you reproduce the bug at https://try.gitea.io:
    • [ ] Yes (provide example URL)
    • [x] No, cannot restart try.gitea.io manually
  • Log gist:

Description

#8812 (comment)

Same as mentioned there. Forcefully restarting gitea while a migration is happening will cause any unfinished/pending migrations to hang indefinitely. Manually running the cron tasks in the administration panel does nothing.

I just spent about 5 hours scouring the web for clone links for a bunch of dependencies we need to mirror; I would really prefer not to have to do that again.

Screenshots

[screenshot omitted]

Qix- (Author) commented Nov 11, 2020

This definitely should get fixed permanently, but I'm also open to any manual workarounds that don't involve deleting each of the 74 mirrors I just created and re-initializing all of them manually...

Qix- changed the title from "Migrations permanently stuck if gitea restarting during the migration" to "Migrations permanently stuck if gitea is restarted during the migration" Nov 11, 2020
lunny added the type/bug label Nov 11, 2020
lunny (Member) commented Nov 11, 2020

Just delete the repository from the admin panel and then migrate it again.

Qix- (Author) commented Nov 11, 2020

> don't involve deleting each of the 74 mirrors I just created and re-initializing all of them manually

I will spend another 5 hours re-initializing all of them >.> Is there a way to kick off the migrations manually?

zeripath (Contributor) commented:

You could use the API?
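
For example, re-kicking a single mirror migration through the API might look roughly like the following Go sketch (this assumes Gitea's `POST /api/v1/repos/migrate` endpoint and a personal access token; the exact request fields can vary between Gitea versions, so treat it as illustrative rather than a verified recipe):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// migrateRequest mirrors (a subset of) the JSON body accepted by
// POST /api/v1/repos/migrate. Field availability varies by Gitea version.
type migrateRequest struct {
	CloneAddr string `json:"clone_addr"`
	UID       int64  `json:"uid"` // ID of the owning user or organisation
	RepoName  string `json:"repo_name"`
	Mirror    bool   `json:"mirror"`
}

func main() {
	body, err := json.Marshal(migrateRequest{
		CloneAddr: "https://github.com/example/dependency.git", // hypothetical upstream
		UID:       1,
		RepoName:  "dependency",
		Mirror:    true,
	})
	if err != nil {
		panic(err)
	}

	req, err := http.NewRequest(http.MethodPost,
		"https://gitea.example.com/api/v1/repos/migrate", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "token <personal-access-token>")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```

Given a list of clone URLs, the same call can be looped over to re-create the 74 mirrors without clicking through the UI.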

zeripath (Contributor) commented:

Why is your Gitea being restarted so much?

zeripath (Contributor) commented:

(This also leads to the question as to why the migration isn't being cancelled when the machine is restarted, and why the migration stuff isn't restartable...)

Qix- (Author) commented Nov 11, 2020

Well, it froze and prevented anyone from SSHing in, and had to be force-killed. 🙃 So 0 for 2 now.

"Why is Gitea not fault tolerant?" is a question-as-a-response, lol.

zeripath (Contributor) commented:

OK, so you're running SQLite in production and you've hit #13271.

That was fixed by #13505 and should be fixed in 1.13 by #13507.

Bugs happen. No one is paying any of us to work on Gitea.

Qix- (Author) commented Nov 11, 2020

> Bugs happen. No one is paying any of us to work on Gitea.

Yes, I'm fully aware of how OSS works (check my profile). The somewhat silly question of why I'm restarting Gitea (faults happen in production...) deserved an answer in kind. Gitea is not critical for us, I'm not demanding anything, etc.

Thank you for the links, I'll patiently await 1.13 then 🙂

Qix- closed this as completed Nov 11, 2020
6543 (Member) commented Nov 11, 2020

@Qix- Gitea is trying to be tolerant - it's just that SQLite is very limited ... so if you aren't using it for just ~5 repos but are mirroring 74+ repos and more, you really should consider moving to MySQL.

Qix- (Author) commented Nov 11, 2020

@6543 Why? SQLite is very robust if used correctly. It's been around for decades and is used successfully in production (see: Android) every day by billions of users.

That's a weak argument. I'm not trying to debate here; I was simply reporting a bug. There's no reason, however, to insinuate that the lack of fault tolerance is somehow my fault. It's a bug, it's nobody's fault, and I'm grateful for the project, of course.

I was simply filing a bug.

6543 (Member) commented Nov 11, 2020

I have nothing against you; I just want to point out that SQLite easily deadlocks when it is used by multiple actors (yes, we are trying to get rid of it).

And thanks for bug-reporting - without it we would not be aware of many bugs 👍

zeripath (Contributor) commented:

@Qix- I'm sorry if you thought that #13513 (comment) was an inappropriate question.

It isn't inappropriate, because repos should get deleted if the migration is cancelled because of a shutdown. The deadlock explains why they weren't, and is the root cause of the problems you are seeing.

Qix- (Author) commented Nov 11, 2020

I merely insinuated that a web service would be more robust if it could survive unexpected shutdowns. Gitea being force-killed put it into a corrupted state that cannot be resumed or error-corrected, which is a dist-sys problem.

I'm a dist-sys architect; asking me "why are you restarting [a web service]" is like asking me "why did you make your server's power go out during a thunderstorm?". I didn't want that to happen, but it happens. A robust service would be fault tolerant of that.

With a single instance running, I highly doubt this is purely SQLite's fault (there are not multiple actors here). Perhaps I'm missing implementation details, but it seems like maybe something could be improved to increase the robustness against failures.

That's all I was implying. 🙂 I wasn't trying to put anyone down, but I didn't see how the question fit the bug report at all.

zeripath (Contributor) commented Nov 11, 2020

(@Qix- your replies are reading very aggressively - I'm sorry if mine are reading in the same way. I'm not trying to be aggressive or defensive here.)

There already is code to clean up a migration if it fails or Gitea is shut down during a migration - however, this relies on the db not being totally deadlocked at that point.

Clearly that is not a completely robust solution: assuming that the connection to the db was OK at shutdown is probably not something we can rely on, and instead we need something that can look at in-progress tasks and allow them to be cancelled. It's worth noting, however, that if SQLite has gone down like this we're in serious trouble - the goroutines block until the db context is killed at hammer, by which time all git operations have to die too.

The migration as a whole could and should have a context which is cancelled at shutdown, but xorm does not provide a way for us to make a db request with a specific context (AFAIK), so I don't think there is a way. <- OK, it looks like this is actually possible - you just need to set the session context - but this would mean propagating the context down to the models package.
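
As a rough illustration only, attaching a shutdown context to an xorm session so that a pending query dies when the context is cancelled could look like this (hypothetical function and model names, assuming an xorm version that exposes `Session.Context`):

```go
package models

import (
	"context"

	"xorm.io/xorm"
)

// Repository is a stand-in for Gitea's repository model (hypothetical fields).
type Repository struct {
	ID   int64
	Name string
}

// insertMigratedRepo shows how a shutdown context could be propagated down to
// the models package: once ctx is cancelled (e.g. at graceful shutdown), the
// pending statement is aborted instead of blocking on a deadlocked connection.
func insertMigratedRepo(ctx context.Context, e *xorm.Engine, repo *Repository) error {
	sess := e.NewSession()
	defer sess.Close()

	_, err := sess.Context(ctx).Insert(repo)
	return err
}
```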

Sequencing these things is not simple - and the answer is that SQLite deadlocks are, IMO, critical security issues to be solved as soon as possible.

Now, it would be helpful to provide some way of cancelling migrations - which has been discussed on a different issue and is also not simple. Tasks can run on different Gitea instances, so the request to cancel a migration would have to be published somewhere - and then picked up by the Gitea instance running the task before being cancelled. But of course that would not solve the issue you were having, as it was due to a deadlock.
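
A rough sketch of that publish-and-pick-up flow, assuming a shared task row that any instance can flag as cancelled and that only the instance actually running the migration polls (hypothetical schema and names, not Gitea's real task model):

```go
package migrations

import (
	"context"
	"time"

	"xorm.io/xorm"
)

// Task is a hypothetical shared record: any instance may set CancelRequested,
// and only the instance running the migration acts on it.
type Task struct {
	ID              int64
	CancelRequested bool
}

// watchForCancel polls the shared task row and cancels the migration's
// context once another instance has requested cancellation.
func watchForCancel(ctx context.Context, cancel context.CancelFunc, e *xorm.Engine, taskID int64) {
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			var t Task
			has, err := e.ID(taskID).Get(&t)
			if err == nil && has && t.CancelRequested {
				cancel() // stop the in-flight migration
				return
			}
		}
	}
}
```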


I hope that now you see why asking why you were stopping and starting Gitea so much is relevant. If you're having to stop and start a web service constantly because of a problem with it, the bug that is forcing you to restart may be the actual cause of what you're seeing.

Qix- (Author) commented Nov 11, 2020

I'm not being aggressive; I just seem to have a different viewpoint than you about software robustness.

A fault tolerant web service has the property that, in the event of a failure of any kind, it is able to error-correct and resume operations without manual intervention.

There could be a new cron-job; pseudo-code:

IF (number_of_migrations_running < migration_concurrency)
AND (query_number_of_unstarted_migrations > 0)
THEN
    start_migration
END
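
In Go-flavoured terms, a recovery job along those lines might look something like this (purely illustrative; the interface and names are hypothetical, not Gitea's actual internals):

```go
package cron

import (
	"context"
	"log"
)

// MigrationQueue is a hypothetical view over the migration task table.
type MigrationQueue interface {
	CountRunning() (int64, error)
	NextUnstarted() (taskID int64, ok bool, err error)
	Start(ctx context.Context, taskID int64) error
}

// resumeStalledMigrations is the pseudo-code above spelled out: whenever there
// is spare concurrency and unstarted (or orphaned) migrations exist, kick one
// off again instead of leaving it stuck forever.
func resumeStalledMigrations(ctx context.Context, q MigrationQueue, maxConcurrency int64) {
	running, err := q.CountRunning()
	if err != nil {
		log.Printf("resume migrations: %v", err)
		return
	}
	for running < maxConcurrency {
		taskID, ok, err := q.NextUnstarted()
		if err != nil || !ok {
			return
		}
		if err := q.Start(ctx, taskID); err != nil {
			log.Printf("restart migration %d: %v", taskID, err)
			return
		}
		running++
	}
}
```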

I don't see how what I'm saying is "aggressive"; I apologize if you've perceived it that way. However, I'm not going to pretend the current behavior is correct or that it's not a bug. If you're not interested in fixing it, that's fine - I can find another solution; it's not a problem. However, I wanted to let you know that this is indeed an issue, and to express that the two responses - "why are you restarting?" and "it's SQLite's problem" - don't make much sense to me, as they do not address the fault tolerance point.

If SQLite makes it easy for gitea to fail, then gitea should probably have error-correcting logic to correct any errors SQLite might cause.

That's all.

6543 (Member) commented Nov 11, 2020

@Qix- Since what you suggest is a new topic, I have created a new issue ... #13515

Keep bugs and requests separated ...

Qix- (Author) commented Nov 11, 2020

> If you're having to stop and start a web service constantly because of a problem with it

I had to restart it once. I don't know where you got the idea that I was just constantly bringing it up and down. It froze the entire external sshd instance once and that was enough for it to ignore all of the migrations.

techknowlogick (Member) commented:

Locking, as this issue has been closed and whenever a comment is made, 400+ people get an email.

go-gitea locked and limited conversation to collaborators Nov 11, 2020