-
Notifications
You must be signed in to change notification settings - Fork 385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(server): Multiprocess migration and db_cleanup
#4175
feat(server): Multiprocess migration and db_cleanup
#4175
Conversation
Example output of a 2-product server's startupFrom the import of a database created under 6.22 to 6.23 + this patch.
|
a69b946
to
4e9001a
Compare
interactive=False, | ||
env=environ) | ||
if init_instead_of_upgrade: | ||
LOG.info("[%s] Initialising...", endpoint) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are pretty firmly embedded in US english, certainly for "initialization".
run_migrations_offline() | ||
else: | ||
run_migrations_online() | ||
raise NotImplementedError(f"Offline '{schema}' migration is not possible!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The offline migration has been deleted. I'm not sure, what it was, but if it's removed, do we need this NotImplementedError
? I mean, will it ever be implemented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Offline migrations are an official Alembic feature (alembic -c [config file] -n [schema] upgrade/downgrade [from-revision]:[to-revision] --sql
) in which it does not actually perform migration on an actual database, but generates an SQL script which, when fed to a real database, does the migration. This is great... as long as your migrations scripts only ever touch schema (DDL) but become impossible when the migration scripts want to do something with real data.
If we had been pedantic enough in the past, the removal of this function should have happened at the first moment a schema migration script was introduced which actually queries the database.
Because we have scripts that require live (online) migration, it is impossible to implement offline migrations going forward. No, I do not think we ever used or documented this feature ourselves. However, the lack of the run_migrations_offline()
function (which contents were the automatically generated one from the very beginning of the project) does not prevent the user from exercising the --sql
flag on alembic
. This means that we have no way of actually preventing this bad workflow prior to this moment in this script.
We are in the post-truth AI era where people will eventually go and do unfathomably misguided things, after which thoughtless issues will be reported, which makes our lives harder when we have to decipher bad-quality exception traces, see below. And even if misguided for this specific project, the "industry standard" way of using schema migrations in the "web world" is that you do that with a separate command that looks somewhat or completely unlike your normal commands: either calling alembic ... upgrade
directly; or through a wrapper, such as manage.py migrate
/sqlmigrate
or flask db upgrade
. The fact that CodeChecker does this (and even enforces this, but gracefully) within the normal workflow makes us the odd one out.
Without this change, an attempt at running an offline migration results in a harder-to-understand exception somewhere deep inside the migration scripts, pointing to the location where the first actual database access would need to happen, but can not happen, because offline mode was requested. I think this exception popping out is scary because it talks about things that essentially "point inside the standard library", so to say, and the end of the exception claims that a library type lacks a very much keyworded, special function???
(Also note that some SQL statements still get emitted, before the error happens.)
PRAGMA foreign_keys=OFF;
DROP INDEX ix_reports_checker_id;
ALTER TABLE reports RENAME checker_id TO checker_id_lookup;
PRAGMA foreign_keys=ON;
ALTER TABLE reports RENAME "analyzer_name_MOVED_TO_checkers" TO analyzer_name;
ALTER TABLE reports RENAME "checker_name_MOVED_TO_checkers" TO checker_id;
ALTER TABLE reports RENAME severity_moved_to_checkers TO severity;
ALTER TABLE reports RENAME "checker_cat_UNUSED" TO checker_cat;
ALTER TABLE reports RENAME "bug_type_UNUSED" TO bug_type;
Traceback (most recent call last):
File "./venv_dev/bin/alembic", line 8, in <module>
sys.exit(main())
File "./venv_dev/lib/python3.8/site-packages/alembic/config.py", line 559, in main
CommandLine(prog=prog).main(argv=argv)
File "./venv_dev/lib/python3.8/site-packages/alembic/config.py", line 553, in main
self.run_cmd(cfg, options)
File "./venv_dev/lib/python3.8/site-packages/alembic/config.py", line 530, in run_cmd
fn(
File "./venv_dev/lib/python3.8/site-packages/alembic/command.py", line 335, in downgrade
script.run_env()
File "./venv_dev/lib/python3.8/site-packages/alembic/script/base.py", line 490, in run_env
util.load_python_file(self.dir, "env.py")
File "./venv_dev/lib/python3.8/site-packages/alembic/util/pyfiles.py", line 97, in load_python_file
module = load_module_py(module_id, path)
File "./venv_dev/lib/python3.8/site-packages/alembic/util/compat.py", line 182, in load_module_py
spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 848, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "./web/server/codechecker_server/migrations/report/env.py", line 126, in <module>
run_migrations_offline()
File "./web/server/codechecker_server/migrations/report/env.py", line 99, in run_migrations_offline
context.run_migrations()
File "<string>", line 8, in run_migrations
File "./venv_dev/lib/python3.8/site-packages/alembic/runtime/environment.py", line 813, in run_migrations
self.get_context().run_migrations(**kw)
File "./venv_dev/lib/python3.8/site-packages/alembic/runtime/migration.py", line 560, in run_migrations
step.migration_fn(**kw)
File "./web/server/codechecker_server/migrations/report/versions/c3dad71f8e6b_store_information_about_enabled_and_disabled_checkers_for_a_run.py", line 563, in downgrade
downgrade_reports()
File "./web/server/codechecker_server/migrations/report/versions/c3dad71f8e6b_store_information_about_enabled_and_disabled_checkers_for_a_run.py", line 472, in downgrade_reports
Base.prepare(conn, reflect=True)
File "./venv_dev/lib/python3.8/site-packages/sqlalchemy/ext/automap.py", line 786, in prepare
cls.metadata.reflect(
File "./venv_dev/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 4558, in reflect
with bind.connect() as conn:
AttributeError: __enter__
With this change, the exception is thrown earlier (before any SQL statement printouts could happen), the stack trace is much more trivial, and even without a stack trace, the error message is plain in telling us what is going wrong.
Traceback (most recent call last):
File "./venv_dev/bin/alembic", line 8, in <module>
sys.exit(main())
File "./venv_dev/lib/python3.8/site-packages/alembic/config.py", line 559, in main
CommandLine(prog=prog).main(argv=argv)
File "./venv_dev/lib/python3.8/site-packages/alembic/config.py", line 553, in main
self.run_cmd(cfg, options)
File "./venv_dev/lib/python3.8/site-packages/alembic/config.py", line 530, in run_cmd
fn(
File "./venv_dev/lib/python3.8/site-packages/alembic/command.py", line 335, in downgrade
script.run_env()
File "./venv_dev/lib/python3.8/site-packages/alembic/script/base.py", line 490, in run_env
util.load_python_file(self.dir, "env.py")
File "./venv_dev/lib/python3.8/site-packages/alembic/util/pyfiles.py", line 97, in load_python_file
module = load_module_py(module_id, path)
File "./venv_dev/lib/python3.8/site-packages/alembic/util/compat.py", line 182, in load_module_py
spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 848, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "./web/server/codechecker_server/migrations/report/env.py", line 60, in <module>
raise NotImplementedError(f"Offline '{schema}' migration is not possible!")
NotImplementedError: Offline 'report' migration is not possible!
This patch allows the server to run the schema migration and `db_cleanup` actions in parallel across the configured products, to speed up these operations. Previously, the migration and cleanup ran in a sequential job on all products, forfeiting the benefit of having multiple CPUs on a server. As each product is a separate database and there must not be any shared resource between products, it is safe to run each migration in a separate process, in parallel. Migrations and cleanup is prepared for scheduling in a deterministic order, `ORDER BY endpoint`. (Previously it was done by the `ROWID`.) The connection to the "config" database is released early on to prevent a timeout on the unused and not changing configuration database from crashing the server during a longer-running product migration. Added a facility to create beautiful logging output in migration scripts and for the cleanup routines. This log output will now necessarily include the product's `endpoint` identifier, as the log messages are no longer sequential.
4e9001a
to
1a0be78
Compare
@bruntib I rebased the patch, GitHub misrenders the changes from previous patch snapshot. I did the following changes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much, this is a great work and a huge improvement!
This patch allows the server to run the schema migration and
db_cleanup
actions in parallel across the configured products, to speed up these operations. Previously, the migration and cleanup ran in a sequential job on all products, forfeiting the benefit of having multiple CPUs on a server. As each product is a separate database and there must not be any shared resource between products, it is safe to run each migration in a separate process, in parallel.Migrations and cleanup is prepared for scheduling in a deterministic order,
ORDER BY endpoint
. (Previously it was done by theROWID
.)The connection to the "config" database is released early on to prevent a timeout on the unused and not changing configuration database from crashing the server during a longer-running product migration.
Added a facility to create beautiful logging output in migration scripts and for the cleanup routines. This log output will now necessarily include the product's
endpoint
identifier, as the log messages are no longer sequential.