Add V1 of database schema migrator #107
Merged
This PR introduces native database migration support to Mountaineer. This is still experimental: explore using it locally, but make sure to back up your data before applying changes.
Background
The industry-standard migration package for SQLAlchemy is Alembic. Alembic is a powerful and robust solution for migrations; I've used it almost exclusively at past companies. But it suffers from non-trivial setup complexity, and in steady state it's sometimes unclear which migration responsibilities Alembic owns versus what should be delegated to SQLAlchemy. At least some of this complexity stems from its focus on parity with SQLAlchemy, which itself has to support SQL dialects across MySQL, Postgres, SQLite, Oracle, etc. Other migration libraries suffer from their own drawbacks.
With our focus on native and robust Postgres integration, Mountaineer has a lighter lift. I wanted to address the following pain points that I heard most commonly about other migration tooling:
Architecture
Mountaineer clients define their data objects as SQLModel schemas. When the application executable is loaded into memory, all the SQLModel objects are loaded into a global registry. These in-memory objects represent the current state of your application code, essentially the "desired state" that the database should be migrated to. Our goal is to figure out how to get from the current database state to that desired state. The first stage is populating metadata objects that contain a simplified state of the underlying table/column/type definitions. This logic is separated into:
- DatabaseSerializer - Covers the creation of DBObjects from the database into memory.
- DatabaseMemorySerializer - Covers the creation of DBObjects from the SQLModel schema definitions.

Once we have these intermediary representations, we can build up a DAG of the database dependencies that are defined in Postgres, and a separate DAG for the dependencies that are defined in memory. These DAGs answer our order-of-operations question of which objects to create first. For instance, tables must be created before columns, and columns before constraints, but enum types need to be created before any column that references them. Deletion typically forces the inverse order (all columns need to be deleted before types, constraints before columns, etc.). By comparing the two DAGs for object equality, we can determine which objects need to be modified.
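To make the comparison and ordering steps concrete, here is a minimal Python sketch under stated assumptions. The DBObject dataclass, its fields, and the diff_states, creation_order and deletion_order helpers are hypothetical stand-ins, not Mountaineer's actual classes. The sketch also simplifies the DAG comparison to matching objects by a unique name, and uses the standard library's graphlib.TopologicalSorter (Python 3.9+) for the dependency ordering.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


# Hypothetical stand-in for the PR's DBObject metadata; field names are assumptions.
@dataclass(frozen=True)
class DBObject:
    kind: str  # e.g. "type", "table", "column", "constraint"
    name: str  # unique key such as "user.email"
    definition: str = ""  # simplified state compared for equality
    depends_on: tuple[str, ...] = ()  # names of objects this one requires


def diff_states(
    db_state: dict[str, DBObject], memory_state: dict[str, DBObject]
) -> tuple[list[DBObject], list[DBObject], list[tuple[DBObject, DBObject]]]:
    """Split objects into (to_create, to_destroy, to_migrate)."""
    to_create = [obj for key, obj in memory_state.items() if key not in db_state]
    to_destroy = [obj for key, obj in db_state.items() if key not in memory_state]
    # Same name on both sides but a different definition: pair old with new.
    to_migrate = [
        (db_state[key], obj)
        for key, obj in memory_state.items()
        if key in db_state and db_state[key] != obj
    ]
    return to_create, to_destroy, to_migrate


def creation_order(state: dict[str, DBObject]) -> list[str]:
    """Topologically sort names so every dependency precedes its dependents."""
    graph = {key: set(obj.depends_on) for key, obj in state.items()}
    return list(TopologicalSorter(graph).static_order())


def deletion_order(state: dict[str, DBObject]) -> list[str]:
    """Reverse of creation: dependents are dropped before what they depend on."""
    return list(reversed(creation_order(state)))


# Illustrative states: the database still has a legacy column, while the code
# adds an enum type plus a column that uses it, and changes a column type.
db_state = {
    "user": DBObject("table", "user"),
    "user.name": DBObject("column", "user.name", "VARCHAR", ("user",)),
    "user.legacy": DBObject("column", "user.legacy", "INTEGER", ("user",)),
}
memory_state = {
    "status_enum": DBObject("type", "status_enum", "ENUM('active', 'banned')"),
    "user": DBObject("table", "user"),
    "user.name": DBObject("column", "user.name", "TEXT", ("user",)),
    "user.status": DBObject(
        "column", "user.status", "status_enum", ("user", "status_enum")
    ),
}

to_create, to_destroy, to_migrate = diff_states(db_state, memory_state)
create_order = creation_order(memory_state)
drop_order = deletion_order(db_state)
```

The creation order is computed on the in-memory graph, while the deletion order is computed on the database graph, since the objects being dropped are the ones that currently exist in Postgres. TopologicalSorter raises graphlib.CycleError if the dependencies ever form a cycle.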
We then topologically sort the dependencies to ensure all required dependencies are fulfilled before we try to perform the migration. Since we have a direct alignment of the "old" object state to the "new" object state, we can easily apply the correct lifecycle method whenever we need to modify the state that's in the database. Each database metadata object supports the following lifecycle methods:
Migration files
All data changes live in separate migration files. These migration files are automatically generated by the algorithm just described. Generate them through the Mountaineer CLI and modify them as needed to handle your data migrations.
The format of these files should be familiar to anyone who has used Alembic: there's an up function that covers the migration to the next revision and a down function that covers reverting to the previous revision. These are standard dependency injection functions, so you can use any dependency injector in your application if you want to inject other variables. By default we just supply migrator: Migrator, which is a shallow wrapper that provides a database session (with an open connection) alongside an actor object that includes some common migration recipes.

Client Integration
Following the current standard for Mountaineer CLI integrations, we require clients to explicitly define their CLI endpoints. We include basic handlers for import in mountaineer.migrations.cli. Clients can integrate like so: