Add V1 of database schema migrator #107

piercefreeman · 2024-05-07T00:21:59Z

This PR introduces native database migration support to Mountaineer. It is still experimental: explore it in local development, but make sure to back up your data before applying changes.

Background

The industry standard migration package for SQLAlchemy is Alembic. Alembic is a powerful and robust solution for migrations; I've used it almost exclusively at past companies. But it suffers from non-trivial setup complexity, and in steady state it's sometimes unclear which migration responsibilities Alembic owns versus which should be delegated to SQLAlchemy. At least some of this complexity is required by its focus on migration parity with SQLAlchemy, which itself has to support SQL dialects across MySQL, Postgres, SQLite, Oracle, etc. Other migration libraries suffer from their own drawbacks.

With our focus on native and robust Postgres integration, Mountaineer has a lighter lift. I wanted to address the following pain points that I heard most commonly about other migration tooling:

Zero config required by default; whatever configuration is needed should be specified in code. Avoid Jinja templates.
Baked-in support for common Postgres types that overlap with Python, most specifically Enums and datetimes.
File-based ground truth of migration logic, so it can be audited in source control and customized in case users have more sophisticated migrations they want to run.
API surface should be simple, with atomic Python functions. Direct database queries (or integration with ORM objects in limited cases) can be used for more complex migration logic.

Architecture

Mountaineer clients define their data objects as SQLModel schemas. When the application executable is loaded into memory, all the SQLModel objects are loaded into a global registry. These in-memory objects represent the current state of your application code - essentially the "desired state" that the application should be migrated to. Our goal is to figure out how we can get from the current database state to that desired state. The first stage is populating metadata objects that contain a simplified state of the underlying table/column/type definitions. This logic is separated into:

DatabaseSerializer - Covers the creation of DBObjects from the live database into memory.
DatabaseMemorySerializer - Covers the creation of DBObjects from the SQLModel schema definitions.

Once we have these intermediary representations, we can build up a DAG of the database dependencies that are defined in Postgres, and a separate DAG for the dependencies that are in-memory. This DAG answers our order-of-operations question of which objects to create first. For instance: tables must be created before columns, and columns before constraints, but enum types need to be created before any column that references them. Deletion typically forces the inverse order (all columns need to be deleted before types, constraints before columns, etc). By comparing these two DAGs for object equality, we can determine which objects need to be created, modified, or removed.
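As a rough illustration of the comparison step, here is a minimal sketch of diffing a database-derived object set against a code-derived one. `ColumnStub`, its `key` property, and `diff_objects` are hypothetical names made up for this example; they are not the classes added in this PR, and the real comparison operates on DAG nodes rather than flat lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnStub:
    # Hypothetical stand-in for a database metadata object
    table: str
    name: str
    type_name: str
    nullable: bool = True

    @property
    def key(self) -> tuple:
        # Identity of the object, independent of its mutable attributes
        return (self.table, self.name)


def diff_objects(current, desired):
    """Classify objects into create / migrate / destroy buckets."""
    current_by_key = {obj.key: obj for obj in current}
    desired_by_key = {obj.key: obj for obj in desired}

    # Present only in code: needs to be created
    to_create = [o for k, o in desired_by_key.items() if k not in current_by_key]
    # Present only in the database: needs to be destroyed
    to_destroy = [o for k, o in current_by_key.items() if k not in desired_by_key]
    # Present in both but unequal: (previous, next) pair to migrate
    to_migrate = [
        (current_by_key[k], o)
        for k, o in desired_by_key.items()
        if k in current_by_key and current_by_key[k] != o
    ]
    return to_create, to_migrate, to_destroy


database_state = [
    ColumnStub("article", "id", "integer", nullable=False),
    ColumnStub("article", "author", "varchar"),
    ColumnStub("article", "legacy_slug", "varchar"),
]
code_state = [
    ColumnStub("article", "id", "integer", nullable=False),
    ColumnStub("article", "author", "varchar", nullable=False),
    ColumnStub("article", "published_at", "timestamp"),
]

to_create, to_migrate, to_destroy = diff_objects(database_state, code_state)
```

Keeping the identity key separate from full equality is what lets a changed column be treated as a migration of an existing object instead of a destroy followed by a create.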

We then topologically sort the dependencies to ensure all required dependencies are fulfilled before we try to perform the migration. Since we have a direct alignment of the "old" object state to the "new" object state, we can easily apply the correct lifecycle method if we need to modify the state that's in the database. Each database metadata object supports the following lifecycle methods:

@abstractmethod
async def create(self, actor: DatabaseActions):
    pass

@abstractmethod
async def migrate(self, previous: Self, actor: DatabaseActions):
    pass

@abstractmethod
async def destroy(self, actor: DatabaseActions):
    pass
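The ordering step can be sketched with the standard library's `graphlib.TopologicalSorter`. This is only an illustration under assumed names: the string node labels and the `dependencies` mapping are hypothetical, and the PR may well implement its own sort instead of using `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each key maps to the set of objects
# that must exist before it can be created.
dependencies = {
    "type:article_status": set(),
    "table:article": set(),
    "column:article.status": {"table:article", "type:article_status"},
    "constraint:article.status_not_null": {"column:article.status"},
}

# static_order() yields each node only after all of its dependencies.
creation_order = list(TopologicalSorter(dependencies).static_order())

# Destruction walks the same graph in the inverse order, so constraints
# are dropped before columns, and columns before types and tables.
destruction_order = list(reversed(creation_order))
```

With an ordering like this, `create` would be called while walking `creation_order` and `destroy` while walking `destruction_order`; `TopologicalSorter` also raises `CycleError` if the graph is not a DAG.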

Migration files

All data changes live in separate migration files. These migration files are automatically generated by the algorithm that was just described. Generate them through the Mountaineer CLI and modify them as you need to handle your data migrations.

The format of these files should be familiar to anyone who has used Alembic; there's an up function that covers the migration to the next revision and a down function that covers reverting to the previous revision. These are standard dependency injection functions, so you can use any dependency injector in your application if you want to inject other variables. By default we just supply migrator: Migrator, a shallow wrapper that provides a database session (with an open connection) alongside an actor object that includes some common migration recipes.

from mountaineer.migrations.migrator import Migrator
from mountaineer.migrations.migration import MigrationRevisionBase
from mountaineer.migrations.dependency import MigrationDependencies
from fastapi.param_functions import Depends

class MigrationRevision(MigrationRevisionBase):
    up_revision: str = "1715044020"
    down_revision: str | None = None

    async def up(
        self,
        migrator: Migrator = Depends(MigrationDependencies.get_migrator),
    ):
        await migrator.actor.add_not_null(table_name="article", column_name="author")

    async def down(
        self,
        migrator: Migrator = Depends(MigrationDependencies.get_migrator),
    ):
        await migrator.actor.drop_not_null(table_name="article", column_name="author")
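Because each file records both its own revision and the one before it, a set of migration files can be ordered by following the `down_revision` pointers. The sketch below assumes a single linear history; `order_revisions` is a hypothetical helper written for this example, not a function from Mountaineer, and every revision ID other than "1715044020" is made up.

```python
def order_revisions(revisions):
    """Order (up_revision, down_revision) pairs from oldest to newest."""
    # Map each parent revision to the revision that follows it
    next_by_parent = {down: up for up, down in revisions}

    ordered = []
    # The root migration is the one whose down_revision is None
    current = next_by_parent.get(None)
    while current is not None:
        ordered.append(current)
        current = next_by_parent.get(current)
    return ordered


# Pairs as they might be read from migration files, in arbitrary order
revisions = [
    ("1715050000", "1715044020"),
    ("1715044020", None),
    ("1715060000", "1715050000"),
]

ordered = order_revisions(revisions)
```

Applying migrations would run `up` along this order starting after the database's current revision, and a rollback would run `down` on the most recently applied one.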

Client Integration

Following the current standard for Mountaineer CLI integrations, we require clients to explicitly define their CLI endpoints. We include basic handlers for import in mountaineer.migrations.cli. Clients can integrate like so:

from click import group, option

from mountaineer.io import async_to_sync
from mountaineer.migrations.cli import handle_apply, handle_generate, handle_rollback
from myapp.config import AppConfig

@group
def migrate():
    pass

@migrate.command()
@option("--message", required=False)
@async_to_sync
async def generate(message: str | None):
    _ = AppConfig()  # type: ignore
    await handle_generate(message=message)

@migrate.command()
@async_to_sync
async def apply():
    _ = AppConfig()  # type: ignore
    await handle_apply()

@migrate.command()
@async_to_sync
async def rollback():
    _ = AppConfig()  # type: ignore
    await handle_rollback()

piercefreeman added 3 commits May 6, 2024 17:21

Add core data migration logic

b20ddc9

Add database-backed migration test runner

12af665

Use StrEnum from compatibility layer

5b512de

piercefreeman force-pushed the feature/database-migrations-v1 branch from 8d65564 to 5b512de Compare May 7, 2024 01:24

piercefreeman added 3 commits May 6, 2024 18:25

Rename db metadata objects to stubs

306cdc9

Add user message to header

7bddca9

Fix 3.10 compatibility with migration generics

c545379

piercefreeman merged commit 5370842 into main May 7, 2024
19 checks passed

piercefreeman deleted the feature/database-migrations-v1 branch May 7, 2024 02:09

piercefreeman mentioned this pull request May 7, 2024

Release of database migration support #108

Closed

7 tasks
