Add V1 of database schema migrator #107

Merged
merged 6 commits into from
May 7, 2024

piercefreeman
Owner

@piercefreeman piercefreeman commented May 7, 2024

This PR introduces native database migration support to Mountaineer. It is still experimental: try it out in local development, but make sure to back up your data before applying changes.

Background

The industry standard migration package for SQLAlchemy is Alembic. Alembic is a powerful and robust solution for migrations; I've used it almost exclusively in past companies. But it suffers from non-trivial setup complexity, and in steady state it's sometimes unclear which migration responsibilities Alembic owns versus which should be delegated to SQLAlchemy. At least some of this complexity is required by its focus on migration parity with SQLAlchemy - which itself has to support SQL dialects across MySQL, Postgres, SQLite, Oracle, etc. Other migration libraries suffer from their own drawbacks.

With our focus on native and robust Postgres integration, Mountaineer has a lighter lift. I wanted to address the following pain points that I heard most commonly about other migration tools:

  • Zero config required, and what configuration is required should be specified in code. Avoid Jinja templates.
  • Baked-in support for common Postgres types that overlap with Python, most specifically Enums and datetimes.
  • File-based ground truth of migration logic, so it can be audited in source control and customized in case users have more sophisticated migrations they want to run.
  • API surface should be simple, with atomic Python functions. Direct database queries (or integration with ORM objects in limited cases) can be used for more complex migration logic.

Architecture

Mountaineer clients define their data objects as SQLModel schemas. When the application executable is loaded into memory, all the SQLModel objects are loaded into a global registry. These in-memory objects represent the current state of your application code - essentially the "desired state" that the application should be migrated to. Our goal is to figure out how we can get from the current database state to that desired state. The first stage is populating metadata objects that contain a simplified state of the underlying table/column/type definitions. This logic is separated into:

  • DatabaseSerializer - Covers the creation of DBObjects from the live database state.
  • DatabaseMemorySerializer - Covers the creation of DBObjects from the in-memory SQLModel schema definitions.

Once we have these intermediary representations, we can build up a DAG of the database dependencies that are defined in Postgres, and a separate DAG for the dependencies that are in-memory. This DAG answers our order-of-operations question of which objects to create first. For instance: tables must be created before columns, and columns before constraints, but enum types need to be created before any column that references them. Deletion typically requires the inverse order (all columns need to be deleted before types, constraints before columns, etc). By comparing these two DAGs for object equality, we can determine which objects need to be modified.
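As a rough illustration of the ordering problem described above, here is a minimal sketch using the standard library's graphlib. The object names and the dictionary layout are assumptions made for this example; they are not Mountaineer's internal representation.

```python
from graphlib import TopologicalSorter

# Hypothetical object identifiers. Each key maps to the set of objects
# that must exist before it can be created.
dependencies = {
    "type:article_status": set(),
    "table:article": set(),
    "column:article.status": {"table:article", "type:article_status"},
    "constraint:article.status_not_null": {"column:article.status"},
}

# static_order() yields every node after all of its dependencies.
creation_order = list(TopologicalSorter(dependencies).static_order())

# Tearing objects down walks the same graph in reverse.
deletion_order = list(reversed(creation_order))
```

The relative order of the table and the enum type is not fixed here, since neither depends on the other; only the constraints implied by the graph are guaranteed.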

We then topologically sort the dependencies to ensure all required dependencies are fulfilled before we try to perform the migration. Since we have a direct alignment of the "old" object state to the "new" object state, we can easily apply the correct lifecycle method if we need to modify the state that's in the database. Each database metadata object supports the following lifecycle methods:

@abstractmethod
async def create(self, actor: DatabaseActions):
    pass

@abstractmethod
async def migrate(self, previous: Self, actor: DatabaseActions):
    pass

@abstractmethod
async def destroy(self, actor: DatabaseActions):
    pass
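To make the old-versus-new comparison concrete, the sketch below decides which lifecycle method would apply to each object. ColumnState and plan_actions are hypothetical names invented for this example, and the real metadata objects carry more detail than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnState:
    # Hypothetical, simplified stand-in for a database metadata object.
    table: str
    name: str
    type: str
    nullable: bool

    @property
    def key(self) -> tuple[str, str]:
        # Identity used to align the "old" object with the "new" one.
        return (self.table, self.name)


def plan_actions(previous, desired):
    """Return (lifecycle_method, key) pairs needed to reach the desired state."""
    old = {obj.key: obj for obj in previous}
    new = {obj.key: obj for obj in desired}
    actions = []
    for key, obj in new.items():
        if key not in old:
            actions.append(("create", key))
        elif old[key] != obj:
            actions.append(("migrate", key))
    for key in old:
        if key not in new:
            actions.append(("destroy", key))
    return actions


# State read from the database versus state declared in code.
db_state = [
    ColumnState("article", "author", "varchar", nullable=True),
    ColumnState("article", "legacy_id", "integer", nullable=True),
]
code_state = [
    ColumnState("article", "author", "varchar", nullable=False),
    ColumnState("article", "status", "article_status", nullable=False),
]
plan = plan_actions(db_state, code_state)
```

In this example the author column is migrated, status is created, and legacy_id is destroyed. A full implementation would also order these actions by the dependency graph rather than by dictionary iteration.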

Migration files

All data changes live in separate migration files. These migration files are automatically generated by the algorithm that was just described. Generate them through the Mountaineer CLI and modify them as you need to handle your data migrations.

The format of these files should be familiar to anyone who has used Alembic previously; there's an up function that covers the migration to the next revision and a down function that covers reverting to the previous revision. These are standard dependency injection functions, so you can use any dependency injector in your application if you want to inject other variables. By default we just supply the migrator: Migrator, which is a shallow wrapper that provides a database session (with an open connection) alongside an actor object that includes some common migration recipes.

from mountaineer.migrations.migrator import Migrator
from mountaineer.migrations.migration import MigrationRevisionBase
from mountaineer.migrations.dependency import MigrationDependencies
from fastapi.param_functions import Depends

class MigrationRevision(MigrationRevisionBase):
    up_revision: str = "1715044020"
    down_revision: str | None = None

    async def up(
        self,
        migrator: Migrator = Depends(MigrationDependencies.get_migrator),
    ):
        await migrator.actor.add_not_null(table_name="article", column_name="author")

    async def down(
        self,
        migrator: Migrator = Depends(MigrationDependencies.get_migrator),
    ):
        await migrator.actor.drop_not_null(table_name="article", column_name="author")

Client Integration

Following the current standard for Mountaineer CLI integrations, we require clients to explicitly define their CLI endpoints. We include basic handlers for import in mountaineer.migrations.cli. Clients can integrate like so:

from click import group, option

from mountaineer.io import async_to_sync
from mountaineer.migrations.cli import handle_apply, handle_generate, handle_rollback
from myapp.config import AppConfig

@group
def migrate():
    pass

@migrate.command()
@option("--message", required=False)
@async_to_sync
async def generate(message: str | None):
    _ = AppConfig()  # type: ignore
    await handle_generate(message=message)

@migrate.command()
@async_to_sync
async def apply():
    _ = AppConfig()  # type: ignore
    await handle_apply()

@migrate.command()
@async_to_sync
async def rollback():
    _ = AppConfig()  # type: ignore
    await handle_rollback()

@piercefreeman piercefreeman force-pushed the feature/database-migrations-v1 branch from 8d65564 to 5b512de Compare May 7, 2024 01:24
@piercefreeman piercefreeman merged commit 5370842 into main May 7, 2024
19 checks passed
@piercefreeman piercefreeman deleted the feature/database-migrations-v1 branch May 7, 2024 02:09