Add something like GORM's AutoMigrate #456

Closed
vmihailenco opened this issue Feb 22, 2022 · 16 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@vmihailenco
Member

First, we should provide some unified way to inspect the current database schema, using INFORMATION_SCHEMA.TABLES or something similar. Then we can sync the database by comparing the actual database schema with the available Go models.
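
For illustration, a minimal sketch (plain database/sql, nothing bun-specific; the package and helper names are made up) of what listing tables via INFORMATION_SCHEMA could look like against PostgreSQL:

package dbinspect

import (
	"context"
	"database/sql"
	"fmt"
)

// listTables prints all base tables visible to the connection, using the
// standard INFORMATION_SCHEMA views. Illustrative only, not part of bun.
func listTables(ctx context.Context, db *sql.DB) error {
	rows, err := db.QueryContext(ctx,
		`SELECT table_schema, table_name
		   FROM information_schema.tables
		  WHERE table_type = 'BASE TABLE'`)
	if err != nil {
		return err
	}
	defer rows.Close()

	for rows.Next() {
		var schema, name string
		if err := rows.Scan(&schema, &name); err != nil {
			return err
		}
		fmt.Printf("%s.%s\n", schema, name)
	}
	return rows.Err()
}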

@vmihailenco vmihailenco added the help wanted Extra attention is needed label Feb 22, 2022
@ly95

ly95 commented Aug 2, 2022

+1

@vmihailenco vmihailenco added the enhancement New feature or request label Aug 2, 2022
@bevzzz
Collaborator

bevzzz commented Sep 14, 2022

@vmihailenco I've been reading through the issues and noticed this one had help wanted on it. I only started to work with bun recently, but this seems like something I could try to do. Would you like some help with it?

@vmihailenco
Member Author

@bevzzz The help is welcome here, but you should be warned that this is not a small issue and it will probably require several weeks of work.

If you decide to work on this, my advice would be to learn how GORM AutoMigrate works and what we can learn from it. Writing your thoughts here so others can help would be the next step.

I have not had a chance to study GORM AutoMigrate yet, but I guess the plan should look like this:

  1. Improve Bun dialects with the ability to inspect the database schema; for example, pgdialect should be capable of returning a list of tables with all their columns. I would start with a single PostgreSQL dialect for now.

  2. Decide which operations we want to support; for example, we can start with 2 operations (a rough sketch of what these might generate follows this list):

package pgdialect

func (Dialect) RenameColumn(...)
func (Dialect) ChangeColumnType(...)

  3. Use the database schema (step 1) to detect changes and apply commands from step 2 to auto-migrate the schema. Don't get too ambitious and only start with a single operation, for example, renaming a column.
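
As a rough sketch of step 2 (the signatures and generated SQL here are assumptions only, not an agreed API), the dialect methods could boil down to formatting ALTER TABLE statements:

package pgdialect

import "fmt"

// RenameColumn builds an ALTER TABLE ... RENAME COLUMN statement.
// Hypothetical signature: the real API would likely accept bun schema types rather than raw strings.
func (Dialect) RenameColumn(table, oldName, newName string) string {
	return fmt.Sprintf("ALTER TABLE %q RENAME COLUMN %q TO %q", table, oldName, newName)
}

// ChangeColumnType builds an ALTER TABLE ... ALTER COLUMN ... TYPE statement.
func (Dialect) ChangeColumnType(table, column, newType string) string {
	return fmt.Sprintf("ALTER TABLE %q ALTER COLUMN %q TYPE %s", table, column, newType)
}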

Each step can be done separately so we have a chance to give feedback at an earlier stage.

Also pinging @dizzyfool and @LdDl, who have done some excellent work on https://github.com/dizzyfool/genna and https://github.com/LdDl/bungen and might be interested in helping design the API for step 1.

@bevzzz
Collaborator

bevzzz commented Sep 15, 2022

@vmihailenco Thanks for getting back in touch so quickly!
Yes, you're right, bringing this feature will take some time and research, so I really appreciate the steps that you've suggested.
I will familiarize myself with the way it's done in GORM and get back with a proposed solution for the 2 operations you've mentioned.

P.S. I plan to start working on it next weekend; I guess we are not in a rush here, right?

@vmihailenco
Member Author

P.S. I plan to start working on it next weekend; I guess we are not in a rush here, right?

No rush for sure. Even making some progress on step 1 will make a difference in the long run.

@LdDl

LdDl commented Sep 21, 2022

TLDR: just a bunch of SQL queries to extract reference information about Postgres tables, plus my first-look thoughts.

My POV:

  • Reasons to start with PostgreSQL

    1. SQLite will be tough since it has no convenient syntax for ALTERing columns. A hard table copy has to be done, something like this:
    CREATE TABLE a_copy (...old columns..., ...new columns...);
    INSERT INTO a_copy (...old columns...) SELECT ...old columns... FROM a;
    DROP TABLE a;
    ALTER TABLE a_copy RENAME TO a;

    2. Not sure about MySQL; I do not remember if it allows altering columns. It could be done the same way as with SQLite, I guess (with almost the same pseudocode, but for columns);
    3. MSSQL - I haven't touched it for a thousand years;
    4. PostgreSQL - I guess there are no obvious difficulties, are there?
  • Straightforward SQL queries to extract PostgreSQL schema/table information:

    1. Basic query to extract all tables with their parent schema
    select table_schema, table_name from information_schema.tables where table_type = 'BASE TABLE'
    -- Ignore table_type = 'VIEW'
    -- Add (table_schema, table_name) or table_name filters if needed
    2. Query to extract relations between tables. Each relation is represented by a constraint (basically a foreign key), the source schema.table, the set of source columns (which do the referencing), the target schema.table, and the set of target columns (which are referenced). The sets (both source and target) should be properly aggregated - I believe the array_agg function can do the job without an additional order by inside the function call.
    with
        schemas as (
            select nspname, oid
            from pg_namespace
        ),
        tables as (
            select oid, relnamespace, relname, relkind
            from pg_class
        ),
        columns as (
            select attrelid, attname, attnum
            from pg_attribute a
            where a.attisdropped = false
        )
    select distinct 
            co.conname            as constraint_name,
            ss.nspname            as schema_name,
            s.relname             as table_name,
            array_agg(sc.attname) as columns,
            ts.nspname            as target_schema,
            t.relname             as target_table,
            array_agg(tc.attname) as target_columns
    from pg_constraint co
    left join tables s on co.conrelid = s.oid
    left join schemas ss on s.relnamespace = ss.oid
    left join columns sc on s.oid = sc.attrelid and sc.attnum = any (co.conkey)
    left join tables t on co.confrelid = t.oid
    left join schemas ts on t.relnamespace = ts.oid
    left join columns tc on t.oid = tc.attrelid and tc.attnum = any (co.confkey)
    where co.contype = 'f'
        and co.conrelid in (select oid from pg_class c where c.relkind = 'r')
        and array_position(co.conkey, sc.attnum) = array_position(co.confkey, tc.attnum)
    group by constraint_name, schema_name, table_name, target_schema, target_table;
    -- Add (ss.nspname, s.relname) filter if needed
    3. And the least obvious part (in terms of SQL): columns.

      Each column is represented by: schema location (either 'public' or not), table name, column name, column type (in string representation), position of the column in the table (note: indexing starts at 1), whether the column is part of the PK, whether it is an FK, whether it can be null, the column's default value (in string representation) if one has been provided, the varchar size (null for columns of non-text type), and whether the column is of an enum type.
      Note: I don't work with Postgres enum types much, so I'm not sure enum_values is the correct way to get the needed information;

    with
    	    enums as (
    	        select distinct true                   as is_enum,
    	                        sch.nspname            as table_schema,
    	                        tb.relname             as table_name,
    	                        col.attname            as column_name,
                                array_agg(e.enumlabel) as enum_values
    	        from pg_class tb
    	        left join pg_namespace sch on sch.oid = tb.relnamespace
    	        left join pg_attribute col on col.attrelid = tb.oid
    	        inner join pg_enum e on e.enumtypid = col.atttypid
    			group by 1, 2, 3, 4
    	    ),
    	    arrays as (
    	        select sch.nspname  as table_schema,
    	               tb.relname   as table_name,
    	               col.attname  as column_name,
    	               col.attndims as array_dims
    	        from pg_class tb
    	        left join pg_namespace sch on sch.oid = tb.relnamespace
    	        left join pg_attribute col on col.attrelid = tb.oid
    	        where col.attndims > 0
    	    ),
    	    info as (
    			select distinct
    			 	kcu.table_schema as table_schema,
    				kcu.table_name   as table_name,
    				kcu.column_name  as column_name,
    				array_agg((
    					select constraint_type::text 
    					from information_schema.table_constraints tc 
    					where tc.constraint_name = kcu.constraint_name 
    						and tc.constraint_schema = kcu.constraint_schema 
    						and tc.constraint_catalog = kcu.constraint_catalog
    					limit 1
    				)) as constraint_types
    			from information_schema.key_column_usage kcu
    			group by kcu.table_schema, kcu.table_name, kcu.column_name
    	    )
    	select distinct c.table_schema = 'public' as is_public,
                        c.table_schema            as schema_name,
    	                c.table_name              as table_name,
    	                c.column_name             as column_name,
                        c.ordinal_position        as ordinal,
    	                case
    	                when i.constraint_types is null
    	                then false
    	                else 'PRIMARY KEY'=any (i.constraint_types)
    	                end                                    as is_pk,
    	                'FOREIGN KEY'=any (i.constraint_types) as is_fk,
    	                c.is_nullable = 'YES'                  as nullable,
    	                c.data_type = 'ARRAY'                  as is_array,
    	                coalesce(a.array_dims, 0)              as dims,
    	                case
    	                when e.is_enum = true
    	                then 'varchar'
    	                else ltrim(c.udt_name, '_')
    	                end                         as type,
    	                c.column_default            as def,
                        c.character_maximum_length  as len,
    					e.enum_values 				as enum
    	from information_schema.tables t
    	left join information_schema.columns c using (table_name, table_schema)
    	left join info i using (table_name, table_schema, column_name)
    	left join arrays a using (table_name, table_schema, column_name)
    	left join enums e using (table_name, table_schema, column_name)
    	where
    		 t.table_type = 'BASE TABLE'
    	order by 1 desc, 2, 3, 5 asc, 6 desc nulls last
        -- add (t.table_schema, t.table_name) filter if needed
  • I do think it will be easier NOT to rewrite those queries in an ORM way, since that would be hard to maintain for people who aren't strong ORM users (even if they are good at raw SQL)

  • I ESPECIALLY need AddColumn/DropColumn/RenameColumn functions at my job. Case: we have a table with columns id, some_name, some_field_a, created_at, deleted_at and we need to add some_field_b right before created_at, which forces us to add the new column, add copies of the audit fields, drop the old audit fields, and rename the new ones - quite a struggle for such a task.

    Indices are what I use least in terms of migrations (since at my job I try to think about them only once and do not change them very often)

  • I believe an automigrate tool must have an option to turn off auto-migration for specific tables (or even columns?) if needed

  • Something must be done about partitions (which have INHERITS (schema_name.parent_table) in their DDL). They are a real pain for me sometimes when I work with other tools (diesel-rs). Maybe some placeholders/regexps? I have no idea.

p.s. Recently I was impressed by this project https://github.com/kyleconroy/sqlc . It could be very cool to add something like this for ORM purposes based on SQL migration files... but I guess that sounds like a new topic?

@vmihailenco
Member Author

vmihailenco commented Sep 21, 2022

👍 We need to start with something small and grow it so that it works at least for SQLite and Postgres, which have a lot in common.

Most likely we should start by familiarizing ourselves with the API already provided by database/sql, e.g. https://pkg.go.dev/database/sql/driver#RowsColumnTypeDatabaseTypeName
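
For instance, a small sketch (standard library only; the package and helper names are made up) of the column metadata that API already exposes:

package dbinspect

import (
	"context"
	"database/sql"
	"fmt"
)

// describeColumns prints the column metadata database/sql exposes for a table;
// DatabaseTypeName is backed by driver.RowsColumnTypeDatabaseTypeName.
// The table name is assumed to be trusted (no user input).
func describeColumns(ctx context.Context, db *sql.DB, table string) error {
	rows, err := db.QueryContext(ctx, "SELECT * FROM "+table+" LIMIT 1")
	if err != nil {
		return err
	}
	defer rows.Close()

	types, err := rows.ColumnTypes()
	if err != nil {
		return err
	}
	for _, ct := range types {
		nullable, _ := ct.Nullable()
		length, hasLength := ct.Length()
		fmt.Printf("%s\t%s\tnullable=%v\tlength=%d (%v)\n",
			ct.Name(), ct.DatabaseTypeName(), nullable, length, hasLength)
	}
	return rows.Err()
}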

p.s. Recently I was impressed by this project https://github.com/kyleconroy/sqlc . It could be very cool to add something like this for ORM purposes based on SQL migration files... but I guess that sounds like a new topic?

Yeah, projects like sqlc and ent open up a whole new world of code generation for you :)

@bevzzz
Collaborator

bevzzz commented Nov 17, 2022

@vmihailenco I wasn't able to look into that yet because I had a lot of work in the past couple of months.
I am still very interested in developing this feature -- I will be taking some time off soon to work on it ✌️

@vmihailenco
Member Author

@bevzzz take your time :)

@bevzzz
Collaborator

bevzzz commented Dec 2, 2022

Here are my thoughts on how this feature could be implemented. It felt like a lot of ground to cover, so if some concepts need more clarification/discussion, please let me know.

I am planning to follow the steps suggested above, starting with Postgres:

1. Improve Bun dialects with ability to inspect database schema

For this step I think we'll need the following:

package inspect

import (
	"database/sql"

	"github.com/uptrace/bun/schema"
)

type Inspector interface {
	Tables() []*Table
	HasTable(table interface{}) bool
	HasColumn(table interface{}, column string) bool
}

// Not sure if there should be an interface for that, can't think of different implementations
type Table struct {
	ParentSchema string
	Name         string
	Columns      []Column
}

// Each dialect can implement Column to handle the specifics
type Column interface {
	Type() *sql.ColumnType
	
	// Is the database column equivalent to the column definition in the Go-model?
	// Do we need to alter this column in migration?
	Equivalent(schema.Field) bool

	// Additional metadata that's not available in sql.ColumnType
	IsPK() bool
	IsAutoincrement() bool
}
  • We should define separate Inspector and Migrator interfaces, as that would allow reusing the Inspector interface in
    contexts other than migration.

  • pgdialect (and other dialects) will implement Inspector and Column to handle the specifics of each DBMS.
    E.g. attributes such as isEnum, isArray, arrayDims would be specific to Postgres.

  • Table can be extended to include indices and constraints.

  • sql.ColumnType contains a lot of information about table columns and is readily available through the standard library's API. For existing tables, it can be obtained via db.NewSelect().Model(m).Limit(1).Rows(ctx) followed by rows.ColumnTypes(), which is how GORM does it (bun also uses ColumnTypes(), but for a different reason, see mapModel).
    I would use what the standard lib provides us with and get additional metadata using the queries that @LdDl has so helpfully shared above (thanks a lot!).

2. Support RenameColumn and ChangeColumnType

Since Create/DropTable, Create/DropColumn, Create/DropIndex are already part of the IDB interface, I think the new methods for altering these objects would need to be added to that interface for the sake of consistency. Doing an ALTER TABLE query is not auto-migration per se, so I wouldn't separate that into its own package.

package bun

type IDB interface {
	...
	NewAlterTable() *AlterTableQuery
	NewAlterIndex() *AlterIndexQuery
}

type AlterTableQuery struct {}

func (q *AlterTableQuery) RenameColumn(...) {}

func (q *AlterTableQuery) ChangeColumnType(...) {}

Some related research and thoughts that I would like to get your opinion on:

  • Renaming a column:
    • Cannot be chained, i.e. every renaming requires a separate ALTER TABLE RENAME COLUMN query.
    • After renaming a column, the user might want to rename the related index; I think we should provide a way to do that
      via a new AlterIndexQuery and leave maintaining consistent naming to the user.
    • Postgres will update the column name in constraints, foreign keys, index definitions, etc.
  • Changing column type:
    • Can be chained, i.e. changing the type of every column in a table can be done with a single query.
      Is it OK if we allow chaining ChangeColumnType() calls while only doing one RenameColumn()?
    • Should support a USING clause (a hypothetical usage sketch follows this list).
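
A hypothetical usage sketch of the proposed query (none of these methods exist in bun yet; the names and the generated SQL only mirror the proposal above):

// ALTER TABLE articles RENAME COLUMN title TO headline
// (renames cannot be chained, so one query per renamed column)
_, err := db.NewAlterTable().
	Table("articles").
	RenameColumn("title", "headline").
	Exec(ctx)

// ALTER TABLE articles
//   ALTER COLUMN views TYPE bigint USING views::bigint,
//   ALTER COLUMN rating TYPE numeric(3,1)
// (type changes can be chained into a single statement)
_, err = db.NewAlterTable().
	Table("articles").
	ChangeColumnType("views", "bigint").Using("views::bigint").
	ChangeColumnType("rating", "numeric(3,1)").
	Exec(ctx)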

Next steps: actual auto-migration

Despite how migration code is structured in GORM, I understand "auto-migration" as being this:

// Lives under migrate/auto
package auto

import "context"

type Migrator interface {
	// SyncModels detects differences between the Go models and the current database schema and resolves them
	SyncModels(ctx context.Context, models ...interface{}) error
}
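
As a back-of-the-napkin illustration of what an implementation might do (everything here is assumed: the helper functions, the inspector/migrator fields, and the ErrNotExists sentinel), SyncModels is essentially a diff-and-apply loop:

// Sketch only: none of these types or helpers exist in bun in this form.
func (m *autoMigrator) SyncModels(ctx context.Context, models ...interface{}) error {
	for _, model := range models {
		want := modelTable(model) // assumed helper: desired table definition derived from the Go model
		got, err := m.inspector.Table(ctx, want.Name)
		if errors.Is(err, ErrNotExists) {
			// Table is missing entirely: create it from the model.
			if _, err := m.db.NewCreateTable().Model(model).Exec(ctx); err != nil {
				return err
			}
			continue
		} else if err != nil {
			return err
		}
		// Compare the inspected table with the model and apply each difference
		// as an ALTER TABLE operation (rename, change type, add/drop column, ...).
		for _, change := range diffTables(got, want) {
			if err := m.migrator.Apply(ctx, change); err != nil {
				return err
			}
		}
	}
	return nil
}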

Secondly, while this feature is useful in and of itself (for instance, to "initialize" the database on service start), the trouble I see with such "runtime" migrations is that the history is not recorded and reverting them is not possible. So far my experience with auto-migrations has been limited to Django's migrations, where running makemigrations results in a number of migration files (up/down) being generated.

Once we've got auto-migrations figured out, we could leverage the fact that all of bun's queries implement Stringer and create "up" and "down" SQL migration files for the user.
Meanwhile, should these auto-migrations be recorded somehow, or do we just treat them as multiple ALTER/DROP/CREATE queries packaged in one method?

P.S.: I prefer SyncModels() over AutoMigrate() because it is more descriptive. Does that name sound fine to you,
or do you think we should stick with the already-existing AutoMigrate?


On a side note:

I couldn't find a convenient mechanism in bun's API for specifying the schema to which a certain table belongs. Without it, it seems, our auto-migration would end up always creating new tables in the public schema (which is what dbfixture does).

It could be worked around with, e.g. BeforeCreateTableHook, but I feel like we should have a simpler solution (like adding a schema key in the bun: tag) for something so commonplace. What's your take on that? Should I create a separate issue for this feature?

@bevzzz
Collaborator

bevzzz commented Dec 4, 2022

Started working on the ALTER TABLE query; that should cover a lot of the functionality we're missing for auto-migration.
As suggested above, we can introduce it as a separate feature before moving on with this topic.

Each step can be done separately so we have a chance to give feedback at an earlier stage.

The way I see it, the scope of ALTER TABLE functionality that we will need to support is defined by what users can specify in the model definition:

  • Rename column
  • Change column type ("autoincrement" belongs here)
  • Rename table
  • Change table's PK
  • Drop column's default value
  • Change column's default value
  • Add/Drop NOT NULL constraint
  • Add/Drop UNIQUE check
  • Add/Drop column (see below)

(Have I missed anything?*)

While it's not the first priority, I was thinking about the NewAddColumn() and NewDropColumn() methods that are already part of the bun.IDB API. Since bun is an SQL-first ORM, would it not make sense for those to be part of the AlterTableQuery rather than standalone statements? E.g.:

db.NewAlterTable().Model().AddColumn()  // ALTER TABLE model ADD COLUMN col
db.NewAlterTable().Model().DropColumn()  // ALTER TABLE model DROP COLUMN col

*There are also table relations, but I haven't gotten to that yet. I don't suppose bun creates FK constraints when it sees a relationship defined between 2 models (or does it?). I will need to look at that later.

@vmihailenco
Member Author

Thanks for writing so many details 👍 I agree that we should start with adding a table inspector - perhaps just an ability to list columns in a table should be enough.

For this step I think we'll need the following:

Looks reasonable, and it is hard to tell more without writing some code and getting your hands dirty. I would suggest naming the package sqlschema instead of inspect, though.

Regarding the API, I expect to have something like:

import "github.com/uptrace/bun/sqlschema"

var db *bun.DB

inspector := sqlschema.NewInspector(db)

columns, err := inspector.Columns(ctx, tableName)
// err can be sqlschema.ErrNotExists if the table is missing
// columns is []sqlschema.Column

would it not make sense for those to be part of the AlterTableQuery rather than standalone statements? E.g.:

I think we need something like this to build SQL queries:

db.NewAlterTable().Model(...).AddColumn("").Opt1().Opt2().Exec(ctx)
db.NewAlterTable().Model(...).DropColumn("").Opt1().Opt2().Exec(ctx)

But it will get too complex very soon with RDBMS-specific features/options, so we should also support something much simpler just for SyncModels:

var db *bun.DB

migrator := sqlschema.NewMigrator(db)

err := migrator.AddColumn(&somepkg.Column{
    Schema: "", // optional
    Table: "table_name",
    Column: "column_name",
    Type: ...,
})
err = migrator.DropColumn(col)
err = migrator.ChangeType(col, newType)

Underneath, it may use the Bun API to generate the query or just generate and execute SQL directly.
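
For the simple path, a sketch (reusing the assumed Column struct from above) of a migrator that formats and executes the SQL directly:

// Sketch only: Migrator and Column are the hypothetical types discussed above.
func (m *Migrator) AddColumn(ctx context.Context, col *Column) error {
	table := col.Table
	if col.Schema != "" {
		table = col.Schema + "." + col.Table
	}
	query := fmt.Sprintf("ALTER TABLE %s ADD COLUMN %s %s", table, col.Column, col.Type)
	_, err := m.db.ExecContext(ctx, query)
	return err
}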

It could be worked around with, e.g. BeforeCreateTableHook, but I feel like we should have a simpler solution (like adding a schema key in the bun: tag) for something so commonplace. What's your take on that? Should I create a separate issue for this feature?

It can probably be supported via a tag like you suggested, bun:"schema:name", but I think we can safely ignore that detail for now. It should not affect the design.

@bevzzz
Collaborator

bevzzz commented Dec 6, 2022

I have created a draft PR (it is very much WIP) so that we could exchange ideas with specific code at hand. I'm also not sure if some of my code breaks the project's code style, so if it does, please let me know how I could change it.

It only includes ALTER TABLE functionality for now, just in case we want to ship that independently of auto-migration.
Once the remaining tasks here are finished and we agree on the API (it should accommodate the differences between the DBMS, and I haven't looked into that yet), I will add the documentation for this too.

Appreciate your suggestions a lot, and looking forward to hearing your feedback on the progress so far!

@vmihailenco
Member Author

Just left a code review and some suggestions. Most of them are just thoughts off the top of my head, so please be patient with me :)

And thanks for working on this!

@bevzzz
Collaborator

bevzzz commented Dec 28, 2022

Found this bug while working on it, so I guess I'll do a PR for that first, because I'd like to use some of the underlying functionality for ALTER TABLE too.

@bevzzz
Collaborator

bevzzz commented Nov 1, 2023

I opened a PR which adds schema inspection and RenameTable auto-migration to Postgres dialect.

I took a look at how GORM, ent, and django handle this and tried to come up with a solution that fits bun's existing APIs.

The scope of changes we want to support is pretty much what can be described in a bun.Model:

  • Create/Drop/Rename tables, move table to a different schema
  • Add/Drop/Rename columns, alter column type + its constraints (data type, PK, default, unique, nullable)
  • Change FK relations
    (Have I missed anything?)

The tricky part will be resolving dependencies between different migrations (e.g. create table before referencing it in an FK constraint).
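
One common way to handle that ordering is a topological sort of the pending operations, emitting each operation's dependencies before the operation itself. A generic sketch (the operation type and the dependsOn predicate are assumptions, not existing bun types):

// Sketch: depth-first topological sort of migration operations, so that, e.g.,
// a CREATE TABLE is emitted before the ADD FOREIGN KEY that references it.
func sortOperations(ops []operation, dependsOn func(a, b operation) bool) ([]operation, error) {
	const (
		unvisited = iota
		inProgress
		done
	)
	state := make([]int, len(ops))
	sorted := make([]operation, 0, len(ops))

	var visit func(i int) error
	visit = func(i int) error {
		switch state[i] {
		case done:
			return nil
		case inProgress:
			return errors.New("cyclic dependency between migration operations")
		}
		state[i] = inProgress
		for j := range ops {
			// Visit everything ops[i] depends on before emitting ops[i] itself.
			if i != j && dependsOn(ops[i], ops[j]) {
				if err := visit(j); err != nil {
					return err
				}
			}
		}
		state[i] = done
		sorted = append(sorted, ops[i])
		return nil
	}

	for i := range ops {
		if err := visit(i); err != nil {
			return nil, err
		}
	}
	return sorted, nil
}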

Additionally, it would be nice to provide some options like the following (a possible shape is sketched after this list):

  • Disable DROP (ent calls it "append-only" mode)
  • Control how FK constraints are named if the related entities are renamed
  • Control which tables are included in the database inspection (by default all)
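
A purely hypothetical shape for such options, using the functional options pattern (none of these exist yet):

// Hypothetical options for the auto-migrator.
type Option func(*config)

type config struct {
	appendOnly    bool     // never emit DROP statements
	includeTables []string // empty means "all tables"
}

// AppendOnly disables destructive DROP operations (cf. ent's append-only mode).
func AppendOnly() Option {
	return func(c *config) { c.appendOnly = true }
}

// WithTables restricts inspection and migration to the listed tables.
func WithTables(tables ...string) Option {
	return func(c *config) { c.includeTables = append(c.includeTables, tables...) }
}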

@j2gg0s j2gg0s closed this as completed Jan 3, 2025