Operations can refer to previous ones in a migration file #441

kvch · 2024-10-29T15:10:19Z

This draft PR introduces my idea for letting operations refer to resources created by earlier operations in the same migration file. It is not yet complete, but I am opening it to discuss whether this approach is acceptable to everyone.

I introduced two changes to the existing migration process:

Added a new method, DeriveSchema, to Operation. It adds the resources introduced by the operation to the active schema.
Tracking of temporary names during the migration. Some resources (e.g. tables) are created with a temporary name before complete is called.

The consequences of these changes are the following:

newSchema is no longer the "live" schema in the database. During the Validate and Start phases it can contain resources that have not been created yet. At this point, I do not see a problem with this.
All migration types have to implement DeriveSchema.
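To make the proposal concrete, here is a minimal sketch of what a DeriveSchema method could look like. The Schema and Table types, the CreateTable fields, the deriveAll helper and the "_tmp_" naming scheme are all simplified assumptions for illustration, not the types or names used in this PR.

```go
package main

// Schema is a hypothetical, simplified stand-in for the in-memory schema.
// It maps a logical table name to its description.
type Schema struct {
	Tables map[string]*Table
}

// Table records the physical name currently used in the database, which may
// be a temporary name until the migration is completed.
type Table struct {
	Name    string          // physical (possibly temporary) name
	Columns map[string]bool // set of column names
}

// Operation shows only the method discussed here; the other phases
// (Start, Complete, Rollback, Validate) are left out of the sketch.
type Operation interface {
	DeriveSchema(s *Schema)
}

// CreateTable is a hypothetical operation that adds a table.
type CreateTable struct {
	Table   string
	Columns []string
}

// DeriveSchema registers the new table under its logical name and records
// the temporary physical name it carries before completion.
func (o CreateTable) DeriveSchema(s *Schema) {
	cols := make(map[string]bool, len(o.Columns))
	for _, c := range o.Columns {
		cols[c] = true
	}
	s.Tables[o.Table] = &Table{Name: tempName(o.Table), Columns: cols}
}

// tempName is an assumed naming scheme for resources that are not completed yet.
func tempName(name string) string { return "_tmp_" + name }

// deriveAll folds every operation of a migration into the schema, in order,
// so that later operations can see what earlier ones introduce.
func deriveAll(s *Schema, ops []Operation) {
	for _, op := range ops {
		op.DeriveSchema(s)
	}
}
```

After deriveAll runs, the schema describes tables that do not exist in the database yet, which is exactly the "no longer live" consequence noted above.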

Closes #203
Closes #239

kvch · 2024-10-29T15:22:24Z

examples/40_create_table_and_index.json

@@ -0,0 +1,36 @@
+{


This already works

andrew-farries

This looks like roughly a good direction. Below are some thoughts I wrote down during a previous WIP attempt at the same problem.

Making Start, Complete, Rollback, and Validate work

Making Start operations aware of schema changes made by previous migrations is fairly easy; they just need to ensure they look up any table or column names in s.Tables rather than using the name directly.

Making Validate work is harder. Validate could take a schema and update it in the same way that the Start operations do. Then Validate could be made indirection-aware in the same way as the Start methods, looking up all names in s.Tables. Alternatively, Validate could be changed to look up either the final or the temporary version of each name.
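The name indirection described above could be sketched as follows. This is an illustration under assumptions: the Schema and Table types are simplified stand-ins, and physicalName, quoteIdent and createIndexSQL are hypothetical helpers, not functions from the codebase.

```go
package main

import (
	"fmt"
	"strings"
)

// Table holds the physical name of a table as it currently exists in the
// database (possibly a temporary name).
type Table struct {
	Name string
}

// Schema maps logical table names, as written in the migration file,
// to their tables.
type Schema struct {
	Tables map[string]*Table
}

// physicalName resolves a logical table name through s.Tables instead of
// trusting the name from the migration file directly.
func physicalName(s *Schema, logical string) (string, error) {
	t, ok := s.Tables[logical]
	if !ok {
		return "", fmt.Errorf("table %q does not exist", logical)
	}
	return t.Name, nil
}

// quoteIdent quotes a PostgreSQL identifier by doubling embedded quotes.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// createIndexSQL builds its statement against the resolved physical name,
// so it also works for a table created earlier in the same migration.
func createIndexSQL(s *Schema, table, index, column string) (string, error) {
	name, err := physicalName(s, table)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("CREATE INDEX %s ON %s (%s)",
		quoteIdent(index), quoteIdent(name), quoteIdent(column)), nil
}
```

A Validate method could reuse the same lookup: the error returned by physicalName for an unknown table is the validation failure, provided the schema it receives has already been updated by the preceding operations.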

Complete works fine as is; the operations are completed in order so any tables/columns have the final name expected by a subsequent operation.

Not sure about Rollback. Operations would have to be run in reverse order, at least.
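A reverse-order rollback might look like the sketch below. The single-method Operation interface is a deliberate simplification (no context or database connection), and loggingOp exists only to make the ordering visible.

```go
package main

// Operation is reduced to the one method relevant here; this signature is
// an assumption made for the sketch.
type Operation interface {
	Rollback() error
}

// rollbackAll undoes operations last to first, so that an object created by
// an earlier operation still exists while later operations that depend on it
// are rolled back. It stops at the first error.
func rollbackAll(ops []Operation) error {
	for i := len(ops) - 1; i >= 0; i-- {
		if err := ops[i].Rollback(); err != nil {
			return err
		}
	}
	return nil
}

// loggingOp is a hypothetical operation that only records when it is
// rolled back.
type loggingOp struct {
	name string
	log  *[]string
}

// Rollback appends the operation name to the shared log.
func (o loggingOp) Rollback() error {
	*o.log = append(*o.log, o.name)
	return nil
}
```

For a migration that creates a table and then an index on it, the index operation would be rolled back before the table operation.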

Making backfills work

Operations that cause backfills:

add column
drop constraint
alter column

Beware of double-backfills.

Two ops that duplicate the same column - the second op will try to duplicate the duplicated column because of the name mapping in the virtual schema.
Two ops that both create triggers for the same column will fail because they will both try to create a trigger with the same name.
Maybe we choose to not support two ops that backfill the same column in the same migration.
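If the choice is to not support two backfills of the same column, the check could be a small validation pass over the operations. The Op and BackfillTarget types below are hypothetical; how an operation would actually report its backfill target is an open assumption.

```go
package main

import "fmt"

// BackfillTarget identifies the column an operation backfills.
type BackfillTarget struct {
	Table  string
	Column string
}

// Op is a hypothetical summary of an operation: its name and the column it
// backfills, or nil if it performs no backfill.
type Op struct {
	Name     string
	Backfill *BackfillTarget
}

// checkSingleBackfill rejects a migration in which two operations backfill
// the same column, avoiding duplicated columns and trigger name clashes.
func checkSingleBackfill(ops []Op) error {
	seen := make(map[BackfillTarget]string)
	for _, op := range ops {
		if op.Backfill == nil {
			continue
		}
		if prev, ok := seen[*op.Backfill]; ok {
			return fmt.Errorf("operations %q and %q both backfill %s.%s",
				prev, op.Name, op.Backfill.Table, op.Backfill.Column)
		}
		seen[*op.Backfill] = op.Name
	}
	return nil
}
```

Running this during validation would turn the trigger-name failure described above into an explicit error before any SQL is executed.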

Virtual schema

We take the information about the columns from the virtual schema that is built up by each successive Start operation. This virtual schema is a mix of the schema that was retrieved from the database and the schema that is being built up by the operations that have been run so far. This schema contains limited information about new tables and columns - just their names and, for a table, the names of each of its columns.

Restrictions:

Can't alter a column in a table that was created in the same migration, or a column in an existing table that was added by an 'add column' operation.
- New columns have very limited information about them in the virtual schema (no type etc.), so they can't be duplicated, and there won't be enough info about PKs, UNIQUE etc. to be able to perform a backfill.
You can add a new column to a table created in an earlier op, but backfill won't be possible, i.e. you can't specify 'up'.
- Again, the new table doesn't have enough info about it in the virtual schema to be able to perform a backfill.

Is it feasible to refresh the schema after each operation from the database?

This would overwrite any changes made by the operations so far, e.g. the mapping of old names to new names.
Could build a 'schema overlay' that contains the indirections made by the operations so far and put that over the schema retrieved from the database.
It's either that or put enough info about new tables and columns in the virtual schema manually to be able to perform backfills and duplications.
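One possible shape for the 'schema overlay' idea is sketched below, under assumptions: the overlay is modelled as a plain map from logical to physical table names, the refreshed database schema is keyed by physical name, and column-level indirections are ignored. None of these types come from the codebase.

```go
package main

// Table is a simplified table description as read back from the database,
// which would carry full column information after a refresh.
type Table struct {
	Name    string
	Columns []string
}

// Overlay records the indirections introduced by the operations run so far:
// logical table name -> physical (possibly temporary) name.
type Overlay map[string]string

// Apply lays the overlay over a freshly read database schema. Tables with a
// recorded indirection are exposed under their logical name; all others keep
// the name they have in the database.
func (o Overlay) Apply(db map[string]*Table) map[string]*Table {
	physicalToLogical := make(map[string]string, len(o))
	for logical, physical := range o {
		physicalToLogical[physical] = logical
	}
	view := make(map[string]*Table, len(db))
	for physical, t := range db {
		key := physical
		if logical, ok := physicalToLogical[physical]; ok {
			key = logical
		}
		view[key] = t
	}
	return view
}
```

Because the overlay is kept separately from the refreshed schema, re-reading the schema after each operation no longer discards the name mappings; they are reapplied on top of it each time.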

andrew-farries · 2024-11-06T09:29:11Z

A more incremental approach is in #449.

Going one operation at a time, ensuring it can be started and validated when run on objects created in previous operations.

kvch · 2024-11-06T13:42:31Z

Closing in favour of the incremental approach kicked off by @andrew-farries

kvch added 4 commits October 28, 2024 15:51

init

a9869cf

Merge remote-tracking branch 'upstream/main' into feature-derive-schema

9ee1a66

next?

4ef9ddc

more changes

b181ad7

kvch requested a review from andrew-farries October 29, 2024 15:21


exekias reacted with heart emoji

andrew-farries reviewed Oct 30, 2024


kvch closed this Nov 6, 2024
