Operations can refer to previous ones in a migration file #441

Closed
wants to merge 4 commits into from

kvch (Contributor) commented Oct 29, 2024

This draft PR introduces the idea I had for letting operations refer to previously created resources in the same migration file. It is not yet complete, but I am putting it here to discuss whether this approach is acceptable to everyone.

I introduced two changes to the existing migration process:

  1. Added a new method to Operation named DeriveSchema. It adds the resources introduced by the operation to the active schema.
  2. Tracking of temporary names during the migration. Some resources (e.g. tables) are created with a temporary name before Complete is called.
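A minimal sketch of the first change, assuming simplified, hypothetical Schema, Table, OpCreateTable and OpAddColumn types (pgroll's real types and method signatures differ). Each operation's DeriveSchema records what it introduces, and applying the operations in file order lets a later operation find a table that does not exist in the database yet.

```go
package main

import "fmt"

// Table is a hypothetical, simplified entry in the active schema.
type Table struct {
	Columns []string // column names only
}

// Schema is a hypothetical active schema keyed by table name.
type Schema struct {
	Tables map[string]*Table
}

// Operation shows only the method proposed in this PR; operations also have
// Start, Complete, Rollback and Validate, which are omitted here.
type Operation interface {
	DeriveSchema(s *Schema) error
}

// OpCreateTable is a hypothetical "create table" operation.
type OpCreateTable struct {
	Name    string
	Columns []string
}

// DeriveSchema registers the new table so later operations can refer to it.
func (o *OpCreateTable) DeriveSchema(s *Schema) error {
	if _, ok := s.Tables[o.Name]; ok {
		return fmt.Errorf("table %q already exists", o.Name)
	}
	s.Tables[o.Name] = &Table{Columns: append([]string(nil), o.Columns...)}
	return nil
}

// OpAddColumn is a hypothetical "add column" operation.
type OpAddColumn struct {
	Table  string
	Column string
}

// DeriveSchema appends the column to a table that may itself have been
// introduced by an earlier operation in the same file.
func (o *OpAddColumn) DeriveSchema(s *Schema) error {
	t, ok := s.Tables[o.Table]
	if !ok {
		return fmt.Errorf("table %q does not exist", o.Table)
	}
	t.Columns = append(t.Columns, o.Column)
	return nil
}

// deriveAll applies the operations in file order, so each one sees the
// resources introduced by those before it.
func deriveAll(s *Schema, ops []Operation) error {
	for i, op := range ops {
		if err := op.DeriveSchema(s); err != nil {
			return fmt.Errorf("operation %d: %w", i, err)
		}
	}
	return nil
}

func main() {
	s := &Schema{Tables: map[string]*Table{}}
	ops := []Operation{
		&OpCreateTable{Name: "users", Columns: []string{"id"}},
		&OpAddColumn{Table: "users", Column: "email"}, // refers to the table created above
	}
	if err := deriveAll(s, ops); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s.Tables["users"].Columns) // [id email]
}
```

Without the derived entry for "users", the add column operation would fail with "table does not exist" even though the migration file is valid.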

The consequences of these changes are the following:

  • newSchema no longer reflects the "live" schema in the database. During the Validate and Start phases it can contain resources that have not been created yet. For now, I do not see a problem with this.
  • All migration types have to implement DeriveSchema.

Closes #203
Closes #239

kvch requested a review from andrew-farries on October 29, 2024 15:21
Review comment on a new file (diff hunk @@ -0,0 +1,36 @@, starting at "{"):

kvch (Contributor Author): This already works

andrew-farries (Collaborator) left a comment

This looks roughly like a good direction. I've written up some thoughts on this from a previous WIP attempt.

Making Start, Complete, Rollback, and Validate work

Making Start operations aware of schema changes made by previous operations in the migration is fairly easy; they just need to look up any table or column names in s.Tables rather than using the names directly.
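A minimal sketch of that lookup, assuming a hypothetical Schema that maps each logical table name to its current physical name. The "_tmp_" prefix is purely illustrative and is not pgroll's real temporary-name scheme. The add column step resolves the name through s.Tables, so it targets the temporary table created earlier in the same migration.

```go
package main

import (
	"fmt"
	"strings"
)

// Table is a hypothetical entry mapping a logical table name to the
// physical name the table currently has in the database.
type Table struct {
	Name string // physical name; temporary until Complete
}

// Schema is a hypothetical schema keyed by logical table name.
type Schema struct {
	Tables map[string]*Table
}

// quoteIdent quotes a PostgreSQL identifier, doubling embedded quotes.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// tempName is an illustrative temporary-name scheme, not pgroll's own.
func tempName(name string) string {
	return "_tmp_" + name
}

// startCreateTable creates the table under a temporary name and records the
// logical -> physical mapping in the schema.
func startCreateTable(s *Schema, name string) string {
	s.Tables[name] = &Table{Name: tempName(name)}
	return "CREATE TABLE " + quoteIdent(tempName(name)) + " ()"
}

// startAddColumn looks the table up in s.Tables instead of using the
// logical name directly, so it targets the temporary table when one exists.
func startAddColumn(s *Schema, table, column, typ string) (string, error) {
	t, ok := s.Tables[table]
	if !ok {
		return "", fmt.Errorf("table %q does not exist", table)
	}
	return fmt.Sprintf("ALTER TABLE %s ADD COLUMN %s %s",
		quoteIdent(t.Name), quoteIdent(column), typ), nil
}

func main() {
	s := &Schema{Tables: map[string]*Table{}}
	fmt.Println(startCreateTable(s, "users"))
	stmt, err := startAddColumn(s, "users", "email", "text")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(stmt) // ALTER TABLE "_tmp_users" ADD COLUMN "email" text
}
```

For a table that already exists in the database, the entry would simply map the logical name to itself, so the same lookup works in both cases.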

Making Validate work is harder. Validate could take a schema and update it in the same way that the Start operations do. Then Validate could be made indirection-aware in the same way as the Start methods, looking up all names in s.Tables. Alternatively, Validate could be changed to look up either the final or the temporary version of each name.
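A minimal sketch of the first option, assuming a hypothetical Op interface with Validate and Apply methods (Apply stands in for the schema update that Start performs) and a schema reduced to table name -> set of column names. Each operation is validated against a private copy of the schema as left by the operations before it.

```go
package main

import "fmt"

// Schema is a hypothetical virtual schema: table name -> set of column names.
type Schema map[string]map[string]bool

// clone deep-copies the schema so validation never mutates the caller's copy.
func (s Schema) clone() Schema {
	out := make(Schema, len(s))
	for table, cols := range s {
		c := make(map[string]bool, len(cols))
		for name := range cols {
			c[name] = true
		}
		out[table] = c
	}
	return out
}

// Op is a hypothetical operation that validates itself against a schema and
// then applies its own changes to that schema.
type Op interface {
	Validate(s Schema) error
	Apply(s Schema)
}

// CreateTable is a hypothetical "create table" operation.
type CreateTable struct {
	Name    string
	Columns []string
}

func (o CreateTable) Validate(s Schema) error {
	if _, ok := s[o.Name]; ok {
		return fmt.Errorf("table %q already exists", o.Name)
	}
	return nil
}

func (o CreateTable) Apply(s Schema) {
	cols := make(map[string]bool, len(o.Columns))
	for _, c := range o.Columns {
		cols[c] = true
	}
	s[o.Name] = cols
}

// DropColumn is a hypothetical "drop column" operation.
type DropColumn struct {
	Table  string
	Column string
}

func (o DropColumn) Validate(s Schema) error {
	cols, ok := s[o.Table]
	if !ok {
		return fmt.Errorf("table %q does not exist", o.Table)
	}
	if !cols[o.Column] {
		return fmt.Errorf("column %q does not exist in %q", o.Column, o.Table)
	}
	return nil
}

func (o DropColumn) Apply(s Schema) { delete(s[o.Table], o.Column) }

// validateAll validates each op against the schema as left by the ops before it.
func validateAll(s Schema, ops []Op) error {
	work := s.clone()
	for i, op := range ops {
		if err := op.Validate(work); err != nil {
			return fmt.Errorf("operation %d: %w", i, err)
		}
		op.Apply(work)
	}
	return nil
}

func main() {
	base := Schema{"orders": {"id": true}}
	ops := []Op{
		CreateTable{Name: "users", Columns: []string{"id", "email"}},
		DropColumn{Table: "users", Column: "email"},
		DropColumn{Table: "users", Column: "email"}, // invalid: already dropped above
	}
	fmt.Println(validateAll(base, ops))
	// operation 2: column "email" does not exist in "users"
}
```

Working on a copy keeps the caller's schema untouched if validation fails halfway through the file.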

Complete works fine as is; the operations are completed in order so any tables/columns have the final name expected by a subsequent operation.

Not sure about Rollback. Operations would have to be run in reverse order, at least.

Making backfills work

Operations that cause backfills:

  • add column
  • drop constraint
  • alter column

Beware of double-backfills.

  • Two ops that duplicate the same column: the second op will try to duplicate the already-duplicated column because of the name mapping in the virtual schema.
  • Two ops that both create triggers for the same column will fail because they will both try to create a trigger with the same name.
  • Maybe we choose to not support two ops that backfill the same column in the same migration.
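If we go with the last option, the check could be a simple pre-flight pass over the migration file. A minimal sketch, assuming a hypothetical BackfillOp summary that records, for each operation, the table and column it would backfill (empty when it causes no backfill):

```go
package main

import "fmt"

// BackfillOp is a hypothetical summary of an operation in a migration file:
// its kind and the column it would backfill (empty if it causes none).
type BackfillOp struct {
	Kind   string // e.g. "add_column", "alter_column", "drop_constraint"
	Table  string
	Column string
}

// checkSingleBackfill rejects a migration in which two operations would
// backfill the same column, avoiding both double duplication and clashing
// trigger names.
func checkSingleBackfill(ops []BackfillOp) error {
	seen := make(map[[2]string]int) // (table, column) -> index of first op
	for i, op := range ops {
		if op.Column == "" {
			continue
		}
		key := [2]string{op.Table, op.Column}
		if j, ok := seen[key]; ok {
			return fmt.Errorf("operations %d (%s) and %d (%s) both backfill %s.%s",
				j, ops[j].Kind, i, op.Kind, op.Table, op.Column)
		}
		seen[key] = i
	}
	return nil
}

func main() {
	ops := []BackfillOp{
		{Kind: "alter_column", Table: "users", Column: "email"},
		{Kind: "drop_constraint", Table: "users", Column: "email"},
	}
	fmt.Println(checkSingleBackfill(ops))
	// operations 0 (alter_column) and 1 (drop_constraint) both backfill users.email
}
```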

Virtual schema

We take the information about the columns from the virtual schema that is built up by each successive Start operation. This virtual schema is a mix of the schema that was retrieved from the database and the schema that is being built up by the operations that have been run so far. This schema contains limited information about new tables and columns - just their names and, for a table, the names of each of its columns.

Restrictions:

  • Can't alter a column in a table that was created in the same migration, or a column in an existing table that was added by an 'add column' operation.
    • New columns have very limited information about them in the virtual schema (no type, etc.), so they can't be duplicated, and there won't be enough information about PKs, UNIQUE constraints, etc. to perform a backfill.
  • You can add a new column to a table created in an earlier op, but a backfill won't be possible, i.e. you can't specify 'up'.
    • Again, the new table doesn't have enough information about it in the virtual schema to perform a backfill.
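These restrictions can be expressed as a check against the virtual schema. A minimal sketch, assuming hypothetical Column and Table types in which columns added earlier in the migration have an empty Type and only tables read from the database carry key information. For brevity it only checks for a primary key; a real check would also consider UNIQUE constraints.

```go
package main

import (
	"errors"
	"fmt"
)

// Column is a hypothetical virtual-schema column. Columns added earlier in
// the same migration carry only a Name; their Type is left empty.
type Column struct {
	Name string
	Type string
}

// Table is a hypothetical virtual-schema table. Only tables read from the
// database (FromDB) carry key information such as the primary key.
type Table struct {
	Columns    map[string]Column
	PrimaryKey []string
	FromDB     bool
}

// checkBackfill returns an error explaining why the virtual schema holds too
// little information to backfill the given column, or nil if it may proceed.
func checkBackfill(t Table, column string) error {
	if !t.FromDB {
		return errors.New("table was created in this migration: no key information")
	}
	c, ok := t.Columns[column]
	if !ok {
		return fmt.Errorf("column %q not found", column)
	}
	if c.Type == "" {
		return fmt.Errorf("column %q was added in this migration: type unknown", column)
	}
	if len(t.PrimaryKey) == 0 {
		return errors.New("no primary key to identify rows during the backfill")
	}
	return nil
}

func main() {
	users := Table{
		FromDB:     true,
		PrimaryKey: []string{"id"},
		Columns: map[string]Column{
			"id":    {Name: "id", Type: "integer"},
			"email": {Name: "email", Type: "text"},
			"nick":  {Name: "nick"}, // added by an earlier 'add column' op
		},
	}
	fmt.Println(checkBackfill(users, "email")) // <nil>
	fmt.Println(checkBackfill(users, "nick"))
}
```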

Is it feasible to refresh the schema after each operation from the database?

  • This would overwrite any changes made by the operations so far, e.g. the mapping of old names to new names.
  • Could build a 'schema overlay' that contains the indirections made by the operations so far and put that over the schema retrieved from the database.
    The alternative is to manually put enough information about new tables and columns into the virtual schema to be able to perform backfills and duplications.
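A minimal sketch of the overlay idea, assuming hypothetical Tables and Overlay types and covering only table-name indirections (column renames would need a second map). After a refresh, a table created under a temporary name comes back from the database with full type information, and the overlay exposes it under its logical name.

```go
package main

import "fmt"

// Tables is a hypothetical schema shape: table name -> column name -> type.
type Tables map[string]map[string]string

// Overlay records the indirections made by operations so far:
// logical table name -> physical (possibly temporary) table name.
type Overlay map[string]string

// applyOverlay puts the overlay on top of a schema freshly read from the
// database. Physical tables with a logical alias appear only under that
// alias; an overlay entry whose physical table is missing is skipped.
// Column maps are shared with db, not copied.
func applyOverlay(db Tables, ov Overlay) Tables {
	view := make(Tables, len(db))
	aliased := make(map[string]bool, len(ov))
	for logical, physical := range ov {
		if cols, ok := db[physical]; ok {
			view[logical] = cols
			aliased[physical] = true
		}
	}
	for name, cols := range db {
		if aliased[name] {
			continue
		}
		if _, taken := view[name]; !taken { // overlay entries win on name clashes
			view[name] = cols
		}
	}
	return view
}

func main() {
	// After refreshing, the temporary table comes back with full type info.
	db := Tables{
		"_tmp_users": {"id": "integer", "email": "text"},
		"orders":     {"id": "integer"},
	}
	view := applyOverlay(db, Overlay{"users": "_tmp_users"})
	fmt.Println(view["users"]["email"]) // text
	_, leaked := view["_tmp_users"]
	fmt.Println(leaked) // false
}
```

The "_tmp_" prefix is illustrative only.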

andrew-farries (Collaborator) commented Nov 6, 2024

A more incremental approach is in #449.

It goes one operation at a time, ensuring each one can be started and validated when run on objects created in previous operations.

kvch (Contributor Author) commented Nov 6, 2024

Closing in favour of the incremental approach kicked off by @andrew-farries.

kvch closed this Nov 6, 2024