Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rollback operation on Start failure #73

Merged
merged 1 commit into from
Sep 6, 2023

Conversation

andrew-farries
Copy link
Collaborator

Some migration operations create new objects in the database; new columns, triggers and functions. When such a migration fails to start (eg a change column type operation where some existing rows in the database can't be converted to the new type), all database objects created by Start should be cleaned up.

This PR calls Rollback on Start errors to ensure that this cleanup happens.

Copy link
Member

@exekias exekias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should roll back all operations, WDYT?

I all cases I believe we should make it clear in the logs that we failed to proceed and rolled back afterwards

andrew-farries added a commit that referenced this pull request Sep 1, 2023
Implement the **change column type** operation. A change type migration
looks like this:

```json
{
  "name": "18_change_column_type",
  "operations": [
    {
      "change_type": {
        "table": "reviews",
        "column": "rating",
        "type": "integer",
        "up": "CAST(rating AS integer)",
        "down": "CAST(rating AS text)"
      }
    }
  ]
}
```
This migration changes the type of the `rating` column from `TEXT` to
`INTEGER`.

The implementation is very similar to the **set NOT NULL** operation
(#63):

* On `Start`:
  *  Create a new column having the new type
* Backfill the new column with values from the existing column,
converting the types using the `up` SQL.
* Create a trigger to populate the new column when values are written to
the old column, converting types with `up`.
* Create a trigger to populate the old column when values are written to
the new column, converting types with `down`.
* On `Complete`
  * Remove triggers
  * Drop the old column
  * Rename the new column to the old column name.
* On `Rollback`
  * Remove the new column and both triggers.

The migration can fail in at least 2 ways:
* The initial backfill of existing rows on `Start` fails due to the type
conversion not being possible on one or more rows. In the above example,
any existing rows with `rating` values not representable as an `INTEGER`
will cause a failure on `Start`.
* In this case, the failure is reported and the migration rolled back
(#73)
* During the rollout period, unconvertible values are written to the old
version schema. The `up` trigger will fail to convert the values and the
`INSERT`/`UPDATE` will fail.
* Some form of data quarantine needs to be implemented here, copying the
invalid rows elsewhere and blocking completion of the migration until
those rows are handled in some way).

The PR also adds example migrations to the `/examples` directory.
@andrew-farries andrew-farries force-pushed the rollback-on-start-failure branch from 3d3b69c to 2785290 Compare September 1, 2023 12:43
@andrew-farries
Copy link
Collaborator Author

I wonder if we should roll back all operations, WDYT?

I all cases I believe we should make it clear in the logs that we failed to proceed and rolled back afterwards

You mean there are some operations for which you think we shouldn't roll back? This PR adds rollbacks for all Starts, for all operations.

Or do you mean that we should also rollback on failed Completes too?

@exekias
Copy link
Member

exekias commented Sep 5, 2023

You mean there are some operations for which you think we shouldn't roll back? This PR adds rollbacks for all Starts, for all operations.

ah, I didn't realize we rollback the whole migration, all good!

Still, I think the way we handle reporting to the user can be improved. We should clearly state that there was an error with start first, then that we are rolling back. In case of error during roll back show the error too.

It feels like rollback should happen from the cmd package (startCmd), instead of here

@andrew-farries
Copy link
Collaborator Author

Still, I think the way we handle reporting to the user can be improved. We should clearly state that there was an error with start first, then that we are rolling back. In case of error during roll back show the error too.

The current behavior is almost as you describe. In the case of a start + rollback failure the error output is:

Error: unable to execute start operation: failed to backfill column: pq: invalid input syntax for type integer: "not a number"
unable to execute rollback operation: pq: syntax error at or near "ALTERx"

In the case of just a start error (and a successful rollback):

Error: unable to execute start operation: failed to backfill column: pq: invalid input syntax for type integer: "not a number"

We could also add a message in this case (eg 'migration successfully rolled back'), but IMO it's not worth it; users will assume that a failed start will not leave anything behind in the database.

It feels like rollback should happen from the cmd package (startCmd), instead of here

I think the behaviour belongs in the package rather than the CLI as this is functionality that all clients will need.

@andrew-farries andrew-farries merged commit 41c9fff into main Sep 6, 2023
@andrew-farries andrew-farries deleted the rollback-on-start-failure branch September 6, 2023 04:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants