Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only select existing columns in instance database migration and make form and instance migrations follow recommended patterns #3250

Merged
merged 6 commits into from
Aug 1, 2019

Conversation

lognaturel
Copy link
Member

@lognaturel lognaturel commented Jul 26, 2019

Closes #3224

Expected behavior:

  • v1.22 to 1.23: instances are correctly copied over with full status information (same as master)
  • v1.23 to 1.22: instances read from disk again because the downgrade fails (same as master)
  • downgraded v1.22 to v1.23: instances read from disk again because instance info was "stuck" in tmp table (on master, saving instances fails instead)

Note: I considered copying records from the tmp table in the v1.22 -> v1.23 -> v1.22 case but I think that's going to be rare enough that it's not a big deal. It's a little annoying to use status information but not terrible.

What has been done to verify that this works as intended?

Deleted the odk folder, installed v1.22, filled some forms and saved some as finalized and some as not. Upgraded to v1.23, verified that the column removal was correctly performed and that form instance status is the same in the UI.

Downgraded back to v1.22. Verified that the downgrade failed by looking at the db:

  • adb shell
  • cd /sdcard/odk/metadata
  • sqlite3 instances.db
  • pragma user_version (is 4, should be 5 because the downgrade failed)
  • .dump (includes a _tmp table but it shouldn't)

Upgraded back to 1.23, verified that all the necessary cleanup was done with the same steps as above.

Why is this the best possible solution? Were any other approaches considered?

This makes a number of changes to bring migrations in line with how they are intended to work:

  • b0c4a65 fixes the short-term problem by making sure that when copying values from a table, column names that don't exist are never selected. This is the simplest possible approach I could think of.

  • e9988b8 run onUpgrade and onDowngrade in transactions. I believe this was suppressed to avoid the user seeing a nasty SQL error in the case of a migration failure. The side effect is that the user thinks the migration has worked but has a database that's in a corrupt state. An example of that is the downgrade to v1.22. Now, if there is a problem in the migration, the user will know about it on launch and Collect will be unusable until they take action. This more catastrophic failure will hopefully mean we actually know about real migration problems and can fix them rather than users being in a slightly broken state. I considered making this change later but we know it's an issue now and addressing it here means that downgrading to v1.23 will be done transactionally. This will help us catch and address problems like this one more easily.

  • c5773e9 switches the order that the copy and rename in order to drop columns is done to match what is described in https://sqlite.org/lang_altertable.html. I think that with our simple schema this doesn't matter for now but since there's a documented recommendation with a warning that the other way could fail, we should follow the documented process.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

The risk is that the downgrade behavior to v1.23 could be worse in some way. I do think it is more likely to result in scary error messages for users but that will be good in the long term because we will be able to troubleshoot and fix the failures rather than having odd behavior later such as not being able to save instances.

Do we need any specific form for testing your changes? If so, please attach one.

No.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

No.

Before submitting this PR, please make sure you have:

  • run ./gradlew checkAll and confirmed all checks still pass OR confirm CircleCI build passes and run ./gradlew connectedDebugAndroidTest locally.
  • verified that any code or assets from external sources are properly credited in comments and/or in the about file.
  • verified that any new UI elements use theme colors. UI Components Style guidelines

@lognaturel
Copy link
Member Author

@grzesiek2010 @seadowg Would be great to get a review from one or both of you.

@lognaturel lognaturel added this to the v1.23 milestone Jul 26, 2019
@lognaturel lognaturel changed the title Issue 3224 Only select existing columns in instance database migration, perform form and instance database migrations in transactions and delete instances tmp table on upgrade Jul 26, 2019
@lognaturel lognaturel changed the title Only select existing columns in instance database migration, perform form and instance database migrations in transactions and delete instances tmp table on upgrade Only select existing columns in instance database migration and make form and instance migrations follow recommended patterns Jul 26, 2019
Both methods confirm that the provided URI is for the database they're concerned about. Show the database name in a user-friendly way instead of the URI since we know exactly what database we're dealing with.
onUpgrade and onDowngrade are executed within a database transaction. That means that if an exception occurs, the database does not get an updated version number and the version change can try again next time. By catching exceptions, we were allowing the database to get into a corrupt state.
Copy link
Member

@seadowg seadowg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it deals with the problem to me. I think it would be really helpful to have a test that drives out the change though (like onDowngrade_handlesColumnsDeletedInLaterVersions). Firstly I'd hope that could make it clearer what the before and after states are for the database and also hopefully catch problems with our downgrades in the future. It seems like it'd be pretty easy to introduce subtle problems in onUpgrade and onDowngrade that won't get caught until reading from the database later in the app.

@lognaturel
Copy link
Member Author

🙈 Yes, of course, sorry.

@lognaturel
Copy link
Member Author

lognaturel commented Jul 30, 2019

Here is my attempt at an Espresso test against the real database: 3709176. I can't get it to work and I'm out of ideas. It does pass in its current form but the last part of the migration is commented out and the assertion is over the tmp table rather than the actual table. It almost looks like the table rename is failing in some way. I've tried the test files in the real app and they both get migrated as expected. I've tried progressively stranger things like adding an additional transaction around just those two statements or adding a sleep in the test before getting the column names.

@seadowg or @grzesiek2010, if one of you has a chance to look at it during your day, please do. Feel free to push in this branch if you get it working. I'm hoping that somehow I've overlooked something that will be obvious to one of you. If you'd approach the tests in a totally different way, that's fine by me too.

Alternately, consider going straight to QA.

(Ugh, I just realized that moving the save and restore calls out of Before/After into the test itself makes no sense.)

@seadowg
Copy link
Member

seadowg commented Jul 30, 2019

@lognaturel hmmmm. I'll have a play with that test this afternoon.

@seadowg
Copy link
Member

seadowg commented Jul 30, 2019

@lognaturel From experimenting with this it seems to be that the test runs into problems because the SQLiteDatabase object you get back from getReadableDatabase() is somehow out of date. If you add this line after you've initialized the InstanceDatabaseHelper the test passes:

databaseHelper.getReadableDatabase().close();

Why? I'm not 100% sure. From glancing at the source code for SQLiteOpenHelper (onDowngrade is called here) it does look like there is a possibility of getting something stale back from getReadableDatabase() as it doesn't create a new connection after running onUpgrade/onDowngrade - but it does manually set the database version which fooled your assertion around that. As far as I can see we can insert that line (probably with a comment and wrapped in a method) and try not to lose any sleep over it although it is very tempting to dive deeper at some point :neckbeard:.

On a more opinionated note, I'd personally opt to not have these tests be parametrized. I think it's a really nice approach when you have simple inputs/outputs to test. However, in tests like this where we're setting up a "situation" of sorts I find it adds a bit of machinery that can make the test feel further away from documentation of what is supposed to be happening. For example, the assertions are less specific to to the functionality you're testing. I'd opt for something like this:

@Test
public void whenDowngradingFromVersionWithExtraColumn_dropsThatColumn() throws Exception {
    writeDatabaseFile("database" + File.separator + "instances_v7000_added_fakeColumn.db");
    InstancesDatabaseHelper databaseHelper = new InstancesDatabaseHelper();
    databaseHelper.getReadableDatabase().close();

    SQLiteDatabase db = databaseHelper.getReadableDatabase();
    assertThat(db.getVersion(), is(InstancesDatabaseHelper.DATABASE_VERSION));

    List<String> newColumnNames = InstancesDatabaseHelper.getInstancesColumnNames(db);
    assertThat(newColumnNames, not(hasItem("fakeColumn")));
}

To me it feels like we get a test that's code and assertions (and their failure messages) are more specific. What do you think?

@lognaturel
Copy link
Member Author

lognaturel commented Jul 30, 2019

it does look like there is a possibility of getting something stale back from getReadableDatabase() as it doesn't create a new connection after running onUpgrade/onDowngrade

But... but... onDowngrade and onUpgrade are wrapped in transactions! The fact that the assertions over the tmp table passed means that the tmp table was properly being populated. Isn't a transaction supposed to be... transactional?!

try not to lose any sleep over it although it is very tempting to dive deeper at some point :neckbeard:.

Arrrrrg. Ok, I'll try. Thanks so much for this new idea, as much as I still don't understand it.

it adds a bit of machinery that can make the test feel further away from documentation of what is supposed to be happening.

I don't disagree with you. I did it for two reasons: to ensure that the two cases I was working with were identical as I was experimenting and to set us up for a really low-friction way of testing all future migrations. I added upgrade from v4 to illustrate what I mean. If we were to do them as separate cases we'd end up having to copy, paste and tweak for each new database dump, right?

I typically do prefer really targeted assertions. In this case, we get errors that look like:

  • extra column - Expected: iterable containing ["_id", "displayName", "submissionUri", "canEditWhenComplete", "instanceFilePath", "jrFormId", "jrVersion", "status", "date", "deletedDate"] but: Not matched: "fakeColumn"
  • missing column - Expected: iterable containing ["_id", "displayName", "submissionUri", "canEditWhenComplete", "instanceFilePath", "jrFormId", "jrVersion", "status", "date", "deletedDate"] but: item 6: was "status"

I think these are pretty clear. Or do you still think it'd be better to split them out?

@seadowg
Copy link
Member

seadowg commented Jul 30, 2019

@lognaturel Yeha I'm very confused. My only worry is that it's a problem with our onDowngrade code in some subtle way... should we make an issue/task to have a fresh look into it (using a template app maybe) and double check it's definitely an Android problem? At least then we can file an issue with them.

I don't disagree with you. I did it for two reasons: to ensure that the two cases I was working with were identical as I was experimenting and to set us up for a really low-friction way of testing all future migrations.

Ah yeah I see what you're going for. I'm not a huge fan of the JUnit Parameterized stuff (this is probably my "bad attitude" though). I'm happy with how it is now and we can see how it evolves as we add more migration tests!

@lognaturel lognaturel dismissed seadowg’s stale review July 30, 2019 21:46

@seadowg says: "I'm happy with how it is now"

@lognaturel
Copy link
Member Author

lognaturel commented Jul 31, 2019

My only worry is that it's a problem with our onDowngrade code in some subtle way

Absolutely, same here. Let's see whether perhaps QA finds an issue. I've also posted on Stackoverflow to see whether someone else has an explanation. Depending on what we find out, I'll file the issue.

not a huge fan of the JUnit Parameterized stuff

I don't love the ✨magic ✨involved, mostly from the dependency injection. It overall feels a bit inelegant to me (e.g. parameterized annotations). What are your complaints?

@seadowg
Copy link
Member

seadowg commented Jul 31, 2019

Ah nice move on the SO. I'll make sure to upvote.

I don't love the ✨magic ✨involved, mostly from the dependency injection. It overall feels a bit inelegant to me (e.g. parameterized annotations). What are your complaints?

Same here: it just always feels clunky. I also don't like that it ends up meaning your parameterized test takes up a whole test class. In this case that's actually ok I suppose as upgrade/downgrade/creation is the only thing the class should be responsible for really.

@mmarciniak90
Copy link
Contributor

Downgrade apk is not allowed but I was able to reproduce the problem when I uninstalled apk.

Repro steps:

  1. v1.22.4 was installed, one form was finished -> Collect was uninstalled, /odk directory stayed on a device
  2. The version without fix was installed, one form was finished -> Collect was uninstalled, /odk directory stayed on a device
  3. v1.22.4 was installed, one form was finished -> Collect was uninstalled, /odk directory stayed on a device
  4. The version without fix was installed, one form was finished -> Error was visible

I also verified the case with beta 0:

  1. v1.22.4 was installed, one form was finished -> Collect was uninstalled, /odk directory stayed on a device
  2. v1.23.beta0 was installed, one form was finished -> Collect was uninstalled, /odk directory stayed on a device
  3. v1.22.4 was installed, one form was finished -> Collect was uninstalled, /odk directory stayed on a device
  4. The version with fix was installed, one form was finished -> Form was finished with success

@lognaturel I noticed one more case where the user is still blocked. What do you think about it? Do you think that it is unlikely? I can create a separate issue for it.

Test Case where the error occurred and apk with fix is installed:

  1. v1.22.4 was installed, one form was finished -> Collect was uninstalled, /odk directory stayed on a device
  2. The version built on the master was installed, one form was finished -> Collect was uninstalled, /odk directory stayed on a device
  3. v1.22.4 was installed, one form was finished -> Collect was uninstalled, /odk directory stayed on a device
  4. The version built on the master was installed, one form was finished -> Error was visible. I did NOT uninstall Collect
  5. Collect with the fix was installed. Problem was still visible but error message changed
    Screenshot_20190731-121653

@lognaturel
Copy link
Member Author

Thanks, @mmarciniak90! I changed the message in 96f4e69 but looks like I forgot to describe that in the PR description, sorry. Basically, we know exactly what database we're failing to insert into so we might as well use a user-friendly description rather than the confusing URI.

You said:

v1.22.4 was installed, one form was finished -> Collect was uninstalled, /odk directory stayed on a device
The version built on the master was installed, one form was finished -> Error was visible. I did NOT uninstall Collect

This makes sense. When you went to 1.22, the database was downgraded to v4 with the down migration that left the tmp table. Then, when you installed from master, the database was upgraded to v5 but with no cleanup of the tmp table since that is added by this PR. Then when you installed the version from this PR, the database wasn't touched because the version is already v5.

We could change the database version to v6 now to ensure that anyone who had upgraded, downgraded, then upgraded again would be ok. I suppose it's possible some beta users could hit this but I would tend not to worry about it. I'll let you and @grzesiek2010 discuss whether you want to do that or whether you're ok releasing this as-is. If you want to add a v6, I believe the only changes needed would be to change the version to 6 and to add an onUpgrade case that exactly matches the v5 one.

@grzesiek2010
Copy link
Member

I agree the risk is not hight here and I would merge it. @mmarciniak90 le us know if everything else is fine.

@mmarciniak90
Copy link
Contributor

@grzesiek2010 yes, everything else is ok!

@opendatakit-bot unlabel "needs testing"
@opendatakit-bot label "behavior verified"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unexpected error while saving form. Failed to insert row.
5 participants