
MySQL -> Postgres validation #813

Closed
kardan opened this issue Oct 17, 2014 · 21 comments

kardan commented Oct 17, 2014

We need to be 100% sure that all data is transferred and that nothing has broken. Hence we need to create a one-time test suite that validates that we are operational and can make the switch.

Part of the validation process has to be manual. We can't script a backup, so we will need a checklist, something like this:

Checklist (draft)

  • Make sure Postgres is installed, with a database owned by the correct user (created from template0)
  • Turn off traffic
  • Back up MySQL
  • Run script to collect validation data
  • Run the mysql->pg migration script
  • Massage the new db (remove trailing whitespace, drop unused tables such as workflow.*)?
  • Run validation script
  • Enable the new pg database settings
  • Turn traffic back on
  • Take care of the old MySQL database

Validation script

Needs to support both collecting data and validating against that data.

  • Make sure all rows are transferred (count all table rows?)
  • Validate the integrity (can one do md5s on all rows?)
  • Validate things like number of items and sums via the Django ORM
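As a rough sketch of the row-count check, the comparison itself can be a pure helper like the hypothetical one below; in the real script the counts would come from the Django ORM, e.g. `Model.objects.using('mysql').count()` versus `Model.objects.using('default').count()` for each model:

```python
# Sketch of the row-count validation step (hypothetical helper, not the
# actual script). Takes per-table counts gathered from each database and
# reports the tables whose counts differ.

def diff_counts(mysql_counts, pg_counts):
    """Return {table: (mysql_count, pg_count)} for tables that differ."""
    tables = set(mysql_counts) | set(pg_counts)
    return {
        t: (mysql_counts.get(t), pg_counts.get(t))
        for t in tables
        if mysql_counts.get(t) != pg_counts.get(t)
    }
```

An empty result would mean every table transferred the same number of rows.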
@kardan kardan self-assigned this Oct 17, 2014
@kardan kardan added this to the V3 - v43 milestone Oct 17, 2014
kardan added a commit that referenced this issue Oct 21, 2014

carlio commented Oct 22, 2014

Can you also run this process with the test data, please, so that newly provisioned boxes have some data in them?

carlio added a commit that referenced this issue Oct 22, 2014
…box to use a new one which comes with postgres

kardan commented Oct 22, 2014

Sure!


kardan commented Oct 24, 2014

Good progress on this. I need to add the tests I had before I rewrote the script, but that should not require much effort.


kardan commented Oct 27, 2014

Running the script on a larger dataset I found a snag with location data. ProjectLocation items starting at id 259 have a precision above what the Postgres double precision type can hold (15 significant digits). Hence the long/lat values are truncated in the Postgres data. Since we don't want to lose data this is a problem.

BUT then the MySQL original latitude data is: "5.19583258792768" - that is a very precise location.
If I understand https://en.wikipedia.org/wiki/Decimal_degrees correctly, 8 decimals would locate a point almost to within a mm. We have 14 decimals. This makes me think that either I have something backwards, or we have oddly precise locations stored.
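For reference, a quick back-of-the-envelope check (assuming roughly 111.32 km per degree of latitude, the equatorial approximation from that Wikipedia page) bears this out:

```python
# Rough size, in metres, of one unit in the n-th decimal place of a
# coordinate in decimal degrees, assuming ~111.32 km per degree
# (equatorial approximation; varies slightly with latitude).
METERS_PER_DEGREE = 111320.0

def decimal_place_size_m(decimals):
    """Metres represented by a change in the last decimal place."""
    return METERS_PER_DEGREE / (10 ** decimals)
```

At 8 decimals one step is about 1.1 mm; at 14 decimals it is around a nanometre, far below anything a real GPS fix could justify.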


kardan commented Oct 27, 2014

A tour with vagrant:

We need to pull down a new base box with Postgres installed, so on the feature/813_postgres_v2 branch:

```shell
$ cd vagrant
$ vagrant destroy
$ git checkout -b feature/813_postgres_v2 origin/feature/813_postgres_v2
$ vagrant up
```

This will pull down the new base box with postgres installed. Once we have that, let's set up postgres:

```shell
$ vagrant ssh
vagrant@rsr1:~$ sudo su postgres
postgres@rsr1:/home/vagrant$ psql
postgres=# CREATE USER rsr WITH PASSWORD 'password';
CREATE ROLE
postgres=# CREATE DATABASE rsr OWNER rsr TEMPLATE template0 ENCODING 'UTF8';
CREATE DATABASE
postgres=# \q
postgres@rsr1:/home/vagrant$ exit
```

Now we need to reconfigure our database settings to use postgres as the default, and name the old MySQL connection "mysql", which the validation script expects. Add the following to your 66_local.conf:

```python
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'rsr',
        'USER': 'rsr',
        'PASSWORD': 'password',
        'HOST': '192.168.50.101',
        'PORT': '5432',
    },

    'mysql': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'rsr',
        'USER': 'rsr',
        'PASSWORD': 'password',
        'HOST': 'mysql-localdev.localdev.akvo.org',
        'PORT': '',
    }
}
```

Let's start by deleting all old sessions, truncating the django_session table:

```shell
$ mysql --user=rsr --password=password rsr
mysql> TRUNCATE TABLE django_session;
mysql> \q
```

Now it's time to set up the migration tool. The first time the script is executed a config file will be generated, so we will start by running the script, then edit the config, and finally run the script with our settings.

```shell
vagrant@rsr1:~$ sudo su rsr
rsr@rsr1:/home/vagrant$ cd /var/akvo/rsr/
rsr@rsr1:~$ . venv/bin/activate
(venv)rsr@rsr1:~$ cd code
```

There is a config file in the dev branch, but maybe not on live, so these instructions assume a non-committed config file.

```shell
(venv)rsr@rsr1:~/code$ py-mysql2pgsql
No configuration file found.
A new file has been initialized at: mysql2pgsql.yml
Please review the configuration and retry...
(venv)rsr@rsr1:~/code$ vi mysql2pgsql.yml
```

Make sure the config file looks like this:

```yaml
mysql:
 hostname: mysql-localdev.localdev.akvo.org
 port: 3306
 username: rsr
 password: password
 database: rsr
 compress: false
destination:
 # if file is given, output goes to file, else postgres
 file:
 postgres:
  hostname: 192.168.50.101
  port: 5432
  username: rsr
  password: password
  database: rsr
```

Now we can migrate and then run the validation script. Here we might want to introduce a massage script that cleans up some data and removes unused tables - we need to review that. Data with trailing whitespace is something we probably want to clean up. Anyway, let's migrate and validate:

```shell
(venv)rsr@rsr1:~/code$ py-mysql2pgsql
(venv)rsr@rsr1:~/code$ validate_postgres.py
....
--------------------------------------------------------
Ran n tests in n s
OK

Done

(venv)rsr@rsr1:~/code$ exit
exit
vagrant@rsr1:~$ exit
$
```

Now we can remove the mysql database config entry. We still have the old mysql database installed so we should come up with a plan for taking care of that.
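As a minimal sketch of the "massage" step from the checklist (a hypothetical helper, not the actual script), trailing whitespace could be stripped row by row like this; in practice it would more likely be done per text column in SQL, e.g. `UPDATE some_table SET col = rtrim(col)`:

```python
# Sketch: strip trailing whitespace from the string values of one row,
# leaving non-string values (ids, numbers, dates) untouched. The row is
# represented here as a plain dict of column name -> value.

def massage_row(row):
    """Return a copy of the row with trailing whitespace stripped
    from all string values."""
    return {
        key: value.rstrip() if isinstance(value, str) else value
        for key, value in row.items()
    }
```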

kardan added a commit that referenced this issue Oct 28, 2014
Script that inspects the databases and compare
via the Django ORM.

kardan commented Oct 30, 2014

@zzgvh & @KasperBrandt I updated the comment above, would be glad for input / review. Actual validation is here: https://github.com/akvo/akvo-rsr/blob/feature/813_postgres/validate_postgres_data.py


kardan commented Oct 30, 2014

Note: I gave up on using checksums on things like collections / columns, since that would have implied keeping things in a specific order.
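For the record, an order-independent checksum is possible by hashing each row individually and combining the digests with a commutative operation such as addition; this is a sketch of that idea, not what the validation script actually does:

```python
# Sketch: order-independent table checksum. Each row (serialized as a
# string) is hashed on its own, and the 128-bit digests are summed
# modulo 2**128. Because addition is commutative, row order does not
# affect the result.
import hashlib

def table_checksum(rows):
    """Order-independent checksum over an iterable of row strings."""
    total = 0
    for row in rows:
        digest = hashlib.md5(row.encode('utf-8')).hexdigest()
        total = (total + int(digest, 16)) % (1 << 128)
    return total
```

The trade-off is that a sum of hashes is weaker than a hash over an ordered stream, but for a one-off migration check it would have avoided sorting every table the same way on both sides.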

@KasperBrandt KasperBrandt modified the milestones: V3 - v44 Alpha, V3 - v43 Nov 4, 2014
KasperBrandt commented

My only comment is that when running (venv)rsr@rsr1:~/code$ py-mysql2pgsql for the first time, there was already a config file (which is similar to the one mentioned above). Other than that, RSR seems to run fine with postgres and all tests pass when these steps are followed.

I only tested it on the test database so far, will try importing the live DB later.


kardan commented Nov 12, 2014

@zzgvh would be grateful for another go, now that the script consumes a bit less RAM.


kardan commented Nov 12, 2014

@carlio how do the VMs compare to test/staging/production in terms of memory?


carlio commented Nov 12, 2014

Which VM? The Vagrant ones? I think they only have 384MB by default, but you can tweak that in the Vagrantfile like this: https://github.com/akvo/akvo-provisioning/blob/develop/vagrant/boxes/puppet/Vagrantfile#L30

For test/UAT/live the machines have 4GB/2GB/4GB but test1 has many other services while uat1 is mostly empty.


carlio commented Nov 12, 2014

Is it possible to run this offline separately? We could fire up a specialised VM only for this task with loads of RAM if you want.


kardan commented Nov 12, 2014

@carlio mostly asking so we know what will happen when we get to the test -> production phase.

kardan added a commit that referenced this issue Nov 18, 2014
Reverted to not merge with future/rsr_v3 but base
on develop.
kardan added a commit that referenced this issue Nov 25, 2014
- added pg_alts.py a script that alters to the column
  types Django expects from Postgres.
- fixed flaky asserts by using other module load routine
- renamed validate_postgres.py to pg_tests.py
- created migration script
kardan added a commit that referenced this issue Nov 25, 2014
Since we switch user the password env variable was not carried
through.
kardan added a commit that referenced this issue Nov 25, 2014

kardan commented Nov 25, 2014

New "help" at https://github.com/akvo/akvo-rsr/blob/feature/813_postgres_v2/pg_readme.md that includes a fix for the bug with wrong column types.

kardan added a commit that referenced this issue Nov 25, 2014
@adriancollier adriancollier modified the milestones: 2.6 Conakry, V3 - v49 Dec 15, 2014
KasperBrandt added a commit that referenced this issue Dec 15, 2014
adriancollier commented

All work finalised - with this change, RSR should run on a Postgres database.

@KasperBrandt KasperBrandt modified the milestones: 2.6 Conakry, RSR v3 Jan 20, 2015
@MichaelAkvo MichaelAkvo moved this to Done in RSR Dec 8, 2022
@MichaelAkvo MichaelAkvo added this to RSR Dec 8, 2022