MySQL -> Postgres validation #813
Can you also do this process with the testdata, please? So that newly provisioned boxes have some data in them.
…box to use a new one which comes with postgres
Sure!
Good progress on this; I need to add the tests I had before I rewrote the script, but that should not require much effort.
Running the script on a larger dataset I found a snag with location data. ProjectLocation items starting with id 259 have a precision that is above what the Postgres double precision type supports (15 significant digits). Hence the long/lat is truncated in the postgres data. Since we don't want to lose data this is a problem. BUT then the original mysql latitude data is "5.19583258792768" - that is a very precise location.
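For context on the precision claim: Postgres `double precision` is an IEEE 754 double, which is only guaranteed to round-trip 15 significant decimal digits, exactly what this latitude has. A quick illustration in Python (whose `float` is the same IEEE double); the specific values are just examples:

```python
import sys
from decimal import Decimal

# A double is only guaranteed to preserve 15 significant decimal digits.
print(sys.float_info.dig)  # 15

lat = "5.19583258792768"  # 15 significant digits: survives a double exactly
assert str(float(lat)) == lat

# 19 digits: silently rounds to the very same double, i.e. digits are lost.
too_precise = "5.195832587927680123"
assert float(too_precise) == float(lat)

# Storing the text in a Decimal (or a NUMERIC column) keeps every digit.
assert str(Decimal(too_precise)) == too_precise
```

So if we genuinely need more than 15 digits, the column would have to be `NUMERIC` rather than `double precision`; though latitude at that precision is far below a micrometer, so the extra digits are almost certainly noise.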
A tour with vagrant: We need to pull down a new base box with Postgres installed, so on the feature/813_postgres_v2 branch:

```shell
$ cd vagrant
$ vagrant destroy
$ git checkout -b feature/813_postgres_v2 origin/feature/813_postgres_v2
$ vagrant up
```
This will pull down the new base box with postgres installed. Once we have that, let's set up postgres:
```shell
$ vagrant ssh
vagrant@rsr1:~$ sudo su postgres
postgres@rsr1:/home/vagrant$ psql
postgres=# CREATE USER rsr WITH PASSWORD 'password';
CREATE ROLE
postgres=# CREATE DATABASE rsr OWNER rsr TEMPLATE template0 ENCODING 'UTF8';
CREATE DATABASE
postgres=# \q
postgres@rsr1:/home/vagrant$ exit
```

Now we need to reconfigure our database settings to use postgres as the default, and add the old mysql database under the alias "mysql", which the validation script expects. Add the following to your 66_local.conf:

```python
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'rsr',
        'USER': 'rsr',
        'PASSWORD': 'password',
        'HOST': '192.168.50.101',
        'PORT': '5432',
    },
    'mysql': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'rsr',
        'USER': 'rsr',
        'PASSWORD': 'password',
        'HOST': 'mysql-localdev.localdev.akvo.org',
        'PORT': '',
    }
}
```

Let's start by deleting all old sessions, truncating the django_session table:

```shell
$ mysql --user=rsr --password=password rsr
mysql> TRUNCATE TABLE django_session;
mysql> \q
```

Now it's time to set up the migration tool. The first time the script is executed a config file will be generated, so we will start by running the script, then edit the config, and finally run the script with our settings.

```shell
vagrant@rsr1:~$ sudo su rsr
rsr@rsr1:/home/vagrant$ cd /var/akvo/rsr/
rsr@rsr1:~$ . venv/bin/activate
(venv)rsr@rsr1:~$ cd code
```

There is a config in the dev branch but maybe not on live, so these instructions are for a non-committed config file:

```shell
(venv)rsr@rsr1:~/code$ py-mysql2pgsql
No configuration file found.
A new file has been initialized at: mysql2pgsql.yml
Please review the configuration and retry...
(venv)rsr@rsr1:~/code$ vi mysql2pgsql.yml
```

Make sure the config file looks like this:

```yaml
mysql:
  hostname: mysql-localdev.localdev.akvo.org
  port: 3306
  username: rsr
  password: password
  database: rsr
  compress: false
destination:
  # if file is given, output goes to file, else postgres
  file:
  postgres:
    hostname: 192.168.50.101
    port: 5432
    username: rsr
    password: password
    database: rsr
```

Now we can migrate and then run the validation script. Here we might want to introduce a massage script that cleans up some data / removes tables; we need to review that. But cleaning up data with trailing whitespace is something we might want to do. Anyway, let's migrate and validate:

```shell
(venv)rsr@rsr1:~/code$ py-mysql2pgsql
(venv)rsr@rsr1:~/code$ validate_postgres.py
....
--------------------------------------------------------
Ran n tests in n s

OK
Done
(venv)rsr@rsr1:~/code$ exit
exit
vagrant@rsr1:~$ exit
$
```

Now we can remove the mysql database config entry. We still have the old mysql database installed, so we should come up with a plan for taking care of that.
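The actual validation script lives in the repo and does much more, but the core idea of comparing the two databases can be sketched as a per-table row-count check. The helper name `compare_row_counts` is hypothetical, and sqlite3 is used here only as a stand-in so the sketch is runnable; against the real databases you would use MySQLdb and psycopg2 cursors instead:

```python
import sqlite3

def compare_row_counts(conn_a, conn_b, tables):
    """Return {table: (count_a, count_b)} for tables whose row counts differ."""
    mismatches = {}
    for table in tables:
        count_a = conn_a.execute("SELECT COUNT(*) FROM %s" % table).fetchone()[0]
        count_b = conn_b.execute("SELECT COUNT(*) FROM %s" % table).fetchone()[0]
        if count_a != count_b:
            mismatches[table] = (count_a, count_b)
    return mismatches

# Stand-in databases: "mysql" has 3 rows, "postgres" lost one in migration.
mysql_db = sqlite3.connect(":memory:")
pg_db = sqlite3.connect(":memory:")
for conn, rows in ((mysql_db, 3), (pg_db, 2)):
    conn.execute("CREATE TABLE rsr_project (id INTEGER)")
    conn.executemany("INSERT INTO rsr_project VALUES (?)",
                     [(i,) for i in range(rows)])

print(compare_row_counts(mysql_db, pg_db, ["rsr_project"]))
# -> {'rsr_project': (3, 2)}
```

Row counts alone of course don't prove the contents match, which is why the real script also compares field values.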
A script that inspects the databases and compares them via the Django ORM.
@zzgvh & @KasperBrandt I updated the comment above and would be glad for input / review. The actual validation is here: https://github.com/akvo/akvo-rsr/blob/feature/813_postgres/validate_postgres_data.py
Note: I gave up on using checksums on things like collections / columns, since that would have implied keeping things in a specific order.
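To spell out the ordering problem: a checksum over a column only matches if both databases return the rows in exactly the same order, which the migration does not guarantee. A small illustration (this is just a sketch of the pitfall, not what the script does; sorting first is one order-insensitive workaround):

```python
import hashlib

def naive_checksum(values):
    # Order-dependent: hashes values in the order the database returned them.
    return hashlib.md5("".join(values).encode()).hexdigest()

def order_insensitive_checksum(values):
    # Sorting first makes the digest independent of row order.
    return hashlib.md5("".join(sorted(values)).encode()).hexdigest()

mysql_rows = ["alpha", "beta", "gamma"]
postgres_rows = ["gamma", "alpha", "beta"]  # same data, different row order

print(naive_checksum(mysql_rows) == naive_checksum(postgres_rows))  # False
print(order_insensitive_checksum(mysql_rows)
      == order_insensitive_checksum(postgres_rows))                 # True
```

The catch is that sorting every column in both databases is expensive on large tables, which makes the direct ORM comparison more practical here.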
My only comment is that I have only tested it on the test database so far; I will try importing the live DB later.
@zzgvh would be grateful for another go, now that the script is a bit less RAM-consuming.
@carlio how do the VMs compare to test/staging/production in terms of memory?
Which VM? The Vagrant ones? I think they only have 384MB by default, but you can tweak that in the Vagrantfile like this: https://github.com/akvo/akvo-provisioning/blob/develop/vagrant/boxes/puppet/Vagrantfile#L30 For test/UAT/live the machines have 4GB/2GB/4GB, but test1 runs many other services while uat1 is mostly empty.
Is it possible to run this offline separately? We could fire up a specialised VM just for this task with loads of RAM if you want.
@carlio mostly asking so we know what will happen when we get to test -> production phase |
Reverted to not merge with future/rsr_v3 but base on develop.
- added pg_alts.py, a script that alters the column types to what Django expects from Postgres
- fixed flaky asserts by using another module load routine
- renamed validate_postgres.py to pg_tests.py
- created migration script
Since we switch users, the password env variable was not carried through.
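For anyone hitting the same thing: switching users with `su`/`sudo` typically starts from a reset environment, so variables exported in the outer shell are gone in the inner one. A small simulation using `env -i` to stand in for the cleared environment (`PGPASSWORD` is just an example variable name, not necessarily the one the script uses):

```shell
export PGPASSWORD=password
# The variable is visible in the current shell...
echo "${PGPASSWORD:-unset}"
# ...but a process started with a clean environment (as su/sudo typically do)
# does not see it:
env -i sh -c 'echo "${PGPASSWORD:-unset}"'
```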
New "help" at https://github.com/akvo/akvo-rsr/blob/feature/813_postgres_v2/pg_readme.md that includes a fix for the bug with wrong column types.
All work finalised - with this change, RSR should run on a Postgres database.
We need to be 100% sure that all data is transferred and also that nothing has broken. Hence we need to create a one-time test suite that validates that we are operational and can make the switch.
Part of the validation process has to be manual. We can't script a backup, so we will have to have a checklist, something like this:
Checklist (draft)
Validation script
Needs to support both collecting data and validating against that data.
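The two modes could look something like this minimal sketch (the function names and the JSON snapshot format are made up for illustration, and sqlite3 is only a runnable stand-in; the real script is validate_postgres_data.py in the repo): `collect` writes a snapshot of per-table row counts, `validate` compares the current database against it.

```python
import json
import os
import sqlite3
import tempfile

def collect(conn, tables, snapshot_path):
    """Phase 1: snapshot per-table row counts to a JSON file."""
    counts = {t: conn.execute("SELECT COUNT(*) FROM %s" % t).fetchone()[0]
              for t in tables}
    with open(snapshot_path, "w") as f:
        json.dump(counts, f)

def validate(conn, snapshot_path):
    """Phase 2: compare current counts against the snapshot; list problems."""
    with open(snapshot_path) as f:
        expected = json.load(f)
    problems = []
    for table, count in sorted(expected.items()):
        actual = conn.execute("SELECT COUNT(*) FROM %s" % table).fetchone()[0]
        if actual != count:
            problems.append("%s: expected %d rows, found %d"
                            % (table, count, actual))
    return problems

# Demo against an in-memory stand-in database:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rsr_project (id INTEGER)")
conn.execute("INSERT INTO rsr_project VALUES (1)")
snapshot_path = os.path.join(tempfile.mkdtemp(), "snapshot.json")
collect(conn, ["rsr_project"], snapshot_path)
print(validate(conn, snapshot_path))  # -> []
```

Collecting against MySQL before the migration and validating against Postgres after it gives the one-time check described above.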