fix(datahub-upgrade): Add support for Postgres migration #2740

Merged
Changes from 4 commits
@@ -39,7 +39,15 @@ spec:
       containers:
         - name: datahub-upgrade-job
          image: "{{ .Values.datahubUpgrade.image.repository }}:{{ .Values.datahubUpgrade.image.tag }}"
-          args: [ "-u", "NoCodeDataMigration", "-a", "batchSize=1000", "-a", "batchDelayMs=100" ]
+          args:
+            - "-u"
+            - "NoCodeDataMigration"
+            - "-a"
+            - "batchSize=1000"
+            - "-a"
+            - "batchDelayMs=100"
+            - "-a"
+            - "dbType={{ .Values.datahubUpgrade.noCodeDataMigration.sqlDbType }}"
          env:
            - name: DATAHUB_GMS_HOST
              value: {{ printf "%s-%s" .Release.Name "datahub-gms" }}
2 changes: 2 additions & 0 deletions datahub-kubernetes/datahub/values.yaml

@@ -56,6 +56,8 @@ datahubUpgrade:
   image:
     repository: acryldata/datahub-upgrade
     tag: "v0.8.3"
+  noCodeDataMigration:
+    sqlDbType: "MY_SQL"

 global:
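With this default in place, a Postgres deployment only needs to override the new value. A sketch using `--set` (the release name and chart path here are illustrative, not taken from this PR):

```
helm upgrade datahub ./datahub-kubernetes/datahub \
  --set datahubUpgrade.noCodeDataMigration.sqlDbType=POSTGRES
```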
@@ -7,9 +7,16 @@
import io.ebean.EbeanServer;
import java.util.function.Function;

// Do we need SQL-tech specific migration paths?
> **Contributor:** Heh, I think we got our answer 😅
>
> **Collaborator (author):** lol. yes!

public class CreateAspectTableStep implements UpgradeStep {

+  private static final String DB_TYPE_ARG = "dbType";
+
+  enum DbType {
+    MY_SQL,
+    POSTGRES,
+    MARIA
+  }

  private final EbeanServer _server;

  public CreateAspectTableStep(final EbeanServer server) {
@@ -29,18 +36,45 @@ public int retryCount() {
  @Override
  public Function<UpgradeContext, UpgradeStepResult> executable() {
    return (context) -> {
+
+      DbType targetDbType = context.parsedArgs().containsKey(DB_TYPE_ARG)
+          ? DbType.valueOf(context.parsedArgs().get(DB_TYPE_ARG).get())
+          : DbType.MY_SQL;
+
+      String sqlUpdateStr;
+
+      switch (targetDbType) {
+        case POSTGRES:
+          sqlUpdateStr = "CREATE TABLE IF NOT EXISTS metadata_aspect_v2 (\n"
+              + "    urn varchar(500) not null,\n"
+              + "    aspect varchar(200) not null,\n"
+              + "    version bigint(20) not null,\n"
> **Contributor:** After some testing, it seems that bigint(20) is not supported in Postgres. The equivalent is simply bigint, it seems.
>
> **Collaborator (author):** Yes, you're right. Just also verified. Thanks!
+ " metadata text not null,\n"
+ " systemmetadata text,\n"
+ " createdon timestamp not null,\n"
+ " createdby varchar(255) not null,\n"
+ " createdfor varchar(255),\n"
+ " constraint pk_metadata_aspect primary key (urn,aspect,version)\n"
> **Contributor** (suggested change): replace the named constraint line
> `+ "    constraint pk_metadata_aspect primary key (urn,aspect,version)\n"`
> with
> `+ "    primary key (urn,aspect,version)\n"`
+ ")";
break;
default:
// both mysql and maria
sqlUpdateStr = "CREATE TABLE IF NOT EXISTS metadata_aspect_v2 (\n"
+ " urn varchar(500) not null,\n"
+ " aspect varchar(200) not null,\n"
+ " version bigint(20) not null,\n"
+ " metadata longtext not null,\n"
+ " systemmetadata longtext,\n"
+ " createdon datetime(6) not null,\n"
+ " createdby varchar(255) not null,\n"
+ " createdfor varchar(255),\n"
+ " constraint pk_metadata_aspect primary key (urn,aspect,version)\n"
+ ")";
break;
}

try {
_server.execute(_server.createSqlUpdate("CREATE TABLE IF NOT EXISTS metadata_aspect_v2 (\n"
+ " urn varchar(500) not null,\n"
+ " aspect varchar(200) not null,\n"
+ " version bigint(20) not null,\n"
+ " metadata longtext not null,\n"
+ " systemmetadata longtext,\n"
+ " createdon datetime(6) not null,\n"
+ " createdby varchar(255) not null,\n"
+ " createdfor varchar(255),\n"
+ " constraint pk_metadata_aspect primary key (urn,aspect,version)\n"
+ ")"));
_server.execute(_server.createSqlUpdate(sqlUpdateStr));
} catch (Exception e) {
context.report().addLine(String.format("Failed to create table metadata_aspect_v2: %s", e.toString()));
return new DefaultUpgradeStepResult(
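Taken together, the two review threads above imply a corrected Postgres DDL roughly like the one below: a sketch that applies both reviewer suggestions (plain `bigint`, unnamed primary key), not necessarily the statement as finally merged:

```
CREATE TABLE IF NOT EXISTS metadata_aspect_v2 (
    urn varchar(500) NOT NULL,
    aspect varchar(200) NOT NULL,
    version bigint NOT NULL,           -- bigint(20) is MySQL syntax; Postgres only accepts plain bigint
    metadata text NOT NULL,
    systemmetadata text,
    createdon timestamp NOT NULL,
    createdby varchar(255) NOT NULL,
    createdfor varchar(255),
    PRIMARY KEY (urn, aspect, version) -- unnamed key, per the suggested change above
);
```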
1 change: 1 addition & 0 deletions docker/datahub-upgrade/README.md
@@ -10,6 +10,7 @@ As of today, there are 2 supported upgrades:
to metadata_aspect_v2 table. Arguments:
- *batchSize* (Optional): The number of rows to migrate at a time. Defaults to 1000.
- *batchDelayMs* (Optional): The number of milliseconds of delay between migrated batches. Used for rate limiting. Defaults to 250.
+- *dbType* (optional): The target DB type. Valid values are `MY_SQL`, `MARIA`, `POSTGRES`.
> **Contributor** (@frsann, Jun 22, 2021): If it's optional, which is the default?
>
> **Contributor:** +1

2. **NoCodeDataMigrationCleanup**: Cleanses graph index, search index, and key-value store of legacy DataHub data (metadata_aspect table) once
the No Code Data Migration has completed successfully. No arguments.
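To answer the thread above from the code: when `dbType` is omitted, `CreateAspectTableStep` falls back to `MY_SQL`. A sketch of a full direct-Docker invocation targeting Postgres, combining the documented arguments (the values shown are illustrative):

```
docker pull acryldata/datahub-upgrade && \
  docker run acryldata/datahub-upgrade:latest \
    -u NoCodeDataMigration \
    -a batchSize=1000 \
    -a batchDelayMs=250 \
    -a dbType=POSTGRES
```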
32 changes: 22 additions & 10 deletions docs/advanced/no-code-upgrade.md
@@ -64,22 +64,31 @@
cd docker/datahub-upgrade/nocode
./run_upgrade.sh
```

-In both cases, the default environment variables will be used (`docker/datahub-upgrade/env/docker.env`). These assume
-that your deployment is local. If this is not the case, you'll need to define your own environment variables to tell the
-upgrade system where your DataHub containers reside.
+Using this command, the default environment variables will be used (`docker/datahub-upgrade/env/docker.env`). These assume
+that your deployment is local and that you are running MySQL. If this is not the case, you'll need to define your own
+environment variables to tell the upgrade system where your DataHub containers reside and run.

-You can either
+To update the default environment variables, you can either

1. Change `docker/datahub-upgrade/env/docker.env` in place and then run one of the above commands OR
-2. Define a new ".env" file containing your variables and
-   execute `docker pull acryldata/datahub-upgrade && docker run acryldata/datahub-upgrade:latest -u NoCodeDataMigration`
+2. Define a new ".env" file containing your variables and execute `docker pull acryldata/datahub-upgrade && docker run acryldata/datahub-upgrade:latest -u NoCodeDataMigration`

To see the required environment variables, see the [datahub-upgrade](../../docker/datahub-upgrade/README.md)
documentation.
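For the second option, the quoted command does not show how the ".env" file actually reaches the container; with plain Docker that would typically be the `--env-file` flag. A sketch, assuming a hypothetical file named `datahub-upgrade.env`:

```
docker pull acryldata/datahub-upgrade && \
  docker run --env-file datahub-upgrade.env acryldata/datahub-upgrade:latest -u NoCodeDataMigration
```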

+To run the upgrade against a database other than MySQL, you can use the `-a dbType=<db-type>` argument.
+
+Execute:
+```
+./docker/datahub-upgrade.sh -u NoCodeDataMigration -a dbType=POSTGRES
+```
+
+where dbType can be one of `MY_SQL`, `MARIA`, `POSTGRES`.

#### Docker Compose Deployments - Lose All Existing Data

-This path is quickest but will wipe your Datahub's database.
+This path is quickest but will wipe your DataHub's database.

+If you want to make sure your current data is migrated, refer to the "Docker Compose Deployments - Preserve Data" section above.
+If you are okay with losing your data and re-ingesting, this approach is simplest.

@@ -98,7 +107,7 @@
git pull origin master
./docker/ingestion/ingestion.sh
```

-After that, you will be upgraded and good to go.
+After that, you will be ready to go.


##### How to fix the "listening to port 5005" issue
@@ -130,6 +139,9 @@ Once the storage layer has been migrated, subsequent runs of this job will be a

### Step 3 (Optional): Cleaning Up

+Warning: This step clears all legacy metadata. If something is wrong with the upgraded metadata, there will be no easy way to
+re-run the migration.

This step involves removing data from previous versions of DataHub. This step should only be performed once you've
validated that your DataHub deployment is healthy after performing the upgrade. If you're able to search, browse, and
view your Metadata after the upgrade steps have been completed, you should be in good shape.
@@ -147,11 +159,11 @@
cd docker/datahub-upgrade/nocode
./run_clean.sh
```

-In both cases, the default environment variables will be used (`docker/datahub-upgrade/env/docker.env`). These assume
+Using this command, the default environment variables will be used (`docker/datahub-upgrade/env/docker.env`). These assume
that your deployment is local. If this is not the case, you'll need to define your own environment variables to tell the
upgrade system where your DataHub containers reside.

-You can either
+To update the default environment variables, you can either

1. Change `docker/datahub-upgrade/env/docker.env` in place and then run one of the above commands OR
2. Define a new ".env" file containing your variables and execute
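The diff is collapsed at this point, so the command for option 2 is cut off. Going by the upgrade IDs documented in the datahub-upgrade README (the cleanup upgrade takes no arguments), the invocation would presumably look like this sketch, again with a hypothetical env-file name:

```
docker pull acryldata/datahub-upgrade && \
  docker run --env-file datahub-upgrade.env acryldata/datahub-upgrade:latest -u NoCodeDataMigrationCleanup
```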