Provide some flexibility on partition writes during migrations #53

Closed
dstreev opened this issue May 18, 2023 · 0 comments

dstreev (Collaborator) commented May 18, 2023

We have options to use or not use dynamic optimizations or specify prescriptive optimizations with DISTRIBUTE BY.

When partitions are large, the write uses only one writer per partition, which can take a long time depending on the partition size.

For large partitions, we need to be able to add an extra level to the 'DISTRIBUTE BY' so that more writers are used.
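
To illustrate the idea, here is a minimal Python sketch of how an extra `DISTRIBUTE BY` level might be derived from partition size. It is not hms-mirror's implementation: the function name `distribute_by_clause`, the `target_writer_bytes` parameter, and the 1 GiB default are all assumptions made for this example. It relies only on Hive accepting expressions such as `FLOOR(RAND() * n)` in `DISTRIBUTE BY`.

```python
import math


def distribute_by_clause(partition_cols, avg_partition_bytes,
                         target_writer_bytes=1 << 30):
    """Build a DISTRIBUTE BY clause for a partitioned write.

    Hypothetical helper: when the average partition is larger than
    target_writer_bytes (assumed default: 1 GiB), append a random
    bucket expression so rows of one partition spread over several writers.
    """
    # Number of writers wanted per partition, never fewer than one.
    buckets = max(1, math.ceil(avg_partition_bytes / target_writer_bytes))
    cols = ", ".join(f"`{c}`" for c in partition_cols)
    if buckets == 1:
        # Small partitions: one writer per partition is enough.
        return f"DISTRIBUTE BY {cols}"
    # Large partitions: the extra level splits each partition across
    # `buckets` reducers.
    return f"DISTRIBUTE BY {cols}, FLOOR(RAND() * {buckets})"


print(distribute_by_clause(["dt"], 512 * 1024**2))
print(distribute_by_clause(["dt"], 5 * 1024**3))
```

With these assumed inputs, a 512 MiB partition keeps the plain clause, while a 5 GiB partition gets a second distribution level with five buckets.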

dstreev added the enhancement (New feature or request) label May 18, 2023
dstreev self-assigned this May 18, 2023
dstreev added a commit to dstreev/hms-mirror that referenced this issue Jun 1, 2023
…es for large tables. We'll make adjustments to DISTRIBUTE BY and Tez Groupings to provide more efficient/balanced migrations with better/more optimized file sizes after migration for migrations using SQL. cloudera-labs/hms-mirror#53

- Additional table filters (`-tfs|--table-filter-size-limit` and `-tfp|--table-filter-partition-count-limit`) that check a table's data size and partition count against limits can also be applied to narrow the range of tables you'll process. cloudera-labs/hms-mirror#55
- Add property to tables migrated with "STORAGE_MIGRATION" to identify and filter them out from future runs. cloudera-labs/hms-mirror#56
- `-cto|--compress-text-output` option and additional session level settings using basic stats.
- HDP3 scenario that doesn't support MANAGEDLOCATION element in database properties. cloudera-labs/hms-mirror#52

- AVRO Schema Only fix. cloudera-labs/hms-mirror#58
- Clean up messaging around legacy config settings.
- Fixed/added `dbRegEx` command line parameter: cloudera-labs/hms-mirror#57

Configuration Breaking Change. If you see a note about `A configuration element is no longer valid, progress.  Please remove the element from the configuration yaml and try again.` together with `Caused by: com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "tblRegEx"`, remove the properties `dbRegEx`, `tblRegEx`, and `tblExcludeRegEx` from the config yaml.
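
As a rough aid for the cleanup described above, the following Python sketch scans a config file's text and reports the lines that define the three removed properties. It is not part of hms-mirror: the function name `find_deprecated_keys` is made up for this example, and the sample YAML (including `someOtherSetting`) is invented. It uses a simple line match rather than a YAML parser, so treat the result as a hint and review the file by hand.

```python
import re

# Properties named in the breaking-change note above.
DEPRECATED_KEYS = ("dbRegEx", "tblRegEx", "tblExcludeRegEx")


def find_deprecated_keys(yaml_text):
    """Return (line_number, key) pairs for deprecated keys in the text.

    Hypothetical helper: matches `key:` at any indentation, line by line.
    """
    pattern = re.compile(r"^\s*(" + "|".join(DEPRECATED_KEYS) + r")\s*:")
    hits = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        match = pattern.match(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits


# Invented sample config, for illustration only.
sample = 'someOtherSetting: 4\ntblRegEx: "foo.*"\ndbRegEx: null\n'
for lineno, key in find_deprecated_keys(sample):
    print(f"line {lineno}: remove `{key}`")
```
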
dstreev added this to the 1.5.6.0 milestone Jun 1, 2023
dstreev closed this as completed Jun 1, 2023