Glue catalog to Hive Metastore Migration script not working with partition table #15
Comments
Thank you for identifying the issue. I reproduced the problem. It looks like there is a bug here: https://github.com/awslabs/aws-glue-samples/blob/fd8cab884e6f636be37f677cbfd7db7f6e9fc6ab/utilities/Hive_metastore_migration/src/hive_metastore_migration.py#L809 That code parses partition keys and values from Glue into Hive partition names. The generated partName looks like "year(string),month(string)=2015,02", whereas Hive expects something like "year=2015/month=02". I found that "year(string),month(string)=2015,02" actually works for a "DESCRIBE test" query on Hive 1.0.0, so the author may have written the code against that version of Hive. But "year=2015/month=02" is the standard format, so I'll push a bug fix for it.
I apologize that I can't provide an ETA yet. A thorough fix may take some time to be pushed to GitHub, but to unblock yourself immediately, you can paste the quick-fix code snippet below in place of the problematic function. It should work most of the time.
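The conversion described above can be sketched roughly as follows. This is only an illustration of the expected "key=value/key=value" format, not the code from the repository: the function name `hive_partition_name` and its arguments are hypothetical, and it assumes the key names arrive without type suffixes such as "(string)" and that the values need no path escaping.

```python
def hive_partition_name(keys, values):
    """Join partition keys and values into a Hive-style partition name.

    Hypothetical helper for illustration only. Assumes `keys` are plain
    column names (no "(string)" type suffix) and `values` are strings
    that contain no characters requiring escaping.
    """
    if len(keys) != len(values):
        raise ValueError("partition keys and values must have the same length")
    # Pair each key with its value, then join the pairs with "/".
    return "/".join("{}={}".format(k, v) for k, v in zip(keys, values))


if __name__ == "__main__":
    print(hive_partition_name(["year", "month"], ["2015", "02"]))
    # year=2015/month=02
```

A real fix would also have to handle values containing special characters, which this sketch deliberately leaves out.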
Got the same problem, thanks.
I'm running the script to migrate Glue catalog data, crawled from Hive-style (key=value) partitioned data in S3, to a Hive metastore (MySQL). The partition that gets created in the Hive metastore is incorrect.
[+] https://github.com/awslabs/aws-glue-samples/tree/master/utilities/Hive_metastore_migration
Note: The Glue catalog data crawled from the partitioned S3 data looks fine: launching a new EMR cluster with the Glue catalog works, and the partition information is correct. Athena, using Glue, is also able to find the table's partitions properly. But the script that migrates table information from the Glue catalog to the metastore gets it wrong, creating completely incorrect partition information in the Hive metastore.
** Please find the steps carried out: **
================
3) Kindly note that the catalog-2-migration script (export_from_datacatalog.py) fails with the following key constraint error:
"duplicate entry for key 'UNIQUE_DATABASE'
.....
java.sql.BatchUpdateException: Field 'IS_REWRITE_ENABLED' doesn't have a default value"
I found that the column 'IS_REWRITE_ENABLED' is in the table hive.TBLS. Strangely, the table definition allows this column to be NULL, yet the Spark job complains about a missing default value. So I manually logged in to my Hive metastore and set a default value:
ALTER TABLE hive.TBLS ALTER IS_REWRITE_ENABLED SET DEFAULT 1;
After this small change, the Glue ETL job completed successfully. But the partition generated by the script is totally incorrect.
Partition messed up
Although the table description is the same:
So this is clearly an issue with the migration script, and I'm stuck in our migration process.
Kindly look into the issue on an urgent basis and either fix the script or provide a workaround.