CDP 7.x+ runs Hive 3. Platforms running Hive 1/2 (CDH 5/6 and HDP 2) need to upgrade/convert their tables so they are compatible with Hive 3.
The most significant adjustment is the definition of a "Managed Table". In Hive 1/2, a managed table is defined as "Drop the table, drop the data." In this state, Hive maintains the table's state in both the 'metastore' AND the 'filesystem'. In Hive 3, the meaning of "Managed Table" has been elevated: a managed table is a transactional (ACID) table.
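To make the distinction concrete, the sketch below classifies a legacy table into the categories used in the conversion table that follows. It assumes you already have each table's metastore type (`MANAGED_TABLE` or `EXTERNAL_TABLE`) and its `TBLPROPERTIES`, where a Hive 1/2 ACID table carries `'transactional'='true'`. The `TableInfo` class and `legacy_category` function are hypothetical helpers for illustration, not part of hive-sre.

```python
from dataclasses import dataclass, field


@dataclass
class TableInfo:
    """Hypothetical holder for the metastore facts needed to classify a table."""
    name: str
    table_type: str  # metastore table type, e.g. "MANAGED_TABLE" or "EXTERNAL_TABLE"
    params: dict = field(default_factory=dict)  # the table's TBLPROPERTIES


def legacy_category(table: TableInfo) -> str:
    """Map a Hive 1/2 table to a 'Legacy Table Type' from the conversion table."""
    if table.table_type == "EXTERNAL_TABLE":
        return "External"
    # Hive 1/2 ACID tables are flagged with 'transactional'='true'.
    transactional = table.params.get("transactional", "false").lower() == "true"
    if table.table_type == "MANAGED_TABLE":
        return "Managed (Transactional)" if transactional else "Managed (non-transactional)"
    # Views and other object types are out of scope for this sketch.
    raise ValueError(f"unexpected table type {table.table_type!r} for {table.name}")
```

For example, `legacy_category(TableInfo("sales.orders", "MANAGED_TABLE"))` returns `"Managed (non-transactional)"`, the category that Hive 1/2 treats as "drop the table, drop the data" but Hive 3 does not.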
Review the table below for details on what 'should' be converted to what.
Legacy Table Type | New Table Type | CDH Recommendation | HDP Recommendation |
---|---|---|---|
External | External | External | External |
Managed (non-transaction) | External / Purge | N/A | External / Purge 1 |
Managed (Transactional) | Managed (Transactional) | N/A | Managed (Transactional) |
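As a sketch of what the "External / Purge" recommendation means in HiveQL, the function below emits an `ALTER TABLE ... SET TBLPROPERTIES` statement that marks a legacy managed, non-transactional table as external (`'EXTERNAL'='TRUE'`) while keeping "drop the table, drop the data" behavior (`'external.table.purge'='true'`). The function name and category strings are illustrative assumptions that mirror the "Legacy Table Type" column; the scripts hive-sre actually generates may differ in form.

```python
from typing import Optional


def conversion_ddl(database: str, table: str, category: str) -> Optional[str]:
    """Return a HiveQL statement for a legacy table, or None if no change is needed.

    Only 'Managed (non-transactional)' tables are rewritten, to External/Purge.
    External and Managed (Transactional) tables keep their type.
    """
    if category in ("External", "Managed (Transactional)"):
        return None
    if category == "Managed (non-transactional)":
        # EXTERNAL=TRUE keeps the table out of Hive 3's transactional managed space;
        # external.table.purge=true makes DROP TABLE also remove the data.
        return (
            f"ALTER TABLE `{database}`.`{table}` SET TBLPROPERTIES "
            "('EXTERNAL'='TRUE', 'external.table.purge'='true');"
        )
    raise ValueError(f"unknown category: {category!r}")
```

Note that the sketch deliberately has no path that converts a legacy managed table into a transactional one.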
Do NOT migrate "Legacy Managed" tables to "Managed Transactional" tables. This is not meant to dissuade you from using ACID/Transactional tables in any way. It has been our experience that ACID tables support certain access methods that aren't compatible
The Cloudera upgrade documentation for CDH clusters outlines a series of steps that reduce downtime and hasten the 'Hive' upgrade process by skipping the detailed evaluation of 'every' table in the metastore. That work, however, STILL needs to be done. The hive-sre tool generates scripts (below) that make up for the steps skipped.
If you follow the CDP docs to 'expedite' the Hive upgrade process, these steps can be done 'post' upgrade, BEFORE you release the cluster to the community. Process 1 (missing directories) can be done BEFORE the upgrade and may even reduce the number of items captured in the other steps.
Run the u3 process against all your databases and act on the results from these process IDs:
- 1 - Missing Directories
- 3 - Managed Table Conversions
- 5 - Kudu, SerDe, and Decimal scale/precision issues
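Acting on Process 1 (Missing Directories) amounts to creating any table or partition location that the metastore references but the filesystem lacks. The sketch below assumes you already have the list of locations and a callable that checks whether a path exists on HDFS; both `missing_dir_commands` and the `exists` checker are hypothetical, and hive-sre's own output format may differ. It only prints `hdfs dfs -mkdir -p` commands for review; it does not run them.

```python
from typing import Callable, Iterable, List


def missing_dir_commands(locations: Iterable[str], exists: Callable[[str], bool]) -> List[str]:
    """Return one 'hdfs dfs -mkdir -p' command per distinct location that does not exist.

    `exists` is assumed to query the target filesystem (e.g. HDFS); the caller supplies it.
    """
    commands: List[str] = []
    seen = set()
    for location in locations:
        if location in seen:  # partitions can share a location; emit each path once
            continue
        seen.add(location)
        if not exists(location):
            commands.append(f'hdfs dfs -mkdir -p "{location}"')
    return commands
```

With a stand-in checker such as `lambda p: p in known_paths`, the function can be exercised without a cluster; in practice `exists` would wrap a real HDFS client or `hdfs dfs -test -d`.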