When the source table is well organized AND has large files, the DISTRIBUTE BY and dynamic-partition sorting options, which introduce a REDUCE phase, get choked up on the writes: each partition is written by a single reducer as a single file, when we really want multiple writers.
Added a CLI option: -so|--skip-optimizations. This sets hive.optimize.sort.dynamic.partition=false and does NOT add DISTRIBUTE BY to the generated SQL statements.
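To make the difference concrete, here is a rough sketch of the two statement shapes involved. The table and column names (target_tbl, source_tbl, dt) are hypothetical; this is not the tool's literal output.

    -- Sketch only: target_tbl, source_tbl and the partition column dt are hypothetical.

    -- Default path: the sort/distribute optimization adds a reduce phase, so each
    -- partition ends up written by a single reducer.
    SET hive.optimize.sort.dynamic.partition=true;
    INSERT OVERWRITE TABLE target_tbl PARTITION (dt)
    SELECT col1, col2, dt FROM source_tbl
    DISTRIBUTE BY dt;

    -- With -so|--skip-optimizations: the property is disabled and DISTRIBUTE BY is
    -- omitted, leaving a map-only plan with multiple writers per partition.
    SET hive.optimize.sort.dynamic.partition=false;
    INSERT OVERWRITE TABLE target_tbl PARTITION (dt)
    SELECT col1, col2, dt FROM source_tbl;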
When the source table is well compressed and the files are large, a map-only job is fine for the transfer. In that case no files are combined, so each mapper writes only to the partition its source file came from.
Also consider setting hive.exec.orc.split.strategy=BI to help with file organization.
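As a rough illustration, the session settings ahead of the transfer INSERT might look like the following. Only hive.exec.orc.split.strategy and hive.optimize.sort.dynamic.partition come from the notes above; the dynamic-partition properties are a common extra requirement for dynamic partition inserts, not something the tool is documented to set.

    -- BI split strategy: generate per-file splits without reading ORC footers, so each
    -- large source file keeps its own mapper and lands in its original partition.
    SET hive.exec.orc.split.strategy=BI;
    -- Keep the plan map-only (what -so|--skip-optimizations sets).
    SET hive.optimize.sort.dynamic.partition=false;
    -- Assumption: typically needed for dynamic partition inserts; not part of the notes above.
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;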