Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variable precision timestamp support for Hive write operations #5283

Merged
merged 6 commits into from
Nov 3, 2020

Conversation

aalbu
Copy link
Member

@aalbu aalbu commented Sep 24, 2020

Write path for timestamps now honors the precision specified in the Hive config (or session). Stats collection is only supported for millisecond precision; when a higher precision is specified, timestamp columns will be ignored for the purpose of collecting stats. The limitation will be addressed when #5170 is implemented.

@cla-bot cla-bot bot added the cla-signed label Sep 24, 2020
@aalbu aalbu added the WIP label Sep 24, 2020
@aalbu aalbu force-pushed the hive-nano-write branch 2 times, most recently from 403b98a to a05eb58 Compare September 24, 2020 16:47
@aalbu aalbu removed the WIP label Sep 25, 2020
@aalbu aalbu requested a review from findepi September 25, 2020 01:38
@findepi
Copy link
Member

findepi commented Sep 25, 2020

The data gets written with abundant precision

presto:tpch> set session hive.timestamp_precision = 'MILLISECONDS';
SET SESSION
presto:tpch> create table t(a) as SELECT localtimestamp(9);
CREATE TABLE: 1 row

presto:tpch> DESC t;
 Column |     Type     | Extra | Comment
--------+--------------+-------+---------
 a      | timestamp(3) |       |

but then:

presto:tpch> set session hive.timestamp_precision = 'NANOSECONDS';
SET SESSION
presto:tpch> SELECT * FROM t;
               a
-------------------------------
 2020-09-25 16:04:25.672304000

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initial feedback

@findepi
Copy link
Member

findepi commented Sep 28, 2020

@dain a question to you -- a05fb8b#r495844291

@electrum a question to you -- a05fb8b#r495843768

(posting here, as this could have been lost in the noise)

Copy link
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments (mostly minor)

@aalbu aalbu force-pushed the hive-nano-write branch 3 times, most recently from 8b3b684 to 2b157a5 Compare October 1, 2020 16:19
Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%

@@ -522,7 +523,7 @@ else if (insertExistingPartitionsBehavior == InsertExistingPartitionsBehavior.ER
}

List<Type> types = dataColumns.stream()
.map(column -> column.getHiveType().getType(typeManager))
.map(column -> column.getHiveType().getType(typeManager, getTimestampPrecision(session).getPrecision()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getType(TypeManager) overload is still used; for example io.prestosql.plugin.hive.HiveMetadata#finishCreateTable, io.prestosql.plugin.hive.HiveMetadata#finishInsert call sites look relevant

please make sure there is an issue covering cleanup of the other ones

Copy link
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments. @dain should look at the potential RCBinary writer regression

@aalbu aalbu force-pushed the hive-nano-write branch 2 times, most recently from b2158ed to b483624 Compare November 3, 2020 00:50
@electrum electrum merged commit 7cf9f39 into trinodb:master Nov 3, 2020
@electrum
Copy link
Member

electrum commented Nov 3, 2020

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants