Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error while parsing ColumnStatistics of type timestamp #20929

Closed
aaneja opened this issue Sep 21, 2023 · 3 comments
Closed

Error while parsing ColumnStatistics of type timestamp #20929

aaneja opened this issue Sep 21, 2023 · 3 comments
Assignees
Labels
bug iceberg Apache Iceberg related

Comments

@aaneja
Copy link
Contributor

aaneja commented Sep 21, 2023

The SPI allows ColumnStatistics of any type to be retrieved during SHOW STATS queries
This does not work for timestamp types since toStringLiteral and we throw an IllegalArgumentException

private static Expression toStringLiteral(Type type, double value)
{
if (type.equals(BigintType.BIGINT) || type.equals(IntegerType.INTEGER) || type.equals(SmallintType.SMALLINT) || type.equals(TinyintType.TINYINT)) {
return new StringLiteral(Long.toString(round(value)));
}
if (type.equals(DoubleType.DOUBLE) || type instanceof DecimalType) {
return new StringLiteral(Double.toString(value));
}
if (type.equals(RealType.REAL)) {
return new StringLiteral(Float.toString((float) value));
}
if (type.equals(DATE)) {
return new StringLiteral(LocalDate.ofEpochDay(round(value)).toString());
}
throw new IllegalArgumentException("Unexpected type: " + type);
}

Your Environment

An Iceberg table with a timestamp colum and table statistics up-to-date

 CREATE TABLE timestamp_test ( 
    "t" timestamp                               
 )                                              
 WITH (                                         
    format = 'PARQUET'                          
 )  

INSERT INTO timestamp_test values(TIMESTAMP '2001-08-22 03:04:05.321')

SHOW STATS FOR timestamp_test

Expected Behavior

SHOW STATS FOR <table>
should show min/max values for a timestamp column

Current Behavior

The query fails with below error (copied from the query info JSON) :

"failureInfo" : {
    "type" : "java.lang.IllegalArgumentException",
    "message" : "Unexpected type: timestamp",
    "suppressed" : [ ],
    "stack" : [ "com.facebook.presto.sql.rewrite.ShowStatsRewrite$Visitor.toStringLiteral(ShowStatsRewrite.java:358)", "com.facebook.presto.sql.rewrite.ShowStatsRewrite$Visitor.lambda$toStringLiteral$2(ShowStatsRewrite.java:341)", "java.base/java.util.Optional.map(Optional.java:265)", "com.facebook.presto.sql.rewrite.ShowStatsRewrite$Visitor.toStringLiteral(ShowStatsRewrite.java:341)", "com.facebook.presto.sql.rewrite.ShowStatsRewrite$Visitor.createColumnStatsRow(ShowStatsRewrite.java:300)", "com.facebook.presto.sql.rewrite.ShowStatsRewrite$Visitor.buildStatisticsRows(ShowStatsRewrite.java:281)", "com.facebook.presto.sql.rewrite.ShowStatsRewrite$Visitor.rewriteShowStats(ShowStatsRewrite.java:206)", "com.facebook.presto.sql.rewrite.ShowStatsRewrite$Visitor.visitShowStats(ShowStatsRewrite.java:146)", "com.facebook.presto.sql.rewrite.ShowStatsRewrite$Visitor.visitShowStats(ShowStatsRewrite.java:112)", "com.facebook.presto.sql.tree.ShowStats.accept(ShowStats.java:45)", "com.facebook.presto.sql.tree.AstVisitor.process(AstVisitor.java:27)", "com.facebook.presto.sql.rewrite.ShowStatsRewrite.rewrite(ShowStatsRewrite.java:109)", "com.facebook.presto.sql.rewrite.StatementRewrite.rewrite(StatementRewrite.java:58)", "com.facebook.presto.sql.analyzer.Analyzer.analyzeSemantic(Analyzer.java:90)", "com.facebook.presto.execution.SqlQueryExecution.\u003Cinit\u003E(SqlQueryExecution.java:206)", "com.facebook.presto.execution.SqlQueryExecution.\u003Cinit\u003E(SqlQueryExecution.java:103)", "com.facebook.presto.execution.SqlQueryExecution$SqlQueryExecutionFactory.createQueryExecution(SqlQueryExecution.java:934)", "com.facebook.presto.dispatcher.LocalDispatchQueryFactory.lambda$createDispatchQuery$0(LocalDispatchQueryFactory.java:168)", "com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)", "com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)", "com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)", "java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)", "java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)", "java.base/java.lang.Thread.run(Thread.java:829)" ],
    "errorCode" : {
      "code" : 65536,
      "name" : "GENERIC_INTERNAL_ERROR",
      "type" : "INTERNAL_ERROR",
      "retriable" : false
    },
    "errorCause" : "UNKNOWN"
  }

Possible Solution

Fix the toStringLiteral method

@aaneja
Copy link
Contributor Author

aaneja commented Sep 21, 2023

We don't see this bug for Hive tables because we don't set the range unless the column type is one of the below ones

private static boolean isRangeSupported(Type type)
{
return type.equals(TINYINT)
|| type.equals(SMALLINT)
|| type.equals(INTEGER)
|| type.equals(BIGINT)
|| type.equals(REAL)
|| type.equals(DOUBLE)
|| type.equals(DATE)
|| type instanceof DecimalType;
}

Note that TIMESTAMP is missing (and has a base of AbstractLongType)

@aaneja aaneja added the iceberg Apache Iceberg related label Sep 21, 2023
@github-project-automation github-project-automation bot moved this to 🆕 Unprioritized in Iceberg Support Sep 21, 2023
@gupteaj gupteaj self-assigned this Oct 26, 2023
@gupteaj gupteaj moved this from 🆕 Unprioritized to 🏗 In progress in Iceberg Support Oct 27, 2023
@gupteaj
Copy link
Contributor

gupteaj commented Oct 31, 2023

PR #21286

@gupteaj
Copy link
Contributor

gupteaj commented Oct 31, 2023

We don't see this bug for Hive tables because we don't set the range unless the column type is one of the below ones

private static boolean isRangeSupported(Type type)
{
return type.equals(TINYINT)
|| type.equals(SMALLINT)
|| type.equals(INTEGER)
|| type.equals(BIGINT)
|| type.equals(REAL)
|| type.equals(DOUBLE)
|| type.equals(DATE)
|| type instanceof DecimalType;
}

Note that TIMESTAMP is missing (and has a base of AbstractLongType)

@aaneja , @tdcmeehan , Hive tables will need additional support in HiveColumnStatistics class, similar to Date type.
It will need to calculate min/max values for timestamp column. I think, we can open a new issue for hive tables.

public static HiveColumnStatistics merge(HiveColumnStatistics first, HiveColumnStatistics second)
    {
        return new HiveColumnStatistics(
                mergeIntegerStatistics(first.getIntegerStatistics(), second.getIntegerStatistics()),
                mergeDoubleStatistics(first.getDoubleStatistics(), second.getDoubleStatistics()),
                mergeDecimalStatistics(first.getDecimalStatistics(), second.getDecimalStatistics()),
                mergeDateStatistics(first.getDateStatistics(), second.getDateStatistics()),
                mergeBooleanStatistics(first.getBooleanStatistics(), second.getBooleanStatistics()),
                reduce(first.getMaxValueSizeInBytes(), second.getMaxValueSizeInBytes(), MAX, true),
                reduce(first.getTotalSizeInBytes(), second.getTotalSizeInBytes(), ADD, true),
                reduce(first.getNullsCount(), second.getNullsCount(), ADD, false),
                reduce(first.getDistinctValuesCount(), second.getDistinctValuesCount(), MAX, false));
    }

@tdcmeehan tdcmeehan moved this from 🏗 In progress to 👀 Review in Iceberg Support Oct 31, 2023
@gupteaj gupteaj moved this from 👀 Review to ✅ Done in Iceberg Support Nov 3, 2023
@gupteaj gupteaj closed this as completed Nov 3, 2023
@github-project-automation github-project-automation bot moved this from 🆕 Unprioritized to ✅ Done in Bugs and support requests Nov 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug iceberg Apache Iceberg related
Projects
Archived in project
Status: Done
Development

No branches or pull requests

2 participants