
Result mismatch when data contains empty map #4587

Closed
yma11 opened this issue Jan 31, 2024 · 7 comments
Labels
bug (Something isn't working), triage

Comments

@yma11
Contributor

yma11 commented Jan 31, 2024

Backend

VL (Velox)

Bug description

UT to reproduce:

    withTempPath {
      path =>
        Seq(
          Map[Int, String](1 -> null, 2 -> "200"),
          Map[Int, String](),
          Map[Int, String](1 -> "100", 2 -> "200", 3 -> "300"),
          null
        )
          .toDF("i")
          .write
          .parquet(path.getCanonicalPath)

        spark.read.parquet(path.getCanonicalPath).createOrReplaceTempView("map_tbl")

        runQueryAndCompare("select map_entries(i) from map_tbl") {
          checkOperatorMatch[ProjectExecTransformer]
        }
    }

Result mismatch:

== Physical Plan ==
*(1) Project [map_entries(i#144) AS map_entries(i)#152]
+- VeloxColumnarToRowExec
   +- ^(1) BatchScanExecTransformer[i#144] ParquetScan DataFilters: [], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-f49a09d8-ad14-445a-a536-ca9f802dd495], PartitionFilters: [], PushedAggregation: [], PushedFilters: [], PushedGroupBy: [], ReadSchema: struct<i:map<int,string>>, PushedFilters: [], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: []

== Results ==
!== Correct Answer - 4 ==                   == Gluten Answer - 4 ==
 struct<>                                   struct<>
 [ArrayBuffer()]                            [ArrayBuffer()]
![ArrayBuffer([1,100], [2,200], [3,300])]   [ArrayBuffer([0,null], [1,100], [2,200])]
 [ArrayBuffer([1,null], [2,200])]           [ArrayBuffer([1,null], [2,200])]
 [null]                                     [null]
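The expected semantics can be modeled with a minimal pure-Python sketch (not Spark; `map_entries` below is a hypothetical stand-in for Spark's `map_entries` function): each map becomes its list of (key, value) entries, an empty map becomes an empty list, and a null row stays null. Note how the correct answer keeps keys 1, 2, 3 intact, while the buggy Gluten answer above shifts them to 0, 1, 2 once an empty map precedes the row.

```python
def map_entries(m):
    """Mimic Spark's map_entries: None stays None, a map becomes
    its list of (key, value) pairs in insertion order."""
    return None if m is None else list(m.items())

# The same four rows as the repro UT above.
rows = [
    {1: None, 2: "200"},
    {},
    {1: "100", 2: "200", 3: "300"},
    None,
]
expected = [map_entries(r) for r in rows]
```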

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

@yma11 yma11 added bug Something isn't working triage labels Jan 31, 2024
@zhouyuan
Contributor

related: #4778

@ulysses-you
Contributor

@yma11 @zhouyuan does this issue still exist? It seems to have been fixed by facebookincubator/velox#9187.

@yma11
Contributor Author

yma11 commented May 31, 2024

Thanks for the ping. Let me check.

@yma11
Contributor Author

yma11 commented May 31, 2024

Verified using the UT above.

@yma11 yma11 closed this as completed May 31, 2024
@ulysses-you
Contributor

Thank you @yma11. Shall we set spark.gluten.sql.complexType.scan.fallback.enabled to false by default?

@yma11
Contributor Author

yma11 commented May 31, 2024

I think customers can try disabling it if there is no result mismatch issue in their scenario. We are still not fully confident about complex type support in Velox.
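For reference, the fallback flag discussed above could be set when building a session. This is a hypothetical config sketch only (it assumes a PySpark deployment with Gluten on the classpath and is not runnable without one); the flag name is taken verbatim from the comment above.

```python
from pyspark.sql import SparkSession

# Sketch: disable the complex-type scan fallback so Gluten/Velox
# handles map/array/struct scans natively instead of falling back
# to vanilla Spark. Only safe once results are known to match.
spark = (
    SparkSession.builder
    .appName("gluten-complex-type-scan")
    .config("spark.gluten.sql.complexType.scan.fallback.enabled", "false")
    .getOrCreate()
)
```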
