NullPointerException when using VectorizedArrowReader to read a null column #10275
Fix NullPointerException when trying to add the vector's class name to the message for an UnsupportedOperationException
This test more closely follows the reproduction steps described in issue apache#10275
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'
Apache Iceberg version
1.5.1 (latest release)
Query engine
Other
Please describe the bug 🐞
I am writing a compatibility layer for Teradata so that it can access Iceberg tables stored in AWS S3. I am experiencing what at first glance appears to be a bug in Iceberg, but I'd like to get the opinion of the experts here. To be clear I am using Apache Iceberg 1.5.1 and Apache Arrow 15.0.0.
The problem is that a NullPointerException is thrown from GenericArrowVectorAccessorFactory.java line 224, because vector is null.
How do I get to this point? Here's the minimal test case:
Prerequisite:
repro:
The above SQL select statement works in AWS Athena, but fails in my code. My code is using an instance of org.apache.iceberg.arrow.vectorized.ArrowReader$VectorizedCombinedScanIterator.
The cause, as I see it, is that the one row in the table contains only three columns' worth of data, but the current table schema defines four columns. Because of this difference in schemas, Iceberg creates the following four readers, one for each column:
VectorizedArrowReader corresponding to columna
VectorizedArrowReader corresponding to columnb
VectorizedArrowReader corresponding to columnc
VectorizedArrowReader$NullVectorReader corresponding to columna1
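To make the reader selection described above concrete, here is a rough Java sketch. It is not Iceberg code: ReaderSelectionSketch and readersFor are hypothetical names, the readers are represented only by their class names as strings, and matching columns by name is a simplification made here, not a claim about how Iceberg resolves columns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of this is the Iceberg implementation.
final class ReaderSelectionSketch {

  // For each column in the current table schema, pick a regular reader when the
  // data file contains that column, and a null reader when it does not.
  static List<String> readersFor(List<String> tableColumns, Set<String> fileColumns) {
    List<String> readers = new ArrayList<>();
    for (String column : tableColumns) {
      if (fileColumns.contains(column)) {
        readers.add("VectorizedArrowReader");
      } else {
        // The column was added after the file was written, so there is no data to read.
        readers.add("VectorizedArrowReader$NullVectorReader");
      }
    }
    return readers;
  }

  public static void main(String[] args) {
    List<String> table = List.of("columna", "columnb", "columnc", "columna1");
    Set<String> file = Set.of("columna", "columnb", "columnc");
    List<String> readers = readersFor(table, file);
    for (int i = 0; i < table.size(); i++) {
      System.out.println(table.get(i) + " -> " + readers.get(i));
    }
  }
}
```

With the four-column table schema and a three-column data file from this report, only the last column ends up with the null reader.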
Naturally the VectorizedArrowReader$NullVectorReader instance contains a null value for the vector. This instance is assigned at VectorizedReaderBuilder.java line 100.
Continuing down the code path, Iceberg calls GenericArrowVectorAccessorFactory.getPlainVectorAccessor. This method checks whether vector is an instance of various *Vector types. Because vector has a value of null, it is not an instance of any type. Thus this method ends up in its ultimate fallback case and tries to throw an exception:
The problem is that vector is null, and thus calling vector.getClass() throws a NullPointerException.
The stack trace is:
So my questions:
null value for vector when building the message for the UnsupportedOperationException?
p.s. I asked this question in the Slack channel but didn't get any traction. https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1714676216273989
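For anyone following the analysis above, here is a minimal, self-contained Java sketch of the failure mode and one possible null-safe way to build the message. It does not use Iceberg or Arrow classes: NullSafeMessageSketch, accessorKindFor, unsafeDescribe and describe are hypothetical names, the array types stand in for the *Vector types, and the message text is an assumption rather than a copy of the Iceberg source.

```java
// Hypothetical stand-in for a type-dispatching factory; not the Iceberg source.
final class NullSafeMessageSketch {

  // Chain of instanceof checks with a fallback, similar in shape to the method described above.
  static String accessorKindFor(Object vector) {
    if (vector instanceof int[]) {
      return "int";
    } else if (vector instanceof long[]) {
      return "long";
    }
    // "null instanceof X" is always false in Java, so a null vector always lands here.
    throw new UnsupportedOperationException("Unsupported vector: " + describe(vector));
  }

  // Unsafe variant: dereferences the vector and throws NullPointerException when it is null.
  static String unsafeDescribe(Object vector) {
    return vector.getClass().getName();
  }

  // Null-safe variant: reports "null" instead of dereferencing a null vector.
  static String describe(Object vector) {
    return vector == null ? "null" : vector.getClass().getName();
  }

  public static void main(String[] args) {
    try {
      accessorKindFor(null);
    } catch (UnsupportedOperationException e) {
      // The intended exception surfaces instead of a NullPointerException.
      System.out.println(e.getMessage());
    }
  }
}
```

A guard like this only makes the error message accurate. Whether a null vector should reach the fallback branch at all is a separate question that the sketch does not address.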