-
Notifications
You must be signed in to change notification settings - Fork 308
Conversation
Current coverage is 90.99% (diff: 86.66%)@@ master #117 diff @@
==========================================
Files 5 5
Lines 304 322 +18
Methods 269 270 +1
Messages 0 0
Branches 35 52 +17
==========================================
+ Hits 278 293 +15
- Misses 26 29 +3
Partials 0 0
|
Hey! are there any plans to merge this pull request in the next release? Since we are working with complex unions it would be of great help to us. Thanks! |
Rebased on top of current master. In the meantime, I'll be maintaining patched artifacts for 2.x and 3.x versions in this bintray repo: |
3. `union(something, null)`, where `something` is one of the supported Avro types listed above or is one of the supported `union` types. | ||
1. `union(int, long)` will be mapped to `LongType`. | ||
2. `union(float, double)` will be mapped to `DoubleType`. | ||
3. `union(something, null)`, where `something` is any supported Avro type. This will be mapped to the same Spark SQL type as that of `something`, with `nullable` set to `true`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me curious about the union(union(int, long), null)
case. Let me quickly double-check to make sure that there's a test covering this corner-case to make sure that behavior hasn't changed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep this should work as expected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I verified this as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had two minor style nits, but otherwise this looks good so I think that I'll merge it as soon as I run my own tests. I noticed that the old behavior doesn't seem to be 100% specified by unit tests, so I think that I'll submit my own followup PR to take each verifiable statement in the README and add a test case for it.
@@ -22,6 +22,8 @@ import java.nio.file.Files | |||
import java.sql.Timestamp | |||
import java.util.UUID | |||
|
|||
import org.apache.avro.generic.GenericData.{EnumSymbol, Fixed} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This import should be grouped with the other Avro imports further down in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved
2. `union(float, double)` will be mapped to `DoubleType`. | ||
3. `union(something, null)`, where `something` is any supported Avro type. This will be mapped to the same Spark SQL type as that of `something`, with `nullable` set to `true`. | ||
|
||
All other `union` types are considered complex. They will be mapped to `StructType` where field names are `member0`, `member1`, etc., in accordance with members of the `union`. This is consistent with the behavior when reading Parquet files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for remembering to add docs, by the way.
case other => | ||
sqlType match { | ||
case t: StructType if t.fields.length == avroSchema.getTypes.size => | ||
val fieldConverters = t.fields zip avroSchema.getTypes map { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor style nit: we tend to prefer .zip(..)
to infix notation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
s"This mix of union types is not supported (see README): $other") | ||
case _ => | ||
// Convert complex unions to struct types where field names are member0, member1, etc. | ||
// This is consistent with the behavior when reading Parquet files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By "reading Parquet files", I guess you mean reading Avro records as Parquet messages using parquet-avro?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes i meant avro parquet conversion. i've updated the comments
} | ||
case _ => throw new IncompatibleSchemaException( | ||
s"Cannot convert Avro schema to catalyst type because schema at path " + | ||
s"${path.mkString(".")} is not compatible " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I'd prefer "not supported" instead of "not compatible". "Compatible" is a term usually used in scenarios like schema evolution/merging, while here it means Spark SQL doesn't recognize some specific Avro schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to keep the original wording (there's another reference right below). I believe it is referring to the incompatibility between source avro schema and target catalyst type. I'm fine with either wording. Let me know and I can change or leave both references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chatted with @liancheng about this and we think it's fine to keep the current wording.
Also LGTM. Just two minor comments. Thanks! |
LGTM, so I'm going to merge this in to master now. Thanks for your contributions and patience, @jasonxh! |
A complex union will be converted to a struct type where field names are
member0
,member1
, etc., in accordance with members of the union. This is consistent with the behavior when reading Parquet files. Field values are resolved following existing schema translation rules. The test case is a good example of this translation.