This repository has been archived by the owner on Dec 20, 2018. It is now read-only.

Add support for complex union types #117

Closed
wants to merge 2 commits into from

Conversation

Contributor

@jasonxh jasonxh commented Feb 10, 2016

A complex union will be converted to a struct type whose field names are member0, member1, etc., corresponding to the members of the union. This is consistent with the behavior when reading Parquet files. Field values are resolved following the existing schema translation rules; the test case is a good example of this translation.
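As a rough sketch of the naming scheme described above (illustrative only, not spark-avro's actual code; it assumes a `null` member only makes the column nullable and is not itself numbered):

```python
# Sketch of the complex-union naming scheme (not spark-avro's implementation).
# Each non-null union member becomes a struct field member0, member1, ...
def complex_union_field_names(members):
    """Return one struct field name per non-null union member, in order."""
    non_null = [m for m in members if m != "null"]
    return [f"member{i}" for i in range(len(non_null))]

# e.g. a hypothetical union(int, string, SomeRecord)
print(complex_union_field_names(["int", "string", "SomeRecord"]))
# ['member0', 'member1', 'member2']
```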


codecov-io commented Feb 10, 2016

Current coverage is 90.99% (diff: 86.66%)

Merging #117 into master will decrease coverage by 0.45%

@@             master       #117   diff @@
==========================================
  Files             5          5          
  Lines           304        322    +18   
  Methods         269        270     +1   
  Messages          0          0          
  Branches         35         52    +17   
==========================================
+ Hits            278        293    +15   
- Misses           26         29     +3   
  Partials          0          0          

Powered by Codecov. Last update 16ed688...b59c82a
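The headline numbers follow from the table (hits divided by lines, rounded to two decimals):

```python
# Reproducing the Codecov summary from the table above.
master = 278 / 304   # hits / lines on master
pr = 293 / 322       # hits / lines with this PR applied

print(round(pr * 100, 2))              # 90.99 (current coverage)
print(round((master - pr) * 100, 2))   # 0.45  (coverage decrease)
```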


sweb commented Apr 19, 2016

Hey!

Are there any plans to merge this pull request in the next release? Since we are working with complex unions, it would be a great help to us.

Thanks!
Florian

juyttenh pushed a commit to kingcontext/spark-avro that referenced this pull request Jun 9, 2016
Contributor Author

jasonxh commented Oct 15, 2016

Rebased on top of the current master. In the meantime, I'll be maintaining patched artifacts for the 2.x and 3.x versions in this Bintray repo:
https://bintray.com/jasonxh/maven/spark-avro

@JoshRosen JoshRosen closed this Nov 22, 2016
@JoshRosen JoshRosen reopened this Nov 22, 2016
@JoshRosen JoshRosen self-assigned this Nov 22, 2016
@JoshRosen JoshRosen added this to the 3.1.0 milestone Nov 22, 2016
3. `union(something, null)`, where `something` is one of the supported Avro types listed above or is one of the supported `union` types.
1. `union(int, long)` will be mapped to `LongType`.
2. `union(float, double)` will be mapped to `DoubleType`.
3. `union(something, null)`, where `something` is any supported Avro type. This will be mapped to the same Spark SQL type as that of `something`, with `nullable` set to `true`.
Contributor

This makes me curious about the union(union(int, long), null) case. Let me quickly double-check that there's a test covering this corner case, to make sure the behavior hasn't changed here.

Contributor Author

Yep, this should work as expected.

Contributor

Yep, I verified this as well.

Contributor

@JoshRosen JoshRosen left a comment

I had two minor style nits, but otherwise this looks good, so I think I'll merge it as soon as I run my own tests. I noticed that the old behavior doesn't seem to be fully specified by unit tests, so I'll submit a follow-up PR that takes each verifiable statement in the README and adds a test case for it.

@@ -22,6 +22,8 @@ import java.nio.file.Files
import java.sql.Timestamp
import java.util.UUID

import org.apache.avro.generic.GenericData.{EnumSymbol, Fixed}
Contributor

This import should be grouped with the other Avro imports further down in this file.

Contributor Author

Moved.

2. `union(float, double)` will be mapped to `DoubleType`.
3. `union(something, null)`, where `something` is any supported Avro type. This will be mapped to the same Spark SQL type as that of `something`, with `nullable` set to `true`.

All other `union` types are considered complex. They will be mapped to `StructType`, where field names are `member0`, `member1`, etc., corresponding to the members of the `union`. This is consistent with the behavior when reading Parquet files.
Contributor

Thanks for remembering to add docs, by the way.
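The README's translation rules quoted above can be sketched as follows (an illustrative model over type-name strings, not spark-avro's implementation):

```python
# Sketch of the README's union-translation rules (illustrative only).
# `members` are Avro type names; returns (sql_type, nullable).
def union_to_sql_type(members):
    nullable = "null" in members
    members = [m for m in members if m != "null"]
    if len(members) == 1:
        sql = members[0]  # single remaining member: translate it directly
    elif set(members) == {"int", "long"}:
        sql = "LongType"
    elif set(members) == {"float", "double"}:
        sql = "DoubleType"
    else:
        # complex union -> StructType with fields member0, member1, ...
        fields = ", ".join(f"member{i}" for i in range(len(members)))
        sql = f"StructType({fields})"
    return sql, nullable

print(union_to_sql_type(["int", "long"]))     # ('LongType', False)
print(union_to_sql_type(["string", "null"]))  # ('string', True)
print(union_to_sql_type(["int", "string"]))   # ('StructType(member0, member1)', False)
```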

case other =>
  sqlType match {
    case t: StructType if t.fields.length == avroSchema.getTypes.size =>
      val fieldConverters = t.fields zip avroSchema.getTypes map {
Contributor

Minor style nit: we tend to prefer .zip(..) to infix notation.

Contributor Author

Changed.

    s"This mix of union types is not supported (see README): $other")
case _ =>
  // Convert complex unions to struct types where field names are member0, member1, etc.
  // This is consistent with the behavior when reading Parquet files.
Contributor

By "reading Parquet files", I guess you mean reading Avro records as Parquet messages using parquet-avro?

Contributor Author

Yes, I meant the Avro-to-Parquet conversion. I've updated the comments.

}
case _ => throw new IncompatibleSchemaException(
  s"Cannot convert Avro schema to catalyst type because schema at path " +
    s"${path.mkString(".")} is not compatible " +
Contributor

Nit: I'd prefer "not supported" instead of "not compatible". "Compatible" is a term usually used in scenarios like schema evolution/merging, while here it means Spark SQL doesn't recognize some specific Avro schema.

Contributor Author

I was trying to keep the original wording (there's another reference right below). I believe it refers to the incompatibility between the source Avro schema and the target Catalyst type. I'm fine with either wording; let me know and I can change or keep both references.

Contributor

I chatted with @liancheng about this and we think it's fine to keep the current wording.

@liancheng
Contributor

Also LGTM. Just two minor comments. Thanks!

@JoshRosen
Contributor

LGTM, so I'm going to merge this in to master now. Thanks for your contributions and patience, @jasonxh!

@JoshRosen JoshRosen closed this in 7945751 Nov 24, 2016
@jasonxh jasonxh deleted the hao/complex-union branch October 4, 2017 23:20