Write Avro files. #404

bashir2 · 2018-10-30T00:29:09Z

Besides writing to BigQuery, we also need to support serializing variant records into a binary format. This is useful when the variants need to be used in contexts other than BigQuery. Avro seems to be a good choice.

smrgit · 2018-10-30T00:39:24Z

I don't know much about this, but what about BCF? https://samtools.github.io/bcftools/bcftools.html

bashir2 · 2018-11-03T05:52:36Z

Thanks @smrgit for the note. The main idea here is that when a downstream pipeline needs to use Variant Transforms output, it can either read from the output BigQuery table or use these Avro files directly. One of the reasons (among others) for choosing Avro is that it mimics the BigQuery row format when reading or writing. In other words, a pipeline can easily switch between reading a BigQuery table and reading its equivalent Avro output; the same applies to writing (the difference being mainly the schema).
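To illustrate the schema point, here is a minimal sketch (not the code on the branch) of deriving an Avro record schema from a BigQuery-style field list, using only the standard library. The helper `bq_fields_to_avro`, the type mapping table, and the handful of variant fields are hypothetical and chosen for illustration; the real schema has more fields and the actual conversion may differ.

```python
import json

# Assumed mapping from BigQuery type names to Avro primitive types.
_BQ_TO_AVRO = {
    "STRING": "string",
    "INTEGER": "long",
    "FLOAT": "double",
    "BOOLEAN": "boolean",
}


def bq_fields_to_avro(fields, record_name):
    """Builds an Avro record schema (as a dict) from BigQuery-style field dicts."""
    avro_fields = []
    for field in fields:
        if field["type"] == "RECORD":
            # Nested records get a name derived from the field name.
            avro_type = bq_fields_to_avro(
                field["fields"], field["name"] + "_record")
        else:
            avro_type = _BQ_TO_AVRO[field["type"]]
        mode = field.get("mode", "NULLABLE")
        if mode == "REPEATED":
            avro_type = {"type": "array", "items": avro_type}
        elif mode == "NULLABLE":
            # Nullable columns become a union with "null".
            avro_type = ["null", avro_type]
        avro_fields.append({"name": field["name"], "type": avro_type})
    return {"type": "record", "name": record_name, "fields": avro_fields}


# Illustrative subset of a variant table schema (hypothetical, not complete).
VARIANT_FIELDS = [
    {"name": "reference_name", "type": "STRING", "mode": "REQUIRED"},
    {"name": "start_position", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "names", "type": "STRING", "mode": "REPEATED"},
    {"name": "alternate_bases", "type": "RECORD", "mode": "REPEATED",
     "fields": [{"name": "alt", "type": "STRING", "mode": "NULLABLE"}]},
]

avro_schema = bq_fields_to_avro(VARIANT_FIELDS, "variant")

# The same dict shape can be treated as a BigQuery row or as an Avro record.
row = {
    "reference_name": "chr1",
    "start_position": 12345,
    "names": ["rs123"],
    "alternate_bases": [{"alt": "T"}],
}

print(json.dumps(avro_schema, indent=2))
```

The row dict stays the same for either sink; only the schema description changes form, which is the sense in which switching between the two is cheap.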

I have a working version on this branch which shows the sink part.

bashir2 added enhancement P2 labels Oct 30, 2018

bashir2 self-assigned this Oct 30, 2018

bashir2 mentioned this issue Nov 6, 2018

The first version of Avro output generation. #411

Merged
