Write Avro files. #404

Open
bashir2 opened this issue Oct 30, 2018 · 2 comments

@bashir2
Member

bashir2 commented Oct 30, 2018

Besides writing to BigQuery, we also need to support serializing variant records into a binary format. This is useful when the variants need to be used in contexts other than BigQuery. Avro seems to be a good choice.

@smrgit

smrgit commented Oct 30, 2018

I don't know much about this, but what about BCF? https://samtools.github.io/bcftools/bcftools.html

@bashir2
Member Author

bashir2 commented Nov 3, 2018

Thanks @smrgit for the note. The main idea here is that when a downstream pipeline needs to use Variant Transforms output, it can either read from the output BigQuery table or use these Avro files directly. One of the reasons (among others) for choosing Avro is that it mimics the BigQuery row format when reading or writing. In other words, a pipeline can easily switch between reading a BigQuery table and reading its equivalent Avro output; the same applies to writing (the main difference being the schema).

I have a working version on this branch which shows the sink part.
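
To illustrate the point that the main difference is the schema, here is a minimal sketch (not taken from the branch mentioned above) of how a BigQuery-style schema could be translated into an Avro schema. The function names, the type mapping table, and the simplified variant fields are all assumptions made for illustration; the real Variant Transforms schema has many more fields.

```python
import json

# Assumed mapping from BigQuery column types to Avro primitive types.
_BQ_TO_AVRO = {
    "STRING": "string",
    "INTEGER": "long",
    "FLOAT": "double",
    "BOOLEAN": "boolean",
    "BYTES": "bytes",
}


def bq_field_to_avro(field, parent_name):
    """Convert one BigQuery-style field dict to an Avro field dict (hypothetical helper)."""
    name = field["name"]
    if field["type"] == "RECORD":
        # Nested records need their own name; prefix with the parent to keep it unique.
        record_name = parent_name + "_" + name
        avro_type = {
            "type": "record",
            "name": record_name,
            "fields": [bq_field_to_avro(f, record_name) for f in field["fields"]],
        }
    else:
        avro_type = _BQ_TO_AVRO[field["type"]]

    mode = field.get("mode", "NULLABLE")
    if mode == "REPEATED":
        avro_type = {"type": "array", "items": avro_type}
    elif mode == "NULLABLE":
        avro_type = ["null", avro_type]
    # REQUIRED fields keep the bare type.
    return {"name": name, "type": avro_type}


def bq_schema_to_avro(fields, record_name="Variant"):
    """Wrap the converted fields in a top-level Avro record (hypothetical helper)."""
    return {
        "type": "record",
        "name": record_name,
        "fields": [bq_field_to_avro(f, record_name) for f in fields],
    }


# Hypothetical, heavily simplified variant schema in BigQuery's JSON field format.
VARIANT_BQ_SCHEMA = [
    {"name": "reference_name", "type": "STRING", "mode": "NULLABLE"},
    {"name": "start_position", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "call", "type": "RECORD", "mode": "REPEATED", "fields": [
        {"name": "name", "type": "STRING", "mode": "NULLABLE"},
        {"name": "genotype", "type": "INTEGER", "mode": "REPEATED"},
    ]},
]

avro_schema = bq_schema_to_avro(VARIANT_BQ_SCHEMA)
print(json.dumps(avro_schema, indent=2))
```

With a schema like this, the same row dictionaries that would go to BigQuery could in principle be handed to an Avro writer instead, for example Apache Beam's `WriteToAvro` transform; the exact form of the schema argument it expects depends on the Beam version.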
