Sample lookup optimized tables with --append flag #607

Open
samanvp opened this issue Jun 1, 2020 · 0 comments
samanvp commented Jun 1, 2020

The way we create sample lookup optimized tables is inefficient. Consider the following typical workflow:

  • Run VT for the first batch of VCF files (with --sample_lookup_optimized_output_table set and without --append).
  • Run VT for the second batch of VCF files (with --sample_lookup_optimized_output_table and --append set).
  • Run VT for the third batch of VCF files (with --sample_lookup_optimized_output_table and --append set).
    ...

Currently, we load data into sample optimized tables by querying the variant optimized tables, flattening the call column, and copying the result into the sample optimized tables. In this implementation (#606), each new run of VT with --append set reads all rows of the variant optimized tables and loads the result into the sample optimized tables with write_disposition='WRITE_TRUNCATE', so rows from earlier batches are reprocessed on every run.

A more efficient implementation would flatten and add only the newly added rows, using write_disposition='WRITE_APPEND'.
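
To illustrate the difference, here is a rough cost model and not actual VT code. It assumes each run's cost is proportional to the number of variant-table rows it has to read and flatten. The function name `rows_read_per_run` and the batch sizes are hypothetical, chosen only for this sketch.

```python
def rows_read_per_run(batch_sizes, write_disposition):
    """Return how many variant-table rows each VT run reads (toy model).

    batch_sizes: rows added to the variant optimized tables by each run.
    write_disposition: 'WRITE_TRUNCATE' (current behavior) or
        'WRITE_APPEND' (proposed behavior).
    """
    rows_read = []
    total_rows = 0
    for size in batch_sizes:
        total_rows += size
        if write_disposition == "WRITE_TRUNCATE":
            # Every run re-reads all rows loaded so far and rewrites the table.
            rows_read.append(total_rows)
        elif write_disposition == "WRITE_APPEND":
            # Every run reads only the rows it has just added.
            rows_read.append(size)
        else:
            raise ValueError(f"unsupported write_disposition: {write_disposition}")
    return rows_read


# Hypothetical workload: three batches of 1000 rows each.
batches = [1000, 1000, 1000]
truncate_cost = rows_read_per_run(batches, "WRITE_TRUNCATE")
append_cost = rows_read_per_run(batches, "WRITE_APPEND")
print(sum(truncate_cost), sum(append_cost))
```

Under this model the total work with WRITE_TRUNCATE grows quadratically with the number of equally sized batches, while with WRITE_APPEND it grows linearly.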
