Alicia Martin edited this page Apr 23, 2020 · 23 revisions

Pan-ancestry GWAS of UKBiobank

The FAQs describe the overarching study motivation and background.

Release data

We are planning to release the summary statistics in two formats:

  • For one or a few phenotypes, we recommend using the phenotype-specific flat files: see further description here.

  • For analysis of the full dataset (all phenotypes, all populations), the summary statistics are available in Hail formats: see further description here.
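For the flat-file route, a minimal sketch of scanning one phenotype's file for genome-wide significant rows is below. The file layout is not described on this page, so the code assumes a gzip- or bgzip-compressed, tab-delimited file with a header row. The function name `iter_significant_rows` and any p-value column name you pass in are hypothetical placeholders; check the flat-file description for the real column names.

```python
import csv
import gzip
from typing import Iterator


def iter_significant_rows(path: str, pval_column: str,
                          threshold: float = 5e-8) -> Iterator[dict]:
    """Yield rows whose value in `pval_column` is below `threshold`.

    Hypothetical layout: compressed, tab-delimited, one header row.
    `pval_column` is a placeholder; use the column name from the release docs.
    """
    # gzip.open in text mode also reads bgzip output, which is a series of
    # concatenated gzip members.
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            value = row.get(pval_column)
            try:
                pval = float(value)
            except (TypeError, ValueError):
                # Skip missing or non-numeric entries such as "NA" or "".
                continue
            if pval < threshold:
                yield row
```

Streaming rows with a generator keeps memory flat, which matters because a single phenotype's file can still contain millions of variants. If the release stores -log10 p-values instead of raw p-values, the comparison would need to be inverted.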

Approach

Analysis was done using SAIGE, run with Hail Batch to parallelize across populations, phenotypes, and regions of the genome. More details can be found below:

  • Details about how we determined ancestry groups are here.
  • A description of the GWAS pipeline and its implementation is here.
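The three-way parallelization described above (populations × phenotypes × genomic regions) can be pictured as a fan-out of independent jobs. The sketch below only enumerates such jobs in plain Python; it does not use the Hail Batch API, and the names `GwasJob`, `genome_regions`, `enumerate_jobs`, the chunk size, and the chromosome lengths are all hypothetical. The actual pipeline may group or skip work differently (see the pipeline description linked above).

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class GwasJob:
    """One hypothetical unit of work: a phenotype in a population, over one region."""
    pop: str
    phenotype: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # inclusive


def genome_regions(chrom_lengths: Dict[str, int],
                   chunk_size: int) -> Iterator[Tuple[str, int, int]]:
    """Split each chromosome into consecutive chunks of at most chunk_size bp."""
    for chrom, length in chrom_lengths.items():
        for start in range(1, length + 1, chunk_size):
            yield chrom, start, min(start + chunk_size - 1, length)


def enumerate_jobs(phenos_by_pop: Dict[str, List[str]],
                   chrom_lengths: Dict[str, int],
                   chunk_size: int) -> List[GwasJob]:
    """Cross every (population, phenotype) pair with every genomic region."""
    regions = list(genome_regions(chrom_lengths, chunk_size))
    jobs = []
    for pop, phenotypes in phenos_by_pop.items():
        # Each population has its own phenotype list (see the table below).
        for pheno, (chrom, start, end) in product(phenotypes, regions):
            jobs.append(GwasJob(pop, pheno, chrom, start, end))
    return jobs
```

Because each job touches one population, one phenotype, and one region, jobs share no state and can be scheduled independently; the number of jobs grows as (population-phenotype pairs) × (regions).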

The sample size for each population and the number of phenotypes run are as follows:

+-------+-----------+----------+
| pop   | n_samples | n_phenos |
+-------+-----------+----------+
| "AFR" |      6700 |     2474 |
| "AMR" |       991 |     1113 |
| "CSA" |      8998 |     2753 |
| "EAS" |      2752 |     1620 |
| "EUR" |    423837 |     7142 |
| "MID" |      1614 |     1372 |
+-------+-----------+----------+
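The table can be summarized with a few lines of arithmetic. The sketch below copies the counts from the table; `POP_COUNTS` and `summarize` are illustrative names. Summing the columns gives 444,892 samples in total, of which 21,055 are outside EUR, and 16,474 population-phenotype analyses. The second figure counts (population, phenotype) pairs, not distinct phenotypes, since many phenotypes were run in several populations.

```python
from typing import Dict, Tuple

# (n_samples, n_phenos) per population, copied from the table above.
POP_COUNTS: Dict[str, Tuple[int, int]] = {
    "AFR": (6700, 2474),
    "AMR": (991, 1113),
    "CSA": (8998, 2753),
    "EAS": (2752, 1620),
    "EUR": (423837, 7142),
    "MID": (1614, 1372),
}


def summarize(counts: Dict[str, Tuple[int, int]]):
    """Return total samples, total population-phenotype GWAS, and sample shares."""
    total_samples = sum(n for n, _ in counts.values())
    # Counts (population, phenotype) pairs; a phenotype run in k populations counts k times.
    total_gwas = sum(p for _, p in counts.values())
    shares = {pop: n / total_samples for pop, (n, _) in counts.items()}
    return total_samples, total_gwas, shares


total_samples, total_gwas, shares = summarize(POP_COUNTS)
# total_samples == 444892, total_gwas == 16474, shares["EUR"] is about 0.953
```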

The number of samples analyzed for a given phenotype may be lower than the population total, depending on data missingness. Per-phenotype counts are listed in the phenotype manifest and in the n_cases and n_controls fields of the Hail MatrixTable.
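As a worked check of how missingness reduces the sample size, the sketch below compares a phenotype's n_cases and n_controls with the population's n_samples from the table. It assumes, as an unverified convention, that for binary traits n_cases + n_controls equals the samples analyzed, and that for quantitative traits n_controls is missing (None) with n_cases holding the full count; confirm this against the manifest documentation. Both function names are hypothetical.

```python
from typing import Optional


def samples_analyzed(n_cases: int, n_controls: Optional[int]) -> int:
    """Samples used for one phenotype.

    Assumed convention: binary traits report cases and controls separately;
    quantitative traits report n_controls as missing and all samples in n_cases.
    """
    return n_cases if n_controls is None else n_cases + n_controls


def samples_dropped(pop_n_samples: int, n_cases: int,
                    n_controls: Optional[int]) -> int:
    """How many of the population's samples were excluded for this phenotype."""
    analyzed = samples_analyzed(n_cases, n_controls)
    if analyzed > pop_n_samples:
        raise ValueError("phenotype reports more samples than the population has")
    return pop_n_samples - analyzed


# Hypothetical binary phenotype in MID (n_samples = 1614 from the table):
# samples_dropped(1614, 200, 1300) returns 114.
```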
