Home

Pan-ancestry GWAS of UKBiobank

The FAQ describes the overarching study motivation and background.

Release data

We are planning to release the summary statistics in two formats:

For one or a few phenotypes, we recommend using the phenotype-specific flat files: see further description here.
For analysis of the full dataset (all phenotypes, all populations), the summary statistics are available in Hail formats: see further description here.

Approach

The analysis was run with SAIGE, implemented in Hail Batch to parallelize across populations, phenotypes, and regions of the genome. More details can be found below:

Details about how we determined ancestry groups are here.
Description of GWAS pipeline and implementation is here.
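
As a rough illustration of the parallelization across populations, phenotypes, and genomic regions, the sketch below enumerates one independent task per combination. It is not the project's pipeline code: the phenotype names and region strings are hypothetical placeholders, and only the population codes are taken from this page.

```python
from itertools import product

# Population codes as listed in the sample-size table on this page.
POPULATIONS = ["AFR", "AMR", "CSA", "EAS", "EUR", "MID"]

# Hypothetical placeholders for illustration only; these are NOT
# phenotype names or region boundaries from the actual pipeline.
phenotypes = ["pheno_A", "pheno_B"]
regions = ["chr1:1-1000000", "chr1:1000001-2000000", "chr2:1-1000000"]


def enumerate_tasks(pops, phenos, chunks):
    """Return one (population, phenotype, region) tuple per combination.

    Each tuple stands for a unit of work that does not depend on the
    others, which is what makes the workload easy to run in parallel.
    """
    return list(product(pops, phenos, chunks))


tasks = enumerate_tasks(POPULATIONS, phenotypes, regions)
print(len(tasks))  # 6 populations x 2 phenotypes x 3 regions = 36
```

In a real batch system each tuple would become its own job; here the tasks are only listed.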

The sample size for each population and the number of phenotypes run is as follows:

+-------+-----------+----------+
| pop   | n_samples | n_phenos |
+-------+-----------+----------+
| "AFR" |      6700 |     2474 |
| "AMR" |       991 |     1113 |
| "CSA" |      8998 |     2753 |
| "EAS" |      2752 |     1620 |
| "EUR" |    423837 |     7142 |
| "MID" |      1614 |     1372 |
+-------+-----------+----------+

A given phenotype may have been run on fewer samples than its population total, depending on data missingness. The per-phenotype counts are listed in the phenotype manifest and in the n_cases and n_controls fields of the Hail MatrixTable.
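
To make the relationship between the population totals and the per-phenotype counts concrete, here is a minimal sketch in plain Python. The population totals are copied from the table above. The example phenotype counts are hypothetical, and the sketch assumes a phenotype that has both a case count and a control count.

```python
# Population sample sizes, copied from the table above.
N_SAMPLES = {
    "AFR": 6700,
    "AMR": 991,
    "CSA": 8998,
    "EAS": 2752,
    "EUR": 423837,
    "MID": 1614,
}

total_samples = sum(N_SAMPLES.values())

# Share of the total sample contributed by each population.
shares = {pop: n / total_samples for pop, n in N_SAMPLES.items()}


def samples_run(n_cases, n_controls):
    """Number of samples analyzed for one phenotype in one population."""
    return n_cases + n_controls


def fraction_run(pop, n_cases, n_controls):
    """Fraction of a population's samples that were analyzed for a phenotype."""
    return samples_run(n_cases, n_controls) / N_SAMPLES[pop]


# Hypothetical counts for illustration; not taken from the manifest.
example_run = samples_run(n_cases=150, n_controls=6000)
example_fraction = fraction_run("AFR", n_cases=150, n_controls=6000)

print(total_samples)
print(example_run, round(example_fraction, 3))
```

A fraction below 1 corresponds to samples dropped for missing data in that phenotype.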
