Home
The FAQs describe the overarching study motivation and background.
We are planning to release the summary statistics in two formats:
- For one or a few phenotypes, we recommend using the phenotype-specific flat files: see further description here.
- For analysis of the full dataset (all phenotypes, all populations), the summary statistics are available in Hail formats: see further description here.
The analysis was run using SAIGE, implemented in Hail Batch to parallelize across populations, phenotypes, and regions of the genome. More details can be found below:
- Details about how we determined ancestry groups are here.
- Description of GWAS pipeline and implementation is here.
The sample size for each population and the number of phenotypes run are as follows:
+-------+-----------+----------+
| pop | n_samples | n_phenos |
+-------+-----------+----------+
| "AFR" | 6700 | 2474 |
| "AMR" | 991 | 1113 |
| "CSA" | 8998 | 2753 |
| "EAS" | 2752 | 1620 |
| "EUR" | 423837 | 7142 |
| "MID" | 1614 | 1372 |
+-------+-----------+----------+
Each phenotype may have fewer samples run, depending on data missingness. The per-phenotype counts can be found in the phenotype manifest, or in n_cases and n_controls in the Hail MatrixTable.
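As a rough illustration of how the per-phenotype counts relate to the population totals in the table above, here is a minimal plain-Python sketch. The helper names (`samples_run`, `fraction_of_population`) and the example counts are hypothetical. The handling of a missing n_controls (treating n_cases as the full count, as might apply to a quantitative trait) is an assumption and should be checked against the phenotype manifest description.

```python
from typing import Optional

# Population sample sizes copied from the table above.
POP_N_SAMPLES = {
    "AFR": 6700,
    "AMR": 991,
    "CSA": 8998,
    "EAS": 2752,
    "EUR": 423837,
    "MID": 1614,
}


def samples_run(n_cases: int, n_controls: Optional[int]) -> int:
    """Number of samples analysed for one phenotype in one population.

    Assumption: when n_controls is missing, n_cases already holds the
    full number of samples with data.
    """
    return n_cases + (n_controls or 0)


def fraction_of_population(pop: str, n_cases: int, n_controls: Optional[int]) -> float:
    """Share of a population's samples that were run for a phenotype."""
    return samples_run(n_cases, n_controls) / POP_N_SAMPLES[pop]


# Hypothetical counts, for illustration only (not real results).
share = fraction_of_population("AFR", n_cases=500, n_controls=6000)
print(f"{share:.3f}")
```

A share well below 1 would indicate substantial data missingness for that phenotype in that population.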