Error: the format of file [ldscores/1.l2.M_5_50] is incorrect #89

Open
aydanasg opened this issue Jul 31, 2024 · 1 comment

Comments

@aydanasg

Hi there,

I am running an mtCOJO analysis of Alzheimer's disease, conditioning on small vessel disease. I am using the 1000 Genomes EUR panel as the LD reference, with LD scores and weights from https://github.com/bulik/ldsc as suggested in the tutorial. This is how I run it:

/rds/general/user/aa19618/home/mtCOJO/gcta-1.94.1-linux-kernel-3-x86_64/gcta64 \
  --mbfile 1000G_EUR_Phase3.mtcojo_ref_data.txt \
  --mtcojo-file mtCojo_summary_data.list \
  --ref-ld-chr 1000G_Phase3_baselineLD_v2.2_ldscores/ \
  --w-ld-chr 1000G_Phase3_weights_hm3_no_MHC/ \
  --out test_mtcojo_result

And this is the error message I get:


  • Genome-wide Complex Trait Analysis (GCTA)
  • version v1.94.1 Linux
  • Built at Nov 15 2022 21:14:25, by GCC 8.5
  • (C) 2010-present, Yang Lab, Westlake University
  • Please report bugs to Jian Yang [email protected]

Analysis started at 10:44:09 BST on Wed Jul 31 2024.
Hostname: login-a

Accepted options:
--mbfile 1000G_EUR_Phase3.mtcojo_ref_data.txt
--mtcojo-file mtCojo_summary_data.list
--ref-ld-chr 1000G_Phase3_baselineLD_v2.2_ldscores/
--w-ld-chr 1000G_Phase3_weights_hm3_no_MHC/
--out test_mtcojo_result

There are 22 PLINK genotype files specified in [1000G_EUR_Phase3.mtcojo_ref_data.txt].

Reading the PLINK FAM files ....
489 individuals have been included from the PLINK FAM files.
Reading the PLINK BIM files ...
9997231 SNPs to be included from PLINK BIM files.

Reading GWAS summary data from [mtCojo_summary_data.list] ...
6134163 SNPs in common between the target trait and the covariate trait(s).
Filtering out SNPs with multiple alleles or missing value ...
5026 SNPs have missing value or mismatched alleles. These SNPs have been saved in [test_mtcojo_result.badsnps].
6129137 SNPs are retained after filtering.
There are 1294 genome-wide significant SNPs with p < 5.0e-08.

Reading PLINK BED files ...
Skip reading /rds/general/user/aa19618/projects/epinott/live/scripts/ldsc/required_files/1000G_EUR_Phase3_plink/1000G.EUR.QC.1.bed, no SNPs retained on this chromosome.
Skip reading /rds/general/user/aa19618/projects/epinott/live/scripts/ldsc/required_files/1000G_EUR_Phase3_plink/1000G.EUR.QC.4.bed, no SNPs retained on this chromosome.
Skip reading /rds/general/user/aa19618/projects/epinott/live/scripts/ldsc/required_files/1000G_EUR_Phase3_plink/1000G.EUR.QC.7.bed, no SNPs retained on this chromosome.
Skip reading /rds/general/user/aa19618/projects/epinott/live/scripts/ldsc/required_files/1000G_EUR_Phase3_plink/1000G.EUR.QC.9.bed, no SNPs retained on this chromosome.
Skip reading /rds/general/user/aa19618/projects/epinott/live/scripts/ldsc/required_files/1000G_EUR_Phase3_plink/1000G.EUR.QC.11.bed, no SNPs retained on this chromosome.
Skip reading /rds/general/user/aa19618/projects/epinott/live/scripts/ldsc/required_files/1000G_EUR_Phase3_plink/1000G.EUR.QC.12.bed, no SNPs retained on this chromosome.
Skip reading /rds/general/user/aa19618/projects/epinott/live/scripts/ldsc/required_files/1000G_EUR_Phase3_plink/1000G.EUR.QC.18.bed, no SNPs retained on this chromosome.
Skip reading /rds/general/user/aa19618/projects/epinott/live/scripts/ldsc/required_files/1000G_EUR_Phase3_plink/1000G.EUR.QC.19.bed, no SNPs retained on this chromosome.
Skip reading /rds/general/user/aa19618/projects/epinott/live/scripts/ldsc/required_files/1000G_EUR_Phase3_plink/1000G.EUR.QC.20.bed, no SNPs retained on this chromosome.
Skip reading /rds/general/user/aa19618/projects/epinott/live/scripts/ldsc/required_files/1000G_EUR_Phase3_plink/1000G.EUR.QC.21.bed, no SNPs retained on this chromosome.
Genotype data for 489 individuals and 1294 SNPs have been included.
Calculating allele frequencies ...
Checking the difference in allele frequency between the GWAS summary datasets and the LD reference sample...
7796 SNP(s) have large difference of allele frequency between the GWAS summary data and the reference sample. These SNPs have been saved in [test_mtcojo_result.freq.badsnps].
Error: the format of file [1000G_Phase3_baselineLD_v2.2_ldscores/1.l2.M_5_50] is incorrect.
An error occurs, please check the options or data

Could you please advise me on what the issue could be? I have not experienced any issues with the formats of my ldscores files while using the same LD reference.

Thank you in advance,
Aydan

@longmanz
Collaborator

longmanz commented Aug 1, 2024

Hi,
It seems that the SNP IDs in your 1000G data and your GWAS data overlap very little. Could you check whether the IDs in your GWAS data are really rsIDs rather than in "chr:pos" format? If they are in "chr:pos" format, you will need to convert them to the corresponding rsIDs (using the matching genome reference build).
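
A quick way to run this check is sketched below in Python. It is only a rough illustration, not part of GCTA: it assumes the GWAS summary file is whitespace-delimited with a header row and the SNP ID in the first column, and that the reference `.bim` file carries the SNP ID in its second column (the standard PLINK layout). The function names and the regular expressions used to recognise "chr:pos" IDs are made up for this example and may need adjusting for other ID styles.

```python
import re
from collections import Counter

# Heuristic patterns (assumptions for this sketch, not a GCTA specification).
RSID = re.compile(r"^rs\d+$")
CHRPOS = re.compile(r"^(chr)?(\d{1,2}|X|Y|MT?)[:_]\d+", re.IGNORECASE)


def classify_id(snp_id):
    """Label an ID as 'rsid', 'chr:pos', or 'other'."""
    if RSID.match(snp_id):
        return "rsid"
    if CHRPOS.match(snp_id):
        return "chr:pos"
    return "other"


def read_column(path, column, skip_header):
    """Collect one whitespace-delimited column from a text file."""
    ids = []
    with open(path) as handle:
        for line_no, line in enumerate(handle):
            if skip_header and line_no == 0:
                continue
            fields = line.split()
            if len(fields) > column:
                ids.append(fields[column])
    return ids


def summarize(gwas_path, bim_path):
    """Count ID formats in the GWAS file and IDs shared with the .bim file."""
    gwas_ids = read_column(gwas_path, 0, skip_header=True)
    bim_ids = set(read_column(bim_path, 1, skip_header=False))
    formats = Counter(classify_id(snp_id) for snp_id in gwas_ids)
    shared = sum(1 for snp_id in set(gwas_ids) if snp_id in bim_ids)
    return {
        "formats": dict(formats),
        "gwas_total": len(gwas_ids),
        "shared_with_bim": shared,
    }


# Example call (both paths are placeholders; substitute your own files):
# print(summarize("my_gwas_summary.txt", "1000G.EUR.QC.1.bim"))
```

If most IDs come back as "chr:pos", or `shared_with_bim` is small relative to `gwas_total`, that would be consistent with the low overlap described above.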
