About Datasets #16

Open
YOUKAINOYAMA opened this issue Oct 26, 2023 · 4 comments

Comments

@YOUKAINOYAMA

Hello, thank you for sharing the code. I would like to replicate your work, and I have obtained access permission for ADNI. However, I'm facing difficulties in selecting the data according to the descriptions in the paper. If possible, could you share the dataset you filtered from ADNI with me, or provide some guidance on how to select file names or table names on the official ADNI website? My email is [email protected]. Thanks again for your work.

@sydat2701

Hi, I'm facing the same problem as you. Did you solve it? If so, could you share your solution with me? I would also really like to experiment with this work. Thank you so much for your help.

@hnchan13

hnchan13 commented Dec 7, 2023

I am facing the same issue and also have access to ADNI. Could anyone help, or point out any mistakes I may have made? My issues are as follows:

Genetic Data

  • I had to add the line `if vcf_file.endswith(".gz"):` inside the `for vcf_file in files:` loop of the Python script filter_vcfs.py so that .vcf.gz.tbi files are not processed; processing them returned errors.

  • filter_vcfs.py appears to generate only .pkl files and log.txt. However, after iterating through all the files (the ADNI WGS (GATK) data), not a single .pkl file was generated. The only output was log.txt, which contains boolean values (nearly all, if not all, False). Issue: no pickle files are generated, so I cannot feed this data into the downstream script concat_vcfs.py.

  • I am struggling to find the labels for the genetic data used in the MADDI study. Line 12 of the Python script concat_vcfs.py reads `diag = pd.read_csv("YOUR_PATH_TO_DIAGNOSIS_TABLE")`, but I am unable to locate this diagnosis table. Issue: unable to find the diagnosis table on the ADNI website.
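The file-name filter described in the first bullet can be sketched as follows. This is only an illustration, not the repository's code: `select_vcf_files` is a hypothetical helper, the `.tbi` file name is assumed from the chr3 file name mentioned in this thread, and the check uses the slightly stricter suffix `.vcf.gz` rather than `.gz`.

```python
def select_vcf_files(files):
    """Keep compressed VCFs and skip tabix indexes such as *.vcf.gz.tbi.

    Hypothetical helper for illustration; filter_vcfs.py itself performs
    the check inline inside its `for vcf_file in files:` loop.
    """
    return [name for name in files if name.endswith(".vcf.gz")]


files = [
    "ADNI.808_indiv.minGQ_21.pass.ADNI_ID.chr3.vcf.gz",
    "ADNI.808_indiv.minGQ_21.pass.ADNI_ID.chr3.vcf.gz.tbi",  # assumed index name
    "log.txt",
]
print(select_vcf_files(files))
```

Filtering on the full `.vcf.gz` suffix also skips any unrelated `.gz` archives that happen to sit in the same directory.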

Additional issues faced during genetic data pre-processing
For: ./ADNI.808_indiv.minGQ_21.pass.ADNI_ID.chr3.vcf.gz

CSV reading complete
vcf: <pandas.io.parsers.readers.TextFileReader object at 0x7fe95fb15790>
Traceback (most recent call last):
  File "/home/user/Alzheimers/genetic_data/filter_vcfs.py", line 100, in <module>
    main()
  File "/home/user/Alzheimers/genetic_data/filter_vcfs.py", line 61, in main
    vcf = pd.concat(vcf, ignore_index=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.11/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 368, in concat
    op = _Concatenator(
         ^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 422, in __init__
    objs = list(objs)
           ^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1698, in __next__
    return self.get_chunk()
           ^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1810, in get_chunk
    return self.read(nrows=size)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1778, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.11/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 230, in read
    chunks = self._reader.read_low_memory(nrows)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/parsers.pyx", line 820, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas/_libs/parsers.pyx", line 866, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 852, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 1973, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 817 fields in line 1476784, saw 833
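
One way to narrow down a field-count error like this is to count the fields per line without pandas. The expected 817 fields would match 9 fixed VCF columns plus the 808 samples suggested by the file name, so the open question is which lines split into more fields, and on which separator. The sketch below assumes tab-separated, gzip-compressed VCF text; `field_count_report` is a hypothetical helper and is not part of filter_vcfs.py. The line number pandas reports may not match the raw file line number if the script skips header rows.

```python
import gzip
from collections import Counter


def field_count_report(path, max_examples=5):
    """Count tab-separated fields per data line of a gzipped VCF.

    Returns (expected, counts, examples): `expected` is the number of
    columns in the #CHROM header line, `counts` maps a field count to the
    number of data lines with that count, and `examples` lists
    (line_number, field_count) for the first few lines that disagree
    with the header.
    """
    expected = None
    counts = Counter()
    examples = []
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle, start=1):
            if line.startswith("##"):
                continue  # meta-information lines carry no columns
            n_fields = len(line.rstrip("\n").split("\t"))
            if line.startswith("#CHROM"):
                expected = n_fields  # column header defines the width
                continue
            counts[n_fields] += 1
            mismatch = expected is not None and n_fields != expected
            if mismatch and len(examples) < max_examples:
                examples.append((line_number, n_fields))
    return expected, counts, examples
```

Calling `field_count_report("./ADNI.808_indiv.minGQ_21.pass.ADNI_ID.chr3.vcf.gz")` and inspecting `examples` should show whether the file itself contains irregular lines. If every line has the expected width when split on tabs, the separator passed to `pd.read_csv` in the script is the more likely place to look.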

@hnchan13

hnchan13 commented Dec 7, 2023

In the Python script filter_vcfs.py, line 53 reads `end = vcf_file.find("output.vcf")`. This seems to always return -1, since none of the VCF file names contain "output.vcf". Was this intended?
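
A short demonstration of why a -1 from `str.find` matters. How `end` is used later in filter_vcfs.py is not shown in this thread, so the slice below is an assumption, and the suffix-stripping alternative is only a hypothetical workaround, not the authors' intended behaviour.

```python
vcf_file = "ADNI.808_indiv.minGQ_21.pass.ADNI_ID.chr3.vcf.gz"

# str.find returns -1 when the substring is absent; it does not raise.
end = vcf_file.find("output.vcf")
print(end)

# Assumption: `end` is used as a slice bound. A bound of -1 does not fail,
# it silently drops the last character of the name.
print(vcf_file[:end])

# Hypothetical alternative: strip a suffix that the file names do contain.
suffix = ".vcf.gz"
stem = vcf_file[: -len(suffix)] if vcf_file.endswith(suffix) else vcf_file
print(stem)
```

Because nothing raises, a name derived this way would simply be off by one character, which could be easy to miss when output files are written or looked up later.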

@YOUKAINOYAMA
Author

YOUKAINOYAMA commented Dec 7, 2023 via email
