About Datasets #16

YOUKAINOYAMA · 2023-10-26T03:59:25Z

Hello, thank you for sharing the code. I would like to replicate your work, and I have obtained access permission for ADNI. However, I'm facing difficulties in selecting the data according to the descriptions in the paper. If possible, could you share the dataset you filtered from ADNI with me, or provide some guidance on how to select file names or table names on the official ADNI website? My email is [email protected]. Thanks again for your work.

sydat2701 · 2023-12-07T02:44:45Z

Hi, I'm facing the same problem as you. Did you solve it? If so, could you share the solution with me? I would also really like to experiment with this work. Thank you so much for your help.

hnchan13 · 2023-12-07T08:35:24Z

I am facing the same issue as well and also have access to ADNI. Would anyone be able to help please or point out any mistakes I have made (if any)? The following are my issues:

Genetic Data

In filter_vcfs.py, I had to add the line if vcf_file.endswith(".gz"): inside the loop for vcf_file in files: so that .vcf.gz.tbi index files are skipped; processing them raised errors.
For filter_vcfs.py, it seems that only .pkl files and log.txt should be generated. However, after iterating through all the files (the ADNI WGS (GATK) data), not a single .pkl file was generated. The only output was log.txt, which contains boolean values (nearly all, if not all, are False). Issue: No pickle files generated, so I am unable to feed this data into the downstream script concat_vcfs.py.
I am struggling to find the labels for the genetic data used in the MADDi study. In concat_vcfs.py, line 12 reads diag = pd.read_csv("YOUR_PATH_TO_DIAGNOSIS_TABLE"), but I am unable to locate the diagnosis table. Issue: Unable to find the diagnosis table on the ADNI website.
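
The file filter from the first point can be sketched as below. This is only an illustration of the workaround, not the repository's code: list_vcf_archives is a hypothetical helper name, and matching on the full ".vcf.gz" suffix (rather than just ".gz") is my own assumption about what is safest.

```python
def list_vcf_archives(filenames):
    """Keep only compressed VCF files, skipping tabix indexes (*.vcf.gz.tbi)."""
    # A ".tbi" index ends with ".vcf.gz.tbi", so it fails this suffix test.
    return sorted(name for name in filenames if name.endswith(".vcf.gz"))


files = [
    "ADNI.808_indiv.minGQ_21.pass.ADNI_ID.chr3.vcf.gz",
    "ADNI.808_indiv.minGQ_21.pass.ADNI_ID.chr3.vcf.gz.tbi",
    "log.txt",
]
selected = list_vcf_archives(files)
print(selected)
```

Only the .vcf.gz entry should survive the filter; the index file and log.txt are dropped.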

Additional issues faced during genetic data pre-processing
For ./ADNI.808_indiv.minGQ_21.pass.ADNI_ID.chr3.vcf.gz:

CSV reading complete
vcf: <pandas.io.parsers.readers.TextFileReader object at 0x7fe95fb15790>
Traceback (most recent call last):
  File "/home/user/Alzheimers/genetic_data/filter_vcfs.py", line 100, in <module>
    main()
  File "/home/user/Alzheimers/genetic_data/filter_vcfs.py", line 61, in main
    vcf = pd.concat(vcf, ignore_index=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.11/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 368, in concat
    op = _Concatenator(
         ^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 422, in __init__
    objs = list(objs)
           ^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1698, in __next__
    return self.get_chunk()
           ^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1810, in get_chunk
    return self.read(nrows=size)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1778, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.11/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 230, in read
    chunks = self._reader.read_low_memory(nrows)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/parsers.pyx", line 820, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas/_libs/parsers.pyx", line 866, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 852, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 1973, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 817 fields in line 1476784, saw 833
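
The ParserError says one row had 833 fields where 817 were expected. To see which rows disagree, something like the sketch below could be run on the file before handing it to pandas. It uses only the standard library. field_count_report is a hypothetical helper name, and I am assuming the file is tab-separated with "##" meta-information lines, as in standard VCF. The line numbers it reports are raw file line numbers, which may not match the number pandas prints if filter_vcfs.py skips header rows.

```python
import gzip
from collections import Counter


def field_count_report(path, sep="\t", max_examples=5):
    """Tally how many fields each non-meta line has in a (possibly gzipped) VCF.

    Returns (counts, examples): counts maps a field count to the number of
    lines with that count; examples maps it to the first few line numbers.
    """
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    examples = {}
    with opener(path, "rt") as handle:
        for line_no, line in enumerate(handle, start=1):
            if line.startswith("##"):  # skip VCF meta-information lines
                continue
            n_fields = len(line.rstrip("\n").split(sep))
            counts[n_fields] += 1
            seen = examples.setdefault(n_fields, [])
            if len(seen) < max_examples:
                seen.append(line_no)
    return counts, examples
```

Calling field_count_report on the chr3 file and printing both return values should show whether the 833-field rows are isolated or widespread, which would help decide whether the file is damaged or the reader settings are off.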

hnchan13 · 2023-12-07T09:32:26Z

For the python script filter_vcfs.py on line 53, end = vcf_file.find("output.vcf"), it seems this value will always produce -1 given that none of the vcf_files contain "output.vcf", was this intended?
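
To show what I mean, here is a small standalone check (not the repository's code). How end is used afterwards in filter_vcfs.py is my assumption; if it serves as a slice bound, -1 silently drops the last character instead of failing. The suffix-stripping alternative at the end is only a hypothetical suggestion.

```python
name = "ADNI.808_indiv.minGQ_21.pass.ADNI_ID.chr3.vcf.gz"

# str.find returns -1 when the substring is absent.
end = name.find("output.vcf")
print(end)

# Used as a slice bound, -1 removes just the final character.
print(name[:end])

# Hypothetical alternative: strip a known suffix explicitly.
suffix = ".vcf.gz"
stem = name[: -len(suffix)] if name.endswith(suffix) else name
print(stem)
```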

YOUKAINOYAMA · 2023-12-07T15:22:48Z

Sorry, I'm not quite sure. I plan to conduct an experiment on this paper using the dataset I collected myself, and the author cannot disclose this medical dataset to us.
