git clone https://github.com/HKU-BAL/ncov19_cytosine_depletion.git
cd ncov19_cytosine_depletion/
conda env create -f environment.yml
conda activate ncov19-ca
npm install
Nextstrain is also required; please follow the installation guide on its page.
- Use crawler.ts to download FASTA files from GISAID.
- Run align_fasta.py, cleanup.py, count_byday.py, plot.py, update_intro.py and concat_fasta.py, in that order, to process the downloaded FASTA files.
- All scripts modify files under base_folder, which is defined in both params.py and crawler.ts; if you change it, update it in both files. The default is download_data/ inside this directory.
- The Nextstrain/ncov repository must be referenced in params.py through the ncov_folder variable.

- base_folder/fasta/ stores the .fasta and .info files retrieved directly from GISAID.
- base_folder/aligned_fasta/ stores the FASTA files aligned against the reference.
- base_folder/processed_fasta/ stores the raw sequences from base_folder/fasta/ that pass the quality filter (sequence length > 29k, N < 5%), while base_folder/backup_fasta/ stores those that do not.
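The quality filter that decides between processed_fasta/ and backup_fasta/ can be sketched as follows. This is a hypothetical illustration, not the repository's cleanup.py: the helper names (parse_fasta, is_qualified, split_records) are made up here, "29k" is assumed to mean 29,000 bases, and the N fraction is assumed to be taken over the full raw sequence length.

```python
"""Hypothetical sketch of the processed_fasta / backup_fasta split.

Assumptions: "29k" means 29,000 bases, and the N fraction is computed
over the full raw sequence length. The real cleanup step may differ.
"""
from typing import Dict, Iterable, List, Optional, Tuple

MIN_LENGTH = 29_000     # sequence must be strictly longer than this
MAX_N_FRACTION = 0.05   # fraction of ambiguous 'N' bases must stay below 5%


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text lines into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def is_qualified(seq: str) -> bool:
    """True if the sequence passes both the length and N-content thresholds."""
    length = len(seq)
    if length <= MIN_LENGTH:
        return False
    return seq.upper().count("N") / length < MAX_N_FRACTION


def split_records(
    records: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Partition records into (qualified, rejected) dicts keyed by header."""
    qualified: Dict[str, str] = {}
    rejected: Dict[str, str] = {}
    for header, seq in records:
        (qualified if is_qualified(seq) else rejected)[header] = seq
    return qualified, rejected


if __name__ == "__main__":
    demo = [">good", "A" * 29_500,
            ">short", "ACGT" * 10,
            ">too_many_n", "N" * 2_000 + "A" * 28_000]
    ok, bad = split_records(parse_fasta(demo))
    print(sorted(ok), sorted(bad))  # ['good'] ['short', 'too_many_n']
```

In the demo, "too_many_n" is 30,000 bases long but 2,000 of them (about 6.7%) are N, so it would go to backup_fasta/ despite passing the length check.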
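The processing order described above can be scripted so the steps always run in sequence. This is a minimal, hypothetical runner: it assumes crawler.ts has already downloaded the data, that each script sits in the repository root, and that the scripts take no command-line arguments (reading base_folder and ncov_folder from params.py instead). The names PIPELINE, build_commands and run_pipeline are illustrative only.

```python
"""Hypothetical runner for the processing scripts, in the documented order.

Assumes the scripts live in the repository root and need no arguments.
"""
import subprocess
import sys
from pathlib import Path
from typing import List

# Order as given in the README; steps are run one after another.
PIPELINE = [
    "align_fasta.py",
    "cleanup.py",
    "count_byday.py",
    "plot.py",
    "update_intro.py",
    "concat_fasta.py",
]


def build_commands(repo_dir: Path) -> List[List[str]]:
    """Return one `<current python> <script>` command per pipeline step."""
    return [[sys.executable, str(repo_dir / script)] for script in PIPELINE]


def run_pipeline(repo_dir: Path) -> None:
    """Run every step in order, stopping at the first failing script."""
    for cmd in build_commands(repo_dir):
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError if a step exits non-zero
        subprocess.run(cmd, check=True, cwd=repo_dir)


if __name__ == "__main__":
    run_pipeline(Path(__file__).resolve().parent)
```

Using sys.executable runs each step with the interpreter of the active conda environment, and stopping on the first non-zero exit avoids feeding incomplete output into later steps.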