Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 5 changed files with 74 additions and 41 deletions.
@@ -49,16 +49,21 @@ The **output_docs_folder** sets the folder where your final .json files will be
#### VCF conversion config parameters
The **num_variants** parameter must be set when you run the VCF converter (genomicVariations_vcf.py). It tells the script how many VCF lines to read and convert from the file(s).
The **reference_genome** is the reference genome the tool uses to map positions on the chromosomes.
The **allele_frequency** parameter lets you set an allele frequency threshold for the variants you want to convert from the VCF file.
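To make num_variants and allele_frequency concrete, here is a minimal sketch of how a converter could apply them while reading a VCF. It is not the code of genomicVariations_vcf.py (the real script may rely on cyvcf2, which is listed in the Python requirements); it uses only the standard library, and the helper names are hypothetical. It assumes a variant is kept when every AF value in its INFO column is at most allele_frequency, which is consistent with the config note that a value of 1 converts all variants.

```python
import gzip
from typing import Iterable, Iterator, List, TextIO


def open_vcf(path: str) -> TextIO:
    """Open a plain or gzip-compressed VCF file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def parse_info(info_field: str) -> dict:
    """Split a VCF INFO column (e.g. 'AC=2;AF=0.25;DB') into a dict."""
    info = {}
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True  # flags such as DB carry no value
    return info


def select_variants(lines: Iterable[str], num_variants: int,
                    allele_frequency: float) -> Iterator[List[str]]:
    """Read at most num_variants data lines and yield those passing the AF filter.

    Assumption: a variant passes when every AF value (one per ALT allele) is
    <= allele_frequency, so allele_frequency=1 keeps every variant.
    """
    read = 0
    for line in lines:
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        if read >= num_variants:  # stop after num_variants data lines
            break
        read += 1
        columns = line.rstrip("\n").split("\t")
        af = parse_info(columns[7]).get("AF")
        if not isinstance(af, str):
            yield columns  # no AF value to filter on: keep the row (assumption)
            continue
        freqs = [float(x) for x in af.split(",") if x != "."]
        if all(f <= allele_frequency for f in freqs):
            yield columns
```

For example, `with open_vcf(path) as vcf:` followed by `for row in select_variants(vcf, 1000000, 1):` would walk the first million data lines of a .vcf.gz file and keep all of them. Counting lines read rather than lines kept mirrors the description of num_variants as the number of VCF lines to read.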
### Converting data from .vcf (.vcf.gz) file
### Converting data from .vcf.gz file

To convert data from .vcf (.vcf.gz) to .json, you will have to copy all the files you want to convert inside the [files_to_read folder](https://github.com/EGA-archive/beacon2-ri-tools-v2/tree/main/files/vcf/files_to_read).
To convert data from .vcf.gz to .json, you will need to copy all the files you want to convert into the [files_to_read folder](https://github.com/EGA-archive/beacon2-ri-tools-v2/tree/main/files/vcf/files_to_read).
You will need to provide one .vcf.gz file and save it in this folder.
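Because the converter expects a single .vcf.gz file in this folder, a small pre-flight check can catch an empty folder or a leftover second file before you start a long conversion. This is a hypothetical helper, not part of the repository; its default path `files/vcf/files_to_read` is taken from the folder link and assumes you run it from the repository root.

```python
from pathlib import Path


def find_single_vcf(folder: str = "files/vcf/files_to_read") -> Path:
    """Return the only .vcf.gz file in folder, or raise if there are 0 or more than 1."""
    candidates = sorted(Path(folder).glob("*.vcf.gz"))
    if len(candidates) != 1:
        names = ", ".join(p.name for p in candidates) or "none"
        raise ValueError(
            f"expected exactly one .vcf.gz file in {folder}, found: {names}"
        )
    return candidates[0]
```

Calling `find_single_vcf()` from the repository root returns the path of the file to convert, or raises a `ValueError` that lists what it found.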
```bash
docker exec -it ri-tools python genomicVariations_vcf.py
```
This will generate the final .json file in Beacon Friendly Format in the output_docs folder, named after the collection with a .json extension, e.g. genomicVariations.json.
After that, if needed, export your documents from MongoDB to a .json file using this command:
```bash
docker exec ri-tools-mongo mongoexport --jsonArray --uri "mongodb://root:example@127.0.0.1:27017/beacon?authSource=admin" --collection genomicVariations | jq 'del(.[]._id)' > genomicVariations.json
```
This will also generate the final .json file in Beacon Friendly Format. Bear in mind that this time the file is saved in the directory you run the command from; to save it in the output_docs folder instead, add that folder to the output path at the end of the command (the part after `>`).
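If you would rather do the export from Python, the mongoexport-plus-jq pipeline can be approximated with pymongo, which this commit adds to the Python requirements. This is a sketch, not a tool shipped with the repository: the function names are hypothetical, and unlike mongoexport it writes plain JSON rather than MongoDB Extended JSON, turning any non-JSON value (such as a datetime) into a string.

```python
import json
from typing import Iterable, List


def strip_ids(documents: Iterable[dict]) -> List[dict]:
    """Drop MongoDB's internal _id field, like jq 'del(.[]._id)'."""
    return [{k: v for k, v in doc.items() if k != "_id"} for doc in documents]


def write_json_array(documents: Iterable[dict], path: str) -> int:
    """Write documents as one JSON array (like mongoexport --jsonArray); return the count."""
    cleaned = strip_ids(documents)
    with open(path, "w", encoding="utf-8") as fh:
        # default=str is an assumption: non-JSON types are stringified, not Extended JSON
        json.dump(cleaned, fh, default=str)
    return len(cleaned)


def export_collection(uri: str, database: str, collection: str, path: str) -> int:
    """Export every document of database.collection to path as a JSON array."""
    from pymongo import MongoClient  # imported lazily so the helpers above work without it

    client = MongoClient(uri)
    try:
        return write_json_array(client[database][collection].find({}), path)
    finally:
        client.close()
```

For example, `export_collection(uri, "beacon", "genomicVariations", "output_docs/genomicVariations.json")`, with `uri` set to the same connection string you pass to mongoexport, would write straight into the output_docs folder, assuming MongoDB is reachable at that URI from where the script runs.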
### Creating the .csv file (for metadata, or for genomicVariations when you do not have a VCF file)
@@ -1,10 +1,16 @@
#### Input and Output files config parameters ####
csv_filename='./csv/examples/genomicVariations.csv'
csv_filename='./csv/output3.csv'
output_docs_folder='./output_docs/'

#### VCF Conversion config parameters ####
num_variants=1000000
num_variants=10000000
allele_frequency=1 # enter a float; leave 1 to convert all the variants
reference_genome='GRCh38' # Choose one of NCBI36, GRCh37, GRCh38

### MongoDB parameters ###
database_host = 'mongo'
database_port = 27017
database_user = 'root'
database_password = 'example'
database_name = 'beacon'
database_auth_source = 'admin'
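The MongoDB parameters above are the pieces of a standard MongoDB connection string. The sketch below is a hypothetical helper, not code from the repository: it assembles them and percent-escapes the user name and password with `quote_plus`, as the pymongo documentation recommends for credentials containing characters such as `@` or `:`.

```python
from urllib.parse import quote_plus

# Values copied from the MongoDB parameters in the configuration file above.
database_host = 'mongo'
database_port = 27017
database_user = 'root'
database_password = 'example'
database_name = 'beacon'
database_auth_source = 'admin'


def mongo_uri(host: str, port: int, user: str, password: str,
              name: str, auth_source: str) -> str:
    """Build a mongodb:// connection string from the individual settings."""
    # Percent-escape credentials so characters like '@' or ':' do not break the URI.
    return (f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/"
            f"{name}?authSource={auth_source}")


uri = mongo_uri(database_host, database_port, database_user,
                database_password, database_name, database_auth_source)
print(uri)  # mongodb://root:example@mongo:27017/beacon?authSource=admin
```

`pymongo.MongoClient(uri)` accepts the resulting string. The host name `mongo` is presumably the name of the MongoDB container on the Docker network, so this URI would only resolve from a container attached to that network.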
@@ -6,3 +6,4 @@ python-dateutil==2.8.2
tqdm==4.66.1
urllib3==2.0.7
cyvcf2==0.30.28
pymongo==4.6.1