Data mining of DNA sequences submitted by Peruvian institutions to public genetic databases

This repository contains all data related to the paper "Data mining of DNA sequences submitted by Peruvian institutions to public genetic databases". For more information about this work, please contact:

Pedro Romero, pedro.romero@upch.pe (corresponding author)
Camila Castillo-Vilcahuaman, camila.castillo.v@upch.pe (owner of the repository)

Files in this repository

bold_data_species.txt: This file contains the BOLD data used in this study, in .tabular (tab-separated) format.
PATRIC_genome.csv: This file contains the PATRIC data used in this study, in .csv format.
used_queries_per_inst_nucleotide.txt: This file lists the query words used for each institution to retrieve the Nucleotide data.
list: This file contains all query words as a plain list, convenient for iterating over them in for loops.
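Because `list` is meant to be consumed in a loop, the minimal bash sketch below illustrates that pattern. It assumes one query word per line. The helper `run_queries`, the `echo` placeholder and the demo query words are hypothetical and do not come from this repository's scripts. Replace the `echo` with whatever search command you run for each query.

```bash
#!/usr/bin/env bash
set -euo pipefail

# run_queries is a hypothetical helper, not a script from this repository.
# It reads a query file with one query word per line (assumed layout)
# and runs a placeholder command for each non-empty line.
run_queries() {
  local list_file="$1"
  local query=""
  # IFS= keeps leading/trailing spaces, -r keeps backslashes literal;
  # the `|| [ -n ... ]` clause also handles a last line without a newline.
  while IFS= read -r query || [ -n "$query" ]; do
    if [ -z "$query" ]; then
      continue  # skip blank lines
    fi
    # Placeholder: replace with the actual database search command.
    echo "querying: $query"
  done < "$list_file"
}

# Usage with a temporary stand-in file of made-up query words.
demo_list=$(mktemp)
printf '%s\n' "INSTITUTION_A" "" "INSTITUTION B" > "$demo_list"
run_queries "$demo_list"
rm -f "$demo_list"
```

Reading line by line with `while read` rather than `for q in $(cat list)` keeps multi-word queries (for example an institution name containing spaces) intact as a single query.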

Scripts used

Scripts used in this study can be run using Jupyter notebooks with a bash kernel on Binder:

The Binder setup is based on BinderBash: https://github.com/gjbex/BinderBash.

Additionally, we submitted the following documents:

mining_peru_sequence_DB.pdf (ENGLISH)
mining_peru_secuencias_DB.pdf (SPANISH)
