Data mining of DNA sequences submitted by Peruvian institutions to public genetic databases

This repository contains all data related to the paper "Data mining of DNA sequences submitted by Peruvian institutions to public genetic databases". For more information about this work, please contact:

  1. Pedro Romero, [email protected] (corresponding author)
  2. Camila Castillo-Vilcahuaman, [email protected] (owner of the repository)

Files in this repository

  1. bold_data_species.txt: This file contains the BOLD data used in this study. It uses a .tabular format.
  2. PATRIC_genome.csv: This file contains the PATRIC data used in this study. It uses a .csv format.
  3. used_queries_per_inst_nucleotide.txt: This file explains the query words used to search the Nucleotide database.
  4. list: This file contains all the query words as a plain list, convenient for use in for loops.
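
As a rough illustration of how the `list` file might be used in a for loop, here is a minimal bash sketch. It assumes that `list` holds one query word per line, which this README does not confirm, and the function name `print_queries` is hypothetical and not part of the repository's scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from the repository's notebooks.
# Assumption: the list file holds one query word (or phrase) per line.
print_queries() {
    local list_file="$1"
    local -a queries
    local query

    # Read every line of the file into an array (requires bash 4 or newer).
    mapfile -t queries < "$list_file"

    for query in "${queries[@]}"; do
        # Skip blank lines so they do not produce empty queries.
        [ -z "$query" ] && continue
        printf 'query: %s\n' "$query"
    done
}

# Run against the repository's `list` file only when it is present.
if [ -f list ]; then
    print_queries list
fi
```

Reading the file into an array first, instead of looping over `$(cat list)`, keeps multi-word queries intact because entries are split on newlines only. The `printf` line is a placeholder for whatever command each query would actually be passed to.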

Scripts used

Scripts used in this study can be run using Jupyter notebooks with a bash kernel on Binder.

BinderBash was obtained from https://github.com/gjbex/BinderBash.

Additionally, we submitted:

  1. mining_peru_sequence_DB.pdf (ENGLISH)
  2. mining_peru_secuencias_DB.pdf (SPANISH)