ignasrum edited this page Nov 28, 2021 · 9 revisions

HoCoRT

Host Contamination Removal Tool (HoCoRT)

A host contamination removal tool for Linux and Mac OS

What is HoCoRT?

HoCoRT stands for Host Contamination Removal Tool. Its purpose is to simplify and improve the removal of host contamination from sequencing reads. It does not perform quality checking or low-complexity region masking, only host contamination removal.

What problem does HoCoRT solve?

As DNA sequencing technologies improved and High Throughput Sequencing (HTS) became accessible, the amount of DNA sequencing data grew rapidly. These developments advanced the field of microbiome research, which is central to our understanding of the interaction between a host and the microbial communities within it.

Raw microbiome sequencing data contains host sequences, which are usually treated as contamination. Removing them both improves analysis results and avoids publishing sensitive data, such as human DNA when researching the human microbiome.

This is currently done mostly in an ad-hoc way, as there are few, if any, dedicated host contamination removal tools.

How does HoCoRT accomplish its goals?

HoCoRT wraps existing aligners and classifiers, such as Bowtie2 and Kraken2, to remove host contamination.

  1. The sequencing data is mapped to the host reference genome.
  2. Sequences that map well are removed as host contamination, and the remaining sequences are written to the output files.
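
The two steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not HoCoRT's actual code: it assumes an aligner has already produced SAM output, treats a read as "not host" when SAM FLAG bit 0x4 (segment unmapped) is set, and ignores paired-end reads and mapping quality. The function names `unmapped_read_ids` and `filter_fastq` are hypothetical.

```python
def unmapped_read_ids(sam_lines):
    """Collect IDs of reads the aligner could not place on the host genome."""
    keep = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & 0x4:  # SAM FLAG bit 0x4: segment unmapped
            keep.add(name)
    return keep


def filter_fastq(fastq_lines, keep_ids):
    """Yield 4-line FASTQ records whose read ID is in keep_ids."""
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # The ID is the first token after '@' on the header line.
            read_id = record[0][1:].split()[0]
            if read_id in keep_ids:
                yield record
            record = []


# Tiny in-memory example (made-up reads): read1 maps to the host, read2 does not.
sam = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
]
fastq = ["@read1", "ACGT", "+", "IIII", "@read2", "TTGA", "+", "IIII"]

kept = list(filter_fastq(fastq, unmapped_read_ids(sam)))
for rec in kept:
    print("\n".join(rec))
```

In this sketch only `read2` survives the filter, because it is the only read flagged as unmapped. A stricter notion of "maps well" could also take the mapping quality column into account.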

Some of the pipelines combine multiple mappers/classifiers in an attempt to improve the results.
