Map reads to rodrep #63

cnluzon · 2020-09-25T11:34:12Z

Sometimes we are interested in mapping reads to repetitive references to get an idea of repetitive element representation in the sample.

However, this case differs from mapping to a genomic reference, since no bigwig files and so on are generated. It is just a step that maps the reads and runs idxstats/flagstat on the resulting BAM file. These values would ideally be included in the mapping report (as the global number of reads mapped to a given reference) and in an extra report consisting of a table of counts per sample and reference (as in the idxstats file).

This can be done by adding optional extra reference indexes to the config.yaml and adding the necessary extra steps.
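As a rough illustration of the extra report, here is a minimal sketch that turns `samtools idxstats` output into a table of mapped-read counts per reference and sample. It assumes the standard four-column, tab-separated idxstats layout (name, length, mapped, unmapped); the function names, sample names, repeat names and numbers are all made up for the example and are not part of the pipeline.

```python
def parse_idxstats(text):
    """Return {reference_name: mapped_reads} from `samtools idxstats` text."""
    counts = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        name, _length, mapped, _unmapped = fields
        if name == "*":
            continue  # the "*" row only reports unplaced unmapped reads
        counts[name] = int(mapped)
    return counts


def counts_table(per_sample):
    """Build a header row plus one row per reference, one column per sample."""
    samples = sorted(per_sample)
    references = sorted({ref for counts in per_sample.values() for ref in counts})
    header = ["reference"] + samples
    rows = [
        [ref] + [per_sample[sample].get(ref, 0) for sample in samples]
        for ref in references
    ]
    return [header] + rows


# Hypothetical idxstats output for two samples (invented numbers).
example = {
    "sampleA": "L1_MM\t6000\t120\t0\nB2\t200\t30\t0\n*\t0\t0\t850\n",
    "sampleB": "L1_MM\t6000\t95\t0\nB2\t200\t41\t0\n*\t0\t0\t900\n",
}
table = counts_table({s: parse_idxstats(t) for s, t in example.items()})
for row in table:
    print("\t".join(str(value) for value in row))
```

References missing from a sample are reported as 0 so that every row has the same number of columns.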


marcelm · 2020-10-08T11:19:24Z

So if I understand correctly, "rodrep" refers to the part of RepBase that covers repeats in rodents.

If we add such a step, we have to exclude it from automated testing because of the way RepBase is licensed, see this Bioinformatics Stack Exchange question for a discussion.

Due to the licensing problems, we should IMO invest a little bit of time into investigating whether it would be possible to use some alternative as discussed in the answers to the SE question. Perhaps Dfam as suggested there works.

cnluzon · 2020-10-09T08:51:59Z

> Due to the licensing problems, we should IMO invest a little bit of time into investigating whether it would be possible to use some alternative as discussed in the answers to the SE question. Perhaps Dfam as suggested there works.

I agree. I have had Dfam on my radar for a while because of this.

Keep in mind that the functionality of allowing this mapping step to use a reference that is not necessarily a genome (where we are only interested in counts, not in bigwig files) can be conceived independently of where the data comes from.

It is true that we need some kind of dataset for testing, and if there is no useful open alternative (in case Dfam doesn't work), maybe there is no point in implementing it. But I guess we could also come up with some useful self-generated annotation.

There are other situations where this mapping option could be of use:

- If set to another genome, it allows accounting for some degree of cross-contamination. We have had experimental settings before where this was meaningful because of how cells were grown: we really wanted to map to hg38, but also account for mm9 mappings, for instance.
- This may be redundant with the cutadapt step anyway, but: "decoy"-like sequences such as the ones present in Analysis sets. In this case they would not work as decoys, but as some kind of QC.
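For the cross-contamination use case, a QC check could be as simple as the sketch below: given the total number of reads and the mapped counts per reference, compute the mapped fraction and list every non-primary reference above a cutoff. The function names, the 1% threshold and the read counts are assumptions made up for illustration, not values anyone has proposed here.

```python
def mapped_fractions(total_reads, mapped_by_reference):
    """Return {reference: fraction of all reads mapped to it}."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {ref: n / total_reads for ref, n in mapped_by_reference.items()}


def flag_contamination(fractions, primary, threshold=0.01):
    """List non-primary references whose mapped fraction reaches the threshold."""
    return sorted(
        ref
        for ref, fraction in fractions.items()
        if ref != primary and fraction >= threshold
    )


# Hypothetical sample: reads mapped independently to hg38 and to mm9.
fractions = mapped_fractions(1000, {"hg38": 900, "mm9": 50})
flagged = flag_contamination(fractions, primary="hg38")
print(flagged)
```

Since each reference is mapped separately, a read can count towards more than one reference, so the fractions need not sum to 1.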
