Skip to content

Commit

Permalink
Initial commit: reverse_complement.pl
Browse files Browse the repository at this point in the history
  • Loading branch information
singing-scientist committed May 30, 2018
1 parent 4ae9368 commit 1d285d0
Show file tree
Hide file tree
Showing 2 changed files with 30 additions and 4 deletions.
13 changes: 9 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ Perl scripts providing **C**omputational **H**elp for the **A**nalysis of **Seq*
* **[gff2gtf.pl](#gff-to-gtf)**. You want to convert a GFF file to a simpler <a target="_blank" href="http://mblab.wustl.edu/GTF22.html">GTF</a> file, compatible with SNPGenie input.
* **[LinkGe\_all\_site\_pairs](#LinkGe-all-site-pairs)**. You have an aligned BAM file and a list of sites with their SNPs, and want to use Gabriel Starrett's <a target="_blank" href="https://github.com/gstarrett/LinkGe">LinkGe</a> program to check for linkage of SNPs in the same reads.
* **[remove\_seqs\_with\_stops.pl](#remove-seqs-with-stops)**. You have a FASTA file where each sequence is an in-frame coding sequence, and want to remove all sequences containing mid-sequence STOP codons.
* **[reverse\_complement.pl](#reverse-complement)**. You want to obtain the reverse complement of a sequence supplied at the command line.
* **[split\_fasta.pl](#split-fasta)**. You have a FASTA file, and want to create an individual FASTA file for each sequence inside.
* **[store\_fasta\_by\_ID.pl](#store-fasta-by-ID)**. You want to create a new FASTA file containing only a certain subset of another FASTA.
* **[vcf2revcom.pl](#vcf-to-revcom)**. You want to convert a VCF SNP report file, along with its accompanying FASTA file and <a target="_blank" href="http://mblab.wustl.edu/GTF22.html">GTF</a> file, to the reverse complement strand.
Expand Down Expand Up @@ -108,10 +109,6 @@ If a script isn't working, try working through the following checklist:

* <a name="gff-to-gtf"></a>**gff2gtf.pl**. (*Helpful for preparing **<a target="_blank" href="https://github.com/chasewnelson/snpgenie">SNPGenie</a>** input!*) **DEPRECATED!** Use a tool like <a target="_blank" href="https://ccb.jhu.edu/software/stringtie/gff.shtml#gffread">gffread</a>.

* <a name="remove-seqs-with-stops"></a>**remove\_seqs\_with\_stops.pl**. You have a FASTA file where each sequence is an in-frame coding sequence, and want to remove all sequences containing mid-sequence STOP codons. At the command line, provide this script with one argument: a FASTA (.fa or .fasta) file containing multiple coding sequences that all begin at the first position of the first codon (they need not be aligned to each other). This script will create a new FASTA file in the working directory, containing just those sequences lacking an in-frame mid-sequence STOP codon. Here's an example:

remove_seqs_with_stops.pl <my_fasta_file.fasta>

* <a name="LinkGe-all-site-pairs"></a>**LinkGe\_all\_site\_pairs.pl**. You have an aligned BAM file and a list of sites with their SNPs, and want to use Gabriel Starrett's <a target="_blank" href="https://github.com/gstarrett/LinkGe">LinkGe</a> program to check for linkage of SNPs in the same reads. For example, you may wish to test if high-frequency variant nucleotides are present in linked haplotypes in a deep-sequenced viral sample from a single host. At the command line, provide this script with the named arguments:
* *--snp\_file*: REQUIRED. One TAB-delimited, CLC-style "SNP report" file with, at minimum, the following three columns (with headers):
* **"Reference Position"**: the site number of the single nucleotide variant within the aligned BAM reads.
Expand All @@ -126,6 +123,14 @@ If a script isn't working, try working through the following checklist:

Note that this script requires you to <a target="_blank" href="https://github.com/gstarrett/LinkGe">install LinkGe</a>, and to <a target="_blank" href="http://osxdaily.com/2014/08/14/add-new-path-to-path-command-line/">add it to your computer's PATH</a>. Three dependencies are also required: Bio::DB::Sam, BioPerl, and Samtools version 0.1.15. See the <a target="_blank" href="https://github.com/gstarrett/LinkGe">LinkGe page</a> for details.

* <a name="remove-seqs-with-stops"></a>**remove\_seqs\_with\_stops.pl**. You have a FASTA file where each sequence is an in-frame coding sequence, and want to remove all sequences containing mid-sequence STOP codons. At the command line, provide this script with one argument: a FASTA (.fa or .fasta) file containing multiple coding sequences that all begin at the first position of the first codon (they need not be aligned to each other). This script will create a new FASTA file in the working directory, containing just those sequences lacking an in-frame mid-sequence STOP codon. Here's an example:

remove_seqs_with_stops.pl <my_fasta_file.fasta>

* <a name="reverse-complement"></a>**reverse\_complement.pl**. You want to obtain the reverse complement of a sequence supplied at the command line. Simply provide this script with one argument, the sequence:

reverse_complement.pl <nucleotide_sequence>

* <a name="split-fasta"></a>**split\_fasta.pl**. You want to create an individual FASTA file for each sequence in another FASTA file. At the command line, provide this script with one argument: a FASTA (.fa or .fasta) file containing multiple sequences. This script will create multiple files in the working directory, each containing one of the sequences. Here's an example:

split_fasta.pl <my_fasta_file.fasta>
Expand Down
21 changes: 21 additions & 0 deletions reverse_complement.pl
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
#! /usr/bin/perl

# Takes in one FASTA file as an argument and returns the reverse complement in a
# similarly named file

use strict;
use warnings;

# Read in the sequence from the command line
my $seq = $ARGV[0] or die "\n### PROVIDE A NUCLEOTIDE SEQUENCE\n\n";
chomp($seq);
$seq = uc($seq); # uc returns uppercase
$seq =~ tr/U/T/; # no RNA

my $rev_seq = reverse($seq);
my $rev_com_seq = $rev_seq;
$rev_com_seq =~ tr/ACGT/TGCA/;

print "$rev_com_seq\n";

exit;

0 comments on commit 1d285d0

Please sign in to comment.