Initial commit: reverse_complement.pl

chasewnelson · May 30, 2018 · 1d285d0 · 1d285d0
1 parent 4ae9368
commit 1d285d0
Show file tree

Hide file tree

Showing 2 changed files with 30 additions and 4 deletions.
diff --git a/README.md b/README.md
@@ -19,6 +19,7 @@ Perl scripts providing **C**omputational **H**elp for the **A**nalysis of **Seq*
 	* **[gff2gtf.pl](#gff-to-gtf)**. You want to convert a GFF file to a simpler <a target="_blank" href="http://mblab.wustl.edu/GTF22.html">GTF</a> file, compatible with SNPGenie input.
 	* **[LinkGe\_all\_site\_pairs](#LinkGe-all-site-pairs)**. You have an aligned BAM file and a list of sites with their SNPs, and want to use Gabriel Starrett's <a target="_blank" href="https://github.com/gstarrett/LinkGe">LinkGe</a> program to check for linkage of SNPs in the same reads.
 	* **[remove\_seqs\_with\_stops.pl](#remove-seqs-with-stops)**. You have a FASTA file where each sequence is an in-frame coding sequence, and want to remove all sequences containing mid-sequence STOP codons.
+	* **[reverse\_complement.pl](#reverse-complement)**. You want to obtain the reverse complement of a sequence supplied at the command line.
 	* **[split\_fasta.pl](#split-fasta)**. You have a FASTA file, and want to create an individual FASTA file for each sequence inside.
 	* **[store\_fasta\_by\_ID.pl](#store-fasta-by-ID)**. You want to create a new FASTA file containing only a certain subset of another FASTA.
 	* **[vcf2revcom.pl](#vcf-to-revcom)**. You want to convert a VCF SNP report file, along with its accompanying FASTA file and <a target="_blank" href="http://mblab.wustl.edu/GTF22.html">GTF</a> file, to the reverse complement strand.
@@ -108,10 +109,6 @@ If a script isn't working, try working through the following checklist:
 
 * <a name="gff-to-gtf"></a>**gff2gtf.pl**. (*Helpful for preparing **<a target="_blank" href="https://github.com/chasewnelson/snpgenie">SNPGenie</a>** input!*) **DEPRECATED!** Use a tool like <a target="_blank" href="https://ccb.jhu.edu/software/stringtie/gff.shtml#gffread">gffread</a>.
 
-* <a name="remove-seqs-with-stops"></a>**remove\_seqs\_with\_stops.pl**. You have a FASTA file where each sequence is an in-frame coding sequence, and want to remove all sequences containing mid-sequence STOP codons. At the command line, provide this script with one argument: a FASTA (.fa or .fasta) file containing multiple coding sequences that all begin at the first position of the first codon (they need not be aligned to each other). This script will create a new FASTA file in the working directory, containing just those sequences lacking an in-frame mid-sequence STOP codon. Here's an example:
-
-        remove_seqs_with_stops.pl <my_fasta_file.fasta>
-
 * <a name="LinkGe-all-site-pairs"></a>**LinkGe\_all\_site\_pairs.pl**. You have an aligned BAM file and a list of sites with their SNPs, and want to use Gabriel Starrett's <a target="_blank" href="https://github.com/gstarrett/LinkGe">LinkGe</a> program to check for linkage of SNPs in the same reads. For example, you may wish to test if high-frequency variant nucleotides are present in linked haplotypes in a deep-sequenced viral sample from a single host. At the command line, provide this script with the named arguments:
 	* *--snp\_file*: REQUIRED. One TAB-delimited, CLC-style "SNP report" file with, at minimum, the following three columns (with headers):
 		* **"Reference Position"**: the site number of the single nucleotide variant within the aligned BAM reads.
@@ -126,6 +123,14 @@ If a script isn't working, try working through the following checklist:
 
 	Note that this script requires you to <a target="_blank" href="https://github.com/gstarrett/LinkGe">install LinkGe</a>, and to <a target="_blank" href="http://osxdaily.com/2014/08/14/add-new-path-to-path-command-line/">add it to your computer's PATH</a>. Three dependencies are also required: Bio::DB::Sam, BioPerl, and Samtools version 0.1.15. See the <a target="_blank" href="https://github.com/gstarrett/LinkGe">LinkGe page</a> for details.
 
+* <a name="remove-seqs-with-stops"></a>**remove\_seqs\_with\_stops.pl**. You have a FASTA file where each sequence is an in-frame coding sequence, and want to remove all sequences containing mid-sequence STOP codons. At the command line, provide this script with one argument: a FASTA (.fa or .fasta) file containing multiple coding sequences that all begin at the first position of the first codon (they need not be aligned to each other). This script will create a new FASTA file in the working directory, containing just those sequences lacking an in-frame mid-sequence STOP codon. Here's an example:
+
+        remove_seqs_with_stops.pl <my_fasta_file.fasta>
+
+* <a name="reverse-complement"></a>**reverse\_complement.pl**. You want to obtain the reverse complement of a sequence supplied at the command line. Simply provide this script with one argument, the sequence:
+
+        reverse_complement.pl <nucleotide_sequence>
+
 * <a name="split-fasta"></a>**split\_fasta.pl**. You want to create an individual FASTA file for each sequence in another FASTA file. At the command line, provide this script with one argument: a FASTA (.fa or .fasta) file containing multiple sequences. This script will create multiple files in the working directory, each containing one of the sequences. Here's an example:
 
         split_fasta.pl <my_fasta_file.fasta>

diff --git a/reverse_complement.pl b/reverse_complement.pl
@@ -0,0 +1,21 @@
+#! /usr/bin/perl
+
+# Takes in one FASTA file as an argument and returns the reverse complement in a
+# similarly named file
+
+use strict;
+use warnings;
+
+# Read in the sequence from the command line
+my $seq = $ARGV[0] or die "\n### PROVIDE A NUCLEOTIDE SEQUENCE\n\n";
+chomp($seq);
+$seq = uc($seq); # uc returns uppercase
+$seq =~ tr/U/T/; # no RNA
+
+my $rev_seq = reverse($seq);
+my $rev_com_seq = $rev_seq;
+$rev_com_seq =~ tr/ACGT/TGCA/;
+
+print "$rev_com_seq\n";
+
+exit;