Add gto_fasta_merge_streams and documentation
joaorafaelalmeida committed Oct 14, 2020
1 parent 50879cf commit 1fc26cc
Showing 20 changed files with 201 additions and 11 deletions.
Binary file modified bin/gto
Binary file not shown.
Binary file added bin/gto_fasta_merge_streams
Binary file not shown.
Binary file modified bin/gto_fasta_split_streams
Binary file not shown.
1 change: 1 addition & 0 deletions conda/build.sh
@@ -110,3 +110,4 @@ cp bin/gto_segment $PREFIX/bin/
cp bin/gto_sum $PREFIX/bin/
cp bin/gto_upper_bound $PREFIX/bin/
cp bin/gto_word_search $PREFIX/bin/
cp bin/gto_fasta_split_streams $PREFIX/bin/
4 changes: 2 additions & 2 deletions conda/meta.yaml
@@ -5,10 +5,10 @@

package:
name: gto
version: '1.5.3'
version: '1.5.4'

source:
git_rev: v1.5.3
git_rev: v1.5.4
git_url: https://github.com/cobilab/gto.git

requirements:
Binary file modified manual/manual.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion manual/manual.tex
@@ -84,7 +84,7 @@
$^3$Department of Information and Communications Technologies, University of A Coru\~na, A Coru\~na, Spain\\
$^4$Department of Virology, University of Helsinki, Helsinki, Finland\\
~\\
Version 1.5.1
Version 1.5.4
}
\date{}
\maketitle
5 changes: 4 additions & 1 deletion manual/sections/FASTA_tools/FASTA_tools.tex
@@ -39,6 +39,8 @@ \chapter{FASTA tools}

\item \texttt{gto\char`_fasta\char`_split\char`_streams}: it splits and writes a FASTA file into three channels of information: headers, extra and DNA.

\item \texttt{gto\char`_fasta\char`_merge\char`_streams}: it merges the three channels of information (headers, extra and DNA) and writes them into a FASTA file.


\end{enumerate}

@@ -56,4 +58,5 @@ \chapter{FASTA tools}
\input{\FASTAToolsPath/FastaExtractPatternCoords.tex}
\input{\FASTAToolsPath/FastaComplement.tex}
\input{\FASTAToolsPath/FastaReverse.tex}
\input{\FASTAToolsPath/FastaSpitStreams.tex}
\input{\FASTAToolsPath/FastaSplitStreams.tex}
\input{\FASTAToolsPath/FastaMergeStreams.tex}
39 changes: 39 additions & 0 deletions manual/sections/FASTA_tools/FastaMergeStreams.tex
@@ -0,0 +1,39 @@
\section{Program gto\char`_fasta\char`_merge\char`_streams}
The \texttt{gto\char`_fasta\char`_merge\char`_streams} program merges the three channels of information (headers, extra and DNA) and writes them into a FASTA file. \\
For help type:
\begin{lstlisting}
./gto_fasta_merge_streams -h
\end{lstlisting}
In the following subsections, we explain the input and output parameters.

\subsection*{Input parameters}

The \texttt{gto\char`_fasta\char`_merge\char`_streams} program needs, as input, the three files produced by the \texttt{gto\char`_fasta\char`_split\char`_streams} tool, and it writes its result to the standard output stream. The output stream is a FASTA or Multi-FASTA file.\\
The attribution is given according to:
\begin{lstlisting}
Usage: ./gto_fasta_merge_streams [options] [[--] args]
or: ./gto_fasta_merge_streams [options]

It merges the three channels of information (headers, extra and DNA) and writes it into a FASTA file.

-h, --help Show this help message and exit

Basic options
-e, --extra=<str> Input file with the extra information
-d, --dna=<str> Input file with the DNA information
-H, --headers=<str> Input file with the headers information
> output Output FASTA file format (stdout)

Example: ./gto_fasta_merge_streams -e <filename> -d <filename> -H <filename> > output.fasta
\end{lstlisting}

\subsection*{Output}

The output of the \texttt{gto\char`_fasta\char`_merge\char`_streams} program is a FASTA or Multi-FASTA file.\\
Using the three output files of the \texttt{gto\char`_fasta\char`_split\char`_streams} tool as input in this example, the output of this tool is the following:
\begin{lstlisting}
>AB000264 |acc=AB000264|descr=Homo sapiens mRNA
ACAAGACGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCCTGGAGGGTCCACCGCTGCCCTGCTGCCATTGTCCCCGGCCCCACCTAAGGAAAAGCAGCCTCCTGACTTTCCTCGCTTGGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAAGTGGTTTGAGTGGACCTCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGCAGGAAGCAGGCCAGTGCCGCGAATCCGCGCGCCGGGACAGAATCTCCTGCAAAGCCCTGCAGGAACTTCTTCTGGAAGACCTTCTCCACCCCCCCAGCTAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAA
>AB000263 |acc=AB000263|descr=Homo sapiens mRNA
ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCCCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCCTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCCTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAA
\end{lstlisting}
Original file line number Diff line number Diff line change
@@ -19,9 +19,12 @@ \subsection*{Input parameters}
-h, --help Show this help message and exit

Basic options
-e, --extra=<str> Output file for the extra information
-d, --dna=<str> Output file for the DNA information
-H, --headers=<str> Output file for the headers information
< input.fastq Input FASTA file format (stdin)

Example: ./gto_fasta_split_streams < input.fastq
Example: ./gto_fasta_split_streams -e <filename> -d <filename> -H <filename> < input.fasta
\end{lstlisting}
An example of such an input file is:
\begin{lstlisting}
@@ -41,4 +44,4 @@ \subsection*{Input parameters}

\subsection*{Output}

The output of the \texttt{gto\char`_fasta\char`_split\char`_streams} program are three files containing the headers, extra information and DNA.
The output of the \texttt{gto\char`_fasta\char`_split\char`_streams} program is a set of three files containing the headers, extra information and DNA. The names of those files can be passed through the tool's parameters. The default names are HEADERS.JV2, EXTRA.JV2 and DNA.JV2.
122 changes: 122 additions & 0 deletions src/FastaMergeStreams.c
@@ -0,0 +1,122 @@
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include "argparse.h"
#include <unistd.h>


/*
 * This application merges three channels of information into a FASTA file:
 * - HEADERS;
 * - EXTRA;
 * - DNA.
 */
int main(int argc, char *argv[])
{
  FILE *HEADERS, *EXTRA, *DNA;
  int c, d = 0;
  const char *input_headers = NULL;
  const char *input_extra = NULL;
  const char *input_dna = NULL;

  char *programName = argv[0];
  struct argparse_option options[] = {
    OPT_HELP(),
    OPT_GROUP("Basic options"),
    OPT_STRING('e', "extra", &input_extra, "Input file with the extra information"),
    OPT_STRING('d', "dna", &input_dna, "Input file with the DNA information"),
    OPT_STRING('H', "headers", &input_headers, "Input file with the headers information"),
    OPT_BUFF('>', "output", "Output FASTA file format (stdout)"),
    OPT_END(),
  };
  struct argparse argparse;

  /* Build the usage example; snprintf truncates instead of overflowing. */
  char usage[250];
  snprintf(usage, sizeof(usage),
    "\nExample: %s -e <filename> -d <filename> -H <filename> > output.fasta\n",
    programName);

  argparse_init(&argparse, options, NULL, programName, 0);
  argparse_describe(&argparse, "\nIt merges the three channels of information (headers, extra and DNA) and writes it into a FASTA file.", usage);
  argc = argparse_parse(&argparse, argc, argv);

  if(argc != 0)
    argparse_help_cb(&argparse, options);

  /* Fall back to the default file names when an option is not given. */
  if(input_headers == NULL)
    input_headers = "HEADERS.JV2";

  if((HEADERS = fopen(input_headers, "r")) == NULL)
  {
    fprintf(stderr, "Error: could not open file %s!\n", input_headers);
    return 1;
  }

  if(input_extra == NULL)
    input_extra = "EXTRA.JV2";

  if((EXTRA = fopen(input_extra, "r")) == NULL)
  {
    fprintf(stderr, "Error: could not open file %s!\n", input_extra);
    return 1;
  }

  if(input_dna == NULL)
    input_dna = "DNA.JV2";

  if((DNA = fopen(input_dna, "r")) == NULL)
  {
    fprintf(stderr, "Error: could not open file %s!\n", input_dna);
    return 1;
  }

  /* EXTRA drives the merge: each byte says what to emit next. */
  while((c = fgetc(EXTRA)) != EOF)
  {
    if(c == '>')
    {
      /* Header marker: copy one line from HEADERS, newline included. */
      fprintf(stdout, "%c", c);
      while((c = fgetc(HEADERS)) != EOF)
      {
        fprintf(stdout, "%c", c);
        if(c == '\n') break;
      }
      continue;
    }

    switch(c)
    {
      case 0: /* next DNA symbol, emitted as stored */
      case 1: /* next DNA symbol, emitted in lowercase */
        if((d = fgetc(DNA)) == EOF)
        {
          fprintf(stderr, "Error: invalid format!\n");
          return 1;
        }
        fprintf(stdout, "%c", c == 1 ? tolower(d) : d);
        break;

      default: /* any other byte is copied verbatim */
        fprintf(stdout, "%c", c);
        break;
    }
  }

  /* All three streams were opened successfully above, so close them all. */
  fclose(HEADERS);
  fclose(EXTRA);
  fclose(DNA);
  return EXIT_SUCCESS;
}
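To make the merge rule in src/FastaMergeStreams.c easier to check in isolation, here is a minimal sketch of the same decoding logic applied to in-memory buffers instead of files. It is not part of GTO: the function name `merge_streams`, its signature and its return convention are hypothetical, and the meaning of the EXTRA bytes ('>' for a header line, 0 for a DNA symbol as stored, 1 for a lowercased DNA symbol, anything else copied verbatim) is inferred only from the loop above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of GTO): rebuilds FASTA text from three
 * in-memory streams using the rules of the merge loop above.
 * headers and dna are NUL-terminated; extra is a raw byte buffer because
 * it may contain 0 bytes. Returns the number of bytes written (excluding
 * the terminating NUL), or -1 if the DNA stream runs out or the output
 * buffer is too small.
 */
long merge_streams(const char *headers,
                   const unsigned char *extra, size_t extra_len,
                   const char *dna,
                   char *out, size_t out_cap)
{
  size_t h = 0, d = 0, o = 0;

  assert(headers != NULL && extra != NULL && dna != NULL && out != NULL);
  if (out_cap == 0) return -1;

  for (size_t i = 0; i < extra_len; ++i) {
    unsigned char c = extra[i];

    if (c == '>') {
      /* Header marker: emit '>' and one header line, newline included. */
      if (o + 1 >= out_cap) return -1;
      out[o++] = '>';
      while (headers[h] != '\0') {
        char ch = headers[h++];
        if (o + 1 >= out_cap) return -1;
        out[o++] = ch;
        if (ch == '\n') break;
      }
    } else if (c == 0 || c == 1) {
      /* 0: DNA symbol as stored; 1: DNA symbol in lowercase. */
      if (dna[d] == '\0') return -1;
      if (o + 1 >= out_cap) return -1;
      char base = dna[d++];
      out[o++] = (c == 1) ? (char)tolower((unsigned char)base) : base;
    } else {
      /* Any other byte is copied verbatim. */
      if (o + 1 >= out_cap) return -1;
      out[o++] = (char)c;
    }
  }

  out[o] = '\0';
  return (long)o;
}
```

Under these assumptions, calling the helper with the header text "id1\n", the EXTRA bytes {'>', 0, 1, 'N', 0, '\n'} and the DNA string "ACG" would rebuild a one-record FASTA in which the second base is lowercased and the 'N' comes straight from EXTRA. Keeping the logic in a pure function like this is one way to unit-test the format without touching the file system.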
2 changes: 1 addition & 1 deletion src/FastaSplitStreams.c
@@ -33,7 +33,7 @@ int main(int argc, char *argv[])

char usage[250] = "\nExample: ";
strcat(usage, programName);
strcat(usage, " -e <filename> -d <filename> -H <filename> < input.fastq\n");
strcat(usage, " -e <filename> -d <filename> -H <filename> < input.fasta\n");

argparse_init(&argparse, options, NULL, programName, 0);
argparse_describe(&argparse, "\nIt splits and writes a FASTA file into three channels of information: headers, extra and DNA.", usage);
13 changes: 10 additions & 3 deletions src/GTO.c
@@ -15,7 +15,7 @@ int main(int argc, char *argv[])
" ╚═════╝ ╚═╝ ╚═════╝ \n"
" \n"
"NAME \n"
" GTO v%u.%u.3, \n"
" GTO v1.5.4, \n"
" The Genomics-Proteomics Toolkit. \n"
" \n"
"AUTHORS \n"
@@ -243,6 +243,14 @@ int main(int argc, char *argv[])
" It uses the Chester-visual to visualize relative singularity \n"
" regions. \n"
" \n"
" [gto_fasta_split_streams] \n"
" It splits and writes a FASTA file into three channels of \n"
" information: headers, extra and DNA. \n"
" \n"
" [gto_fasta_merge_streams] \n"
" It merges the three channels of information (headers, extra \n"
" and DNA) and writes it into a FASTA file. \n"
" \n"
"Genomic Sequence Tools \n"
" [gto_genomic_count_bases] \n"
" It counts the number of bases in sequence, FASTA or \n"
Expand Down Expand Up @@ -399,7 +407,6 @@ int main(int argc, char *argv[])
" GTO: A toolkit to unify pipelines in genomic and proteomic research.\n",
" J. R. Almeida, A. J. Pinho, J. L. Oliveira, O. Fajarda, D. Pratas, \n",
" SoftwareX, Volume 12, 2020, 100535, \n",
" doi: https://doi.org/10.1016/j.softx.2020.100535 \n",
VERSION, RELEASE);
" doi: https://doi.org/10.1016/j.softx.2020.100535 \n");
return EXIT_SUCCESS;
}
5 changes: 4 additions & 1 deletion src/Makefile
@@ -109,7 +109,8 @@ PROGS = $(BIN)/gto \
$(BIN)/gto_amino_acid_from_fasta \
$(BIN)/gto_amino_acid_from_fastq \
$(BIN)/gto_amino_acid_from_seq \
$(BIN)/gto_fasta_split_streams
$(BIN)/gto_fasta_split_streams \
$(BIN)/gto_fasta_merge_streams
#$(BIN)/gto_amino_acid_to_seq


@@ -258,6 +259,8 @@ $(BIN)/gto_amino_acid_from_seq: AminoAcidFromSeq.c $(DEPS) $(OBJS)
$(CC) $(CFLAGS) -o $(BIN)/gto_amino_acid_from_seq AminoAcidFromSeq.c $(OBJS) $(LIBS)
$(BIN)/gto_fasta_split_streams: FastaSplitStreams.c $(DEPS) $(OBJS)
$(CC) $(CFLAGS) -o $(BIN)/gto_fasta_split_streams FastaSplitStreams.c $(OBJS) $(LIBS)
$(BIN)/gto_fasta_merge_streams: FastaMergeStreams.c $(DEPS) $(OBJS)
$(CC) $(CFLAGS) -o $(BIN)/gto_fasta_merge_streams FastaMergeStreams.c $(OBJS) $(LIBS)

#$(BIN)/gto_amino_acid_to_seq: AminoAcidToSeq.c $(DEPS) $(OBJS)
# $(CC) $(CFLAGS) -o $(BIN)/gto_amino_acid_to_seq AminoAcidToSeq.c $(OBJS) $(LIBS)
1 change: 1 addition & 0 deletions tester/gto_fasta_merge_streams/DNA.JV2
@@ -0,0 +1 @@
ACAAGACGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCCTGGAGGGTCCACCGCTGCCCTGCTGCCATTGTCCCCGGCCCCACCTAAGGAAAAGCAGCCTCCTGACTTTCCTCGCTTGGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAAGTGGTTTGAGTGGACCTCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGCAGGAAGCAGGCCAGTGCCGCGAATCCGCGCGCCGGGACAGAATCTCCTGCAAAGCCCTGCAGGAACTTCTTCTGGAAGACCTTCTCCACCCCCCCAGCTAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAAACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCCCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCCTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCCTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAA
Binary file added tester/gto_fasta_merge_streams/EXTRA.JV2
Binary file not shown.
2 changes: 2 additions & 0 deletions tester/gto_fasta_merge_streams/HEADERS.JV2
@@ -0,0 +1,2 @@
AB000264 |acc=AB000264|descr=Homo sapiens mRNA
AB000263 |acc=AB000263|descr=Homo sapiens mRNA
4 changes: 4 additions & 0 deletions tester/gto_fasta_merge_streams/output.fasta
@@ -0,0 +1,4 @@
>AB000264 |acc=AB000264|descr=Homo sapiens mRNA
ACAAGACGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCCTGGAGGGTCCACCGCTGCCCTGCTGCCATTGTCCCCGGCCCCACCTAAGGAAAAGCAGCCTCCTGACTTTCCTCGCTTGGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAAGTGGTTTGAGTGGACCTCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGCAGGAAGCAGGCCAGTGCCGCGAATCCGCGCGCCGGGACAGAATCTCCTGCAAAGCCCTGCAGGAACTTCTTCTGGAAGACCTTCTCCACCCCCCCAGCTAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAA
>AB000263 |acc=AB000263|descr=Homo sapiens mRNA
ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCCCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCCTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCCTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAA
2 changes: 2 additions & 0 deletions tester/gto_fasta_merge_streams/runExample.sh
@@ -0,0 +1,2 @@
#!/bin/bash
../../bin/gto_fasta_merge_streams -e EXTRA.JV2 -H HEADERS.JV2 -d DNA.JV2 > output.fasta
3 changes: 3 additions & 0 deletions tester/runAllTests.sh
@@ -213,4 +213,7 @@ sh runExample.sh
cd ..
cd gto_fasta_split_streams
sh runExample.sh
cd ..
cd gto_fasta_merge_streams
sh runExample.sh
cd ..
