          RACA (Reference-Assisted Chromosome Assembly)
          =============================================

       This is the version used in the original PNAS paper.

0. Warning
----------
    Please do not use a '.' symbol as a part of a chromosome name.
    The '.' symbol is used as a delimiter to separate a species name and a chromosome name in output files. 
    If the '.' symbol is already in a chromosome name in your sequence files, please convert it to another symbol,
    such as '_', before creating the chain/net files and read mapping files. 
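
    The renaming described above can be sketched in Python as follows. This
    is a minimal illustration, not part of RACA: it assumes the sequence
    files are in FASTA format, and the function names are hypothetical.
    Only the sequence name (the first space-delimited token after '>') is
    changed.

```python
def sanitize_chromosome_name(name, replacement="_"):
    """Replace every '.' in a chromosome name with `replacement`."""
    return name.replace(".", replacement)


def sanitize_fasta_headers(lines, replacement="_"):
    """Yield FASTA lines with '.' replaced in each sequence name.

    Only the name (the first space-delimited token after '>') is changed;
    description text and sequence lines pass through untouched.
    """
    for line in lines:
        if line.startswith(">"):
            newline = "\n" if line.endswith("\n") else ""
            header = line[1:].rstrip("\n")
            name, sep, rest = header.partition(" ")
            yield ">" + sanitize_chromosome_name(name, replacement) + sep + rest + newline
        else:
            yield line


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python rename_chrs.py < in.fa > out.fa
    sys.stdout.writelines(sanitize_fasta_headers(sys.stdin))
```

    Check that the renamed sequences remain unique (for example, 'chr1.1'
    and 'chr1_1' would collide), and run the conversion before building
    the chain/net files and read mapping files so that all inputs agree.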


1. How to compile?
------------------

     Just type 'make' to compile the RACA package.


2. How to run?
--------------

    2.1 Configuration file

        RACA requires a single configuration file as a parameter.
        The configuration file holds all parameters that RACA needs.

        The example dataset below includes a sample configuration file,
        'params.txt', and all other parameter files, which are
        self-explanatory (see also the data directory).

        Please read the description of each configuration variable
        carefully and modify the values as needed.


    2.2 Run RACA 

        There is a wrapper Perl script, 'Run_RACA.pl'. To run RACA, type:

            <path to RACA>/Run_RACA.pl <path to the configuration file> 
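
        If RACA is launched from a larger pipeline, the same command can be
        issued from Python. The sketch below is only an illustration: the
        function names are hypothetical, and it assumes 'Run_RACA.pl' is
        executable and can be invoked directly, as in the command above.

```python
import subprocess
from pathlib import Path


def build_raca_command(raca_dir, config_path):
    """Return the argument list: the wrapper script plus one configuration file."""
    script = Path(raca_dir) / "Run_RACA.pl"
    return [str(script), str(config_path)]


def run_raca(raca_dir, config_path):
    """Run RACA; raise CalledProcessError if the script exits with an error."""
    config = Path(config_path)
    # Fail early with a clear message instead of letting the script fail later.
    if not config.is_file():
        raise FileNotFoundError(f"configuration file not found: {config}")
    subprocess.run(build_raca_command(raca_dir, config), check=True)
```

        For example, run_raca("/path/to/RACA", "params.txt") corresponds to
        the command line shown above.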
 
3. Example dataset (Tibetan antelope assembly)
----------------------------------------------

	3.1 Download the dataset

		Visit http://bioinfo.konkuk.ac.kr/RACA/ and click the 
		link "Tibetan antelope (TA) data" to download the file 
		TAdata.tgz. 

	3.2 Compile the dataset

		Go into the directory where you downloaded the TAdata.tgz file and run:

		tar xvfz TAdata.tgz
		cd TAdata/
		make

	3.3 Run RACA for the dataset

		In the TAdata directory run:

		<path to RACA>/Run_RACA.pl params.txt

	3.4 Where are output files?

		In the TAdata/Out_RACA directory.
		

4. What is produced?
--------------------

    RACA writes the following files to the output directory that is 
    specified in the configuration file.

    - rec_chrs.refined.txt 

        This file contains the order and orientation of target scaffolds in 
		each reconstructed RACA chromosome fragment. Each column is defined 
		as:  

            Column1: the RACA chromosome fragment id
            Column2: start position (0-based) in the RACA chromosome fragment
            Column3: end position (1-based) in the RACA chromosome fragment 
            Column4: target scaffold id or 'GAPS'
            Column5: start position (0-based) in the target scaffold
            Column6: end position (1-based) in the target scaffold

    - rec_chrs.<ref_spc>.segments.refined.txt

        This file contains the mapping between the RACA chromosome fragments 
		and the genome sequences of the reference species <ref_spc>. 

    - ref_chrs.<tar_spc>.segments.refined.txt
        
        This file contains the mapping between the RACA chromosome fragments 
		and the genome sequences of the target species <tar_spc>. 
    
    - ref_chrs.<out_spc>.segments.refined.txt
        
        This file contains the mapping between the RACA chromosome fragments 
		and the genome sequences of the outgroup species <out_spc>. This file 
		is created for each outgroup species. 

    - rec_chrs.adjscores.txt

        This file contains the adjacency scores that were used to reconstruct 
		the RACA chromosome fragments. Each column is defined as:

            Column1: the RACA chromosome fragment id
            Column2: start position (1-based) in the RACA chromosome fragment
            Column3: end position (1-based) in the RACA chromosome fragment 
            Column4: the adjacency score

    - rec_chrs.size.txt

        This file contains the total size (the second column) and the total 
		number of target scaffolds (the third column) that are placed in each 
		RACA chromosome fragment (the first column).  

    There are other intermediate files and directories in the output directory.
	 They can be safely ignored.  
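
    As an illustration of the rec_chrs.refined.txt layout described above,
    the following Python sketch parses the file and totals the length and
    scaffold count of each RACA chromosome fragment. It is not part of
    RACA and rests on assumptions: columns are taken to be
    whitespace-separated, any columns beyond the sixth are ignored, and
    'GAPS' rows are allowed to omit columns 5 and 6. The names Placement,
    parse_refined_line and summarize are hypothetical.

```python
from collections import namedtuple

Placement = namedtuple("Placement", "fragment start end name scaf_start scaf_end")


def parse_refined_line(line):
    """Parse one line of rec_chrs.refined.txt into a Placement.

    Assumes whitespace-separated columns. Scaffold coordinates (columns 5
    and 6) are read only for non-GAPS rows that carry them.
    """
    fields = line.split()
    fragment, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
    scaf_start = scaf_end = None
    if name != "GAPS" and len(fields) >= 6:
        scaf_start, scaf_end = int(fields[4]), int(fields[5])
    return Placement(fragment, start, end, name, scaf_start, scaf_end)


def summarize(lines):
    """Return {fragment id: (summed row length, number of scaffold rows)}."""
    totals = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        p = parse_refined_line(line)
        length, count = totals.get(p.fragment, (0, 0))
        # A 0-based start with a 1-based end is a half-open interval,
        # so the length of a row is end - start.
        length += p.end - p.start
        if p.name != "GAPS":
            count += 1
        totals[p.fragment] = (length, count)
    return totals


if __name__ == "__main__":
    # Path taken from the example dataset in section 3.
    with open("Out_RACA/rec_chrs.refined.txt") as handle:
        for fragment, (length, count) in summarize(handle).items():
            print(fragment, length, count, sep="\t")
```

    The summed length here includes 'GAPS' rows. Whether that matches the
    totals reported in rec_chrs.size.txt depends on how RACA counts gaps,
    which this sketch does not assume.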

5. How to ask questions?
------------------------

    Contact Jaebum Kim ([email protected])


