Issue running DCC #58
Comments
Dear @cmonger, thank you for reporting this error. I recently added a fix to address empty lines in input files and it seems this error may be an unintended side effect of that fix. I should be able to provide a fix shortly. Thank you, |
Dear @cmonger, I fixed the issue in the latest push; please update your DCC installation. You may, however, additionally supply the BAM files via the Cheers,
@tjakobi |
Dear @cmonger, DCC uses the BAM files in any case; the parameter only supplies their names directly. Otherwise DCC tries to guess the names from the supplied Chimeric.out.junction file names. If provided via the -B option, the list should only contain the "main" mapping BAMs, not the additional mate mappings. Cheers,
Hello, I am having the same error. Here is the command I use:
Here is the error I receive:
Thanks |
Dear @udube, thank you for reporting the issue. It seems like you only provided one mate via I'll include an appropriate warning message in the next commit. Cheers,
Hi, like in a previous issue, I cannot get DCC to run past the first few steps; it produces an error (same as #58). I just downloaded DCC yesterday, so I'm sure this is the most current version. I am following the tutorial steps with my own data. I have created the samplesheet, mate1, and mate2 files.
Here is the error:
Bash script contents:
GENOME=/archive/miuralab/dknupp/Index/STAR/mm10-star
DCC @samplesheet
I have checked, and samplesheet, mate1, and mate2 do not have any empty lines, so I'm not sure where this error is being produced. Also, for the genome.fa file, I built the index using STAR (which I used in the first steps for the alignment). These files don't have .fa or .fai after them; will this be a problem?
Thanks for the help, Dave
In case this is helpful, below are the contents of the input files:
samplesheet file content:
mate1 file content:
mate2 file content:
contents of repeats .gtf (head -n3):
contents of genepredictions .gtf (head -n3):
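The input checks discussed in this thread (no empty lines, and the same number of entries in each list file) can be sketched in a few lines of Python 3. This is only an illustration and not part of DCC: the function name `check_list_files` is hypothetical, and the assumption that samplesheet, mate1, and mate2 should each contain one path per sample is inferred from the comments above.

```python
from pathlib import Path
from typing import Dict, List


def check_list_files(paths: List[str]) -> Dict[str, List[str]]:
    """Return a mapping of list file -> problems found (hypothetical helper)."""
    problems: Dict[str, List[str]] = {p: [] for p in paths}
    counts: Dict[str, int] = {}
    for p in paths:
        lines = Path(p).read_text().splitlines()
        # Report 1-based positions of blank or whitespace-only lines.
        for number, line in enumerate(lines, start=1):
            if not line.strip():
                problems[p].append(f"line {number} is empty")
        # Count only the lines that actually name a file.
        counts[p] = sum(1 for line in lines if line.strip())
    # Assumption: every list should name the same number of files.
    if len(set(counts.values())) > 1:
        for p in paths:
            problems[p].append(f"{counts[p]} entries, but other lists differ")
    return problems
```

Calling `check_list_files(["samplesheet", "mate1", "mate2"])` in the working directory returns an empty problem list for each file when the inputs are consistent.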
Dear @knuppd, thank you for your feedback. Just to get the versions straight, did you do a Cheers, |
Hi @knuppd, I did not compile a new release of DCC yet, therefore the latest release does not yet include the fix from this thread. If you do the git clone you will get the version with the fixed code. Cheers, |
Hi @tjakobi, thank you, I was able to get the program to run after downloading it using git. However, I did not get two of the output files: LinearCount and CircSkipJunctions. It seems as though one of the BAM files may have caused an error, but I'm not sure if this is the cause of the problem. (I do have counts for this BAM file in the circRNA output file, shown below.) CircRNACount File:
Hi @knuppd, thanks for reporting back. Are the BAM files sorted, and is the .bai index file in the same folder? DCC assumes that this is the case. Cheers,
Hi @tjakobi, the file names say ".sortedByCoord.out.bam", but I have not run samtools index. Was I supposed to do this prior to running DCC? Here are the contents of the folder in which both mate1 and mate2 were mapped to the genome. Best, Dave
Dear @knuppd, STAR already sorted the BAM files correctly but they are missing the index. You can create the index via To speed things up you may run the command in a loop so you don't have to start it for every file yourself: Cheers, |
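Before indexing, it can help to list which BAM files in a folder still lack an index. The following Python 3 sketch is an illustration only; the helper name `bams_missing_index` is hypothetical, and it assumes the index sits next to the BAM file as either `<name>.bam.bai` (the default name written by `samtools index`) or `<name>.bai`.

```python
from pathlib import Path
from typing import List


def bams_missing_index(folder: str) -> List[str]:
    """Return names of *.bam files in folder without an index next to them."""
    missing = []
    for bam in sorted(Path(folder).glob("*.bam")):
        # Accept both naming conventions: "x.bam.bai" and "x.bai".
        candidates = [Path(str(bam) + ".bai"), bam.with_suffix(".bai")]
        if not any(c.exists() for c in candidates):
            missing.append(bam.name)
    return missing
```

Each file name returned by `bams_missing_index(".")` would then need to be indexed before running DCC.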
Hi @tjakobi, thanks for the help. I am still not getting the output; maybe something is wrong with the Python I have from Miniconda? It seems to be having problems. Maybe I should try a different version? The version I'm currently using is 2.7.15. The BAM file for KORep3 looks like the other files, so I'm not sure what this header mistake is or whether it is related to the earlier flag. Here is my log file:
Dear @knuppd, it seems this is more an issue with a FASTA file index than with the BAM files. In the directory containing the genome FASTA file, could you please regenerate the index?
Cheers, |
Dear @tjakobi, thank you very much for your help getting DCC to run. I really appreciate it. I have now successfully gotten the program to run with the anticipated output files. My last question is whether it's normal to have repeats of circRNA coordinates. In my CircRNACount output file there are repeated circRNAs (same coordinates), but they have different count values.
Best, Dave |
Dear @knuppd, that indeed looks strange. Did you look in the Cheers, |
Hi @tjakobi Checking the CircCoordinates file, yes it does seem that DCC is detecting a circRNA on both strands. Should the -ss option be used for stranded libraries? I thought the default was "stranded". Does -ss have to be set regardless? Best, |
Hi @knuppd, generally, do you see the majority of the circRNAs annotated as "not annotated"? That would be an indicator that Cheers |
Hi @tjakobi, yes, I generally see that one of the two coordinates (the one with the proper strand) will have an annotated GeneID and the other will not. I have Illumina dUTP libraries. So should I be using
Hi @knuppd, something I should definitely put in the documentation. I also wrote it at some point as part of the protocol paper for the circtools workflow:
Source: Deep Computational Circular RNA Analytics from RNA-seq Data From this list the correct choice would be to not use |
Hi @tjakobi, thank you for the info. Doesn't it seem odd, though, that a large proportion of my detected circRNAs have different strands (i.e. strands other than that of the parent mRNA)? My CircCoordinates output file contains 3258 detected circRNAs. If I collapse entries that share the same chromosome, start, and stop position (i.e. if 2 circRNAs have the same chr/start/stop, only 1 is kept), I am left with 2209 circRNAs. That means roughly 32% of the detected circRNAs (1049 of 3258) are on the strand opposite to the one the mRNA is encoded on.
options used while running DCC:
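The duplicate count described in the comment above can be reproduced with a short Python 3 sketch. It is an illustration under stated assumptions: the function name `strand_duplicate_stats` is hypothetical, and the rows are assumed to have already been parsed into (chromosome, start, end, strand) tuples, since the exact column layout of the CircCoordinates file is not shown in this thread.

```python
from collections import Counter
from typing import Iterable, Tuple


def strand_duplicate_stats(rows: Iterable[Tuple[str, int, int, str]]):
    """Return (total, unique positions, fraction removed), ignoring strand."""
    rows = list(rows)
    # Key each entry by position only, so +/- entries at one locus collapse.
    positions = Counter((chrom, start, end) for chrom, start, end, _strand in rows)
    total = len(rows)
    unique = len(positions)
    fraction_removed = ((total - unique) / total) if total else 0.0
    return total, unique, fraction_removed


# Numbers quoted in the thread: 3258 entries, 2209 unique positions.
print(round((3258 - 2209) / 3258, 3))  # 0.322
```

With the numbers from the thread, 1049 entries are removed, which is roughly one third of all detected circRNAs.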
Dear @knuppd, indeed this behavior is not expected. Would you be able to share one or more of the BAM files (plus the chimeric junction files) for further debugging? I can provide the necessary upload capacity if required. Cheers
Hi @tjakobi, Yes I can, do you have an email I can contact you with? I could upload them via dropbox to you. Best, |
Hi @knuppd, just use [email protected]. Thank you! |
Dear @tjakobi, I have shared these files with your email above via dropbox. Best, Dave |
Dear @knuppd, I am downloading the files now. I will report back as soon as I can pinpoint the issue. Cheers, |
Hi @knuppd, for your reference, I was able to reproduce the issue and am trying to investigate what is happening there. Cheers,
Dear @knuppd, I looked through the BAM file, especially the header and saw that you used STAR 2.6.0b. STAR changed the chimeric output format with version 2.6.0+, therefore requiring to additionally specify
I could not find the parameter in the CLI call for STAR, maybe you could rerun the mapping with the additional flag and try running DCC on the newly generated junction files? Cheers, |
Hi @tjakobi, I have tried re-running the alignments with this option. However the same problem persists. For reference, here is the alignment command:
Best, Dave |
Thank you for your effort @knuppd. Would it be possible to compare the junction files and - if they are different from the previous run - upload them? Cheers, |
Hi @tjakobi, I looked at the chimeric junction files; after sorting them numerically on the first 2 columns and comparing the first 20 lines of each, they appear identical. I also checked the individual mate1 and mate2 chimeric junction files. Best, Dave
Hi @knuppd, thank you for looking into this. I looked into the BAM file with rseqc and it seems the library is not strand specific?
In case of a non-stranded library, DCC cannot know from which strand a read originates and will probably end up putting 50% of the reads on each strand. In this case DCC also cannot determine whether the circRNA is novel or just belongs to the annotated gene on the other strand. Since I hardly see unstranded libraries anymore, the code base for this workflow may not work perfectly; DCC is supposed to assign circRNAs in this case based on known annotations. I will have to look into this. Could you run DCC again with Cheers,
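A strandedness check of this kind usually comes down to comparing two fractions: the share of reads consistent with one orientation and the share consistent with the other (RSeQC's infer_experiment.py reports such fractions, although the thread does not say which script was used). A minimal Python 3 sketch of that decision follows; the function name `classify_strandedness` and the 0.8 cutoff are assumptions chosen for illustration, not values taken from RSeQC or DCC.

```python
def classify_strandedness(frac_forward: float, frac_reverse: float,
                          min_ratio: float = 0.8) -> str:
    """Classify a library from the two orientation fractions (hypothetical helper)."""
    explained = frac_forward + frac_reverse
    if explained == 0:
        return "undetermined"
    # A stranded library has most explained reads in a single orientation.
    if frac_forward / explained >= min_ratio:
        return "stranded (forward)"
    if frac_reverse / explained >= min_ratio:
        return "stranded (reverse)"
    # Roughly equal fractions indicate that strand information is absent.
    return "unstranded"
```

Two fractions near 0.5 each, as described for a non-stranded library above, would be classified as "unstranded".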
Hi @tjakobi, Thank you, I will try running the data with the -N option. The data I am analyzing is from a 2019 paper that specified in the methods that the library prep they used was Illumina ribo-zero tru-seq protocol, which is why I assumed this was stranded data. Since this data appears to be unstranded (maybe they used an old kit?) can I still use DCC/CircTest for differential expression of circRNAs? Best, Dave |
Dear @knuppd, Yes, you may still use DCC for analyzing the data set. Please let me know how results look for the Cheers, |
Hi @tjakobi, I have finished running DCC with the -N flag. It appears to have fixed the problem; I am no longer detecting any duplicate lines in the file. I appreciate the help! Best, Dave
Hi @knuppd, okay, perfect. I will close this issue here, please do not hesitate to open a new issue if you face any problems. Cheers, |
Dear Dieterich-lab,
I'm having some issues running DCC (version 0.4.7) which I hope you can help me with.
I'm trying to run the software in a virtual environment on Python version 2.7.15rc1.
The only packages installed in the environment are FUCHs, DCC, and the relevant dependencies.
When running DCC I get this output/error:
I get the same error if I use the suggested parameters (and replicate samples and mates supplied in samplesheet/mate1/mate2 files) or even when running the most 'simple' command possible e.g:
Please let me know if there is any information I can provide to help fix this issue.
Thanks,
Craig