Document FeatureCounts requirements to GTF more appropriately #144
Comments
Here is an example of part of a gtf that did not work for us: |
There are multiple possibilities: a.) Having a possibility to edit the options supplied to a.) Would also require us to adapt featureCounts merging processes in general, e.g. providing this option to the |
iGenomes also has NCBI and UCSC references, they're just not listed in the iGenomes config. We should probably add these. I think that they're normalised for a lot of stuff like this. |
Once there are some opinions in (@ewels, looking at you :-P), I'll have a go! |
Can you check with for example:
See if you get the same problem? Use https://ewels.github.io/AWS-iGenomes/ to get all required s3 URLs. |
iGenomes doesn't have the species I'm working with unfortunately :( |
Ah sorry, I was speed reading and didn't pick up on the non-model organism bit. GFF is a horrible format for exactly this reason: it's not really a specified format. The good news is that now I'm sat down and reading properly, I realise that this is a problem that we already came across ages ago and built in a feature to handle. So your fix is already part of the pipeline! It's even got documentation: https://github.com/nf-core/rnaseq/blob/master/docs/usage.md#featurecounts-extra-gene-names I guess that supplying the option |
ps. This is where it's used: Line 923 in e837637
From the SubRead documentation:
|
Hi Phil, thank you for your answers. This indeed pointed towards the solution, though it did not fully solve the problem. Due to the different annotation in the GTF file, I had to change the featureCounts call to https://github.com/ggabernet/rnaseq/blob/57c4475415b38994b50c6630f856d67f39605b57/main.nf#L932 Would it be possible to provide a parameter that allows changing the term for this call, in a similar way as for extra attributes? |
Absolutely - we can set this to |
Yup, will do |
This is now possible in the |
Perfect, thank you! |
We recently had a project with a non-standard organism, where we had to download the genome and GFF3 from NCBI instead of using the Ensembl ones. As a result, featureCounts was unable to produce appropriate counts because, for example, the gene_id attribute was missing in that GTF/GFF. The proper format is:
https://github.com/nf-core/test-datasets/blob/rnaseq/reference/genes.gtf
@ggabernet will post an example of a GFF that didn't work well. I will then take care of writing down some docs on how to make sure the GFF/GTF works fine for an analysis...
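Until the docs are written, here is a rough sketch of a pre-flight check for an annotation file. It assumes the featureCounts defaults (counting `exon` features grouped by the `gene_id` attribute) and GTF-style `key "value";` attributes in the ninth column. The function name `missing_attribute_counts` is made up for illustration and is not part of the pipeline or of Subread.

```python
import re
from collections import Counter

# Matches GTF-style attribute pairs such as: gene_id "g1";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def missing_attribute_counts(gtf_lines, feature_type="exon", attribute="gene_id"):
    """Count rows of `feature_type` that do / do not carry `attribute`.

    Hypothetical helper: the defaults mirror the featureCounts defaults,
    adjust them if your annotation uses different terms.
    """
    counts = Counter()
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A valid GTF/GFF row has exactly nine tab-separated columns.
        if len(fields) != 9 or fields[2] != feature_type:
            continue
        keys = {key for key, _ in ATTR_RE.findall(fields[8])}
        counts["ok" if attribute in keys else "missing"] += 1
    return counts


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        result = missing_attribute_counts(handle)
    print(f"rows with gene_id: {result['ok']}, rows without: {result['missing']}")
```

If every exon row ends up in the "missing" bucket, the file is probably GFF3-style (`ID=...;Parent=...`) and needs converting, or the feature type / attribute name needs changing to match what the file actually contains.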