Run the complete pipeline #17

Open
serpei opened this issue Oct 4, 2023 · 4 comments

serpei commented Oct 4, 2023

Hi,
thank you for your super-interesting tool.

I have some questions about snakemake_config.yaml (I plan to run the whole pipeline with snakemake):

  1. Does this file have to stay in the IRIS folder? If so, does that mean I cannot run IRIS on multiple batches of patients in parallel?
  2. To run all the steps that define the degree of association, do I only need to set run_all_modules to "true" and list the normals in tissue_matched_normal_reference_group_names and the tumors in tumor_reference_group_names?
  3. What is the difference between tissue_matched_normal_reference_group_names and normal_reference_group_names?
  4. Can users add their own normal controls? If yes, how?
  5. What does "blocklist" mean?
  6. What do the comparison_mode values "group" and "individual" mean?
  7. For which kinds of analyses do you suggest a parametric or a nonparametric stat_test_type?
  8. Is there a maximum for sample_fastqs?
  9. Are novel events considered automatically?
  10. For the splice_event_type parameter: does the user have to run each event type separately?

Sorry to bother you,
thank you,
Serena

@EricKutschera
Contributor

  1. The snakemake_config.yaml is intended to stay in the IRIS/ folder. If you want to run IRIS multiple times in parallel, you could create a separate install of IRIS in a different folder. I think another option is to put a new copy of the IRIS code in a new folder and then set these config paths to point to the main installed version of IRIS: https://github.com/Xinglab/IRIS/blob/v2.0.1/snakemake_config.yaml#L105
    If you have multiple runs at the same time, they should have different run_name config values.

  2. To run all steps, set run_all_modules: true. If you want to use tissue_matched_normal or tumor references, fill out all the config values for that reference group: https://github.com/Xinglab/IRIS/blob/v2.0.1/snakemake_config.yaml#L61
    At least one of tissue_matched_normal or normal is required.

  3. There are separate output files for the comparison against the tissue_matched_normal (tier 1) and the comparison against all 3 of tissue_matched_normal, tumor, and normal (tier 3): https://github.com/Xinglab/IRIS/tree/v2.0.1#example-output

  4. When you run the pipeline, it will add a new directory to IRIS_data/db using the run_name from the config. After running the pipeline, you can use the results in a future run as one of the reference_group_names.

  5. From https://github.com/Xinglab/IRIS/blob/v2.0.1/example/parameter_file_description.txt#L40:

    "Removes the AS events that are error-prone due to artifacts"

    Here's the example file: https://github.com/Xinglab/IRIS/blob/v2.0.1/IRIS/data/blocklist.brain_2020.txt

  6. and 7. From https://github.com/Xinglab/IRIS/blob/v2.0.1/example/parameter_file_description.txt#L36:

    "Comparison mode & statistical test type: 'group' mode (number of input samples >=2) and 'individual mode' (number of input sample =1) are provided. 'group' mode is default and recommended; for PSI-based tests, 'parametric' and 'nonparametric' tests are supported. 'parametric' is default"

  8. No maximum.

  9. The snakemake does not use the novelSS parameter.

  10. The snakemake only supports one event type at a time. If you want output for each event type then you need to run multiple times.
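
The points above about parallel runs (a distinct run_name per run) and about one event type per run could be combined into a small helper that produces one config text per batch and event type. This is only a minimal sketch: the two-key layout of `BASE_CONFIG`, the quoting style, the batch names, the event type labels, and the helpers `set_scalar` and `per_run_configs` are all assumptions for illustration, not taken from the IRIS repository.

```python
import re
from itertools import product

# Stand-in for the text of snakemake_config.yaml. The layout and the values
# are assumptions; the real file has many more keys.
BASE_CONFIG = """\
run_name: 'example_run'
splice_event_type: 'SE'
run_all_modules: true
"""


def set_scalar(config_text, key, value):
    """Replace the value on the single top-level `key:` line; keep everything else."""
    pattern = re.compile(rf"^{re.escape(key)}:.*$", flags=re.MULTILINE)
    # A function replacement avoids backslash handling in the new value.
    new_text, count = pattern.subn(lambda m: f"{key}: '{value}'", config_text)
    if count != 1:
        raise KeyError(f"expected exactly one '{key}:' line, found {count}")
    return new_text


def per_run_configs(base, batches, event_types):
    """Return {run_name: config_text}, one entry per (batch, event type) pair."""
    configs = {}
    for batch, event in product(batches, event_types):
        run_name = f"{batch}_{event}"  # distinct run_name for every run
        text = set_scalar(base, "run_name", run_name)
        text = set_scalar(text, "splice_event_type", event)
        configs[run_name] = text
    return configs


# Hypothetical batch names and event type labels.
configs = per_run_configs(BASE_CONFIG, ["batchA", "batchB"], ["SE", "RI"])
for name in sorted(configs):
    print(name)
```

Each generated text would then be saved as the snakemake_config.yaml of its own copy of the IRIS folder, since the config is expected to live in IRIS/.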

@serpei
Author

serpei commented Oct 6, 2023

Thank you so much for your explanations!
Given that I would like to consider all event types and also novelSS, I think I have to use the individual functions to build a pipeline. Do you agree?
If so, do the individual functions support multithreading, and can I set the number of cores used by each of them?
Thank you again,
Serena

@EricKutschera
Contributor

Building your own pipeline from the individual functions is reasonable. Depending on how much you want to change, you could try modifying the provided snakemake workflow instead of building a new pipeline from scratch.

The single functions use multithreading, but they don't take the number of cores as a parameter. For example:
https://github.com/Xinglab/IRIS/blob/v2.0.1/snakemake_config.yaml#L7
https://github.com/Xinglab/IRIS/blob/v2.0.1/IRIS/IRIS_process_rnaseq.py#L13
https://github.com/Xinglab/IRIS/blob/v2.0.1/IRIS/IRIS_process_rnaseq.py#L27

You can edit the files to change the number of threads.
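
If the same edit has to be repeated across several copies of the code, it could be scripted rather than done by hand. The sketch below assumes the hard-coded value appears as a numeric argument to a command-line flag inside the script text; the excerpt in `SCRIPT_TEXT`, the use of STAR's `--runThreadN` flag as the example, and the helper `set_thread_flag` are assumptions, since the linked lines are not quoted in this thread.

```python
import re

# Hypothetical one-line excerpt standing in for a script with a hard-coded
# thread count; it is not copied from IRIS_process_rnaseq.py.
SCRIPT_TEXT = "cmd = 'STAR --runThreadN 6 --genomeDir ' + genome_dir\n"


def set_thread_flag(text, flag, threads):
    """Rewrite every `<flag> <number>` occurrence so it uses `threads`."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    pattern = re.compile(rf"({re.escape(flag)}\s+)\d+")
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{threads}", text)
    if count == 0:
        # Fail loudly instead of silently leaving the file unchanged.
        raise ValueError(f"no '{flag} <number>' found in text")
    return new_text


updated = set_thread_flag(SCRIPT_TEXT, "--runThreadN", 12)
print(updated, end="")
```

Checking the result with a diff against the original file before running anything is a reasonable precaution, because a pattern like this changes every match it finds.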

@serpei
Author

serpei commented Jan 9, 2024

Excuse me for the late response. Thank you again!
