-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC Add an "Tutorials" section to the docs with a demo on TCGA-LUAD #3
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Boris, thanks for this PR. I have requested changes for 3 reasons.
- You should add
ipynb = "*"
(or a stricter version constraint if you prefer) to the docs group dependency. It didn't work for me without and worked with. - You should add the following lines to the gitignore to avoid pushing the files created when running
mkdocs serve
:
docs/generated/gallery/
docs/examples/data/raw/tcga/
- A remark on the fact that we assume a given structure for the data: we assume this structure in simulation mode only (it would make no sense in remote mode to assume all the data from all the centers to be thus organised). Maybe add a line saying that we assume this structure for simulation mode, and that in remote mode, we assume that the data has been registered to substra beforehadn, in the form of a folder containing two csvs (metadata and counts), as well as an opener and the md file that goes with it (see substra doc for how to register a datasample). Then, we only need pass the dataset_datasample_keys path.
Note that I tested running everything adding ipynb and it works. Thats great ! If you do these modifications, feel free to merge without asking for a re validation.
docs/examples/plot_demo.py
Outdated
# %% | ||
# ## Dataset setup | ||
# | ||
# FedPyDESeq2 assumes the data to be organized in the following structure: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only the case in "simulation mode". In remote mode, the data has been registered to substra, and so there is no need for this specific folder organisation. In remote mode, we expect each center to have registered its data as a data sample. This datasample is a folder containing the counts.csv and metadata.csv file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can add "To run the experiments locally" (i.e. in simulation) before yoru sentence and add a remark on the remote mode somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @umarteauowkin, this is a very good point indeed!
Let me rewrite a bit
Co-authored-by: umarteauowkin <[email protected]>
… remote modes in terms of data organization
This PR adds a tutorial gallery to the online docs, containing a simple demonstration of FedPyDESeq2 on TCGA-LUAD (in simulated mode).
To test this PR, you should run
poetry install --with testing,linting,docs
and then build the docs using
mkdocs serve
Note: this will probably take a little more than an hour to run.