Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC Add an "Tutorials" section to the docs with a demo on TCGA-LUAD #3

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

BorisMuzellec
Copy link
Collaborator

@BorisMuzellec BorisMuzellec commented Jan 9, 2025

This PR adds a tutorial gallery to the online docs, containing a simple demonstration of FedPyDESeq2 on TCGA-LUAD (in simulated mode).

To test this PR, you should run

poetry install --with testing,linting,docs

and then build the docs using

mkdocs serve

Note: this will probably take a little more than an hour to run.

@BorisMuzellec BorisMuzellec marked this pull request as ready for review January 9, 2025 16:19
Copy link
Collaborator

@umarteauowkin umarteauowkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Boris, thanks for this PR. I have requested changes for 3 reasons.

  • You should add ipynb = "*" (or a stricter version constraint if you prefer) to the docs group dependency. It didn't work for me without and worked with.
  • You should add the following lines to the gitignore to avoid pushing the files created when running mkdocs serve:
docs/generated/gallery/
docs/examples/data/raw/tcga/
  • A remark on the fact that we assume a given structure for the data: we assume this structure in simulation mode only (it would make no sense in remote mode to assume all the data from all the centers to be thus organised). Maybe add a line saying that we assume this structure for simulation mode, and that in remote mode, we assume that the data has been registered to substra beforehadn, in the form of a folder containing two csvs (metadata and counts), as well as an opener and the md file that goes with it (see substra doc for how to register a datasample). Then, we only need pass the dataset_datasample_keys path.

Note that I tested running everything adding ipynb and it works. Thats great ! If you do these modifications, feel free to merge without asking for a re validation.

# %%
# ## Dataset setup
#
# FedPyDESeq2 assumes the data to be organized in the following structure:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only the case in "simulation mode". In remote mode, the data has been registered to substra, and so there is no need for this specific folder organisation. In remote mode, we expect each center to have registered its data as a data sample. This datasample is a folder containing the counts.csv and metadata.csv file.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add "To run the experiments locally" (i.e. in simulation) before yoru sentence and add a remark on the remote mode somewhere.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @umarteauowkin, this is a very good point indeed!

Let me rewrite a bit

pyproject.toml Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants