diff --git a/CHANGELOG.md b/CHANGELOG.md
index 4ce31823..f798aeac 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,6 +9,7 @@
 - Update GeoMx NGS directory schema
 - Update shared upload check
 - Release Xenium
+- Create Xenium directory schema
 
 ## v0.0.26
 - Update GeoMx NGS directory schema
diff --git a/docs/xenium/current/index.md b/docs/xenium/current/index.md
index 4421353d..d71dd519 100644
--- a/docs/xenium/current/index.md
+++ b/docs/xenium/current/index.md
@@ -28,5 +28,25 @@ Related files:
 ## Directory schemas
-Version 2.0 (use this one) (draft - submission of data prepared using this schema will be supported by Sept. 30)
+Version 2.0 (use this one)
+
+| pattern | required? | description |
+| --- | --- | --- |
+| extras\/.* | ✓ | Folder for general lab-specific files related to the dataset. |
+| extras\/microscope_hardware\.json | ✓ | **[QA/QC]** A file generated by the micro-meta app that contains a description of the hardware components of the microscope. Email the HuBMAP Consortium Help Desk if you need help generating this document. |
+| extras\/microscope_settings\.json | | **[QA/QC]** A file generated by the micro-meta app that contains a description of the settings that were used to acquire the image data. Email the HuBMAP Consortium Help Desk if you need help generating this document. |
+| raw\/.* | ✓ | All raw data files for the experiment. |
+| raw\/markers\.csv | | A CSV file describing any morphology markers used to guide ROI and/or AOI selection. |
+| raw\/additional_panels_used\.csv | | If multiple commercial probe panels were used, then the primary probe panel should be selected in the "oligo_probe_panel" metadata field. The additional panels must be included in this file. Each panel record should include: manufacturer, model/name, product code. |
+| raw\/custom_probe_set\.csv | | This file should contain any custom probes used and must be included if the metadata field "is_custom_probes_used" is "Yes". The file should minimally include: target gene id, probe seq, probe id. The contents of this file are modeled after the 10x Genomics probe set file (see ). |
+| raw\/custom_probe_set\.bed | | A BED file version of the custom probe set file. |
+| raw\/transcript_locations\.csv | ✓ | The coordinate origin (0,0) is at the top-left corner of the image. The file should include: gene name, x, y, z (optional), quality score (optional). The first row of the file is expected to contain the column header. |
+| raw\/custom_gene_list\.csv | | This file describes the target genes profiled by the assay. For advanced designs, it can list probe sequences for splicing or other analyses of any target of interest. The file should minimally contain: gene name, Ensembl ID. |
+| raw\/probes\.csv | | A CSV file describing the probe panel used. This is typically what is used to specify the probe set when ordering a probe panel for a Xenium run. |
+| raw\/gene_panel\.json | ✓ | The JSON file describing the probes, as output by the xenium-ranger pipeline. |
+| raw\/images\/overlay\.(?:jpeg\|tiff) | | The overlay image used to guide ROI selection, if one was used. The overlay details will be provided in the protocols.io protocol. The overlay is not included in the OME TIFF, so it must be uploaded here. This can be a JPEG or TIFF file. |
+| lab_processed\/.* | ✓ | Experiment files that were processed by the lab generating the data. |
+| lab_processed\/images\/.* | ✓ | Processed image files. |
+| lab_processed\/images\/[^\/]+\.ome\.tiff (example: lab_processed/images/HBM892.MDXS.293.ome.tiff) | ✓ | OME-TIFF files (multichannel, multi-layered) produced by the microscopy experiment. If compressed, they must use a lossless compression algorithm. For Visium this stitched file should only include the single capture area relevant to the current dataset. For GeoMx there will be one OME TIFF file per slide, with each slide including multiple AOIs. See the following link for the set of fields that are required in the OME TIFF file XML header. |
+| lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv | ✓ | This file provides essential documentation pertaining to each channel of the accompanying OME TIFF. The file should contain one row per OME TIFF channel. The required fields are detailed |
diff --git a/src/ingest_validation_tools/directory-schemas/xenium-v2.0.yaml b/src/ingest_validation_tools/directory-schemas/xenium-v2.0.yaml
index affbb91d..f33cc595 100644
--- a/src/ingest_validation_tools/directory-schemas/xenium-v2.0.yaml
+++ b/src/ingest_validation_tools/directory-schemas/xenium-v2.0.yaml
@@ -1,4 +1,74 @@
-draft: true
 files:
   -
-    draft_link: 'https://docs.google.com/spreadsheets/d/1LE-iyY2E6eP4E8jhgP6rhsvjESrdHXWYrMwKTvNkI5Y'
\ No newline at end of file
+    pattern: extras\/.*
+    required: True
+    description: Folder for general lab-specific files related to the dataset.
+  -
+    pattern: extras\/microscope_hardware\.json
+    required: True
+    description: A file generated by the micro-meta app that contains a description of the hardware components of the microscope. Email the HuBMAP Consortium Help Desk if you need help generating this document.
+    is_qa_qc: True
+  -
+    pattern: extras\/microscope_settings\.json
+    required: False
+    description: A file generated by the micro-meta app that contains a description of the settings that were used to acquire the image data. Email the HuBMAP Consortium Help Desk if you need help generating this document.
+    is_qa_qc: True
+  -
+    pattern: raw\/.*
+    required: True
+    description: All raw data files for the experiment.
+  -
+    pattern: raw\/markers\.csv
+    required: False
+    description: A CSV file describing any morphology markers used to guide ROI and/or AOI selection.
+  -
+    pattern: raw\/additional_panels_used\.csv
+    required: False
+    description: 'If multiple commercial probe panels were used, then the primary probe panel should be selected in the "oligo_probe_panel" metadata field. The additional panels must be included in this file. Each panel record should include: manufacturer, model/name, product code.'
+  -
+    pattern: raw\/custom_probe_set\.csv
+    required: False
+    description: 'This file should contain any custom probes used and must be included if the metadata field "is_custom_probes_used" is "Yes". The file should minimally include: target gene id, probe seq, probe id. The contents of this file are modeled after the 10x Genomics probe set file (see ).'
+  -
+    pattern: raw\/custom_probe_set\.bed
+    required: False
+    description: A BED file version of the custom probe set file.
+  -
+    pattern: raw\/transcript_locations\.csv
+    required: True
+    description: "The coordinate origin (0,0) is at the top-left corner of the image. The file should include: gene name, x, y, z (optional), quality score (optional). The first row of the file is expected to contain the column header."
+  -
+    pattern: raw\/custom_gene_list\.csv
+    required: False
+    description: "This file describes the target genes profiled by the assay. For advanced designs, it can list probe sequences for splicing or other analyses of any target of interest. The file should minimally contain: gene name, Ensembl ID."
+  -
+    pattern: raw\/probes\.csv
+    required: False
+    description: A CSV file describing the probe panel used. This is typically what is used to specify the probe set when ordering a probe panel for a Xenium run.
+  -
+    pattern: raw\/gene_panel\.json
+    required: True
+    description: The JSON file describing the probes, as output by the xenium-ranger pipeline.
+  -
+    pattern: raw\/images\/overlay\.(?:jpeg|tiff)
+    required: False
+    description: The overlay image used to guide ROI selection, if one was used. The overlay details will be provided in the protocols.io protocol. The overlay is not included in the OME TIFF, so it must be uploaded here. This can be a JPEG or TIFF file.
+  -
+    pattern: lab_processed\/.*
+    required: True
+    description: Experiment files that were processed by the lab generating the data.
+  -
+    pattern: lab_processed\/images\/.*
+    required: True
+    description: Processed image files.
+  -
+    pattern: lab_processed\/images\/[^\/]+\.ome\.tiff
+    required: True
+    description: OME-TIFF files (multichannel, multi-layered) produced by the microscopy experiment. If compressed, they must use a lossless compression algorithm. For Visium this stitched file should only include the single capture area relevant to the current dataset. For GeoMx there will be one OME TIFF file per slide, with each slide including multiple AOIs. See the following link for the set of fields that are required in the OME TIFF file XML header.
+    is_qa_qc: False
+    example: lab_processed/images/HBM892.MDXS.293.ome.tiff
+  -
+    pattern: lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv
+    required: True
+    description: This file provides essential documentation pertaining to each channel of the accompanying OME TIFF. The file should contain one row per OME TIFF channel. The required fields are detailed
+    is_qa_qc: False
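The Xenium directory schema is essentially a list of regular expressions, each marked required or optional. The Python sketch below shows one way such a schema could be enforced against a dataset's file listing. It is not the ingest_validation_tools implementation. `Entry`, `SCHEMA`, `check_directory` and `check_examples` are hypothetical names. `SCHEMA` is a hand-copied subset of xenium-v2.0.yaml; a real loader would parse the file instead, for example with PyYAML's `yaml.safe_load`. The checking rules are also assumptions. Each pattern is full-matched against a path relative to the dataset root that uses forward slashes. A path matching no pattern is rejected, and a required pattern matching no path is reported as missing.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """One item of the `files:` list (hypothetical in-memory form of the YAML)."""
    pattern: str
    required: bool
    example: Optional[str] = None


# Hand-copied subset of xenium-v2.0.yaml; regexes are taken verbatim from the schema.
SCHEMA = [
    Entry(r"extras\/.*", True),
    Entry(r"extras\/microscope_hardware\.json", True),
    Entry(r"raw\/.*", True),
    Entry(r"raw\/transcript_locations\.csv", True),
    Entry(r"raw\/gene_panel\.json", True),
    Entry(r"raw\/images\/overlay\.(?:jpeg|tiff)", False),
    Entry(r"lab_processed\/.*", True),
    Entry(r"lab_processed\/images\/.*", True),
    Entry(r"lab_processed\/images\/[^\/]+\.ome\.tiff", True,
          example="lab_processed/images/HBM892.MDXS.293.ome.tiff"),
    Entry(r"lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv", True),
]


def check_directory(paths: list[str], schema: list[Entry]) -> dict[str, list[str]]:
    """Assumed semantics: every path must fully match some pattern, and
    every required pattern must fully match at least one path."""
    compiled = [(e, re.compile(e.pattern)) for e in schema]
    not_allowed = [p for p in paths
                   if not any(rx.fullmatch(p) for _, rx in compiled)]
    missing = [e.pattern for e, rx in compiled
               if e.required and not any(rx.fullmatch(p) for p in paths)]
    return {"not_allowed": not_allowed, "required_but_missing": missing}


def check_examples(schema: list[Entry]) -> list[str]:
    """Return the patterns whose `example` value does not match the pattern itself."""
    return [e.pattern for e in schema
            if e.example is not None and not re.fullmatch(e.pattern, e.example)]


if __name__ == "__main__":
    listing = [
        "extras/microscope_hardware.json",
        "raw/transcript_locations.csv",
        "raw/gene_panel.json",
        "lab_processed/images/HBM892.MDXS.293.ome.tiff",
        "lab_processed/images/HBM892.MDXS.293.ome-tiff.channels.csv",
        "notes.txt",  # outside every allowed directory
    ]
    print(check_directory(listing, SCHEMA))
    # {'not_allowed': ['notes.txt'], 'required_but_missing': []}
    print(check_examples(SCHEMA))  # []
```

Under these assumed rules, the catch-all patterns `raw\/.*` and `lab_processed\/.*` already admit any file in those folders, so optional entries such as `raw\/markers\.csv` act mostly as documentation. `check_examples` guards against an `example:` value drifting out of sync with its pattern.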