Python scripts for Xena RNAseq repository
This repository contains tools for working with the UCSC Xena RNA-seq dataset, which includes 12,000 cancer RNA sequences and 8,000 normal tissue RNA sequences. Development is done primarily in Python using Jupyter notebooks. Over time, some of this code will also be repackaged as Python library functions.
A sample of the initial dataset
Distance function for Samples
Distance function for RNA sequences
Confusion matrix for decoding 46 cell types and 46 cancerous cell types
Downloadable interactive confusion matrix
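
For readers unfamiliar with the plot, the counting behind a confusion matrix can be sketched as follows. This is a minimal illustration using only the standard library, not the notebook's actual code: the function name `confusion_matrix` and the example labels are hypothetical, and the real notebooks derive the predicted labels from a trained neural network.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Return (labels, matrix), where matrix[i][j] counts the samples whose
    true label is labels[i] and whose predicted label is labels[j]."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    # Use every label seen in either list, in a stable sorted order.
    labels = sorted(set(true_labels) | set(predicted_labels))
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


# Hypothetical labels for six samples (not taken from the Xena dataset).
true_labels = ["lung", "lung", "lung tumor",
               "breast", "breast tumor", "breast tumor"]
predicted_labels = ["lung", "lung tumor", "lung tumor",
                    "breast", "breast tumor", "breast"]

labels, matrix = confusion_matrix(true_labels, predicted_labels)
for label, row in zip(labels, matrix):
    print(f"{label:>12}: {row}")
```

Each row is a true class and each column a predicted class, so correct predictions fall on the diagonal and off-diagonal cells show which classes are confused with each other.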
Genesets notebook functions
- Reads the RNA-seq dataset into memory
- Creates neural networks to classify RNA sequence data into tissue type and tissue type with tumor
- Produces interactive HTML plots using Plotly for:
  - selecting samples and RNA pathway gene sets
  - sorted source data
  - confusion matrix
  - sample distance plots
  - RNA distance plots
- Add a function to create networks from the correlation plot
- Investigate what additional information can be mined from the dataset by using neural networks to predict the following:
  - RNA -> Tumor/Normal (done)
  - RNA -> Tissue type (done)
  - RNA -> Tumor type
  - RNA -> DNA mutations (DNA trio)
  - RNA -> RNA: predict one half of the RNA with the other half
  - RNA -> low-dimensional encoding -> RNA: reduce dimensions and reconstruct the RNA
  - RNA -> Correlation -> Network -> Pathways and changes
  - RNA -> Pathways -> Tissue and Tumor (done) (Rob's experiment)
  - RNA -> mixture of cell types in tissues (multiple types of blood sample)
  - RNA -> next RNA based on time series (brain cells over time)
  - Trained RNA weights -> Pathway predictions
  - RNA -> Survival prediction
  - RNA from one lab -> RNA from another lab
  - RNA -> GCN -> Pathway changes
  - RNA -> GCN -> CNN -> predictions
  - RNA -> random pathway traces -> CNN -> predictions
  - RNA -> drug targets, drugs, survival
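
The planned step of creating networks from correlations could start from something like the sketch below. It is only one possible approach, stated under assumptions: Pearson correlation between gene expression profiles, with an edge kept when the absolute correlation meets a cutoff. The function names, the gene names, the expression values, and the 0.8 threshold are all hypothetical and do not come from the repository.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile is treated as uncorrelated
    return cov / sqrt(var_x * var_y)


def correlation_network(expression, threshold=0.8):
    """Build an edge list from a dict mapping gene -> expression values
    across samples. An edge (gene_a, gene_b, r) is kept when |r| >= threshold."""
    edges = []
    for a, b in combinations(sorted(expression), 2):
        r = pearson(expression[a], expression[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges


# Hypothetical expression values for four genes across four samples.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.1, 5.9, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
    "GENE_D": [1.0, 3.0, 1.0, 3.0],
}

edges = correlation_network(expression, threshold=0.8)
for gene_a, gene_b, r in edges:
    print(f"{gene_a} -- {gene_b}: r = {r:+.3f}")
```

Comparing every pair of genes scales quadratically with the number of genes, so on the full dataset a vectorized correlation matrix would likely be needed; the loop here is kept only for readability.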