Reshape mutation matrix for use by core-service repository #34

Closed
stephenshank opened this issue Dec 5, 2016 · 3 comments

Comments

@stephenshank
Member

The current format of the mutation matrix complicates its use in the core-service repository. A format better suited to populating the core-service mutation model would be:

sample_id	entrez_gene_id
TCGA-18-3406-01	1
TCGA-38-4631-01	1
...
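
As a rough illustration of the reshape being requested, here is a minimal sketch using only the Python standard library. It assumes the current matrix is a tab-separated file with a `sample_id` column followed by one column per Entrez gene ID, with `1` marking a mutation. The function name `wide_to_long` and the toy input values are hypothetical, not taken from the repository.

```python
import csv
import io


def wide_to_long(wide_file, long_file):
    """Write one (sample_id, entrez_gene_id) row per mutated cell.

    Assumes a wide TSV: a 'sample_id' column plus one column per gene,
    where the string '1' marks a mutation. Returns the rows written.
    """
    reader = csv.DictReader(wide_file, delimiter='\t')
    writer = csv.writer(long_file, delimiter='\t', lineterminator='\n')
    writer.writerow(['sample_id', 'entrez_gene_id'])
    n_rows = 0
    for row in reader:
        sample_id = row.pop('sample_id')
        # Every remaining column header is treated as an Entrez gene ID.
        for entrez_gene_id, status in row.items():
            if status == '1':
                writer.writerow([sample_id, entrez_gene_id])
                n_rows += 1
    return n_rows


# Toy input with made-up mutation calls, for illustration only.
wide = io.StringIO(
    "sample_id\t1\t2\n"
    "TCGA-18-3406-01\t1\t0\n"
    "TCGA-38-4631-01\t1\t1\n"
)
long_out = io.StringIO()
n_written = wide_to_long(wide, long_out)
long_text = long_out.getvalue()
```

Rows come out grouped by sample rather than by gene; sorting would be a separate step if the gene-major order shown above matters.
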
@stephenshank
Member Author

See #35.

@dhimmel
Member

dhimmel commented Dec 5, 2016

I agree this is an important step, but I think we will want to do it slightly differently than in #35. I think we should do the processing in 2.TCGA-process.ipynb where the mutation data starts out in a melted format. I also think we may want to add some additional columns like mutation severity which will be useful for the frontend in the future.

Until we sort these things out, can you use the workaround here for cognoma/core-service#42 (which is a super high priority PR, so let's complete that ASAP):

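import bz2
import csv

path = 'mutation-matrix.tsv.bz2'
# Open the bz2-compressed TSV in text mode; the with block closes the file.
with bz2.open(path, 'rt') as read_file:
    reader = csv.DictReader(read_file, delimiter='\t')
    for row in reader:
        sample_id = row.pop('sample_id')
        # The remaining columns map entrez_gene_id to mutation status.
        for entrez_gene_id, mutation_status in row.items():
            if mutation_status == '1':
                # Create mutation from entrez_gene_id, sample_id
                pass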

@stephenshank
Member Author

bz2 module for the win! So simple this way!!
