[FEATURE] added DBAdapter usage ipython notebook draft.
mwalzer committed Nov 13, 2015
1 parent 1b54842
commit a457dfb
Showing 4 changed files with 1,585 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,3 +8,4 @@ dist/* | |
.pythoscope | ||
.idea | ||
*.pyc | ||
Fred2/tutorials/.ipynb_checkpoints/* |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,250 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"<h1> DBAdapter usage </h1>\n", | ||
"\n", | ||
"This tutorial illustrates the use of Fred2 to map gene names and retrieve database accessions and genetic or transcript sequences from a database source like BioMart. Fred2 can connect to a variety of DB sources both online and offline.\n", | ||
"Here, we will cover the use of the Fred2 MartsAdapter as an example for online access and the EnsemblAdapter for offline access." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"collapsed": true | ||
}, | ||
"source": [ | ||
"<h2> Chapter 1: The basics </h2>\n", | ||
"<br/>\n", | ||
"We start by importing the required packages." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"%matplotlib inline\n", | ||
"%load_ext autoreload\n", | ||
"%autoreload 2\n", | ||
"import sys\n", | ||
"sys.path.extend(['/home/walzer/immuno-tools/Fred2'])\n", | ||
"from Fred2.IO.MartsAdapter import MartsAdapter\n", | ||
"from Fred2.IO.EnsemblAdapter import EnsemblDB" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"For starters we will connect to the BioMart:\n", | ||
"\n", | ||
"When initializing the MartsAdapter, you can specify the URL of the BioMart of your choice by supplying the biomart argument. If you do not choose a specific BioMart, it will default to <a href=\"http://biomart.org\">http://biomart.org</a>. Here, however, we will use <a href=\"http://grch37.ensembl.org\">http://grch37.ensembl.org</a>. Please refer to the documentation of your BioMart to find the correct URL." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"metadata": { | ||
"collapsed": true | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"mart_adapter = MartsAdapter(biomart=\"http://grch37.ensembl.org\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Now we can start using the BioMart. For a comprehensive list of the methods implemented by the adapter, refer to the <a href=\"http://fred2.readthedocs.org/en/latest/Fred2.IO.html#module-Fred2.IO.MartsAdapter\">documentation</a>.\n", | ||
"\n", | ||
"You can fetch many different kinds of sequences with the adapter. We will start with a transcript sequence of the glucagon gene. You have to provide an identifier that is known to the BioMart and identifies the <i>transcript</i>; in our <a href=\"http://www.ensembl.org/\">Ensembl</a> case, it has the form \"ENST...\".\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"metadata": { | ||
"collapsed": false, | ||
"scrolled": false | ||
}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"<type 'str'>\n" | ||
] | ||
}, | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"TRANSCRIPT: ENST00000375497\n", | ||
"\tVARIANTS:\n", | ||
"\tSEQUENCE: ATGAAAAGCATTTACTTTGTGGCTGGATTATTTGTAATGCTGGTACAAGGCAGCTGGCAACGTTCCCTTCAAGACACAGAGGAGAAATCCAGATCATTCTCAGCTTCCCAGGCAGACCCACTCAGTGATCCTGATCAGATGAACGAGGACAAGCGCCATTCACAGGGCACATTCACCAGTGACTACAGCAAGTATCTGGACTCCAGGCGTGCCCAAGATTTTGTGCAGTGGTTGATGAATACCAAGAGGAACAGGAATAACATTGCCAAACGTCACGATGAATTTGAGAGACATGCTGAAGGGACCTTTACCAGTGATGTAAGTTCTTATTTGGAAGGCCAAGCTGCCAAGGAATTCATTGCTTGGCTGGTGAAAGGCCGAGGAAGGCGAGATTTCCCAGAAGAGGTCGCCATTGTTGAAGAACTTGGCCGCAGACATGCTGATGGTTCTTTCTCTGATGAGATGAACACCATTCTTGATAATCTTGCCGCCAGGGACTTTATAAACTGGTTGATTCAGACCAAAATCACTGACAGGAAATAA (mRNA)" | ||
] | ||
}, | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"transcript = mart_adapter.get_transcript_sequence('ENST00000375497')\n", | ||
"print type(transcript)\n", | ||
"from Fred2.Core import Transcript\n", | ||
"fred2_transcript = Transcript(transcript, 'glucagon', 'ENST00000375497')\n", | ||
"fred2_transcript" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"The adapter yields a plain string. You can use this string to construct your <a href=\"http://fred2.readthedocs.org/en/latest/Fred2.Core.html#module-Fred2.Core.Transcript\">transcript object</a>." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 7, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"(539, 539)" | ||
] | ||
}, | ||
"execution_count": 7, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"mart_adapter.get_transcript_position('163002078', '163002078', 'ENSG00000115263', 'ENST00000375497')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"..." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 10, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"'P53'" | ||
] | ||
}, | ||
"execution_count": 10, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"mart_adapter.get_variant_gene(17, 7565101, 7565101)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"<h2> Chapter 2: Connecting to offline databases</h2>\n", | ||
"<br/>\n", | ||
"Fred2 also supports reading from offline databases such as the FASTA and DAT files that you can download from Ensembl, UniProt, or RefSeq.\n", | ||
"To connect, you have to initialize the corresponding adapter and pass it the location of your database file.\n", | ||
"\n", | ||
"As an example, we will use the EnsemblAdapter. You can get the official sequence resources from Ensembl <a href=\"http://www.ensembl.org/info/data/ftp/index.html\">here</a>. For this tutorial, however, we will use small test excerpts of the Ensembl protein and CDS sequence files (FASTA)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 14, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"ed = EnsemblDB()\n", | ||
"ed.read_seqs(\"data/Homo_sapiens.GRCh38.pep.test_stub.fa\")\n", | ||
"ed.read_seqs(\"data/Homo_sapiens.GRCh38.cds.test_stub.fa\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"..." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 17, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"ID: ENSP00000400926\n", | ||
"Name: ENSP00000400926\n", | ||
"Description: ENSP00000400926 pep:known chromosome:GRCh38:4:82818662:82891266:-1 gene:ENSG00000138674 transcript:ENST00000448323 gene_biotype:protein_coding transcript_biotype:protein_coding\n", | ||
"Number of features: 0\n", | ||
"Seq('MKLKEVDRTAMQAWSPAQNHPIYLATGTSAQQLDATFSTNASLEIFELDLSDPS...LGV', SingleLetterAlphabet())\n", | ||
"ATGAAGTTAAAGGAAGTAGATCGTACAGCCATGCAGGCATGGAGCCCTGCCCAGAATCACCCCATTTACCTAGCAACAGGAACATCTGCTCAGCAATTGGATGCAACATTTAGTACGAATGCTTCCCTTGAGATATTTGAATTAGACCTCTCTGATCCATCCTTGGATATGAAATCTTGTGCCACATTCTCCTCTTCTCACAGGTACCACAAGTTGATTTGGGGGCCTTATAAAATGGATTCCAAAGGAGATGTCTCTGGAGTTCTGATTGCAGGTGGTGAAAATGGAAATATTATTCTCTATGATCCTTCTAAAATTATAGCTGGAGACAAGGAAGTTGTGATTGCCCAGAATGACAAGCATACTGGCCCAGTGAGAGCCTTGGATGTGAACATTTTCCAGACTAATCTGGTAGCTTCTGGTGCTAATGAATCTGAAATCTACATATGGGATCTAAATAATTTTGCAACCCCAATGACACCAGGAGCCAAAACACAGCCGCCAGAAGATATCAGCTGCATTGCATGGAACAGACAAGTTCAGCATATTTTAGCATCAGCCAGTCCCAGTGGCCGGGCCACTGTATGGGATCTTAGAAAAAATGAGCCAATCATCAAAGTCAGTGACCATAGTAACAGAATGCATTGTTCTGGGTTGGCATGGCATCCTGATGTTGCTACTCAGATGGTCCTTGCCTCCGAGGATGACCGGTTACCAGTGATCCAGATGTGGGATCTTCGATTTGCTTCCTCTCCACTTCGTGTCCTGGAAAACCATGCCAGGGGGATTTTGGCAATTGCTTGGAGCATGGCAGATCCTGAATTGTTACTGAGCTGTGGAAAAGATGCTAAGATTCTCTGCTCCAATCCAAACACAGGAGAGGTGTTATATGAACTTCCCACCAACACACAGTGGTGCTTCGATATTCAGTGGTGTCCCCGAAATCCTGCTGTCTTATCAGCTGCTTCGTTTGATGGGCGTATCAGTGTTTATTCTATCATGGGAGGTAGCACAGATGGTTTAAGACAGAAACAAGTTGACAAGCTTTCATCATCTTTTGGGAATCTTGATCCCTTTGGCACAGGACAGCCCCTTCCTCCGTTACAAATTCCACAGCAGACTGCTCAGCATAGTATAGTGCTGCCTCTGAAGAAGCCGCCCAAGTGGATTCGAAGGCCTGTTGGTGCTTCTTTTTCATTTGGAGGCAAACTGGTTACGTTTGAGAATGTCAGAATGCCTTCTCATCAGGGAGCTGAGCAGCAGCAGCAGCAGCACCATGTGTTCATTAGTCAGGTTGTAACAGAAAAGGAGTTCCTCAGCCGATCAGACCAACTTCAGCAGGCTGTGCAGTCACAAGGATTTATCAATTATTGCCAAAAAAAAATTGATGCTTCTCAGACTGAATTTGAGAAAAATGTGTGGTCCTTTTTGAAGGTAAACTTTGAGGATGATTCTCGTGGAAAATACCTTGAACTTCTAGGATACAGAAAAGAAGATCTAGGAAAGAAGATTGCTTTGGCCTTGAACAAAGTGGATGGAGCCAATGTGGCTCTTAAAGACTCTGACCAAGTAGCACAGAGTGATGGGGAGGAGAGCCCTGCTGCTGAAGAGCAGCTCTTGGGAGAGCACATTAAAGAGGAAAAAGAAGAATCTGAATTTCTACCCTCATCTGGAGGAACATTTAATATCTCTGTCAGTGGGGACATTGATGGTTTAATTACTCAGGCTTTGCTGACGGGCAATTTTGAGAGTGCTGTTGACCTTTGTTTACATGATAACCGCATGGCCGATGCCATTATATTGGCCATAGCAGGTGGACAAGAACTCTTGGCTCGAACCCAGAAAAAATACTTCGCAAAATCCCAAAGCAAAATTACCAGGCTCATCACTGCAGTGGTGATGAAGAACTGGAAAGAGATTGTTGAGTCTTGTGATCTTAAAAATTGGAGAGAGGCTTTAGCTGCAGTATTGACTTATGCAAAGCCGGATGAATTTTCAG
CCCTTTGTGATCTTTTGGGAACCAGGCTTGAAAATGAAGGAGATAGCCTCCTGCAGACTCAAGCATGTCTCTGCTATATTTGTGCAGGGAATGTAGAGAAATTAGTTGCATGTTGGACTAAAGCTCAAGATGGAAGCCACCCTTTGTCACTTCAGGATCTGATTGAGAAAGTTGTCATCCTGCGAAAAGCTGTGCAACTCACTCAAGCCATGGACACTAGTACTGTAGGAGTTCTCTTGGCTGCGAAGATGAGTCAGTATGCCAATTTGTTGGCAGCTCAGGGCAGTATTGCTGCAGCCTTGGCTTTTCTTCCTGACAACACCAACCAGCCAAATATCATGCAGCTTCGTGACAGACTTTGTAGAGCACAAGGAGAGCCTGTAGCAGGACATGAATCACCTAAAATTCCGTACGAGAAACAGCAGCTCCCCAAGGGCAGGCCTGGACCAGTTGCTGGCCACCACCAGATGCCAAGAGTTCAAACTCAACAATATTATCCCCATGGAGAAAATCCTCCACCTCCGGGTTTCATAATGCATGGAAATGTTAATCCAAATGCTGCTGGTCAGCTTCCCACATCTCCAGGTCATATGCACACCCAGGTACCACCTTATCCACAGCCACAGCCTTATCAACCAGCCCAGCCGTATCCCTTCGGAACAGGGGGGTCAGCAATGTATCGACCTCAGCAGCCTGTTGCTCCTCCTACTTCAAACGCTTACCCTAACACCCCTTACATATCTTCTGCTTCTTCCTATACTGGGCAGTCTCAGCTGTACGCAGCACAGCACCAGGCCTCTTCACCTACCTCCAGCCCTGCTACTTCTTTCCCTCCTCCCCCTTCCTCTGGAGCATCCTTCCAGCATGGCGGACCAGGAGCTCCACCATCATCTTCAGCTTATGCACTGCCTCCTGGAACAACAGGTACACTGCCTGCTGCCAGTGAGCTGCCTGCGTCCCAAAGAACAGGTCCTCAGAATGGTTGGAATGACCCTCCAGCTTTGAACAGAGTACCCAAAAAGAAGAAGATGCCTGAAAACTTCATGCCTCCTGTTCCCATCACATCACCAATCATGAACCCGTTGGGTGACCCCCAGTCACAAATGCTGCAGCAACAGCCTTCAGCTCCAGTACCACTGTCAAGCCAGTCTTCATTCCCACAGCCACATCTTCCAGGTGGCCAGCCCTTCCATGGCGTACAGCAACCTCTTGGTCAAACAGGCATGCCACCATCTTTTTCAAAGCCCAATATTGAAGGTGCCCCAGGGGCTCCTATTGGAAATACCTTCCAGCATGTGCAGTCTTTGCCAACAAAAAAAATTACCAAGAAACCTATTCCAGATGAGCACCTCATTCTAAAGACCACATTTGAGGATCTTATTCAGCGCTGCCTTTCTTCAGCAACAGACCCTCAAACCAAGAGGAAGCTAGATGATGCCAGCAAACGTTTGGAGTTTCTGTATGATAAACTTAGGGAACAGACACTTTCACCAACAATCACCAGTGGTTTACACAACATTGCAAGGAGCATTGAAACTCGAAACTACTCAGAAGGATTGACCATGCATACCCACATAGTTAGCACCAGCAACTTCAGTGAGACCTCTGCTTTCATGCCAGTTCTCAAAGTTGTTCTCACCCAGGCCAATAAGCTGGGTGTCTAA\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"print ed.get_product_sequence('ENSP00000400926')\n", | ||
"print ed.get_transcript_sequence('ENST00000395310')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"To combine prediction results, we can use `merge_results` from `Fred2.Core`. In addition to the result objects we want to merge, we also have to specify the type of these objects (here `EpitopePredictionResult`). The function returns a merged result object of the same type." | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 2", | ||
"language": "python", | ||
"name": "python2" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 2 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython2", | ||
"version": "2.7.6" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 0 | ||
} |
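The protein record printed in the offline-database cell carries its metadata in the Ensembl FASTA description line (gene, transcript, location, biotypes). As a minimal sketch of how those fields could be pulled apart for mapping between accessions, the helper below splits such a line into a dictionary. `parse_ensembl_description` is a hypothetical name used for illustration only; it is not part of Fred2, and it assumes the whitespace-separated `key:value` layout seen in the output above.

```python
def parse_ensembl_description(description):
    """Split an Ensembl FASTA description into its ID and key:value fields.

    Hypothetical helper for illustration; it is not part of Fred2.
    """
    tokens = description.split()
    record = {"id": tokens[0]}
    for token in tokens[1:]:
        # Split on the first colon only, because values such as the
        # chromosome location contain further colons.
        key, _, value = token.partition(":")
        record[key] = value
    return record


# Description line copied from the notebook output above.
description = (
    "ENSP00000400926 pep:known chromosome:GRCh38:4:82818662:82891266:-1 "
    "gene:ENSG00000138674 transcript:ENST00000448323 "
    "gene_biotype:protein_coding transcript_biotype:protein_coding"
)
fields = parse_ensembl_description(description)
print(fields["gene"])
print(fields["transcript"])
```

The gene and transcript accessions recovered this way could then be used as lookup keys with either adapter; the location is kept as one string because its internal layout is not needed for that purpose.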