[FEATURE] added DBAdapter usage ipython notebook draft.
mwalzer committed Nov 13, 2015
1 parent 1b54842 commit a457dfb
Showing 4 changed files with 1,585 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -8,3 +8,4 @@ dist/*
.pythoscope
.idea
*.pyc
Fred2/tutorials/.ipynb_checkpoints/*
250 changes: 250 additions & 0 deletions Fred2/tutorials/DBAdapterUsage.ipynb
@@ -0,0 +1,250 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h1> DBAdapter usage </h1>\n",
"\n",
"This tutorial illustrates how to use Fred2 to map gene names and to retrieve database accessions and genetic or transcript sequences from a database source such as BioMart. Fred2 can connect to a variety of database sources, both online and offline.\n",
"Here, we will cover the Fred2 MartsAdapter as an example of online access and the EnsemblAdapter as an example of offline access."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"<h2> Chapter 1: The basics </h2>\n",
"<br/>\n",
"We start by importing the needed packages."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"%load_ext autoreload\n",
"%autoreload 2\n",
"import sys\n",
"sys.path.extend(['/home/walzer/immuno-tools/Fred2'])\n",
"from Fred2.IO.MartsAdapter import MartsAdapter\n",
"from Fred2.IO.EnsemblAdapter import EnsemblDB"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For starters we will connect to the BioMart:\n",
"\n",
"When initializing the MartsAdapter, you can specify the URL of the BioMart of your choice by supplying the `biomart` argument. If you do not choose a specific BioMart, it defaults to <a href=\"http://biomart.org\">http://biomart.org</a>. Here, however, we will use <a href=\"http://grch37.ensembl.org\">http://grch37.ensembl.org</a>. Please refer to the documentation of your BioMart to find the correct URL."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"mart_adapter = MartsAdapter(biomart=\"http://grch37.ensembl.org\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can start using the BioMart. For a comprehensive list of the methods implemented by the adapter, refer to the <a href=\"http://fred2.readthedocs.org/en/latest/Fred2.IO.html#module-Fred2.IO.MartsAdapter\">documentation</a>.\n",
"\n",
"You can fetch many different kinds of sequences with the adapter. We will start with a transcript sequence of the glucagon gene. You have to provide an identifier that is known to the BioMart and identifies the <i>transcript</i>; in our <a href=\"http://www.ensembl.org/\">Ensembl</a> case it has the form \"ENST...\".\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<type 'str'>\n"
]
},
{
"data": {
"text/plain": [
"TRANSCRIPT: ENST00000375497\n",
"\tVARIANTS:\n",
"\tSEQUENCE: ATGAAAAGCATTTACTTTGTGGCTGGATTATTTGTAATGCTGGTACAAGGCAGCTGGCAACGTTCCCTTCAAGACACAGAGGAGAAATCCAGATCATTCTCAGCTTCCCAGGCAGACCCACTCAGTGATCCTGATCAGATGAACGAGGACAAGCGCCATTCACAGGGCACATTCACCAGTGACTACAGCAAGTATCTGGACTCCAGGCGTGCCCAAGATTTTGTGCAGTGGTTGATGAATACCAAGAGGAACAGGAATAACATTGCCAAACGTCACGATGAATTTGAGAGACATGCTGAAGGGACCTTTACCAGTGATGTAAGTTCTTATTTGGAAGGCCAAGCTGCCAAGGAATTCATTGCTTGGCTGGTGAAAGGCCGAGGAAGGCGAGATTTCCCAGAAGAGGTCGCCATTGTTGAAGAACTTGGCCGCAGACATGCTGATGGTTCTTTCTCTGATGAGATGAACACCATTCTTGATAATCTTGCCGCCAGGGACTTTATAAACTGGTTGATTCAGACCAAAATCACTGACAGGAAATAA (mRNA)"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"transcript = mart_adapter.get_transcript_sequence('ENST00000375497')\n",
"print type(transcript)\n",
"from Fred2.Core import Transcript\n",
"fred2_transcript = Transcript(transcript, 'glucagon', 'ENST00000375497')\n",
"fred2_transcript"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The adapter returns a plain string. You can use this string to construct your <a href=\"http://fred2.readthedocs.org/en/latest/Fred2.Core.html#module-Fred2.Core.Transcript\">transcript object</a>."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(539, 539)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mart_adapter.get_transcript_position('163002078', '163002078', 'ENSG00000115263', 'ENST00000375497')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"..."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'P53'"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mart_adapter.get_variant_gene(17, 7565101, 7565101)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2> Chapter 2: Connecting to offline databases</h2>\n",
"<br/>\n",
"Fred2 also supports reading from offline databases such as FASTA and DAT files, which you can download from Ensembl, UniProt, or RefSeq.\n",
"To connect, you have to initialize the corresponding adapter and pass it the location of your database file.\n",
"\n",
"As an example, we will use the EnsemblAdapter. You can get the official sequence resources from Ensembl <a href=\"http://www.ensembl.org/info/data/ftp/index.html\">here</a>. For this tutorial, however, we will use small test excerpts of the Ensembl protein and CDS sequence (FASTA) files."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"ed = EnsemblDB()\n",
"ed.read_seqs(\"data/Homo_sapiens.GRCh38.pep.test_stub.fa\")\n",
"ed.read_seqs(\"data/Homo_sapiens.GRCh38.cds.test_stub.fa\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"..."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ID: ENSP00000400926\n",
"Name: ENSP00000400926\n",
"Description: ENSP00000400926 pep:known chromosome:GRCh38:4:82818662:82891266:-1 gene:ENSG00000138674 transcript:ENST00000448323 gene_biotype:protein_coding transcript_biotype:protein_coding\n",
"Number of features: 0\n",
"Seq('MKLKEVDRTAMQAWSPAQNHPIYLATGTSAQQLDATFSTNASLEIFELDLSDPS...LGV', SingleLetterAlphabet())\n",
"ATGAAGTTAAAGGAAGTAGATCGTACAGCCATGCAGGCATGGAGCCCTGCCCAGAATCACCCCATTTACCTAGCAACAGGAACATCTGCTCAGCAATTGGATGCAACATTTAGTACGAATGCTTCCCTTGAGATATTTGAATTAGACCTCTCTGATCCATCCTTGGATATGAAATCTTGTGCCACATTCTCCTCTTCTCACAGGTACCACAAGTTGATTTGGGGGCCTTATAAAATGGATTCCAAAGGAGATGTCTCTGGAGTTCTGATTGCAGGTGGTGAAAATGGAAATATTATTCTCTATGATCCTTCTAAAATTATAGCTGGAGACAAGGAAGTTGTGATTGCCCAGAATGACAAGCATACTGGCCCAGTGAGAGCCTTGGATGTGAACATTTTCCAGACTAATCTGGTAGCTTCTGGTGCTAATGAATCTGAAATCTACATATGGGATCTAAATAATTTTGCAACCCCAATGACACCAGGAGCCAAAACACAGCCGCCAGAAGATATCAGCTGCATTGCATGGAACAGACAAGTTCAGCATATTTTAGCATCAGCCAGTCCCAGTGGCCGGGCCACTGTATGGGATCTTAGAAAAAATGAGCCAATCATCAAAGTCAGTGACCATAGTAACAGAATGCATTGTTCTGGGTTGGCATGGCATCCTGATGTTGCTACTCAGATGGTCCTTGCCTCCGAGGATGACCGGTTACCAGTGATCCAGATGTGGGATCTTCGATTTGCTTCCTCTCCACTTCGTGTCCTGGAAAACCATGCCAGGGGGATTTTGGCAATTGCTTGGAGCATGGCAGATCCTGAATTGTTACTGAGCTGTGGAAAAGATGCTAAGATTCTCTGCTCCAATCCAAACACAGGAGAGGTGTTATATGAACTTCCCACCAACACACAGTGGTGCTTCGATATTCAGTGGTGTCCCCGAAATCCTGCTGTCTTATCAGCTGCTTCGTTTGATGGGCGTATCAGTGTTTATTCTATCATGGGAGGTAGCACAGATGGTTTAAGACAGAAACAAGTTGACAAGCTTTCATCATCTTTTGGGAATCTTGATCCCTTTGGCACAGGACAGCCCCTTCCTCCGTTACAAATTCCACAGCAGACTGCTCAGCATAGTATAGTGCTGCCTCTGAAGAAGCCGCCCAAGTGGATTCGAAGGCCTGTTGGTGCTTCTTTTTCATTTGGAGGCAAACTGGTTACGTTTGAGAATGTCAGAATGCCTTCTCATCAGGGAGCTGAGCAGCAGCAGCAGCAGCACCATGTGTTCATTAGTCAGGTTGTAACAGAAAAGGAGTTCCTCAGCCGATCAGACCAACTTCAGCAGGCTGTGCAGTCACAAGGATTTATCAATTATTGCCAAAAAAAAATTGATGCTTCTCAGACTGAATTTGAGAAAAATGTGTGGTCCTTTTTGAAGGTAAACTTTGAGGATGATTCTCGTGGAAAATACCTTGAACTTCTAGGATACAGAAAAGAAGATCTAGGAAAGAAGATTGCTTTGGCCTTGAACAAAGTGGATGGAGCCAATGTGGCTCTTAAAGACTCTGACCAAGTAGCACAGAGTGATGGGGAGGAGAGCCCTGCTGCTGAAGAGCAGCTCTTGGGAGAGCACATTAAAGAGGAAAAAGAAGAATCTGAATTTCTACCCTCATCTGGAGGAACATTTAATATCTCTGTCAGTGGGGACATTGATGGTTTAATTACTCAGGCTTTGCTGACGGGCAATTTTGAGAGTGCTGTTGACCTTTGTTTACATGATAACCGCATGGCCGATGCCATTATATTGGCCATAGCAGGTGGACAAGAACTCTTGGCTCGAACCCAGAAAAAATACTTCGCAAAATCCCAAAGCAAAATTACCAGGCTCATCACTGCAGTGGTGATGAAGAACTGGAAAGAGATTGTTGAGTCTTGTGATCTTAAAAATTGGAGAGAGGCTTTAGCTGCAGTATTGACTTATGCAAAGCCGGATGAATTTTCAG
CCCTTTGTGATCTTTTGGGAACCAGGCTTGAAAATGAAGGAGATAGCCTCCTGCAGACTCAAGCATGTCTCTGCTATATTTGTGCAGGGAATGTAGAGAAATTAGTTGCATGTTGGACTAAAGCTCAAGATGGAAGCCACCCTTTGTCACTTCAGGATCTGATTGAGAAAGTTGTCATCCTGCGAAAAGCTGTGCAACTCACTCAAGCCATGGACACTAGTACTGTAGGAGTTCTCTTGGCTGCGAAGATGAGTCAGTATGCCAATTTGTTGGCAGCTCAGGGCAGTATTGCTGCAGCCTTGGCTTTTCTTCCTGACAACACCAACCAGCCAAATATCATGCAGCTTCGTGACAGACTTTGTAGAGCACAAGGAGAGCCTGTAGCAGGACATGAATCACCTAAAATTCCGTACGAGAAACAGCAGCTCCCCAAGGGCAGGCCTGGACCAGTTGCTGGCCACCACCAGATGCCAAGAGTTCAAACTCAACAATATTATCCCCATGGAGAAAATCCTCCACCTCCGGGTTTCATAATGCATGGAAATGTTAATCCAAATGCTGCTGGTCAGCTTCCCACATCTCCAGGTCATATGCACACCCAGGTACCACCTTATCCACAGCCACAGCCTTATCAACCAGCCCAGCCGTATCCCTTCGGAACAGGGGGGTCAGCAATGTATCGACCTCAGCAGCCTGTTGCTCCTCCTACTTCAAACGCTTACCCTAACACCCCTTACATATCTTCTGCTTCTTCCTATACTGGGCAGTCTCAGCTGTACGCAGCACAGCACCAGGCCTCTTCACCTACCTCCAGCCCTGCTACTTCTTTCCCTCCTCCCCCTTCCTCTGGAGCATCCTTCCAGCATGGCGGACCAGGAGCTCCACCATCATCTTCAGCTTATGCACTGCCTCCTGGAACAACAGGTACACTGCCTGCTGCCAGTGAGCTGCCTGCGTCCCAAAGAACAGGTCCTCAGAATGGTTGGAATGACCCTCCAGCTTTGAACAGAGTACCCAAAAAGAAGAAGATGCCTGAAAACTTCATGCCTCCTGTTCCCATCACATCACCAATCATGAACCCGTTGGGTGACCCCCAGTCACAAATGCTGCAGCAACAGCCTTCAGCTCCAGTACCACTGTCAAGCCAGTCTTCATTCCCACAGCCACATCTTCCAGGTGGCCAGCCCTTCCATGGCGTACAGCAACCTCTTGGTCAAACAGGCATGCCACCATCTTTTTCAAAGCCCAATATTGAAGGTGCCCCAGGGGCTCCTATTGGAAATACCTTCCAGCATGTGCAGTCTTTGCCAACAAAAAAAATTACCAAGAAACCTATTCCAGATGAGCACCTCATTCTAAAGACCACATTTGAGGATCTTATTCAGCGCTGCCTTTCTTCAGCAACAGACCCTCAAACCAAGAGGAAGCTAGATGATGCCAGCAAACGTTTGGAGTTTCTGTATGATAAACTTAGGGAACAGACACTTTCACCAACAATCACCAGTGGTTTACACAACATTGCAAGGAGCATTGAAACTCGAAACTACTCAGAAGGATTGACCATGCATACCCACATAGTTAGCACCAGCAACTTCAGTGAGACCTCTGCTTTCATGCCAGTTCTCAAAGTTGTTCTCACCCAGGCCAATAAGCTGGGTGTCTAA\n"
]
}
],
"source": [
"print ed.get_product_sequence('ENSP00000400926')\n",
"print ed.get_transcript_sequence('ENST00000395310')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To combine prediction results, we can use `merge_results` from `Fred2.Core`. In addition to the result objects we want to merge, we also have to specify the type of these objects (here `EpitopePredictionResult`). The function returns a merged result object of the same type."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}