Simple interface to BioMart (Python -> rpy2 -> R/BioConductor's biomaRt)

daler/biomartpy

biomartpy

Simple interface to access BioMart from Python (Python -> rpy2 -> R's biomaRt -> pandas.DataFrame), originally written to get a lookup table of gene IDs -> various attributes for downstream work...

Install from PyPI:

$ pip install biomartpy

Or from GitHub:

$ git clone git@github.com:daler/biomartpy.git
$ cd biomartpy
$ python setup.py develop

Choose a mart (use list_marts() to decide):

>>> mart_name = 'ensembl'

Choose a dataset (use list_datasets(mart_name) to decide):

>>> dataset = 'dmelanogaster_gene_ensembl'

Choose some attributes (use list_attributes(mart_name, dataset) to decide):

>>> attributes = ['flybase_gene_id', 'flybasename_gene', 'description']

Get a pandas.DataFrame as a lookup table, indexed by the first attribute in the provided list:

>>> df = make_lookup(mart_name, dataset, attributes=attributes)

Use .loc to extract rows by index label (.ix, used in older pandas, was removed in pandas 1.0):

>>> df.loc['FBgn0031209']
flybasename_gene                                                Ir21a
description         Ionotropic receptor 21a [Source:FlyBase gene n...
Name: FBgn0031209

>>> df.loc['FBgn0031209', 'flybasename_gene']
'Ir21a'
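
To illustrate what "indexed by the first attribute" means without querying BioMart, here is a minimal sketch in plain pandas. It assumes the query result arrives as a table of rows in attribute order; the `to_lookup` helper is hypothetical and is not part of biomartpy. The gene IDs and names are taken from the example output in this README.

```python
import pandas as pd


def to_lookup(rows, attributes):
    """Hypothetical helper: build a DataFrame indexed by the first attribute."""
    df = pd.DataFrame(rows, columns=attributes)
    # The first attribute becomes the index, so rows can be looked up by ID.
    return df.set_index(attributes[0])


attributes = ['flybase_gene_id', 'flybasename_gene']
rows = [
    ('FBgn0002121', 'l(2)gl'),
    ('FBgn0031208', 'CG11023'),
    ('FBgn0031209', 'Ir21a'),
    ('FBgn0051973', 'Cda5'),
]

lookup = to_lookup(rows, attributes)

# Label-based lookup: row label first, then column name.
print(lookup.loc['FBgn0031209', 'flybasename_gene'])  # Ir21a
```

Passing the row label and column name to a single .loc call avoids the chained-indexing pattern and reads the value in one step.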

When providing filters and values, you can pass them either the way R expects (filters is a list, values is a list of lists with one list per filter) or as a more convenient dictionary. Here, the query is restricted to these four IDs, and only on chromosome 2L:

>>> filters = {
... 'flybase_gene_id': ['FBgn0031208', 'FBgn0002121', 'FBgn0031209', 'FBgn0051973'],
... 'chromosome_name': ['2L']}
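
The two input styles carry the same information. As a rough sketch of how the dictionary form maps onto the R-style form (a list of filter names plus a parallel list of lists of values), consider the following. The `split_filters` function is a hypothetical illustration; how biomartpy performs this conversion internally is not shown in this README.

```python
def split_filters(filters):
    """Hypothetical helper: split {filter_name: values} into parallel lists."""
    # Dicts preserve insertion order (Python 3.7+), so names and values line up.
    names = list(filters)
    values = [list(filters[name]) for name in names]
    return names, values


filters = {
    'flybase_gene_id': ['FBgn0031208', 'FBgn0002121', 'FBgn0031209', 'FBgn0051973'],
    'chromosome_name': ['2L'],
}

names, values = split_filters(filters)
print(names)   # ['flybase_gene_id', 'chromosome_name']
print(values)  # one inner list per filter, in the same order as names
```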

Set up attributes (chromosome_name is included here to confirm that the results are correct, but attributes and filters do not have to match):

>>> attributes = ['flybase_gene_id', 'flybasename_gene', 'chromosome_name']

Get data:

>>> df = make_lookup(
... mart_name=mart_name,
... dataset=dataset,
... attributes=attributes,
... filters=filters)

Check results:

>>> df
                flybasename_gene chromosome_name
flybase_gene_id
FBgn0002121               l(2)gl              2L
FBgn0031208              CG11023              2L
FBgn0031209                Ir21a              2L
FBgn0051973                 Cda5              2L
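
Beyond inspecting the table by eye, the same check can be done programmatically. The sketch below rebuilds the table shown above by hand (so it runs without BioMart access) and confirms that every requested ID came back and that every row is on chromosome 2L. The variable names are illustrative assumptions, not part of biomartpy.

```python
import pandas as pd

requested_ids = ['FBgn0031208', 'FBgn0002121', 'FBgn0031209', 'FBgn0051973']

# Hand-built copy of the result table above, standing in for make_lookup output.
df = pd.DataFrame(
    {
        'flybasename_gene': ['l(2)gl', 'CG11023', 'Ir21a', 'Cda5'],
        'chromosome_name': ['2L', '2L', '2L', '2L'],
    },
    index=pd.Index(
        ['FBgn0002121', 'FBgn0031208', 'FBgn0031209', 'FBgn0051973'],
        name='flybase_gene_id',
    ),
)

# IDs that were requested but not returned, and IDs returned but not requested.
missing = set(requested_ids) - set(df.index)
unexpected = set(df.index) - set(requested_ids)

# True only if the chromosome filter held for every row.
all_on_2L = bool((df['chromosome_name'] == '2L').all())

print(missing, unexpected, all_on_2L)
```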
