A script to create a gene-focussed BridgeDb database based on Ensembl BioMart.
Java 11 is required.
Compile the code with:
mvn clean install
cp target/org.bridgedb.genedb-jar-with-dependencies.jar BioMart2BridgeDb.jar
In your terminal:
java -jar BioMart2BridgeDb.jar <configFile> <outputPath> <oldDB> <inclusive>
<configFile>: location of configuration file
<outputPath>: Path for the new database
<oldDB>: (optional) directory of the old database; if given, QC is run against it
<inclusive>: (optional) use inclusive BridgeDb list
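As a rough sketch, the snippet below shows how the two required arguments and the optional `<oldDB>` argument line up. The file and directory names are hypothetical placeholders, and `build_cmd` is an illustrative helper that is not part of this project. It only prints the command line, so you can check it before running it yourself.

```shell
#!/bin/sh
# Sketch only: prints the command line instead of executing it.
# build_cmd is a hypothetical helper, not part of BioMart2BridgeDb.
build_cmd() {
    config_file=$1     # <configFile>
    output_path=$2     # <outputPath>
    old_db=${3:-}      # <oldDB>, optional; empty when not given
    cmd="java -jar BioMart2BridgeDb.jar $config_file $output_path"
    # Append the old database directory only when one was given.
    if [ -n "$old_db" ]; then
        cmd="$cmd $old_db"
    fi
    echo "$cmd"
}

# Hypothetical paths; replace them with your own.
build_cmd AThaliana.config output/
build_cmd AThaliana.config output/ old_databases/
```

Copy the printed line into your terminal once the paths look right.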
Configuration files can be found in https://github.com/bridgedb/create-bridgedb-genedb-config/tree/master/configFiles.
Example: Arabidopsis thaliana config file
Give the version of Ensembl BioMart to query:
e.g.: http://www.ensembl.org/biomart/, http://oct2014.archive.ensembl.org/biomart/, http://nov2020-metazoa.ensembl.org/biomart/
You can find an overview of releases in the Ensembl Archive, Metazoa Archive, Plants Archive, and Fungi Archive.
MartRegistry for plants v49 can be found here:
e.g.: plants_mart, metazoa_mart, default
Code name of the animal species: http://www.ensembl.org/biomart/martservice?type=datasets&mart=ENSEMBL_MART_ENSEMBL, Metazoa v49, Plants v49, and Fungi v49
The name of the bridge database
database_name=Arabidopsis thaliana genes and proteins
The name of the .bridge file created
The data source code names for Arabidopsis thaliana can be found here:
probe_datasource=Affy,Agilent
probe_set=affy_aragene,affy_ath1_121501,agilent_g2519f_015059,agilent_g2519f_021169,agilent_g4136a_011839,agilent_g4136b_013324,agilent_g4142a_012600
gene_datasource=entrezgene_id,go_id,mirbase_accession,mirbase_id,pdb,refseq_dna,refseq_peptide,uniprotsptrembl,uniprotswissprot,tair_locus,nasc_gene_id
Optional filters (chromosome list) for Arabidopsis thaliana can be found here: https://nov2020-plants.ensembl.org/biomart/martservice?type=filters&dataset=athaliana_eg_gene
e.g.: chromosome_name=1,2,3,4,5,Pt,Mt
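Before starting a long run, it can help to check which values a config file actually sets. The sketch below assumes the file uses plain `key=value` lines with comma-separated lists, as in the examples above. `list_values` is a hypothetical helper, not part of the tool, and the demo file is a temporary stand-in for a real config file.

```shell
#!/bin/sh
# list_values FILE KEY: print each comma-separated value of KEY on its own line.
# Hypothetical helper; assumes "key=value" lines with comma-separated lists.
list_values() {
    grep "^$2=" "$1" | cut -d= -f2- | tr ',' '\n'
}

# Demo on a temporary file holding two of the keys shown above.
demo=$(mktemp)
printf '%s\n' \
    'database_name=Arabidopsis thaliana genes and proteins' \
    'chromosome_name=1,2,3,4,5,Pt,Mt' > "$demo"

# Prints the chromosome filter values, one per line.
list_values "$demo" chromosome_name

rm -f "$demo"
```

The same call with `probe_set` or `gene_datasource` lists those entries, which makes a misspelled or missing code name easier to spot.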