This is a full-stack application that ingests data into a MongoDB database, exposes an API layer over it, and will eventually include a front end as well. The goal is to give Broad Institute employees in the proteomics division easy access to data from previous experiments.
The API is built with Node.js and Express.js, following an MVC structure. It allows a user (or front-end application) to access the data in Mongo securely and efficiently. A request can specify query parameters, the fields to return, a key whose distinct values to return, a result limit, and the collections to query.
Example URL template
http://(DNS of box):3000/search?q={}&col=[]&f={}&d=&l=
Query
```
q (mandatory): ex - q={"gene names":"ACTC"}
An object containing query parameters to match against. Also known as the "WHERE" condition.
Regex example: q={"gene names":{"$regex":"ACT"}}
Equates to: "Show me every document where any gene name contains ACT"
Examples of values it will match: ACT, ACTA, ACTA1, CACTUS
```
Collections to query
```
col (optional): ex - col=["evidence","peptides"]
A string array of the collections to query. Non-existent collection names are ignored.
If omitted, all collections are queried.
```
Fields to return
```
f (optional): ex - f={"protein names":1,"sequence":1,"_id":0}
An object of boolean values (1/0 or true/false) stating whether to include each field.
The example says "show me the protein names and sequence fields, but not the _id field".
Defaults to showing all fields.
```
Distinct values of a key
```
d (optional): ex - d=gene names
DOES NOT USE QUOTES. Overrides the f param; returns only the given field, and only the distinct values of that key.
```
Limit results
```
l (optional): ex - l=5
Limits the number of results to the specified number. Defaults to 20.
```
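The parameters above contain JSON, spaces, and quotes, so they need to be URL-encoded before being sent. The following is a minimal client-side sketch in Python, not part of this repository: `build_search_url` is a hypothetical helper, `example-host` is a placeholder for the DNS name of the box, and it assumes the server decodes standard percent-encoded query strings.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(host, query, collections=None, fields=None,
                     distinct=None, limit=None, port=3000):
    """Assemble a /search URL from the parameters documented above."""
    params = {"q": json.dumps(query)}            # q is mandatory
    if collections is not None:
        params["col"] = json.dumps(collections)  # JSON array of collection names
    if fields is not None:
        params["f"] = json.dumps(fields)         # JSON object of 1/0 flags
    if distinct is not None:
        params["d"] = distinct                   # plain string, no quotes
    if limit is not None:
        params["l"] = str(limit)
    # urlencode percent-encodes braces, quotes, and spaces for us.
    return f"http://{host}:{port}/search?{urlencode(params)}"


# Placeholder host; substitute the DNS name of the box.
url = build_search_url(
    "example-host",
    {"gene names": {"$regex": "ACT"}},
    collections=["evidence", "peptides"],
    limit=5,
)

# Decode the query string again to confirm the parameters survive encoding.
decoded = parse_qs(urlsplit(url).query)
```

Building the URL this way avoids hand-escaping the JSON in `q`, `col`, and `f`; the resulting string can be passed to any HTTP client.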
Files ```ingest.py``` and ```bulkIngest.py``` are used to take in MaxQuant text files and parse them into MongoDB-readable documents. The documents are split into collections based on the name of the file, and each one references its entry in the experiments collection through an 'expID'. Experiments are identified by the parent directory of the text files.
bulkIngest.py
USAGE: python bulkIngest.py <root directory>
root directory - the top-level directory that contains all files to be ingested.
Essentially calls ingest.py for every text file found.
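The traversal described above can be sketched roughly as follows. This is an illustration only and not the actual contents of bulkIngest.py: the function names are hypothetical, it assumes "text file" means a `.txt` extension, and the per-file ingest step is passed in as a callable because the way bulkIngest.py invokes ingest.py is not documented here.

```python
import os


def find_text_files(root_dir):
    """Yield the path of every .txt file under root_dir, recursively."""
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            # Assumption: ingestible files are identified by a .txt extension.
            if name.lower().endswith(".txt"):
                yield os.path.join(dirpath, name)


def bulk_ingest(root_dir, ingest_one):
    """Call ingest_one(path) for each text file found; return the file count."""
    count = 0
    for path in find_text_files(root_dir):
        ingest_one(path)
        count += 1
    return count
```

For a dry run, `bulk_ingest(root, print)` lists the files that would be ingested without touching the database.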
ingest.py
USAGE: python ingest.py <file>
file - the text file to be ingested into MongoDB.
Parses the text file into MongoDB documents.
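As a rough illustration of that parsing step, the sketch below turns a tab-delimited MaxQuant table into a list of dictionaries ready for insertion. It is not the actual ingest.py: the function names are hypothetical, lowercasing the column headers is an assumption made so that keys match the query examples (such as "gene names"), and the database insert itself is left out.

```python
import csv
import io
import os


def parse_maxquant_text(handle, exp_id):
    """Convert a tab-delimited table with a header row into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    docs = []
    for row in reader:
        # Assumption: keys are lowercased column headers.
        doc = {key.strip().lower(): value
               for key, value in row.items() if key is not None}
        doc["expID"] = exp_id  # link back to the experiments collection
        docs.append(doc)
    return docs


def collection_and_experiment(path):
    """Derive the collection name (file name) and experiment (parent directory)."""
    collection = os.path.splitext(os.path.basename(path))[0]
    experiment = os.path.basename(os.path.dirname(os.path.abspath(path)))
    return collection, experiment


# Tiny in-memory sample standing in for a real MaxQuant output file.
sample = io.StringIO("Sequence\tGene names\nPEPTIDE\tACTC\n")
docs = parse_maxquant_text(sample, exp_id="exp1")
```

Each resulting dictionary could then be written to the collection named after the file, for example with a MongoDB driver's bulk insert.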