NucleicNet edited this page May 30, 2019 · 18 revisions

Introduction

NucleicNet is hosted on our webserver (http://www.cbrc.kaust.edu.sa/NucleicNet/). Here, we distribute a version that runs on Linux machines (CentOS 7 / Ubuntu >= 16). Users may also refer to:

Dependencies

NucleicNet depends on the following publicly available software to run efficiently. Users should refer to the instructions and licenses of each package for prerequisite installation.

  1. Python 3.6.7 (https://www.python.org/downloads/release/python-367/): primary programming language.
  2. Anaconda 5.3.1 (https://www.anaconda.com/distribution/): coordination of Python packages.
  3. FEATURE 3.1.0 (https://simtk.org/projects/feature): analysis of atomic protein models.
  4. XSSP 2.0.4 (https://github.com/cmbi/xssp): analysis of protein secondary structure from atomic protein models.
  5. PyMOL 2.3 (https://pymol.org/2/): visualisation of binding pockets.
  6. CUDA 8.0.61 and cuDNN 5.1 (https://developer.nvidia.com/rdp/cudnn-archive): speed-up of deep learning operations.
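
Before continuing, it can help to confirm that the command-line tools from the list above are reachable. The following is a minimal sketch and not part of the NucleicNet distribution. The executable names (`conda`, `featurize`, `mkdssp`, `pymol`, `nvcc`) are assumptions about how each package is commonly installed and may differ on your system.

```python
import shutil

# Assumed executable names (not taken from the NucleicNet documentation);
# adjust them to match your own installation.
ASSUMED_TOOLS = {
    "Anaconda": "conda",
    "FEATURE": "featurize",
    "XSSP": "mkdssp",
    "PyMOL": "pymol",
    "CUDA": "nvcc",
}


def locate_tools(tools=ASSUMED_TOOLS):
    """Map each dependency name to the path of its executable, or None if it is not on PATH."""
    return {name: shutil.which(exe) for name, exe in tools.items()}


def report(found):
    """Return one status line per dependency, sorted by dependency name."""
    return ["{}: {}".format(name, path or "NOT FOUND") for name, path in sorted(found.items())]
```

Calling `print("\n".join(report(locate_tools())))` lists each dependency with the path found for it, or `NOT FOUND` when the assumed executable is not on `PATH`.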

After installing the prerequisite dependencies, download our NucleicNet repository by clicking the green "Clone or download" button on our code page. Decompress the package. Run the following within the decompressed folder to configure the Python environment.

conda env create -f py3_env.yml

source activate nucleicnet

To exit from the environment, run the following.

source deactivate nucleicnet
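
To confirm that the environment is active before running any analysis, a small check such as the one below may be useful. It is a sketch under two assumptions: that conda records the active environment name in the `CONDA_DEFAULT_ENV` variable, and that the `nucleicnet` environment provides Python 3.6 as listed under Dependencies. The function names are hypothetical.

```python
import os
import sys

EXPECTED_ENV = "nucleicnet"  # environment name used in the commands above


def active_conda_env(environ=os.environ):
    """Return the environment name conda exports as CONDA_DEFAULT_ENV, or None."""
    return environ.get("CONDA_DEFAULT_ENV")


def environment_ok(environ=os.environ, version=sys.version_info):
    """True when the nucleicnet environment is active and Python is 3.6.x (assumed)."""
    return active_conda_env(environ) == EXPECTED_ENV and tuple(version[:2]) == (3, 6)
```

Running `environment_ok()` inside the activated environment should return `True`; a `False` result suggests that either another environment is active or a different Python version is in use.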

Typically, installation should require less than 15 minutes on a 64-core workstation.

How to Use NucleicNet with Command Line

NucleicNet works on protein atomic model(s) written in PDB file format. Further specifications for the input PDB file can be found in Specification on PDB input files. Users can put PDB file(s) into the "GridData" folder for analysis, then run the following from within the decompressed folder:

# Generate features for protein atomic models

bash command_GenerateFeature.sh

# Analyse features with the deep learning module

bash command_DeepLearningModule.sh

# Organise deep learning predictions into visualisable forms

bash command_AnalysePrediction.sh

The purpose of each Python script called within the bash scripts is annotated in the scripts themselves.
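
The three steps above must run in order, and a later step is not meaningful if an earlier one failed. The sketch below chains them from Python and stops at the first failure. It is an illustration only: the script names are taken from the commands above, while `find_inputs` and `run_pipeline` are hypothetical helpers that are not shipped with NucleicNet.

```python
import glob
import os
import subprocess

# Script names taken from the commands above, in the order they must run.
STEPS = [
    "command_GenerateFeature.sh",
    "command_DeepLearningModule.sh",
    "command_AnalysePrediction.sh",
]


def find_inputs(grid_dir="GridData"):
    """Return the sorted list of PDB files waiting in the input folder."""
    return sorted(glob.glob(os.path.join(grid_dir, "*.pdb")))


def run_pipeline(steps=STEPS, runner=subprocess.run):
    """Run each step with bash; check=True raises CalledProcessError on a non-zero exit."""
    completed = []
    for script in steps:
        runner(["bash", script], check=True)
        completed.append(script)
    return completed
```

From within the decompressed folder and with the `nucleicnet` environment active, one might call `run_pipeline()` only when `find_inputs()` returns a non-empty list, so that the scripts are not started on an empty "GridData" folder.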

Output

Major results are stored in the "Out" folder. Suppose the input PDB file of the protein is called "GridData/0000.pdb"; the resulting output files are as follows:

  • "Out/0000_pymol.pse": This is a PyMOL session that reveals the binding pockets of each RNA constituent (the four bases A/U/C/G and the backbone constituents P/R for phosphate and ribose). Users can open this file with "pymol Out/0000_pymol.pse". (See Fig. 3a-c)
  • "Out/0000_R_logo_RNACColor.png": Optional. If the binding sites are already known from an RNA-protein complex PDB file, "NucleicNet_SequenceLogo_RNACcolor.py" can be called to retrieve the NucleicNet-predicted RNA binding specificity at each base location in the form of a sequence logo diagram. Suppose the corresponding RNA-protein complex is stored in "Control/0000.pdb" with RNA chain R; "Out/0000_R_logo_RNACColor.png" is then the NucleicNet-predicted sequence logo indexed by RNA residue on chain R. (See Fig. 3-4)
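
The naming pattern described above can be written down as a small helper that derives the expected output paths from an input PDB path and reports which ones are missing. This is a sketch: `expected_outputs` and `missing_outputs` are hypothetical names, and the file name patterns are inferred from the "0000" example rather than from the NucleicNet source code.

```python
import os


def expected_outputs(pdb_path, rna_chain=None, out_dir="Out"):
    """Derive output paths from an input PDB path, following the naming in the list above."""
    stem = os.path.splitext(os.path.basename(pdb_path))[0]
    outputs = [os.path.join(out_dir, stem + "_pymol.pse")]
    if rna_chain is not None:
        # The sequence logo is optional and is named after the RNA chain.
        outputs.append(
            os.path.join(out_dir, "{}_{}_logo_RNACColor.png".format(stem, rna_chain))
        )
    return outputs


def missing_outputs(pdb_path, rna_chain=None, out_dir="Out"):
    """Return the expected output paths that do not exist on disk."""
    return [p for p in expected_outputs(pdb_path, rna_chain, out_dir) if not os.path.exists(p)]
```

For example, `missing_outputs("GridData/0000.pdb", rna_chain="R")` returns an empty list once both the PyMOL session and the sequence logo have been written.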

We also include scripts and data to reproduce our study on Argonautes (See "command_AnalyseGridPrediction.sh"):

  • "ExperimentalSequencing/RipSeq_HMMlogPDifference.png": Uses NucleicNet to score miRNA sequences for Ago binding. The result is compared with IP-Seq data (*.txt) stored in the "ExperimentalSequencing" folder. (See Fig. 5a)
  • "ExperimentalSequencing/Knockdown_Relation_All_Positive_publication.png" and "ExperimentalSequencing/Knockdown_Relation_All_Negative_publication.png": Uses NucleicNet to evaluate miRNA loading efficiency. The result is compared with experimental knockdown levels (*.csv) stored in the "ExperimentalSequencing" folder. (See Fig. 5b)