Deployment Guide
Purpose and Audience
This deployment guide summarizes the installation and the recommended location for each component. It is intended for the people responsible for leading the installation.
Component Distribution
There are three components in the Open Network Insight solution:
- Ingest – binary files are captured or transferred into the Hadoop cluster, where they are transformed and loaded into solution data stores.
- Machine Learning – machine learning algorithms add learned information to the ingested data, which is then used to filter and sort the raw data.
- Operational Analytics – output from the machine learning component is augmented with context (e.g. geographic data) and heuristics, then made available to the user for interaction.
While all of the components can be installed on the same server in a development or test scenario, the recommended configuration for production is to map the components to specific server roles in a Hadoop cluster.
Component | Node/Key Role |
---|---|
Ingest | Edge Server (Gateway) |
Machine Learning | YARN Node Manager (Gateway) |
Operational Analytics | Node with Cloudera Manager / Hue (Gateway) |
During the install, each component is installed in the /home/"sol-user"/ folder on the appropriate node. This requires creating the "solution" user on each node.
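The mapping in the table above and the shared install folder can be captured in a small lookup. This is a minimal sketch for planning purposes only, not part of the solution code: the function names `install_home` and `describe_layout` are hypothetical, and the account name passed in is whatever solution user you create on each node.

```python
from pathlib import PurePosixPath

# Component-to-role mapping, copied from the table above.
NODE_ROLES = {
    "Ingest": "Edge Server (Gateway)",
    "Machine Learning": "YARN Node Manager (Gateway)",
    "Operational Analytics": "Node with Cloudera Manager / Hue (Gateway)",
}


def install_home(solution_user):
    """Return the home folder each component installs into (hypothetical helper)."""
    if not solution_user or "/" in solution_user:
        raise ValueError("solution_user must be a plain account name")
    return PurePosixPath("/home") / solution_user


def describe_layout(solution_user):
    """List one 'component: role -> folder' line per component."""
    home = install_home(solution_user)
    return [f"{name}: {role} -> {home}" for name, role in NODE_ROLES.items()]


if __name__ == "__main__":
    # "sol-user" is a placeholder account name, not a required value.
    for line in describe_layout("sol-user"):
        print(line)
```

Keeping the mapping in one place makes it easier to check that the solution user exists on every node before any component is installed.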
Ingest
Six subcomponents are installed on the edge server:
- nfdump (http://nfdump.sourceforge.net/): a set of utilities for capturing and decoding flow data.
- tshark (packet only): a CLI component of Wireshark (https://www.wireshark.org/) for decoding packet data
- RabbitMQ – message queueing framework
- Ingest workflow – bash script or Oozie workflow
- Ingest master and workers – python code for data ingest
- Ingest directory structure – local file system
There are also required changes to the Hadoop configuration:
- Create HDFS path for binary data
- Create HDFS path for Hive tables
- Create solution Hive tables (staging, search)
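The two HDFS path changes listed above could be prepared as shown below. This is a sketch under stated assumptions: it only builds `hdfs dfs -mkdir -p` command strings for review and does not run them, the helper name `hdfs_setup_commands` is hypothetical, and the example paths are placeholders rather than the locations the solution expects. Creating the Hive tables is not covered here because the table definitions are not part of this guide.

```python
import shlex


def hdfs_setup_commands(binary_path, hive_path):
    """Build the shell commands that create the two HDFS paths (hypothetical helper)."""
    commands = []
    for path in (binary_path, hive_path):
        if not path.startswith("/"):
            raise ValueError(f"HDFS path must be absolute: {path!r}")
        # shlex.quote protects paths that contain spaces or shell metacharacters.
        commands.append("hdfs dfs -mkdir -p " + shlex.quote(path))
    return commands


if __name__ == "__main__":
    # Placeholder paths for illustration only.
    for command in hdfs_setup_commands("/user/sol-user/binary", "/user/sol-user/hive"):
        print(command)
```

Printing the commands first gives the Hadoop administrator a chance to confirm the paths and ownership before anything is created on the cluster.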
Machine Learning (ML)
Multiple subcomponents are installed on each DataNode / NodeManager used for the solution:
- Scala scripts to run Spark pre- and post-processing jobs
- Python scripts used for local transformation
- Algorithm code written in C/C++
- MPI (Message Passing Interface) libraries – used to parallelize algorithm code
- ML workflow – bash script
- ML directory structure on local file system
Some changes are required on the Hadoop Cluster as well:
- Spark configuration settings will need to be reviewed or modified
- YARN configuration settings will need to be reviewed or modified
- Directory structure for machine learning data
Operational Analytics (OA)
Multiple subcomponents are required for installation on the Cloudera Manager/Hue server:
- Jupyter – provides a server for static HTML and JavaScript, as well as Jupyter notebooks, the key interface between the user and the Hadoop cluster
- Matplotlib (optional) – provides rich charting and plotting within Jupyter notebooks
- D3js and other JavaScript libraries – provide dynamic behavior and interactivity in the user interface
- Solution code – static HTML, JavaScript, and Jupyter notebooks used to access the operational analytics and information about the system
- Ops directory structure on the local file system
Some changes may be required on the Hadoop Cluster as well:
- YARN configuration settings will need to be reviewed or modified (for Hive query optimization).
Because the top-level components of the solution can be used independently or together, we recommend the following approach to installation. For each component (ingest, machine learning, operational analytics):
- Identify deployment target nodes
- Install prerequisites on local file system
- Install solution component on local file system
- Make configuration/installation changes to Hadoop
- Validate and Test
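The per-component approach above can be expanded into a flat checklist. The following is only an illustrative sketch: the names `COMPONENTS`, `STEPS`, and `build_checklist` are hypothetical, and the step text is copied from the list above.

```python
COMPONENTS = ("ingest", "machine learning", "operational analytics")

STEPS = (
    "Identify deployment target nodes",
    "Install prerequisites on local file system",
    "Install solution component on local file system",
    "Make configuration/installation changes to Hadoop",
    "Validate and Test",
)


def build_checklist(components=COMPONENTS):
    """Return (component, step number, step) tuples, grouped by component."""
    unknown = [c for c in components if c not in COMPONENTS]
    if unknown:
        raise ValueError(f"unknown components: {unknown}")
    return [
        (component, number, step)
        for component in components
        for number, step in enumerate(STEPS, start=1)
    ]


if __name__ == "__main__":
    for component, number, step in build_checklist():
        print(f"[ ] {component}: {number}. {step}")
```

Because the components can be used independently, passing a subset such as `build_checklist(["ingest"])` yields the steps for that component alone.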