This repository has been archived by the owner on May 15, 2019. It is now read-only.

Deployment Guide

vgonzale78 edited this page Sep 20, 2016 · 4 revisions

Purpose and Audience

This deployment guide summarizes the installation and the recommended location for each component. It is intended for the people responsible for leading the installation.

Component Distribution

There are three components in the Open Network Insight solution:

  • Ingest – binary files are captured or transferred into the Hadoop cluster, where they are transformed and loaded into the solution data stores.
  • Machine Learning – machine learning algorithms add learned information to the ingested data, which is then used to filter and sort the raw data.
  • Operational Analytics – output from the machine learning component is augmented with context (e.g. geographic data) and heuristics, then made available to the user for interaction.

While all of the components can be installed on the same server in a development or test scenario, the recommended configuration for production is to map the components to specific server roles in a Hadoop cluster.

| Component             | Node / Key Role                            |
|-----------------------|--------------------------------------------|
| Ingest                | Edge Server (Gateway)                      |
| Machine Learning      | YARN Node Manager (Gateway)                |
| Operational Analytics | Node with Cloudera Manager / Hue (Gateway) |

During the install, each component is installed in the /home/"sol-user"/ folder on the appropriate node. This requires creating the "solution" user on each node.
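
As a minimal sketch, the component-to-role mapping above can be written down as data so that an install script can look up where each component belongs. The helper name `install_dir` and its `sol_user` argument are hypothetical; the guide itself only states that components install under the solution user's home folder.

```python
from pathlib import PurePosixPath

# Component-to-role mapping, copied from the guide's component table.
COMPONENT_ROLES = {
    "ingest": "Edge Server (Gateway)",
    "machine_learning": "YARN Node Manager (Gateway)",
    "operational_analytics": "Node with Cloudera Manager / Hue (Gateway)",
}


def install_dir(sol_user: str) -> PurePosixPath:
    """Return the home folder of the solution user (hypothetical helper)."""
    if not sol_user or "/" in sol_user:
        raise ValueError("sol_user must be a plain user name")
    return PurePosixPath("/home") / sol_user


if __name__ == "__main__":
    # "solution" is the user name mentioned in the guide; adjust as needed.
    for component, role in COMPONENT_ROLES.items():
        print(f"{component}: {role} -> {install_dir('solution')}")
```

Keeping the mapping in one place makes it easier to check that every target node has the solution user before any component is installed.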

Ingest

Six subcomponents are installed on the edge server:

  • nfdump (http://nfdump.sourceforge.net/): a set of utilities for capturing and decoding flow data
  • tshark (packet only): a CLI component of Wireshark (https://www.wireshark.org/) for decoding packet data
  • RabbitMQ – message queueing framework
  • Ingest workflow – bash script or Oozie workflow
  • Ingest master and workers – Python code for data ingest
  • Ingest directory structure – local file system

There are also required changes to the Hadoop configuration:

  • Create HDFS path for binary data
  • Create HDFS path for Hive tables
  • Create solution Hive tables (staging, search)
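
The HDFS path creation above could be scripted along these lines. This is only a sketch: the two example paths are hypothetical placeholders, not locations prescribed by the solution, and the code only builds `hdfs dfs -mkdir -p` command lines without running them. Hive table creation is left out because the table schemas are not given in this guide.

```python
import shlex


def hdfs_mkdir_commands(paths):
    """Build one `hdfs dfs -mkdir -p` command (as an argument list) per path."""
    commands = []
    for path in paths:
        if not path.startswith("/"):
            raise ValueError(f"HDFS path must be absolute: {path!r}")
        commands.append(["hdfs", "dfs", "-mkdir", "-p", path])
    return commands


# Hypothetical locations for binary data and Hive tables (assumptions).
EXAMPLE_PATHS = ["/user/solution/binary", "/user/solution/hive"]

if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for command in hdfs_mkdir_commands(EXAMPLE_PATHS):
        print(shlex.join(command))
```

Printing the commands first lets the person leading the installation review them before they are run on the edge server, for example through `subprocess.run(command, check=True)`.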

Machine Learning (ML)

Multiple subcomponents are installed on each DataNode / NodeManager used for the solution:

  • Scala scripts to run Spark pre- and post-processing jobs
  • Python scripts used for local transformation
  • Algorithm code written in C/C++
  • MPI (Message Passing Interface) libraries – used to parallelize algorithm code
  • ML workflow – bash script
  • ML directory structure on local file system

Some changes are required on the Hadoop Cluster as well:

  • Spark configuration settings will need to be reviewed or modified
  • YARN configuration settings will need to be reviewed or modified
  • A directory structure for machine learning data will need to be created
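
One thing such a review might check is whether a Spark executor request fits inside the largest container YARN will grant. The sketch below assumes the commonly documented Spark-on-YARN overhead rule (10% of executor memory, with a 384 MB minimum); both values are passed as parameters because they vary between Spark versions. The function name is hypothetical, and the inputs correspond to `spark.executor.memory` and `yarn.scheduler.maximum-allocation-mb`.

```python
def executor_fits_in_yarn(executor_memory_mb, yarn_max_allocation_mb,
                          overhead_fraction=0.10, min_overhead_mb=384):
    """Return True if executor memory plus overhead fits in one YARN container.

    overhead_fraction and min_overhead_mb are assumptions about the Spark
    version in use; check the values that apply to your cluster.
    """
    overhead_mb = max(min_overhead_mb, int(executor_memory_mb * overhead_fraction))
    return executor_memory_mb + overhead_mb <= yarn_max_allocation_mb


if __name__ == "__main__":
    # A 4 GB executor on a cluster that allows containers of up to 8 GB.
    print(executor_fits_in_yarn(4096, 8192))
```

If the check fails, either the executor memory needs to be reduced or the YARN maximum allocation raised, which is the kind of change the list above refers to.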

Operational Analytics (OA)

Multiple subcomponents must be installed on the Cloudera Manager/Hue server:

  • Jupyter – provides a server for static HTML and JavaScript, as well as Jupyter notebooks, which are the key interface to the Hadoop cluster
  • Matplotlib (optional) – provides rich charting and plotting within Jupyter notebooks
  • D3js and other JavaScript libraries – provide dynamic behavior and interactivity in the user interface
  • Solution code – static HTML, JavaScript, and Jupyter notebooks used to access the operational analytics and information about the system
  • Ops directory structure on the local file system

Some changes may be required on the Hadoop Cluster as well:

  • YARN configuration settings will need to be reviewed or modified (for Hive query optimization).

Because the top-level components of the solution can be used independently or together, we recommend the following approach to installation. For each component (ingest, machine learning, operational analytics):

  • Identify deployment target nodes
  • Install prerequisites on local file system
  • Install solution component on local file system
  • Make configuration/installation changes to Hadoop
  • Validate and Test
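
The approach above can be sketched as a simple checklist generator. The function name `build_checklist` is hypothetical; the component names and steps are taken from this guide. Because the components can be used independently, the function accepts any subset of them.

```python
COMPONENTS = ("ingest", "machine learning", "operational analytics")

STEPS = (
    "Identify deployment target nodes",
    "Install prerequisites on local file system",
    "Install solution component on local file system",
    "Make configuration/installation changes to Hadoop",
    "Validate and test",
)


def build_checklist(components=COMPONENTS, steps=STEPS):
    """Return (component, step_number, step) tuples, one component at a time."""
    return [
        (component, number, step)
        for component in components
        for number, step in enumerate(steps, start=1)
    ]


if __name__ == "__main__":
    for component, number, step in build_checklist():
        print(f"[{component}] {number}. {step}")
```

Calling `build_checklist(["ingest"])` yields only the five ingest steps, which matches an installation where just one component is deployed.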