This repository has been archived by the owner on Sep 20, 2023. It is now read-only.

saadkadhi committed Oct 27, 2017
1 parent ee5ec37 commit 73d827e
Showing 3 changed files with 858 additions and 858 deletions.
357 changes: 357 additions & 0 deletions api/deprecated-how-to-create-an-analyzer.md
@@ -0,0 +1,357 @@
# Overview
The main goal of Cortex is to run analysis on a given observable, defined by its data type and a value. This observable could be of any type: IP, domain, URL, email, file...

The programs that Cortex invokes to analyze observables are called **Analyzers**. An **analyzer** can be written in any programming language supported by Linux, as long as the resulting program is on the same machine as Cortex and is executable.

# Create a Basic Analyzer
From a technical standpoint, a minimal **analyzer** would be defined by:
- A JSON definition file
- An executable script. As of this writing, all the available analyzers are written in Python. However, analyzers can be written in any programming language supported by Linux

Throughout this document, we will use the [Hippocampe_More](https://github.com/CERT-BDF/Cortex-Analyzers/tree/master/analyzers/Hippocampe) analyzer as an example to teach you how to write your own analyzer.

Our **analyzer** will be defined inside a folder called `Hippocampe`. By convention, the folder where the analyzer is located is named after the product or service it leverages to do its work: MISP, MaxMind, PassiveTotal, VirusTotal, DomainTools...

## The JSON Definition File
As its name implies, the JSON definition file contains metadata describing the analyzer. The associated filename must be `<PRODUCT_NAME>_<SERVICE_NAME>.json`. For example: `Hippocampe_more.json` and `Hippocampe_hipposcore.json`.

The structure of this file is described in the table below:

| Attribute | Type | Description |
| ------------ | ------------- | ------------- |
| name | String `REQUIRED` | The analyzer's name |
| version | String `REQUIRED` | The analyzer's version |
| description | String `REQUIRED`| The analyzer's description |
| dataTypeList | String[] `REQUIRED` | An array of strings, listing the observable data types that could be analyzed |
| command | String `REQUIRED` | The command to invoke the analyzer's script. It must be relative to the root directory that contains all the analyzer folders |
| baseConfig* | String `OPTIONAL` | The name of the configuration attribute defined in the Cortex configuration file |
| config* | Object `OPTIONAL` | A configuration object that will be passed to the analyzer's command |

**Note**: Fields marked with a star (*) will be described in depth later in this document.

**Example**
```json
{
    "name": "HippoMore",
    "version": "1.0",
    "description": "Hippocampe detailed report: provides the last detailed report for an IP, domain or a URL",
    "dataTypeList": ["ip", "domain", "fqdn", "url"],
    "baseConfig": "Hippocampe",
    "config": {
        "check_tlp": false,
        "max_tlp": 3,
        "service": "more"
    },
    "command": "Hippocampe/hippo.py"
}
```
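
The required attributes listed in the table can be checked before dropping a definition file into the analyzers folder. The snippet below is a minimal sketch and not part of Cortex: the helper name `missing_attributes` is hypothetical, and it only checks that the required keys are present, not their types.

```python
import json

# Required attributes, taken from the table above.
REQUIRED_ATTRIBUTES = ("name", "version", "description", "dataTypeList", "command")


def missing_attributes(definition):
    """Return the required attributes absent from a parsed definition file.

    Hypothetical helper for illustration; Cortex does not ship it.
    """
    return [key for key in REQUIRED_ATTRIBUTES if key not in definition]


# A shortened version of the example definition shown above.
definition = json.loads("""
{
    "name": "HippoMore",
    "version": "1.0",
    "description": "Hippocampe detailed report",
    "dataTypeList": ["ip", "domain", "fqdn", "url"],
    "command": "Hippocampe/hippo.py"
}
""")

print(missing_attributes(definition))
```

An empty list means every required attribute is present; `baseConfig` and `config` are optional and are therefore not checked.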

## The Script
The analyzer script must be an executable script that Cortex runs using the `command` provided within the JSON definition file. The script can be written in any programming language, as long as it can be executed using a shell command.

When running the analyzer's script file, Cortex provides some input data through the standard input, and expects an output through the standard output.

### Analyzer Input
In Cortex, we distinguish between two types of observables:
- Value-based observables
- File-based observables

The input sent by Cortex to the analyzers depends on the observable type.

*Note*: when using Cortex with [TheHive](https://github.com/CERT-BDF/TheHive/), we use some output conventions that allow us to normalize the way TheHive displays the analysis reports.

#### Input for Value-based Observables
The input for value-based observables must have the following structure:

```json
{
    "dataType": "ip",
    "data": "8.8.8.8",
    "config": {}
}
```

The following table explains the JSON schema of the input:

| Attribute | Type | Description |
| ------------ | ------------- | ------------- |
| dataType | String `REQUIRED` | The observable's type |
| data | String `REQUIRED` | The observable's value |
| config* | Object `OPTIONAL` | A config object, representing the analyzer's options and parameters if any |


#### Input for File-based Observables
The input for file-based observables must have the following structure:
```json
{
    "dataType": "file",
    "attachment": {
        "name": "<file.extension>"
    },
    "file": "/path/to/file/observable",
    "config": {}
}
```

The following table explains the JSON schema of the input:

| Attribute | Type | Description |
| ------------ | ------------- | ------------- |
| dataType | String `REQUIRED` | The observable's type (`file` in this case) |
| file | String `REQUIRED` | The observable's file path |
| attachment.name | String `OPTIONAL` | The observable's file name |
| config* | Object `OPTIONAL` | A config object, representing the analyzer's options and parameters if any |
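
A script that does not use any helper library has to tell the two input shapes apart itself. The following is a minimal sketch of that step, assuming only the attributes documented in the two tables above; the function name `extract_observable` and the attachment name `sample.bin` are hypothetical.

```python
import json


def extract_observable(job):
    """Return (data_type, value) for a value-based or file-based input.

    Hypothetical helper: for file-based observables the "value" is the
    path given in the "file" attribute.
    """
    data_type = job["dataType"]
    if data_type == "file":
        # File-based observables carry a path in "file", not a "data" value.
        return data_type, job["file"]
    return data_type, job["data"]


value_job = json.loads('{"dataType": "ip", "data": "8.8.8.8", "config": {}}')
file_job = json.loads(
    '{"dataType": "file", "file": "/path/to/file/observable",'
    ' "attachment": {"name": "sample.bin"}, "config": {}}'
)

print(extract_observable(value_job))
print(extract_observable(file_job))
```

In a real analyzer script the job would be read from the standard input, for example with `json.load(sys.stdin)`, instead of being hard-coded.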

#### The Analyzer Configuration Object
An analyzer can define a default configuration object in its JSON definition file. Cortex can override these defaults or add configuration properties through its own configuration file.

Based on that, the `config` object passed to the analyzer's script results from merging three objects:
- the `config` object defined in the analyzer's JSON definition file (defines the default values of the analyzer's config)
- the `baseConfig` object defined in Cortex's configuration file using the `analyzer.config.<baseConfig>` property (used to hold sensitive properties like API keys or credentials)
- the *global* analyzers configuration defined in Cortex's configuration file using `analyzer.config.global` (generally contains proxy configuration information)
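
The merge of the three objects can be pictured with a few lines of Python. This document does not state which object wins when keys collide, nor whether nested objects are merged deeply, so the snippet below is only a sketch under explicit assumptions: a shallow merge, with the definition defaults applied first, then the global settings, then the `baseConfig` settings. The function name `merge_config` and all example values (proxy and service URLs) are hypothetical.

```python
def merge_config(definition_config, global_config, base_config):
    """Shallow-merge the three config objects.

    ASSUMPTION: later arguments override earlier ones. The actual
    precedence and merge depth used by Cortex are not documented here.
    """
    merged = {}
    for layer in (definition_config, global_config, base_config):
        merged.update(layer)
    return merged


# Defaults from the analyzer's JSON definition file (see the example above).
definition_config = {"check_tlp": False, "max_tlp": 3, "service": "more"}
# Hypothetical values standing in for analyzer.config.global.
global_config = {"proxy": {"http": "http://proxy.example:3128"}}
# Hypothetical values standing in for analyzer.config.Hippocampe.
base_config = {"url": "http://hippocampe.example:5000", "max_tlp": 2}

print(merge_config(definition_config, global_config, base_config))
```

Under these assumptions, `max_tlp` ends up as `2` because the `baseConfig` layer is applied last, while `check_tlp` and `service` keep their default values.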


### Analyzer Output
The output of an analyzer could technically be any JSON object. That said, Cortex's UI relies on a specific attribute to decide whether the job failed or succeeded. The property is named `success` and must be a Boolean value.

In the existing analyzers, we tried to stick to the conventions described below.

#### Successful Analysis
```json
{
    "success": true,
    "summary": {},
    "artifacts": [],
    "full": {}
}
```

The following table explains the JSON conventions of the output:

| Attribute | Type | Description |
| ------------ | ------------- | ------------- |
| success | Boolean `REQUIRED` | The analysis success flag |
| summary | Object `OPTIONAL` | The analysis summary: a small report |
| full | Object `REQUIRED` | The analysis complete report |
| artifacts | Array[`<Artifact>`] `OPTIONAL` | An array of artifacts discovered by the analyzer |

The `<Artifact>` object has the following structure:

| Attribute | Type | Description |
| ------------ | ------------- | ------------- |
| type | String `REQUIRED` | The artifact data type |
| value | String `REQUIRED` | The artifact value |

**Note**: the `artifacts` array will be used in the future by [TheHive](https://github.com/CERT-BDF/TheHive/) to display or import the extracted artifacts from an analysis report.

#### Unsuccessful Analysis
```json
{
    "success": false,
    "errorMessage": ""
}
```
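
Both output shapes are easy to produce by hand. The snippet below is a minimal sketch that follows the conventions above without relying on any library; the helper names (`success_output`, `error_output`, `analyze`) and the "analysis" itself, which merely reports the length of the observable's value, are hypothetical.

```python
import json


def success_output(full, summary=None, artifacts=None):
    """Build the output of a successful analysis."""
    return {
        "success": True,
        "summary": summary or {},
        "artifacts": artifacts or [],
        "full": full,
    }


def error_output(message):
    """Build the output of an unsuccessful analysis."""
    return {"success": False, "errorMessage": message}


def analyze(job):
    """Hypothetical analysis: report the length of a value-based observable."""
    if "dataType" not in job or "data" not in job:
        return error_output("Missing observable type or value")
    value = job["data"]
    full = {"value": value, "length": len(value)}
    # Each artifact follows the <Artifact> structure: a type and a value.
    artifacts = [{"type": job["dataType"], "value": value}]
    return success_output(full, summary={"length": len(value)}, artifacts=artifacts)


print(json.dumps(analyze({"dataType": "ip", "data": "8.8.8.8", "config": {}})))
print(json.dumps(analyze({"config": {}})))
```

A standalone script would read the job from the standard input and write the result to the standard output, for example with `json.dump(analyze(json.load(sys.stdin)), sys.stdout)`.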

# The Cortexutils Library
`cortexutils` is a Python library that can be installed with `pip`. It facilitates the creation of analyzer script files by providing an abstract `Analyzer` class that a programmer may inherit from and override in their script. It also provides some methods to quickly format the output so that it complies with the JSON schema expected by [TheHive](https://github.com/CERT-BDF/TheHive/).

To create an analyzer class, developers have to:

1. Create a subclass of `cortexutils.analyzer.Analyzer`
2. Override the constructor, call the super constructor and if needed, read the specific analyzer's options (read specific configuration properties from the config object)
3. Override the `run` method. It must either return a report, using the `report` method, or an error, using the `error` method. If `run` is not overridden, the analyzer returns an empty report
4. Optionally override the `summary` method. It should return a JSON object representing a summary of the analyzer report. If not overridden, the analyzer returns an empty summary
5. Optionally override the `artifacts` method. It should return a JSON array representing a list of `artifact` objects (as described above). If not overridden, the analyzer returns the result of an `ioc-parser` run over the full JSON report.

Below is an example of a basic analyzer that can handle IPs and domains:

```python
#!/usr/bin/env python
# encoding: utf-8

from cortexutils.analyzer import Analyzer


# Define analyzer's class
class BasicExampleAnalyzer(Analyzer):
    # Analyzer's constructor
    def __init__(self):
        # Call the constructor of the super class
        Analyzer.__init__(self)

        # Read specific config options
        self.optional_prop = self.getParam('config.optional_prop', '')
        self.required_prop = self.getParam('config.required_prop', None, 'Error: Missing required_prop')

    # Override the run method. This is the analyzer's entry point
    def run(self):
        # Put your analyzer's logic here
        result = {}

        # This is just an example
        if self.data_type == 'ip':
            result['findings'] = ['1.1.1.1', '2.2.2.2', '3.3.3.3']
        elif self.data_type == 'domain':
            result['findings'] = ['domain1.com', 'domain2.com', 'domain3.com']
        else:
            return self.error('Unsupported observable data type')

        # Return the report
        return self.report(result)

    # Override the summary method
    def summary(self, raw_report):
        return {
            'count': len(raw_report['findings'])
        }

    # Override the artifacts method
    def artifacts(self, raw_report):
        result = []
        if 'findings' in raw_report:
            for item in raw_report['findings']:
                result.append({'type': self.data_type, 'value': item})

        return result


# Invoke the analyzer
if __name__ == '__main__':
    BasicExampleAnalyzer().run()

```

To call this analyzer, we can run the following command:

```
python sample-analyzer.py <<< '{
"dataType":"ip",
"data": "8.8.8.8",
"config":{
"required_prop": "anyvalue"
}
}'
```
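
The same invocation can be scripted from Python, which is convenient for quick tests. The snippet below is a sketch using the standard `subprocess` module (Python 3.7 or later for `capture_output`); the function name `run_analyzer` is hypothetical, and a one-line stand-in analyzer is used so that the example runs without `cortexutils` installed.

```python
import json
import subprocess
import sys


def run_analyzer(command, job):
    """Send `job` as JSON on stdin and parse the JSON printed on stdout."""
    completed = subprocess.run(
        command,
        input=json.dumps(job),
        capture_output=True,
        text=True,
        check=False,
    )
    return json.loads(completed.stdout)


# Stand-in analyzer (hypothetical) so the sketch runs anywhere. Replace it
# with ["python", "sample-analyzer.py"] to call the script shown above.
echo_analyzer = [
    sys.executable,
    "-c",
    "import json, sys; job = json.load(sys.stdin); "
    "print(json.dumps({'success': True, 'full': {'seen': job['data']}}))",
]

job = {"dataType": "ip", "data": "8.8.8.8", "config": {"required_prop": "anyvalue"}}
print(run_analyzer(echo_analyzer, job))
```

This mirrors what Cortex does when it runs an analyzer: the input goes to the standard input and the report is read back from the standard output.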

This will generate the following output:

```json
{
    "success" : true,
    "artifacts" : [
        {
            "value" : "1.1.1.1",
            "type" : "ip"
        },
        {
            "value" : "2.2.2.2",
            "type" : "ip"
        },
        {
            "value" : "3.3.3.3",
            "type" : "ip"
        }
    ],
    "summary" : {
        "count" : 3
    },
    "full" : {
        "findings" : [
            "1.1.1.1",
            "2.2.2.2",
            "3.3.3.3"
        ]
    }
}
```

And in Cortex ![](../images/cortex-report.png)

# TheHive and Cortex analyzers
Using Cortex from an instance of [TheHive](https://github.com/CERT-BDF/TheHive/) gives users a better visualization of analysis reports. TheHive uses the outputs generated by Cortex analyzers in two ways:

- Store the `summary` content as part of the observable's data. This is available for successful analysis jobs only.
- Display the `full` report using the report templates defined within TheHive.

## Report templates
[TheHive](https://github.com/CERT-BDF/TheHive/) is based on Angular 1 and report templates have to be Angular templates which we try to fill using the job's report data.

We distinguish 2 types of report templates:

### Short reports
Short report templates generate what we call **mini reports**, which are displayed in the observable's details page and in the observables list. Short report templates receive the following data:

| Attribute | Type | Description |
| ------------ | ------------- | ------------- |
| name | String | Analyzer's ID |
| content | Object | The job report's `summary` object |
| artifact | Object | The observable details, as stored in TheHive |

For example, if we want to create a short report template for the `BasicExampleAnalyzer`, we could write the following HTML short report file:

```html
<span class="label label-info">Basic: {{content.count || 0}} record(s)</span>
```

`content` being the following:

```json
{
    "count" : 3
}
```

The result in TheHive will be ![](../images/short-report.png)

### Long reports
Long reports render the content of the `full` attribute of a job's JSON report, in the same way that short reports render the `summary` attribute.

Long report templates receive the following data:

| Attribute | Type | Description |
| ------------ | ------------- | ------------- |
| name | String | Analyzer's ID |
| status | String | The job's status: `Success`, `Failure`, `InProgress` |
| success | Boolean | The job's success status |
| content | Object | The job report's `full` object |
| artifact | Object | The observable details, as stored in TheHive |


For example, if we want to create a long report template for the `BasicExampleAnalyzer`, we could write the following HTML long report file:

```html
<!-- Success case -->
<div class="panel panel-info" ng-if="success">
    <div class="panel-heading">
        <strong>{{name}}</strong>
    </div>
    <div class="panel-body">
        <div>{{content.findings.length}} {{artifact.dataType | uppercase}}(s) found from {{artifact.data | fang}}</div>
        <ul>
            <li ng-repeat="finding in content.findings">{{finding}}</li>
        </ul>
    </div>
</div>

<!-- Failure case -->
<div class="panel panel-danger" ng-if="!success">
    <div class="panel-heading">
        <strong>{{artifact.data | fang}}</strong>
    </div>
    <div class="panel-body">
        {{content.errorMessage}}
    </div>
</div>
```

`content` being the following:

```json
{
    "findings" : [
        "1.1.1.1",
        "2.2.2.2",
        "3.3.3.3"
    ]
}
```

The result in TheHive will be ![](../images/long-report.png)