Add index curation (close/delete indexes based on age or size)
Merge branch 'topic/curator'
mmguero committed Jun 25, 2019
2 parents bbeb3f9 + 249ddd6 commit e697374
Showing 12 changed files with 537 additions and 59 deletions.
61 changes: 61 additions & 0 deletions Dockerfiles/curator.Dockerfile
@@ -0,0 +1,61 @@
FROM debian:buster-slim

# Copyright (c) 2019 Battelle Energy Alliance, LLC. All rights reserved.
LABEL maintainer="[email protected]"

ARG ES_HOST=elasticsearch
ARG ES_PORT=9200
ARG CURATOR_TIMEOUT=120
ARG CURATOR_MASTER_ONLY=False
ARG CURATOR_LOGLEVEL=INFO
ARG CURATOR_LOGFORMAT=default

ARG CURATOR_CLOSE_UNITS=years
ARG CURATOR_CLOSE_COUNT=10
ARG CURATOR_DELETE_UNITS=years
ARG CURATOR_DELETE_COUNT=99
ARG CURATOR_DELETE_GIGS=1000000

ENV ES_HOST $ES_HOST
ENV ES_PORT $ES_PORT
ENV CURATOR_TIMEOUT $CURATOR_TIMEOUT
ENV CURATOR_MASTER_ONLY $CURATOR_MASTER_ONLY
ENV CURATOR_LOGLEVEL $CURATOR_LOGLEVEL
ENV CURATOR_LOGFORMAT $CURATOR_LOGFORMAT

ENV CURATOR_CLOSE_UNITS $CURATOR_CLOSE_UNITS
ENV CURATOR_CLOSE_COUNT $CURATOR_CLOSE_COUNT
ENV CURATOR_DELETE_UNITS $CURATOR_DELETE_UNITS
ENV CURATOR_DELETE_COUNT $CURATOR_DELETE_COUNT
ENV CURATOR_DELETE_GIGS $CURATOR_DELETE_GIGS

ENV DEBIAN_FRONTEND noninteractive
ENV CURATOR_VERSION "5.7.6"
ENV CRON "15 */6 * * *"
ENV CONFIG_FILE "/config/config_file.yml"
ENV ACTION_FILE "/config/action_file.yml"
ENV CURATOR_USER "curator"

RUN sed -i "s/buster main/buster main contrib non-free/g" /etc/apt/sources.list && \
    apt-get update && \
    apt-get -y -q install \
      build-essential \
      cron \
      procps \
      psmisc \
      python3 \
      python3-dev \
      python3-pip && \
    pip3 install elasticsearch-curator==${CURATOR_VERSION} && \
    groupadd --gid 1000 ${CURATOR_USER} && \
    useradd -M --uid 1000 --gid 1000 ${CURATOR_USER} && \
    apt-get -q -y --purge remove python3-dev build-essential && \
    apt-get -q -y autoremove && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* && \
    bash -c 'echo "${CRON} su -c \"/usr/local/bin/curator --config ${CONFIG_FILE} ${ACTION_FILE}\" ${CURATOR_USER} >/proc/1/fd/1 2>/proc/1/fd/2" | crontab -'

ADD curator/scripts /usr/local/bin/
ADD curator/config /config/

CMD ["/usr/local/bin/cron_env_deb.sh"]
2 changes: 1 addition & 1 deletion Dockerfiles/file-monitor.Dockerfile
@@ -1,4 +1,4 @@
FROM debian:buster-slim AS build
FROM debian:buster-slim

# Copyright (c) 2019 Battelle Energy Alliance, LLC. All rights reserved.
LABEL maintainer="[email protected]"
112 changes: 87 additions & 25 deletions README.md
@@ -67,6 +67,7 @@ In short, Malcolm provides an easily deployable network analysis tool suite for
+ [IP/MAC address to hostname mapping via `host-map.txt`](#HostNaming)
+ [CIDR subnet to network segment name mapping via `cidr-map.txt`](#SegmentNaming)
+ [Applying mapping changes](#ApplyMapping)
- [Elasticsearch index curation](#Curator)
* [Known issues](#Issues)
* [Installation example using Ubuntu 18.04 LTS](#InstallationExample)
* [Copyright](#Footer)
@@ -93,6 +94,7 @@ $ docker-compose pull
Pulling elasticsearch ... done
Pulling kibana ... done
Pulling elastalert ... done
Pulling curator ... done
Pulling logstash ... done
Pulling filebeat ... done
Pulling moloch ... done
@@ -106,19 +108,20 @@ You can then observe that the images have been retrieved by running `docker imag
```
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
malcolmnetsec/nginx-proxy 1.2.1 xxxxxxxxxxxx 16 hours ago 53MB
malcolmnetsec/file-upload 1.2.1 xxxxxxxxxxxx 16 hours ago 214MB
malcolmnetsec/pcap-capture 1.2.1 xxxxxxxxxxxx 17 hours ago 111MB
malcolmnetsec/file-monitor 1.2.1 xxxxxxxxxxxx 17 hours ago 353MB
malcolmnetsec/moloch 1.2.1 xxxxxxxxxxxx 17 hours ago 1.04GB
malcolmnetsec/filebeat-oss 1.2.1 xxxxxxxxxxxx 17 hours ago 454MB
malcolmnetsec/logstash-oss 1.2.1 xxxxxxxxxxxx 17 hours ago 1.14GB
malcolmnetsec/elastalert 1.2.1 xxxxxxxxxxxx 17 hours ago 268MB
malcolmnetsec/kibana-oss 1.2.1 xxxxxxxxxxxx 17 hours ago 850MB
malcolmnetsec/nginx-proxy 1.2.2 xxxxxxxxxxxx 16 hours ago 53MB
malcolmnetsec/file-upload 1.2.2 xxxxxxxxxxxx 16 hours ago 214MB
malcolmnetsec/pcap-capture 1.2.2 xxxxxxxxxxxx 17 hours ago 111MB
malcolmnetsec/file-monitor 1.2.2 xxxxxxxxxxxx 17 hours ago 353MB
malcolmnetsec/moloch 1.2.2 xxxxxxxxxxxx 17 hours ago 1.04GB
malcolmnetsec/filebeat-oss 1.2.2 xxxxxxxxxxxx 17 hours ago 454MB
malcolmnetsec/curator 1.2.2 xxxxxxxxxxxx 17 hours ago 303MB
malcolmnetsec/logstash-oss 1.2.2 xxxxxxxxxxxx 17 hours ago 1.14GB
malcolmnetsec/elastalert 1.2.2 xxxxxxxxxxxx 17 hours ago 268MB
malcolmnetsec/kibana-oss 1.2.2 xxxxxxxxxxxx 17 hours ago 850MB
docker.elastic.co/elasticsearch/elasticsearch-oss 6.8.0 xxxxxxxxxxxx 3 weeks ago 765MB
```

You will still probably want to make sure your system configuration and `docker-compose.yml` settings are tuned by running `./scripts/install.py` or `./scripts/install.py --configure` (see [System configuration and tuning](#ConfigAndTuning)).
You must run [`auth_setup.sh`](#AuthSetup) prior to running `docker-compose pull`. You should also ensure your system configuration and `docker-compose.yml` settings are tuned by running `./scripts/install.py` or `./scripts/install.py --configure` (see [System configuration and tuning](#ConfigAndTuning)).

#### Import from pre-packaged tarballs

@@ -172,6 +175,7 @@ Malcolm leverages the following excellent open source tools, among others.

Checking out the [Malcolm source code](https://github.com/idaholab/malcolm) results in the following subdirectories in your `malcolm/` working copy:

* `curator` - code and configuration for the `curator` container which define rules for closing and/or deleting old Elasticsearch indices
* `Dockerfiles` - a directory containing build instructions for Malcolm's docker images
* `docs` - a directory containing instructions and documentation
* `elastalert` - code and configuration for the `elastalert` container which provides an alerting framework for Elasticsearch
@@ -211,6 +215,7 @@ $ ./scripts/build.sh

Then, go take a walk or something since it will be a while. When you're done, you can run `docker images` and see you have fresh images for:

* `malcolmnetsec/curator` (based on `debian:buster-slim`)
* `malcolmnetsec/elastalert` (based on `bitsensor/elastalert`)
* `malcolmnetsec/filebeat-oss` (based on `docker.elastic.co/beats/filebeat-oss`)
* `malcolmnetsec/file-monitor` (based on `debian:buster-slim`)
@@ -340,6 +345,12 @@ Various other environment variables inside of `docker-compose.yml` can be tweake

* `KIBANA_OFFLINE_REGION_MAPS` – if set to `true`, a small internal server will be surfaced to Kibana to provide the ability to view region map visualizations even when an Internet connection is not available (default `true`)

* `CURATOR_CLOSE_COUNT` and `CURATOR_CLOSE_UNITS` - determine behavior for automatically closing older Elasticsearch indices to conserve memory; see [Elasticsearch index curation](#Curator)

* `CURATOR_DELETE_COUNT` and `CURATOR_DELETE_UNITS` - determine behavior for automatically deleting older Elasticsearch indices to reduce disk usage; see [Elasticsearch index curation](#Curator)

* `CURATOR_DELETE_GIGS` - if the Elasticsearch indices representing the log data exceed this size, in gigabytes, older indices will be deleted to bring the total size back under this threshold; see [Elasticsearch index curation](#Curator)

* `AUTO_TAG` – if set to `true`, Malcolm will automatically create Moloch sessions and Zeek logs with tags based on the filename, as described in [Tagging](#Tagging) (default `true`)

* `BEATS_SSL` – if set to `true`, Logstash will require encrypted communications for any external Beats-based forwarders from which it will accept logs; if Malcolm is being used as a standalone tool then this can safely be set to `false`, but if external log feeds are to be accepted then setting it to true is recommended (default `false`)
@@ -982,7 +993,33 @@ If both `zeek.orig_segment` and `zeek.resp_segment` are added to a log, and if t
#### <a name="ApplyMapping"></a>Applying mapping changes
When changes are made to either `cidr-map.txt` or `host-map.txt`, Malcolm's Logstash container must be restarted. The easiest way to do this is to restart malcolm via `restart.sh` (see [Stopping and restarting Malcolm](#StopAndRestart)).

## <a name="Issues"></a>Known Issues
## <a name="Curator"></a>Elasticsearch index curation

Malcolm uses [Elasticsearch Curator](https://www.elastic.co/guide/en/elasticsearch/client/curator/current/about.html) to periodically examine indices representing the log data and perform actions on indices meeting criteria for age or disk usage. The environment variables prefixed with `CURATOR_` in the [`docker-compose.yml`](#DockerComposeYml) file determine the criteria for the following actions:

* [close](https://www.elastic.co/guide/en/elasticsearch/client/curator/current/close.html) indices [older than a specified age](https://www.elastic.co/guide/en/elasticsearch/client/curator/current/filtertype_age.html) in order to reduce RAM utilization
* [delete](https://www.elastic.co/guide/en/elasticsearch/client/curator/current/delete_indices.html) indices [older than a specified age](https://www.elastic.co/guide/en/elasticsearch/client/curator/current/filtertype_age.html) in order to reduce disk usage
* [delete](https://www.elastic.co/guide/en/elasticsearch/client/curator/current/delete_indices.html) the oldest indices in order to keep the total [database size under a specified threshold](https://www.elastic.co/guide/en/elasticsearch/client/curator/current/filtertype_space.html)
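
The selection rules listed above can be sketched in a few lines of Python. This is only an illustration of the idea, not Curator's actual implementation: the helper names (`older_than`, `over_size`), the sample index names and sizes are hypothetical, and the unit lengths (30-day months, 365-day years) reflect my understanding of how Curator approximates them.

```python
from datetime import datetime, timedelta

PREFIX = "sessions2-"  # index prefix used in Malcolm's action_file.yml

# Approximate unit lengths in seconds (assumption: 30-day months, 365-day years).
UNIT_SECONDS = {
    "seconds": 1, "minutes": 60, "hours": 3600, "days": 86400,
    "weeks": 7 * 86400, "months": 30 * 86400, "years": 365 * 86400,
}

def index_date(name, timestring="%y%m%d"):
    """Parse the date embedded in an index name such as sessions2-190625."""
    return datetime.strptime(name[len(PREFIX):], timestring)

def older_than(names, unit, unit_count, now):
    """Age rule: select prefixed indices whose name-derived date precedes the cutoff."""
    cutoff = now - timedelta(seconds=UNIT_SECONDS[unit] * unit_count)
    return [n for n in names if n.startswith(PREFIX) and index_date(n) < cutoff]

def over_size(sizes_gb, limit_gb):
    """Space rule: walk indices newest first, summing their sizes; every index
    reached once the running total exceeds limit_gb is selected (oldest first)."""
    total = 0.0
    selected = []
    for name in sorted(sizes_gb, key=index_date, reverse=True):
        total += sizes_gb[name]
        if total > limit_gb:
            selected.append(name)
    return sorted(selected, key=index_date)

# Hypothetical sample data
sizes = {"sessions2-190101": 40, "sessions2-190601": 50, "sessions2-190620": 30}
print(older_than(list(sizes), "days", 90, datetime(2019, 6, 25)))
print(over_size(sizes, 100))
```

In this sketch the age rule corresponds to `CURATOR_CLOSE_COUNT`/`CURATOR_CLOSE_UNITS` and `CURATOR_DELETE_COUNT`/`CURATOR_DELETE_UNITS`, and the space rule to `CURATOR_DELETE_GIGS`.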

This behavior can also be modified by running [`./scripts/install.py --configure`](#ConfigAndTuning).

Future development of Malcolm may include additional actions, such as creating index [snapshots](https://www.elastic.co/guide/en/elasticsearch/client/curator/current/snapshot.html).

Other custom [filters](https://www.elastic.co/guide/en/elasticsearch/client/curator/current/filters.html) and [actions](https://www.elastic.co/guide/en/elasticsearch/client/curator/current/actions.html) may be defined by the user by manually modifying the `action_file.yml` file used by the `curator` container and ensuring that it is mounted into the container as a volume in the `curator:` section of your `docker-compose.yml` file:

```
  curator:
    volumes:
      - ./curator/config/action_file.yml:/config/action_file.yml
```

The settings governing index curation can affect Malcolm's performance in both log ingestion and queries, and there are caveats that should be taken into consideration when configuring this feature. Please read the Elasticsearch documentation linked in this section with regard to index curation.

Index curation only deals with disk space consumed by Elasticsearch indices: it does not have anything to do with PCAP file storage. The `MANAGE_PCAP_FILES` environment variable in the [`docker-compose.yml`](#DockerComposeYml) file can be used to allow Moloch to prune old PCAP files based on available disk space.

## <a name="Issues"></a>Known issues

### PCAP file export error when Zeek logs are in Moloch search results

@@ -1098,7 +1135,31 @@ user@host:~/Malcolm$ python3 scripts/install.py --configure

Now that any necessary system configuration changes have been made, the local Malcolm instance will be configured:
```
Setting 10g for ElasticSearch and 3g for Logstash. Is this OK? (Y/n): y
Setting 10g for Elasticsearch and 3g for Logstash. Is this OK? (Y/n): y
Periodically close old Elasticsearch indices? (Y/n): y
Indices older than 5 years will be periodically closed. Is this OK? (Y/n): n
Enter index close threshold (eg., 90 days, 2 years, etc.): 1 years
Indices older than 1 years will be periodically closed. Is this OK? (Y/n): y
Periodically delete old Elasticsearch indices? (Y/n): y
Indices older than 10 years will be periodically deleted. Is this OK? (Y/n): n
Enter index delete threshold (eg., 90 days, 2 years, etc.): 5 years
Indices older than 5 years will be periodically deleted. Is this OK? (Y/n): y
Periodically delete the oldest Elasticsearch indices when the database exceeds a certain size? (Y/n): y
Indices will be deleted when the database exceeds 10000 gigabytes. Is this OK? (Y/n): n
Enter index threshold in gigabytes: 100
Indices will be deleted when the database exceeds 100 gigabytes. Is this OK? (Y/n): y
Automatically analyze all PCAP files with Zeek? (y/N): y
@@ -1155,6 +1216,7 @@ user@host:~/Malcolm$ docker-compose pull
Pulling elasticsearch ... done
Pulling kibana ... done
Pulling elastalert ... done
Pulling curator ... done
Pulling logstash ... done
Pulling filebeat ... done
Pulling moloch ... done
@@ -1163,18 +1225,18 @@ Pulling pcap-capture ... done
Pulling upload ... done
Pulling nginx-proxy ... done
user@host:~/Malcolm$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
malcolmnetsec/nginx-proxy 1.2.1 xxxxxxxxxxxx 16 hours ago 53MB
malcolmnetsec/file-upload 1.2.1 xxxxxxxxxxxx 16 hours ago 214MB
malcolmnetsec/pcap-capture 1.2.1 xxxxxxxxxxxx 17 hours ago 111MB
malcolmnetsec/file-monitor 1.2.1 xxxxxxxxxxxx 17 hours ago 353MB
malcolmnetsec/moloch 1.2.1 xxxxxxxxxxxx 17 hours ago 1.04GB
malcolmnetsec/filebeat-oss 1.2.1 xxxxxxxxxxxx 17 hours ago 454MB
malcolmnetsec/logstash-oss 1.2.1 xxxxxxxxxxxx 17 hours ago 1.14GB
malcolmnetsec/elastalert 1.2.1 xxxxxxxxxxxx 17 hours ago 268MB
malcolmnetsec/kibana-oss 1.2.1 xxxxxxxxxxxx 17 hours ago 850MB
malcolmnetsec/nginx-proxy 1.2.2 xxxxxxxxxxxx 16 hours ago 53MB
malcolmnetsec/file-upload 1.2.2 xxxxxxxxxxxx 16 hours ago 214MB
malcolmnetsec/pcap-capture 1.2.2 xxxxxxxxxxxx 17 hours ago 111MB
malcolmnetsec/file-monitor 1.2.2 xxxxxxxxxxxx 17 hours ago 353MB
malcolmnetsec/curator 1.2.2 xxxxxxxxxxxx 17 hours ago 303MB
malcolmnetsec/moloch 1.2.2 xxxxxxxxxxxx 17 hours ago 1.04GB
malcolmnetsec/filebeat-oss 1.2.2 xxxxxxxxxxxx 17 hours ago 454MB
malcolmnetsec/logstash-oss 1.2.2 xxxxxxxxxxxx 17 hours ago 1.14GB
malcolmnetsec/elastalert 1.2.2 xxxxxxxxxxxx 17 hours ago 268MB
malcolmnetsec/kibana-oss 1.2.2 xxxxxxxxxxxx 17 hours ago 850MB
docker.elastic.co/elasticsearch/elasticsearch-oss 6.8.0 xxxxxxxxxxxx 3 weeks ago 765MB
```

@@ -1218,16 +1280,16 @@ malcolm_upload_1 /docker-entrypoint.sh /usr ... Up
Attaching to malcolm_nginx-proxy_1, malcolm_filebeat_1, malcolm_upload_1, malcolm_kibana_1, malcolm_logstash_1, malcolm_elastalert_1, malcolm_moloch_1, malcolm_elasticsearch_1, malcolm_file-monitor_1$
malcolm_pcap-capture_1
...
```

It will take several minutes for all of Malcolm's components to start up. Logstash will take the longest, probably 5 to 10 minutes. You'll know Logstash is fully ready when you see Logstash spit out a bunch of starting up messages, ending with this:
```
...
logstash_1 | [2019-06-11T15:45:41,938][INFO ][logstash.pipeline ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x7a5910 sleep>"}
logstash_1 | [2019-06-11T15:45:42,009][INFO ][logstash.agent ] Pipelines running {:count=>3, :running_pipelines=>[:input, :main, :output], :non_running_pipelines=>[]}
logstash_1 | [2019-06-11T15:45:42,599][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
...
```

You can now open a web browser and navigate to one of the [Malcolm user interfaces](#UserInterfaceURLs).
72 changes: 72 additions & 0 deletions curator/config/action_file.yml
@@ -0,0 +1,72 @@
actions:
  1:
    action: delete_indices
    description: >-
      Delete the oldest indices with sessions2- prefixes when their cumulative
      disk consumption is greater than ${CURATOR_DELETE_GIGS:1000000} gigabytes. Ignore
      the error if the filter does not result in an actionable list of indices
      (ignore_empty_list) and exit cleanly.
    options:
      ignore_empty_list: True
      timeout_override:
      continue_if_exception: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: sessions2-
      exclude:
    - filtertype: space
      disk_space: ${CURATOR_DELETE_GIGS:1000000}
      use_age: True
      source: name
      timestring: '%y%m%d'
      exclude:
  2:
    action: delete_indices
    description: >-
      Delete indices older than ${CURATOR_DELETE_COUNT:99} ${CURATOR_DELETE_UNITS:years}
      (based on index name), for sessions2-prefixed indices. Ignore the error if the
      filter does not result in an actionable list of indices (ignore_empty_list) and
      exit cleanly.
    options:
      ignore_empty_list: True
      timeout_override:
      continue_if_exception: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: sessions2-
      exclude:
    - filtertype: age
      source: name
      direction: older
      timestring: '%y%m%d'
      unit: ${CURATOR_DELETE_UNITS:years}
      unit_count: ${CURATOR_DELETE_COUNT:99}
      exclude:
  3:
    action: close
    description: >-
      Close indices older than ${CURATOR_CLOSE_COUNT:10} ${CURATOR_CLOSE_UNITS:years}
      (based on index name), for sessions2-prefixed indices, to conserve memory.
      Ignore the error if the filter does not result in an actionable list of indices
      (ignore_empty_list) and exit cleanly.
    options:
      ignore_empty_list: True
      timeout_override:
      continue_if_exception: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: sessions2-
      exclude:
    - filtertype: age
      source: name
      direction: older
      timestring: '%y%m%d'
      unit: ${CURATOR_CLOSE_UNITS:years}
      unit_count: ${CURATOR_CLOSE_COUNT:10}
      exclude:
18 changes: 18 additions & 0 deletions curator/config/config_file.yml
@@ -0,0 +1,18 @@
client:
  hosts:
    - ${ES_HOST:elasticsearch}
  port: ${ES_PORT:9200}
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: True
  http_auth:
  timeout: ${CURATOR_TIMEOUT:120}
  master_only: ${CURATOR_MASTER_ONLY:False}
logging:
  loglevel: ${CURATOR_LOGLEVEL:INFO}
  logfile:
  logformat: ${CURATOR_LOGFORMAT:default}
  #blacklist: ['elasticsearch', 'urllib3']
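
Both Curator YAML files in this commit rely on `${VAR:default}` placeholders, which Curator fills from the container's environment (the `ENV` values set in `curator.Dockerfile`, or overrides from `docker-compose.yml`). The following Python sketch mimics that substitution to show how a value such as `${CURATOR_CLOSE_COUNT:10}` resolves. The `expand` helper and its regular expression are hypothetical and written for illustration only; returning an empty string for an unset variable with no default is an assumption of this sketch, not a statement about Curator's own behavior.

```python
import os
import re

# Matches ${NAME} or ${NAME:default}
_PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")

def expand(text, env=None):
    """Replace ${NAME:default} placeholders using env (defaults to os.environ)."""
    env = os.environ if env is None else env

    def repl(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]          # the environment wins over the default
        return default if default is not None else ""  # assumption: empty if unset

    return _PLACEHOLDER.sub(repl, text)

# With no override, the default after the colon is used
print(expand("unit_count: ${CURATOR_CLOSE_COUNT:10}", {}))
# With an override, the environment value replaces it
print(expand("unit_count: ${CURATOR_CLOSE_COUNT:10}", {"CURATOR_CLOSE_COUNT": "1"}))
```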