
Requirement: open distro cluster #474

Closed
simon-verbois opened this issue Apr 21, 2021 · 1 comment

Comments

@simon-verbois

Hey everyone :)

I am preparing a test infrastructure with 4 VMs.

1 Wazuh manager, with Kibana and Filebeat installed on it (4 CPU, 8 GB).
1 master node (no data role)
2 data nodes (3 shards per index on each)
I don't know what resources to allocate to the Open Distro nodes; the requirements only give figures for a whole stack on a single node (i.e. 8 CPU, 16 GB).

Do you have an idea of the resources needed for a production run? Wazuh would have at most 2,000 agents, and data would be retained for 1 year on the data nodes.

Same question for storage: how do I estimate the necessary sizes? My sources are mainly Windows servers and WatchGuard firewalls.

Bye bye

@aetter
Contributor

aetter commented Apr 21, 2021

Hi @simon-verbois, hardware requirements vary wildly by workload, so it's tough to give any concrete recommendations. Some general stuff:

  • There's no substitute for testing, and it's generally easier to scale down than to scale up. A too-small cluster might just spike to 100% resource usage and crash, whereas a too-large cluster might sit at 40% most of the time, giving you some idea of how far you can scale down.
  • If you think you have a heavy workload, start testing at 1 CPU core per 50 GB shard. For a lighter workload, you can try something more like 1 core for every 2-4 shards.
  • Dedicated master nodes generally benefit from a larger ratio of CPU to RAM. Data nodes need both.
  • For storage, look at the amount of data generated during a representative time period (an hour, a day, etc.) and then multiply. Longer time periods are obviously more accurate when you multiply them out. I wouldn't run data nodes much higher than 75% full, and of course, if you have replica shards, you have to factor those into your calculations. So if you generate 100 MiB of data per day with 1 replica, that's 200 MiB per day, which is around 71 GiB per year (see the sketch after this list).

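To make that last bullet's arithmetic concrete, here is a minimal sketch. All figures are assumptions taken from the example and the setup above (100 MiB/day of primary data, 1 replica, 1-year retention, a 75% fill ceiling, 2 data nodes), so substitute your own measurements:

```python
# Rough storage estimate for a time-series index with replicas and retention.
# All inputs below are assumptions for illustration, not recommendations.

MIB_PER_GIB = 1024

daily_ingest_mib = 100   # measured primary data per day (assumption)
replicas = 1             # replica shards multiply the footprint
retention_days = 365     # 1-year retention
max_fill = 0.75          # don't run data nodes much above 75% full
data_nodes = 2           # data nodes sharing the load (assumption)

total_per_day_mib = daily_ingest_mib * (1 + replicas)                   # 200 MiB/day
retained_gib = total_per_day_mib * retention_days / MIB_PER_GIB         # ~71.3 GiB/year
disk_needed_gib = retained_gib / max_fill                               # headroom included
per_node_gib = disk_needed_gib / data_nodes

print(f"Retained data over {retention_days} days: {retained_gib:.1f} GiB")
print(f"Disk to provision (75% ceiling): {disk_needed_gib:.1f} GiB total, "
      f"{per_node_gib:.1f} GiB per data node")
```

With these numbers that works out to roughly 71 GiB retained, about 95 GiB of provisioned disk in total, or around 48 GiB per data node.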
aetter closed this as completed Apr 21, 2021