Merge pull request #979 from dctootall/master
Feature(#944): Create and update Glossary
ashnwade authored Apr 30, 2024
2 parents 3148ceb + 6b4204d commit aa76fb6
Showing 8 changed files with 191 additions and 4 deletions.
1 change: 1 addition & 0 deletions architecture/architecture.md
@@ -80,6 +80,7 @@ Being a well thought out and secure enterprise the engineers and I.T. people hav

This slightly more complicated setup shows how an enterprise can push logs of all shapes and sizes into Gravwell and achieve much greater visibility into the total enterprise. Pulling in wildly disparate log sources allows staff to access and search large amounts of data from a single place, reducing the time required to diagnose and fix problems. IT operations can debug webserver problems by correlating sflow logs from the core switch with firewall logs and webserver access records to identify why a user can’t get to a page. Security operations groups can track thumbdrives moving between domain boxes, or correlate port activity with logins to identify employees attaching unauthorized equipment. Gravwell will monitor and correlate any and all types of data sources; we even have a video ingester which can feed raw video into indexers. Have you ever wanted to use facial recognition to correlate badge reader swipes with the number of faces at a door? Gravwell can do that.

(indexer_storage_topology)=
## Indexer Storage Topology

Gravwell embraces the theme of concurrency throughout the entire stack, including storage. An indexer is not a single storage system with a single storage array; it is a storage orchestrator. A single indexer can contain up to 65 thousand wells, each of which can contain many storage arrays. Wells do not have to be uniform: Gravwell architects can estimate throughput requirements for various data sources and allocate storage resources accordingly. Each storage array operates concurrently, feeding the search pipeline and consuming from ingesters asynchronously. The asynchronous and distributed nature of storage means that throughput can be ramped up by striping wells across multiple storage arrays. As an example, if a Gravwell instance is feeding from syslog, sflow, and raw PCAP ingesters, a system designer might allocate three wells. The default well, capturing syslog and some basic Windows logging, could point to a single large spinning-disk array where throughput is not critical and data is kept long term. The sflow well may have slightly higher throughput requirements but lower retention requirements, so we point it at a single moderately sized SSD. The well dedicated to PCAP, however, must be extremely fast; spinning disks or a single SSD just won’t do. Its very high throughput ingest and search requirements might drive us to take advantage of multiple storage arrays.
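The three-well layout above might be sketched in `gravwell.conf` along these lines. This is a minimal illustration, not a verified configuration: the section and parameter names are assumptions that should be checked against the indexer configuration reference, and the mount paths are hypothetical.

```
# Hypothetical multi-well layout; section names, parameters, and
# paths are illustrative only.
[Default-Well]
	Location=/mnt/spinning-array/default    # syslog + basic Windows logs, long retention

[Storage-Well "sflow"]
	Location=/mnt/ssd0/sflow                # single moderately sized SSD
	Tags=sflow

[Storage-Well "pcap"]
	Location=/mnt/nvme-stripe/pcap          # striped across multiple fast arrays
	Tags=pcap
```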
2 changes: 2 additions & 0 deletions configuration/configuration.md
@@ -33,6 +33,7 @@ The most important items in the configuration file are the `Ingest-Auth`, `Contr
In clustered Gravwell installations, it is essential that all nodes are configured with the same `Ingest-Auth` and `Control-Auth` values to enable proper intercommunication.
```
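As a sketch, the shared values might look like this in each node's `gravwell.conf`; the token values below are placeholders, not real secrets.

```
[global]
# These two tokens must be identical on every indexer and webserver
# in the cluster; the values here are placeholders.
Ingest-Auth=ReplaceWithIngestSecret
Control-Auth=ReplaceWithControlSecret
```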

(configuration_webserver)=
## Webserver Configuration

The webserver acts as the focal point for all searches and provides an interactive interface into Gravwell. While the webserver does not require significant storage, it can benefit from a small pool of very fast storage so that even when a search hands back large amounts of data, users can quickly navigate their results. The webserver also participates in the search pipeline and often performs some of the filtering, metadata extraction, and rendering of data. When deploying a webserver, we recommend a reasonably sized solid-state disk (NVMe if possible), a memory pool of 16GB of RAM or more, and at least 4 physical cores. Gravwell is built to be extremely concurrent, so more CPU cores and additional memory can yield significant performance benefits. An Intel E5 or AMD Epyc chip with 32GB of memory or more is a good choice, and more is always better.
@@ -65,6 +66,7 @@ By default, Gravwell does not generate TLS certificates. For instructions on set
* Firewalls blocking access to indexer or webserver ports
  * The default is 9404

(configuration_indexer)=
## Indexer Configuration

Indexers are the storage centers of Gravwell and are responsible for storing, retrieving, and processing data. Indexers perform the first heavy lifting when executing a query, first finding the data and then pushing it into the search pipeline. The search pipeline will perform as much work as possible in parallel on the indexers for efficiency. Indexers benefit from high-speed, low-latency storage and as much RAM as possible. Gravwell can take advantage of file system caches, which means that when you run multiple queries over the same data it often won’t have to go to disk at all. We have seen Gravwell operate at over 5GB/s per node on well-cached data. The more memory, the more data can be cached. When searching over large pools that exceed the memory capacity of even the largest machines, high-speed RAID arrays can help increase throughput.
1 change: 1 addition & 0 deletions distributed/frontend.md
@@ -8,6 +8,7 @@ Once configured, distributed webservers will synchronize resources, users, dashb
The datastore is a single point of failure for your distributed webserver system. If the datastore goes down, your webservers will continue to function in a degraded state, but it is *critical* that you restore it as soon as possible. Refer to the Disaster Recovery section below for more information, and be sure to take [frequent backups](/admin/backuprestore) for safety.
```

(datastore_server)=
## The datastore server

Gravwell uses a separate server process called the datastore to keep webservers in sync. It must run on its own machine; it cannot share a server with a Gravwell webserver or indexer. Fetch the datastore installer from [the downloads page](/quickstart/downloads), then run it on the machine which will contain the datastore.
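Once the datastore is installed, each webserver is pointed at it from its own `gravwell.conf`. As a sketch (the parameter name should be verified against the distributed webserver documentation, and the hostname and port here are placeholders):

```
[global]
# Hypothetical datastore setting; host and port are placeholders
# and must match wherever the datastore process is listening.
Datastore=datastore.example.com:9405
```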