From e7db2e78afac21238432c308e9310274b800dac2 Mon Sep 17 00:00:00 2001 From: Dwight Hodge Date: Sat, 3 Jun 2023 12:43:35 -0400 Subject: [PATCH 1/5] Remove Node agent from 2.16 --- .../docdb-replication/replication.md | 18 +-- .../docdb-replication/replication.md | 28 ++-- .../set-up-cloud-provider/on-premises.md | 145 +----------------- 3 files changed, 30 insertions(+), 161 deletions(-) diff --git a/docs/content/preview/architecture/docdb-replication/replication.md b/docs/content/preview/architecture/docdb-replication/replication.md index c0dd60035719..8b7641b487e6 100644 --- a/docs/content/preview/architecture/docdb-replication/replication.md +++ b/docs/content/preview/architecture/docdb-replication/replication.md @@ -42,13 +42,13 @@ YugabyteDB replicates data across nodes (or fault domains) in order to tolerate Replication of data in DocDB is achieved at the level of tablets, using tablet peers, with each table sharded into a set of tablets, as demonstrated in the following diagram: - +![Tablets in a table](/images/architecture/replication/tablets_in_a_docsb_table.png) Each tablet comprises of a set of tablet peers, each of which stores one copy of the data belonging to the tablet. There are as many tablet peers for a tablet as the replication factor, and they form a Raft group. The tablet peers are hosted on different nodes to allow data redundancy to protect against node failures. The replication of data between the tablet peers is strongly consistent. The following diagram depicts three tablet peers that belong to a tablet called `tablet 1`. The tablet peers are hosted on different YB-TServers and form a Raft group for leader election, failure detection, and replication of the write-ahead logs. -![raft_replication](/images/architecture/raft_replication.png) +![RAFT Replication](/images/architecture/raft_replication.png) ### Raft replication @@ -56,7 +56,7 @@ As soon as a tablet initiates, it elects one of the tablet peers as the tablet l The set of DocDB updates depends on the user-issued write, and involves locking a set of keys to establish a strict update order, and optionally reading the older value to modify and update in case of a read-modify-write operation. The Raft log is used to ensure that the database state-machine of a tablet is replicated amongst the tablet peers with strict ordering and correctness guarantees even in the face of failures or membership changes. This is essential to achieving strong consistency. -Once the Raft log is replicated to a majority of tablet-peers and successfully persisted on the majority, the write is applied into the DocDB document storage layer and is subsequently available for reads. Once the write is persisted on disk by the document storage layer, the write entries can be purged from the Raft log. This is performed as a controlled background operation without any impact to the foreground operations. +After the Raft log is replicated to a majority of tablet-peers and successfully persisted on the majority, the write is applied into the DocDB document storage layer and is subsequently available for reads. After the write is persisted on disk by the document storage layer, the write entries can be purged from the Raft log. This is performed as a controlled background operation without any impact to the foreground operations. ## Replication in a cluster @@ -64,29 +64,29 @@ The replicas of data can be placed across multiple fault domains. 
The following ### Multi-zone deployment -In the case of a multi-zone deployement, the data in each of the tablets in a node is replicated across multiple zones using the Raft consensus algorithm. All the read and write queries for the rows that belong to a given tablet are handled by that tablet’s leader, as per the following diagram: +In the case of a multi-zone deployment, the data in each of the tablets in a node is replicated across multiple zones using the Raft consensus algorithm. All the read and write queries for the rows that belong to a given tablet are handled by that tablet's leader, as per the following diagram: - +![Replication across zones](/images/architecture/replication/raft-replication-across-zones.png) As a part of the Raft replication, each tablet peer first elects a tablet leader responsible for serving reads and writes. The distribution of tablet leaders across different zones is determined by a user-specified data placement policy, which, in the preceding scenario, ensures that in the steady state, each of the zones has an equal number of tablet leaders. The following diagram shows how the tablet leaders are dispersed: - +![Tablet leader placement](/images/architecture/replication/optimal-tablet-leader-placement.png) ### Tolerating a zone outage As soon as a zone outage occurs, YugabyteDB assumes that all nodes in that zone become unavailable simultaneously. This results in one-third of the tablets (which have their tablet leaders in the zone that just failed) not being able to serve any requests. The other two-thirds of the tablets are not affected. The following illustration shows the tablet peers in the zone that failed: - +![Tablet peers in a failed zone](/images/architecture/replication/tablet-leaders-vs-followers-zone-outage.png) For the affected one-third, YugabyteDB automatically performs a failover to instances in the other two zones. Once again, the tablets being failed over are distributed across the two remaining zones evenly, as per the following diagram: - +![Automatic failover](/images/architecture/replication/automatic-failover-zone-outage.png) ### RPO and RTO on zone outage The recovery point objective (RPO) for each of these tablets is 0, meaning no data is lost in the failover to another zone. The recovery time objective (RTO) is 3 seconds, which is the time window for completing the failover and becoming operational out of the new zones, as per the following diagram: - +![RPO vs RTO](/images/architecture/replication/rpo-vs-rto-zone-outage.png) ## Follower reads diff --git a/docs/content/stable/architecture/docdb-replication/replication.md b/docs/content/stable/architecture/docdb-replication/replication.md index b680b7ff495e..192dd6208fb6 100644 --- a/docs/content/stable/architecture/docdb-replication/replication.md +++ b/docs/content/stable/architecture/docdb-replication/replication.md @@ -2,7 +2,7 @@ title: Replication in DocDB headerTitle: Synchronous replication linkTitle: Synchronous -description: Learn how YugabyteDB uses the Raft consensus in DocDB to replicate data across multiple independent fault domains like nodes, zones, regions and clouds. +description: Learn how YugabyteDB uses the Raft consensus in DocDB to replicate data across multiple independent fault domains like nodes, zones, regions, and clouds. 
headContent: Synchronous replication using the Raft consensus protocol menu: stable: @@ -16,7 +16,7 @@ Using the Raft distributed consensus protocol, DocDB automatically replicates da ## Concepts -A number of concepts is central to replication. +A number of concepts are central to replication. ### Fault domains @@ -26,7 +26,7 @@ A fault domain comprises a group of nodes that are prone to correlated failures. * Regions or datacenters * Cloud providers -Data is typically replicated across fault domains to be resilient to the outage of all nodes in that fault domain. +Data is typically replicated across fault domains to be resilient to the outage of all nodes in one fault domain. ### Fault tolerance @@ -40,13 +40,13 @@ YugabyteDB replicates data across nodes (or fault domains) in order to tolerate Replication of data in DocDB is achieved at the level of tablets, using tablet peers, with each table sharded into a set of tablets, as demonstrated in the following diagram: - +![Tablets in a table](/images/architecture/replication/tablets_in_a_docsb_table.png) -Each tablet comprises of a set of tablet peers, each of which stores one copy of the data belonging to the tablet. There are as many tablet peers for a tablet as the replication factor, and they form a Raft group. The tablet peers are hosted on different nodes to allow data redundancy on node failures. The replication of data between the tablet peers is strongly consistent. +Each tablet comprises of a set of tablet peers, each of which stores one copy of the data belonging to the tablet. There are as many tablet peers for a tablet as the replication factor, and they form a Raft group. The tablet peers are hosted on different nodes to allow data redundancy to protect against node failures. The replication of data between the tablet peers is strongly consistent. The following diagram depicts three tablet peers that belong to a tablet called `tablet 1`. The tablet peers are hosted on different YB-TServers and form a Raft group for leader election, failure detection, and replication of the write-ahead logs. -![raft_replication](/images/architecture/raft_replication.png) +![RAFT Replication](/images/architecture/raft_replication.png) ### Raft replication @@ -54,7 +54,7 @@ As soon as a tablet initiates, it elects one of the tablet peers as the tablet l The set of DocDB updates depends on the user-issued write, and involves locking a set of keys to establish a strict update order, and optionally reading the older value to modify and update in case of a read-modify-write operation. The Raft log is used to ensure that the database state-machine of a tablet is replicated amongst the tablet peers with strict ordering and correctness guarantees even in the face of failures or membership changes. This is essential to achieving strong consistency. -Once the Raft log is replicated to a majority of tablet-peers and successfully persisted on the majority, the write is applied into the DocDB document storage layer and is subsequently available for reads. Once the write is persisted on disk by the document storage layer, the write entries can be purged from the Raft log. This is performed as a controlled background operation without any impact to the foreground operations. +After the Raft log is replicated to a majority of tablet-peers and successfully persisted on the majority, the write is applied into the DocDB document storage layer and is subsequently available for reads. 
After the write is persisted on disk by the document storage layer, the write entries can be purged from the Raft log. This is performed as a controlled background operation without any impact to the foreground operations. ## Replication in a cluster @@ -62,29 +62,29 @@ The replicas of data can be placed across multiple fault domains. The following ### Multi-zone deployment -In the case of a multi-zone deployement, the data in each of the tablets in a node is replicated across multiple zones using the Raft consensus algorithm. All the read and write queries for the rows that belong to a given tablet are handled by that tablet’s leader, as per the following diagram: +In the case of a multi-zone deployment, the data in each of the tablets in a node is replicated across multiple zones using the Raft consensus algorithm. All the read and write queries for the rows that belong to a given tablet are handled by that tablet's leader, as per the following diagram: - +![Replication across zones](/images/architecture/replication/raft-replication-across-zones.png) -As a part of the Raft replication, each tablet peer first elects a tablet leader responsible for serving reads and writes. The distribution of tablet leaders across different zones is determined by a user-specified data placement policy which, in the preceding scenario, ensures that in the steady state, each of the zones has an equal number of tablet leaders. The following diagram shows how the tablet leaders are dispersed: +As a part of the Raft replication, each tablet peer first elects a tablet leader responsible for serving reads and writes. The distribution of tablet leaders across different zones is determined by a user-specified data placement policy, which, in the preceding scenario, ensures that in the steady state, each of the zones has an equal number of tablet leaders. The following diagram shows how the tablet leaders are dispersed: - +![Tablet leader placement](/images/architecture/replication/optimal-tablet-leader-placement.png) ### Tolerating a zone outage As soon as a zone outage occurs, YugabyteDB assumes that all nodes in that zone become unavailable simultaneously. This results in one-third of the tablets (which have their tablet leaders in the zone that just failed) not being able to serve any requests. The other two-thirds of the tablets are not affected. The following illustration shows the tablet peers in the zone that failed: - +![Tablet peers in a failed zone](/images/architecture/replication/tablet-leaders-vs-followers-zone-outage.png) For the affected one-third, YugabyteDB automatically performs a failover to instances in the other two zones. Once again, the tablets being failed over are distributed across the two remaining zones evenly, as per the following diagram: - +![Automatic failover](/images/architecture/replication/automatic-failover-zone-outage.png) ### RPO and RTO on zone outage The recovery point objective (RPO) for each of these tablets is 0, meaning no data is lost in the failover to another zone. 
The recovery time objective (RTO) is 3 seconds, which is the time window for completing the failover and becoming operational out of the new zones, as per the following diagram: - +![RPO vs RTO](/images/architecture/replication/rpo-vs-rto-zone-outage.png) ## Follower reads diff --git a/docs/content/v2.16/yugabyte-platform/configure-yugabyte-platform/set-up-cloud-provider/on-premises.md b/docs/content/v2.16/yugabyte-platform/configure-yugabyte-platform/set-up-cloud-provider/on-premises.md index 1337ede87ddb..3d9f3412e1af 100644 --- a/docs/content/v2.16/yugabyte-platform/configure-yugabyte-platform/set-up-cloud-provider/on-premises.md +++ b/docs/content/v2.16/yugabyte-platform/configure-yugabyte-platform/set-up-cloud-provider/on-premises.md @@ -62,7 +62,7 @@ type: docs -
You can configure the on-premises cloud provider for YugabyteDB using YugabyteDB Anywhere. If no cloud providers are configured, the main **Dashboard** prompts you to configure at least one cloud provider. +You can configure the on-premises cloud provider for YugabyteDB using YugabyteDB Anywhere. If no cloud providers are configured, the main **Dashboard** prompts you to configure at least one cloud provider. ## Configure the on-premises provider @@ -359,13 +359,13 @@ On each node, perform the following as a user with sudo access: sudo mkdir /var/log/prometheus sudo mkdir /var/run/prometheus sudo mv /tmp/node_exporter-1.3.1.linux-amd64.tar /opt/prometheus - sudo adduser --shell /bin/bash prometheus # (also adds group “prometheus”) + sudo adduser --shell /bin/bash prometheus # (also adds group "prometheus") sudo chown -R prometheus:prometheus /opt/prometheus sudo chown -R prometheus:prometheus /etc/prometheus sudo chown -R prometheus:prometheus /var/log/prometheus sudo chown -R prometheus:prometheus /var/run/prometheus sudo chmod +r /opt/prometheus/node_exporter-1.3.1.linux-amd64.tar - sudo su - prometheus (user session is now as user “prometheus”) + sudo su - prometheus (user session is now as user "prometheus") ``` 1. Run the following commands as user `prometheus`: @@ -550,7 +550,7 @@ As an alternative to setting crontab permissions, you can install systemd-specif /bin/systemctl daemon-reload ``` -2. Ensure that you have root access and add the following service and timer files to the `/etc/systemd/system` directory (set their ownerships to the `yugabyte` user and 0644 permissions):
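For example, once each unit file listed below has been copied into `/etc/systemd/system`, the ownership, permission, and reload steps can be handled along these lines (a sketch that uses `yb-master.service` as a placeholder; repeat for every service and timer file you add):

```sh
# set the required ownership and permissions on the unit file
sudo chown yugabyte:yugabyte /etc/systemd/system/yb-master.service
sudo chmod 0644 /etc/systemd/system/yb-master.service

# make systemd pick up new or changed unit files
sudo systemctl daemon-reload
```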

+2. Ensure that you have root access and add the following service and timer files to the `/etc/systemd/system` directory (set their ownerships to the `yugabyte` user and 0644 permissions): `yb-master.service` @@ -738,140 +738,9 @@ As an alternative to setting crontab permissions, you can install systemd-specif WantedBy=timers.target ``` -### Use node agents - -To automate some of the steps outlined in [Provision nodes manually](#provision-nodes-manually), YugabyteDB Anywhere provides a node agent that you can run on each node meeting the following requirements: - -- The node has already been set up with the `yugabyte` user group and home. -- The bi-directional communication between the node and YugabyteDB Anywhere has been established (that is, the IP address can reach the host and vice versa). - -#### Installation - -You can install a node agent as follows: - -1. Download the installer from YugabyteDB Anywhere using the API token of the Super Admin, as follows: - - ```sh - curl https:///api/v1/node_agents/download --header 'X-AUTH-YW-API-TOKEN: ' > installer.sh && chmod +x installer.sh - ``` - -3. Verify that the installer file contains the script. - -3. Run the following command to download the node agent's `.tgz` file which installs and starts the interactive configuration: - - ```sh - ./installer.sh -t install -u https:// -at - ``` - - For example, if you execute `./installer.sh -t install -u http://100.98.0.42:9000 -at 301fc382-cf06-4a1b-b5ef-0c8c45273aef`, expect the following output: - - ```output - * Starting YB Node Agent install - * Creating Node Agent Directory - * Changing directory to node agent - * Creating Sub Directories - * Downloading YB Node Agent build package - * Getting Linux/amd64 package - * Downloaded Version - 2.17.1.0-PRE_RELEASE - * Extracting the build package - * The current value of Node IP is not set; Enter new value or enter to skip: 10.9.198.2 - * The current value of Node Name is not set; Enter new value or enter to skip: Test - * Select your Onprem Provider - 1. Provider ID: 41ac964d-1db2-413e-a517-2a8d840ff5cd, Provider Name: onprem - Enter the option number: 1 - * Select your Instance Type - 1. Instance Code: c5.large - Enter the option number: 1 - * Select your Region - 1. Region ID: dc0298f6-21bf-4f90-b061-9c81ed30f79f, Region Code: us-west-2 - Enter the option number: 1 - * Select your Zone - 1. Zone ID: 99c66b32-deb4-49be-85f9-c3ef3a6e04bc, Zone Name: us-west-2c - Enter the option number: 1 - • Completed Node Agent Configuration - • Node Agent Registration Successful - You can install a systemd service on linux machines by running node-agent-installer.sh -t install-service (Requires sudo access). - ``` - -4. Run the following command to enable the node agent as a systemd service, which is required for self-upgrade and other functions: - - ```sh - sudo node-agent-installer.sh -t install-service - ``` - -When the installation has been completed, the configurations are saved in the `config.yml` file located in the `node-agent/config/` directory. You should refrain from manually changing values in this file. - -#### Registration - -To enable secured communication, the node agent is automatically registered during its installation so the YugabyteDB Anywhere is aware of its existence. You can also register and unregister the node agent manually during configuration. 
- -The following is the node agent registration command: - -```sh -node-agent node register --api-token -``` - -If you need to overwrite any previously configured values, you can use the following parameters within the registration command: - -- `--node_ip` represents the node IP address. -- `--url` represents the YugabyteDB Anywhere address. - -For secured communication, YugabyteDB Anywhere generates a key pair (private, public, and server certificate) that is sent to the node agent as part of its registration process. - - - -To unregister a node agent, use the following command: - -```sh -node-agent node unregister -``` - -#### Operations - -Even though the node agent installation, configuration, and registration are sufficient, the following supplementary commands are also supported: - -- `node-agent node unregister` is used for unregistersing the node and node agent from YugabyteDB Anywhere. This can be done to restart the registration process. -- `node-agent node register` is used for registering a node and node agent to YugabyteDB Anywhere if they were unregistered manually. Registering an already registered node agent fails as YugabyteDB Anywhere keeps a record of the node agent with this IP. -- `node-agent service start` and `node-agent service stop` are used for starting or stopping the node agent as a gRPC server. -- `node-agent node preflight-check` is used for checking if a node is configured as a YugabyteDB Anywhere node. After the node agent and the node have been registered with YugabyteDB Anywhere, this command can be run on its own, if the result needs to be published to YugabyteDB Anywhere. For more information, see [Preflight check](#preflight-check). - -#### Preflight check - -Once the node agent is installed, configured, and connected to YugabyteDB Anywhere, you can perform a series of preflight checks without sudo privileges by using the following command: - -```sh -node-agent node preflight-check -``` - -The result of the check is forwarded to YugabyteDB Anywhere for validation. The validated information is posted in a tabular form on the terminal. If there is a failure against a required check, you can apply a fix and then rerun the preflight check. - -Expect an output similar to the following: - -![Result](/images/yp/node-agent-preflight-check.png) - -If the preflight check is successful, you would be able to add the node to the provider (if required) by executing the following: - -```sh -node-agent node preflight-check --add_node -``` - ## Remove YugabyteDB components from the server -As described in [Eliminate an unresponsive node](../../../manage-deployments/remove-nodes/), when a node enters an undesirable state, you can delete such node, with YugabyteDB Anywhere clearing up all the remaining artifacts except the `prometheus` and `yugabyte` user. +As described in [Eliminate an unresponsive node](../../../manage-deployments/remove-nodes/), when a node enters an undesirable state, you can delete the node, with YugabyteDB Anywhere clearing up all the remaining artifacts except the `prometheus` and `yugabyte` user. You can manually remove Yugabyte components from existing server images. Before attempting this, you have to determine whether or not YugabyteDB Anywhere is operational. If it is, you either need to delete the universe or delete the nodes from the universe. 
@@ -881,7 +750,7 @@ To completely eliminate all traces of YugabyteDB Anywhere and configuration, you You can remove YugabyteDB components and configuration from the database server nodes as follows: -- Login to the server node as the `yugabyte` user. +- Log in to the server node as the `yugabyte` user. - Navigate to the `/home/yugabyte/bin` directory that contains a number of scripts including `yb-server-ctl.sh`. The arguments set in this script allow you to perform various functions on the YugabyteDB processes running on the node. @@ -925,7 +794,7 @@ You may now choose to reverse the system settings that you configured in [Provis ### Delete YugabyteDB Anywhere from the server -To remove YugabyteDB Anywhere and Replicated components from the host server, execute the following commands as the `root` user (or prepend `sudo` to each command) : +To remove YugabyteDB Anywhere and Replicated components from the host server, execute the following commands as the `root` user (or prepend `sudo` to each command): ```sh systemctl stop replicated replicated-ui replicated-operator From 1b7a88d94f6a89e880f751606626ca7179e15bd9 Mon Sep 17 00:00:00 2001 From: Dwight Hodge Date: Wed, 7 Jun 2023 19:22:58 -0400 Subject: [PATCH 2/5] Add systemd services for centos 7 --- .../set-up-cloud-provider/on-premises.md | 441 +++++++++++------- 1 file changed, 263 insertions(+), 178 deletions(-) diff --git a/docs/content/v2.16/yugabyte-platform/configure-yugabyte-platform/set-up-cloud-provider/on-premises.md b/docs/content/v2.16/yugabyte-platform/configure-yugabyte-platform/set-up-cloud-provider/on-premises.md index 3d9f3412e1af..d5fbd4f6a412 100644 --- a/docs/content/v2.16/yugabyte-platform/configure-yugabyte-platform/set-up-cloud-provider/on-premises.md +++ b/docs/content/v2.16/yugabyte-platform/configure-yugabyte-platform/set-up-cloud-provider/on-premises.md @@ -189,13 +189,24 @@ If the SSH user configured in the on-premises provider does not have sudo privil For each node, perform the following: -- [Set up time synchronization](#set-up-time-synchronization) -- [Open incoming TCP ports](#open-incoming-tcp-ip-ports) -- [Preprovision the node](#preprovision-nodes-manually) -- [Install Prometheus node exporter](#install-prometheus-node-exporter) -- [Install backup utilities](#install-backup-utilities) -- [Set crontab permissions](#set-crontab-permissions) -- [Install systemd-related database service unit files (optional)](#install-systemd-related-database-service-unit-files) +- [Configure the on-premises provider](#configure-the-on-premises-provider) + - [Complete the provider information](#complete-the-provider-information) + - [Configure hardware for YugabyteDB nodes](#configure-hardware-for-yugabytedb-nodes) + - [Define regions and zones](#define-regions-and-zones) +- [Add YugabyteDB nodes](#add-yugabytedb-nodes) + - [Provision nodes manually](#provision-nodes-manually) + - [Running the preprovisioning script](#running-the-preprovisioning-script) + - [Setting up database nodes manually](#setting-up-database-nodes-manually) + - [Set up time synchronization](#set-up-time-synchronization) + - [Open incoming TCP/IP ports](#open-incoming-tcpip-ports) + - [Preprovision nodes manually](#preprovision-nodes-manually) + - [Install Prometheus node exporter](#install-prometheus-node-exporter) + - [Install backup utilities](#install-backup-utilities) + - [Set crontab permissions](#set-crontab-permissions) + - [Install systemd-related database service unit files](#install-systemd-related-database-service-unit-files) +- [Remove 
YugabyteDB components from the server](#remove-yugabytedb-components-from-the-server) + - [Delete database server nodes](#delete-database-server-nodes) + - [Delete YugabyteDB Anywhere from the server](#delete-yugabytedb-anywhere-from-the-server) ##### Set up time synchronization @@ -550,193 +561,267 @@ As an alternative to setting crontab permissions, you can install systemd-specif /bin/systemctl daemon-reload ``` -2. Ensure that you have root access and add the following service and timer files to the `/etc/systemd/system` directory (set their ownerships to the `yugabyte` user and 0644 permissions): +1. Ensure that you have root access and add the following service and timer files to the `/etc/systemd/system` directory (set their ownerships to the `yugabyte` user and 0644 permissions): - `yb-master.service` + `yb-master.service` - ```sh - [Unit] - Description=Yugabyte master service - Requires=network-online.target - After=network.target network-online.target multi-user.target - StartLimitInterval=100 - StartLimitBurst=10 - - [Path] - PathExists=/home/yugabyte/master/bin/yb-master - PathExists=/home/yugabyte/master/conf/server.conf - - [Service] - User=yugabyte - Group=yugabyte - # Start - ExecStart=/home/yugabyte/master/bin/yb-master --flagfile /home/yugabyte/master/conf/server.conf - Restart=on-failure - RestartSec=5 - # Stop -> SIGTERM - 10s - SIGKILL (if not stopped) [matches existing cron behavior] - KillMode=process - TimeoutStopFailureMode=terminate - KillSignal=SIGTERM - TimeoutStopSec=10 - FinalKillSignal=SIGKILL - # Logs - StandardOutput=syslog - StandardError=syslog - # ulimit - LimitCORE=infinity - LimitNOFILE=1048576 - LimitNPROC=12000 - - [Install] - WantedBy=default.target - ``` + ```sh + [Unit] + Description=Yugabyte master service + Requires=network-online.target + After=network.target network-online.target multi-user.target + StartLimitInterval=100 + StartLimitBurst=10 - `yb-tserver.service` + [Path] + PathExists=/home/yugabyte/master/bin/yb-master + PathExists=/home/yugabyte/master/conf/server.conf - ```sh - [Unit] - Description=Yugabyte tserver service - Requires=network-online.target - After=network.target network-online.target multi-user.target - StartLimitInterval=100 - StartLimitBurst=10 - - [Path] - PathExists=/home/yugabyte/tserver/bin/yb-tserver - PathExists=/home/yugabyte/tserver/conf/server.conf - - [Service] - User=yugabyte - Group=yugabyte - # Start - ExecStart=/home/yugabyte/tserver/bin/yb-tserver --flagfile /home/yugabyte/tserver/conf/server.conf - Restart=on-failure - RestartSec=5 - # Stop -> SIGTERM - 10s - SIGKILL (if not stopped) [matches existing cron behavior] - KillMode=process - TimeoutStopFailureMode=terminate - KillSignal=SIGTERM - TimeoutStopSec=10 - FinalKillSignal=SIGKILL - # Logs - StandardOutput=syslog - StandardError=syslog - # ulimit - LimitCORE=infinity - LimitNOFILE=1048576 - LimitNPROC=12000 - - [Install] - WantedBy=default.target - ``` + [Service] + User=yugabyte + Group=yugabyte + # Start + ExecStart=/home/yugabyte/master/bin/yb-master --flagfile /home/yugabyte/master/conf/server.conf + Restart=on-failure + RestartSec=5 + # Stop -> SIGTERM - 10s - SIGKILL (if not stopped) [matches existing cron behavior] + KillMode=process + TimeoutStopFailureMode=terminate + KillSignal=SIGTERM + TimeoutStopSec=10 + FinalKillSignal=SIGKILL + # Logs + StandardOutput=syslog + StandardError=syslog + # ulimit + LimitCORE=infinity + LimitNOFILE=1048576 + LimitNPROC=12000 - `yb-zip_purge_yb_logs.service` + [Install] + WantedBy=default.target + ``` - ```sh - 
[Unit] - Description=Yugabyte logs - Wants=yb-zip_purge_yb_logs.timer - - [Service] - User=yugabyte - Group=yugabyte - Type=oneshot - WorkingDirectory=/home/yugabyte/bin - ExecStart=/bin/sh /home/yugabyte/bin/zip_purge_yb_logs.sh - - [Install] - WantedBy=multi-user.target - ``` + `yb-tserver.service` - `yb-zip_purge_yb_logs.timer` + ```sh + [Unit] + Description=Yugabyte tserver service + Requires=network-online.target + After=network.target network-online.target multi-user.target + StartLimitInterval=100 + StartLimitBurst=10 - ```sh - [Unit] - Description=Yugabyte logs - Requires=yb-zip_purge_yb_logs.service - - [Timer] - User=yugabyte - Group=yugabyte - Unit=yb-zip_purge_yb_logs.service - # Run hourly at minute 0 (beginning) of every hour - OnCalendar=00/1:00 - - [Install] - WantedBy=timers.target - ``` + [Path] + PathExists=/home/yugabyte/tserver/bin/yb-tserver + PathExists=/home/yugabyte/tserver/conf/server.conf - `yb-clean_cores.service` + [Service] + User=yugabyte + Group=yugabyte + # Start + ExecStart=/home/yugabyte/tserver/bin/yb-tserver --flagfile /home/yugabyte/tserver/conf/server.conf + Restart=on-failure + RestartSec=5 + # Stop -> SIGTERM - 10s - SIGKILL (if not stopped) [matches existing cron behavior] + KillMode=process + TimeoutStopFailureMode=terminate + KillSignal=SIGTERM + TimeoutStopSec=10 + FinalKillSignal=SIGKILL + # Logs + StandardOutput=syslog + StandardError=syslog + # ulimit + LimitCORE=infinity + LimitNOFILE=1048576 + LimitNPROC=12000 - ```sh - [Unit] - Description=Yugabyte clean cores - Wants=yb-clean_cores.timer - - [Service] - User=yugabyte - Group=yugabyte - Type=oneshot - WorkingDirectory=/home/yugabyte/bin - ExecStart=/bin/sh /home/yugabyte/bin/clean_cores.sh - - [Install] - WantedBy=multi-user.target - ``` + [Install] + WantedBy=default.target + ``` - `yb-clean_cores.timer` + `yb-zip_purge_yb_logs.service` - ```sh - [Unit] - Description=Yugabyte clean cores - Requires=yb-clean_cores.service - - [Timer] - User=yugabyte - Group=yugabyte - Unit=yb-clean_cores.service - # Run every 10 minutes offset by 5 (5, 15, 25...) 
- OnCalendar=*:0/10:30 - - [Install] - WantedBy=timers.target - ``` + ```sh + [Unit] + Description=Yugabyte logs + Wants=yb-zip_purge_yb_logs.timer - `yb-collect_metrics.service` + [Service] + User=yugabyte + Group=yugabyte + Type=oneshot + WorkingDirectory=/home/yugabyte/bin + ExecStart=/bin/sh /home/yugabyte/bin/zip_purge_yb_logs.sh - ```sh - [Unit] - Description=Yugabyte collect metrics - Wants=yb-collect_metrics.timer - - [Service] - User=yugabyte - Group=yugabyte - Type=oneshot - WorkingDirectory=/home/yugabyte/bin - ExecStart=/bin/bash /home/yugabyte/bin/collect_metrics_wrapper.sh - - [Install] - WantedBy=multi-user.target - ``` + [Install] + WantedBy=multi-user.target + ``` - `yb-collect_metrics.timer` + `yb-zip_purge_yb_logs.timer` - ```sh - [Unit] - Description=Yugabyte collect metrics - Requires=yb-collect_metrics.service - - [Timer] - User=yugabyte - Group=yugabyte - Unit=yb-collect_metrics.service - # Run every 1 minute - OnCalendar=*:0/1:0 - - [Install] - WantedBy=timers.target - ``` + ```sh + [Unit] + Description=Yugabyte logs + Requires=yb-zip_purge_yb_logs.service + + [Timer] + User=yugabyte + Group=yugabyte + Unit=yb-zip_purge_yb_logs.service + # Run hourly at minute 0 (beginning) of every hour + OnCalendar=00/1:00 + + [Install] + WantedBy=timers.target + ``` + + `yb-clean_cores.service` + + ```sh + [Unit] + Description=Yugabyte clean cores + Wants=yb-clean_cores.timer + + [Service] + User=yugabyte + Group=yugabyte + Type=oneshot + WorkingDirectory=/home/yugabyte/bin + ExecStart=/bin/sh /home/yugabyte/bin/clean_cores.sh + + [Install] + WantedBy=multi-user.target + ``` + + `yb-clean_cores.timer` + + ```sh + [Unit] + Description=Yugabyte clean cores + Requires=yb-clean_cores.service + + [Timer] + User=yugabyte + Group=yugabyte + Unit=yb-clean_cores.service + # Run every 10 minutes offset by 5 (5, 15, 25...) + OnCalendar=*:0/10:30 + + [Install] + WantedBy=timers.target + ``` + + `yb-collect_metrics.service` + + ```sh + [Unit] + Description=Yugabyte collect metrics + Wants=yb-collect_metrics.timer + + [Service] + User=yugabyte + Group=yugabyte + Type=oneshot + WorkingDirectory=/home/yugabyte/bin + ExecStart=/bin/bash /home/yugabyte/bin/collect_metrics_wrapper.sh + + [Install] + WantedBy=multi-user.target + ``` + + `yb-collect_metrics.timer` + + ```sh + [Unit] + Description=Yugabyte collect metrics + Requires=yb-collect_metrics.service + + [Timer] + User=yugabyte + Group=yugabyte + Unit=yb-collect_metrics.service + # Run every 1 minute + OnCalendar=*:0/1:0 + + [Install] + WantedBy=timers.target + ``` + +1. 
For CentOS 7, ensure that you also add the following service files to the `/etc/systemd/system` directory (set their ownerships to the `yugabyte` user and 0644 permissions): + + `yb-bind_check.service` + + ```sh + [Unit] + Description=Yugabyte IP Bind Check + Requires=network-online.target + After=network.target network-online.target multi-user.target + Before=yb-controller.service yb-tserver.service yb-master.service yb-collect_metrics.timer + StartLimitInterval=100 + StartLimitBurst=10 + + [Path] + PathExists=/home/yugabyte/controller/bin/yb-controller-server + PathExists=/home/yugabyte//controller/conf/server.conf + + [Service] + # Start + ExecStart=/home/yugabyte/controller/bin/yb-controller-server \ + --flagfile /home/yugabyte/controller/conf/server.conf \ + --only_bind --logtostderr + Type=oneshot + KillMode=control-group + KillSignal=SIGTERM + TimeoutStopSec=10 + # Logs + StandardOutput=syslog + StandardError=syslog + + [Install] + WantedBy=default.target + ``` + + `yb-controller.service` + + ```sh + [Unit] + Description=Yugabyte Controller + Requires=network-online.target + After=network.target network-online.target multi-user.target + StartLimitInterval=100 + StartLimitBurst=10 + + [Path] + PathExists=/home/yugabyte/controller/bin/yb-controller-server + PathExists=/home/yugabyte/controller/conf/server.conf + + [Service] + User=yugabyte + Group=yugabyte + # Start + ExecStart=/home/yugabyte/controller/bin/yb-controller-server \ + --flagfile /home/yugabyte/controller/conf/server.conf + Restart=always + RestartSec=5 + # Stop -> SIGTERM - 10s - SIGKILL (if not stopped) [matches existing cron behavior] + KillMode=control-group + TimeoutStopFailureMode=terminate + KillSignal=SIGTERM + TimeoutStopSec=10 + FinalKillSignal=SIGKILL + # Logs + StandardOutput=syslog + StandardError=syslog + # ulimit + LimitCORE=infinity + LimitNOFILE=1048576 + LimitNPROC=12000 + + [Install] + WantedBy=default.target + ``` ## Remove YugabyteDB components from the server From 96c3f0541702f9574ec69f70d96e07075508230a Mon Sep 17 00:00:00 2001 From: Dwight Hodge Date: Thu, 8 Jun 2023 00:13:32 -0400 Subject: [PATCH 3/5] minor format --- .../set-up-cloud-provider/on-premises.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/content/v2.16/yugabyte-platform/configure-yugabyte-platform/set-up-cloud-provider/on-premises.md b/docs/content/v2.16/yugabyte-platform/configure-yugabyte-platform/set-up-cloud-provider/on-premises.md index d5fbd4f6a412..86580b0c5bf5 100644 --- a/docs/content/v2.16/yugabyte-platform/configure-yugabyte-platform/set-up-cloud-provider/on-premises.md +++ b/docs/content/v2.16/yugabyte-platform/configure-yugabyte-platform/set-up-cloud-provider/on-premises.md @@ -161,8 +161,7 @@ You can manually provision each node using the preprovisioning Python script, as sudo docker exec -it yugaware bash ``` -1. Copy and paste the Python script prompted via the UI and substitute for a node IP address and mount points. -Optionally, use the `--ask_password` flag if the sudo user requires password authentication, as follows: +1. Copy and paste the Python script prompted via the UI and substitute for a node IP address and mount points. 
Optionally, use the `--ask_password` flag if the sudo user requires password authentication, as follows: ```bash /opt/yugabyte/yugaware/data/provision/9cf26f3b-4c7c-451a-880d-593f2f76efce/provision_instance.py --ip 10.9.116.65 --mount_points /data --ask_password From 9b2cd657e66da2696b910e3bdbd1afa6c8a59ef0 Mon Sep 17 00:00:00 2001 From: Dwight Hodge Date: Thu, 8 Jun 2023 01:08:32 -0400 Subject: [PATCH 4/5] review comments --- .../set-up-cloud-provider/on-premises.md | 14 +++++++++++--- .../manage-deployments/remove-nodes.md | 4 ++-- 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/docs/content/v2.16/yugabyte-platform/configure-yugabyte-platform/set-up-cloud-provider/on-premises.md b/docs/content/v2.16/yugabyte-platform/configure-yugabyte-platform/set-up-cloud-provider/on-premises.md index 86580b0c5bf5..a592ce7a809f 100644 --- a/docs/content/v2.16/yugabyte-platform/configure-yugabyte-platform/set-up-cloud-provider/on-premises.md +++ b/docs/content/v2.16/yugabyte-platform/configure-yugabyte-platform/set-up-cloud-provider/on-premises.md @@ -557,6 +557,16 @@ As an alternative to setting crontab permissions, you can install systemd-specif /bin/systemctl restart yb-collect_metrics, \ /bin/systemctl enable yb-collect_metrics, \ /bin/systemctl disable yb-collect_metrics, \ + /bin/systemctl start yb-bind_check, \ + /bin/systemctl stop yb-bind_check, \ + /bin/systemctl restart yb-bind_check, \ + /bin/systemctl enable yb-bind_check, \ + /bin/systemctl disable yb-bind_check, \ + /bin/systemctl start yb-controller, \ + /bin/systemctl stop yb-controller, \ + /bin/systemctl restart yb-controller, \ + /bin/systemctl enable yb-controller, \ + /bin/systemctl disable yb-controller, \ /bin/systemctl daemon-reload ``` @@ -748,8 +758,6 @@ As an alternative to setting crontab permissions, you can install systemd-specif WantedBy=timers.target ``` -1. For CentOS 7, ensure that you also add the following service files to the `/etc/systemd/system` directory (set their ownerships to the `yugabyte` user and 0644 permissions): - `yb-bind_check.service` ```sh @@ -826,7 +834,7 @@ As an alternative to setting crontab permissions, you can install systemd-specif As described in [Eliminate an unresponsive node](../../../manage-deployments/remove-nodes/), when a node enters an undesirable state, you can delete the node, with YugabyteDB Anywhere clearing up all the remaining artifacts except the `prometheus` and `yugabyte` user. -You can manually remove Yugabyte components from existing server images. Before attempting this, you have to determine whether or not YugabyteDB Anywhere is operational. If it is, you either need to delete the universe or delete the nodes from the universe. +You can manually remove YugabyteDB components from existing server images. Before attempting this, you have to determine whether or not YugabyteDB Anywhere is operational. If it is, you either need to delete the universe or delete the nodes from the universe. To completely eliminate all traces of YugabyteDB Anywhere and configuration, you should consider reinstalling the operating system image (or rolling back to a previous image, if available). 
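Before going down either path, a quick way to check whether YugabyteDB Anywhere is still operational on the host is to look at its container and the Replicated services (a sketch that assumes the Replicated-based installation referenced elsewhere in this guide; adjust if your installation differs):

```sh
# the platform container should be listed and running
sudo docker ps --filter "name=yugaware"

# the Replicated services should be active
sudo systemctl status replicated replicated-ui replicated-operator
```

If these are healthy, use YugabyteDB Anywhere itself to delete the universe or its nodes before cleaning up the servers manually.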
diff --git a/docs/content/v2.16/yugabyte-platform/manage-deployments/remove-nodes.md b/docs/content/v2.16/yugabyte-platform/manage-deployments/remove-nodes.md index 3c921e58cf0e..bd72929ad822 100644 --- a/docs/content/v2.16/yugabyte-platform/manage-deployments/remove-nodes.md +++ b/docs/content/v2.16/yugabyte-platform/manage-deployments/remove-nodes.md @@ -15,7 +15,7 @@ If a virtual machine or a physical server in a universe reaches its end of life ![Unreachable Node Actions](/images/ee/node-actions-unreachable.png) -When this happens, new Master leaders are elected for the underlying data shards, but since the universe enters a partially under-replicated state, it would not be able to tolerate additional failures. To remedy the situation, you can eliminate the unreachable node by taking actions in the following sequence: +When this happens, new Master leaders are elected for the underlying data shards, but because the universe enters a partially under-replicated state, it would not be able to tolerate additional failures. To remedy the situation, you can eliminate the unreachable node by taking actions in the following sequence: - Step 1: [Remove node](#remove-node) - Step 2: [Start a new Master process](#start-a-new-master-process), if necessary @@ -74,6 +74,7 @@ A typical universe has an RF of 3 or 5. At the end of the [node removal](#remove ![Start master](/images/yp/start-master.png) When you execute the start Master action, YugabyteDB Anywhere performs the following: + 1. Configures the Master on the subject node. 2. Starts a new Master process on the subject node (in Shell mode). @@ -82,7 +83,6 @@ When you execute the start Master action, YugabyteDB Anywhere performs the follo 4. Updates the Master addresses g-flag on all other nodes to inform them of the new Master. - ## Release node instance To release the IP address associated with the **yb-15-aws-ys-n6** node, click its corresponding **Actions > Release Instance**. This changes the value in the **Status** column from **Removed** to **Decommissioned**. 
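After starting a replacement Master as described above, you can optionally confirm the resulting Master placement from a database node using the `yb-admin` CLI (a sketch; the binary path and the Master addresses shown are placeholders for your own):

```sh
# list the current Masters and their Raft roles (LEADER/FOLLOWER)
./bin/yb-admin -master_addresses 10.0.0.1:7100,10.0.0.2:7100,10.0.0.3:7100 list_all_masters
```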
From 0d328d99cc0c0a711be9b1271961317465929970 Mon Sep 17 00:00:00 2001 From: Dwight Hodge Date: Thu, 8 Jun 2023 02:34:27 -0400 Subject: [PATCH 5/5] format --- .../set-up-cloud-provider/vmware-tanzu.md | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/content/preview/yugabyte-platform/configure-yugabyte-platform/set-up-cloud-provider/vmware-tanzu.md b/docs/content/preview/yugabyte-platform/configure-yugabyte-platform/set-up-cloud-provider/vmware-tanzu.md index 03199b45070b..2f511661423c 100644 --- a/docs/content/preview/yugabyte-platform/configure-yugabyte-platform/set-up-cloud-provider/vmware-tanzu.md +++ b/docs/content/preview/yugabyte-platform/configure-yugabyte-platform/set-up-cloud-provider/vmware-tanzu.md @@ -92,7 +92,7 @@ For information on the Kubernetes Provider settings, refer to [Provider settings To add service-level annotations, use the following [overrides](../kubernetes/#overrides): -```config +```yaml serviceEndpoints: - name: "yb-master-service" type: "LoadBalancer" @@ -115,19 +115,19 @@ serviceEndpoints: To disable LoadBalancer, use the following overrides: -```configuration +```yaml enableLoadBalancer: False ``` To change the cluster domain name, use the following overrides: -```configuration +```yaml domainName: my.cluster ``` To add annotations at the StatefulSet level, use the following overrides: -```configuration +```yaml networkAnnotation: annotation1: 'foo' annotation2: 'bar' @@ -193,7 +193,7 @@ Depending on the cloud providers configured for your YugabyteDB Anywhere, you ca To provision in AWS or GCP cloud, your overrides should include the appropriate `provider_type` and `region_codes` as an array, as follows: -```configuration +```yaml { "universe_name": "cloud-override-demo", "provider_type": "gcp", # gcp for Google Cloud, aws for Amazon Web Service @@ -203,7 +203,7 @@ To provision in AWS or GCP cloud, your overrides should include the appropriate To provision in Kubernetes, your overrides should include the appropriate `provider_type` and `kube_provider` type, as follows: -```configuration +```yaml { "universe_name": "cloud-override-demo", "provider_type": "kubernetes", @@ -215,7 +215,7 @@ To provision in Kubernetes, your overrides should include the appropriate `provi To override the number of nodes, include the `num_nodes` with the desired value, and then include this parameter along with other parameters for the cloud provider, as follows: -```configuration +```yaml { "universe_name": "cloud-override-demo", "num_nodes": 4 # default is 3 nodes. @@ -226,7 +226,7 @@ To override the number of nodes, include the `num_nodes` with the desired value, To override the replication factor, include `replication` with the desired value, and then include this parameter along with other parameters for the cloud provider, as follows: -```configuration +```yaml { "universe_name": "cloud-override-demo", "replication": 5, @@ -240,7 +240,7 @@ To override the replication factor, include `replication` with the desired value To override the volume settings, include `num_volumes` with the desired value, as well as `volume_size` with the volume size in GB for each of those volumes. 
For example, to have two volumes with 100GB each, overrides should be specified as follows: -```configuration +```yaml { "universe_name": "cloud-override-demo", "num_volumes": 2, @@ -252,7 +252,7 @@ To override the volume settings, include `num_volumes` with the desired value, a To override the YugabyteDB software version to be used, include `yb_version` with the desired value, ensuring that this version exists in YugabyteDB Anywhere, as follows: -```configuration +```yaml { "universe_name": "cloud-override-demo", "yb_version": "1.1.6.0-b4"