Add scripts and documentation on Cassandra operation #172

Merged
merged 1 commit into from
May 19, 2020
5 changes: 5 additions & 0 deletions README.md
@@ -119,6 +119,11 @@
1. `docker-compose build`
1. `docker-compose up -d`

## Cassandra

Documentation and scripts to deploy and operate cassandra in
production are available in [scripts/cassandra](scripts/cassandra).

## Backup and restore

In the scripts/ folder there are backup and restore scripts for docker postgres.
274 changes: 274 additions & 0 deletions scripts/cassandra/README.md
@@ -0,0 +1,274 @@
Cassandra Operation
=================

* [Installing cassandra](#installing-cassandra)
* [Configuring cassandra](#configuring-cassandra)
* [Starting and stopping cassandra](#starting-and-stopping-cassandra)
  * [Starting Cassandra](#starting-cassandra)
  * [Stopping Cassandra](#stopping-cassandra)
  * [Enabling cassandra service to auto-start on boot](#enabling-cassandra-service-to-auto-start-on-boot)
* [Adding a new cassandra node](#adding-a-new-cassandra-node)
* [Migrating cassandra to another host](#migrating-cassandra-to-another-host)
* [Upgrading Cassandra](#upgrading-cassandra)
  * [Before the upgrade](#before-the-upgrade)
  * [Minor upgrade](#minor-upgrade)
  * [Major upgrade](#major-upgrade)
    * [Testing the upgrade](#testing-the-upgrade)
    * [Performing the major upgrade](#performing-the-major-upgrade)
  * [Rollback from upgrade](#rollback-from-upgrade)
  * [After the upgrade](#after-the-uprade)
* [Backup & Restore](#backup--restore)
  * [Backup](#backup)
  * [Restore](#restore)
* [Repair data after node failure or backup recovery](#repair-data-after-node-failure-or-backup-recovery)

# Installing cassandra

1. Copy the sample `env.example` file to `.env` and edit variables with the proper configuration for this node:

`cp env.example .env`

2. Install cassandra version `X.Y.Z` with the following command:

`./install_cassandra.sh X.Y.Z`

By default the script runs in dry-run mode so you can double-check the install and configuration commands before anything is changed.
Pass the `-x` flag (e.g. `./install_cassandra.sh -x X.Y.Z`) to perform the actual installation.
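
A dry-run default like this is commonly implemented with a small wrapper that either prints or executes each command. The following is a minimal sketch of that pattern, not the actual contents of `install_cassandra.sh`; the `run` and `parse_flags` functions and the `EXECUTE` variable are hypothetical names chosen for illustration.

```shell
# Hypothetical sketch of a dry-run wrapper; not taken from install_cassandra.sh.
# EXECUTE=0 (the default) prints commands, EXECUTE=1 runs them.
EXECUTE=${EXECUTE:-0}

run() {
    if [ "$EXECUTE" = "1" ]; then
        "$@"
    else
        # Dry run: show what would be executed without changing anything.
        echo "[dry-run] $*"
    fi
}

# Recognise the -x flag and switch from dry-run to execute mode.
parse_flags() {
    OPTIND=1
    while getopts "x" opt; do
        case "$opt" in
            x) EXECUTE=1 ;;
            *) return 1 ;;
        esac
    done
}
```

With this shape, every state-changing step is written as `run <command>`, so forgetting the flag can never modify the system.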

Cassandra is installed in the `/opt/apache-cassandra-X.Y.Z` directory, and `/opt/cassandra` is created as a symlink
pointing to it. The cassandra service is registered with systemd and is disabled by default.
The `nodetool` and `cqlsh` commands are placed in `/usr/local/bin`.

The following variables are set during install:
* CASSANDRA\_HOME=/opt/cassandra
* CASSANDRA\_CONF=/opt/cassandra/conf
* CASSANDRA\_LOG\_DIR=/var/log/cassandra
* cassandra\_storagedir=/var/lib/cassandra

This installation is targeted at Ubuntu systems and was tested with Ubuntu 20.04.

# Configuring cassandra

The installation script automatically configures cassandra according to the parameters specified in the `.env` file.
To change the configuration later, update the `.env` file and run the following command:

`./configure_cassandra.sh -x`

Note that the `-x` flag must be specified, otherwise the script runs in dry-run mode.

The important parameters to specify are CLUSTER\_NAME, SEEDS, LISTEN\_ADDRESS and RPC\_ADDRESS.

Note that any parameters set manually in `/opt/cassandra/conf/cassandra.yaml` may be lost when this script is
run, so it's important to update the script to take new parameters into account.

# Starting and stopping cassandra

## Starting Cassandra

Start the cassandra service with:

`sudo service cassandra start`

Check that cassandra was started without errors by inspecting the log on `/var/log/cassandra/system.log`.
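
Inspecting the log can be scripted. The sketch below assumes Cassandra's default log pattern, in which every entry starts with its level (`INFO`, `WARN`, `ERROR`); the function name `log_has_no_errors` is hypothetical.

```shell
# Print ERROR entries from a Cassandra log.
# Returns 0 when none are found, 1 when errors exist, 2 when the log is unreadable.
# Assumes the default log pattern, where each entry starts with its level.
log_has_no_errors() {
    log_file=${1:-/var/log/cassandra/system.log}
    [ -r "$log_file" ] || return 2
    if grep -E '^ERROR' "$log_file"; then
        return 1
    fi
    return 0
}
```

For example, `log_has_no_errors && echo "startup looks clean"` prints the message only when the log contains no `ERROR` entries.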

## Stopping Cassandra

Before stopping cassandra, it's recommended to drain the node so all data is flushed to disk with:

`nodetool drain`

After the node is drained, stop the cassandra service with:

`sudo service cassandra stop`
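
The two steps can be combined so that the service is stopped only after a successful drain. This is a small sketch; the function name `stop_cassandra` is hypothetical.

```shell
# Drain the local node, then stop the service only if the drain succeeded.
stop_cassandra() {
    if ! nodetool drain; then
        echo "nodetool drain failed; not stopping cassandra" >&2
        return 1
    fi
    sudo service cassandra stop
}
```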

## Enabling cassandra service to auto-start on boot

Use the following command to configure the node to automatically start cassandra if the node is restarted:

`sudo systemctl enable cassandra.service`

# Adding a new cassandra node

New nodes must be added when there are performance issues on the cluster or when disk usage reaches around 75%.
To add a new node, install cassandra according to the instructions above and set
CLUSTER\_NAME and SEEDS to point to the cluster you want the new node to join.
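
The disk-capacity guideline can be checked with standard tools. The sketch below reads the usage of the filesystem holding the storage directory and compares it with the 75% threshold; the function names are hypothetical.

```shell
# Print the used-space percentage (without the % sign) of the filesystem holding a directory.
disk_usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when usage has reached the threshold (default 75, per the guideline above).
needs_new_node() {
    usage=$1
    threshold=${2:-75}
    [ "$usage" -ge "$threshold" ]
}

# Example:
# needs_new_node "$(disk_usage_pct /var/lib/cassandra)" && echo "Plan a new node"
```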

# Migrating cassandra to another host

1. Install and configure the desired version of cassandra on the new server with:

`./install_cassandra.sh -x <version>`

Make sure the new server IP is configured in the `.env` file.

2. Create the storage directory `/var/lib/cassandra` on the new host with:

`mkdir -p /var/lib/cassandra`

3. On the new host, copy the data directory from the old host with rsync while the old node is still running:

`rsync -aczP --stats user@old_host:/var/lib/cassandra/data /var/lib/cassandra/`

4. Drain and stop the original cassandra node being migrated:

```
nodetool drain
service cassandra stop
```

5. Run rsync again on the new host so the data written since the first pass is copied over:

`rsync -aczP --stats user@old_host:/var/lib/cassandra/data /var/lib/cassandra/`

6. Make sure the newly copied data is owned by the `cassandra` user:

`chown -R cassandra:cassandra /var/lib/cassandra/`

7. Start cassandra on the new host with:

`service cassandra start`

8. Check the system log for any errors during startup:

`tail -f /var/log/cassandra/system.log`

9. Check that the migrated node's IP is seen by the other hosts with:

`nodetool status`
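
The final check can be automated by parsing the `nodetool status` output, where each node line starts with a two-letter state (`UN` means Up/Normal) followed by the node address. This is a sketch; the function name `node_is_up` and the example IP are hypothetical.

```shell
# Succeed if the given IP is listed as Up/Normal (UN) in `nodetool status` output.
# Reads the status output from stdin so it can be tried without a live cluster.
node_is_up() {
    awk -v ip="$1" '$2 == ip && $1 == "UN" { found = 1 } END { exit !found }'
}

# Example:
# nodetool status | node_is_up 10.0.0.12 && echo "migrated node is up"
```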

# Upgrading Cassandra

## Before the upgrade

It's recommended to take a snapshot of all tables before the upgrade, as a precaution in case something goes wrong:

`nodetool snapshot`

This creates a snapshot in `/var/lib/cassandra/data/<keyspace>/<table>/snapshots/<snapshot_id>` for each table.
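
Giving the snapshot an explicit tag makes it easier to find and to clear later. The sketch below uses the `-t` option of `nodetool snapshot` and then lists the snapshots on the node; the `pre-upgrade-<timestamp>` naming scheme and the function names are assumptions made for illustration.

```shell
# Build a descriptive, timestamped snapshot tag (hypothetical naming scheme).
snapshot_tag() {
    echo "pre-upgrade-$(date +%Y%m%d-%H%M%S)"
}

# Take a tagged snapshot of all keyspaces, then list the snapshots on this node.
take_upgrade_snapshot() {
    tag=$(snapshot_tag)
    nodetool snapshot -t "$tag" && nodetool listsnapshots
}
```

A tagged snapshot can later be removed on its own with `nodetool clearsnapshot -t <tag>`.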

## Minor upgrade

Upgrading Cassandra between patch versions (e.g. from version 3.11.2 to 3.11.7) is very simple.
The upgrade must be performed as a rolling restart, one node at a time, with the following steps on each node:

1. Drain and stop cassandra:

```
nodetool drain
service cassandra stop
```

2. Install the new cassandra version with:

`./install_cassandra.sh -x <version>`

3. Start the cassandra process on the new version:

`service cassandra start`

4. Check logs and application to verify everything is running correctly before upgrading the next node.
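
As part of the verification step, the running version can be compared with the one that was just installed. `nodetool version` prints a line such as `ReleaseVersion: 3.11.7`; the function name `verify_version` is hypothetical.

```shell
# Compare the version reported by the local node with the expected one.
# `nodetool version` prints a line such as "ReleaseVersion: 3.11.7".
verify_version() {
    expected=$1
    actual=$(nodetool version | awk '/^ReleaseVersion:/ { print $2 }')
    if [ "$actual" != "$expected" ]; then
        echo "expected $expected but node reports ${actual:-unknown}" >&2
        return 1
    fi
}
```

Running `verify_version 3.11.7` after step 3 gives a quick signal before moving on to the next node.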

## Major upgrade

### Testing the upgrade

Since the data format can change between major versions, it is recommended to test the upgrade on a separate node
before performing an upgrade between major versions (e.g. from 3.11 to 4.0). Perform the following steps to test a major upgrade:

1. Install and configure the desired version of cassandra on the new server with:

`./install_cassandra.sh -x <version>`

Make sure the new server has a different CLUSTER\_NAME specified in the `.env` file
so it doesn't accidentally join the old cluster.

2. Export the kairosdb schema from a running node:

`cqlsh -u USER -p PASS -e "DESC KEYSPACE kairosdb" NODE_IP > kairosdb_schema.cql`

3. Start cassandra on the new node with:

`service cassandra start`

4. Import the exported schema by opening cqlsh and running the `SOURCE` command:

```
cqlsh -u USER -p PASS NODE_IP
SOURCE 'kairosdb_schema.cql'
```

5. Stop the cassandra node:

`service cassandra stop`

6. Copy the kairosdb data files from a node running the previous version:

`rsync -aczP --stats user@old_host:/var/lib/cassandra/data/kairosdb/ /var/lib/cassandra/data/kairosdb`

7. Start the cassandra service:

`service cassandra start`

8. Run some queries via cqlsh, or point the staging application to this cassandra server, and verify
the data is read correctly without errors.

### Performing the major upgrade

The steps are the same as for a minor upgrade, except that once the new version is running
you must run the following command on each node before moving on to the next one:

`nodetool upgradesstables`

This command ensures the data files are upgraded to the newer format and may take a while to run.
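
To decide which procedure applies, the two version strings can be compared. The sketch below assumes, following the examples in this document, that a change within the same `major.minor` series (3.11.2 to 3.11.7) is a minor upgrade and anything else (3.11 to 4.0) is a major one; the function name `is_major_upgrade` is hypothetical.

```shell
# Succeed when OLD and NEW differ in their major.minor series (e.g. 3.11 -> 4.0).
# Patch-level changes within the same series return 1.
is_major_upgrade() {
    old_series=$(echo "$1" | cut -d. -f1,2)
    new_series=$(echo "$2" | cut -d. -f1,2)
    [ "$old_series" != "$new_series" ]
}

# Example:
# if is_major_upgrade 3.11.7 4.0.0; then nodetool upgradesstables; fi
```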

## Rollback from upgrade

Failed upgrades are very unlikely, but if one happens, perform the following steps to roll back a node:

1. Stop the cassandra server that was upgraded:

`service cassandra stop`

2. Restore the old version with:

`ln -snf /opt/apache-cassandra-<old-version> /opt/cassandra`

3. For every table, replace the data in `/var/lib/cassandra/data/<keyspace>/<table>` with the contents of `/var/lib/cassandra/data/<keyspace>/<table>/snapshots/<snapshot_id>`.

4. Clean all data from `/var/lib/cassandra/data/commitlogs`, `/var/lib/cassandra/data/hints` and `/var/lib/cassandra/data/saved_caches`.

5. Start cassandra:

`service cassandra start`
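
Replacing the table data with the snapshot contents is tedious to do by hand for many tables. The following is a rough sketch of how it could be scripted, assuming Cassandra's default on-disk layout `<data_dir>/<keyspace>/<table>/snapshots/<tag>` and a stopped node; `restore_snapshot` is a hypothetical helper, and it is worth trying on a copy of the data first.

```shell
# Replace the live SSTables of every table with the files from one snapshot.
# Assumes the layout <data_dir>/<keyspace>/<table>/snapshots/<tag> and that
# cassandra is stopped. Hypothetical helper; try it on a copy first.
restore_snapshot() {
    data_dir=$1
    tag=$2
    for snap in "$data_dir"/*/*/snapshots/"$tag"; do
        [ -d "$snap" ] || continue
        table_dir=${snap%/snapshots/*}
        # Delete only regular files directly in the table directory;
        # the snapshots/ and backups/ subdirectories are left alone.
        find "$table_dir" -maxdepth 1 -type f -exec rm -f {} +
        # Copy the snapshot files back, preserving mode and timestamps.
        find "$snap" -maxdepth 1 -type f -exec cp -p {} "$table_dir"/ \;
    done
}

# Example:
# restore_snapshot /var/lib/cassandra/data <snapshot_id>
```

If the script runs as root, the restored files need to be owned by the `cassandra` user again before the service is started.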

## After the uprade

If everything goes well with the upgrade, don't forget to clean the snapshot files with:

`nodetool clearsnapshot`

# Backup & Restore

## Backup

The simplest way to back up a cassandra node is to use the cloud provider's VM snapshot feature.

## Restore

After restoring the VM snapshot from your cloud provider, update the node's IP address
in the `LISTEN_ADDRESS`, `RPC_ADDRESS` and `SEEDS` variables of the `.env` file and run the following command:

`./configure_cassandra.sh -x`

After that start the cassandra process with:

`service cassandra start`

After restoring the node from backup it's recommended to run repair (as instructed below).
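
Updating the addresses in `.env` can be scripted when several nodes are restored. The sketch below assumes the file consists of plain `KEY=VALUE` lines, which is not confirmed by this document; `set_env_var` is a hypothetical helper, and it moves the updated key to the end of the file.

```shell
# Set KEY=VALUE in an env file, replacing an existing line or appending a new one.
# Assumes plain KEY=VALUE lines; the exact format of .env is an assumption.
set_env_var() {
    file=$1
    key=$2
    value=$3
    [ -f "$file" ] || return 1
    tmp=$(mktemp) || return 1
    # Copy every line except an existing assignment of KEY, then append the new one.
    grep -v "^${key}=" "$file" > "$tmp" || true
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example:
# set_env_var .env LISTEN_ADDRESS 10.0.0.9
```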

# Repair data after node failure or backup recovery

If a node is down for longer than 3 hours (max\_hint\_window\_in\_ms), run the following command to
make the node synchronize data with other nodes in the cluster:

`./repair_node.sh USERNAME PASSWORD NODE_PRIVATE_IP`
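
The 3-hour limit corresponds to 10800000 milliseconds, the value stated above for `max_hint_window_in_ms`. A small sketch of the decision, with a hypothetical function name and the downtime given in seconds:

```shell
# Hint window: 3 hours expressed in milliseconds (max_hint_window_in_ms).
MAX_HINT_WINDOW_MS=$((3 * 60 * 60 * 1000))

# Succeed when a node's downtime (in seconds) exceeded the hint window,
# meaning hints alone cannot bring it back in sync and a repair is needed.
needs_repair() {
    downtime_ms=$(($1 * 1000))
    [ "$downtime_ms" -gt "$MAX_HINT_WINDOW_MS" ]
}

# Example: a node that was down for 4 hours
# needs_repair $((4 * 3600)) && echo "run repair_node.sh"
```
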
30 changes: 30 additions & 0 deletions scripts/cassandra/cassandra-rackdc.properties
@@ -0,0 +1,30 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# These properties are used with GossipingPropertyFileSnitch and will
# indicate the rack and dc for this node
#
# When upgrading from SimpleSnitch, you will need to set your initial machines
# to have rack=rack1
dc=DC1
rack=RAC1

# Add a suffix to a datacenter name. Used by the Ec2Snitch and Ec2MultiRegionSnitch
# to append a string to the EC2 region name.
#dc_suffix=

# Uncomment the following line to make this snitch prefer the internal ip when possible, as the Ec2MultiRegionSnitch does.
# prefer_local=true
17 changes: 17 additions & 0 deletions scripts/cassandra/cassandra-settings.service
@@ -0,0 +1,17 @@
[Unit]
Description=Cassandra Recommended Settings
DefaultDependencies=no
After=sysinit.target local-fs.target
Before=cassandra.service

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/defrag && \
echo mq-deadline > /sys/block/sda/queue/scheduler && \
echo 0 > /sys/class/block/sda/queue/rotational && \
echo 8 > /sys/class/block/sda/queue/read_ahead_kb && \
echo 0 > /proc/sys/vm/zone_reclaim_mode && \
echo 32 > /sys/block/sda/queue/nr_requests'

[Install]
WantedBy=basic.target
9 changes: 9 additions & 0 deletions scripts/cassandra/cassandra-sysctl.conf
@@ -0,0 +1,9 @@
# Settings from https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/install/installRecommendSettings.html
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 40960
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
vm.max_map_count = 1048575
26 changes: 26 additions & 0 deletions scripts/cassandra/cassandra.service
@@ -0,0 +1,26 @@
# /usr/lib/systemd/system/cassandra.service

[Unit]
Description=Cassandra
After=network.target
StartLimitInterval=200
StartLimitBurst=5

[Service]
Type=forking
PIDFile=/var/lib/cassandra/cassandra.pid
User=cassandra
Group=cassandra
Environment="CASSANDRA_INCLUDE=/opt/cassandra/cassandra.in.sh"
PassEnvironment="CASSANDRA_INCLUDE"
ExecStart=/opt/cassandra/bin/cassandra -p /var/lib/cassandra/cassandra.pid
Restart=always
RestartSec=10
SuccessExitStatus=143
LimitMEMLOCK=infinity
LimitNOFILE=10000
LimitNPROC=32768
LimitAS=infinity

[Install]
WantedBy=multi-user.target