NOT PORTED YET
- DevOps Bash tools for AWS, EKS, EC2 etc
- Install AWS CLI
- Set up access to EKS - Elastic Kubernetes Services
- EC2 Instances
- Get EC2 Console Output
- Add an EC2 EBS volume
- Resize an EC2 EBS volume
- Remove an EC2 EBS volume from a live running instance
- RDS - Relational Database Service
- Troubleshooting
- Diagrams
Follow the install doc or paste this to run an automated install script which auto-detects and handles Mac or Linux:
git clone https://github.com/HariSekhon/DevOps-Bash-tools bash-tools
bash-tools/install/install_aws_cli.sh
Then configure it, depending on whether you're using SSO or access keys etc.
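For example, a minimal sketch of the two common options (the profile name here is hypothetical):
aws configure sso                  # interactive SSO setup - AWS CLI v2
aws configure --profile myprofile  # prompts for access key, secret key, region and output format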
A common issue is failing to find resources in the UI or CLI.
Check your region in the top right of the UI or that your CLI is picking up the right region like so:
aws configure get region
and compare with:
aws ec2 describe-availability-zones --query "AvailabilityZones[0].RegionName" --output text
See eks.md
http://aws.amazon.com/ec2/instance-types/
https://aws.amazon.com/ec2/pricing/on-demand/
DO NOT USE T-series (T3 / T2) burstable general purpose instance types for anything besides your own personal PoC.
They can seize up under heavy load and are not recommended for any production workloads.
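To quickly check whether any of your existing instances are burstable types, a sketch assuming your default region is configured:
aws ec2 describe-instances \
    --filters "Name=instance-type,Values=t2.*,t3.*" \
    --query 'Reservations[*].Instances[*].[InstanceId,InstanceType]' \
    --output table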
Find the EC2 instance ID:
aws ec2 describe-instances \
--query 'Reservations[*].Instances[*].[InstanceId,Tags[?Key==`Name`].Value | [0],Placement.AvailabilityZone]' \
--output table
Debug if you're having issues rebooting a VM:
aws ec2 get-console-output --instance-id "$EC2_INSTANCE_ID" | jq -r .Output
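On Nitro-based instances you can also pass the --latest flag to request the most recent output rather than the buffered output from boot:
aws ec2 get-console-output --instance-id "$EC2_INSTANCE_ID" --latest | jq -r .Output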
This can also be useful for temporary space increases, eg. adding a big /tmp partition to allow some migration loads in an Informatica agent, which can be removed later (since you cannot shrink a partition later if you enlarge it instead).
Find out the zone the EC2 instance is in - you will need to create the EBS volume in the same zone:
aws ec2 describe-instances \
--query 'Reservations[*].Instances[*].[InstanceId,Tags[?Key==`Name`].Value | [0],Placement.AvailabilityZone]' \
--output table
Set the Availability Zone environment variable to use in further commands:
AVAILABILITY_ZONE=eu-west-1a # make sure this is same Availability Zone as the VM you want to attach it to
Choose a size in GB:
DISK_SIZE_GB=500
Create an EC2 EBS volume of 500GB in the eu-west-1a zone where the VM is:
REGION="${AVAILABILITY_ZONE%?}" # auto-infer the region by removing last character
aws ec2 create-volume \
--size "$DISK_SIZE_GB" \
--region "$REGION" \
--availability-zone "$AVAILABILITY_ZONE" \
--volume-type gp3
output:
{
    "AvailabilityZone": "eu-west-1a",
    "CreateTime": "2024-08-02T11:55:18+00:00",
    "Encrypted": false,
    "Size": 500,
    "SnapshotId": "",
    "State": "creating",
    "VolumeId": "vol-007e4d5f88a46fb6f",
    "Iops": 3000,
    "Tags": [],
    "VolumeType": "gp3",
    "MultiAttachEnabled": false,
    "Throughput": 125
}
Set the VolumeId field to a variable to use in further commands:
VOLUME_ID="vol-007e4d5f88a46fb6f"
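Alternatively, a sketch of capturing the VolumeId directly at creation time instead of copying it out of the JSON by hand:
VOLUME_ID="$(aws ec2 create-volume \
    --size "$DISK_SIZE_GB" \
    --region "$REGION" \
    --availability-zone "$AVAILABILITY_ZONE" \
    --volume-type gp3 \
    --query VolumeId \
    --output text)"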
Create a description variable to use in next command:
VOLUME_DESCRIPTION="informatica-prod-secure-agent-tmp-volume"
Name the new volume so you know what it is when you look at it in the UI in future:
aws ec2 create-tags \
--resources "$VOLUME_ID" \
--tags Key=Name,Value="$VOLUME_DESCRIPTION"
This can be done with zero downtime while the VM is running.
Look up the EC2 instance ID of the VM you want to attach it to:
aws ec2 describe-instances \
--query 'Reservations[*].Instances[*].[InstanceId, Tags[?Key==`Name`].Value | [0]]' \
--output table
Create a variable with the EC2 instance ID:
EC2_INSTANCE_ID="i-0a1234b5c6d7890e1"
Attach the new disk to the instance giving it a new device name, in this case /dev/sdb:
aws ec2 attach-volume --device /dev/sdb \
--instance-id "$EC2_INSTANCE_ID" \
--volume-id "$VOLUME_ID"
(you cannot specify /dev/nvme1 as the next disk you see on Nitro VMs, but if you specify /dev/sdb then it will appear as /dev/nvme1n1 anyway)
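Optionally wait for the attachment to complete before working inside the VM, using the EC2 volume-in-use waiter which polls until the volume is attached:
aws ec2 wait volume-in-use --volume-ids "$VOLUME_ID"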
Inside the VM - follow the Disk Management commands.
See if the new disk is available:
cat /proc/partitions
If you can't see it yet, run partprobe:
sudo partprobe
and then repeat the above cat /proc/partitions
(it has also appeared after a few seconds on EC2 without this)
Create a new GPT partition table on the new disk:
sudo parted /dev/nvme1n1 --script mklabel gpt
Create a new partition that spans the entire disk:
sudo parted /dev/nvme1n1 --script mkpart primary 0% 100%
See the new partition:
cat /proc/partitions
Format the partition with XFS:
sudo mkfs.xfs /dev/nvme1n1p1
Verify the new formatting:
lsblk -f /dev/nvme1n1
Since device numbers can change on rare occasion, find and use the UUID instead:
lsblk -o NAME,UUID
Edit /etc/fstab:
sudo vi /etc/fstab
and add a line like this, substituting the UUID from the above commands:
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /tmp xfs defaults,nofail 0 2
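Alternatively, a sketch of appending the line programmatically, assuming the new partition is /dev/nvme1n1p1 as above:
UUID="$(sudo blkid -s UUID -o value /dev/nvme1n1p1)"
echo "UUID=$UUID /tmp xfs defaults,nofail 0 2" | sudo tee -a /etc/fstab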
If the mount point is /tmp, make sure you first shut down any processes that might be using it, such as the Informatica agent.
Then mount it using this short form of the mount command, which tests the fstab at the same time:
sudo mount /tmp
If you see a hint like this:
mount: (hint) your fstab has been modified, but systemd still uses
       the old version; use 'systemctl daemon-reload' to reload.
then reload systemd:
sudo systemctl daemon-reload
Check new mounted partition and space is available:
df -Th /tmp
If you've just mounted a new /tmp, make sure to set the sticky bit and world-writable permissions so that users and apps are able to use it:
sudo chmod 1777 /tmp
Start back up any processes that you shut down before mounting the disk.
https://docs.aws.amazon.com/ebs/latest/userguide/recognize-expanded-volume-linux.html
Check the partition sizes by running this inside the EC2 VM shell:
lsblk
List EC2 EBS volumes using the script in the DevOps-Bash-tools repo:
aws_ec2_ebs_volumes.sh
or find it in the AWS Console UI:
open "https://$AWS_DEFAULT_REGION.console.aws.amazon.com/ec2/home?region=$AWS_DEFAULT_REGION#Volumes:"
Using the script in the DevOps-Bash-tools repo:
aws_ec2_ebs_create_snapshot_and_wait.sh "$volume_id" "before root partition expansion"
(this script automatically determines and prefixes the name of the EC2 instance to the description)
or manually create and keep checking for completion:
aws ec2 create-snapshot --volume-id "$volume_id" --description "myvm: before root partition expansion"
The snapshot may take a while. Watch its progress in the AWS Console UI here:
open "https://$AWS_DEFAULT_REGION.console.aws.amazon.com/ec2/home?region=$AWS_DEFAULT_REGION#Snapshots:"
or check for pending snapshots using AWS CLI:
aws ec2 describe-snapshots --query 'Snapshots[?State==`pending`].[SnapshotId,VolumeId,Description,State]' --output table
After the snapshot above is complete, run this script from the DevOps-Bash-tools repo:
aws_ec2_ebs_resize_and_wait.sh "$volume_id" "$size_in_gb"
or manually:
aws ec2 modify-volume --volume-id "$volume_id" --size "$size_in_gb"
and then repeatedly manually monitor the modification:
aws ec2 describe-volumes-modifications --volume-ids "$volume_id"
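A sketch of polling in a loop until the modification leaves the modifying state (the filesystem can be grown once it reaches optimizing or completed):
while aws ec2 describe-volumes-modifications --volume-ids "$volume_id" \
          --query 'VolumesModifications[0].ModificationState' --output text |
      grep -q modifying; do
    echo "still modifying..."
    sleep 10
done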
Double check which partition you want to enlarge by running this inside the EC2 VM shell:
lsblk
If the partition is number 4, then:
sudo growpart /dev/nvme0n1 4
output should look like this:
CHANGED: partition=4 start=1437696 old: size=417992671 end=419430366 new: size=627707871 end=629145566
verify the new size:
lsblk
Check the filesystem sizes and types:
df -hT
If it's Ext4, extend the filesystem like so:
sudo resize2fs /dev/nvme0n1p4
If it's XFS, extend the filesystem like so, in this case for the / root filesystem:
sudo xfs_growfs -d /
output should look like this:
meta-data=/dev/nvme0n1p4         isize=512    agcount=86, agsize=610431 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=0 inobtcount=0
data     =                       bsize=4096   blocks=52249083, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 52249083 to 78463483
Verify the new filesystem size:
df -hT
This is only for non-root volumes.
For example, if you want to replace the /tmp disk with a smaller one now that the data migration is complete.
IMPORTANT: First shut down any software in the VM using the volume to avoid data corruption
Inside the VM, unmount the volume, eg:
sudo umount /tmp
If you get an error like:
umount: /tmp: target is busy.
check:
lsof /tmp
or
fuser -mv /tmp
and kill those processes or ask users to log out if it's their shell session holding it.
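If you're certain nothing important is running from it, fuser can kill everything holding the filesystem in one go - use with caution:
sudo fuser -kmv /tmp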
If there is nothing left except the kernel mount itself:
                     USER        PID ACCESS COMMAND
/tmp:                root     kernel mount  /tmp
You may have to reboot the VM - in which case remove or comment out the disk's mount point entry (the /tmp entry in this case) from /etc/fstab first to prevent a possible boot time error.
You can do the detachment anyway, but the volume will still be visible in an ls -l /tmp and may require a reboot to clear the state and connection to the EBS volume.
WARNING: do not reboot the EC2 instance without commenting out the disk mount or setting the nofail option.
Otherwise you will be forced to do a disk mount recovery using another EC2 instance, as per the EC2 Disk Mount Recovery procedure in the troubleshooting section.
If you do that, beware that a Reboot instance may not succeed and you may need a Force Instance Stop cold shutdown and startup to clear the state, as a regular Reboot may get stuck starting up before SSH comes up to let you do anything.
From DevOps-Bash-tools, list instances and their EBS volumes:
aws_ec2_ebs_volumes.sh
Then detach the volume, specifying the device name it was attached as (eg. /dev/sdb):
aws ec2 detach-volume --volume-id "$VOLUME_ID" --instance-id "$EC2_INSTANCE_ID" --device "$DEVICE"
List unattached EBS volumes:
aws ec2 describe-volumes --query 'Volumes[?Attachments==`[]`].[VolumeId]' --output table
Optionally delete the EBS volume if you're 100% sure you don't need it any more:
aws ec2 delete-volume --volume-id "$VOLUME_ID"
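Before deleting, a quick sanity check that the volume really is detached - its state should be available:
aws ec2 describe-volumes --volume-ids "$VOLUME_ID" --query 'Volumes[0].State' --output text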
Hosted SQL RDBMS like MySQL, PostgreSQL, Microsoft SQL Server etc.
AWS CLI doesn't have a convenient short form for just listing instances, but you can get one like this:
aws rds describe-db-instances | jq -r '.DBInstances[].DBInstanceIdentifier'
with their statuses in a table:
aws rds describe-db-instances --query "DBInstances[*].[DBInstanceIdentifier,DBInstanceStatus]" --output table
(notice this is using an AWS CLI query, not jq - hence the different query string format)
Using the name returned from the above commands:
aws rds modify-db-instance \
--db-instance-identifier "$RDS_INSTANCE" \
--master-user-password "MyNewVerySecurePassword"
Make sure you are not using T-series (T3 / T2) burstable general purpose instance types.
Change to another instance type if you are.
When Status becomes Storage Full on the RDS home page, DB instance writes stop working due to no space to write DB redo logs for ACID compliance. Reads may still work during this time.
Solution: ensure Enable storage autoscaling is ticked and modify the instance to increase the Maximum Storage Threshold by a reasonable amount, no less than 20%.
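The same can be done from the CLI - a sketch using the $RDS_INSTANCE variable from above and a hypothetical 600GB threshold (setting --max-allocated-storage is what enables storage autoscaling):
aws rds modify-db-instance \
    --db-instance-identifier "$RDS_INSTANCE" \
    --max-allocated-storage 600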
After EKS Spot pod migrations, the app pod sometimes comes up before the Vault pod, so its attempt to get the DB password from Vault fails, resulting in a blank DB password and a later DB connection error.
In a Python Django app it may remain up but not functioning and its logs may contain Python tracebacks like this:
MySQLdb._exceptions.OperationalError: (1045, "Access denied for user 'myuser'@'x.x.x.x' (using password: YES)")
Restart the app deployment to restart the pod after the Vault pod has come up so that the pod re-fetches the correct DB password from Vault.
kubectl rollout restart deployment <app>
- Create an init container to accurately test for Vault availability before allowing the app pod to come up (see the sketch after this list)
- This can test Vault availability
- It can fetch DB password similar to what the app container does
- It can test that the fetched DB password actually works using a test connection to the DB
- The app itself could crash on startup if it detects that the DB connection fails, causing the pod to crash and auto-restart until the DB password is fetched and the connection succeeds
- The DB connection and implicitly the Vault password load could be tested by the entrypoint trying to connect to the DB before starting the app
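As a minimal sketch of the init container idea, the container could run a shell loop like this until Vault responds, assuming VAULT_ADDR points at the Vault service and using Vault's standard sys/health endpoint:
# block until Vault reports healthy (HTTP 2xx) - curl -f fails on non-2xx responses
until curl -sf "${VAULT_ADDR}/v1/sys/health" >/dev/null; do
    echo "waiting for Vault at ${VAULT_ADDR}..."
    sleep 5
done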
This is sometimes necessary when a Linux VM isn't coming up due to some disk change, such as detaching and deleting a volume that is still in /etc/fstab, or some other configuration imperfection that is preventing the boot process from completing to give you SSH access.
Use another EC2 instance in the same Availability Zone as the problematic VM which owns the disk where the EBS volume is physically located.
- Shut down the problem instance which isn't booting
- Optional: mark the instance with a tag Name1=Problem to make it easier to find
- Detach the EBS volume from the problem instance
- Find the volume (optionally using the Problem search in the list of EBS volumes)
- Attach the EBS volume to your debug EC2 instance in the same Availability Zone as device /dev/sdf
- On the debug instance:
Find the new disk - the partition you want is usually the largest partition on the new disk:
cat /proc/partitions
Mount it:
sudo mount /dev/xvdf4 /mnt
Edit the fstab:
sudo vi /mnt/etc/fstab
Add the nofail option to the mount options (4th field) of all disk lines, to ensure the Linux OS comes up even if it can't find a disk (because, for example, you've detached it to replace it with a different EBS volume).
The lines should end up looking like this:
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx / xfs defaults,nofail 0 0
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /tmp xfs defaults,nofail 0 2
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /boot xfs defaults,nofail 0 0
UUID=xxxx-xxxx /boot/efi vfat defaults,uid=0,gid=0,umask=077,shortname=winnt,nofail 0 2
After editing and saving the /etc/fstab file, unmount the recovery disk:
sudo umount /mnt
- Detach the volume from the debug instance
- Attach the volume to the original instance
- Start the original instance which should now come up
- Remove the Problem tag from the volume
Partial port from private Knowledge Base page 2012+