Watchguard #547

Merged
merged 6 commits into from
Jul 16, 2018
176 changes: 170 additions & 6 deletions aion.sh
@@ -5,24 +5,24 @@ cd "$(dirname $(realpath $0))"
KERVER=$(uname -r | grep -o "^4\.")

if [ "$KERVER" != "4." ]; then
-    echo "Warning! The linux kernel version must great or equal than 4."
+    echo "Warning! The linux kernel must be greater than or equal to version 4."
fi

HW=$(uname -m)

if [ "$HW" != "x86_64" ]; then
-    echo "Warning! Aion blockchain platform must be running on the 64 bits architecture"
+    echo "Warning! Aion blockchain platform must be running on 64 bits architecture"
fi

DIST=$(lsb_release -i | grep -o "Ubuntu")

if [ "$DIST" != "Ubuntu" ]; then
-    echo "Warning! Aion blockchain is fully compatible with the Ubuntu distribution. Your current system is not Ubuntu distribution. It may has some issues."
+    echo "Warning! Aion blockchain is fully compatible with Ubuntu distribution. Your current system is not Ubuntu distribution. It may have some issues."
fi

MAJVER=$(lsb_release -r | grep -o "[0-9][0-9]" | sed -n 1p)
if [ "$MAJVER" -lt "16" ]; then
-    echo "Warning! Aion blockchain is fully compatible with the Ubuntu version 16.04. Your current system is older than Ubuntu 16.04. It may has some issues."
+    echo "Warning! Aion blockchain is fully compatible with Ubuntu version 16.04. Your current system is older than Ubuntu 16.04. It may have some issues."
fi

ARG=$@
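
Note on the kernel check in this hunk: it greps `uname -r` for a literal `4.` prefix, so a 5.x or later kernel would also print the warning even though the message says "greater than or equal to version 4". A minimal sketch of a numeric comparison follows. The helper name `kernel_major_ok` is hypothetical and not part of this PR.

```shell
# Hypothetical helper, not part of this PR: succeeds when the major
# version of a kernel release string (as printed by `uname -r`) is >= 4.
kernel_major_ok() {
    local release
    local major
    release=$1
    major=${release%%.*}            # text before the first dot
    case $major in
        ''|*[!0-9]*) return 1 ;;    # empty or non-numeric: treat as unsupported
    esac
    [ "$major" -ge 4 ]
}

if ! kernel_major_ok "$(uname -r)"; then
    echo "Warning! The linux kernel must be greater than or equal to version 4."
fi
```

With this shape, `kernel_major_ok "5.4.0-42-generic"` succeeds and `kernel_major_ok "3.13.0-24-generic"` fails.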
@@ -37,5 +37,169 @@ ARG=$@
# add execute permission to rt
chmod +x ./rt/bin/*

-env EVMJIT="-cache=1" ./rt/bin/java -Xms4g \
-        -cp "./lib/*:./lib/libminiupnp/*:./mod/*" org.aion.Aion "$@"
+
+
+####### WATCHGUARD IMPLEMENTATION #######
+#                                       #
+#  To kill: ps aux | egrep "aion.sh"    #
+#                                       #
+#  Wait - reboot timer condition        #
+#  Sample - kernel checking rate        #
+#  Tolerance - kernel dead condition    #
+#  ThreadRate - thread checking rate    #
+#               (threadRate * sample)   #
+#                                       #
+#########################################
+
+# Enable "watch" as first command line argument
+guard=false
+if [[ $1 == "watch" ]]; then
+    first=true
+    for arg in $@; do
+        if $first; then
+            set --
+            first=false
+        else
+            set -- "$@" "$arg"
+        fi
+    done
+    guard=true
+fi
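
Note on the argument handling above: the loop rebuilds the positional parameters to drop the leading `watch`, and the unquoted `$@` splits any argument that contains spaces. A sketch of the same effect with the `shift` builtin is below. The `set --` line is only a stand-in for a real command line, and the flag shown is made up for illustration.

```shell
# Sketch only, not the PR's code.
# Stand-in for the real command line; "--flag" is a made-up argument.
set -- watch --flag "two words"

guard=false
if [ "${1:-}" = "watch" ]; then
    guard=true
    shift    # discard "watch"; the remaining arguments keep their quoting
fi

echo "guard=$guard remaining=$#"
```

After the `shift`, `"$@"` can be passed to the Java command unchanged.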

+if $guard; then
+
+    echo "######################"
+    echo " Watchguard Enabled "
+    echo "######################"
+    echo
+
+    wait=300 # sec
+    sample=30 # sec
+    tolerance=60 # sec
+    threadRate=2 # rate
+
+    noInterrupt=true
+    countRebounce=0
+    running=false
+    watching=false
+    lastBoot=0
+
+    trap "exit" SIGINT SIGTERM
+    trap "interrupt" EXIT
+    function interrupt() {
+
+        # Interrupts the Aion kernel and awaits shutdown complete
+        if $running; then
+            kill $kPID
+            temp=$(ps --pid $kPID | egrep -o "$kPID")
+            while [[ $temp -eq $kPID ]] ; do
+                sleep 2s
+                temp=$(ps --pid $kPID | egrep -o "$kPID")
+            done
+            running=false
+            watching=false
+            noInterrupt=false
+        fi
+
+        # Interrupts the watchguard (current process)
+        kill -9 $$
+
+    }

+    # Keep executing aion kernel until interrupted
+    while $noInterrupt; do
+
+        # Reboot timer
+        newBoot=$(date +"%s")
+        let "duration=$newBoot-$lastBoot"
+        if [[ $duration -lt $wait ]]; then
+            let "sleep=$wait-$duration"
+            echo "[Reboot Timer: $sleep]"
+            sleep $sleep
+        fi
+        lastBoot=$newBoot
+
+        # Execute Java kernel
+        env EVMJIT="-cache=1" ./rt/bin/java -Xms4g \
+            -cp "./lib/*:./lib/libminiupnp/*:./mod/*" org.aion.Aion "$@" &
+        kPID=$!
+        running=true
+        watching=true
+        checkRate=0
+        tPrev=0
+
+        # Locate logger detail
+        config=config/config.xml
+        logging=$(egrep -o "log-file.*log-file" $config | cut -d">" -f2 | cut -d"<" -f1)
+        logpath=$(egrep -o "log-path.*log-path" $config | cut -d">" -f2 | cut -d"<" -f1)
+        file=$logpath/aionCurrentLog.dat

+        # Watchguard
+        while $watching; do
+
+            sleep $sample
+
+            # [1] Log timestamp (last 60 sec) OR [2] PID process state ZOMBIE/DEAD
+            if $logging; then
+                last=$(stat $file | egrep "Modify\:" | cut -d" " -f3 | cut -d"." -f1)
+                lastUTC=$(date --date="$last" +"%s")
+                nowUTC=$(date +"%s")
+                if [[ $lastUTC -lt $((nowUTC-$tolerance)) ]] || (ps $kPID | cut -c 16-22 | egrep -v "STAT" | egrep -q "Z"); then
+                    echo "## KERNEL DEAD FOR $tolerance SEC ##"
+                    watching=false
+                fi
+            fi
+
+            # [1] Thread runtime (last 60 sec) AND [2] PID thread state BLOCKED
+            if [ $checkRate -eq $threadRate ]; then
+                let "duration=$sample*$threadRate"
+                temp='p2p-in sync-ib'
+                threads=($temp)
+                checkRate=0
+                for ((i=0; i<${#threads[@]}; ++i)); do
+                    tTime=$(ps --pid $kPID | egrep -o "[0-9]{2}\.[0-9]{2} ${threads[i]}" | cut -d" " -f1)
+                    tState=$(jstack -l $kPID | egrep -A1 "${threads[i]}" | egrep -o "State.*" | cut -d" " -f2)
+                    if [[ $tTime == ${tPrev[i]} ]] && [[ $tState == "BLOCKED" ]]; then
+                        echo "## ${threads[i]} THREAD DEAD ##"
+                        (jstack -l $kPID | egrep -A1 "${threads[i]}") > threadDump_$countRebounce.txt
+                        watching=false
+                    fi
+                    tPrev[i]=$tTime
+                done
+            else
+                ((checkRate++))
+            fi
+
+        done
+
+        # Shutsdown Aion kernel
+        echo "## Killing Kernel ##"
+        kill $kPID
Contributor

Question: could this script get stuck if the kernel does not respond to the kill (SIGINT?) request?

Is there anywhere that escalates to SIGKILL, for example if the kernel is still alive after 1 minute?

Contributor Author

I'll add a shutdown timer to force the script to kill the process.
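
A minimal sketch of the escalation discussed in this thread, for reference. `kill` with no signal option sends SIGTERM (not SIGINT), so the idea is to send SIGTERM, poll for a bounded time, and only then send SIGKILL. The function name `stop_with_timeout` and its arguments are hypothetical. It probes the process with `kill -0` rather than parsing `ps` output, which is a swapped-in technique and not what the PR does.

```shell
# Hypothetical helper, not the PR's code: send SIGTERM, poll until the
# process is gone, and escalate to SIGKILL once the timeout is reached.
stop_with_timeout() {
    local pid
    local timeout
    local step
    local waited
    pid=$1
    timeout=${2:-60}    # seconds to wait before escalating
    step=${3:-2}        # polling interval in seconds
    waited=0

    kill "$pid" 2>/dev/null || return 0       # process is already gone
    while kill -0 "$pid" 2>/dev/null; do      # signal 0 only tests for existence
        if [ "$waited" -ge "$timeout" ]; then
            kill -9 "$pid" 2>/dev/null || true
            break
        fi
        sleep "$step"
        waited=$((waited + step))
    done
    wait "$pid" 2>/dev/null || true           # reap the child if it is ours
    return 0
}
```

In the watchguard loop this could be called as `stop_with_timeout "$kPID" 60 2`, which bounds the wait instead of looping until the process disappears.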

+        timer=0
+        temp=$(ps --pid $kPID | egrep -o "$kPID")
+        while [[ $temp -eq $kPID ]] ; do
+
+            # If shutdown exceeds 1 minute
+            ((timer+=2))
+            if [[ $timer -ge 60 ]]; then
+                kill -9 $kPID
+            fi
+
+            sleep 2s
+            temp=$(ps --pid $kPID | egrep -o "$kPID")
+
+        done
+        running=false
+
+        ((countRebounce++))
+        echo
+        echo "############################## REBOUNCE COUNT [$countRebounce] ##############################"
+
+    done
+
+else
+
+    env EVMJIT="-cache=1" ./rt/bin/java -Xms4g \
+        -cp "./lib/*:./lib/libminiupnp/*:./mod/*" org.aion.Aion "$@"
+
+fi
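
Note on the log-timestamp check in the watchguard loop: it parses the human-readable `stat` output and feeds the time portion back through `date --date`. A sketch of the same staleness test using the modification time in epoch seconds is below. It assumes GNU coreutils (`stat -c %Y`, as on Ubuntu). The helper name `log_is_stale` is hypothetical, and treating a missing file as stale is a design assumption of this sketch, not something the PR states.

```shell
# Hypothetical helper, assuming GNU coreutils: `stat -c %Y` prints the
# modification time as seconds since the epoch.
# Succeeds when FILE is missing or was last modified more than
# TOLERANCE seconds ago.
log_is_stale() {
    local file
    local tolerance
    local modified
    local now
    file=$1
    tolerance=$2
    modified=$(stat -c %Y "$file" 2>/dev/null) || return 0   # missing file counts as stale
    now=$(date +%s)
    [ $((now - modified)) -gt "$tolerance" ]
}
```

A call such as `if log_is_stale "$file" "$tolerance"; then watching=false; fi` would slot into the sampling loop.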
22 changes: 17 additions & 5 deletions modBoot/src/org/aion/Aion.java
@@ -22,6 +22,7 @@
*/
package org.aion;

+import static java.lang.System.exit;
import static org.aion.crypto.ECKeyFac.ECKeyType.ED25519;
import static org.aion.crypto.HashUtil.H256Type.BLAKE2B_256;
import static org.aion.zero.impl.Version.KERNEL_VERSION;
@@ -60,7 +61,7 @@ public static void main(String args[]) {
         CfgAion cfg = CfgAion.inst();
         if (args != null && args.length > 0) {
             int ret = new Cli().call(args, cfg);
-            System.exit(ret);
+            exit(ret);
         }

         /*
@@ -78,15 +79,26 @@ public static void main(String args[]) {
             throw e;
         }

+        /*
+         * Ensuring valid UUID in the config.xml
+         * Valid UUID: 32 Hex in 5 Groups [0-9A-F]
+         * 00000000-0000-0000-0000-000000000000
+         */
+        String UUID = cfg.getId();
+        if (!UUID.matches("[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")) {
+            System.out.println("Invalid UUID; please check <id> setting in config.xml");
+            exit(-1);
+        }
+
         /* Outputs relevant logger configuration */
         if (!cfg.getLog().getLogFile()) {
             System.out
-                .println("Logger disabled; to enable please check log settings in config.xml\n");
+                .println("Logger disabled; to enable please check <log> settings in config.xml");
         } else if (!cfg.getLog().isValidPath() && cfg.getLog().getLogFile()) {
-            System.out.println("File path is invalid; please check log setting in config.xml\n");
+            System.out.println("Invalid file path; please check <log> setting in config.xml");
             return;
         } else if (cfg.getLog().isValidPath() && cfg.getLog().getLogFile()) {
-            System.out.println("Logger file path: '" + cfg.getLog().getLogPath() + "'\n");
+            System.out.println("Logger file path: '" + cfg.getLog().getLogPath() + "'");
         }

         /*
@@ -212,4 +224,4 @@ private ShutdownThreadHolder(Thread zmqThread, IMineRunner nm, ProtocolProcessor

}, "shutdown"));
}
-}
+}
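
Note on the UUID check added in Aion.java: the comment lists `[0-9A-F]`, but the pattern passed to `matches` accepts lowercase hex only, so an uppercase `<id>` would be rejected. A small shell sketch that mirrors the same pattern can be used to try candidate ids before starting the kernel. The helper name `is_valid_id` is hypothetical; `grep -x` is used so the whole line must match, as with Java's `String.matches`.

```shell
# Hypothetical helper mirroring the pattern from Aion.java: lowercase hex
# in 8-4-4-4-12 groups. Uppercase digits are rejected, as in the Java check.
is_valid_id() {
    printf '%s\n' "$1" |
        LC_ALL=C grep -Eqx '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
}
```

For example, `is_valid_id "00000000-0000-0000-0000-000000000000"` succeeds, while the same id written with uppercase hex digits fails.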