Watchguard #547

Merged
merged 6 commits into from
Jul 16, 2018
Changes from 2 commits
176 changes: 170 additions & 6 deletions aion.sh
@@ -5,24 +5,24 @@ cd "$(dirname $(realpath $0))"
KERVER=$(uname -r | grep -o "^4\.")

if [ "$KERVER" != "4." ]; then
echo "Warning! The linux kernel version must great or equal than 4."
echo "Warning! The linux kernel must be greater than or equal to version 4."
fi

HW=$(uname -m)

if [ "$HW" != "x86_64" ]; then
echo "Warning! Aion blockchain platform must be running on the 64 bits architecture"
echo "Warning! Aion blockchain platform must be running on 64 bits architecture"
fi

DIST=$(lsb_release -i | grep -o "Ubuntu")

if [ "$DIST" != "Ubuntu" ]; then
echo "Warning! Aion blockchain is fully compatible with the Ubuntu distribution. Your current system is not Ubuntu distribution. It may has some issues."
echo "Warning! Aion blockchain is fully compatible with Ubuntu distribution. Your current system is not Ubuntu distribution. It may have some issues."
fi

MAJVER=$(lsb_release -r | grep -o "[0-9][0-9]" | sed -n 1p)
if [ "$MAJVER" -lt "16" ]; then
echo "Warning! Aion blockchain is fully compatible with the Ubuntu version 16.04. Your current system is older than Ubuntu 16.04. It may has some issues."
echo "Warning! Aion blockchain is fully compatible with Ubuntu version 16.04. Your current system is older than Ubuntu 16.04. It may have some issues."
fi

ARG=$@
@@ -37,5 +37,169 @@ ARG=$@
# add execute permission to rt
chmod +x ./rt/bin/*

env EVMJIT="-cache=1" ./rt/bin/java -Xms4g \
-cp "./lib/*:./lib/libminiupnp/*:./mod/*" org.aion.Aion "$@"
#env EVMJIT="-cache=1" ./rt/bin/java -Xms4g \
# -cp "./lib/*:./lib/libminiupnp/*:./mod/*" org.aion.Aion "$@" &
Contributor

Delete the commented-out code if it is not needed.



####### WATCHGUARD IMPLEMENTATION #######
# #
# To kill: ps aux | egrep "aion.sh" #
# #
# Wait - reboot timer condition #
# Sample - rebounce checking rate #
# Tolerance - kernel dead condition #
# ThreadRate - frequency of checking #
# thread dead condition (sample) #
# #
#########################################

# Enable "watch" as first command line argument
guard=false
if [[ $1 == "watch" ]]; then
first=true
for arg in $@; do
if $first; then
set --
first=false
else
set -- "$@" "$arg"
fi
done
guard=true
fi

if $guard; then

wait=300 # sec
sample=30 # sec
tolerance=60 # sec
threadRate=3 # rate
#wait=30
#sample=1
#tolerance=1
#threadRate=20
Contributor

Delete the commented-out values if they are not needed.

Contributor Author

Whoops, I forgot to remove these after testing.


config=config/config.xml
logging=$(egrep -o "log-file.*log-file" $config | cut -d">" -f2 | cut -d"<" -f1)
logpath=$(egrep -o "log-path.*log-path" $config | cut -d">" -f2 | cut -d"<" -f1)
file=$logpath/aionCurrentLog.dat

noInterrupt=true
countRebounce=0
running=false
watching=false
lastBoot=0

trap "exit" INT TERM
trap "interrupt" EXIT
function interrupt() {

# Interrupts the Aion kernel and awaits shutdown complete
if $running; then
kill $kPID
temp=$(top -n1 -p $kPID | egrep -o "$kPID")
Contributor

Minor suggestion: I would use something like "ps --pid $kPID" instead of "top -n1 -p $kPID" (here and everywhere else top is used).

The top command prints memory statistics and other information that the egrep might pick up by accident.
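
The suggestion above can be taken one step further by not parsing any command output at all. The following is a minimal sketch, assuming bash and that the kernel was started by the same script; it swaps in "kill -0" rather than the "ps --pid" the reviewer proposed, and is_alive and wait_for_exit are hypothetical helper names that do not appear in aion.sh.

```shell
#!/usr/bin/env bash
# Sketch only: is_alive and wait_for_exit are hypothetical helpers,
# not functions from aion.sh.

# Succeeds while a process with the given PID exists.
is_alive() {
    # kill -0 delivers no signal; it only reports whether the PID
    # exists and may be signalled by the current user.
    kill -0 "$1" 2>/dev/null
}

# Polls until the process is gone.
# $1 = PID, $2 = polling interval in seconds (assumed default: 2).
wait_for_exit() {
    local pid=$1 interval=${2:-2}
    while is_alive "$pid"; do
        sleep "$interval"
    done
}
```

With helpers like these, the polling loops in the diff could be written as wait_for_exit "$kPID" after the kill, and no memory figure or other number in the output can be mistaken for the PID.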

while [[ $temp -eq $kPID ]] ; do
sleep 2s
temp=$(top -n1 -p $kPID | egrep -o "$kPID")
done
running=false
watching=false
noInterrupt=false
fi

# Removes remnant processes accessing kernel logfile
if $logging; then
Contributor

What remnant processes do you expect this to find?

Contributor Author

I had to use this when I was trying to kill the kernel from the command line: kill only killed the script, not the kernel. That issue should be fixed now, so I'll remove this.

temp=$(lsof $file | egrep "java" | cut -c 9-13)
remnants=($temp)
for ((i=0; i<${#remnants[@]}; ++i)); do
kill ${remnants[i]}
done
fi

# Interrupts the watchguard (current process)
kill $$

}

# Keep executing aion kernel until interrupted
while $noInterrupt; do

# Reboot timer
newBoot=$(date +"%s")
let "duration=$newBoot-$lastBoot"
if [[ $duration -lt $wait ]]; then
let "sleep=$wait-$duration"
echo "[Reboot Timer: $sleep]"
sleep $sleep
fi
lastBoot=$newBoot
echo

# Execute Java kernel
env EVMJIT="-cache=1" ./rt/bin/java -Xms4g \
-cp "./lib/*:./lib/libminiupnp/*:./mod/*" org.aion.Aion "$@" &
kPID=$!
running=true
watching=true
checkRate=0
tPrev=0

# Watchguard
while $watching; do

sleep $sample

# [1] Log timestamp (last 60 sec) OR [2] PID process state ZOMBIE/DEAD
if $logging; then
last=$(stat $file | egrep "Modify\:" | cut -d" " -f3 | cut -d"." -f1)
lastUTC=$(date --date="$last" +"%s")
nowUTC=$(date +"%s")
if [[ $lastUTC -lt $((nowUTC-$tolerance)) ]] || (ps $kPID | cut -c 16-22 | egrep -v "STAT" | egrep -q "Z"); then
echo "## KERNEL DEAD FOR $tolerance SEC ##"
watching=false
fi
fi

# [1] Thread runtime (last 60 sec) AND [2] PID thread state BLOCKED
if [ $checkRate -eq $threadRate ]; then
let "duration=$sample*$threadRate"
temp='p2p-in sync-ib'
threads=($temp)
checkRate=0
for ((i=0; i<${#threads[@]}; ++i)); do
tTime=$(top -n1 -p $kPID -H | egrep -o "[0-9]{2}\.[0-9]{2} ${threads[i]}" | cut -d" " -f1)
tState=$(jstack -l $kPID | egrep -A1 "${threads[i]}" | egrep -o "State.*" | cut -d" " -f2)
if [[ $tTime == ${tPrev[i]} ]] && [[ $tState == "BLOCKED" ]]; then
echo "## ${threads[i]} THREAD DEAD ##"
(jstack -l $kPID | egrep -A1 "${threads[i]}") > threadDump_$countRebounce.txt
watching=false
fi
tPrev[i]=$tTime
done
else
((checkRate++))
fi

done

# Shutsdown Aion kernel
echo "## Killing Kernel ##"
kill $kPID
Contributor

Question: can this script get stuck if the kernel does not respond to the kill (SIGINT?) request?

Is there anywhere that it escalates to SIGKILL (for example, if the kernel is still alive after 1 minute)?

Contributor Author

I'll add a shutdown timer so the script force-kills the process.
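
One possible shape for such a shutdown timer is sketched below, assuming bash. Note that "kill" without options sends SIGTERM, not SIGINT. stop_with_timeout is a hypothetical helper name, and the 60-second default is an assumption taken from the "1 minute" example in the question, not a value from the PR.

```shell
#!/usr/bin/env bash
# Sketch only: stop_with_timeout is a hypothetical helper, and the
# 60 s default is an assumption, not a value from aion.sh.

# Asks a process to stop, then forces it if it is still alive after
# the timeout. $1 = PID, $2 = timeout in seconds.
stop_with_timeout() {
    local pid=$1 timeout=${2:-60} waited=0

    # Plain kill sends SIGTERM. If the PID is already gone, we are done.
    kill "$pid" 2>/dev/null || return 0

    while kill -0 "$pid" 2>/dev/null; do
        if (( waited >= timeout )); then
            # Graceful shutdown took too long: escalate to SIGKILL.
            kill -KILL "$pid" 2>/dev/null
            break
        fi
        sleep 1
        (( waited += 1 ))
    done

    # Reap the process if it is a child of this shell; ignore its status.
    wait "$pid" 2>/dev/null || true
    return 0
}
```

Called as stop_with_timeout "$kPID" 60, this bounds the time the watchguard can spend waiting for the kernel before the next rebounce.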

temp=$(top -n1 -p $kPID | egrep -o "$kPID")
while [[ $temp -eq $kPID ]] ; do
sleep 2s
temp=$(top -n1 -p $kPID | egrep -o "$kPID")
done
running=false

((countRebounce++))
echo
echo "############################## REBOUNCE COUNT [$countRebounce] ##############################"

done
else

env EVMJIT="-cache=1" ./rt/bin/java -Xms4g \
-cp "./lib/*:./lib/libminiupnp/*:./mod/*" org.aion.Aion "$@"

fi
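
A note on the log-timestamp check in the watchguard loop above: it keeps only the third space-separated field of the "Modify:" line from stat, which appears to be the time of day without the date. "date --date" would then read that as today, so a log last written late yesterday could look fresh shortly after midnight. A minimal sketch of an alternative follows, assuming GNU coreutils ("stat -c %Y" prints the modification time as seconds since the epoch); log_is_stale is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: log_is_stale is a hypothetical helper; stat -c %Y is
# GNU coreutils syntax and is assumed to be available.

# Succeeds when the file was last modified more than $2 seconds ago.
# Returns 2 if the file cannot be read.
log_is_stale() {
    local file=$1 tolerance=$2 modified now
    modified=$(stat -c %Y "$file" 2>/dev/null) || return 2
    now=$(date +%s)
    (( now - modified > tolerance ))
}
```

The comparison is the same one the diff makes (last modification older than now minus tolerance), so a call such as log_is_stale "$file" "$tolerance" could stand in for the stat, cut, and date pipeline.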
20 changes: 16 additions & 4 deletions modBoot/src/org/aion/Aion.java
@@ -22,6 +22,7 @@
*/
package org.aion;

import static java.lang.System.exit;
import static org.aion.crypto.ECKeyFac.ECKeyType.ED25519;
import static org.aion.crypto.HashUtil.H256Type.BLAKE2B_256;
import static org.aion.zero.impl.Version.KERNEL_VERSION;
@@ -60,7 +61,7 @@ public static void main(String args[]) {
CfgAion cfg = CfgAion.inst();
if (args != null && args.length > 0) {
int ret = new Cli().call(args, cfg);
System.exit(ret);
exit(ret);
}

/*
@@ -78,15 +79,26 @@ public static void main(String args[]) {
throw e;
}

/*
* Ensuring valid UUID in the config.xml
* Valid UUID: 32 Hex in 5 Groups [0-9A-F]
* 00000000-0000-0000-0000-000000000000
*/
String UUID = cfg.getId();
if (! UUID.matches("[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")) {
Contributor

Minor typo: extra space after "!".
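
Separately from the typo, the pattern in this check accepts lowercase hex digits only, although the comment above it lists [0-9A-F]. A quick way to try the same pattern outside the kernel is sketched below, assuming bash; is_valid_uuid is a hypothetical helper name, and the ^ and $ anchors stand in for the whole-string matching that Java's String.matches performs.

```shell
#!/usr/bin/env bash
# Sketch only: is_valid_uuid is a hypothetical helper that mirrors the
# pattern from the diff; it is not part of the Aion code base.

# Succeeds when $1 is 32 lowercase hex digits in 8-4-4-4-12 groups.
is_valid_uuid() {
    # Force the C locale so that [a-f] cannot match uppercase letters.
    local LC_ALL=C
    local re='^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
    [[ $1 =~ $re ]]
}
```

Under this pattern the all-zero UUID from the comment is accepted, while an id written with uppercase A-F is rejected.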

System.out.println("Invalid UUID; please check <id> setting in config.xml");
exit(-1);
}

/* Outputs relevant logger configuration */
if (!cfg.getLog().getLogFile()) {
System.out
.println("Logger disabled; to enable please check log settings in config.xml\n");
.println("Logger disabled; to enable please check <log> settings in config.xml");
} else if (!cfg.getLog().isValidPath() && cfg.getLog().getLogFile()) {
System.out.println("File path is invalid; please check log setting in config.xml\n");
System.out.println("Invalid file path; please check <log> setting in config.xml");
return;
} else if (cfg.getLog().isValidPath() && cfg.getLog().getLogFile()) {
System.out.println("Logger file path: '" + cfg.getLog().getLogPath() + "'\n");
System.out.println("Logger file path: '" + cfg.getLog().getLogPath() + "'");
}

/*