Troubleshooting

Joseph Huckaby edited this page Jul 28, 2023 · 3 revisions

Node.js Version

Make sure you have the latest stable Node.js for Cronicle to run smoothly. As of 2023 the minimum supported version is v16. Here are instructions for upgrading your Node.js via package manager:

https://nodejs.org/en/download/package-manager

Or grab a precompiled binary from here:

https://nodejs.org/en/download/releases

Cronicle expects the node binary to be in the root user's PATH.
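As a quick check, the version requirement above can be verified with a few lines of Node.js. This is a minimal sketch, not part of Cronicle: the `majorVersion` and `isSupported` helpers are hypothetical names, and the minimum of 16 is simply the figure quoted above.

```javascript
// Minimum major version, taken from the text above (v16 as of 2023).
const MIN_MAJOR = 16;

// Extract the major component from a version string such as "18.17.1" or "v16.20.0".
function majorVersion(version) {
	return parseInt(String(version).replace(/^v/, "").split(".")[0], 10);
}

// True if the version string parses and meets the minimum major version.
function isSupported(version, minMajor = MIN_MAJOR) {
	const major = majorVersion(version);
	return Number.isInteger(major) && major >= minMajor;
}

const running = process.versions.node;
console.log(`Node.js ${running}: ${isSupported(running) ? "OK" : "too old, please upgrade"}`);
```

Running the script as root shows which node binary is found on the root user's PATH, which is the one Cronicle will use.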

Cronicle Version

Please make sure you are running the latest version of Cronicle. Instructions are here:

https://github.com/jhuckaby/Cronicle/blob/master/docs/CommandLine.md#upgrading-cronicle

Storage Transactions

It is highly recommended that you enable storage transactions. This protects against database corruption during events such as backend storage errors, crashes and sudden power loss. Cronicle ships with transactions enabled in the default config, but if you have an older version you may need to enable them manually. Add these two properties to the Storage object in your /opt/cronicle/conf/config.json file:

{
	"Storage": {
		"transactions": true,
		"trans_auto_recover": true,
		
		...
	}
}

Make sure Cronicle is fully stopped on all master servers before making this change.
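To confirm that both properties are set, a small read-only script can inspect the config file. This is a sketch under assumptions: `missingTransactionSettings` is a hypothetical helper, and it treats any truthy value as enabled.

```javascript
const fs = require("fs");

// Return the names of the two transaction properties that are missing or falsy.
function missingTransactionSettings(config) {
	const storage = (config && config.Storage) || {};
	return ["transactions", "trans_auto_recover"].filter((key) => !storage[key]);
}

// Default path from the text above; pass another path as the first argument if needed.
const configPath = process.argv[2] || "/opt/cronicle/conf/config.json";
try {
	const config = JSON.parse(fs.readFileSync(configPath, "utf8"));
	const missing = missingTransactionSettings(config);
	console.log(missing.length ? `Not enabled: ${missing.join(", ")}` : "Storage transactions are enabled.");
} catch (err) {
	// The file may be absent or unreadable on this machine; report it and move on.
	console.log(`Could not check ${configPath}: ${err.message}`);
}
```

The script only reads the file, so it is safe to run while Cronicle is up. The actual edit should still be made with Cronicle stopped.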

Storage Repair

If you experience any strange behavior in Cronicle, such as job history not populating, you may have some database corruption. In this case, a repair script is provided, which will scan the entire database and make any necessary repairs.

To run the repair script, first stop Cronicle on all your master servers:

sudo /opt/cronicle/bin/control.sh stop

Then, run the repair script in "dry run" mode. This will detect corruption but not make any changes:

sudo /opt/cronicle/bin/storage-repair.js --dryrun

If the script detects any issues in your storage database, the output will include something like:

Dry-run complete!
4 repairs are needed.

Run the script without the --dryrun argument to actually make the repairs:

sudo /opt/cronicle/bin/storage-repair.js

Note that some data loss may occur when repairing storage records. It is recommended that you always keep backups of your data.

Once everything is repaired, you can restart Cronicle on all your master servers.

Server Hostnames

Cronicle is very sensitive about server hostnames. It uses them to determine who is a member of the cluster, and who can and should become a master server. If a server's hostname has changed, Cronicle can no longer find it in its internal server list.

Here is how to fix this. First, stop Cronicle on all your servers, SSH to your master server, become root, and run this command:

/opt/cronicle/bin/storage-cli.js get global/servers/0

This will dump out a JSON document listing all the servers Cronicle knows about. For example, here is what it looks like on my test cluster:

{
	"type": "list_page",
	"items": [
		{
			"hostname": "dev01.dev.ca.test.net",
			"ip": "192.168.66.167"
		},
		{
			"hostname": "dev02.dev.ca.test.net",
			"ip": "192.168.66.199"
		}
	]
}

What you need to do is figure out exactly what Node.js thinks your current server hostname is, find the old server in that JSON document, and correct the hostname (and possibly the IP address too).

To determine what Node.js thinks your current server hostname is, type this command:

node -e 'console.log(require("os").hostname().toLowerCase());'

Do you see that hostname in the JSON document above? If so, then skip down to the next section. If not, then read on.
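The comparison above can also be scripted. The sketch below uses the sample document from the test cluster; substitute the JSON printed by the get command. `findServer` is a hypothetical helper, not part of Cronicle.

```javascript
const os = require("os");

// Return the list entry whose hostname matches, or undefined if there is none.
function findServer(doc, hostname) {
	const wanted = hostname.toLowerCase();
	return (doc.items || []).find((item) => item.hostname === wanted);
}

// Sample record from above; replace with your own global/servers/0 output.
const doc = {
	type: "list_page",
	items: [
		{ hostname: "dev01.dev.ca.test.net", ip: "192.168.66.167" },
		{ hostname: "dev02.dev.ca.test.net", ip: "192.168.66.199" }
	]
};

// Same lookup as the one-liner above: the lower-cased OS hostname.
const current = os.hostname().toLowerCase();
console.log(findServer(doc, current)
	? `${current} is in the server list.`
	: `${current} is NOT in the server list.`);
```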

To correct the server hostname, the Cronicle storage-cli.js script can spawn your local text editor (whatever is set in your EDITOR environment variable, usually vi) and let you edit the storage record right in your terminal. Please be very careful doing this, as you can permanently damage Cronicle. Here's the command:

/opt/cronicle/bin/storage-cli.js edit global/servers/0

Once you have corrected the hostname, save and exit your editor, and the JSON document will be rewritten back to storage (either local disk, AWS S3 or Couchbase, whatever you have configured).

Please Note: It is very important that you only correct the hostname (and maybe the IP) of the server in the list. Do not remove or add any servers, as the number of items in the array is stored in a separate document. Make sure the hostname is lower-case!
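One way to guard against the mistakes described in the note is to save a copy of the record before editing and compare it with the result afterwards. The following is a rough sketch of such a comparison. `checkServerListEdit` is a hypothetical helper that only covers the two rules stated above (same number of items, lower-case hostnames).

```javascript
// Compare the record before and after editing and list anything that looks unsafe.
function checkServerListEdit(before, after) {
	const problems = [];
	if (before.items.length !== after.items.length) {
		problems.push(`item count changed from ${before.items.length} to ${after.items.length}`);
	}
	after.items.forEach((item, idx) => {
		const hostname = String(item.hostname);
		if (hostname !== hostname.toLowerCase()) {
			problems.push(`item ${idx}: hostname "${hostname}" is not lower-case`);
		}
	});
	return problems;
}

// Example: one hostname corrected, but typed with an upper-case letter.
const before = { type: "list_page", items: [{ hostname: "dev01.dev.ca.test.net", ip: "192.168.66.167" }] };
const after = { type: "list_page", items: [{ hostname: "Dev03.dev.ca.test.net", ip: "192.168.66.167" }] };
console.log(checkServerListEdit(before, after));
```

An empty array means neither rule was violated. It does not prove the new hostname is the right one.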

Now try starting Cronicle, waiting 60 seconds, and seeing if the Web UI is happy again:

/opt/cronicle/bin/control.sh start

Please note that it takes a full minute (60 seconds) for Cronicle to decide to become master, so please be patient here.

Did that do the trick? If not, keep reading...

Master Server Groups

So, your server hostname is now correct and in the JSON document, but the Web UI is still stuck on "Waiting for master server"? The reason is probably that your server's new hostname is no longer part of a master server group.

Cronicle has a server group system, and only certain groups are eligible for becoming master. It categorizes servers into groups by matching their hostname against a regular expression pattern. It is possible that when your server hostname changed, it no longer matches the correct master group pattern.

To dump out the current server group definitions, stop Cronicle on all your servers, SSH to your master server, become root, and run this command:

/opt/cronicle/bin/storage-cli.js get global/server_groups/0

This should output a JSON document that looks somewhat like this:

{
	"type": "list_page",
	"items": [
		{
			"id": "allgrp",
			"title": "All Servers",
			"regexp": ".+",
			"master": 0
		},
		{
			"id": "mastergrp",
			"title": "Master Group",
			"regexp": "(dev01|dev02|devrelease|mtx\\d+)\\.",
			"master": 1
		}
	]
}

As you can see, each group has a regexp which is a regular expression match on the hostname, and a master attribute that determines if the servers in the group can become master.

You may have to adjust the regexp string in your master group, so that your server's new hostname matches the pattern.
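Before editing the record, it can help to test a candidate pattern against the new hostname. The sketch below assumes the regexp string is applied as an ordinary, unanchored JavaScript regular expression with no flags; that is an assumption about Cronicle's matching, and `masterGroupsFor` is a hypothetical helper.

```javascript
// Return the ids of master-eligible groups whose regexp matches the hostname.
function masterGroupsFor(groups, hostname) {
	return groups
		.filter((group) => group.master && new RegExp(group.regexp).test(hostname))
		.map((group) => group.id);
}

// Group definitions copied from the sample document above.
const groups = [
	{ id: "allgrp", title: "All Servers", regexp: ".+", master: 0 },
	{ id: "mastergrp", title: "Master Group", regexp: "(dev01|dev02|devrelease|mtx\\d+)\\.", master: 1 }
];

console.log(masterGroupsFor(groups, "dev01.dev.ca.test.net")); // [ 'mastergrp' ]
console.log(masterGroupsFor(groups, "web01.dev.ca.test.net")); // []
```

An empty result means the hostname is not eligible to become master under the current patterns, which would explain a Web UI stuck on "Waiting for master server".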

To correct the server groups, the Cronicle storage-cli.js script can spawn your local text editor (whatever is set in your EDITOR environment variable, usually vi) and let you edit the storage record right in your terminal. Please be very careful doing this, as you can permanently damage Cronicle. Here's the command:

/opt/cronicle/bin/storage-cli.js edit global/server_groups/0

Once you have corrected the regexp patterns, save and exit your editor, and the JSON document will be rewritten back to storage (either local disk, AWS S3 or Couchbase, whatever you have configured).

Please Note: It is very important that you only correct the regexp of the server group in the list. Do not remove or add any groups, as the number of items in the array is stored in a separate document.

Now try starting Cronicle, waiting 60 seconds, and seeing if the Web UI is happy again:

/opt/cronicle/bin/control.sh start

Please note that it takes a full minute (60 seconds) for Cronicle to decide to become master, so please be patient here.