critical target error message on VCH console #8112

RebeccaYo · 2018-06-29T21:20:08Z

VIC version:

1.4.0

Deployment details:

./vic-machine-linux create --name WV-VCH --public-network-ip 192.168.10.100/24 --management-network PublicNetwork --insecure-registry 10.115.68.63:443 --compute-resource ordos12.eng.vmware.com --no-tlsverify --no-tls --thumbprint ... --target 10.115.68.198 --user [email protected] --image-store Tegile-Lun3a --volume-store Tegile-Lun3a:default --volume-store Tegile-Lun1a:Tegile-Lun1a --volume-store Tegile-Lun1b:Tegile-Lun1b --bridge-network VCH1BridgeNetwork --client-network Testbed --public-network Testbed --container-network Testbed:cn1 --dns-server 192.168.10.101 --public-network-gateway 192.168.10.101 --endpoint-cpu 1 --endpoint-memory 2048

Steps to reproduce:

After deploying the VCH, the error appears within a number of hours to days.
This same error has appeared on many different deployments of VCH, which were located on different datastores (I was making sure this wasn't because of a bad disk). Every time, the error is the same (including sector number) except for the number at the beginning of the error.

Actual behavior:

There is an error message on the console of the VCH.
[ 10.487831] blk_update_request: critical target error, dev sda, sector 15958016. See attachment.
Also, I cannot connect to the Docker daemon of this VCH at tcp://192.168.10.100:2375.

Expected behavior:

Immediately after deployment there was no error on the console and I could connect to the Docker daemon. I expect the VCH to stay in that state.

Logs:

VCH Admin portal is inaccessible.
When I tried to enable SSH on the VCH, I received the following output:

INFO[0005] ### Configuring VCH for debug ####
INFO[0005] Validating target
INFO[0005]
INFO[0005] VCH ID: VirtualMachine:vm-2075
INFO[0005] Creating directory [Tegile-Lun3a] WV-VCH
INFO[0005] Datastore path is [Tegile-Lun3a] WV-VCH
INFO[0006]
INFO[0006] Installer version: v1.4.0-18893-6c385b0
INFO[0006] VCH version: v1.4.0-18893-6c385b0
ERRO[0006] Tools is not running in the appliance, unable to continue
ERRO[0006] Unable to enable ssh on the VCH appliance VM: Tools is not running in the appliance, unable to continue
INFO[0006] Collecting 67e398ac-44ad-4777-b58a-848f8b56df0f vpxd.log
ERRO[0006] Tools is not running in the appliance, unable to continue
ERRO[0006] --------------------
ERRO[0006] vic-machine-linux debug failed: debug failed

Additional details as necessary:

RebeccaYo · 2018-06-29T21:22:13Z

hickeng · 2018-07-02T17:36:28Z

@RebeccaYo

If you still have that VCH around, please could you supply the tether.debug and vmware.log files from the endpointVM datastore directory?

Additionally, given this recreates for you, please could you:

- deploy a VCH with --debug=2 during the initial step, and
- configure the VCH for console access. This is done using vic-machine debug, which I assume is what you ran, judging by the output in the original issue. So long as you set the password, you'll be able to log into the VM console (even if you don't enable SSH) until the password expires at midnight.

This gives us a method by which we can gather additional logging if/when the problem recreates. Given the message about tools not running I am wondering if this is a possible recreate of #7680 and would love to get actionable data on that.

Regarding the message on the console: I don't know why you're seeing this, but...

- the number at the beginning is the time since system boot; you'll see this message in dmesg output
- at that time the device in question is almost certainly the scratch.vmdk base disk, and the error likely occurs just after either the hot-add prior to creating the filesystem on it, or the hot-remove once filesystem creation is done.
- I have seen read errors before when reading from thin vmdks when inflation has to occur, but only recall it happening in a nested environment. I'm assuming this is not nested?
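
For anyone comparing this console message across several VCH deployments, here is a minimal Python sketch that pulls the fields out of the line quoted in the issue. The names `BlockError` and `parse_block_error` are hypothetical helpers made up for illustration; they are not part of VIC or any kernel tooling. The byte offset assumes the kernel is reporting the sector in 512-byte units, which is the usual convention for these block-layer messages.

```python
import re
from typing import NamedTuple, Optional

# Assumption: block-layer messages count sectors in 512-byte units.
SECTOR_BYTES = 512


class BlockError(NamedTuple):
    """Fields of one blk_update_request console line (hypothetical helper type)."""
    uptime_s: float   # seconds since boot, the bracketed number at the start
    kind: str         # e.g. "critical target"
    device: str       # e.g. "sda"
    sector: int       # failing sector as printed by the kernel

    @property
    def byte_offset(self) -> int:
        # Offset of the failing sector from the start of the device.
        return self.sector * SECTOR_BYTES


_PATTERN = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+blk_update_request:\s+"
    r"(?P<kind>.+?) error, dev (?P<dev>\w+), sector (?P<sector>\d+)"
)


def parse_block_error(line: str) -> Optional[BlockError]:
    """Return the parsed fields, or None if the line is not a block error."""
    m = _PATTERN.search(line)
    if m is None:
        return None
    return BlockError(float(m["uptime"]), m["kind"], m["dev"], int(m["sector"]))


if __name__ == "__main__":
    sample = ("[ 10.487831] blk_update_request: critical target error, "
              "dev sda, sector 15958016")
    err = parse_block_error(sample)
    print(err, err.byte_offset if err else None)
```

Applied to the line from this issue, the timestamp is about 10.5 seconds after boot and, under the 512-byte assumption, the sector sits roughly 7.6 GiB into the device. Since the sector is reported as identical on every deployment, comparing only the uptime field between occurrences may help confirm that the message is tied to the same point in the boot sequence.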

RebeccaYo · 2018-07-02T23:55:23Z

vmware.log
tether.debug.log
Hopefully these will be helpful.
I'll redeploy with debug=2 and enable shell access.
This environment is not nested.
Thanks again for your help.

RebeccaYo · 2018-07-05T23:56:43Z

Hi @hickeng, I've reproduced this on a VCH with debug=2. The VCH admin portal is unavailable, but I've attached the logs from /var/log/vic on the VCH.

vchlogs_8112_yo.tar.gz

RebeccaYo · 2018-07-19T17:52:58Z

@hickeng Any chance you could look at this? I'm seeing panic: runtime error: integer divide by zero appearing less than 24 hours after I create a VCH, necessitating a redeploy. This is currently a showstopper for my VIC testing as it interrupts the test run.

hickeng · 2018-07-24T22:44:24Z

@RebeccaYo Apologies for the delay. I've taken a look at the logs you attached. It's possible that this is a variant of #7680. One very effective way of determining whether this is the same issue is to deploy with a static IP on the management network and see if the problem persists. If it does not, you can then try DHCP again while specifying --asymmetric-routes.

If this is confirmed to be the same issue, I'm very interested in knowing whether the panic is also present in the tether.debug when deployed with debug=2. We've struggled to get any traction on #7680 and any insight would be invaluable.

RebeccaYo · 2018-07-24T23:01:28Z

Hi George, thank you, I'll try that. In the meantime, here's the tether.debug for the VCH deployed with debug=2. tether.debug.txt

RebeccaYo closed this as completed Jun 29, 2018

RebeccaYo reopened this Jun 29, 2018

hickeng added the kind/defect, area/vsphere, area/appliance, and status/need-info labels Jul 2, 2018

hickeng mentioned this issue Jul 25, 2018

Avoid integer overflow in dhcp renewal #8154

Merged

hickeng closed this as completed in #8154 Jul 30, 2018
