Failed to add disk during pull #2819

Closed
hickeng opened this issue Oct 20, 2016 · 30 comments
Labels: kind/defect (Behavior that is inconsistent with what's intended)

hickeng commented Oct 20, 2016

Investigating https://ci.vmware.run/vmware/vic/6045, as a follow-on from #2817.

There is a reconfigure (opID=e902fe86) to add a disk to one VCH that interleaves with vic-machine delete for another VCH (opID=e902feaa). I don't know if that interleaving is an issue, but as can be seen, we don't end up with an attached disk for the reconfigure.

The reconfigure is Test-Cases.Group6-VIC-Machine.6-4-Create-Basic-VCH-6045-5188

2016-10-19T18:07:20.107Z info hostd[626C2B70] [Originator@6876 sub=Vimsvc.TaskManager opID=e902fe86 user=root] Task Created : haTask-2280-vim.VirtualMachine.reconfigure-278062
2016-10-19T18:07:20.112Z info hostd[626C2B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/57ebe905-615d7a8a-e1c7-0050569c8cc7/VCH-6045-5188/VCH-6045-5188.vmx opID=e902fe86 user=root] State Transition (VM_STATE_ON -> VM_STATE_RECONFIGURING)
2016-10-19T18:07:20.124Z info hostd[60742B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/57ebe905-615d7a8a-e1c7-0050569c8cc7/VCH-6045-5188/VCH-6045-5188.vmx opID=e902fe86 user=root] Storage policy for disk '/vmfs/volumes/57ebe905-615d7a8a-e1c7-0050569c8cc7/vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/9967c5ad88de8c101809f7f22d4774b6791fe46ac3033d57abf7ebb1dd8e36ee/9967c5ad88de8c101809f7f22d4774b6791fe46ac3033d57abf7ebb1dd8e36ee.vmdk' not specified.
2016-10-19T18:07:20.246Z info hostd[61140B70] [Originator@6876 sub=Hostsvc.DatastoreSystem] RefreshVdiskDatastores: Done refreshing datastores.
2016-10-19T18:07:20.309Z info hostd[60640B70] [Originator@6876 sub=DatastoreBrowser opID=e902fe93 user=root] DatastoreBrowserImpl::SearchInt MoId:57ebe905-615d7a8a-e1c7-0050569c8cc7-datastorebrowser dsPath:[datastore1]  recursive:false
2016-10-19T18:07:20.371Z info hostd[61140B70] [Originator@6876 sub=DatastoreBrowser opID=e902fe9f user=root] DatastoreBrowserImpl::SearchInt MoId:57ebe905-615d7a8a-e1c7-0050569c8cc7-datastorebrowser dsPath:[datastore1] vic-machine-test-images recursive:false
2016-10-19T18:07:20.418Z info hostd[61C81B70] [Originator@6876 sub=Vimsvc.TaskManager opID=e902feaa user=root] Task Created : haTask--vim.FileManager.deleteFile-278075
2016-10-19T18:07:20.492Z info hostd[60742B70] [Originator@6876 sub=Libs opID=e902fe86 user=root] OBJLIB-FILEBE : FileBEOpen: can't open '/vmfs/volumes/57ebe905-615d7a8a-e1c7-0050569c8cc7/vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch.vmdk' : Could not find the file (393218).
2016-10-19T18:07:20.492Z info hostd[60742B70] [Originator@6876 sub=DiskLib opID=e902fe86 user=root] DISKLIB-DSCPTR: DescriptorOpenInt: failed to open '/vmfs/volumes/57ebe905-615d7a8a-e1c7-0050569c8cc7/vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch.vmdk': Could not find the file (60002)
2016-10-19T18:07:20.492Z info hostd[60742B70] [Originator@6876 sub=DiskLib opID=e902fe86 user=root] DISKLIB-LINK  : "/vmfs/volumes/57ebe905-615d7a8a-e1c7-0050569c8cc7/vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch.vmdk" : failed to open (The system cannot find the file specified).
2016-10-19T18:07:20.492Z info hostd[60742B70] [Originator@6876 sub=DiskLib opID=e902fe86 user=root] DISKLIB-CHAIN 

hickeng commented Oct 20, 2016

@fdawg4l Assigned to you for initial investigation.

@mhagen-vmware

@cgtexmex tagging Clint as he has been working on the concurrency epic

hickeng assigned fdawg4l and unassigned faijaz Oct 20, 2016

fdawg4l commented Oct 20, 2016

time="2016-10-19T18:07:19Z" level=debug msg="op=232.10 (delta:5.748µs): [NewOperation] op=232.10 (delta:1.648µs) [github.com/vmware/vic/lib/apiservers/portlayer/restapi/handlers.(*StorageHandlersImpl).WriteImage:274]"·
time="2016-10-19T18:07:19Z" level=debug msg="op=232.10 (delta:230.908µs): Getting image a61cd723bcf2b0ccaaa3b8f779dfca17040bd459d9e615b82a7ea17993ec59f2 from http://VCH-6045-5188/storage/images/564d7a55-02de-cbeb-cea5-b5a39146ba07"·
time="2016-10-19T18:07:19Z" level=debug msg="op=232.10 (delta:464.558µs): Getting image 9967c5ad88de8c101809f7f22d4774b6791fe46ac3033d57abf7ebb1dd8e36ee from http://VCH-6045-5188/storage/images/564d7a55-02de-cbeb-cea5-b5a39146ba07"·
time="2016-10-19T18:07:19Z" level=debug msg="ImageCache: Image 9967c5ad88de8c101809f7f22d4774b6791fe46ac3033d57abf7ebb1dd8e36ee not in cache, retreiving from datastore"·
time="2016-10-19T18:07:19Z" level=debug msg="[BEGIN] [github.com/vmware/vic/lib/portlayer/storage/vsphere.(*ImageStore).GetImage:416] http://VCH-6045-5188/storage/images/564d7a55-02de-cbeb-cea5-b5a39146ba07"·
time="2016-10-19T18:07:19Z" level=debug msg="[ END ] [github.com/vmware/vic/lib/portlayer/storage/vsphere.(*ImageStore).GetImage:416] [21.017029ms] http://VCH-6045-5188/storage/images/564d7a55-02de-cbeb-cea5-b5a39146ba07"·
time="2016-10-19T18:07:19Z" level=info msg="Saving parent map (564d7a55-02de-cbeb-cea5-b5a39146ba07/parentMap)"·
time="2016-10-19T18:07:19Z" level=info msg="Moving [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/parentMap.tmp to [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/parentMap"·
time="2016-10-19T18:07:19Z" level=info msg="Creating directory [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/9967c5ad88de8c101809f7f22d4774b6791fe46ac3033d57abf7ebb1dd8e36ee"·
time="2016-10-19T18:07:19Z" level=info msg="Writing metadata 564d7a55-02de-cbeb-cea5-b5a39146ba07/images/9967c5ad88de8c101809f7f22d4774b6791fe46ac3033d57abf7ebb1dd8e36ee/imageMetadata/metaData"·
time="2016-10-19T18:07:19Z" level=info msg="Creating image 9967c5ad88de8c101809f7f22d4774b6791fe46ac3033d57abf7ebb1dd8e36ee ([datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/9967c5ad88de8c101809f7f22d4774b6791fe46ac3033d57abf7ebb1dd8e36ee/9967c5ad88de8c101809f7f22d4774b6791fe46ac3033d57abf7ebb1dd8e36ee.vmdk)"·
time="2016-10-19T18:07:19Z" level=debug msg="[BEGIN] [github.com/vmware/vic/pkg/vsphere/disk.(*Manager).CreateAndAttach:82] [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/9967c5ad88de8c101809f7f22d4774b6791fe46ac3033d57abf7ebb1dd8e36ee/9967c5ad88de8c101809f7f22d4774b6791fe46ac3033d57abf7ebb1dd8e36ee.vmdk"·
time="2016-10-19T18:07:19Z" level=info msg="op=232.10 (delta:218.521363ms): Create/attach vmdk [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/9967c5ad88de8c101809f7f22d4774b6791fe46ac3033d57abf7ebb1dd8e36ee/9967c5ad88de8c101809f7f22d4774b6791fe46ac3033d57abf7ebb1dd8e36ee.vmdk from parent [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/a61cd723bcf2b0ccaaa3b8f779dfca17040bd459d9e615b82a7ea17993ec59f2/a61cd723bcf2b0ccaaa3b8f779dfca17040bd459d9e615b82a7ea17993ec59f2.vmdk"·
time="2016-10-19T18:07:20Z" level=error msg="task failed: Failed to add disk scsi0:0."·
time="2016-10-19T18:07:20Z" level=error msg="op=232.10 (delta:847.655628ms): vmdk storage driver failed to attach disk: Failed to add disk scsi0:0."·
time="2016-10-19T18:07:20Z" level=debug msg="[ END ] [github.com/vmware/vic/pkg/vsphere/disk.(*Manager).CreateAndAttach:82] [629.468416ms] [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/9967c5ad88de8c101809f7f22d4774b6791fe46ac3033d57abf7ebb1dd8e36ee/9967c5ad88de8c101809f7f22d4774b6791fe46ac3033d57abf7ebb1dd8e36ee.vmdk"·
time="2016-10-19T18:07:20Z" level=error msg="Cleaning up failed WriteImage directory 564d7a55-02de-cbeb-cea5-b5a39146ba07/images/9967c5ad88de8c101809f7f22d4774b6791fe46ac3033d57abf7ebb1dd8e36ee"·
time="2016-10-19T18:07:20Z" level=info msg="Removing 564d7a55-02de-cbeb-cea5-b5a39146ba07/images/9967c5ad88de8c101809f7f22d4774b6791fe46ac3033d57abf7ebb1dd8e36ee"·
time="2016-10-19T18:07:20Z" level=error msg="ImageCache error: WriteImage of 9967c5ad88de8c101809f7f22d4774b6791fe46ac3033d57abf7ebb1dd8e36ee failed with: Failed to add disk scsi0:0."·


fdawg4l commented Oct 20, 2016

The ESX reconfigure task starts at 2016-10-19T18:07:19Z and returns an error 1s later. vSphere returned the "Failed to add disk scsi0:0." message.


fdawg4l commented Oct 20, 2016

Looking at the messages @hickeng posted above, the missing scratch.vmdk is the reason we can't get the disk. @mhagen-vmware is this a case of two CI runs on the same machine, with very aggressive cleanup code nuking our VCH and its image store underneath us?


fdawg4l commented Oct 20, 2016

$ grep scratch.vmdk hostd
2016-10-19T18:06:37.256Z info hostd[606C1B70] [Originator@6876 sub=Nfcsvc opID=e902f885 user=root] file delete force (/vmfs/volumes/57ebe905-615d7a8a-e1c7-0050569c8cc7/VCH-6045-1851/VIC/564d8ee6-00f1-29a6-5879-84fcc0c5baa2/images/scratch/scratch.vmdk -> /vmfs/volumes/57ebe905-615d7a8a-e1c7-0050569c8cc7/VCH-6045-1851/VIC/564d8ee6-00f1-29a6-5879-84fcc0c5baa2/images/scratch/scratch.vmdk) by 'root'
2016-10-19T18:06:52.502Z info hostd[5E281B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/57ebe905-615d7a8a-e1c7-0050569c8cc7/VCH-6053-1693/VCH-6053-1693.vmx opID=e902fb59 user=root] Storage policy for disk '/vmfs/volumes/57ebe905-615d7a8a-e1c7-0050569c8cc7/VCH-6053-1693/VIC/564d4853-961a-5bb6-59b0-d421e7f06f08/images/scratch/scratch.vmdk' not specified.
2016-10-19T18:07:10.404Z info hostd[5EE81B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/57ebe905-615d7a8a-e1c7-0050569c8cc7/VCH-6045-5188/VCH-6045-5188.vmx opID=e902fc52 user=root] Storage policy for disk '/vmfs/volumes/57ebe905-615d7a8a-e1c7-0050569c8cc7/vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch.vmdk' not specified.
2016-10-19T18:07:13.718Z info hostd[61140B70] [Originator@6876 sub=Nfcsvc opID=e902fc98 user=root] file delete force (/vmfs/volumes/57ebe905-615d7a8a-e1c7-0050569c8cc7/VCH-6053-1693/VIC/564d4853-961a-5bb6-59b0-d421e7f06f08/images/scratch/scratch.vmdk -> /vmfs/volumes/57ebe905-615d7a8a-e1c7-0050569c8cc7/VCH-6053-1693/VIC/564d4853-961a-5bb6-59b0-d421e7f06f08/images/scratch/scratch.vmdk) by 'root'
2016-10-19T18:07:20.492Z info hostd[60742B70] [Originator@6876 sub=Libs opID=e902fe86 user=root] OBJLIB-FILEBE : FileBEOpen: can't open '/vmfs/volumes/57ebe905-615d7a8a-e1c7-0050569c8cc7/vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch.vmdk' : Could not find the file (393218).

Looks like scratch was deleted, per hostd.log, a second before we needed it. So ESX nuked the scratch disk. I'm still digging into why it was deleted and who deleted it.


fdawg4l commented Oct 20, 2016

time="2016-10-19T18:07:09Z" level=info msg="Creating directory [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch"
time="2016-10-19T18:07:09Z" level=info msg="Creating directory [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/imageMetadata"
time="2016-10-19T18:07:09Z" level=info msg="Creating image scratch ([datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch.vmdk)"
time="2016-10-19T18:07:09Z" level=debug msg="[BEGIN] [github.com/vmware/vic/pkg/vsphere/disk.(*Manager).CreateAndAttach:82] [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch.vmdk"
time="2016-10-19T18:07:09Z" level=info msg="op=232.4 (delta:183.87221ms): Create/attach vmdk [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch.vmdk from parent "
time="2016-10-19T18:07:10Z" level=debug msg="op=232.4 (delta:539.667308ms): Mapping vmdk to pci device [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch.vmdk"
time="2016-10-19T18:07:10Z" level=debug msg="Looking for attached disk matching filename [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch.vmdk"
time="2016-10-19T18:07:10Z" level=debug msg="backing file name [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch.vmdk"
time="2016-10-19T18:07:10Z" level=debug msg="Found candidate disk for [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch.vmdk at [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch.vmdk"
time="2016-10-19T18:07:10Z" level=debug msg="[ END ] [github.com/vmware/vic/pkg/vsphere/disk.(*Manager).CreateAndAttach:82] [363.742484ms] [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch.vmdk"
time="2016-10-19T18:07:15Z" level=debug msg="Looking for attached disk matching filename [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch.vmdk"
time="2016-10-19T18:07:15Z" level=debug msg="backing file name [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch.vmdk"
time="2016-10-19T18:07:15Z" level=debug msg="Found candidate disk for [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch.vmdk at [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch.vmdk"
time="2016-10-19T18:07:16Z" level=debug msg="Index: inserting http://VCH-6045-5188/storage/images/564d7a55-02de-cbeb-cea5-b5a39146ba07/scratch (parent: http://VCH-6045-5188/storage/images/564d7a55-02de-cbeb-cea5-b5a39146ba07/scratch) in index"
time="2016-10-19T18:07:18Z" level=debug msg="op=232.9 (delta:36.979µs): Getting image scratch from http://VCH-6045-5188/storage/images/564d7a55-02de-cbeb-cea5-b5a39146ba07"
time="2016-10-19T18:07:18Z" level=info msg="op=232.9 (delta:228.263177ms): Create/attach vmdk [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/a61cd723bcf2b0ccaaa3b8f779dfca17040bd459d9e615b82a7ea17993ec59f2/a61cd723bcf2b0ccaaa3b8f779dfca17040bd459d9e615b82a7ea17993ec59f2.vmdk from parent [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch.vmdk"
time="2016-10-19T18:07:19Z" level=debug msg="Index: inserting http://VCH-6045-5188/storage/images/564d7a55-02de-cbeb-cea5-b5a39146ba07/a61cd723bcf2b0ccaaa3b8f779dfca17040bd459d9e615b82a7ea17993ec59f2 (parent: http://VCH-6045-5188/storage/images/564d7a55-02de-cbeb-cea5-b5a39146ba07/scratch) in index"

So our scratch for this VCH is [datastore1] vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch.vmdk. However, the scratch that was deleted was VCH-6045-1851/VIC/564d8ee6-00f1-29a6-5879-84fcc0c5baa2/images/scratch/scratch.vmdk, so the deleted scratch should have been a different file from the one we needed.


fdawg4l commented Oct 20, 2016

I can't tell you how convenient the narrow column width GH chose for their issue tracker is. Despite your widescreen monitor made in the last 10 years, they insist you have a CRT from the 70s. Thanks GH!


fdawg4l commented Oct 20, 2016

$ grep 64d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch-flat.vmdk VCH-6045-5188/vmware.log

2016-10-19T18:07:10.594Z| vmx| I120: DISKLIB-VMFS  : "/vmfs/volumes/57ebe905-615d7a8a-e1c7-0050569c8cc7/vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch-flat.vmdk" : open successful (10) size = 8192000000, hd = 42261477. Type 3
2016-10-19T18:07:16.166Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/57ebe905-615d7a8a-e1c7-0050569c8cc7/vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch-flat.vmdk" : closed.
2016-10-19T18:07:19.184Z| vmx| I120: DISKLIB-VMFS  : "/vmfs/volumes/57ebe905-615d7a8a-e1c7-0050569c8cc7/vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch-flat.vmdk" : open successful (14) size = 8192000000, hd = 39336828. Type 3
2016-10-19T18:07:19.592Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/57ebe905-615d7a8a-e1c7-0050569c8cc7/vic-machine-test-images/VIC/564d7a55-02de-cbeb-cea5-b5a39146ba07/images/scratch/scratch-flat.vmdk" : closed.

The PL needed scratch at approx 2016-10-19T18:07:19Z. Is it possible the file was locked when we needed it? The first 2 lines have scratch "open" and "closed" in that window.


fdawg4l commented Oct 25, 2016

Looks like we have a similar issue on build 6235.

Logs here.


fdawg4l commented Oct 25, 2016

From 6235, in Test-Cases.Group6-VIC-Machine.6-05-Create-Validation-VCH-6235-7958-container-logs.zip, I can see the following:

$  grep 239.8 journalctl 
Oct 25 20:16:55 VCH-6235-7958 vic-init[141]: time=2016-10-25T20:16:55.646493081Z level=debug msg=op=239.8 (delta:5.455µs): [NewOperation] op=239.8 (delta:1.865µs) [github.com/vmware/vic/lib/apiservers/portlayer/restapi/handlers.(*StorageHandlersImpl).WriteImage:258]
Oct 25 20:16:55 VCH-6235-7958 vic-init[141]: time=2016-10-25T20:16:55.646514053Z level=debug msg=op=239.8 (delta:28.369µs): Getting image scratch from http://VCH-6235-7958/storage/images/564de4f3-2c06-d8d3-0201-f4f5521868ef
Oct 25 20:16:55 VCH-6235-7958 vic-init[141]: time=2016-10-25T20:16:55.646542293Z level=debug msg=op=239.8 (delta:56.681µs): Getting image a61cd723bcf2b0ccaaa3b8f779dfca17040bd459d9e615b82a7ea17993ec59f2 from http://VCH-6235-7958/storage/images/564de4f3-2c06-d8d3-0201-f4f5521868ef
Oct 25 20:16:57 VCH-6235-7958 vic-init[141]: time=2016-10-25T20:16:57.112587137Z level=info msg=op=239.8 (delta:1.466098112s): Create/attach vmdk [datastore1] long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/a61cd723bcf2b0ccaaa3b8f779dfca17040bd459d9e615b82a7ea17993ec59f2/a61cd723bcf2b0ccaaa3b8f779dfca17040bd459d9e615b82a7ea17993ec59f2.vmdk from parent [datastore1] long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/scratch/scratch.vmdk
Oct 25 20:16:57 VCH-6235-7958 vic-init[141]: time=2016-10-25T20:16:57.132524731Z level=error msg=op=239.8 (delta:1.486035172s): vmdk storage driver failed to attach disk: Invalid configuration for device '0'.

hostd.log

 916 2016-10-25T20:16:57.581Z info hostd[61681B70] [Originator@6876 sub=DatastoreBrowser opID=e51ff4e2 user=root] DatastoreBrowserImpl::SearchInt MoId:577fceda-9b1fac4c-110b-0050569ccda1-datastorebrowser dsPath:[datastore1]long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/a61cd723bcf2b0ccaaa3b8f779dfca17040bd459d9e615b82a7ea17993ec59f2/imageMetadata recursive:false
 917 2016-10-25T20:16:57.585Z info hostd[61681B70] [Originator@6876 sub=Default opID=e51ff4e2 user=root] AdapterServer caught exception: vim.fault.FileNotFound
 918 2016-10-25T20:16:57.586Z info hostd[5E681B70] [Originator@6876 sub=Vimsvc.TaskManager opID=e51ff4e9 user=root] Task Created : haTask--vim.FileManager.makeDirectory-531900
 919 2016-10-25T20:16:57.589Z info hostd[616C2B70] [Originator@6876 sub=Nfcsvc opID=e51ff4e9 user=root] Create requested for /vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/a61cd723bcf2b0ccaaa3b8f779dfca17040bd459d9e615b82a7ea17993ec59f2/imageMetadata
 920 2016-10-25T20:16:57.848Z info hostd[616C2B70] [Originator@6876 sub=Nfcsvc opID=e51ff4e9 user=root] file mkdir -p (/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/a61cd723bcf2b0ccaaa3b8f779dfca17040bd459d9e615b82a7ea17993ec59f2/imageMetadata) by 'root'
 921 2016-10-25T20:16:57.848Z info hostd[616C2B70] [Originator@6876 sub=Vimsvc.TaskManager opID=e51ff4e9 user=root] Task Completed : haTask--vim.FileManager.makeDirectory-531900 Status success
 922 2016-10-25T20:16:57.972Z info hostd[60280B70] [Originator@6876 sub=Vimsvc.TaskManager opID=e51ff4ea user=root] Task Created : haTask-2465-vim.VirtualMachine.reconfigure-531901
 923 2016-10-25T20:16:57.973Z info hostd[60280B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff4ea user=root] State Transition (VM_STATE_ON -> VM_STATE_RECONFIGURING)
 924 2016-10-25T20:16:57.979Z error hostd[60280B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff4ea user=root] Parent file doesn't exist, or the user has no permission to access.
 925 2016-10-25T20:16:57.979Z info hostd[60280B70] [Originator@6876 sub=Vimsvc.TaskManager opID=e51ff4ea user=root] Task Completed : haTask-2465-vim.VirtualMachine.reconfigure-531901 Status error
 926 2016-10-25T20:16:57.979Z warning hostd[60280B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff4ea user=root] Reconfigure worker failed to validate device spec
 927 2016-10-25T20:16:57.983Z info hostd[60280B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff4ea user=root] State Transition (VM_STATE_RECONFIGURING -> VM_STATE_ON)
 928 2016-10-25T20:16:57.983Z info hostd[602C1B70] [Originator@6876 sub=Guestsvc.GuestFileTransferImpl] Entered VmPowerStateListener
 929 2016-10-25T20:16:57.983Z info hostd[602C1B70] [Originator@6876 sub=Guestsvc.GuestFileTransferImpl] VmPowerStateListener succeeded
 930 2016-10-25T20:16:57.983Z info hostd[602C1B70] [Originator@6876 sub=Hbrsvc] Replicator: powerstate change VM: 2465 Old: 1 New: 1


fdawg4l commented Oct 25, 2016

@derekbeard If you have a moment, can you take a look at this? Hostd says the parent file doesn't exist (or isn't accessible), but we can see directly above that it does.

In port-layer.log, we see scratch.vmdk attached, then detached.

 85 time="2016-10-25T20:16:41Z" level=info msg="Creating directory [datastore1] long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images"·
 86 time="2016-10-25T20:16:41Z" level=info msg="Creating directory [datastore1] long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/scratch"·
 87 time="2016-10-25T20:16:41Z" level=info msg="Creating directory [datastore1] long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/scratch/imageMetadata"·
 88 time="2016-10-25T20:16:41Z" level=info msg="Creating image scratch ([datastore1] long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/scratch/scratch.vmdk)"·
 89 time=2016-10-25T20:16:41.223676523Z level=debug msg=[BEGIN] [github.com/vmware/vic/pkg/vsphere/disk.(*Manager).CreateAndAttach:85] [datastore1] long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/scratch/scratch.vmdk·
 90 time=2016-10-25T20:16:41.223837019Z level=info msg=op=239.4 (delta:230.620089ms): Create/attach vmdk [datastore1] long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/scratch/scratch.vmdk from parent··
 91 time=2016-10-25T20:16:41.507088482Z level=debug msg=op=239.4 (delta:513.871886ms): Mapping vmdk to pci device [datastore1] long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/scratch/scratch.vmdk·
 92 time=2016-10-25T20:16:41.507497790Z level=debug msg=[BEGIN] [github.com/vmware/vic/pkg/vsphere/disk.findDisk:154] VirtualMachine:2465·
 93 time="2016-10-25T20:16:41Z" level=debug msg="Looking for attached disk matching filename [datastore1] long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/scratch/scratch.vmdk"·
 94 time="2016-10-25T20:16:41Z" level=debug msg="backing file name [datastore1] long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/scratch/scratch.vmdk"·
 95 time="2016-10-25T20:16:41Z" level=debug msg="Found candidate disk for [datastore1] long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/scratch/scratch.vmdk at [datastore1] long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/scratch/scratch.vmdk"·
 96 time=2016-10-25T20:16:41.513765903Z level=debug msg=[ END ] [github.com/vmware/vic/pkg/vsphere/disk.findDisk:154] [6.26633ms] VirtualMachine:2465·
 97 time=2016-10-25T20:16:41.513923315Z level=debug msg=[BEGIN] [github.com/vmware/vic/pkg/vsphere/disk.waitForPath:40] /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0·
 98 time="2016-10-25T20:16:41Z" level=debug msg="Waiting for attached disk to appear in /dev/disk/by-path, or timeout"·
 99 time="2016-10-25T20:16:41Z" level=info msg="Attached disk present at /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"·
100 time=2016-10-25T20:16:41.514859706Z level=debug msg=[ END ] [github.com/vmware/vic/pkg/vsphere/disk.waitForPath:40] [935.304µs] /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0·
101 time=2016-10-25T20:16:41.514866380Z level=debug msg=[ END ] [github.com/vmware/vic/pkg/vsphere/disk.(*Manager).CreateAndAttach:85] [291.192767ms] [datastore1] long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/scratch/scratch.vmdk·
102 time="2016-10-25T20:16:41Z" level=debug msg="Scratch disk created with size 8000000"·
103 time=2016-10-25T20:16:41.514882566Z level=debug msg=[BEGIN] [github.com/vmware/vic/pkg/fs.(*Ext4).Mkfs:37] /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0·
104 time="2016-10-25T20:16:41Z" level=info msg="Creating ext4 filesystem on device /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"·
105 time="2016-10-25T20:16:44Z" level=debug msg="Filesystem created on device /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"·
106 time=2016-10-25T20:16:44.884894315Z level=debug msg=[ END ] [github.com/vmware/vic/pkg/fs.(*Ext4).Mkfs:37] [3.370006273s] /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0·
107 time=2016-10-25T20:16:44.884912624Z level=debug msg=[BEGIN] [github.com/vmware/vic/pkg/vsphere/disk.(*Manager).Detach:256] /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0·
108 time=2016-10-25T20:16:44.884919727Z level=info msg=op=239.4 (delta:3.891707151s): Detaching disk /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0·
109 time=2016-10-25T20:16:44.884931413Z level=debug msg=[BEGIN] [github.com/vmware/vic/pkg/vsphere/disk.findDisk:154] VirtualMachine:2465·
110 time="2016-10-25T20:16:44Z" level=debug msg="Looking for attached disk matching filename [datastore1] long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/scratch/scratch.vmdk"·
111 time="2016-10-25T20:16:44Z" level=debug msg="backing file name [datastore1] long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/scratch/scratch.vmdk"·
112 time="2016-10-25T20:16:44Z" level=debug msg="Found candidate disk for [datastore1] long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/scratch/scratch.vmdk at [datastore1] long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/scratch/scratch.vmdk"·
113 time=2016-10-25T20:16:44.893324824Z level=debug msg=[ END ] [github.com/vmware/vic/pkg/vsphere/disk.findDisk:154] [8.389446ms] VirtualMachine:2465·
114 time=2016-10-25T20:16:45.200903059Z level=debug msg=[ END ] [github.com/vmware/vic/pkg/vsphere/disk.(*Manager).Detach:256] [315.98616ms] /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0·
115 time="2016-10-25T20:16:45Z" level=debug msg="Index: inserting http://VCH-6235-7958/storage/images/564de4f3-2c06-d8d3-0201-f4f5521868ef/scratch (parent: http://VCH-6235-7958/storage/images/564de4f3-2c06-d8d3-0201-f4f5521868ef/scratch) in index"·

And this is corroborated by hostd.log.

// Attach
 2016-10-25T20:16:42.031Z info hostd[61806B70] [Originator@6876 sub=Nfcsvc opID=e51ff24c user=root] file mkdir  (/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/scratch) by 'root'
276 2016-10-25T20:16:42.032Z info hostd[61806B70] [Originator@6876 sub=Vimsvc.TaskManager opID=e51ff24c user=root] Task Completed : haTask--vim.FileManager.makeDirectory-531498 Status success
277 2016-10-25T20:16:42.034Z info hostd[61681B70] [Originator@6876 sub=Vimsvc.TaskManager opID=e51ff24d user=root] Task Created : haTask--vim.FileManager.makeDirectory-531499
278 2016-10-25T20:16:42.036Z info hostd[5DC81B70] [Originator@6876 sub=Nfcsvc opID=e51ff24d user=root] Create requested for /vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/scratch/imageMetadata
279 2016-10-25T20:16:42.073Z info hostd[5DC81B70] [Originator@6876 sub=Nfcsvc opID=e51ff24d user=root] file mkdir  (/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/scratch/imageMetadata) by 'root'
280 2016-10-25T20:16:42.073Z info hostd[5DC81B70] [Originator@6876 sub=Vimsvc.TaskManager opID=e51ff24d user=root] Task Completed : haTask--vim.FileManager.makeDirectory-531499 Status success
281 2016-10-25T20:16:42.077Z info hostd[61806B70] [Originator@6876 sub=Vimsvc.TaskManager opID=e51ff24e user=root] Task Created : haTask-2465-vim.VirtualMachine.reconfigure-531500
282 2016-10-25T20:16:42.079Z info hostd[5DC81B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff24e user=root] State Transition (VM_STATE_ON -> VM_STATE_RECONFIGURING)
283 2016-10-25T20:16:42.093Z info hostd[61681B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff24e user=root] Storage policy for disk '/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/scratch/scratch.vmdk' not specified.
284 2016-10-25T20:16:42.093Z info hostd[61681B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff24e user=root] SetHostDiskProperties: instUUID  gos
285 2016-10-25T20:16:42.295Z info hostd[61681B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff24e user=root] State Transition (VM_STATE_RECONFIGURING -> VM_STATE_ON)

<snip>

// Detach
304 2016-10-25T20:16:45.748Z info hostd[FFE7AAE0] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff273 user=root] State Transition (VM_STATE_ON -> VM_STATE_RECONFIGURING)
305 2016-10-25T20:16:45.869Z info hostd[61681B70] [Originator@6876 sub=Vimsvc.TaskManager opID=e51ff27a user=root] Task Created : haTask--vim.SessionManager.acquireGenericServiceTicket-531533
306 2016-10-25T20:16:45.870Z info hostd[61681B70] [Originator@6876 sub=Vimsvc.TaskManager opID=e51ff27a user=root] Task Completed : haTask--vim.SessionManager.acquireGenericServiceTicket-531533 Status success
307 2016-10-25T20:16:45.871Z info hostd[5E6C2B70] [Originator@6876 sub=Vimsvc.TaskManager opID=e51ff27b user=root] Task Created : haTask--vim.SearchIndex.findByInventoryPath-531534
308 2016-10-25T20:16:45.872Z info hostd[5DC81B70] [Originator@6876 sub=Vimsvc.TaskManager opID=e51ff27b user=root] Task Completed : haTask--vim.SearchIndex.findByInventoryPath-531534 Status success
309 2016-10-25T20:16:45.872Z info hostd[5DC81B70] [Originator@6876 sub=DatastoreBrowser opID=e51ff27d user=root] DatastoreBrowserImpl::SearchInt MoId:577fceda-9b1fac4c-110b-0050569ccda1-datastorebrowser dsPath:[datastore1]drunk_spence-0bf8d35c500a5b1a2283d8c487778f315d32b46ee4abff2aeb885a60f3eb8576 recursive:false
310 2016-10-25T20:16:45.889Z info hostd[5DC81B70] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=e51ff284 user=root] Event 29954 : User [email protected] logged out (login time: Tuesday, 25 October, 2016 20:16:36, number of API invocations: 0, user agent: Go-http-client/1.1)
311 2016-10-25T20:16:45.895Z info hostd[FFE7AAE0] [Originator@6876 sub=Vigor.Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff273 user=root] Try removing Hbr entries for removed disks scsi0:0.
312 2016-10-25T20:16:45.896Z info hostd[FFE7AAE0] [Originator@6876 sub=Vigor.Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff273 user=root] Done removing Hbr entries for removed disks scsi0:0.
313 2016-10-25T20:16:45.906Z info hostd[FFE7AAE0] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff273 user=root] State Transition (VM_STATE_RECONFIGURING -> VM_STATE_ON)


hickeng commented Oct 25, 2016

@mhagen-vmware Do we use ssh to do cleanup of the datastores between/after tests/suites/ci runs?
Trying to establish which operations we perform via any means that could be removing scratch.

@mhagen-vmware

We don't use ssh at all; do you mean govc? We use govc before installing a VCH to clean up dangling networks, VMs, and datastore files.


derekbeard commented Oct 27, 2016

@fdawg4l

If I read the logs (Test-Cases.Group6-VIC-Machine.6-05-Create-Validation-VCH-6235-7958-container-logs/journalctl) and image.go:scratch() correctly, then the detach that you pointed out above is expected (at least from my read of the code).

Oct 25 20:16:41 VCH-6235-7958 vic-init[141]: time="2016-10-25T20:16:41Z" level=debug msg="Scratch disk created with size 8000000"

corresponds to the log message just before the mkfs.

Oct 25 20:16:41 VCH-6235-7958 vic-init[141]: time=2016-10-25T20:16:41.514882566Z level=debug msg=[BEGIN] [github.com/vmware/vic/pkg/fs.(*Ext4).Mkfs:37] /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0
Oct 25 20:16:41 VCH-6235-7958 vic-init[141]: time="2016-10-25T20:16:41Z" level=info msg="Creating ext4 filesystem on device /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"
Oct 25 20:16:43 VCH-6235-7958 kernel: random: nonblocking pool is initialized
Oct 25 20:16:44 VCH-6235-7958 vic-init[141]: time="2016-10-25T20:16:44Z" level=debug msg="Filesystem created on device /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"

look like the mkfs to me.

And the next step in the scratch() function is the detach, likely corresponding to

Oct 25 20:16:44 VCH-6235-7958 vic-init[141]: time=2016-10-25T20:16:44.884919727Z level=info msg=op=239.4 (delta:3.891707151s): Detaching disk /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0

Is this expected behavior, and if so, should I be focusing on some other symptoms or log messages?
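
For reference, the sequence being described corresponds roughly to the following flow. This is a minimal sketch with hypothetical helper names, not the actual lib/portlayer code: the blank scratch VMDK is created and attached to the VCH endpoint VM, formatted, then detached again, so a detach right after the mkfs is the expected final step of scratch preparation rather than an error.

package scratch

import (
    "context"
    "fmt"
    "os/exec"
)

// createAndAttach and detach are placeholders for the reconfigure-based
// attach/detach of a VMDK against the VCH endpoint VM; only the ordering
// matters for this sketch.
func createAndAttach(ctx context.Context, vmdkPath string) (string, error) {
    // ...reconfigure the VM to add the disk, then wait for the device to
    // appear under /dev/disk/by-path...
    return "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0", nil
}

func detach(ctx context.Context, devPath string) error {
    // ...reconfigure the VM to remove the disk...
    return nil
}

// prepareScratch mirrors what the journalctl snippet shows: create/attach, mkfs, detach.
func prepareScratch(ctx context.Context, vmdkPath string) error {
    dev, err := createAndAttach(ctx, vmdkPath)
    if err != nil {
        return err
    }
    // Format the attached device so image layers can later be unpacked onto
    // children of this disk.
    if out, err := exec.CommandContext(ctx, "mkfs.ext4", "-F", dev).CombinedOutput(); err != nil {
        return fmt.Errorf("mkfs failed: %v: %s", err, out)
    }
    // Detaching here is the normal end of scratch preparation, not an error path.
    return detach(ctx, dev)
}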


fdawg4l commented Oct 27, 2016

Yup! That's right @derekbeard.

We tried to create a chained disk from scratch.vmdk and failed there with the error

Oct 25 20:16:57 VCH-6235-7958 vic-init[141]: time=2016-10-25T20:16:57.132524731Z level=error msg=op=239.8 (delta:1.486035172s): vmdk storage driver failed to attach disk: Invalid configuration for device '0'.

We created scratch earlier (i.e. in the log snippet you pasted), so it should be there. The open question is why we couldn't create the child disk from scratch. Was scratch no longer there, or was there some other issue?

Note, the "Invalid configuration..." message is directly from vSphere.
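
For context, the failing step is a VirtualMachine.Reconfigure that adds a child disk whose backing names scratch.vmdk as its parent. Below is a minimal govmomi-style sketch of that device spec (hypothetical function name, paths, and parameters; not the VIC portlayer code). ESX validates the Parent backing while processing the reconfigure, so if the parent file has been removed the task fails with errors like the ones above.

package disks

import (
    "context"

    "github.com/vmware/govmomi/object"
    "github.com/vmware/govmomi/vim25/types"
)

// addChildDisk adds a new VMDK to vm whose backing chain points at parentPath.
// If parentPath (e.g. .../images/scratch/scratch.vmdk) no longer exists, the
// reconfigure task fails while validating the device spec.
func addChildDisk(ctx context.Context, vm *object.VirtualMachine, controllerKey int32, childPath, parentPath string) error {
    unit := int32(0) // scsi0:0
    child := &types.VirtualDisk{
        VirtualDevice: types.VirtualDevice{
            Key:           -1,
            ControllerKey: controllerKey,
            UnitNumber:    &unit,
            Backing: &types.VirtualDiskFlatVer2BackingInfo{
                DiskMode:        string(types.VirtualDiskModePersistent),
                ThinProvisioned: types.NewBool(true),
                VirtualDeviceFileBackingInfo: types.VirtualDeviceFileBackingInfo{
                    FileName: childPath, // e.g. "[datastore1] .../images/<id>/<id>.vmdk"
                },
                // The child is a delta disk on top of the parent; ESX resolves this
                // chain when it validates the spec during the reconfigure.
                Parent: &types.VirtualDiskFlatVer2BackingInfo{
                    VirtualDeviceFileBackingInfo: types.VirtualDeviceFileBackingInfo{
                        FileName: parentPath, // e.g. "[datastore1] .../images/scratch/scratch.vmdk"
                    },
                },
            },
        },
    }

    spec := types.VirtualMachineConfigSpec{
        DeviceChange: []types.BaseVirtualDeviceConfigSpec{
            &types.VirtualDeviceConfigSpec{
                Operation:     types.VirtualDeviceConfigSpecOperationAdd,
                FileOperation: types.VirtualDeviceConfigSpecFileOperationCreate,
                Device:        child,
            },
        },
    }

    task, err := vm.Reconfigure(ctx, spec)
    if err != nil {
        return err
    }
    return task.Wait(ctx)
}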

@derekbeard

@fdawg4l

Looks like these are the corresponding failure messages from the hostd log:

2016-10-25T20:16:57.972Z info hostd[60280B70] [Originator@6876 sub=Vimsvc.TaskManager opID=e51ff4ea user=root] Task Created : haTask-2465-vim.VirtualMachine.reconfigure-531901
2016-10-25T20:16:57.973Z info hostd[60280B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff4ea user=root] State Transition (VM_STATE_ON -> VM_STATE_RECONFIGURING)
2016-10-25T20:16:57.979Z error hostd[60280B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff4ea user=root] Parent file doesn't exist, or the user has no permission to access.
2016-10-25T20:16:57.979Z info hostd[60280B70] [Originator@6876 sub=Vimsvc.TaskManager opID=e51ff4ea user=root] Task Completed : haTask-2465-vim.VirtualMachine.reconfigure-531901 Status error
2016-10-25T20:16:57.979Z warning hostd[60280B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff4ea user=root] Reconfigure worker failed to validate device spec
2016-10-25T20:16:57.983Z info hostd[60280B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff4ea user=root] State Transition (VM_STATE_RECONFIGURING -> VM_STATE_ON)

Before this, there are a slew of FileManager.deleteFile() calls, including one to delete scratch.vmdk:

2016-10-25T20:16:47.481Z info hostd[FFE7AAE0] [Originator@6876 sub=Vimsvc.TaskManager opID=e51ff33d user=root] Task Created : haTask--vim.FileManager.deleteFile-531650
2016-10-25T20:16:47.489Z info hostd[5FE80B70] [Originator@6876 sub=Vimsvc.TaskManager opID=e51ff33d user=root] Task Completed : haTask--vim.FileManager.deleteFile-531650 Status success
2016-10-25T20:16:47.489Z info hostd[5FE80B70] [Originator@6876 sub=Nfcsvc opID=e51ff33d user=root] file delete force (/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6243-7002/VIC/564dbc75-68ff-e088-58ea-6012408c693c/images/scratch/scratch.vmdk -> /vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6243-7002/VIC/564dbc75-68ff-e088-58ea-6012408c693c/images/scratch/scratch.vmdk) by 'root'

Right about this time I see the following in the port-layer.log:

Oct 25 20:16:45 VCH-6235-7958 vic-init[141]: time="2016-10-25T20:16:45Z" level=debug msg="Found 1 subscribers to vsphere.VMEvent: drunk_spence-0bf8d35c500a5b1a2283d8c487778f315d32b46ee4abff2aeb885a60f3eb8576 on  vic-esx-05.ci.drone.local in ha-datacenter is powered off"
Oct 25 20:16:45 VCH-6235-7958 vic-init[141]: time="2016-10-25T20:16:45Z" level=debug msg="Event manager calling back to exec"
Oct 25 20:16:46 VCH-6235-7958 vic-init[141]: time="2016-10-25T20:16:46Z" level=debug msg="Found 1 subscribers to vsphere.VMEvent: Removed drunk_spence-0bf8d35c500a5b1a2283d8c487778f315d32b46ee4abff2aeb885a60f3eb8576 on vic-esx-05.ci.drone.local from ha-datacenter"
Oct 25 20:16:46 VCH-6235-7958 vic-init[141]: time="2016-10-25T20:16:46Z" level=debug msg="Event manager calling back to exec"
Oct 25 20:16:46 VCH-6235-7958 vic-init[141]: time="2016-10-25T20:16:46Z" level=debug msg="Found 1 subscribers to vsphere.VMEvent: Removed drunk_spence-0bf8d35c500a5b1a2283d8c487778f315d32b46ee4abff2aeb885a60f3eb8576 on vic-esx-05.ci.drone.local from ha-datacenter"
Oct 25 20:16:46 VCH-6235-7958 vic-init[141]: time="2016-10-25T20:16:46Z" level=debug msg="Event manager calling back to exec"
Oct 25 20:16:47 VCH-6235-7958 vic-init[141]: time="2016-10-25T20:16:47Z" level=debug msg="Found 1 subscribers to vsphere.VMEvent: VCH-6243-7002 on  vic-esx-05.ci.drone.local in ha-datacenter is powered off"
Oct 25 20:16:47 VCH-6235-7958 vic-init[141]: time="2016-10-25T20:16:47Z" level=debug msg="Event manager calling back to exec"
Oct 25 20:16:48 VCH-6235-7958 vic-init[141]: time="2016-10-25T20:16:48Z" level=debug msg="Found 1 subscribers to vsphere.VMEvent: Removed VCH-6243-7002 on vic-esx-05.ci.drone.local from ha-datacenter"
Oct 25 20:16:48 VCH-6235-7958 vic-init[141]: time="2016-10-25T20:16:48Z" level=debug msg="Event manager calling back to exec"
Oct 25 20:16:48 VCH-6235-7958 vic-init[141]: time="2016-10-25T20:16:48Z" level=debug msg="Found 1 subscribers to vsphere.VMEvent: Removed VCH-6243-7002 on vic-esx-05.ci.drone.local from ha-datacenter"
Oct 25 20:16:48 VCH-6235-7958 vic-init[141]: time="2016-10-25T20:16:48Z" level=debug msg="Event manager calling back to exec"

Perhaps related?


fdawg4l commented Oct 31, 2016

@derekbeard

Before this, there are a slew of FileManager.deleteFile() calls, including one to delete scratch.vmdk:
/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6243-7002/VIC/564dbc75-68ff-e088-58ea-6012408c693c/images/scratch/scratch.vmdk

The path is different. The path we care about in this case is
long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/scratch/scratch.vmdk (relative to [datastore1], i.e. /vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1).


fdawg4l commented Oct 31, 2016

Looks like build 6243 was running at the same time on this ESX host. I'm investigating whether we're racing on the datastore, in which case this is a test issue. Still digging.


fdawg4l commented Oct 31, 2016

Yeah, 6243 nuked our image store from underneath this test.

// Create scratch via 6235
2016-10-25T20:16:42.031Z info hostd[61806B70] [Originator@6876 sub=Nfcsvc opID=e51ff24c user=root] file mkdir  (/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/scratch) by 'root'

// Land scratch (6235)
2016-10-25T20:16:42.093Z info hostd[61681B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff24e user=root] Storage policy for disk '/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/scratch/scratch.vmdk' not specified.
2016-10-25T20:16:42.093Z info hostd[61681B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff24e user=root] SetHostDiskProperties: instUUID  gos
2016-10-25T20:16:42.295Z info hostd[61681B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff24e user=root] State Transition (VM_STATE_RECONFIGURING -> VM_STATE_ON)
2016-10-25T20:16:42.297Z info hostd[61681B70] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=e51ff24e user=root] Event 29953 : Reconfigured VCH-6235-7958 on vic-esx-05.ci.drone.local in ha-datacenter
2016-10-25T20:16:42.297Z info hostd[61681B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff24e user=root] Send config update invoked
2016-10-25T20:16:45.748Z info hostd[FFE7AAE0] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff273 user=root] State Transition (VM_STATE_ON -> VM_STATE_RECONFIGURING)
2016-10-25T20:16:45.895Z info hostd[FFE7AAE0] [Originator@6876 sub=Vigor.Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff273 user=root] Try removing Hbr entries for removed disks scsi0:0.
2016-10-25T20:16:45.896Z info hostd[FFE7AAE0] [Originator@6876 sub=Vigor.Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff273 user=root] Done removing Hbr entries for removed disks scsi0:0.
2016-10-25T20:16:45.906Z info hostd[FFE7AAE0] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff273 user=root] State Transition (VM_STATE_RECONFIGURING -> VM_STATE_ON)

// Detach scratch from our VCH (6235)
2016-10-25T20:16:45.909Z info hostd[FFE7AAE0] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=e51ff273 user=root] Event 29955 : Reconfigured VCH-6235-7958 on vic-esx-05.ci.drone.local in ha-datacenter

// JOB 6243 DELETES OUR IMAGE STORE!
2016-10-25T20:16:50.476Z info hostd[60C81B70] [Originator@6876 sub=Vimsvc.TaskManager opID=e51ff4a5 user=root] Task Created : haTask--vim.FileManager.deleteFile-531863
2016-10-25T20:16:50.663Z info hostd[5E784B70] [Originator@6876 sub=Vimsvc.TaskManager opID=e51ff4a5 user=root] Task Completed : haTask--vim.FileManager.deleteFile-531863 Status success
2016-10-25T20:16:50.663Z info hostd[5E784B70] [Originator@6876 sub=Nfcsvc opID=e51ff4a5 user=root] file delete force (/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/long -> /vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/long) by 'root'

// crap ourselves (6235)
2016-10-25T20:16:57.589Z info hostd[616C2B70] [Originator@6876 sub=Nfcsvc opID=e51ff4e9 user=root] Create requested for /vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/a61cd723bcf2b0ccaaa3b8f779dfca17040bd459d9e615b82a7ea17993ec59f2/imageMetadata
2016-10-25T20:16:57.848Z info hostd[616C2B70] [Originator@6876 sub=Nfcsvc opID=e51ff4e9 user=root] file mkdir -p (/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/long/weird/path/VIC/564de4f3-2c06-d8d3-0201-f4f5521868ef/images/a61cd723bcf2b0ccaaa3b8f779dfca17040bd459d9e615b82a7ea17993ec59f2/imageMetadata) by 'root'
2016-10-25T20:16:57.973Z info hostd[60280B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff4ea user=root] State Transition (VM_STATE_ON -> VM_STATE_RECONFIGURING)
2016-10-25T20:16:57.979Z error hostd[60280B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff4ea user=root] Parent file doesn't exist, or the user has no permission to access.
2016-10-25T20:16:57.979Z warning hostd[60280B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff4ea user=root] Reconfigure worker failed to validate device spec
2016-10-25T20:16:57.983Z info hostd[60280B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/577fceda-9b1fac4c-110b-0050569ccda1/VCH-6235-7958/VCH-6235-7958.vmx opID=e51ff4ea user=root] State Transition (VM_STATE_RECONFIGURING -> VM_STATE_ON)

Thank you for taking the time to look at this, @derekbeard, but we have some test framework inconsistencies that are causing this.

This is not a storage issue. Reassigning.


fdawg4l commented Oct 31, 2016

@mhagen-vmware / @cgtexmex ,

Looks like we have some raciness due to concurrent CI builds running [full ci] on the same ESX host. Assigning to both of you to do the needful. Thanks!

@mhagen-vmware

@fdawg4l do you think this should be fixed by #2944?


fdawg4l commented Oct 31, 2016

Oh, interesting! Uh, give me a few minutes to figure out that robot syntax :-).


fdawg4l commented Oct 31, 2016

Cleanup Datastore On Test Server
    ${out}=  Run  govc datastore.ls
    ${items}=  Split To Lines  ${out}
    :FOR  ${item}  IN  @{items}
    \   ${build}=  Split String  ${item}  -
    \   # Skip any item that is not associated with integration tests
    \   Continue For Loop If  '@{build}[0]' != 'VCH'
    \   # Skip any item that is still running
    \   ${state}=  Get State Of Drone Build  @{build}[1]
    \   Continue For Loop If  '${state}' == 'running'
    \   Log To Console  Removing the following item from datastore: ${item}
    \   ${out}=  Run  govc datastore.rm ${item}

Reading that snippet, it looks like ${build} is parsed from ${item}. In the case above, the image store uses a custom path which doesn't itself map to ${item}; it is [datastore] long/. Is my reading of that right? Will the parsing fail on this custom path?

The test has other issues too with respect to concurrent builds. The custom path is hardcoded and will collide if more than one build chooses the same ESX instance AND one attempts to clean up before the other completes. We namespace the image store (with a UUID), which should be unique to the VCH, but the parent of that path being hardcoded can lead to issues.

mdubya66 added the kind/defect (Behavior that is inconsistent with what's intended) label Nov 1, 2016

cgtexmex commented Nov 4, 2016

related to #2043


fdawg4l commented Nov 8, 2016

@mhagen-vmware can you look at my comment above and advise? We either need to identify the tests that are using hardcoded, non-sharable paths which we are likely racing on; reduce the number of workers to the number of ESX hosts and stop sharing altogether; or, if your change above has put us in a good place, we can close this issue.

@mhagen-vmware

Well, at this point my PR is in to prevent concurrency from happening again, so if we never see this again then I guess we can chalk it up to the aggressive cleanup from another job. But this line here:
\ Continue For Loop If '${state}' == 'running'
should have prevented any removal for any other jobs that might have been running at the same time.


fdawg4l commented Nov 9, 2016

@mhagen-vmware

\ Continue For Loop If '${state}' == 'running'

But ${state} is derived from ${build}, and ${build} is derived from ${item}. In my example, ${item} is [datastore] long/. This path doesn't include a VCH string so the ${build} parsing will fail, right? And if that's true, then ${state} is bogus, no?

@mhagen-vmware

Right, so it would ignore that folder as well, due to the previous line:
\ Continue For Loop If '@{build}[0]' != 'VCH'
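
To make that concrete, here is a tiny Go illustration (hypothetical, not part of the test suite) of the same filtering: an item such as "long/" splits on "-" into a single token that is not "VCH", so the loop skips it before the build-state check is ever reached.

package main

import (
    "fmt"
    "strings"
)

// shouldClean mimics the guard clauses in the Robot keyword above: only items
// that look like "VCH-<build>-<suffix>" are candidates for removal; anything
// else (including a custom path like "long/") is skipped. The build-state
// check is elided here since it only applies to items that pass the VCH test.
func shouldClean(item string) bool {
    build := strings.Split(item, "-")
    return build[0] == "VCH"
}

func main() {
    for _, item := range []string{"VCH-6235-7958/", "long/"} {
        fmt.Printf("%-16s clean candidate: %v\n", item, shouldClean(item))
    }
}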


fdawg4l commented Nov 9, 2016

OOOHH!! OK, cool. Thanks. In that case, let's close this bug; if we see weird datastore issues again, I'll either investigate them (as they're likely real) or reopen this and look at how to address any possible collisions.

We still have the issue of 2 VCHs using this hardcoded path and racing each other. This doesn't involve cleanup, just 2 VCHs conflicting with each other if we ever enable concurrency again. I'll create an issue to track that.
