The VM failed to resume on the destination during early power on

During a vMotion, one of my VMs refused to move and vCenter reported the following error:

The VM failed to resume on the destination during early power on.
Module DiskEarly power on failed.
Cannot open the disk ‘/vmfs/volumes/…./…./???????.vmdk’ or one of the snapshot disks it depends on.
Could not open/create change tracking file

I turned the server off and tried to delete the change tracking file. Got an error:

Cannot delete file [&lt;Datastore&gt;] VMFolder/VM-ctk.vmdk

I migrated the VM to another host and tried to power it on. Got an error:

An error was received from the ESXi host while powering on VM VMName.
Failed to start the virtual machine.
Cannot open the disk ‘/vmfs/volumes/5783364b-08fc763e-1389-00215a9b0098/kalle-test1/kalle-test1.vmdk’ or one of the snapshot disks it depends on.
Could not open/create change tracking file

Next I rebooted the ESXi host on which the problematic VM originally resided, and after that I was able to delete the *-ctk.vmdk file and power on the VM. It seems that for some reason there were file locks on the change tracking files, and they prevented operations on the VM.

Welcome to VirtuallyThatGuy – this is a quick blog post about CBT and snapshot errors when vMotioning a VM from one host to another with the error “vmotion fails the vm failed to resume on the destination during early power on”.

## Connect to vCenters

Connect-VIServer uk3p-vc01.lab.local, ntcp-vc01.lab.local -User "" -Password "StandUpIfYouHateTottenham"

$vm = Get-VM -Name "uk3p-DC01", "uk3p-DC02", "uk3p-DC03"

$vmview = $vm | Get-View

$vmConfigSpec = New-Object VMware.Vim.VirtualMachineConfigSpec

$vmConfigSpec.ChangeTrackingEnabled = $false

$vmview.ReconfigVM($vmConfigSpec)

$snap = $vm | New-Snapshot -Name 'Disable CBT'

$snap | Remove-Snapshot -Confirm:$false

Resolution: This issue can only be fixed by logging in to the specific host on which the VM resides, which you can find on the summary page of the VM. Run these commands via SSH to restart the management agents:

  • /etc/init.d/hostd restart
  • /etc/init.d/vpxa restart

After restarting the services, the host temporarily loses its connection to vCenter and then reconnects. Once the host reconnects to vCenter, the VM will still be fully operational.

Impact: There is no impact on the resident VMs when you restart the management agents on the host. Only currently running tasks such as vMotion stop working, and a timeout error will pop up.


Virtual Machine Migration (DRS or user initiated) fails with the following error message:

The VM failed to resume on the destination during early power on

A general system error occurred: The source detected that the destination failed to resume

Error Stack:
The VM failed to resume on the destination during early power on.
Module DiskEarly power on failed.
Cannot open the disk '/vmfs/volumes/<...>.vmdk' or one of the snapshot disks it depends on.
Could not open/create change tracking file

  1. Check that Virtual Disks are not mounted to the backup host (VDP or other products). Sometimes when the backup fails, the virtual disks are not correctly removed from the backup host.
  2. Consolidate Snapshots (Right Click Virtual Machine > Snapshot > Consolidate)
  3. If the problem still persists, create a snapshot and delete it with the "Delete All" option.
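On a single host, steps 2 and 3 can also be driven from the ESXi shell with `vim-cmd`. This is a sketch, not the exact commands from the write-up above; the VM id (42 here) is hypothetical and has to be looked up first:

```shell
# Find the VM id (first column of the output)
vim-cmd vmsvc/getallvms | grep GUEST-VM

# Create a temporary snapshot, then remove all snapshots, which forces
# a consolidation of the disk chain (42 is a hypothetical VM id)
vim-cmd vmsvc/snapshot.create 42 "cbt-fix" "temporary snapshot" 0 0
vim-cmd vmsvc/snapshot.removeall 42
```
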

I had an interesting issue to resolve today, one that I haven’t seen before and one that took a bit of digging to resolve. The problem related to migrating some Exchange mailbox servers from a legacy ESXi 4.1 host onto new ESXi 5.1 host.

This should have been a simple vMotion operation, but the task failed repeatedly at approximately 65% complete. I tried using both high and standard priority migrations, but it failed every time, simply reporting “The VM failed to resume on the destination during early power on”.


First thing I did was check the host log files (vmkwarning and vmkernel), as well as the virtual machine log file (vmware.log) located in the virtual machine folder on the datastore;

## START ## Nov 6 13:32:30 src-host vmkernel: 843:05:02:40.662 cpu6:64249)WARNING: Migrate: 296: 1415280590445360 S: Failed: Failed to resume VM (0xbad0044) @0x418023e4b250 Nov 6 13:32:30 src-host vmkernel: 843:05:02:40.664 cpu3:64248)WARNING: VMotionSend: 3857: 1415280590445360 S: failed to send final set of pages to the remote host <xx.xx.xx.xx>: Failure. Nov 6 13:32:30 src-host vmkernel: 843:05:02:40.689 cpu12:48347)WARNING: Migrate: 4328: 1415280590445360 S: Migration considered a failure by the VMX. It is most likely a timeout, but check the VMX log for the true error. Nov 6 14:51:31 src-host vmkernel: 843:06:21:41.945 cpu6:64267)WARNING: Migrate: 296: 1415285312829818 S: Failed: Failed to resume VM (0xbad0044) @0x418023e4b250 Nov 6 14:51:31 src-host vmkernel: 843:06:21:41.953 cpu19:64266)WARNING: VMotionSend: 3857: 1415285312829818 S: failed to send final set of pages to the remote host <xx.xx.xx.xx>: Failure. Nov 6 14:51:31 src-host vmkernel: 843:06:21:41.970 cpu12:48347)WARNING: Migrate: 4328: 1415285312829818 S: Migration considered a failure by the VMX. It is most likely a timeout, but check the VMX log for the true error. 
## END ## ## START ## Nov 6 13:35:23 dst-host vmkernel: 501:21:49:25.404 cpu1:63073)WARNING: MemSched: 12625: Non-overhead memory reservation for vmx user-world (worldID=63073) is greater than desired minimum amount of 57344 KB (min=57344 KB, reservedOverhead=0 KB, totalReserved=68812 KB) Nov 6 13:36:05 dst-host vmkernel: 501:21:50:07.143 cpu6:63073)WARNING: MemSched: vm 63073: 5199: Cannot reduce reservation by 2021 pages (total reservation: 27428 pages, consumed reservation: 27428 pages) Nov 6 13:36:28 dst-host vmkernel: 501:21:50:29.836 cpu5:63091)WARNING: MemSched: 12625: Non-overhead memory reservation for vmx user-world (worldID=63091) is greater than desired minimum amount of 57344 KB (min=57344 KB, reservedOverhead=0 KB, totalReserved=68552 KB) Nov 6 13:36:39 dst-host vmkernel: 501:21:50:41.256 cpu6:63091)WARNING: MemSched: vm 63091: 5199: Cannot reduce reservation by 1913 pages (total reservation: 24038 pages, consumed reservation: 24038 pages) Nov 6 13:37:10 dst-host vmkernel: 501:21:51:12.241 cpu5:63106)WARNING: MemSched: 12625: Non-overhead memory reservation for vmx user-world (worldID=63106) is greater than desired minimum amount of 57344 KB (min=57344 KB, reservedOverhead=0 KB, totalReserved=68724 KB) Nov 6 13:37:50 dst-host vmkernel: 501:21:51:51.758 cpu11:63106)WARNING: MemSched: vm 63106: 5199: Cannot reduce reservation by 2021 pages (total reservation: 27327 pages, consumed reservation: 27327 pages) Nov 6 13:38:16 dst-host vmkernel: 501:21:52:18.119 cpu6:63124)WARNING: MemSched: 12625: Non-overhead memory reservation for vmx user-world (worldID=63124) is greater than desired minimum amount of 57344 KB (min=57344 KB, reservedOverhead=0 KB, totalReserved=69464 KB) Nov 6 13:40:23 dst-host vmkernel: 501:21:54:25.336 cpu2:63124)WARNING: MemSched: vm 63124: 5199: Cannot reduce reservation by 2019 pages (total reservation: 38944 pages, consumed reservation: 38944 pages) Nov 6 14:48:34 dst-host vmkernel: 501:23:02:35.673 cpu1:63154)WARNING: MemSched: 
12625: Non-overhead memory reservation for vmx user-world (worldID=63154) is greater than desired minimum amount of 57344 KB (min=57344 KB, reservedOverhead=0 KB, totalReserved=69540 KB) Nov 6 14:52:04 dst-host vmkernel: 501:23:06:05.978 cpu15:63154)WARNING: Migrate: 4328: 1415285312829818 D: Migration considered a failure by the VMX. It is most likely a timeout, but check the VMX log for the true error. Nov 6 14:52:04 dst-host vmkernel: 501:23:06:05.978 cpu15:63154)WARNING: Migrate: 296: 1415285312829818 D: Failed: Migration determined a failure by the VMX (0xbad0092) @0x41801fb9acb9 Nov 6 14:52:04 dst-host vmkernel: 501:23:06:05.978 cpu15:63154)WARNING: VMotionUtil: 3548: 1415285312829818 D: timed out waiting 0 ms to transmit data. Nov 6 14:52:04 dst-host vmkernel: 501:23:06:05.978 cpu15:63154)WARNING: VMotionSend: 624: 1415285312829818 D: (9-0x410300002058) failed to receive 72/72 bytes from the remote host <xx.xx.xx.xx>: Timeout ## END ##

So reading through the host log files it looks like there was a problem reserving enough memory resources on the destination host and the operation timed out. This sounds relatively plausible, but the exact same results were observed trying to migrate the VM onto an empty host.

So next step was to review the guest VM’s log file;

## START ## Nov 06 14:52:04.416: vmx| DISKLIB-CTK : Could not open tracking file. File open returned IO error 4. Nov 06 14:52:04.416: vmx| DISKLIB-CTK : Could not open change tracking file "/vmfs/volumes/4f1edf9e-b9c5beba-cd04-0025b30202ac/GUEST-VM/GUEST-VM_3-ctk.vmdk": Could not open/create change tracking file. Nov 06 14:52:04.417: vmx| DISKLIB-LIB : Could not open change tracker /vmfs/volumes/4f1edf9e-b9c5beba-cd04-0025b30202ac/GUEST-VM/GUEST-VM_3-ctk.vmdk: Could not open/create change tracking file. Nov 06 14:52:04.421: vmx| DISKLIB-VMFS : "/vmfs/volumes/4f1edf9e-b9c5beba-cd04-0025b30202ac/GUEST-VM/GUEST-VM_3-rdm.vmdk" : closed. Nov 06 14:52:04.421: vmx| DISKLIB-LIB : Failed to open '/vmfs/volumes/4f1edf9e-b9c5beba-cd04-0025b30202ac/GUEST-VM/GUEST-VM_3.vmdk' with flags 0xa Could not open/create change tracking file (2108). Nov 06 14:52:04.421: vmx| DISK: Cannot open disk "/vmfs/volumes/4f1edf9e-b9c5beba-cd04-0025b30202ac/GUEST-VM/GUEST-VM_3.vmdk": Could not open/create change tracking file (2108). Nov 06 14:52:04.422: vmx| Msg_Post: Error Nov 06 14:52:04.422: vmx| [msg.disk.noBackEnd] Cannot open the disk '/vmfs/volumes/4f1edf9e-b9c5beba-cd04-0025b30202ac/GUEST-VM/GUEST-VM_3.vmdk' or one of the snapshot disks it depends on. Nov 06 14:52:04.423: vmx| [msg.disk.configureDiskError] Reason: Could not open/create change tracking file.---------------------------------------- Nov 06 14:52:04.437: vmx| Module DiskEarly power on failed. Nov 06 14:52:04.439: vmx| VMX_PowerOn: ModuleTable_PowerOn = 0 Nov 06 14:52:04.440: vmx| MigrateSetStateFinished: type=2 new state=11 Nov 06 14:52:04.440: vmx| MigrateSetState: Transitioning from state 10 to 11. Nov 06 14:52:04.441: vmx| Migrate_SetFailure: The VM failed to resume on the destination during early power on. Nov 06 14:52:04.441: vmx| Msg_Post: Error Nov 06 14:52:04.442: vmx| [msg.migrate.resume.fail] The VM failed to resume on the destination during early power on. ## END ##

Interestingly here, we now start getting some hints that perhaps a file lock is occurring and we also see the same error message that was observed in the vSphere client. The VM failed to resume on the destination during early power on.

I decided to have a look at the contents of the virtual machine folder, and found a number of suspicious looking “-ctk.vmdk” files, mostly time stamped from more than two years ago.

## START ## [root@host GUEST-VM]# ls -lh total 73G -rw------- 1 root root 4.3M Feb 28 2012 GUEST-VM_10-ctk.vmdk -rw------- 1 root root 1.1T Feb 27 2012 GUEST-VM_10-rdm.vmdk -rw------- 1 root root 582 Feb 28 2012 GUEST-VM_10.vmdk -rw------- 1 root root 7.7M Nov 6 14:51 GUEST-VM_11-ctk.vmdk -rw------- 1 root root 983G Feb 5 2014 GUEST-VM_11-rdm.vmdk -rw------- 1 root root 559 Feb 5 2014 GUEST-VM_11.vmdk -rw------- 1 root root 65K Jun 22 2013 GUEST-VM_1-ctk.vmdk -rw------- 1 root root 1.0G Nov 3 17:33 GUEST-VM_1-flat.vmdk -rw------- 1 root root 586 Feb 27 2012 GUEST-VM_1.vmdk -rw------- 1 root root 65K Jun 22 2013 GUEST-VM_2-ctk.vmdk -rw------- 1 root root 1.0G Nov 3 17:33 GUEST-VM_2-flat.vmdk -rw------- 1 root root 586 Feb 27 2012 GUEST-VM_2.vmdk -rw------- 1 root root 4.3M Feb 28 2012 GUEST-VM_3-ctk.vmdk -rw------- 1 root root 1.1T Feb 27 2012 GUEST-VM_3-rdm.vmdk -rw------- 1 root root 580 Feb 28 2012 GUEST-VM_3.vmdk -rw------- 1 root root 4.3M Feb 28 2012 GUEST-VM_4-ctk.vmdk -rw------- 1 root root 1.1T Feb 27 2012 GUEST-VM_4-rdm.vmdk -rw------- 1 root root 580 Feb 28 2012 GUEST-VM_4.vmdk -rw------- 1 root root 4.3M Feb 28 2012 GUEST-VM_5-ctk.vmdk -rw------- 1 root root 1.1T Feb 27 2012 GUEST-VM_5-rdm.vmdk -rw------- 1 root root 580 Feb 28 2012 GUEST-VM_5.vmdk -rw------- 1 root root 4.3M Feb 28 2012 GUEST-VM_6-ctk.vmdk -rw------- 1 root root 1.1T Feb 27 2012 GUEST-VM_6-rdm.vmdk -rw------- 1 root root 580 Feb 28 2012 GUEST-VM_6.vmdk -rw------- 1 root root 4.3M Feb 28 2012 GUEST-VM_7-ctk.vmdk -rw------- 1 root root 1.1T Feb 27 2012 GUEST-VM_7-rdm.vmdk -rw------- 1 root root 580 Feb 28 2012 GUEST-VM_7.vmdk -rw------- 1 root root 4.3M Feb 28 2012 GUEST-VM_8-ctk.vmdk -rw------- 1 root root 1.1T Feb 27 2012 GUEST-VM_8-rdm.vmdk -rw------- 1 root root 580 Feb 28 2012 GUEST-VM_8.vmdk -rw-r--r-- 1 root root 48 Nov 6 14:48 GUEST-VM-98045e41.hlog -rw------- 1 root root 4.3M Feb 28 2012 GUEST-VM_9-ctk.vmdk -rw------- 1 root root 1.1T Feb 27 2012 GUEST-VM_9-rdm.vmdk -rw------- 1 
root root 580 Feb 28 2012 GUEST-VM_9.vmdk -rw------- 1 root root 4.4M Feb 28 2012 GUEST-VM-ctk.vmdk -rw------- 1 root root 70G Nov 6 15:11 GUEST-VM-flat.vmdk -rw------- 1 root root 8.5K Nov 3 17:36 GUEST-VM.nvram -rw------- 1 root root 585 Feb 27 2012 GUEST-VM.vmdk -rw-r--r-- 1 root root 44 Feb 28 2012 GUEST-VM.vmsd -rwxr-xr-x 1 root root 5.8K Feb 19 2014 GUEST-VM.vmx -rw-r--r-- 1 root root 266 Feb 5 2014 GUEST-VM.vmxf drwxr-xr-x 1 root root 420 Feb 19 2014 phd -rw-r--r-- 1 root root 57K Nov 19 2012 vmware-43.log -rw-r--r-- 1 root root 57K Jun 22 2013 vmware-44.log -rw-r--r-- 1 root root 57K Jun 22 2013 vmware-45.log -rw-r--r-- 1 root root 1.0M Feb 19 2014 vmware-46.log -rw-r--r-- 1 root root 1020K Nov 6 14:51 vmware-47.log -rw-r--r-- 1 root root 57K Nov 6 13:33 vmware-48.log -rw-r--r-- 1 root root 57K Nov 6 14:52 vmware.log ## END ##

You can consolidate this view with a simple grep of just the “-ctk.vmdk” files;

## START ## [root@host GUEST-VM]# ls -al | grep ctk -rw------- 1 root root 4436480 Feb 28 2012 GUEST-VM_10-ctk.vmdk -rw------- 1 root root 8053248 Nov 6 14:51 GUEST-VM_11-ctk.vmdk -rw------- 1 root root 66048 Jun 22 2013 GUEST-VM_1-ctk.vmdk -rw------- 1 root root 66048 Jun 22 2013 GUEST-VM_2-ctk.vmdk -rw------- 1 root root 4436480 Feb 28 2012 GUEST-VM_3-ctk.vmdk -rw------- 1 root root 4436480 Feb 28 2012 GUEST-VM_4-ctk.vmdk -rw------- 1 root root 4436480 Feb 28 2012 GUEST-VM_5-ctk.vmdk -rw------- 1 root root 4436480 Feb 28 2012 GUEST-VM_6-ctk.vmdk -rw------- 1 root root 4436480 Feb 28 2012 GUEST-VM_7-ctk.vmdk -rw------- 1 root root 4436480 Feb 28 2012 GUEST-VM_8-ctk.vmdk -rw------- 1 root root 4436480 Feb 28 2012 GUEST-VM_9-ctk.vmdk -rw------- 1 root root 4588032 Feb 28 2012 GUEST-VM-ctk.vmdk ## END ##

So for ESX/ESXi 3.x/4.x and ESXi 5.0, the lock status of these “-ctk.vmdk” files can be obtained using the vmkfstools command. The process and syntax is explained in detail in KB1003397, titled “Unable to perform operations on a virtual machine with a locked disk.”

## START ## vmkfstools -D /vmfs/volumes/LUN/VM/disk-flat.vmdk ## END ##

You will see output similar to that below; the value of specific interest is the mode field;

## START ## Lock [type 10c00001 offset 54009856 v 11, hb offset 3198976 gen 9, mode 0, owner 4655cd8b-3c4a19f2-17bc-00145e808070 mtime 114] Addr <4, 116, 4>, gen 5, links 1, type reg, flags 0, uid 0, gid 0, mode 600 len 5368709120, nb 0 tbz 0, cow 0, zla 3, bs 1048576 ## END ##

The mode indicates the type of lock that is on the file, as follows;

## START ##
mode 0 = no lock
mode 1 = exclusive lock (vmx file of a powered-on VM, the currently used disk (flat or delta), *.vswp, etc.)
mode 2 = read-only lock (e.g. on the -flat.vmdk of a running VM with snapshots)
mode 3 = multi-writer lock (e.g. used for MSCS cluster disks or FT VMs)
## END ##
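The mode value can be pulled out of that output mechanically; a small sketch using sed on the sample lock line shown earlier:

```shell
# Sample line copied from the `vmkfstools -D` output above;
# sed extracts the numeric lock mode from it.
LOCKLINE='Lock [type 10c00001 offset 54009856 v 11, hb offset 3198976 gen 9, mode 0, owner 4655cd8b-3c4a19f2-17bc-00145e808070 mtime 114]'
MODE="$(printf '%s\n' "$LOCKLINE" | sed -n 's/.*mode \([0-9][0-9]*\),.*/\1/p')"
echo "lock mode: $MODE"   # prints "lock mode: 0"
```
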

In my case, all “-ctk.vmdk” files reported an exclusive mode 1 lock;

## START ## [root@host GUEST-VM]# vmkfstools -D GUEST-VM-ctk.vmdk Lock [type 10c00001 offset 62181376 v 7105, hb offset 3211264 gen 87017, mode 1, owner 5003ce7b-b04b7d3f-f4e5-b499babda354 mtime 14822239] Addr <4, 89, 194>, gen 7012, links 1, type reg, flags 0, uid 0, gid 0, mode 600 len 4588032, nb 1 tbz 0, cow 0, zla 1, bs 8388608 [root@host GUEST-VM]# vmkfstools -D GUEST-VM_1-ctk.vmdk Lock [type 10c00001 offset 62185472 v 7106, hb offset 3211264 gen 87017, mode 1, owner 5003ce7b-b04b7d3f-f4e5-b499babda354 mtime 14822246] Addr <4, 89, 196>, gen 7013, links 1, type reg, flags 0, uid 0, gid 0, mode 600 len 66048, nb 1 tbz 0, cow 0, zla 1, bs 8388608 [root@host GUEST-VM]# vmkfstools -D GUEST-VM_2-ctk.vmdk Lock [type 10c00001 offset 62187520 v 7107, hb offset 3211264 gen 87017, mode 1, owner 5003ce7b-b04b7d3f-f4e5-b499babda354 mtime 14822253] Addr <4, 89, 197>, gen 7014, links 1, type reg, flags 0, uid 0, gid 0, mode 600 len 66048, nb 1 tbz 0, cow 0, zla 1, bs 8388608 [root@host GUEST-VM]# vmkfstools -D GUEST-VM_3-ctk.vmdk Lock [type 10c00001 offset 62189568 v 7052, hb offset 3211264 gen 87017, mode 1, owner 5003ce7b-b04b7d3f-f4e5-b499babda354 mtime 11201757] Addr <4, 89, 198>, gen 7015, links 1, type reg, flags 0, uid 0, gid 0, mode 600 len 4436480, nb 1 tbz 0, cow 0, zla 1, bs 8388608 [root@host GUEST-VM]# vmkfstools -D GUEST-VM_4-ctk.vmdk Lock [type 10c00001 offset 62191616 v 7053, hb offset 3211264 gen 87017, mode 1, owner 5003ce7b-b04b7d3f-f4e5-b499babda354 mtime 11201767] Addr <4, 89, 199>, gen 7016, links 1, type reg, flags 0, uid 0, gid 0, mode 600 len 4436480, nb 1 tbz 0, cow 0, zla 1, bs 8388608 [root@host GUEST-VM]# vmkfstools -D GUEST-VM_5-ctk.vmdk Lock [type 10c00001 offset 61786112 v 7054, hb offset 3211264 gen 87017, mode 1, owner 5003ce7b-b04b7d3f-f4e5-b499babda354 mtime 11201777] Addr <4, 89, 1>, gen 7017, links 1, type reg, flags 0, uid 0, gid 0, mode 600 len 4436480, nb 1 tbz 0, cow 0, zla 1, bs 8388608 [root@host GUEST-VM]# vmkfstools -D 
GUEST-VM_6-ctk.vmdk Lock [type 10c00001 offset 61792256 v 7055, hb offset 3211264 gen 87017, mode 1, owner 5003ce7b-b04b7d3f-f4e5-b499babda354 mtime 11201787] Addr <4, 89, 4>, gen 7018, links 1, type reg, flags 0, uid 0, gid 0, mode 600 len 4436480, nb 1 tbz 0, cow 0, zla 1, bs 8388608 [root@host GUEST-VM]# vmkfstools -D GUEST-VM_7-ctk.vmdk Lock [type 10c00001 offset 61794304 v 7056, hb offset 3211264 gen 87017, mode 1, owner 5003ce7b-b04b7d3f-f4e5-b499babda354 mtime 11201797] Addr <4, 89, 5>, gen 7019, links 1, type reg, flags 0, uid 0, gid 0, mode 600 len 4436480, nb 1 tbz 0, cow 0, zla 1, bs 8388608 [root@host GUEST-VM]# vmkfstools -D GUEST-VM_8-ctk.vmdk Lock [type 10c00001 offset 61796352 v 7057, hb offset 3211264 gen 87017, mode 1, owner 5003ce7b-b04b7d3f-f4e5-b499babda354 mtime 11201807] Addr <4, 89, 6>, gen 7020, links 1, type reg, flags 0, uid 0, gid 0, mode 600 len 4436480, nb 1 tbz 0, cow 0, zla 1, bs 8388608 [root@host GUEST-VM]# vmkfstools -D GUEST-VM_9-ctk.vmdk Lock [type 10c00001 offset 61798400 v 7058, hb offset 3211264 gen 87017, mode 1, owner 5003ce7b-b04b7d3f-f4e5-b499babda354 mtime 11201817] Addr <4, 89, 7>, gen 7021, links 1, type reg, flags 0, uid 0, gid 0, mode 600 len 4436480, nb 1 tbz 0, cow 0, zla 1, bs 8388608 [root@host GUEST-VM]# vmkfstools -D GUEST-VM_10-ctk.vmdk Lock [type 10c00001 offset 61800448 v 7059, hb offset 3211264 gen 87017, mode 1, owner 5003ce7b-b04b7d3f-f4e5-b499babda354 mtime 11201827] Addr <4, 89, 8>, gen 7022, links 1, type reg, flags 0, uid 0, gid 0, mode 600 len 4436480, nb 1 tbz 0, cow 0, zla 1, bs 8388608 [root@host GUEST-VM]# vmkfstools -D GUEST-VM_11-ctk.vmdk Lock [type 10c00001 offset 12601344 v 46751, hb offset 3211264 gen 87017, mode 1, owner 5003ce7b-b04b7d3f-f4e5-b499babda354 mtime 14822300] Addr <4, 10, 9>, gen 46740, links 1, type reg, flags 0, uid 0, gid 0, mode 600 len 8053248, nb 1 tbz 0, cow 0, zla 1, bs 8388608 ## END ##
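The owner field is also worth a look: per VMware's KB guidance on investigating VMFS file locks, the last segment of the owner UUID is understood to be the MAC address of the management interface on the host holding the lock, which helps identify the culprit host. A sketch of extracting it from the owner value in the output above:

```shell
# Assumption (per VMware's file-lock KB): the last dash-separated segment
# of the lock owner UUID is the MAC of the locking host's management NIC.
OWNER='5003ce7b-b04b7d3f-f4e5-b499babda354'   # owner value from the output above
MAC="$(printf '%s\n' "$OWNER" | awk -F- '{print $NF}' | sed 's/../&:/g; s/:$//')"
echo "$MAC"   # prints "b4:99:ba:bd:a3:54"
```
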

So at this stage I created a “tmp” directory in the virtual machine folder and moved all the “-ctk.vmdk” files here. Since this was a live, powered on VM, I felt more comfortable doing this with a GUI than using the shell and used WinSCP to transfer the files.
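For reference, the same quarantining can be scripted from the shell. The sketch below uses a scratch directory with dummy files in place of the real /vmfs path (which is environment-specific), so the commands can be tried safely:

```shell
# Scratch directory standing in for /vmfs/volumes/<datastore>/GUEST-VM,
# pre-seeded with dummy files so the commands can be exercised safely.
VMDIR="$(mktemp -d)"
touch "$VMDIR/GUEST-VM-ctk.vmdk" "$VMDIR/GUEST-VM_1-ctk.vmdk" "$VMDIR/GUEST-VM-flat.vmdk"

# Quarantine all change tracking files into a tmp subfolder
mkdir -p "$VMDIR/tmp"
mv "$VMDIR"/*-ctk.vmdk "$VMDIR/tmp/"

# Confirm none remain in the VM folder itself
ls "$VMDIR" | grep ctk || echo "no ctk files left"
```
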

I then confirmed there were no longer any “-ctk.vmdk” files in the virtual machine folder, and that they were all in the newly created “tmp” folder;

## START ## [root@host GUEST-VM]# ls -al | grep ctk [root@host GUEST-VM]# [root@host GUEST-VM]# cd /vmfs/volumes/datastore/GUEST-VM/tmp/ [root@host tmp]# ls -lh total 96M -rw------- 1 root root 4.3M Feb 28 2012 GUEST-VM_10-ctk.vmdk -rw------- 1 root root 7.7M Nov 6 15:32 GUEST-VM_11-ctk.vmdk -rw------- 1 root root 65K Jun 22 2013 GUEST-VM_1-ctk.vmdk -rw------- 1 root root 65K Jun 22 2013 GUEST-VM_2-ctk.vmdk -rw------- 1 root root 4.3M Feb 28 2012 GUEST-VM_3-ctk.vmdk -rw------- 1 root root 4.3M Feb 28 2012 GUEST-VM_4-ctk.vmdk -rw------- 1 root root 4.3M Feb 28 2012 GUEST-VM_5-ctk.vmdk -rw------- 1 root root 4.3M Feb 28 2012 GUEST-VM_6-ctk.vmdk -rw------- 1 root root 4.3M Feb 28 2012 GUEST-VM_7-ctk.vmdk -rw------- 1 root root 4.3M Feb 28 2012 GUEST-VM_8-ctk.vmdk -rw------- 1 root root 4.3M Feb 28 2012 GUEST-VM_9-ctk.vmdk -rw------- 1 root root 4.4M Feb 28 2012 GUEST-VM-ctk.vmdk [root@host tmp]# ## END ##

Now the vMotion operation completed successfully.


Here are the accompanying entries from the hosts' vmkernel logs, which look much healthier than the originals;

## START ## Nov 6 15:53:18 src-host vmkernel: 843:07:23:28.414 cpu12:48347)Migrate: vm 48348: 3046: Setting VMOTION info: Source ts = 1415289229136448, src ip = <xx.xx.xx.xx> dest ip = <xx.xx.xx.xx> Dest wid = 4359 using SHARED swap Nov 6 15:53:18 src-host vmkernel: 843:07:23:28.415 cpu12:48347)Tcpip_Vmk: 1013: Affinitizing xx.xx.xx.xx to world 64297, Success Nov 6 15:53:18 src-host vmkernel: 843:07:23:28.415 cpu12:48347)VMotion: 2366: 1415289229136448 S: Set ip address 'xx.xx.xx.xx' worldlet affinity to send World ID 64297 Nov 6 15:53:18 src-host vmkernel: 843:07:23:28.415 cpu5:4327)MigrateNet: vm 4327: 1378: Accepted connection from <xx.xx.xx.xx> Nov 6 15:53:18 src-host vmkernel: 843:07:23:28.415 cpu5:4327)MigrateNet: vm 4327: 1422: dataSocket 0x4100a6063410 receive buffer size is 563272 Nov 6 15:53:18 src-host vmkernel: 843:07:23:28.497 cpu23:64298)VMotionDiskOp: 769: 1415289229136448 S: DiskOps handshake successful. Nov 6 15:55:59 src-host vmkernel: 843:07:26:10.006 cpu20:48348)VMotion: 3714: 1415289229136448 S: Another pre-copy iteration needed with 640966 pages left to send (prev2 4194304, prev 4194304, network bandwidth ~91.894 MB/s) Nov 6 15:56:29 src-host vmkernel: 843:07:26:39.611 cpu21:48348)VMotion: 3714: 1415289229136448 S: Another pre-copy iteration needed with 240035 pages left to send (prev2 4194304, prev 640966, network bandwidth ~91.010 MB/s) Nov 6 15:56:42 src-host vmkernel: 843:07:26:53.116 cpu22:48348)VMotion: 3666: 1415289229136448 S: Stopping pre-copy: not enough forward progress (Pages left to send: prev2 640966, prev 240035, cur 185166, network bandwidth ~87.701 MB/s) Nov 6 15:56:42 src-host vmkernel: 843:07:26:53.116 cpu22:48348)VMotion: 3696: 1415289229136448 S: Remaining pages can be sent in 8.445 seconds, which is less than the maximum switchover time of 100 seconds, so proceeding with suspend. 
Nov 6 15:56:54 src-host vmkernel: 843:07:27:04.521 cpu9:64297)VMotionSend: 3866: 1415289229136448 S: Sent all modified pages to destination (network bandwidth ~81.079 MB/s) ## END ## ## START ## Nov 6 15:57:20 dst-host vmkernel: 0:01:16:26.628 cpu19:4359)VMotion: 4635: 1415289229136448 D: Page-in made enough progress during checkpoint load. Resuming immediately. Nov 6 15:57:20 dst-host vmkernel: 0:01:16:26.664 cpu19:4359)VmMemMigrate: vm 4359: 4786: Regular swap file bitmap checks out. Nov 6 15:57:20 dst-host vmkernel: 0:01:16:26.667 cpu19:4359)VMotion: 4489: 1415289229136448 D: Resume handshake successful Nov 6 15:57:20 dst-host vmkernel: 0:01:16:26.667 cpu13:4374)Swap: vm 4359: 9066: Starting prefault for the migration swap file Nov 6 15:57:20 dst-host vmkernel: 0:01:16:26.736 cpu21:4374)Swap: vm 4359: 9205: Finish swapping in migration swap file. (faulted 0 pages, pshared 0 pages). Success. Nov 6 15:57:20 dst-host vmkernel: 0:01:16:26.824 cpu18:4359)Net: 1421: connected GUEST-VM eth0 to Network xx.xx.xx, portID 0x2000004 Nov 6 15:57:20 dst-host vmkernel: 0:01:16:27.003 cpu21:4359)NetPort: 982: enabled port 0x2000004 with mac 00:00:00:00:00:00 Nov 6 15:57:20 dst-host vmkernel: 0:01:16:27.003 cpu21:4359)Net: 1421: connected GUEST-VM eth0 to Network xx.xx.xx, portID 0x2000005 Nov 6 15:57:20 dst-host vmkernel: 0:01:16:27.003 cpu21:4359)NetPort: 982: enabled port 0x2000005 with mac 00:00:00:00:00:00 Nov 6 15:57:26 dst-host vmkernel: 0:01:16:33.040 cpu16:4359)VmMemMigrate: vm 4359: 1946: pgNum (0x3fce57) changed type to 1 while remotely faulting it in. Nov 6 15:57:26 dst-host vmkernel: 0:01:16:33.061 cpu16:4359)VmMemMigrate: vm 4359: 1946: pgNum (0x3fd43e) changed type to 1 while remotely faulting it in. Nov 6 15:57:26 dst-host vmkernel: 0:01:16:33.096 cpu16:4359)VmMemMigrate: vm 4359: 1946: pgNum (0x3fe6b3) changed type to 1 while remotely faulting it in. 
Nov 6 15:57:27 dst-host vmkernel: 0:01:16:33.166 cpu1:4367)VMotionRecv: 1984: 1415289229136448 D: DONE paging in Nov 6 15:57:27 dst-host vmkernel: 0:01:16:33.166 cpu1:4367)VMotionRecv: 1992: 1415289229136448 D: Estimated network bandwidth 81.633 MB/s during page-in ## END ##

I’m not sure where these “-ctk.vmdk” files came from, but I suspect they originated from a legacy backup process from before my time. At least for now the issue is resolved, and we know what to look for the next time this happens.

Credits;

Thanks to Jakob Fabritius Nørregaard for posting this blog article which helped identify and resolve this issue.
