Tuesday, 1 July 2014

vMotion fails at 10% with the error: A general system error occurred: Migration failed while copying data, Broken Pipe (1013150)

Symptoms

  • vMotion fails at 10%
  • You see these errors in vCenter Server:
    • A general system error occurred: Migration failed while copying data, Broken Pipe
      Migration failed while copying data. Connection reset by peer
    • A general system error occurred: Failed to start migration pre-copy. Error 0xbad010d. The ESX hosts failed to connect over the VMotion network.
  • You see this error in the /var/log/messages (ESXi host) or /var/log/vmkernel (ESX host) log file:
    The ESX hosts failed to connect over the VMotion network Module Migrate power on failed
  • vmkping testing on the VMkernel network used for vMotion is successful

    Note: For more information, see VMkernel network connectivity with the vmkping command (1003728).
     
  • Disabling and re-enabling vMotion on the VMkernel port used for vMotion in vCenter does not resolve this issue
  • In the /var/log/vmkernel log file of the source ESX host, you see this warning:

    WARNING: MigrateNet: 210: 4225417790: 2-0x3fa0c8d0:Received only 0 of 68 bytes: Migration protocol error
  • In the /var/log/vmkernel log file of the destination ESX host, you see these messages:
    • WARNING: Migrate: 1153: 4225417790: Failed: I/O error (0xbad000a) @0x8d7c03
    • ESX hosts failed to connect over the VMotion network (0xbad010d) @0x0
    • Feb 22 14:14:04 esx1 vmkernel: 402:01:22:35.133 cpu6:1939)Migrate: vm 1940: 7338: Setting migration info ts = 1298397818667246, src ip = <192.168.103.48> dest ip = <192.168.103.7> Dest wid = 5272 using SHARED swap
      Feb 22 14:14:04 esx1 vmkernel: 402:01:22:35.134 cpu6:1939)World: vm 3330: 900: Starting world migSendHelper-1940 with flags 1
      Feb 22 14:14:04 esx1 vmkernel: 402:01:22:35.134 cpu6:1939)World: vm 4355: 900: Starting world migRecvHelper-1940 with flags 1
      Feb 22 14:14:04 esx1 vmkernel: 402:01:22:35.136 cpu4:3330)WARNING: MigrateNet: 309: 1298397818667246: 5-0x801f640:Sent only 4020 of 4096 bytes of message data: Broken pipe
      Feb 22 14:14:04 esx1 vmkernel: 402:01:22:35.136 cpu4:3330)WARNING: Migrate: 6776: 1298397818667246: Couldn't send data for 8: Broken pipe
      Feb 22 14:14:04 esx1 vmkernel: 402:01:22:35.136 cpu4:3330)WARNING: Migrate: 1243: 1298397818667246: Failed: Broken pipe (0xbad0052) @0x9efd5f
  • The /var/log/vmware/hostd.log (ESX) or /var/log/messages (ESXi) file contains entries similar to:
    Apr  8 11:24:44 Hostd: [2011-04-08 11:24:44.929 37903B90 verbose 'vm:/vmfs/volumes/4c220e6f-01b124b3-f25d-e41f132dae86/twidmann_test/twidmann_test.vmx'] VMotionLastStatusCb: Failed with error 536871181: Failed to start migration pre-copy.  Error 0xba
    Apr  8 11:24:44 d010d.  The ESX hosts failed to connect over the VMotion network.
    Apr  8 11:24:44 Hostd: [2011-04-08 11:24:44.929 37903B90 verbose 'vm:/vmfs/volumes/4c220e6f-01b124b3-f25d-e41f132dae86/twidmann_test/twidmann_test.vmx'] VMotionResolveCheck: Operation in progress
    Apr  8 11:24:44 Hostd: [2011-04-08 11:24:44.929 37903B90 verbose 'vm:/vmfs/volumes/4c220e6f-01b124b3-f25d-e41f132dae86/twidmann_test/twidmann_test.vmx'] VMotionStatusCb: Completed
    Apr  8 11:24:44 Hostd: [2011-04-08 11:24:44.929 37903B90 verbose 'vm:/vmfs/volumes/4c220e6f-01b124b3-f25d-e41f132dae86/twidmann_test/twidmann_test.vmx'] VMotionResolveCheck: Firing ResolveCb
    Apr  8 11:24:44 Hostd: [2011-04-08 11:24:44.929 37903B90 info 'VMotionSrc (1302261872027749)'] ResolveCb: VMX reports needsUnregister = false for migrateType MIGRATE_TYPE_VMOTION
    Apr  8 11:24:44 Hostd: [2011-04-08 11:24:44.929 37903B90 info 'VMotionSrc (1302261872027749)'] ResolveCb: Failed with fault: (vmodl.fault.SystemError) {
    Apr  8 11:24:44 Hostd:    dynamicType = <unset>,
    Apr  8 11:24:44 Hostd:    faultCause = (vmodl.MethodFault) null,
    Apr  8 11:24:44 Hostd:    reason = "Failed to start migration pre-copy.  Error 0xbad010d.  The ESX hosts failed to connect over the VMotion network.
    Apr  8 11:24:44 Hostd: ",
    Apr  8 11:24:44 Hostd:    msg = "",
    Apr  8 11:24:44 Hostd: }
    Apr  8 11:24:44 Hostd: [2011-04-08 11:24:44.929 37903B90 verbose 'VMotionSrc (1302261872027749)'] Migration changed state from MIGRATING to DONE
    Apr  8 11:24:44 Hostd: [2011-04-08 11:24:44.929 37903B90 verbose 'VMotionSrc (1302261872027749)'] Finish called
    Apr  8 11:24:44 Hostd: [2011-04-08 11:24:44.929 366A9B90 info 'vm:/vmfs/volumes/4c220e6f-01b124b3-f25d-e41f132dae86/twidmann_test/twidmann_test.vmx'] Disconnect check in progress.
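
The log signatures listed above can be searched for mechanically. The following is a minimal sketch, not a VMware tool: it assumes you have already copied the relevant vmkernel or hostd lines into a list of strings, and the label names (connect_failed, broken_pipe, and so on) are invented here for illustration. It also pulls the source and destination vMotion IP addresses out of the "Setting migration info" line so you know which pair of hosts to examine.

```python
import re

# Signatures copied from the log excerpts above; the labels are illustrative.
SIGNATURES = {
    "connect_failed": re.compile(r"0xbad010d", re.IGNORECASE),
    "io_error": re.compile(r"0xbad000a", re.IGNORECASE),
    "broken_pipe": re.compile(r"Broken pipe", re.IGNORECASE),
    "protocol_error": re.compile(r"Migration protocol error"),
}

# Matches the vmkernel "Setting migration info" line to recover both endpoints.
MIGRATION_INFO = re.compile(
    r"src ip = <(?P<src>[\d.]+)> dest ip = <(?P<dest>[\d.]+)>"
)


def scan_log(lines):
    """Return (sorted list of matched labels, set of (src, dest) IP pairs)."""
    found = set()
    endpoints = set()
    for line in lines:
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                found.add(label)
        match = MIGRATION_INFO.search(line)
        if match:
            endpoints.add((match.group("src"), match.group("dest")))
    return sorted(found), endpoints
```

Because the scan works line by line, an error code that the logger has wrapped across two lines (as in the hostd excerpt above, where 0xbad010d is split) will not match on that line; the code is still caught where it appears unbroken.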

Resolution

This is a known issue affecting ESXi/ESX 4.0.

This issue is resolved in ESXi/ESX 4.0 Update 2 and is being investigated by VMware for the other affected versions.
 
To download ESXi/ESX 4.0 Update 2, see Download VMware vSphere.
 
If you cannot upgrade, or you experience these symptoms in a newer version, you can work around this issue by resetting the Migrate.Enabled setting on both the source and destination hosts.
 
Note: This issue may recur even after applying the workaround.
 
To reset the Migrate.Enabled setting:
  1. Connect vSphere or VMware Infrastructure Client to your vCenter Server.  
  2. Click on the ESX host.
  3. Click the Configuration tab.
  4. Click Advanced Settings under Software.
  5. Select Migrate and change Migrate.Enabled to 0.
  6. Click OK and then Close.
  7. Click Advanced Settings again.
  8. Select Migrate and change Migrate.Enabled to 1.
  9. Click OK and then Close.
Note: If you see the invalid parameter error after resetting Migrate.Enabled to 1, see Performing a vMotion or adding a network card to a virtual machine fails with the error: Necessary module isn't loaded. (2013128)
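
If you prefer the command line to the client, the same off-then-on toggle can be scripted. This is a minimal sketch under assumptions that are not part of the KB article: that the esxcfg-advcfg utility is available in the host's shell, and that the Migrate.Enabled option is addressed there as /Migrate/Enabled. The function names are hypothetical. The runner is passed in as a parameter so the command sequence can be inspected without touching a host.

```python
import subprocess

ADVCFG = "esxcfg-advcfg"     # assumption: available on PATH in the host shell
OPTION = "/Migrate/Enabled"  # assumption: CLI path form of Migrate.Enabled


def reset_commands():
    """Commands that set Migrate.Enabled to 0 and then back to 1."""
    return [
        [ADVCFG, "-s", "0", OPTION],
        [ADVCFG, "-s", "1", OPTION],
    ]


def reset_migrate_enabled(run=subprocess.check_call):
    """Run the toggle in order; pass a fake `run` to dry-run it."""
    for argv in reset_commands():
        run(argv)
```

Calling reset_migrate_enabled(run=print) only prints the two commands, which is a safe way to review them before running the function for real on the source host and then on the destination host.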
If these steps do not resolve the issue, try increasing the timeout for migration network operations after Step 4, and then continue with the remaining steps. Repeat these steps on the destination host as well.
To increase the timeout for migration network operations:
  1. Click the Configuration tab.
  2. Click Advanced Settings under Software > Migrate.
  3. Change Migrate.NetTimeout to 60 seconds. The default is 20 seconds.
  4. Click OK and then Close.
Note: This issue may also occur due to a duplicate IP address on your network. Ensure that the IP addresses for your vCenter Servers and ESX/ESXi hosts are unique.
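
A quick way to rule out the duplicate-address case is to check whatever address inventory you keep. The sketch below is an illustration only: it assumes you can export a list of (name, IP address) pairs for your vCenter Servers and hosts, and the function name and the host names in the usage note are hypothetical. It inspects only the list you give it and does not probe the live network.

```python
import ipaddress
from collections import defaultdict


def find_duplicate_ips(inventory):
    """Map each IP used by more than one name to the sorted names using it.

    `inventory` is an iterable of (name, ip_string) pairs.
    """
    owners = defaultdict(set)
    for name, ip in inventory:
        # Parse the address so stray whitespace does not hide a duplicate.
        owners[ipaddress.ip_address(ip.strip())].add(name)
    return {
        str(ip): sorted(names)
        for ip, names in owners.items()
        if len(names) > 1
    }
```

For example, passing [("esx1", "192.168.103.48"), ("esx2", "192.168.103.7"), ("vcenter", "192.168.103.7")] reports 192.168.103.7 as shared by esx2 and vcenter. An invalid address string raises ValueError, which is usually what you want when auditing an inventory.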

Source:
http://kb.vmware.com/selfservice/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=1013150