Friday, 8 August 2014

ESXi/ESX hosts with visibility to RDM LUNs used by MSCS nodes may take a long time to boot or to complete a LUN rescan (1016106)

Symptoms

  • ESXi/ESX 4.x and ESXi 5.x hosts take a long time to boot. This time depends on the number of RDMs that are attached to the ESXi/ESX host.

    Note: In a system with 10 RDMs used in a two-node MSCS cluster, rebooting the ESXi/ESX host that runs the secondary node takes approximately 30 minutes. With fewer RDMs, the reboot takes less time. For example, if only three RDMs are used, the reboot time is approximately 10 minutes.
  • ESXi intermittently shows the error message "Cannot synchronize host hostname. Operation Timed out." on the Summary tab, and the vSphere Client may not be able to start.
  • The console output shows the boot process waiting after this message:

    Loading module multiextent.
  • The cluster runs virtual machines that participate in an MSCS cluster using shared RDMs and SCSI reservations across hosts, and a virtual machine on another host is the active cluster node holding a SCSI reservation.
  • Delays appear at these steps:

    • Starting path claiming and SCSI device discovery

      In the VMkernel log of the rebooting ESXi/ESX host (the log file to check depends on the ESXi/ESX version), you see entries similar to:

      Sep 24 12:25:36 cs-tse-d54 vmkernel: 0:00:01:57.828 cpu0:4096)WARNING: ScsiCore: 1353: Power-on Reset occurred on naa.6006016045502500176a24d34fbbdf11
      Sep 24 12:25:36 cs-tse-d54 vmkernel: 0:00:01:57.830 cpu0:4096)VMNIX: VmkDev: 2122: Added SCSI device vml0:3:0 (naa.6006016045502500166a24d34fbbdf11)
      Sep 24 12:25:36 cs-tse-d54 vmkernel: 0:00:02:37.842 cpu3:4099)ScsiDeviceIO: 1672: Command 0x1a to device "naa.6006016045502500176a24d34fbbdf11" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0

    • Mounting the partition of the RDM LUNs

      In the VMkernel log of the rebooting ESXi/ESX host, you see entries similar to:

      Sep 24 12:25:37 cs-tse-d54 vmkernel: 0:00:08:58.811 cpu2:4098)WARNING: ScsiCore: 1353: Power-on Reset occurred on naa.600601604550250083489d914fbbdf11
      Sep 24 12:25:37 cs-tse-d54 vmkernel: 0:00:08:58.814 cpu0:4096)VMNIX: VmkDev: 2122: Added SCSI device vml0:9:0 (naa.600601604550250082489d914fbbdf11)
      Sep 24 12:25:37 cs-tse-d54 vmkernel: 0:00:09:38.855 cpu2:4098)ScsiDeviceIO: 1672: Command 0x1a to device "naa.600601604550250083489d914fbbdf11" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
      Sep 24 12:25:37 cs-tse-d54 vmkernel: 0:00:09:38.855 cpu1:4111)ScsiDeviceIO: 4494: Could not detect setting of QErr for device naa.600601604550250083489d914fbbdf11. Error Failure.
      Sep 24 12:25:37 cs-tse-d54 vmkernel: 0:00:10:08.945 cpu1:4111)WARNING: Partition: 801: Partition table read from device naa.600601604550250083489d914fbbdf11 failed: I/O error
      Sep 24 12:25:37 cs-tse-d54 vmkernel: 0:00:10:08.945 cpu1:4111)ScsiDevice: 2200: Successfully registered device "naa.600601604550250083489d914fbbdf11" from plugin "NMP" of type 0


      Oct 5 14:21:03 vmkernel: 47:02:52:19.382 cpu17:9624)WARNING: NMP: nmp_IsSupportedPResvCommand: Unsupported Persistent Reservation Command,service action 0 type 4
      Oct 5 14:21:03 vmkernel: 47:02:52:19.383 cpu17:9624)WARNING: NMP: nmp_IsSupportedPResvCommand: Unsupported Persistent Reservation Command,service action 0 type 4
      Oct 5 14:21:03 vmkernel: 47:02:52:19.383 cpu23:9621)WARNING: NMP: nmp_IsSupportedPResvCommand: Unsupported Persistent Reservation Command,service action 0 type 4
      Oct 5 14:21:03 vmkernel: 47:02:52:19.383 cpu17:9624)WARNING: NMP: nmp_IsSupportedPResvCommand: Unsupported Persistent Reservation Command,service action 0 type 4
      Oct 5 14:21:03 vmkernel: 47:02:52:19.383 cpu12:4108)WARNING: NMP: nmpUpdatePResvStateSuccess: Parameter List Length 54310000 for service action 0 is beyondthe supported value 18
      Oct 5 14:21:03 vmkernel: 47:02:52:19.383 cpu12:4108)WARNING: NMP: nmpUpdatePResvStateSuccess: Parameter List Length 54310000 for service action 0 is beyondthe supported value 18
      Oct 5 14:21:03 vmkernel: 47:02:52:19.383 cpu3:5733)WARNING: NMP: nmpUpdatePResvStateSuccess: Parameter List Length 54310000 for service action 0 is beyondthe supported value 18
      Oct 5 14:21:03 vmkernel: 47:02:52:19.384 cpu12:9738)WARNING: NMP: nmpUpdatePResvStateSuccess: Parameter List Length 54310000 for service action 0 is beyondthe supported value 18
      Oct 5 14:21:05 vmkernel: 47:02:52:21.383 cpu23:9621)WARNING: NMP: nmp_IsSupportedPResvCommand: Unsupported Persistent Reservation Command,service action 0 type 4

  • If you mark an existing VMFS LUN as perennially reserved, you may see these errors in the vmkernel.log file:

    YYYY-MM-DDT13:34:04.247Z cpu4:10169)WARNING: Partition: 1273: Device "naa.XXXXXXXXXXXXXXXXXXXxxxxxxxxxxxxx" with a VMFS partition is marked perennially reserved. This is not supported and may lead to data loss.
    YYYY-MM-DDT13:34:04.248Z cpu4:10169)WARNING: Partition: 1273: Device "naa.XXXXXXXXXXXXXXXXXXXxxxxxxxxxxxxx" with a VMFS partition is marked perennially reserved. This is not supported and may lead to data loss.
    YYYY-MM-DDT13:34:04.255Z cpu4:10169)WARNING: Partition: 1273: Device "naa.XXXXXXXXXXXXXXXXXXXxxxxxxxxxxxxx" with a VMFS partition is marked perennially reserved. This is not supported and may lead to data loss.
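
The delays in the ESX 4.x-style excerpts above can be read from the VMkernel uptime stamp, the D:HH:MM:SS.mmm field that follows "vmkernel:". The Python sketch below is one possible way to flag large gaps between consecutive entries. The function names and the 30-second threshold are illustrative assumptions and are not part of any VMware tool.

```python
import re

# Uptime stamp as it appears in the excerpts above: D:HH:MM:SS.mmm
UPTIME_RE = re.compile(r"\b(\d+):(\d{2}):(\d{2}):(\d{2})\.(\d{3})\b")


def uptime_ms(line):
    """Return the uptime stamp of a log line in milliseconds, or None if absent."""
    match = UPTIME_RE.search(line)
    if match is None:
        return None
    days, hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return (((days * 24 + hours) * 60 + minutes) * 60 + seconds) * 1000 + millis


def find_gaps(lines, threshold_ms=30_000):
    """Yield (gap_ms, line) for entries logged at least threshold_ms after the previous entry.

    threshold_ms is an assumed cut-off chosen for illustration only.
    """
    previous = None
    for line in lines:
        current = uptime_ms(line)
        if current is None:
            continue  # line carries no uptime stamp
        if previous is not None and current - previous >= threshold_ms:
            yield current - previous, line
        previous = current


if __name__ == "__main__":
    sample = [
        'Sep 24 12:25:36 cs-tse-d54 vmkernel: 0:00:01:57.828 cpu0:4096)WARNING: ScsiCore: 1353: Power-on Reset occurred on naa.6006016045502500176a24d34fbbdf11',
        'Sep 24 12:25:36 cs-tse-d54 vmkernel: 0:00:01:57.830 cpu0:4096)VMNIX: VmkDev: 2122: Added SCSI device vml0:3:0 (naa.6006016045502500166a24d34fbbdf11)',
        'Sep 24 12:25:36 cs-tse-d54 vmkernel: 0:00:02:37.842 cpu3:4099)ScsiDeviceIO: 1672: Command 0x1a to device "naa.6006016045502500176a24d34fbbdf11" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0',
    ]
    for gap, line in find_gaps(sample):
        print(gap, "ms before:", line)
```

Applied to the first excerpt, the sketch reports a single gap of 40012 ms, which ends at the failed 0x1a command.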

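The Note under the first symptom gives two data points: about 30 minutes with 10 RDMs and about 10 minutes with three RDMs. As a rough illustration only, the Python sketch below derives a minutes-per-RDM range from those two figures and scales it to another RDM count. The proportionality is an assumption made for this example, and the function names are hypothetical. Actual boot times depend on the environment.

```python
# The two approximate data points quoted in the Note: (number of RDMs, reboot minutes)
OBSERVED = [(10, 30.0), (3, 10.0)]


def minutes_per_rdm(observed=OBSERVED):
    """Return the (lowest, highest) minutes-per-RDM ratio among the data points."""
    ratios = [minutes / rdms for rdms, minutes in observed]
    return min(ratios), max(ratios)


def estimate_reboot_minutes(rdm_count, observed=OBSERVED):
    """Scale the observed ratios to rdm_count, assuming boot delay grows in proportion."""
    low, high = minutes_per_rdm(observed)
    return rdm_count * low, rdm_count * high


if __name__ == "__main__":
    low, high = estimate_reboot_minutes(6)
    print(f"6 RDMs: roughly {low:.0f} to {high:.0f} minutes")
```

Both data points work out to roughly three minutes per RDM, so under this assumption a host that sees six such RDMs would take somewhere around 18 to 20 minutes.
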
Purpose

This article describes a specific issue. If you experience all of the above symptoms, consult the sections below.
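
To check a saved VMkernel log against the log-based symptoms, a short script can tally which of the messages shown above appear and for which devices. The following Python sketch is a minimal example. The signature names are labels invented for this illustration, and the message fragments are copied from the excerpts in this article, so they may need adjusting for other ESXi/ESX versions.

```python
import re
from collections import defaultdict

# Message fragments copied from the log excerpts in the Symptoms section.
# The dictionary keys are hypothetical labels used only by this sketch.
SIGNATURES = {
    "power_on_reset": "Power-on Reset occurred on",
    "command_failed": "failed H:0x5 D:0x0 P:0x0",
    "qerr_undetected": "Could not detect setting of QErr",
    "partition_read_failed": "Partition table read from device",
    "unsupported_presv": "Unsupported Persistent Reservation Command",
}

NAA_RE = re.compile(r"naa\.[0-9a-fA-F]+")


def summarize(lines):
    """Map each matched signature label to the set of NAA devices it was logged for."""
    hits = defaultdict(set)
    for line in lines:
        for label, fragment in SIGNATURES.items():
            if fragment in line:
                device = NAA_RE.search(line)
                hits[label].add(device.group(0) if device else "(no device in message)")
    return dict(hits)


def report(log_path):
    """Print one line per matched signature for the log file at log_path."""
    with open(log_path, errors="replace") as handle:
        for label, devices in sorted(summarize(handle).items()):
            print(f"{label}: {', '.join(sorted(devices))}")
```

Calling report() with the path to a copy of the VMkernel log lists each matched message type with the affected naa identifiers. The remaining symptoms (long boot time, the vSphere Client error, the MSCS configuration) still have to be confirmed by hand.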

If you are experiencing only some of the symptoms, search the Knowledge Base for your symptoms or see:
Source:-