Exaforge

Cloud, DevOps, Evangelism

Multipath Identification in ESX

There was an interesting post on the communities forum recently regarding multipathing and how ESXi identifies a given device as being the 'same' as another for multipath purposes.

Why Multipathing Drivers?

Why do we need a multipathing driver to start with?  We need it because every HBA sees its volumes in isolation: an HBA generally has no idea that another card or path exists.

Here's the output of esxcfg-mpath -l from a host with multipathing configured (many thanks to Jason Boche for the example):

iqn.1998-01.com.vmware:esxi2-35bac89b-00023d000002,iqn.1992-04.com.emc:storage.Storage.tiny,t,1-naa.5000144f79272808
   Runtime Name: vmhba37:C1:T0:L0
   Device: naa.5000144f79272808
   Device Display Name: EMC iSCSI Disk (naa.5000144f79272808)
   Adapter: vmhba37 Channel: 1 Target: 0 LUN: 0
   Adapter Identifier: iqn.1998-01.com.vmware:esxi2-35bac89b
   Target Identifier: 00023d000002,iqn.1992-04.com.emc:storage.Storage.tiny,t,1
   Plugin: NMP
   State: active
   Transport: iscsi
   Adapter Transport Details: iqn.1998-01.com.vmware:esxi2-35bac89b
   Target Transport Details: IQN=iqn.1992-04.com.emc:storage.Storage.tiny Alias= Session=00023d000002 PortalTag=1

iqn.1998-01.com.vmware:esxi2-35bac89b-00023d000001,iqn.1992-04.com.emc:storage.Storage.tiny,t,1-naa.5000144f79272808
   Runtime Name: vmhba37:C0:T0:L0
   Device: naa.5000144f79272808
   Device Display Name: EMC iSCSI Disk (naa.5000144f79272808)
   Adapter: vmhba37 Channel: 0 Target: 0 LUN: 0
   Adapter Identifier: iqn.1998-01.com.vmware:esxi2-35bac89b
   Target Identifier: 00023d000001,iqn.1992-04.com.emc:storage.Storage.tiny,t,1
   Plugin: NMP
   State: active
   Transport: iscsi
   Adapter Transport Details: iqn.1998-01.com.vmware:esxi2-35bac89b
   Target Transport Details: IQN=iqn.1992-04.com.emc:storage.Storage.tiny Alias= Session=00023d000001 PortalTag=1

This output shows two separate devices discovered via different iSCSI IP addresses on my network.  However, these are the same device!  Clearly ESX knows this, given the screenshot below:


The command line gives much the same view:

naa.5000144f79272808
   Device Display Name: EMC iSCSI Disk (naa.5000144f79272808)
   Storage Array Type: VMW_SATP_DEFAULT_AA
   Storage Array Type Device Config: SATP VMW_SATP_DEFAULT_AA does not support device configuration.
   Path Selection Policy: VMW_PSP_RR
   Path Selection Policy Device Config: {policy=rr,iops=1000,bytes=10485760,useANO=0;lastPathIndex=1: NumIOsPending=0,numBytesPending=0}
   Path Selection Policy Device Custom Config:
   Working Paths: vmhba37:C1:T0:L0, vmhba37:C0:T0:L0

So how did ESX determine that these were the same device?  The key is that first line, the 'naa' line.

Every SCSI device that conforms to recent standards (SPC-2 and later) supports the INQUIRY command.  This SCSI command allows the initiator to request information about any given device.  Part of that information is the Vital Product Data (VPD), which is organised into 'pages'.  VPD page 83h, the Device Identification page, carries an identifier (loosely, a serial number) that is unique to the device.

For the device above, that is the value following the 'naa' string: 5000144f79272808.  (NAA stands for Network Address Authority.)

This number is a globally unique identifier assigned by the array vendor to that specific device that is independent of anything else (e.g. LUN #, etc).  This is the key to multipath identification.
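To make the structure of that identifier a bit more concrete, here is a minimal Python sketch that splits it into its fields.  It assumes the common NAA type 5 (IEEE Registered) layout: a 4-bit NAA type, a 24-bit IEEE company ID (OUI) and a 36-bit vendor-specific value.  The function name decode_naa5 is my own invention for illustration, not part of any VMware tool.

```python
def decode_naa5(device_id: str) -> dict:
    """Split an 'naa.<16 hex digits>' name into NAA type 5 fields.

    Assumes the 64-bit IEEE Registered layout:
    4-bit NAA type | 24-bit OUI | 36-bit vendor-specific identifier.
    """
    prefix, _, hex_part = device_id.partition(".")
    if prefix != "naa" or len(hex_part) != 16:
        raise ValueError(f"not a 64-bit naa identifier: {device_id!r}")

    value = int(hex_part, 16)
    naa_type = value >> 60  # top 4 bits
    if naa_type != 5:
        raise ValueError(f"expected NAA type 5, got {naa_type}")

    return {
        "naa_type": naa_type,
        "oui": f"{(value >> 36) & 0xFFFFFF:06x}",           # next 24 bits
        "vendor_specific": f"{value & 0xFFFFFFFFF:09x}",    # low 36 bits
    }


fields = decode_naa5("naa.5000144f79272808")
print(fields)
```

For the device in this post, that gives an OUI of 000144 and a vendor-specific part of f79272808.  The vendor-specific part is what the array has to keep unique per device.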

So here's what happens.  An ESX host goes and enumerates every LUN visible to a given HBA:

iqn.1998-01.com.vmware:esxi2-35bac89b-00023d000002,iqn.1992-04.com.emc:storage.Storage.tiny,t,1-naa.5000144f79272808
   Runtime Name: vmhba37:C1:T0:L0
   Device: naa.5000144f79272808
   Device Display Name: EMC iSCSI Disk (naa.5000144f79272808)
   Adapter: vmhba37 Channel: 1 Target: 0 LUN: 0
   Adapter Identifier: iqn.1998-01.com.vmware:esxi2-35bac89b
   Target Identifier: 00023d000002,iqn.1992-04.com.emc:storage.Storage.tiny,t,1
   Plugin: NMP
   State: active
   Transport: iscsi
   Adapter Transport Details: iqn.1998-01.com.vmware:esxi2-35bac89b
   Target Transport Details: IQN=iqn.1992-04.com.emc:storage.Storage.tiny Alias= Session=00023d000002 PortalTag=1

And then it does the same for the next path (the differences are the channel, C0 rather than C1, and the session ID, 00023d000001 rather than 00023d000002):

iqn.1998-01.com.vmware:esxi2-35bac89b-00023d000001,iqn.1992-04.com.emc:storage.Storage.tiny,t,1-naa.5000144f79272808
   Runtime Name: vmhba37:C0:T0:L0
   Device: naa.5000144f79272808
   Device Display Name: EMC iSCSI Disk (naa.5000144f79272808)
   Adapter: vmhba37 Channel: 0 Target: 0 LUN: 0
   Adapter Identifier: iqn.1998-01.com.vmware:esxi2-35bac89b
   Target Identifier: 00023d000001,iqn.1992-04.com.emc:storage.Storage.tiny,t,1
   Plugin: NMP
   State: active
   Transport: iscsi
   Adapter Transport Details: iqn.1998-01.com.vmware:esxi2-35bac89b
   Target Transport Details: IQN=iqn.1992-04.com.emc:storage.Storage.tiny Alias= Session=00023d000001 PortalTag=1

Now that it has enumerated each device, it queries the Page 83h VPD serial number we discussed earlier and extracts it.  Every discovered device that has the same serial number as a previously discovered device is marked as an additional path to that device.  You can see this during the 'path claiming' process in the vmkernel logs:

2011-07-25T17:19:46.458Z cpu1:2667)WARNING: iscsi_vmk: iscsivmk_StartConnection: vmhba37:CH:1 T:0 CN:0: iSCSI connection is being marked "ONLINE"
2011-07-25T17:19:46.458Z cpu1:2667)WARNING: iscsi_vmk: iscsivmk_StartConnection: Sess [ISID: 00023d000002 TARGET: iqn.1992-04.com.emc:storage.Storage.tiny TPGT: 1 TSIH: 0]
2011-07-25T17:19:46.458Z cpu1:2667)WARNING: iscsi_vmk: iscsivmk_StartConnection: Conn [CID: 0 L: 192.168.0.51:53377 R: 192.168.0.7:3260]
2011-07-25T17:19:46.954Z cpu2:2845)ScsiScan: 1098: Path 'vmhba37:C1:T0:L0': Vendor: 'EMC     '  Model: 'LIFELINE-DISK   '  Rev: '1   '
2011-07-25T17:19:46.954Z cpu2:2845)ScsiScan: 1101: Path 'vmhba37:C1:T0:L0': Type: 0x0, ANSI rev: 4, TPGS: 0 (none)
2011-07-25T17:19:46.955Z cpu2:2845)ScsiScan: 1582: Add path: vmhba37:C1:T0:L0
2011-07-25T17:19:46.983Z cpu0:2845)ScsiPath: 4541: Plugin 'NMP' claimed path 'vmhba37:C1:T0:L0'
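
If you want to pull the claimed paths out of a log like the one above, a small script is enough.  This is only a sketch: the regular expression is written against the 'claimed path' line format shown in this post, and I'm assuming (not asserting) that other ESXi builds log it the same way.  The helper name claimed_paths is hypothetical.

```python
import re

# Matches lines such as:
#   ...ScsiPath: 4541: Plugin 'NMP' claimed path 'vmhba37:C1:T0:L0'
CLAIM_RE = re.compile(
    r"Plugin '(?P<plugin>[^']+)' claimed path '(?P<path>[^']+)'"
)


def claimed_paths(log_lines):
    """Return a (plugin, runtime_name) tuple for each 'claimed path' line."""
    result = []
    for line in log_lines:
        match = CLAIM_RE.search(line)
        if match:
            result.append((match.group("plugin"), match.group("path")))
    return result


# Two lines copied from the vmkernel log excerpt in this post.
sample = [
    "2011-07-25T17:19:46.955Z cpu2:2845)ScsiScan: 1582: Add path: vmhba37:C1:T0:L0",
    "2011-07-25T17:19:46.983Z cpu0:2845)ScsiPath: 4541: Plugin 'NMP' claimed path 'vmhba37:C1:T0:L0'",
]
claims = claimed_paths(sample)
print(claims)
```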

Finally, you see the end result:

# esxcli storage nmp device list
naa.5000144f79272808
   Device Display Name: EMC iSCSI Disk (naa.5000144f79272808)
   Storage Array Type: VMW_SATP_DEFAULT_AA
   Storage Array Type Device Config: SATP VMW_SATP_DEFAULT_AA does not support device configuration.
   Path Selection Policy: VMW_PSP_RR
   Path Selection Policy Device Config: {policy=rr,iops=1000,bytes=10485760,useANO=0;lastPathIndex=0: NumIOsPending=0,numBytesPending=0}
   Path Selection Policy Device Custom Config:
   Working Paths: vmhba37:C1:T0:L0, vmhba37:C0:T0:L0

The system has identified all the relevant paths to a given device and marked them as 'working'.
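
The grouping logic described in this post can be sketched in a few lines of Python.  To be clear, this is not how the vmkernel implements it.  It is just an illustration that takes text in the esxcfg-mpath -l format shown earlier and collects the runtime names that report the same device identifier.  The function name group_paths_by_device is made up for the example, and the sample input is trimmed to the fields the sketch reads.

```python
from collections import defaultdict


def group_paths_by_device(mpath_output: str) -> dict:
    """Map each device identifier to the runtime names of its paths."""
    groups = defaultdict(list)
    runtime_name = None
    for raw in mpath_output.splitlines():
        line = raw.strip()
        if line.startswith("Runtime Name:"):
            # Remember the path until we see which device it belongs to.
            runtime_name = line.split(":", 1)[1].strip()
        elif line.startswith("Device:") and runtime_name is not None:
            # "Device Display Name:" does not match this prefix, only "Device:".
            device = line.split(":", 1)[1].strip()
            groups[device].append(runtime_name)
            runtime_name = None
    return dict(groups)


# Trimmed from the esxcfg-mpath -l output earlier in this post.
sample = """\
   Runtime Name: vmhba37:C1:T0:L0
   Device: naa.5000144f79272808
   Device Display Name: EMC iSCSI Disk (naa.5000144f79272808)

   Runtime Name: vmhba37:C0:T0:L0
   Device: naa.5000144f79272808
   Device Display Name: EMC iSCSI Disk (naa.5000144f79272808)
"""
groups = group_paths_by_device(sample)
print(groups)
```

Both runtime names end up under the single key naa.5000144f79272808, which mirrors the 'Working Paths' line above.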

Going back to the initial post that triggered this discussion, the user had multiple (different) devices that the array was returning with identical VPD serial numbers (a bug in the array software).  Because all of the devices were being returned with the same serial number, ESXi assumed they were all just different paths to the same device.
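
One way to spot that kind of problem is to look for the same identifier showing up at more than one LUN number behind the same adapter, channel and target.  The sketch below is a heuristic of my own, not something ESXi does, and the 'buggy' input is invented for illustration (the forum post's actual data isn't reproduced here).  The names suspicious_devices and RUNTIME_RE are hypothetical.

```python
import re
from collections import defaultdict

# Runtime names look like vmhba37:C1:T0:L0 (adapter:channel:target:LUN).
RUNTIME_RE = re.compile(r"^(vmhba\d+):C(\d+):T(\d+):L(\d+)$")


def suspicious_devices(paths):
    """Return device ids seen at several LUN numbers on one adapter/channel/target.

    paths is an iterable of (runtime_name, device_id) tuples.
    """
    luns = defaultdict(set)
    for runtime_name, device_id in paths:
        match = RUNTIME_RE.match(runtime_name)
        if not match:
            continue  # skip anything that is not a recognisable runtime name
        adapter, channel, target, lun = match.groups()
        luns[(device_id, adapter, channel, target)].add(int(lun))
    return sorted({key[0] for key, seen in luns.items() if len(seen) > 1})


# The healthy case from this post: one LUN reached over two channels.
healthy = [
    ("vmhba37:C1:T0:L0", "naa.5000144f79272808"),
    ("vmhba37:C0:T0:L0", "naa.5000144f79272808"),
]
# Invented case: two different LUNs reporting the same identifier.
buggy = [
    ("vmhba37:C0:T0:L0", "naa.5000144f79272808"),
    ("vmhba37:C0:T0:L1", "naa.5000144f79272808"),
]
print(suspicious_devices(healthy))
print(suspicious_devices(buggy))
```

The healthy input produces no hits, while the invented one flags the identifier, since L0 and L1 behind the same target are unlikely to be two paths to one device.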

Now, what happens if your array *doesn't* support Page 83h (maybe it only supports the very basic predecessor information in Page 80h)?  ESX uses a different method in that case, which will be the subject of a future blog post.

I hope this helped - corrections and comments welcome.