Summary: Sol8 and EVA hangs

From: Eugene Schmidt <fereug_at_acute.co.za>
Date: Tue Nov 02 2004 - 18:33:31 EST
Hi Everybody

This summary is long overdue.

No applicable answers were received, although there seemed to be some
interest in the topic.

As it turned out, the Solaris system was healthy. The failure was well
downstream in the SAN infrastructure (a fibre cable between switches).
Somehow this slipped past the SAN supplier and was only found once it
started impacting other servers. So much for logs...

After the fibre was replaced, the errors stopped.

Best regards

Eugene
===============================================


Hope someone has seen this one and can help please?

Customer has an E4500 running Solaris 8, with 2 x newly attached EVA disk
arrays connected via two QLogic 2200 SBus HBAs. Testing was 100% and fast.

Secure Path 3.0D is loaded for channel failover.

The system started experiencing hangs today. What had changed? It was
rebooted this morning, with no changes made prior to the reboot.

Initially no errors in /var/adm/messages, but after a second reboot, errors
started appearing:

Oct  8 11:00:41 proddb scsi: [ID 243001 kern.warning] WARNING:
/swsp@0,2/ssd@0,1 (ssd5):
Oct  8 11:00:41 proddb      SCSI transport failed: reason 'aborted':
retrying command
Oct  8 11:09:00 proddb scsi: [ID 243001 kern.warning] WARNING:
/swsp@0,2/ssd@0,0 (ssd4):
Oct  8 11:09:00 proddb      SCSI transport failed: reason 'aborted':
retrying command
Oct  8 11:58:52 proddb scsi: [ID 243001 kern.warning] WARNING:
/swsp@0,2/ssd@0,0 (ssd4):
Oct  8 11:58:52 proddb      SCSI transport failed: reason 'aborted':
retrying command
Oct  8 12:11:13 proddb scsi: [ID 243001 kern.warning] WARNING:
/swsp@0,2/ssd@0,0 (ssd4):

Disks c7t0d0 and c7t0d1 are hanging. The c6 disks perform beautifully.

Switch logs and EVA logs show nothing.

No other error messages except those shown above.
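
For anyone tracking similar warnings over time, here is a minimal Python
sketch that tallies "SCSI transport failed" messages per device. It assumes
the syslog layout shown above, where the device name, e.g. (ssd5), appears
on or before the line carrying the failure text. The function name and the
sample data are illustrative only, not part of any Solaris tool.

```python
import re
from collections import Counter

# Device instance in parentheses, e.g. "(ssd5):" or "(sd3):".
DEVICE_RE = re.compile(r"\((s?sd\d+)\):")
# Failure text with the quoted reason, e.g. reason 'aborted'.
FAILURE_RE = re.compile(r"SCSI transport failed: reason '([^']+)'")


def count_transport_failures(lines):
    """Return a Counter keyed by (device, reason)."""
    counts = Counter()
    current_device = None
    for line in lines:
        dev = DEVICE_RE.search(line)
        if dev:
            current_device = dev.group(1)
        fail = FAILURE_RE.search(line)
        if fail and current_device is not None:
            counts[(current_device, fail.group(1))] += 1
    return counts


# Sample taken from the messages quoted above (line wrapping preserved).
SAMPLE = """\
Oct  8 11:00:41 proddb scsi: [ID 243001 kern.warning] WARNING:
/swsp@0,2/ssd@0,1 (ssd5):
Oct  8 11:00:41 proddb      SCSI transport failed: reason 'aborted':
retrying command
Oct  8 11:09:00 proddb scsi: [ID 243001 kern.warning] WARNING:
/swsp@0,2/ssd@0,0 (ssd4):
Oct  8 11:09:00 proddb      SCSI transport failed: reason 'aborted':
retrying command
""".splitlines()

if __name__ == "__main__":
    for (device, reason), n in sorted(count_transport_failures(SAMPLE).items()):
        print(f"{device}: {n} x {reason}")
```

To run it against a live system, pass the lines of /var/adm/messages
instead of SAMPLE.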

Mounting the disk read-only and putting heavy I/O on it reproduces the problem.

Also, iostat shows the disk as 100% busy with no I/O passing through. The
hsx device (the current path) shows the same hung state:
"9  9 17 66
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 hsx1
    ....
    0.0    0.0    0.0    0.0  0.0  1.0    0.0    0.0   0 100 hsx813
    .....
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
    0.0    0.8    0.0    0.4  0.0  0.0    0.0   13.9   0   1 c0t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t6d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c6t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c6t0d1
    0.0    4.2    0.0   18.6  0.0  0.0    0.0    0.4   0   0 c6t0d2
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c6t0d3
    0.0    0.0    0.0    0.0  0.0  1.0    0.0    0.0   0 100 c7t0d0
    0.0    0.0    ...
"

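The pattern to look for in the iostat output above is a device that is
100% busy while moving no data. A minimal Python sketch of that check
follows, assuming the 11-column extended statistics layout shown above with
the device name last. The function name and threshold parameter are
hypothetical.

```python
def find_stalled_devices(iostat_lines, busy_threshold=100):
    """Return devices whose %b is at or above the threshold with zero r/s and w/s."""
    stalled = []
    for line in iostat_lines:
        fields = line.split()
        if len(fields) != 11:
            continue  # banner, ellipsis, or truncated line
        try:
            values = [float(f) for f in fields[:10]]
        except ValueError:
            continue  # column header line
        reads, writes, busy = values[0], values[1], values[9]
        if busy >= busy_threshold and reads == 0 and writes == 0:
            stalled.append(fields[10])
    return stalled


# Sample rows copied from the iostat output above.
SAMPLE = """\
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 hsx1
    0.0    0.0    0.0    0.0  0.0  1.0    0.0    0.0   0 100 hsx813
    0.0    0.8    0.0    0.4  0.0  0.0    0.0   13.9   0   1 c0t1d0
    0.0    4.2    0.0   18.6  0.0  0.0    0.0    0.4   0   0 c6t0d2
    0.0    0.0    0.0    0.0  0.0  1.0    0.0    0.0   0 100 c7t0d0
    0.0    0.0    ...
""".splitlines()

if __name__ == "__main__":
    print(find_stalled_devices(SAMPLE))
```

On the sample rows this flags hsx813 and c7t0d0, the two devices that have
one command outstanding (actv 1.0) and no throughput.
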
Below are the lengthy config files as installed by the install script.

Promise a summary.

Thx

E Schmidt
==========

"spmgr" display shows the following config:
# spmgr display
  Server:  acproddb10    Report Created: Fri, Oct 08 16:34:46 2004
  Command: spmgr display
  = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
  Storage:  5000-1FE1-5002-81C0
  Load Balance: Off  Auto-restore: Off
  Path Verify: On    Verify Interval: 30
  HBAs: qla2200-0  qla2200-2
  Controller:  P5849D5AAPW01O, Operational
               P5849D5AAPW038, Operational
  Devices:  c6t0d0  c6t0d1  c6t0d2  c6t0d3

  TGT/LUN   Device             WWLUN_ID
#_Paths
    0/  0   c6t0d0             6005-08B4-0001-3879-0000-D000-0150-0000   4

          Controller  Path_Instance      HBA             Preferred?
Path_Status
          P5849D5AAPW01O                                     no
                      hsx-1-37-1         qla2200-0       no           Active
                      hsx-3655-36-1      qla2200-2       no
Available

          Controller  Path_Instance      HBA             Preferred?
Path_Status
          P5849D5AAPW038                                     no
                      hsx-204-38-1       qla2200-0       no
Standby
                      hsx-3858-39-1      qla2200-2       no
Standby


  TGT/LUN   Device             WWLUN_ID
#_Paths
    0/  1   c6t0d1             6005-08B4-0001-3879-0000-D000-0153-0000   4

          Controller  Path_Instance      HBA             Preferred?
Path_Status
          P5849D5AAPW01O                                     no
                      hsx-2-37-2         qla2200-0       no
Standby
                      hsx-3656-36-2      qla2200-2       no
Standby

          Controller  Path_Instance      HBA             Preferred?
Path_Status
          P5849D5AAPW038                                     no
                      hsx-205-38-2       qla2200-0       no           Active
                      hsx-3859-39-2      qla2200-2       no
Available


  TGT/LUN   Device             WWLUN_ID
#_Paths
    0/  2   c6t0d2             6005-08B4-0001-3879-0000-D000-0156-0000   4

          Controller  Path_Instance      HBA             Preferred?
Path_Status
          P5849D5AAPW01O                                     no
                      hsx-3-37-3         qla2200-0       no           Active
                      hsx-3657-36-3      qla2200-2       no
Available

          Controller  Path_Instance      HBA             Preferred?
Path_Status
          P5849D5AAPW038                                     no
                      hsx-206-38-3       qla2200-0       no
Standby
                      hsx-3860-39-3      qla2200-2       no
Standby


  TGT/LUN   Device             WWLUN_ID
#_Paths
    0/  3   c6t0d3             6005-08B4-0001-3879-0000-D000-0164-0000   4

          Controller  Path_Instance      HBA             Preferred?
Path_Status
          P5849D5AAPW01O                                     no
                      hsx-4-37-4         qla2200-0       no
Standby
                      hsx-3658-36-4      qla2200-2       no
Standby

          Controller  Path_Instance      HBA             Preferred?
Path_Status
          P5849D5AAPW038                                     no
                      hsx-207-38-4       qla2200-0       no           Active
                      hsx-3861-39-4      qla2200-2       no
Available


  Storage:  5000-1FE1-5002-2510
  Load Balance: Off  Auto-restore: Off
  Path Verify: On    Verify Interval: 30
  HBAs: qla2200-0  qla2200-2
  Controller:  P5849D5AAPC09X, Operational
               P5849D5AAPC09E, Operational
  Devices:  c7t0d0  c7t0d1  c7t0d2  c7t0d3

  TGT/LUN   Device             WWLUN_ID
#_Paths
    0/  0   c7t0d0             6005-08B4-0001-24D1-0000-A000-0193-0000   4

          Controller  Path_Instance      HBA             Preferred?
Path_Status
          P5849D5AAPC09X                                     no
                      hsx-813-33-1       qla2200-0       no
Standby
                      hsx-4467-32-1      qla2200-2       no
Standby

          Controller  Path_Instance      HBA             Preferred?
Path_Status
          P5849D5AAPC09E                                     YES
                      hsx-1016-34-1      qla2200-0       no           Active
                      hsx-4670-35-1      qla2200-2       no
Available


  TGT/LUN   Device             WWLUN_ID
#_Paths
    0/  1   c7t0d1             6005-08B4-0001-24D1-0000-A000-0196-0000   4

          Controller  Path_Instance      HBA             Preferred?
Path_Status
          P5849D5AAPC09X                                     no
                      hsx-814-33-2       qla2200-0       no           Active
                      hsx-4468-32-2      qla2200-2       no
Available

          Controller  Path_Instance      HBA             Preferred?
Path_Status
          P5849D5AAPC09E                                     no
                      hsx-1017-34-2      qla2200-0       no
Standby
                      hsx-4671-35-2      qla2200-2       no
Standby


  TGT/LUN   Device             WWLUN_ID
#_Paths
    0/  2   c7t0d2             6005-08B4-0001-24D1-0000-A000-0199-0000   4

          Controller  Path_Instance      HBA             Preferred?
Path_Status
          P5849D5AAPC09X                                     no
                      hsx-815-33-3       qla2200-0       no
Standby
                      hsx-4469-32-3      qla2200-2       no
Standby

          Controller  Path_Instance      HBA             Preferred?
Path_Status
          P5849D5AAPC09E                                     YES
                      hsx-1018-34-3      qla2200-0       no           Active
                      hsx-4672-35-3      qla2200-2       no
Available


  TGT/LUN   Device             WWLUN_ID
#_Paths
    0/  3   c7t0d3             6005-08B4-0001-24D1-0000-A000-01A7-0000   4

          Controller  Path_Instance      HBA             Preferred?
Path_Status
          P5849D5AAPC09X                                     no
                      hsx-816-33-4       qla2200-0       no           Active
                      hsx-4470-32-4      qla2200-2       no
Available

          Controller  Path_Instance      HBA             Preferred?
Path_Status
          P5849D5AAPC09E                                     no
                      hsx-1019-34-4      qla2200-0       no
Standby
                      hsx-4673-35-4      qla2200-2       no
Standby
======== END OF OUTPUT ============
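
In the output above, every Active path is on qla2200-0, with qla2200-2
carrying only Available or Standby paths. A minimal Python sketch for
tallying path states per HBA from pasted "spmgr display" text follows. It
assumes the path-line layout shown above and tolerates a status that has
wrapped onto the following line, as happened in this mail. The function
name is hypothetical, and this is not part of Secure Path.

```python
import re
from collections import Counter

# Path line: instance name, HBA, preferred flag, optional status.
PATH_RE = re.compile(r"^\s*(hsx-\d+-\d+-\d+)\s+(\S+)\s+(?:no|YES)\s*(\S+)?\s*$")


def paths_by_hba(lines):
    """Return a Counter keyed by (hba, path_status)."""
    counts = Counter()
    pending_hba = None  # set when a path line had no status on it
    for line in lines:
        if pending_hba is not None:
            word = line.strip()
            if word:
                # Assumption: the next non-blank line is the wrapped status.
                counts[(pending_hba, word)] += 1
                pending_hba = None
            continue
        m = PATH_RE.match(line)
        if not m:
            continue
        hba, status = m.group(2), m.group(3)
        if status:
            counts[(hba, status)] += 1
        else:
            pending_hba = hba
    return counts


# Sample: the c6t0d0 entry from the output above, wrapping preserved.
SAMPLE = """\
          Controller  Path_Instance      HBA             Preferred?
Path_Status
          P5849D5AAPW01O                                     no
                      hsx-1-37-1         qla2200-0       no           Active
                      hsx-3655-36-1      qla2200-2       no
Available

          Controller  Path_Instance      HBA             Preferred?
Path_Status
          P5849D5AAPW038                                     no
                      hsx-204-38-1       qla2200-0       no
Standby
                      hsx-3858-39-1      qla2200-2       no
Standby
""".splitlines()

if __name__ == "__main__":
    for (hba, status), n in sorted(paths_by_hba(SAMPLE).items()):
        print(f"{hba:12s} {status:10s} {n}")
```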

Entries in /etc/system:
* Start of CPQhsv edits. DO NOT DELETE THIS LINE
forceload: drv/clone
set maxphys=8388608
set sd:sd_max_throttle=32
set sd:sd_io_time=180
* End of CPQhsv edits. DO NOT DELETE THIS LINE
* Start of HPfcraid edits. DO NOT DELETE THIS LINE
forceload: drv/clone
forceload: drv/ssd
set maxphys=8388608
set sd:sd_max_throttle=32
set sd:sd_io_time=180
set ssd:ssd_max_throttle=32
set ssd:ssd_io_time=180
* End of HPfcraid edits. DO NOT DELETE THIS LINE

set shmsys:shminfo_shmmax=4194304000
------- EOF ---------------
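
The CPQhsv and HPfcraid blocks above repeat several directives. The values
are identical here, so the repeats look harmless, but a quick check is easy
to script. A minimal Python sketch follows, assuming lines starting with
"*" are comments and that a "set" directive is identified by the name to
the left of "=". The function name is hypothetical.

```python
from collections import defaultdict


def find_repeated_directives(lines):
    """Map each repeated directive to the line numbers where it occurs."""
    seen = defaultdict(list)
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("*"):
            continue  # blank line or /etc/system comment
        if line.startswith("set "):
            # Key on the tunable name only, so differing values still collide.
            key = "set " + line[4:].split("=", 1)[0].strip()
        else:
            key = line
        seen[key].append(lineno)
    return {key: nums for key, nums in seen.items() if len(nums) > 1}


# Sample: the /etc/system entries quoted above.
SAMPLE = """\
* Start of CPQhsv edits. DO NOT DELETE THIS LINE
forceload: drv/clone
set maxphys=8388608
set sd:sd_max_throttle=32
set sd:sd_io_time=180
* End of CPQhsv edits. DO NOT DELETE THIS LINE
* Start of HPfcraid edits. DO NOT DELETE THIS LINE
forceload: drv/clone
forceload: drv/ssd
set maxphys=8388608
set sd:sd_max_throttle=32
set sd:sd_io_time=180
set ssd:ssd_max_throttle=32
set ssd:ssd_io_time=180
* End of HPfcraid edits. DO NOT DELETE THIS LINE

set shmsys:shminfo_shmmax=4194304000
""".splitlines()

if __name__ == "__main__":
    for directive, linenos in find_repeated_directives(SAMPLE).items():
        print(directive, linenos)
```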

Entries in /kernel/drv/ssd.conf:
#
# Copyright (c) 1995-1999 by Sun Microsystems, Inc.
# All rights reserved.
#
#ident   "@(#)ssd.conf 1.9     99/07/29 SMI"

name="ssd" parent="SUNW,pln" port=0 target=0;
....
name="ssd" parent="SUNW,pln" port=0 target=15;
name="ssd" parent="SUNW,pln" port=1 target=0;
name="ssd" parent="SUNW,pln" port=1 target=1;
.....
   ditto port=1 to port=5, with target=0 thru target=15
.....
name="ssd" parent="SUNW,pln" port=5 target=15;
name="ssd" parent="sf" target=0;
name="ssd" parent="fp" target=0;
name="ssd" parent="ifp" target=127;
name="ssd" parent="scsi_vhci" target=0;
---EOF --------------
/kernel/drv/hsx.conf:
#
# Compaq StorageWorks Secure Path
# hsx.conf - Hardware Configuration file for hsx, a Disk Array Block
#            SCSI Target driver.  Refer to the driver.conf(4) manpage
#            for more information on the syntax of this file.
#
#       name            "hsx"                   - required
#       class           "scsi"                  - required
#       target          SCSI target-ID
#       lun             SCSI logical unit number
#       qdepth          depth of command queue (1,..,64)
#       parent          restrict parent HBA
#       preferred       this path is preferred for a controller when load
#                       balancing is disabled
#
# If no "parent=" qualifier is present, all SCSI-HBA adapters in
# the system will attempt to attach an HSX instance at the indicated
# target/lun on the SCSI bus.
#
# HSX will only attach device instances for Compaq StorageWorks HSx80
# disk array targets. The SD device will also want to claim these
# targets. Explicit use of "parent=" in sd.conf may be required to
# resolve conflicts.
#
# Each HSX instance found will result in a path being provided via
# the misc/path driver.
name="hsx" parent="qla2200" target=37 lun=0 qdepth=32;
name="hsx" parent="qla2200" target=37 lun=1 qdepth=32;
name="hsx" parent="qla2200" target=37 lun=2 qdepth=32;
name="hsx" parent="qla2200" target=37 lun=3 qdepth=32;
name="hsx" parent="qla2200" target=37 lun=4 qdepth=32;
name="hsx" parent="qla2200" target=37 lun=5 qdepth=32;
.... etc., for targets 32 to 39 (although not in sequence), lun=0 thru 202
============= EOF

Contents of /kernel/drv/qla2300.conf

# Number of times to retry a SCSI queue full error.
#    Range: 0 - 255
hba0-queue-full-retry-count=16;

# Amount of time to delay after a SCSI queue full error before
# starting any new I/O commands.
#    Range: 0 - 255 seconds
hba0-queue-full-retry-delay=2;


# Maximum fibre channel frame size.
#    Range: 512, 1024 or 2048 bytes
hba0-max-frame-length=1024;

# Maximum number of commands queued on each logical unit.
#    Range: 1 - 65535
hba0-execution-throttle=16;

# Number of port login retry attempts.
#    Range: 0 - 255
hba0-login-retry-count=8;

# Enable/disable the use adapter hard loop ID address on the fibre
# channel bus.
#    0 = disable, 1 = enabled
hba0-enable-adapter-hard-loop-ID=0;

# Adapter hard loop ID address to use on the fibre channel bus.
#    Range: 0 - 125
hba0-adapter-hard-loop-ID=0;

# Enable/disable the use LIP reset for loop reset.
#    0 = disable, 1 = enabled
hba0-enable-LIP-reset=0;

# Enable/disable the use LIP full login for loop reset.
#    0 = disable, 1 = enabled
hba0-enable-LIP-full-login=1;

# Enable/disable the use of target reset for loop reset.
#    0 = disable, 1 = enabled
hba0-enable-target-reset=0;

# Amount of time to delay after a loop reset for starting any new
# I/O commands.
#    Range: 0 - 255 seconds
hba0-reset-delay=5;

# Number of times to retry a port that is not responding.
#    Range: 0 - 255
hba0-port-down-retry-count=90;

# Maximum number of LUNs to scan for, if a device does not
# support SCSI Report LUNs command.
#    Range: 1 - 256
hba0-maximum-luns-per-target=8;

# Connection options.
#    0 = loop only
#    1 = point-to-point only
#    2 = loop preferred, otherwise point-to-point
#    3 = point-to-point preferred, otherwise loop
hba0-connection-options=1;

# Fibre Channel tape support enable/disable.
#    0 = disable, 1 = enabled
hba0-fc-tape=1;

# PCI latency timer.
#    Range: 0 - 0xF8
#    Default: 0x40
hba0-pci-latency-timer=0x40;

# During link down conditions enable/disable the reporting of
# errors.
#    0 = disabled, 1 = enable
hba0-link-down-error=1;

# Amount of time to wait for loop to come up after it has gone down
# before reporting I/O errors.
#    Range: 0 - 240 seconds
hba0-link-down-timeout=10;

# Persistent binding only option.
#    0 = Reports to OS discovery of binded and non-binded devices
#    1 = Reports to OS discovery of persistent binded devices only
hba0-persistent-binding-configuration=1;

# Fast error reporting to Solaris, enabled/disabled.
#    0 = disabled, 1 = enable
hba0-fast-error-reporting=0;

# Enable extended logging.
#    0 = disabled, 1 = enable
hba0-extended-logging=0;

#####################################################################
#   WARNING: Beginning of Configuration Data stored by the QLogic   #
#        Applications. Consult documentation before editing         #
#                     any data passed this text.                    #
#####################################################################

# CPQ installation changes made.


# CPQswsp: start of Secure Path edits. Caution: do not remove! This line is used by pkgadd/pkgrm.

hba0-SCSI-target-id-37-fibre-channel-port-name="50001FE1500281C9";
hba2-SCSI-target-id-37-fibre-channel-port-name="50001FE1500281C9";
hba0-SCSI-target-id-38-fibre-channel-port-name="50001FE1500281CC";
hba2-SCSI-target-id-38-fibre-channel-port-name="50001FE1500281CC";
hba0-SCSI-target-id-36-fibre-channel-port-name="50001FE1500281C8";
hba2-SCSI-target-id-36-fibre-channel-port-name="50001FE1500281C8";
hba0-SCSI-target-id-39-fibre-channel-port-name="50001FE1500281CD";
hba2-SCSI-target-id-39-fibre-channel-port-name="50001FE1500281CD";
hba0-SCSI-target-id-33-fibre-channel-port-name="50001FE150022519";
hba2-SCSI-target-id-33-fibre-channel-port-name="50001FE150022519";
hba0-SCSI-target-id-34-fibre-channel-port-name="50001FE15002251C";
hba2-SCSI-target-id-34-fibre-channel-port-name="50001FE15002251C";
hba0-SCSI-target-id-32-fibre-channel-port-name="50001FE150022518";
hba2-SCSI-target-id-32-fibre-channel-port-name="50001FE150022518";
hba0-SCSI-target-id-35-fibre-channel-port-name="50001FE15002251D";
hba2-SCSI-target-id-35-fibre-channel-port-name="50001FE15002251D";

# CPQswsp: end of Secure Path edits. Caution: do not remove! This line is used by pkgadd/pkgrm.
=========== EOF =====================
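
The persistent binding entries above give each SCSI target ID a controller
port name, once per HBA. A minimal Python sketch for checking that both
HBAs bind a given target ID to the same port name follows, assuming the
entry format shown above. The function names are hypothetical.

```python
import re

# e.g. hba0-SCSI-target-id-37-fibre-channel-port-name="50001FE1500281C9";
BINDING_RE = re.compile(
    r'^hba(\d+)-SCSI-target-id-(\d+)-fibre-channel-port-name="([0-9A-Fa-f]{16})";'
)


def bindings_by_target(lines):
    """Return {target_id: {hba_instance: port_name}}."""
    table = {}
    for line in lines:
        m = BINDING_RE.match(line.strip())
        if m:
            hba, target = int(m.group(1)), int(m.group(2))
            table.setdefault(target, {})[hba] = m.group(3).upper()
    return table


def inconsistent_targets(table):
    """Return target IDs that different HBAs bind to different port names."""
    return sorted(t for t, per_hba in table.items()
                  if len(set(per_hba.values())) > 1)


# Sample: four of the binding lines quoted above.
SAMPLE = '''\
hba0-SCSI-target-id-37-fibre-channel-port-name="50001FE1500281C9";
hba2-SCSI-target-id-37-fibre-channel-port-name="50001FE1500281C9";
hba0-SCSI-target-id-33-fibre-channel-port-name="50001FE150022519";
hba2-SCSI-target-id-33-fibre-channel-port-name="50001FE150022519";
'''.splitlines()

if __name__ == "__main__":
    table = bindings_by_target(SAMPLE)
    for target in sorted(table):
        print(target, table[target])
    print("inconsistent:", inconsistent_targets(table))
```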
/kernel/drv/swsp.conf
# Compaq StorageWorks Secure Path
# swsp.conf - Configuration file for swsp
#
# use swsp.conf to configure which arrays can be controlled by Secure Path
# add one entry of the following form per array:
#  name="swsp" class="root" portid=0 reg=0x0,0x(instance+1),0x1
#              instance=(instance #) array-name="ARRAY_WWID";
#
# configurable parameters can be set globally, or on an array basis by
# adding one of path-verify, path-verify-period load-balance or auto-restore
# to the line defining the array instance, or on a line by itself (for global)
#
# path-verify=?
#       1= path-verification enabled
#       0= path-verification disabled
# path-verify-period=X
#       X = number of seconds between path verification attempts
#
# load-balance=?
#       1= enabled
#       0= disabled
#
# auto-restore=?
#       1= enabled
#       0= disabled
#
path-verify=1;
name="swsp" class="root" portid=0 reg=0x0,0x1,0x1 instance=0
array-name="5000-1FE1-5002-81C0";
wwlid-0-0="6005-08B4-0001-3879-0000-D000-0150-0000@0,0";
wwlid-0-1="6005-08B4-0001-3879-0000-D000-0153-0000@0,1";
wwlid-0-2="6005-08B4-0001-3879-0000-D000-0156-0000@0,2";
wwlid-0-3="6005-08B4-0001-3879-0000-D000-0164-0000@0,3";
name="swsp" class="root" portid=0 reg=0x0,0x2,0x1 instance=1
array-name="5000-1FE1-5002-2510";
wwlid-1-0="6005-08B4-0001-24D1-0000-A000-0193-0000@0,0";
wwlid-1-1="6005-08B4-0001-24D1-0000-A000-0196-0000@0,1";
wwlid-1-2="6005-08B4-0001-24D1-0000-A000-0199-0000@0,2";
wwlid-1-3="6005-08B4-0001-24D1-0000-A000-01A7-0000@0,3";
======================== EOF ========================================
=====================================================================
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Tue Nov 2 18:28:09 2004
