[UPDATE] SUMMARY: E420R unexplained panic after UE error

G'day All.

A little update on my earlier summary, which I ended with:

>As to the qla error messages in the log, Kevin reinforced my opinion 
>that forceloading the drivers is not necessary. None of these devices 
>contain boot partitions. In the mean time we have been able to trace at 
>least some of those to  a faulty UPS that the storage array is plugged 
>into (the panicky server is not plugged in there, though).

QLogic have in the meantime done a little investigation into this and 
provided me with an explanation for the qla2300 messages. To refresh 
your memories, these are examples of the messages I'm talking about:

>Feb  2 12:11:35 Slarty qla2300: [ID 175527 kern.info] qla2300(1): configure_loop, 2 gigabit data rate connection
>Feb  2 12:11:35 Slarty qla2300: [ID 467028 kern.info] qla2300(1): configure_loop, F-PORT connection
>Feb  2 12:11:35 Slarty qla2300: [ID 465925 kern.info] qla2300(1): status_entry, check condition sense data t1d0
>Feb  2 12:11:35 Slarty 70h  0h  6h  0h  0h  0h  0h  6h  0h  0h  0h  0h  29h  0h  0h  0h  0h 20h

Lyle Merdan of QLogic provided me with the following explanation of the last two lines (thanks, Lyle):

The t##d## is indicative of the disk that is reporting the check condition. Then at the beginning of the entry is the HBA instance. The example you gave tells me it's HBA instance 6.

> qla2300: [ID 465925 kern.info] qla2300(6): status_entry

  Q) What are these check conditions that appear when extended logging is enabled?
     qla2300: [ID 465925 kern.info] qla2300(6): status_entry, check condition sense data t94d0
     70h  0h  6h 42h 55h 5ah 5ah  ah  0h  0h  0h  0h 29h 0h  1h  0h  0h  
  A) These are errors returned from the storage to the HBA. There are two parts to a check
     condition. The ASC and ASCQ. The ASC is byte 12 and the ASCQ is byte 13. Start counting
     at 0. So in the above example the ASC is 29 and ASCQ is 0. These values can be looked up
     on this website: http://www.t10.org/lists/asc-num.htm 
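
For anyone who would rather not count bytes by hand, the rule above can be sketched in a few lines of Python. This is only an illustration: the function name parse_sense is made up here, and it assumes the dump is the fixed-format sense data shown in the log (first byte 70h), one hex byte per token with a trailing "h". The sense key in the low nibble of byte 2 is not part of the explanation above; it comes from the standard SCSI fixed-format sense layout.

```python
def parse_sense(line):
    """Extract sense key, ASC and ASCQ from a qla2300 sense-data dump line."""
    # Each token is a hex byte followed by 'h'; a bare 'ah' means 0x0a.
    data = [int(tok.rstrip("h"), 16) for tok in line.split()]
    if len(data) < 14:
        raise ValueError("need at least 14 sense bytes to reach the ASCQ")
    return {
        "sense_key": data[2] & 0x0F,  # assumption: fixed-format sense data
        "asc": data[12],              # byte 12, counting from 0
        "ascq": data[13],             # byte 13, counting from 0
    }


sample = "70h  0h  6h 42h 55h 5ah 5ah  ah  0h  0h  0h  0h 29h 0h  1h  0h  0h"
result = parse_sense(sample)
print("sense key %xh, ASC %02xh, ASCQ %02xh"
      % (result["sense_key"], result["asc"], result["ascq"]))
```

The ASC/ASCQ pair it prints is what you look up in the T10 list.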

As to what exactly the reported errors mean, you'll have to contact the storage vendor.

Now the reason you're getting the check conditions is you have extended logging enabled in the driver.
To disable extended logging you have to edit the /kernel/drv/qla2300.conf file and either add a line that explicitly
disables extended logging for HBA driver instance 6 OR use a GUI to turn extended logging off.

You could just add this line:

The website that Lyle mentions has a full explanation of all possible SCSI ASC/ASCQ combinations. It transpires, then, that all of the messages are caused by faults on the CLARIION. We'll pursue this further with Dell.


