SUMMARY: Diskpak problems, sync. transfer rates

From: Grant Schoep (grant@storm.com)
Date: Tue Apr 13 1999 - 19:07:52 CDT


OK, sorry for the delayed reply; I wanted to make sure my fix really fixed
the problem before posting it.
Thanks to:
Gary Franzyk
Misha Pavlov
Rachel Polanskis
Roop Kumar

Others? I thought I remembered getting a few more replies, but I can't find
any trace of them. Sorry if I missed you.

        OK, here's the deal. Solaris 2.7's default scsi_options value had to be
changed so the SCSI card could talk to my older, slower drives. Adding the
line "set scsi_options=0x3f8" to the /etc/system file fixes this. So what
does this do? I'll try to explain a bit.
        The kernel uses scsi_options to enable or disable different support
"modes" for SCSI.
Here is what the scsi_options bits are (pasted from SunSolve's Sym. & Res. 10254):
SCSI option             Value that sets the corresponding bit to 1
        Disconnect/reconnect    0x008 (bit 3, counting from bit 0)
        Linked commands         0x010 (bit 4)
        Synchronous transfer    0x020 (bit 5)
        Parity                  0x040 (bit 6)
        Tagged queuing          0x080 (bit 7)
        Fast SCSI               0x100 (bit 8, or bit 9 if counting from 1)
        Wide SCSI               0x200 (bit 9)
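
The table above can be turned into a small decoder. This is only a rough
Python sketch, not something from the SunSolve document: the names
SCSI_OPTION_BITS and decode_scsi_options are made up for illustration, and it
only knows the seven bits listed above. Any other bit is reported as unlisted
rather than guessed at.

```python
# Bit values copied from the table above (as quoted from S&R 10254).
SCSI_OPTION_BITS = {
    0x008: "Disconnect/reconnect",
    0x010: "Linked commands",
    0x020: "Synchronous transfer",
    0x040: "Parity",
    0x080: "Tagged queuing",
    0x100: "Fast SCSI",
    0x200: "Wide SCSI",
}


def decode_scsi_options(value):
    """Return (names of listed options that are set, unlisted bit positions)."""
    named = [name for mask, name in SCSI_OPTION_BITS.items() if value & mask]
    known = 0
    for mask in SCSI_OPTION_BITS:
        known |= mask
    leftover = value & ~known  # bits that are set but not in the table
    unlisted = [bit for bit in range(leftover.bit_length())
                if (leftover >> bit) & 1]
    return named, unlisted


if __name__ == "__main__":
    named, unlisted = decode_scsi_options(0x3f8)
    print(hex(0x3f8), named, "unlisted bits:", unlisted)
```

For 0x3f8 this should report all seven options from the table and no unlisted
bits.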

To find out what your system is currently using, do the following (I am
reading this off the above-mentioned S&R #10254).
For pre-2.5 Solaris use
        adb -k /kernel/unix /dev/mem
For Solaris 2.5 and later use
        adb -k
Then type:
scsi_options/X
$q

This will list your scsi_options setting.
        My Solaris 2.7 E450 machine was at 1ff8, so I changed it to 3f8. 3f8 is
hex; it converts to 0001111111 in binary when written starting from bit 0
(written the usual way, most significant bit first, it is 1111111000). This
turns on bits 3-9. So, in effect, I am turning everything on up to wide
SCSI (bit 9).
        What the document doesn't tell me is what the 3f8 setting turned off. If I
convert 1ff8 to binary, I see it has 3 extra bits set: bit 10, bit 11, and
bit 12. The documentation I found doesn't mention anything about these.
UltraSCSI stuff maybe? If anyone knows what these three bits are, I would be
very interested in them.
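
To double-check the bit arithmetic, here is a short Python sketch (my own
illustration; the variable and function names are arbitrary). It prints both
values in binary, most significant bit first, and lists the bits that are on
in 1ff8 but off in 3f8.

```python
def set_bits(value):
    """List the positions of the 1 bits in value, counting from bit 0."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


default = 0x1ff8  # value adb reported on the E450
reduced = 0x3f8   # value set in /etc/system

dropped = default & ~reduced  # on in the default, off in the reduced value

print(format(reduced, "013b"))          # 0001111111000
print(format(default, "013b"))          # 1111111111000
print(hex(dropped), set_bits(dropped))  # 0x1c00 [10, 11, 12]
```

So the difference between the two settings is exactly 0x1c00, i.e. bits 10,
11, and 12.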
        So in effect, this made the controller talk to the drives at the 3f8
setting right away, with no timeout errors. I didn't really like this idea,
since it might be slowing down my internal drives, which didn't seem to mind
the faster setting (1ff8). The next solution is thanks to a Sun support
engineer; thanks, Bill.
        I removed the set scsi_options line from the /etc/system file and added
a new file called glm.conf in /kernel/drv.
This conf file sets the options for the individual SCSI controller that was
having the problems, so while this controller is using 3f8, all the others
still default to 1ff8. For info on this file, refer to "man glm".
$ cat glm.conf
        name="glm" parent="/pci@6,4000"
        unit-address="2,1"
        scsi-options=0x3f8;
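
Note how the entry lines up with the device path in the driver warnings from
my original message, /pci@6,4000/scsi@2,1: the part before the last "/" is
the parent, and the part after the "@" is the unit-address. The Python sketch
below builds an entry that way. It is only an illustration that assumes this
split holds for other controllers too; glm_conf_entry is a made-up helper, so
check "man glm" before trusting its output.

```python
def glm_conf_entry(device_path, scsi_options):
    """Build a glm.conf-style entry from a path like /pci@6,4000/scsi@2,1.

    Assumption: parent is everything before the last "/", and unit-address
    is whatever follows "@" in the last path component.
    """
    parent, _, node = device_path.rpartition("/")
    _node_name, _, unit_address = node.partition("@")
    if not parent or not unit_address:
        raise ValueError("expected a path like /pci@6,4000/scsi@2,1")
    return (
        'name="glm" parent="%s"\n'
        '\tunit-address="%s"\n'
        '\tscsi-options=%#x;\n' % (parent, unit_address, scsi_options)
    )


print(glm_conf_entry("/pci@6,4000/scsi@2,1", 0x3f8), end="")
```

Run as is, this should print the same three lines as the glm.conf shown above.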
I rebooted, and the one controller seems happy with this setting, while the
other controllers still run just fine with the 1ff8 setting. Problem
solved!!! (crossed fingers)
        This was a bit of a long-winded message, but I wanted to get everything
in here. If anyone knows what those 3 extra bits (10, 11, 12) do in the
scsi_options setting, I would really like to know.

        -grant
        

--Here's the original Message---
I am in the big process of switching to our new fileserver. I ran into a
problem.
There is a Sun Diskpak with 12 SCSI disks in it. I plugged it into our new
Ultra450, I believe the SCSI card that I plugged it into is a Symbios 876.
On my first boot -r, some driver reported this error (it also took a very
long time to boot):
WARNING: /pci@6,4000/scsi@2,1 (glm5):
        Connected command timeout for Target 15.0
darkstar unix: WARNING: ID[SUNWpd.glm.cmd_timeout.6017]
        darkstar unix: WARNING: /pci@6,4000/scsi@2,1 (glm5):
5 of the drives reported this. Then, when I tried to mount the filesystems
of those 5, the mount went quickly. The other 7, which didn't report the
error, took a few minutes to mount. While I tried to mount these other 7
drives, a similar message appeared on the console. Here it is:
Apr 3 19:55:40 darkstar unix: /pci@6,4000/scsi@2,1 (glm5):
Apr 3 19:55:40 darkstar Cmd (0x1058a98) dump for Target 15 Lun 0:
Apr 3 19:55:40 darkstar unix: /pci@6,4000/scsi@2,1 (glm5):
Apr 3 19:55:40 darkstar cdb=[ 0x8 0x0 0x8 0x30 0x8 0x0 ]
Apr 3 19:55:40 darkstar unix: /pci@6,4000/scsi@2,1 (glm5):
Apr 3 19:55:40 darkstar pkt_flags=0x4000 pkt_statistics=0x61 pkt_state=0x7
Apr 3 19:55:40 darkstar unix: /pci@6,4000/scsi@2,1 (glm5):
Apr 3 19:55:40 darkstar pkt_scbp=0x0 cmd_flags=0x8e0
Apr 3 19:55:40 darkstar unix: WARNING: /pci@6,4000/scsi@2,1 (glm5):
Apr 3 19:55:40 darkstar Connected command timeout for Target 15.0
Apr 3 19:55:40 darkstar unix: WARNING: ID[SUNWpd.glm.cmd_timeout.6017]
Apr 3 19:55:40 darkstar unix: WARNING: /pci@6,4000/scsi@2,1 (glm5):
Apr 3 19:55:40 darkstar Target 15 reducing sync. transfer rate
Apr 3 19:55:40 darkstar unix: WARNING: ID[SUNWpd.glm.sync_wide_backoff.6014]
Apr 3 19:55:40 darkstar unix: WARNING: /pci@6,4000/scsi@2,1/sd@f,0 (sd89):
Apr 3 19:55:40 darkstar SCSI transport failed: reason 'timeout': retrying command
Apr 3 19:55:40 darkstar

        After the long, long mount, it seemed fine; I can umount it and then
mount it quickly again.
If I do an init 6, some different drives might report this problem; it
doesn't seem consistent.
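
One way to see which drives complain on a given boot is to count the timeout
warnings per target. The Python sketch below is only an illustration: it
assumes the "Target 15.0" in the warning means target 15, LUN 0 (which matches
the "Target 15 Lun 0" line in the dump above), and count_timeouts is a made-up
name. The two sample lines are copied from the console output above.

```python
import re
from collections import Counter

# Matches warnings such as "Connected command timeout for Target 15.0".
TIMEOUT_RE = re.compile(r"Connected command timeout for Target (\d+)\.(\d+)")


def count_timeouts(lines):
    """Count 'Connected command timeout' warnings per (target, lun)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


sample = [
    "Apr 3 19:55:40 darkstar Connected command timeout for Target 15.0",
    "Apr 3 19:55:40 darkstar Target 15 reducing sync. transfer rate",
]
print(count_timeouts(sample))
```

Feeding it the messages from a few different boots would show whether the
same targets time out each time or whether it really does move around.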

        There is a similar report in SunSolve that I found: Bug Report 4144715.
It doesn't seem like my exact problem, though the error message is
the same.

        I am wondering if this is a problem with the Symbios card. I really hope
not, I bought this card for this one purpose. I think I may try to put
these drives on the Ultra450's own external SCSI port.

        Once I have waited out the long mounts, everything seems to work just
great. I am wondering if the SCSI card is just faster than the older
drives, and it takes a while for it to decide to sync down a speed level.
So, I am asking, is there a way to avoid this? Is this what is supposed to
happen? Can I tell that SCSI card to just try at the slower speed to begin
with?

        I can't use my original SCSI card from the old system, because it was an
SBUS card, and the Ultra450 is PCI. So, other than a 20 minute boot,
everything seems to work fine so far, but the system hasn't really had a
chance to go into action.

ADDENDUM: After I wrote the above, I tried the diskpak on the
Ultra450's own external SCSI port. The drives all came up just fine, no
problems whatsoever. So it is either something in the PCI SCSI card or
the cable. The PCI card uses its own special cable, with some type of
high-density plug on one end. But this cable is 6 meters, which I believe
is the SCSI max. Could it be that my drives in the tower just
don't work with this long a cable? I wonder if I could find a shorter
cable for this special port. Anyone have any more ideas? Thanks

        Will summarize.
        -grant
---------------------------------------------------------------------------
Grant Schoep, grant@storm.com
System/Network Administrator
L3 Communications Telemetry & Instrumentation
San Jose,CA (408)271-0800, Ext. 135


