Last week I put out this message:
> We are running Novell GroupWise on an SS1000. Our users are
>NFS-mounting the GroupWise data on a variety of Suns and PCs. The
>partition was on a normal narrow SCSI bus, but due to heavy load on
>the disk we tried shifting it to a fast/wide disk on a Sun SWIFT
>controller. The surprising thing is that performance on the
>fast/wide disk is *slower* - doing a sar -d on the machine gives me:
>
>SunOS bunya 5.4 Generic_101945-36 sun4d 04/01/96
>
>19:25:24 device %busy avque r+w/s blks/s avwait avserv
>
> sd38 18 5.9 4 62 975.1 525.4
>19:26:24 sd38 10 4.7 3 40 1486.5 302.8
>19:27:24 sd38 18 4.8 4 62 697.1 456.1
>19:28:24 sd38 18 8.5 4 61 1766.7 396.3
>19:29:24 sd38 12 6.2 4 55 1298.3 448.2
>19:30:24 sd38 15 2.2 2 30 671.2 459.2
>19:31:24 sd38 17 3.9 3 53 658.1 496.4
>19:32:24 sd38 18 5.3 3 55 1099.8 416.6
>19:33:24 sd38 9 2.8 2 31 970.4 460.6
>19:34:24 sd38 18 5.0 4 55 973.7 439.1
>19:35:24 sd38 18 4.6 3 54 871.0 454.3
>19:36:24 sd38 12 5.1 3 50 1138.4 431.6
>19:37:24 sd38 15 4.4 2 34 1604.6 373.0
>19:38:24 sd38 18 4.2 3 54 721.4 474.7
>19:39:24 sd38 18 4.6 4 56 817.8 465.0
>19:40:24 sd38 9 3.6 2 31 1389.0 415.3
>19:41:24 sd38 10 1.8 2 35 443.5 353.6
>19:42:24 sd38 18 4.9 5 69 600.7 456.1
>19:43:24 sd38 12 5.0 3 50 1106.0 429.4
>
>And this is when the system is not very heavily loaded! All the other
>disks are at least an order of magnitude lower in their avserv and
>avwait. Why is this so? How can a fast/wide disk be slower?
>
>BTW there are no other partitions on the disk apart from GroupWise, so
>it cannot be another partition on the disk being thrashed :-)
>
>Things I have checked:
>
>a) no runaway process thrashing the disk
>b) The SCSI options are set to 0x3f8 on the fas driver (i.e. fast/wide
>is enabled - see the sketch after this list)
>c) no error messages in /var/adm/messages or on the console that would
>indicate something wrong with the disk or cabling.
>d) disk is correctly terminated. There is another disk on the same chain
>that seems ok (the disk sd38 also seemed fine when we were testing it
>just as a user partition)
>e) The filesystem is not full.
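For anyone wanting to repeat check (b) on their own box, the knobs involved
are roughly these (a sketch only - the exact variable and where you set it
can vary with the controller and patch revs):

   # global default, set in /etc/system (takes effect after a reboot)
   set scsi_options = 0x3f8

   # peek at the value the running kernel is actually using
   echo "scsi_options/X" | adb -k /dev/ksyms /dev/mem

If I have the bit assignments right, 0x3f8 turns on the lot - disconnect,
linked commands, sync, parity, tagged queuing, fast and wide - and per-driver
overrides can go in the driver's .conf file instead.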
Well, this clearly had everyone else stumped as well. Thanks to the
people who responded with helpful hints. I did some more investigation
on our system - the response time got worse just after we had applied
patch 102509-04 (isp and esp fixes); when I backed this patch out,
things got marginally better. I tried running some other fast/wide
disks on other machines running either 2.4 or 2.5 but could never get
numbers as bad as we were seeing on our server. I even ran GroupWise
on other machines with the fast/wide disks and the response was fine.
I am putting the problem down to a combination of machine architecture,
patch rev levels and the OS release on the machine.
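For anyone chasing the same patch, checking for it and backing it out goes
roughly like this (a sketch - on 2.4 the backoutpatch script ships inside the
patch distribution itself, so run it from wherever you unpacked the patch):

   # is the patch on the box?
   showrev -p | grep 102509

   # back it out using the script bundled with the patch
   ./backoutpatch 102509-04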
Over the weekend we upgraded the server's OS to Solaris 2.5 and now sar
shows that the disks are running a lot better. The times are still not
spectacular - due, I suspect, to GroupWise's usage pattern - but the avwait
is down by at _least_ a factor of 10 and the avserv is close to that of the
narrow disk that used to handle GroupWise.
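If anyone wants to keep an eye on a disk the same way, the sar invocations
are straightforward (this assumes the standard sa1/sa2 cron entries are
feeding /var/adm/sa, and that you substitute your own disk name for sd38):

   # live 60-second samples, keeping just the header and the disk of interest
   sar -d 60 20 | egrep 'device|sd38'

   # or pull a window out of an earlier day's data file (saDD = day of month)
   # for a before/after comparison
   sar -d -f /var/adm/sa/saDD -s 19:25 -e 19:45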
Many thanks to:
John Justin Hough <john@oncology.uthscsa.edu>
Kevin.Sheehan@uniq.com.au (Kevin Sheehan {Consulting Poster Child})
charity@luey.redars.ca.Boeing.COM (Charity Gustine)
for their thoughtful responses.
--
Brett Lymn, Computer Systems Administrator, AWA Defence Industries
===============================================================================
"Upgrading your memory gives you MORE RAM!" - ad in MacWAREHOUSE catalogue.