SUMMARY (sort of): Re: Weird SCSI problems on an Ultra 60

From: Earl Zmijewski (eez@fluent.com)
Date: Fri Aug 21 1998 - 16:02:16 CDT


I got a lot of responses to this and a number of "me too's". We ended up
removing the ODS RAID5 setup and testing one drive at a time, then sets of
drives. Each drive had its own filesystem, and the test was copying a file
from a drive onto itself.

The drives on the right side of the box all worked fine. However, whenever
you started copying to a drive on the left while a drive on the right was
also copying, you got a bus timeout. The same thing happened with 2 drives on
the left and none on the right. Swapping drives didn't help. This is
definitely slot dependent: if you only use the right side, you are fine.
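The test procedure described above (each drive alone, then drives copying at
the same time) can be laid out as a test matrix. The following Python is a
minimal sketch under stated assumptions: the mount points /d1 to /d6 and the
three-left/three-right slot layout are hypothetical, and the code only builds
the matrix and records the outcome pattern reported in this summary. It does
not perform the copies.

```python
from itertools import combinations

# HYPOTHETICAL layout: mount point -> side of the enclosure.
# The post gives neither mount points nor slot numbers.
DRIVES = {
    "/d1": "left", "/d2": "left", "/d3": "left",
    "/d4": "right", "/d5": "right", "/d6": "right",
}


def build_test_plan(drives):
    """Every drive alone, then every pair of drives copying at the same time.

    In each run, every listed drive copies a file onto itself concurrently.
    """
    names = sorted(drives)
    singles = [(name,) for name in names]
    pairs = list(combinations(names, 2))
    return singles + pairs


def reported_outcome(run, drives):
    """Outcome the summary reports for a run, as far as it says."""
    sides = [drives[d] for d in run]
    if all(side == "right" for side in sides):
        return "ok"            # "If you only use the right, you are fine."
    if len(run) > 1:
        return "bus timeout"   # a left drive copying alongside any other drive
    return "not reported"      # a lone left drive: the post does not say


if __name__ == "__main__":
    for run in build_test_plan(DRIVES):
        print(" + ".join(run), "->", reported_outcome(run, DRIVES))
```

Six drives give 6 single runs plus 15 pairs. Keeping "not reported" as its own
outcome avoids claiming a result for a lone left-side drive, which the
summary does not describe.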

We got a new backplane from Sun and we have the same problem. The next step
is to get a 3rd backplane. Either they had a bad run of them or there is a
serious engineering flaw in these boxes.

Earl

Original note follows.

> Earl Zmijewski wrote:
> >
> > Hello all. I have a new Sun Ultra 60 with two internal 4GB drives and an
> > external Sun Multipack (X6290A) with six 9GB drives. The external drives
> > are set up as a RAID5 set using ODS.
> >
> > Large writes to the RAID set will occasionally hang for 15-30 seconds with
> > the following error appearing in the logs. Then the copy will resume. This
> > will not happen all the time, but could occur multiple times during, say,
> > a 700MB transfer.
> >
> > Any idea what is going on? The cable between the Multipack and the Ultra
> > 60 is less than a meter long, and we've tried this on several Ultra 60s with
> > the same result. Unfortunately, we don't have a spare Multipack enclosure
> > to try, but there isn't a whole lot in these things to go wrong.
> >
> > We're stumped. Any ideas are most welcome.
> >
> > Thanks,
> > Earl
> >
> > ---
> >
> > Aug 16 03:15:37 jrb unix: WARNING: /pci@1f,4000/scsi@3,1 (glm1):
> > Aug 16 03:15:37 jrb unix: Target 10 reducing sync. transfer rate
> > Aug 16 03:15:37 jrb unix: WARNING: /pci@1f,4000/scsi@3,1/sd@a,0 (sd24):
> > Aug 16 03:15:37 jrb unix: Error for Command: write Error Level: Retryable
> > Aug 16 03:15:37 jrb unix: Requested Block: 14694 Error Block: 14694
> > Aug 16 03:15:37 jrb unix: Vendor: SEAGATE Serial Number: 9749C61935
> > Aug 16 03:15:37 jrb unix: Sense Key: Aborted Command
> > Aug 16 03:15:37 jrb unix: ASC: 0x47 (scsi parity error), ASCQ: 0x0, FRU: 0x3
> > Aug 16 03:17:08 jrb unix: glm1: Cmd (0x602612c0) dump for Target 11 Lun 0:
> > Aug 16 03:17:08 jrb unix: glm1: cdb=[ 0xa 0x0 0x38 0x1c 0x21 0x0 ]
> > Aug 16 03:17:09 jrb unix: glm1: pkt_flags=0x4000 pkt_statistics=0x61 pkt_state=0x7
> > Aug 16 03:17:09 jrb unix: glm1: pkt_scbp=0x0 cmd_flags=0x18e0
> > Aug 16 03:17:09 jrb unix: WARNING: /pci@1f,4000/scsi@3,1 (glm1):
> > Aug 16 03:17:09 jrb unix: Connected command timeout for Target 11.0
> > Aug 16 03:17:09 jrb unix: WARNING: /pci@1f,4000/scsi@3,1 (glm1):
> > Aug 16 03:17:09 jrb unix: Target 11 reducing sync. transfer rate
> > Aug 16 03:17:09 jrb unix: WARNING: /pci@1f,4000/scsi@3,1/sd@9,0 (sd23):
> > Aug 16 03:17:09 jrb unix: SCSI transport failed: reason 'reset': retrying command
> > Aug 16 03:17:09 jrb unix: WARNING: /pci@1f,4000/scsi@3,1/sd@a,0 (sd24):
> > Aug 16 03:17:09 jrb unix: SCSI transport failed: reason 'reset': retrying command
> > Aug 16 03:17:09 jrb unix: WARNING: /pci@1f,4000/scsi@3,1/sd@b,0 (sd25):
> > Aug 16 03:17:09 jrb unix: SCSI transport failed: reason 'timeout': retrying command
> > Aug 16 03:17:09 jrb unix: WARNING: /pci@1f,4000/scsi@3,1/sd@e,0 (sd28):
> > Aug 16 03:17:09 jrb unix: SCSI transport failed: reason 'reset': retrying command
>
>
> Are these Multipack _II_s ???
>
> The old Multipack I's were not certified for Ultra-Wide SCSI. We tried
> using them at a client site about 4 months back ... no soap. And we
> got the same sort of "reducing sync transfer rate" message you got.
>
> We replaced them with Multipack II's.
>
> No more than _1_ Multipack II, w/ <= 6 drives, per SCSI channel.
>
> 80 cm. _high_ _quality_ SCSI cables (no, we did not use Sun's).
>
> End of problem.
>
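The quoted log names the same disks two ways: the controller messages say
"Target 10" and "Target 11" in decimal, while the device paths use sd@a,0 and
sd@b,0, where the unit address is the SCSI target and LUN in hexadecimal (so
sd@a,0 is target 10, LUN 0). Below is a minimal Python sketch that tallies
events per target from an excerpt like this one. It assumes the message
layout seen here (a WARNING: line naming the device, followed by the message
lines) and is not a general Solaris log parser; the event labels are made up
for the summary.

```python
import re
from collections import Counter, defaultdict

# ".../scsi@3,1/sd@a,0 (sd24):" -> unit address "a,0", instance "sd24"
DEVICE_RE = re.compile(r"/sd@([0-9a-f]+,[0-9a-f]+) \((sd\d+)\):")
# "SCSI transport failed: reason 'reset': retrying command" -> "reset"
REASON_RE = re.compile(r"SCSI transport failed: reason '(\w+)'")
# "Target 11 reducing ..." / "... timeout for Target 11.0" -> "11" (decimal)
TARGET_RE = re.compile(r"Target (\d+)")


def unit_address_to_target(addr):
    """Convert a hex 'target,lun' unit address such as 'a,0' to (10, 0)."""
    target, lun = addr.split(",")
    return int(target, 16), int(lun, 16)


def summarize(log_text):
    """Count events per SCSI target (decimal) in a log excerpt.

    Per-disk messages follow a WARNING line naming the sd device, so the most
    recent such line decides which target they belong to. Controller (glm)
    messages name the target directly, in decimal.
    """
    events = defaultdict(Counter)
    current = None  # target of the most recent sd@... WARNING line
    for line in log_text.splitlines():
        dev = DEVICE_RE.search(line)
        if dev:
            current = unit_address_to_target(dev.group(1))[0]
            continue
        if "WARNING:" in line:
            current = None  # controller-level warning, not tied to one disk
            continue
        reason = REASON_RE.search(line)
        if reason and current is not None:
            events[current]["transport " + reason.group(1)] += 1
        elif "scsi parity error" in line and current is not None:
            events[current]["parity error"] += 1
        elif "reducing sync. transfer rate" in line or "command timeout" in line:
            target = TARGET_RE.search(line)
            if target:
                kind = "sync rate reduced" if "reducing" in line else "command timeout"
                events[int(target.group(1))][kind] += 1
    return events
```

Run on the quoted excerpt, summarize attributes the parity error and one
sync-rate reduction to target 10 (sd24); the connected command timeout, a
sync-rate reduction and a transport timeout to target 11 (sd25); and
transport resets to targets 9, 10 and 14, all on glm1.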

-- 
         Earl E. Zmijewski -- Director of Computer Services, Fluent Inc.   
       Centerra Resource Park; 10 Cavendish Court; Lebanon, NH 03766-1442
  eez@Fluent.COM  http://www.Fluent.COM  voice@603-643-2600  fax@603-643-3967 