[My original inquiry is appended to the end of this message.]
Many thanks to all the folks who responded to my query!
Most of the responses fell into the following problem categories:
SCSI cabling:
- bad SCSI cable
- improper SCSI termination
- exceeding about 10 ft of total SCSI bus length (including internal ribbon cables)
- internal disk cable not securely connected
- loose SCSI connectors
- broken wires (flex the cables while exercising the farthest device
on the bus)
- SCSI cables routed near power cables
SCSI Controller:
- bad SCSI controller
Disk:
- bad disk drive
- bad blocks in swap partition
- power to an external drive is flaky (loose cord)
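One rough way to separate the cabling/controller suspects from the single-disk suspects is to check which devices the errors are spread across. If the controller (esp0) reports parity errors and both sd0 and sd1 log failures, as in the messages quoted in my original inquiry, the evidence points toward something the whole bus shares rather than toward one drive. The Python sketch below is a hypothetical helper that nobody in this thread sent. The function name `tally_scsi_errors` and the list of message fragments are assumptions drawn only from the SunOS 4.1.1 log lines quoted here.

```python
import re
from collections import Counter

# Message fragments treated as SCSI errors. These are taken from the
# SunOS 4.1.1 log excerpts in this thread; the list is an assumption,
# not an exhaustive catalogue of esp/sd driver messages.
ERROR_PATTERNS = (
    "parity error",
    "transport failed",
    "Error for command",
    "not responding to selection",
)

# "vmunix: sd1g: ..." -> device "sd1" (the partition letter is dropped).
LINE_RE = re.compile(r"vmunix: (?P<dev>[a-z]+\d+)[a-h]?: (?P<msg>.*)")


def tally_scsi_errors(log_text):
    """Count SCSI error lines per device (hypothetical helper).

    "last message repeated N times" lines are not expanded, so the
    counts are lower bounds.
    """
    counts = Counter()
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m and any(p in m.group("msg") for p in ERROR_PATTERNS):
            counts[m.group("dev")] += 1
    return counts


if __name__ == "__main__":
    sample = """\
Dec 5 11:25:15 darwin vmunix: esp0: SCSI bus MESSAGE IN phase parity error
Dec 5 11:25:15 darwin vmunix: sd1: SCSI transport failed: reason 'incomplete': retrying command
Dec 5 11:25:15 darwin vmunix: sd1g: Error for command 'write'
Dec 5 11:25:18 darwin vmunix: esp0: SCSI bus DATA IN phase parity error
Dec 5 11:25:18 darwin vmunix: sd0g: Error for command 'read'
Dec 6 09:37:39 darwin vmunix: sd0: disk okay
"""
    print(sorted(tally_scsi_errors(sample).items()))
    # [('esp0', 2), ('sd0', 1), ('sd1', 2)]
```

Informational lines such as "sd0: disk okay" are matched by the regex but are not counted, because they contain none of the error fragments. The output only suggests where to look first. A count spread over several targets fits the cabling and termination suspects, but it does not rule out a bad controller or a second failing drive.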
A shower of thanks to the following people:
ups!upstage!glenn@fourx.Aus.Sun.COM (Glenn Satchell)
Boyd Fletcher IV <boyd@cs.odu.edu>
Jeff Nieusma <stortek!Jeff_Nieusma@csn.org>
montjoy@thor.ece.uc.EDU (Robert Montjoy)
Dieter Muller <dworkin@shiara.rootgroup.com>
Perry_Hutchison.Portland@xerox.com
birger@vest.sdata.no (Birger A. Wathne)
dwf@mlb.semi.harris.com (Denis Faas)
Christian Lawrence <cal@soac.bellcore.com>
Don Pace <pace@zeppo.cc.fsu.edu>
Kathy Holle <holle@asc.slb.com>
poffen@sj.ate.slb.com (Russ Poffenberger)
Since I posted the problem to sun-managers, darwin has been operating
smoothly for a little over 2 days. I tried the following:
- Ran `newfs sd0b' to check for problems with the swap partition.
No problems reported.
- Powered down and reseated all SCSI cables.
Upon rebooting, the console reported that SCSI target 3 (the internal drive)
was not responding.
Opened darwin up and checked the internal drive's cable connections.
Rebooted with no problems.
I'm just going to wait for the next time this problem arises and try
out the other strategies that were suggested.
Kingsley Kerce kerce@psy.fsu.edu (Internet)
Department of Psychology
FSU, Tallahassee, FL, 32306-4019
Original inquiry:
> Our SparcStation IPX has crashed twice in the past two days. Let's
> call the machine darwin, 'cause that's its name. :-) Here's a history
> of the problem:
>
> Fri Dec 4 ~22:00 [I'm not at the console]
> darwin not responding to rlogin, ftp, telnet, etc.
> ping says darwin's alive, though
>
> Sat Dec 5 ~11:15 [I'm at the console]
> no response to L1-A key combo
> power down, then power up -- no problems rebooting
> /var/adm/messages contains nothing seemingly related
> while poking around, console reports:
>
> Dec 5 11:25:15 darwin vmunix: esp0: SCSI bus MESSAGE IN phase parity error
> Dec 5 11:25:15 darwin vmunix:
> Dec 5 11:25:15 darwin vmunix: esp0: SCSI bus MESSAGE OUT phase parity error
> Dec 5 11:25:15 darwin vmunix: sd1: SCSI transport failed: reason 'incomplete': retrying command
> Dec 5 11:25:15 darwin vmunix: sd1: SCSI transport failed: reason 'incomplete': retrying command
> Dec 5 11:25:15 darwin vmunix: sd1g: Error for command 'write'
> Dec 5 11:25:15 darwin vmunix: sd1g: Error Level: Fatal
> Dec 5 11:25:15 darwin vmunix: sd1g: Block 174096, Absolute Block: 254016
> Dec 5 11:25:15 darwin vmunix: sd1g: Sense Key: Hardware Error
> Dec 5 11:25:15 darwin vmunix: sd1g: Vendor 'SEAGATE' error code: 0x44
> Dec 5 11:25:18 darwin vmunix: esp0: SCSI bus DATA IN phase parity error
> Dec 5 11:25:18 darwin vmunix: sd0g: Error for command 'read'
> Dec 5 11:25:18 darwin vmunix: sd0g: Error Level: Retryable
> Dec 5 11:25:18 darwin vmunix: sd0g: Block 140560, Absolute Block: 222856
> Dec 5 11:25:18 darwin vmunix: sd0g: Sense Key: Aborted Command
> Dec 5 11:25:18 darwin vmunix: sd0g: Vendor 'MAXTOR' error code: 0x48
>
> after this, darwin seems o.k., though
> fsck reports that all's well
> a bit later, though:
>
> Dec 5 15:25:14 darwin vmunix: sd0: SCSI transport failed: reason 'incomplete': retrying command
> Dec 5 15:25:18 darwin last message repeated 16 times
> Dec 5 15:25:18 darwin vmunix: esp0: SCSI bus STATUS phase parity error
> Dec 5 15:25:18 darwin vmunix: esp0: SCSI bus MESSAGE OUT phase parity error
> Dec 5 15:25:18 darwin last message repeated 57 times
>
> the following morning, same symptoms as last night
>
> Sun Dec 6 ~09:35 darwin responds to L1-A, rebooting gives:
>
> Dec 6 09:37:39 darwin vmunix: sd0: disk not responding to selection
> Dec 6 09:37:39 darwin vmunix: sd0: disk not responding to selection
> Dec 6 09:37:39 darwin vmunix: sd0: disk okay
> Dec 6 09:37:39 darwin vmunix: sd0: disk not responding to selection
> Dec 6 09:37:39 darwin last message repeated 2 times
> Dec 6 09:37:39 darwin vmunix: sd0: disk okay
> Dec 6 09:37:39 darwin vmunix: sd0: disk not responding to selection
> Dec 6 09:37:39 darwin last message repeated 2 times
> [the previous 3 lines are repeated MANY times]
> Dec 6 09:37:39 darwin vmunix: panic: error in swapping in u-area<3>sd0: disk not responding to selection
> Dec 6 09:37:39 darwin vmunix:
> Dec 6 09:37:39 darwin vmunix: syncing file systems... SunOS Release 4.1.1-IPX (GENERIC) #1: Mon Apr 22 22:22:22 PDT 1991
> Dec 6 09:37:39 darwin vmunix: Copyright (c) 1983-1990, Sun Microsystems, Inc.
> Dec 6 09:37:39 darwin vmunix: mem = 16384K (0x1000000)
> Dec 6 09:37:39 darwin vmunix: avail mem = 14544896
> Dec 6 09:37:39 darwin vmunix: Ethernet address = 8:0:20:b:fe:fc
> Dec 6 09:37:39 darwin vmunix: cpu = SUNW,Sun 4/50
> Dec 6 09:37:39 darwin vmunix: zs0 at obio 0xf1000000 pri 12
> Dec 6 09:37:39 darwin vmunix: zs1 at obio 0xf0000000 pri 12
> Dec 6 09:37:39 darwin vmunix: audio0 at obio 0xf7201000 pri 13
> Dec 6 09:37:39 darwin vmunix: sbus0 at SBus slot 0 0x0
> Dec 6 09:37:39 darwin vmunix: dma0 at SBus slot 0 0x400000
> Dec 6 09:37:39 darwin vmunix: esp0 at SBus slot 0 0x800000 pri 3
> Dec 6 09:37:39 darwin vmunix: esp0: Target 3 now Synchronous at 4.0 mb/s max transmit rate
> Dec 6 09:37:39 darwin vmunix: sd0 at esp0 target 3 lun 0
> Dec 6 09:37:39 darwin vmunix: sd0: <SUN0207 cyl 1254 alt 2 hd 9 sec 36>
> Dec 6 09:37:39 darwin vmunix: esp0: Target 1 now Synchronous at 4.0 mb/s max transmit rate
> Dec 6 09:37:39 darwin vmunix: sd1 at esp0 target 1 lun 0
> Dec 6 09:37:39 darwin vmunix: sd1: <SUN0424 cyl 1151 alt 2 hd 9 sec 80>
> Dec 6 09:37:39 darwin vmunix: st0 at esp0 target 4 lun 0
> Dec 6 09:37:39 darwin vmunix: st0: <Archive QIC-150>
> Dec 6 09:37:39 darwin vmunix: le0 at SBus slot 0 0xc00000 pri 5
> Dec 6 09:37:39 darwin vmunix: cgsix0 at SBus slot 3 0x0 pri 7
> Dec 6 09:37:39 darwin vmunix: cgsix0: screen 1152x900, single buffered, 1M mappable, rev 5
> Dec 6 09:37:39 darwin vmunix: fd0 at obio 0xf7200000 pri 11
> Dec 6 09:37:39 darwin vmunix: root on sd0a fstype 4.2
> Dec 6 09:37:39 darwin vmunix: swap on sd0b fstype spec size 32724K
> Dec 6 09:37:39 darwin vmunix: dump on sd0b fstype spec size 32712K
>
> [reboot lines are intended as system config info]
> tried using savecore to no avail
>
> Dec 6 11:22:15 darwin savecore: reboot after panic: ufs_putpage hole
>
> Here's how we've configured darwin's disks:
>
> Filesystem kbytes Mounted on
> /dev/sd0a 7735 /
> /dev/sd0g 151399 /usr
> /dev/sd1h 212703 /home
> /dev/sd1g 138431 /usr/local
> /dev/sd1a 37479 /var
>
> This is the first disk problem we've experienced, to our knowledge.
>
> Any suggestions greatly appreciated!
> Kingsley Kerce kerce@psy.fsu.edu (Internet)
> Department of Psychology
> FSU, Tallahassee, FL, 32306-4019