SUMMARY: Disk error messages

From: Aline H. Runde - MicroModule Systems (runde@mms.com)
Date: Fri Oct 08 1993 - 07:57:17 CDT


Dear Sun Managers,

Thanks so much to all who responded. All the info was very helpful
and I have checked it all out. I have included all the replies below
for people who asked for the summary.

In my case: is it a SCSI/SBus cable/connector problem or a disk problem?
I'm still trying to track it down.

What I did last week: I moved the SCSI cable on the Sparc10 box from
target 0 (the troubled disk, sd3c) and connected it directly to target 1
(sd2c). If the cable is at fault, I would expect to get those error
messages again, this time with the other disk. It has been 6 days
since I switched the cable and I haven't seen any errors so far; all my
disks are behaving well.

I'm not convinced yet and am still keeping an eye on it.

Aline...
   
-------------------------------------------------------------------------------

From: SMTP%"rwolf@dretor.dciem.dnd.ca"

1) Unmount all the partitions on the drive.
2) Run fsck on all the partitions.
3) Attempt a dump of all the partitions.
4) Reformat the drive
5) Restore all the files.

Good luck, I have had to do this multiple times myself.

From: SMTP%"peter@jrc.nl"

I also had similar messages from a Seagate disc that
was supplied in a desk side box from a third party
supplier. After a great deal of messing around, swapping
of discs (did not solve anything) etc I eventually
traced the problem to a loose - terrible - connection
on the SCSI selector of the deskside box.

I suggest that you look here first. If that is the cause,
then simply disconnecting the selector switch should
work - but the disc will be sd0 (I think).

Hope that this is of some help.

Dr Peter Watkins.

From: SMTP%"loren@seth.nadn.navy.mil"

I do not know the answer to this problem and Sun did not either when we
experienced it (many times with 28 Sparc2s). This only occurs on our
1.3 GB disks. Out of about a dozen cases of this problem, every time
but one we had to replace the disk. Please summarize.

Tina J. Lorentzen
U.S. Naval Academy
e-mail: loren@seth.nadn.navy.mil

From: SMTP%"cbinnie@DCS-Systems.COM"

What is the OS level and are you running mixed fast and slow devices?
If you have a fast SBUS controller it may be related to the termination
and length of your bus. The length can not be more than 6 meters.
Hope this helps cbinnie@dcs-systems.com.

From: SMTP%"angebrandt@edvz.tuwien.ac.at"

I have experienced those problems on my sparcstation 10 which is used
as a server too.

Check all connectors on the SCSI bus and also the total
cable length, which should not exceed approximately 5 meters.

In my case I solved the problem by doubling (!!!) the
length of cable between two heavily used drives on the SCSI bus
and by changing the SCSI terminator.

In my opinion the quality of termination on the bus is very important.
Usually "Passive Terminators" are used, but so-called "Active
Terminators" should give better results.

The main advantage of active terminators is that they have
internal logic for termination, which is specially designed for
FAST-SCSI.

Passive Terminators are only pull-up resistors.

Hope that helps

-MTA

From: SMTP%"kmah@DCS-Systems.COM"

Check your total SCSI cable length.
The max is either 15ft or 18ft (I always forget which)
but the important thing is that communication to
drives can be lost if you exceed the max length.
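
The replies in this thread quote the maximum SCSI bus length variously as
6 meters, roughly 5 meters, and 15 or 18 feet. As a rough aid, the Python
sketch below converts everything to meters and compares a total against a
limit. The segment lengths are hypothetical, and the 6.0 m default is simply
the figure quoted in one reply above, not a verified specification value;
pass a different limit if your controller documentation says otherwise.

```python
# Rough check of total SCSI cable length against a limit.
# Assumption: 6.0 m default comes from a reply in this thread.

FEET_PER_METER = 3.28084


def feet_to_meters(feet):
    """Convert a length in feet to meters."""
    return feet / FEET_PER_METER


def total_length_m(segments_m):
    """Sum the lengths (in meters) of every cable segment on the bus."""
    return sum(segments_m)


def within_limit(segments_m, limit_m=6.0):
    """Return True if the summed cable length does not exceed limit_m."""
    return total_length_m(segments_m) <= limit_m


if __name__ == "__main__":
    # Hypothetical chain: host to first box, jumpers between boxes, and so on.
    segments = [0.9, 0.5, 2.0, 0.5]
    print("total: %.2f m" % total_length_m(segments))
    print("within 6 m limit:", within_limit(segments))
    # The two figures from the reply above, expressed in meters.
    print("15 ft = %.2f m, 18 ft = %.2f m"
          % (feet_to_meters(15), feet_to_meters(18)))
```

When measuring, count any cabling inside the disk enclosures as well, since
it is part of the same bus.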

The other problem could be termination. I guess the next
question would be: was this working before (for a while)
or is this a new disk?

Ciao for now,
kevin

From: SMTP%"positron!metcalf@UUCP-GW.CC.UH.EDU"

You can try reseating the SCSI connectors to this disk. The last
time I saw something like this, it was the SCSI controller on the
drive.

Taft

From: SMTP%"daili@sun-robot.nuceng.ufl.edu"

Two possible solutions:

(1). Re-format the disk
(2). Re-make the vmunix.

I had similar problems before; I did the above, and the problem went away
for a while!

Haiquan

From: SMTP%"ups!uniq.com.au!glenn@warrane.connect.com.au"

Well, if the disk has been replaced then that is obviously not the
problem. Have you replaced any cables or added any extra devices on the
scsi bus? Is it an active terminator? Try a new cable and swapping it
with each of the existing cables and see if it gets any better.

regards,

--
Glenn Satchell                    glenn@uniq.com.au  | "This is a unix system.

From: SMTP%"patelk@basf-corp.com" 5-OCT-1993 09:13:28.43

Check the length of the cable and terminator. Use the shortest available cable. I have a SS2 which had the same problem.

Kalpesh Patel BASF Corp.

From: SMTP%"jean@noao.edu" 5-OCT-1993 11:32:01.49

I have been having SCSI bus resets and errors for a while now, and I'm still working on it. But, about three or four weeks ago, I had exactly the error messages you listed in your posting. The problem may be in your SCSI bus (note the message "sd3: SCSI transport failed: reason 'incomplete': retrying command") or it may be your sbus, which is built into the motherboard.

I solved this particular problem by swapping out the "pizza box" of my Sparc10, which included the motherboard, sbus, and power supply. I have continued to get the SCSI messages, but the "disk not responding to selection" and "disk okay" messages went away.

Jean Goodrich National Solar Observatory

Original post

> Dear Sun Managers,
>
> When I cd to one disk, I got I/O error and couldn't ls or entered any
> command because or it hangs or it returns with message: I/O error.
>
> Below are the error messages collected from /var/adm/messages:
> (zillions of lines like that, just extracted some)
>
> Sep 30 15:26:34 mcm1 vmunix: sd3: SCSI transport failed: reason 'incomplete':
> retrying command
> Sep 30 15:26:34 mcm1 last message repeated 29 times
> Sep 30 15:26:34 mcm1 vmunix: sd3: disk not responding to selection
> Sep 30 15:26:34 mcm1 vmunix: sd3: disk okay
> Sep 30 15:26:34 mcm1 vmunix: sd3: disk not responding to selection
> Sep 30 15:26:35 mcm1 vmunix: sd3: disk okay
> Sep 30 15:26:35 mcm1 vmunix: sd3: disk not responding to selection
> Sep 30 15:26:35 mcm1 vmunix: sd3: disk okay
> Sep 30 15:26:35 mcm1 vmunix: sd3: disk not responding to selection
> Sep 30 15:27:06 mcm1 last message repeated 2 times
> Sep 30 15:27:06 mcm1 vmunix: sd3: disk okay
> Sep 30 15:27:07 mcm1 vmunix: sd3: disk not responding to selection
> Sep 30 15:27:07 mcm1 vmunix: sd3: disk okay
> Sep 30 15:27:07 mcm1 vmunix: sd3: disk not responding to selection
> Sep 30 15:27:07 mcm1 vmunix: sd3: disk okay
> Sep 30 15:31:29 mcm1 vmunix: sd3: disk not responding to selection
> Sep 30 15:32:00 mcm1 vmunix: sd3c: Error for command 'read'
> Sep 30 15:32:00 mcm1 vmunix: sd3c: Error Level: Fatal
> Sep 30 15:32:00 mcm1 vmunix: sd3c: Block 887712, Absolute Block: 0
> Sep 30 15:32:00 mcm1 vmunix: sd3c: Sense Key: Not Ready
> Sep 30 15:32:00 mcm1 vmunix: sd3c: Vendor 'SEAGATE' error code: 0x4
> Sep 30 15:32:01 mcm1 vmunix: sd3c: Error for command 'read'
> Sep 30 15:32:01 mcm1 vmunix: sd3c: Error Level: Fatal
> Sep 30 15:33:03 mcm1 vmunix: sd3c: Block 887712, Absolute Block: 0
> Sep 30 15:33:03 mcm1 vmunix: sd3c: Sense Key: Not Ready
> Sep 30 15:33:03 mcm1 vmunix: sd3c: Vendor 'SEAGATE' error code: 0x4
> Sep 30 15:33:33 mcm1 vmunix: sd3c: Error for command 'read'
> Sep 30 15:33:33 mcm1 vmunix: sd3c: Error Level: Fatal
> Sep 30 15:33:33 mcm1 vmunix: sd3c: Block 887712, Absolute Block: 0
> Sep 30 15:33:33 mcm1 vmunix: sd3c: Sense Key: Not Ready
> Sep 30 15:33:33 mcm1 vmunix: sd3c: Vendor 'SEAGATE' error code: 0x4
> Sep 30 15:33:34 mcm1 vmunix: sd3c: Error for command 'read'
> Sep 30 15:33:34 mcm1 vmunix: sd3c: Error Level: Fatal
> Sep 30 15:33:34 mcm1 vmunix: sd3c: Block 887712, Absolute Block: 0
> Sep 30 15:33:34 mcm1 vmunix: sd3c: Sense Key: Not Ready
> Sep 30 15:33:34 mcm1 vmunix: sd3c: Vendor 'SEAGATE' error code: 0x4
> Sep 30 15:33:34 mcm1 vmunix: sd3c: Error for command 'read'
> .
> .
> .
>
> After a reboot, it seems working OK. I don't know until when.
> I get real nervous now. Have anybody experience this? I did call
> Sun to replace the disk once and 1 day later the same errors occur again.
> Please advise.
>
> Thanks, Aline...
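
With "zillions" of such lines in /var/adm/messages, it can help to tally them
per disk, for example before and after swapping a cable or terminator. The
following is a minimal Python sketch, assuming the log lines look like the
ones quoted in the original post. The sample lines are copied from that post,
with the wrapped "retrying command" rejoined onto one line (an assumption
about how the real log reads). The function name summarize is hypothetical,
and "last message repeated N times" is credited to the most recent disk
message, which is how syslog normally uses it.

```python
import re
from collections import Counter

# "vmunix: sd3: ..." or "vmunix: sd3c: ..."; the partition letter is dropped
# so that all messages for one drive are counted together.
DISK_MSG = re.compile(r"vmunix: (sd\d+)[a-h]?: (.+)$")
REPEAT = re.compile(r"last message repeated (\d+) times")


def summarize(lines):
    """Count (disk, message) pairs, folding in syslog repeat notices."""
    counts = Counter()
    last_key = None
    for line in lines:
        line = line.strip()
        match = DISK_MSG.search(line)
        if match:
            last_key = (match.group(1), match.group(2))
            counts[last_key] += 1
            continue
        repeat = REPEAT.search(line)
        if repeat and last_key is not None:
            counts[last_key] += int(repeat.group(1))
    return counts


SAMPLE = [
    "Sep 30 15:26:34 mcm1 vmunix: sd3: SCSI transport failed: "
    "reason 'incomplete': retrying command",
    "Sep 30 15:26:34 mcm1 last message repeated 29 times",
    "Sep 30 15:26:34 mcm1 vmunix: sd3: disk not responding to selection",
    "Sep 30 15:26:34 mcm1 vmunix: sd3: disk okay",
    "Sep 30 15:32:00 mcm1 vmunix: sd3c: Error for command 'read'",
    "Sep 30 15:32:00 mcm1 vmunix: sd3c: Sense Key: Not Ready",
]

if __name__ == "__main__":
    for (disk, message), n in summarize(SAMPLE).most_common():
        print("%5d  %s  %s" % (n, disk, message))
```

To run it against a real log, pass an open file instead of SAMPLE, for
example summarize(open("/var/adm/messages")).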


