Thanks for the many prompt responses from the group. I especially
appreciate the pointers on the actual errors. Here's the summary of the
responses I received.
The original question was ...
>>> On Wed, 26 Jun 1996 15:24:14 -0400 (EDT), Milt Webb <milt@iqsc.com> said:
mw> My ever faithful Sparc20/Solaris 2.4 NFS server is going nuts on me. I
mw> just realized I don't really know how to decipher the following messages
mw> to determine which drive the error is for. Can someone offer a bit of
mw> insight into the meaning of the following "/iommu@f..." strings?
mw> This box has two controllers, 4 drives on each.
mw> Thanks a bunch,
mw> Milt
mw> ---
mw> WARNING:
mw> /iommu@f,e0000000/sbus@f,e0001000/dma@1,81000/esp@1,80000/sd@3,0 (sd18)
mw> Error for command 'write' Error Level: Fatal
mw> Jun 25 23:47:23 data unix:
mw> Requested Block 1955792, Error Block: 1955792
mw> Sense Key: Media Error
mw> Vendor 'SEAGATE':
mw> ASC = 0x12 (address mark not found for ID field), ASCQ = 0x0, FRU = 0xd8
mw> WARNING:
mw> /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000 (esp0)
mw> Disconnected tagged cmds (1) timeout for Target 0.0
mw> WARNING:
mw> /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@0,0
mw> (sd0)
mw> SCSI transport failed: reason 'timeout' : retrying command
**********************************************************************
From: popp@luey.redars.ca.boeing.COM (Jeff Popp)
In my experience....
/iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000 (esp0)
refers to target 0 on the primary (internal) SCSI bus.
the "esp@f,800000" being the telltale info.
/iommu@f,e0000000/sbus@f,e0001000/dma@1,81000/esp@1,80000/sd@3,0 (sd18)
refers to target 3 on the secondary (SCSI add on) bus. the "esp@1,80000"
being the give away.
**********************************************************************
From: catey@wren.geg.mot.com (Don Catey)
> WARNING:
> /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000 (esp0)
This message tells us the problem is with controller 0 (esp0):
Matching most of the information between the above line and your last warning:
> WARNING:
> /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@0,0
> (sd0)
we know this disk is also on esp0. The sd@0,0 tells us which target: simply
take the 0 before the comma. Thus, your disk problem is with c0t0...
The first warning differs from the above (dma@1,81000 instead of espdma...),
so it should be on your second controller, esp1:
> WARNING:
> /iommu@f,e0000000/sbus@f,e0001000/dma@1,81000/esp@1,80000/sd@3,0 (sd18)
the sd@3,0 shows target 3, so I believe these errors are coming from c1t3...
As far as slice, that all depends on how you've set it up in relation to
blocks/slice.
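The "take the number before the comma" rule can be sketched in shell. This is only an illustration: parse_sd_unit is a hypothetical helper name, not a Solaris command, and it assumes the unit address after sd@ is written as hex digits in the form target,lun.

```shell
#!/bin/sh
# Sketch: pull the SCSI target and LUN out of a physical device path.
# parse_sd_unit is a hypothetical helper name, not a Solaris command.
parse_sd_unit() {
    # Match the "sd@<target>,<lun>" component; print nothing if it is absent.
    printf '%s\n' "$1" |
        sed -n 's|.*/sd@\([0-9a-f][0-9a-f]*\),\([0-9a-f][0-9a-f]*\).*|target=\1 lun=\2|p'
}

# The two disk paths from the original messages.
parse_sd_unit '/iommu@f,e0000000/sbus@f,e0001000/dma@1,81000/esp@1,80000/sd@3,0 (sd18)'
parse_sd_unit '/iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@0,0'
```

The helper deliberately says nothing about the controller number, since that part of the mapping is what the replies here disagree on.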
**********************************************************************
From: obryhimk@gecmc.gecmc.ge.com (Kerry O'Bryhim)
It's your second controller. The key is esp@1.
esp@0 is the onboard controller for a 20.
esp@1 is the first additional controller.
esp@2.... and so on and so on.
When I get stuck I use format - it lists both the /dev/dsk/c1t3d0 name and the
/iommu@f,e0000000/sbus@f,e0001000/dma@1,81000/esp@... name
**********************************************************************
From: jem@electriciti.com ## John Mendenhall
If I remember correctly, the '@1' after the 'dma' and 'esp' strings
tell me that it is the second sbus/controller card (zero-based).
The 'sd@3,0', of course, tells me it is the scsi id 3 drive.
> WARNING:
> /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000 (esp0)
> Disconnected tagged cmds (1) timeout for Target 0.0
> WARNING:
> /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@0,0
> (sd0)
> SCSI transport failed: reason 'timeout' : retrying command
The '@f' after the 'espdma' string tells me that this is the internal
scsi bus. The 'sd@0,0' tells me it is scsi id 0.
**********************************************************************
From: js@cctechnol.com (Johnie Stafford)
Try this:
ls -l /dev/dsk | grep "sd@3,0"
This will give you all the controllers with a disk on scsi ID 3. Then
look for the one that matches the rest of the path in the output. See
which /dev/dsk entry is linked to it and voila.
The other option is to try to discover what that really means. I
checked out the machine at our office that has two scsi
connectors. The error from the one that has ".../esp@f,.../sd@0,0" is
referring to the scsi controller built into the motherboard (if you
check in /var/adm/messages it refers to all motherboard stuff as SBUS
slot f), scsi ID 0. The one that has ".../esp@1,.../sd@3,0" is
referring to the one in sbus slot number 1, scsi ID 3. As best I can
figure out, the sbus slots are numbered 0 to 3 starting with the bottom
slot nearest the processor boards, so I'd say that it is referring to
the controller in the top sbus slot nearest the processor(s).
**********************************************************************
From: iv08480@issc02.mdc.com (Colin Melville)
You should be able to match the AVAILABLE DISK SELECTIONS: output
from the format command against your error message.
The /sd@3,0 indicates target 3, disk 0 I think. Don't know how you
determine the controller number, but this should get you pretty close.
Bad news is that in Solaris 2.x, target 3, disk 0 on controller 0 is
normally where your root filesystem is located.
Looks like a write error, may want to do a ufsdump on all filesystems on
that disk, then boot from CD into single-user mode and reformat the drive.
Or call SunService and have it replaced. Now.
**********************************************************************
From: keith@oz.health.state.mn.us (Keith Willenson)
Hint: Look at the end of the /iommu@f strings. sd@3,0 is scsi device
at address 3. sd@0,0 is scsi device at address 0 (zero). If you look
in /dev the links should point you in the right direction.
Sounds like major hardware problems to me.
**********************************************************************
From: Fuad Khalid <fmrco!lagoon!fuad@uunet.uu.net>
esp@1,80000/sd@3,0 (sd18)
Controller 1 scsi disk 3 is bad. Hardware problem.
**********************************************************************
From: cshang@mailhost.la.AirTouch.COM (Cynthia Shang x7484)
The /iommu@f... path is the path to the actual device from the
/devices directory. If you do a 'ls -l' in /dev/rdsk, you
will see that there are c#t#d#s# that links to the
/devices/iommu@f...
**********************************************************************
From: bismark@alta.jpl.nasa.gov (Bismark Espinoza)
Controller 1, disk scsi id=3 has disk write problems.
Controller 0, disk scsi id=0 does not handle tagged command
queueing very well. You can disable this in the Sun kernel.
**********************************************************************
From: clg@zygote.csph.psu.edu (Craig Gruneberg)
Run the format command as root. It will show you all kinds of nice things
including the iommu stuff. Try "current" or "verify" once you have specified
a disk.
**********************************************************************
From: kevski@zeppelin.esy.com (Kevin Kalinowski)
Here's what I learned at my Sun Solaris 2.x SysAdmin class a while ago
about physical device names:
iommu@f,e0000000 = i/o memory management unit
sbus@f,e0001000 = first SBUS controller
espdma@f,400000 = first (on board) SCSI DMA controller, which has a
esp@f,800000 = first SCSI host adapter (connected to first DMA controller)
** whereas **
dma@1,81000 = third SCSI DMA controller, which has its own
esp@1,80000 = (connected) SCSI host adapter
** NB: the 'f' above (e.g. esp@f,800000) is Sun's id for the built in device.
** After f comes 0,1,2,... So it seems as though your second controller is
** plugged into the second SBUS slot, and not the first (non-built-in) slot.
sd@3,0 and sd@0,0 = SCSI drives with target addresses of 3 and 0, respectively
Anyway, the logical device naming scheme (c#t#d0s#) is based on the above
physical names: c = the controller number (0 is assigned to the built in
SCSI interface, then other interfaces are automatically assigned); t = target
address; d = disk number or logical unit number (LUN) and is greater than 0
only if you are using a disk array; s = slice.
Also, (sd18) and (sd0) are instance names (the kernel's abbreviation names),
and a list of all instances can be found in the /etc/path_to_inst file.
(NB: the kernel maintains this file, so it's best not to modify it.) And if
you're REAL curious for trivia, the instance is always the same as the target
for the first (built in) bus, but for the second (first add on) bus you can add
7 to the target to get the instance, and for the third bus, add 14, etc. Given
this general rule, I'm not sure why your second controller (plugged into the
second SBUS slot) is giving instance 18 to target 3. It should be 3 + 14 = 17.
That might be something to explore some day when you are at work at 9:30 pm
repartitioning your 'news' filesystems because they keep filling up and crashing
news. Guess what I'm doing tonight. :-)
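Rather than relying on the add-7 rule of thumb, the instance can be read straight from the kernel's instance file. Below is a minimal sketch, assuming each line of /etc/path_to_inst has the form "physical path" instance (some releases may append a quoted driver name). lookup_instance is a hypothetical helper, and the sample file is built from the two paths and instance numbers in the original messages rather than copied from a real system.

```shell
#!/bin/sh
# Sketch: look up the instance number recorded for a physical path.
# lookup_instance is a hypothetical helper; the file layout assumed here is
#   "<physical path>" <instance>     (a quoted driver name may follow)
lookup_instance() {
    # $1 = physical path without the /devices prefix, $2 = file to search
    awk -F'"' -v path="$1" '$2 == path { gsub(/[ \t]/, "", $3); print $3 }' "$2"
}

# Demonstrate on a throwaway sample file rather than the live /etc/path_to_inst.
sample=/tmp/path_to_inst.sample.$$
cat > "$sample" <<'EOF'
"/iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@0,0" 0
"/iommu@f,e0000000/sbus@f,e0001000/dma@1,81000/esp@1,80000/sd@3,0" 18
EOF
lookup_instance '/iommu@f,e0000000/sbus@f,e0001000/dma@1,81000/esp@1,80000/sd@3,0' "$sample"
rm -f "$sample"
```

On a real machine the second argument would be /etc/path_to_inst itself, which this sketch only reads and never modifies.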
**********************************************************************
From: Jens Fischer <jefi@kat.ina.de>
The string you mentioned is a hardware path to the disk. Solaris manages these
hardware paths in the /devices directory. The normally used logical names like
/dev/dsk/c?t?d?s? are only symlinks to these paths. If you want to determine
which logical name is related to this hardware path, you just have to do an
ls -l in /dev/dsk and determine which symlink points to the drive's hardware
path.
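The symlink search described above can be sketched as a small shell helper. The name find_logical_name is hypothetical, and the sketch assumes ls -l prints symlinks as "name -> target" with no spaces in either part.

```shell
#!/bin/sh
# Sketch: find which symlink(s) in a directory point at a given physical path.
# find_logical_name is a hypothetical helper name.
find_logical_name() {
    # $1 = directory of symlinks (e.g. /dev/dsk), $2 = physical path fragment
    ls -l "$1" | awk -v frag="$2" '
        # Symlink lines end in: <name> -> <target>
        NF >= 3 && $(NF-1) == "->" && index($NF, frag) { print $(NF-2) }'
}

# On the affected server the call might look like this (not run here):
#   find_logical_name /dev/dsk 'esp@1,80000/sd@3,0'
```

Including the esp@... component in the fragment matters, because sd@3,0 alone would match target 3 on every controller.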
**********************************************************************
From: Fedor Gnuchev <qwe@ht.eimb.rssi.ru>
it decodes as follows :
/iommu@f,e0000000/sbus@f,e0001000/dma@1,81000/esp@1,80000/sd@3,0 (sd18)
add-on card, drive with SCSI ID 3
!
/iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@0,0
internal esp0, drive with SCSI ID 0
This drive is going south. Try dumping data from it and low-level
formatting. Ayee, 99% that it goes for a good hunt.
**********************************************************************
From: Daniel Lorenzini <lorenzd@gcm.com>
You can use 'format' to figure out which drive it is.
We have the same error on some of our ST32550N (Barracuda) drives. You
can "fix" the tagged command problem by disabling tagged command
queueing (put "set scsi_options=0x378" in /etc/system). The "address
mark not found" error seems to be related to old rev drive firmware. If
you can, get your drives up-revved by your vendor.
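For anyone wondering where 0x378 comes from, here is a small arithmetic sketch. It assumes the default scsi_options mask is 0x3f8 and that tagged queueing is the 0x80 bit. Both values are assumptions here and should be checked against the documentation for your release. clear_bit is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: show how 0x378 results from clearing one bit in a default mask.
# Assumed values (check your release's documentation before relying on them):
#   default scsi_options   = 0x3f8
#   tagged-queueing bit    = 0x80
clear_bit() {
    # $1 = current mask, $2 = bit to clear; prints the new mask in hex
    printf '0x%x\n' $(( $1 & ~$2 ))
}

clear_bit 0x3f8 0x80
```

Under those assumptions, every other option stays enabled and only tagged queueing is turned off.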
**********************************************************************
From: Matthew Stier - Imonics Corporation <matthew.stier@imonics.com>
Do an 'ls -l /dev/dsk/*' and try to match strings.
**********************************************************************
From: nobroin@esoc.esa.de (Niall O Broin - Gray Wizard)
I had this same question a while ago because the output of iostat gives
disk names in the form of sdXX. The attached script gives a nice list of
disk devices in the form :-
cXtYdZ sdW
In case that doesn't quite give what you want, you can do
ls -l /dev/rdsk/*0
and see the links from cXtYdZs0 to the devices.
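#!/bin/sh
# Map each cXtYdZ name to its sdN instance via the minor number of the device node.
cd /dev/rdsk
# List the s0 links, then ls -l each link target; characters 38-41 of that output are
# assumed to hold the minor number (fixed ls -l column layout), and minor/8 is the instance.
/usr/bin/ls -l *s0 | tee /tmp/d1c |awk '{print "/usr/bin/ls -l "$11}' | sh | awk '{print "sd" substr($0,38,4)/8}' >/tmp/d1d
# Pair the first six characters of each link name (cXtYdZ) with its sdN.
awk '{print substr($9,1,6)}' /tmp/d1c |paste - /tmp/d1d
rm /tmp/d1[cd]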
**********************************************************************
From: Anchi Zhang <anchi@Starbase.NeoSoft.COM>
>WARNING:
> /iommu@f,e0000000/sbus@f,e0001000/dma@1,81000/esp@1,80000/sd@3,0 (sd18)
sd18 is all you need.
mars# ls -ld /dev/sd18c
lrwxrwxrwx 1 root root 12 Feb 24 10:10 /dev/sd18c -> dsk/c1t3d0s2
**********************************************************************
From: John DiMarco <jdd@cdf.toronto.edu>
In essence, a write failed on one of your disks because of a media error.
Fix (slip/remap) the error block using the "repair" function of format.
>WARNING:
> /iommu@f,e0000000/sbus@f,e0001000/dma@1,81000/esp@1,80000/sd@3,0 (sd18)
>Error for command 'write' Error Level: Fatal
>Jun 25 23:47:23 data unix:
>Requested Block 1955792, Error Block: 1955792
>Sense Key: Media Error
>Vendor 'SEAGATE':
>ASC = 0x12 (address mark not found for ID field), ASCQ = 0x0, FRU = 0xd8
In English, this means:
A write command to block 1955792 on drive sd18 failed because of a
problem with the disk media (the drive claims the address mark is
missing) at that block. There's probably something wrong with
the particular spot on the disk corresponding to block 1955792.
>WARNING:
> /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000 (esp0)
> Disconnected tagged cmds (1) timeout for Target 0.0
>WARNING:
> /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@0,0
>(sd0)
> SCSI transport failed: reason 'timeout' : retrying command
In English, this means:
A command sent to the device at SCSI id 0 (sd0) on the first ESP scsi
controller (esp0) timed out, and so it will be tried again.