Hi all;
As usual the list came through with flying colors. My problem is a
combination of cables, Fast SCSI devices and termination. This was the
cause and Scott Kamin of Sun provided the critical paper on the
solution. I have attached it in full because everyone should read it. It is
a great lesson. Thanks very much Scott!!
Many thanks also go to:
Henry Unger <hunger@hitech.com>
bismark@alta.jpl.nasa.gov (Bismark Espinoza)
"Luc I. Suryo" <luc@Patriot.NL>
ybse2!cbarker@postman.eglin.af.mil (Craig Barker)
Scott.Kamin@Central.Sun.COM (Scott Kamin [Sun Denver SE])
mav6@nms11.comp.pge.com (Marnix A. van_Ammers)
Answers followed by Scott's paper and my original query.
Thanks Again;
-Jim
-------------------------------------------------------------------------------
Jim Murff, NIS (murff@nicimg.com) Voice # (619)635-8678
Nicolet Imaging Systems Inc., San Diego, CA. Corp # (619)695-6661
Senior Software Engineer/System Admin. Fax # (619)695-9902
-------------------------------------------------------------------------------
===============
|| ANSWERS ::
===============
Marnix Wrote:
-----------------------------------
My guess is you have SCSI bus or terminator troubles. Make sure all
your cables are good, same quality, not over 6 meters (if single ended
SCSI 2), and that the chain is terminated. I have had troubles of
your sort where I cured the problems by changing a plain terminator
for an active terminator (has a little green light on it).
You can also try rotating units along the chain and taking units out
of the chain (just to prove the trouble is related to the chain).
-----------------------------------
Craig Barker Wrote:
-----------------------------------
This looks to me to be a termination problem even though these
are internal disks. You might try to replace your current
terminators with force-perfect terminators available from Sun.
This forces the exact ohm-meter resistivity, instead of allowing
a micro-range.
-----------------------------------
Luc Wrote:
-----------------------------------
I used to have the same problem, so I started to look around for some
info about my disk's geometry.
Once I obtained it, I reformatted the disks and all were OK.
So what you should try is this:
0. Back up all disks,
NB: use ufsdump!
1. obtain the correct geometry for both disks.
Every now and then a format.dat file is posted on the net.
2. start format and do
type : define the geometries (rpm is one very important thing!)
3. Still in format do
partition : and recreate all partitions.
4. Still in format do
label : this will set the correct geometry and partitions definitions
5. restore data.
-----------------------------------
Bismark Espinoza Wrote:
-----------------------------------
Use icheck and ncheck to find out the files
that are bad. use relative block number only.
e.g.
icheck -b 407728 /dev/rsd1c
ncheck -i 411652 /dev/rsd1c
-----------------------------------
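The "relative block number" mentioned above is the block counted from the start of the partition, not from the start of the disk. A minimal sketch of that arithmetic follows, assuming a partition that begins on a cylinder boundary and using the geometry and error blocks from the original query at the end of this post. The function names are made up for illustration.

```python
def partition_start_block(start_cyl, heads, sectors_per_track):
    """First absolute block of a partition that begins on cylinder start_cyl."""
    return start_cyl * heads * sectors_per_track


def relative_block(absolute_block, start_block):
    """Block number counted from the start of the partition."""
    return absolute_block - start_block


# Solaris box from the original query: hd 9, sec 117,
# slice 7 (home) starts at cylinder 1208.
home_start = partition_start_block(1208, 9, 117)
print(home_start)                           # 1272024
print(relative_block(3092104, home_start))  # 1820080
```

The result matches the "Requested Block 1820080, Error Block: 3092104" pair in the Solaris messages, which is consistent with the errors falling in the home slice. The same subtraction on the 4.1.4 messages gives 4462932 - 67776 = 4395156 and 6084436 - 1689280 = 4395156, so both of those errors share one partition offset.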
Henry Wrote:
-----------------------------------
Sounds like a SCSI cable problem to me. I had the exact same
symptoms last weekend. I changed the order of the devices and the
problem went away. It helped when mixing SCSI-2 and SCSI-1
devices that the SCSI-2 devices are closest to the controller.
-----------------------------------
Scott Kamin supplied this paper from Sun.
SunSolve Document infodoc/2104
----------------------------------------------------------------------------
SYNOPSIS:
guidelines for support of fast (10MB/sec) SCSI systems
DETAIL DESCRIPTION:
SCSI Configurations using Single-Ended Devices
SCOPE:
The high performance SCSI devices now available provide
the capability of significantly improving system performance
for some applications. One of the special capabilities of
these devices is the ability to transfer data at a 10-megabyte-
per-second data rate using the "fast SCSI" synchronous transfer
timings defined by the SCSI-2 standard. These high performance
SCSI devices are fully compatible with standard SCSI devices
and will operate in almost all normal SCSI configurations.
Some SCSI enclosures, cables, and terminators do not take into
account the special loading and impedance matching requirements
for fast SCSI. The attachment of such peripherals may cause
systems using fast SCSI devices to operate incorrectly. Such
nonconforming SCSI cables and enclosures include some of Sun's
early designs and some third-party cables, terminators, and
peripheral device enclosures.
The installation manuals for all fast SCSI devices and all new
Sun installation manuals contain the strong recommendation that
fast SCSI devices not be placed on the same SCSI port with SCSI
components that do not conform with the requirements for fast
SCSI. This paper provides recommendations for the technical
modifications that can be made in a SCSI system to allow the
operation of fast SCSI and nonconforming enclosures, cables,
or terminators on the same sys
SOLUTION SUMMARY:
1.0 IDENTIFICATION OF SUN SYSTEMS REQUIRING SPECIAL ATTENTION
Differential SCSI host adapters and devices, including the
DSBE/S card and the Differential SCSI Data Center Disk Tray,
are all designed to meet fast SCSI requirements and will
operate at 10 Megabytes per second. The maximum total cable
length of a differential SCSI system is 25 meters. The
installation guides for the SCSI devices indicate the
equivalent cable length of the device.
SCSI host systems that operate at 5 megabytes per second,
including all Sun SPARC-based systems developed prior to the
SPARCsystem 10, will support any presently defined
configuration of 5 megabyte SCSI devices. A fast SCSI device
can be installed on such systems, since the host and the fast
SCSI device automatically negotiate the proper operational
speed. Fast SCSI devices attached to 5 megabyte hosts will
only operate at 5 megabytes, but the capacity and access
latency improvements provided by many such devices can still
improve the flexibility and performance of such systems.
Single-ended SCSI systems operating at 5 megabytes have a
maximum total cable length of 6 meters.
1.1 SCSI systems and host adapters that operate at 10 megabytes per
second, including the SPARCsystem 600MP series, the SPARCsystem
10, and the FSBE/S host adapter, will support any presently
defined configuration of 5 megabyte devices. Again, the host
will determine automatically that the devices are 5 megabyte
per second devices and negotiate the proper operational speed
with each device.
SCSI host systems that operate at 10 megabytes per second and
have at least one fast SCSI device attached require that the
entire SCSI port configuration be composed of components that
will support fast SCSI. The components include cables, device
enclosures, and terminators. The recent Sun SCSI products,
including the Desktop Storage Pack, the Desktop Storage Module,
and SCSI Expansion Pedestal are devices and enclosures that
meet the fast SCSI requirements. The regulated terminator (Sun
part number 150-1785-02) meets the fast SCSI requirements. The
host will negotiate with the 10 megabyte devices to perform 10
megabyte transfers and with each of the other devices to
perform transfers at their preferred rates. Single-ended SCSI
systems operating at 10 megabytes using the proper components
have a maximum total cable length of 6 meters, in accordance
with the proposed SCSI-3 standard.
1.2 Those Sun enclosures with the three-row 50-pin D connector,
including the External Storage Module, do not meet the fast
SCSI requirements. Those Sun enclosures with the
Centronics-style 50-pin flat ribbon contact connector,
including the Front Load 1/2-inch Tape Drive, do not meet the
fast SCSI requirements. The Sun SCSI terminators other than
150-1785-02 do not meet the fast SCSI requirements. Section 4
of this paper defines the steps that must be taken to assure
reliable operation of fast SCSI systems containing combinations
of fast SCSI devices and components that do not meet the fast
SCSI requirements. The maximum total cable length for such
systems should not exceed 6 meters.
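The length limits in section 1 lend themselves to a quick budget check. The sketch below assumes the total is simply the sum of the external cable lengths plus the equivalent cable length quoted in each device's installation guide. The 6 m and 25 m limits come from the paper; the device equivalents in the example are invented numbers, not Sun figures.

```python
SINGLE_ENDED_MAX_M = 6.0    # single-ended limit quoted in the paper
DIFFERENTIAL_MAX_M = 25.0   # differential limit quoted in the paper


def total_cable_length(external_cables_m, device_equivalents_m):
    """Sum external cable lengths and per-device equivalent lengths (meters)."""
    return sum(external_cables_m) + sum(device_equivalents_m)


def within_budget(total_m, differential=False):
    """True if the total length fits the limit for the bus type."""
    limit = DIFFERENTIAL_MAX_M if differential else SINGLE_ENDED_MAX_M
    return total_m <= limit


# Hypothetical chain: three 0.8 m cables, three enclosures at 0.5 m each.
total = total_cable_length([0.8, 0.8, 0.8], [0.5, 0.5, 0.5])
print(within_budget(total))
```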
SUMMARY OF SYSTEM REQUIREMENTS
TABLE 1
| SCSI Host   | fast SCSI   | 5 Mbyte SCSI   | Special       |
| Type        | device      | device         | Modifications |
|             | installed?  | installed?     | Required?     |
|_____________|_____________|________________|_______________|
|             |             |                |               |
| 5 megabyte  | don't care  | don't care     | no            |
|_____________|_____________|________________|_______________|
|             |             |                |               |
| 10 megabyte | no          | don't care     | no            |
|_____________|_____________|________________|_______________|
|             |             |                |               |
| 10 megabyte | yes         | all conform    | no            |
|             |             | to fast SCSI   |               |
|             |             | requirements   |               |
|_____________|_____________|________________|_______________|
|             |             |                |               |
| 10 megabyte | yes         | one or more    | yes           |
|             |             | don't conform  | see section 4 |
|             |             | to fast SCSI   |               |
|             |             | requirements   |               |
|_____________|_____________|________________|_______________|
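Table 1 reduces to a three-input decision. Here is a small sketch of it as a function; the function and parameter names are illustrative only and do not come from the paper.

```python
def special_modifications_required(host_mb_per_s, fast_device_installed,
                                   all_components_conform):
    """Return True when Table 1 says special modifications are needed.

    host_mb_per_s          -- 5 or 10, the host adapter's top data rate
    fast_device_installed  -- at least one fast SCSI device on the port
    all_components_conform -- every cable, enclosure and terminator on the
                              port meets the fast SCSI requirements
    """
    if host_mb_per_s <= 5:
        return False            # row 1: 5 megabyte hosts never need changes
    if not fast_device_installed:
        return False            # row 2: 10 megabyte host, no fast device
    return not all_components_conform   # rows 3 and 4


print(special_modifications_required(10, True, False))
```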
2.0 IDENTIFICATION OF MIXED VENDOR SYSTEMS REQUIRING SPECIAL ATTENTION
SCSI peripheral devices, connectors, and cables provided by
companies other than Sun are not tested by Sun in the fast SCSI
environment. If any of the following symptoms occur when using
such devices in Sun fast SCSI systems, it may be because the
peripheral device, related components, or the configuration
does not conform to the fast SCSI requirements. The steps
described in section 4 can usually be used to correct these
symptoms if the components meet the standard SCSI
requirements. The system will usually continue operating
normally, even if these errors do occur, because as part of the
software error recovery, the SCSI data rate is slowed to allow
reliable operation.
The maximum total cable length for such devices should be 6
meters if they properly follow the recommendations of the SCSI
standards committee.
CHART OF SYMPTOMS
RELATED TO SCSI DEVICES NOT MEETING FAST SCSI REQUIREMENTS
SunOS 4.1.3
Examples of the warning system messages that occur during
boot are contained in the appendix to this paper. The
key words of one symptom are:
Target 1.0 reducing sync. transfer rate
SCSI transport failed: reason 'reset': retrying command
Target 1.0 reverting to async. mode
SCSI transport failed: reason 'reset': retrying command
A second symptom may be:
Current command timeout for Target 3 Lun 0
Cmd dump for Target 3 Lun 0:
Target 3.0 reducing sync. transfer rate
SCSI transport failed: reason 'reset': retrying command
A third symptom may be:
Error for command 'read'
Error Level: Retryable
Sense Key: Aborted Command
Vendor 'XXYYZZ' error code: 0x47
Sun Solaris 2.x
Examples of the warning system messages that occur during
boot are contained in the appendix to this paper. The
key words of one symptom are:
WARNING: ....
SCSI bus DATA IN phase parity error
WARNING: ....
Error for command 'read' Error Level: Retryable
Sense Key: Aborted Command
......
A second symptom may be:
WARNING: ....
SCSI transport failed: reason 'timeout': retrying command
The present negotiated data rate in kilobytes per second
can be determined for a disk by requesting the necessary
data with the prtconf command as shown below. If the
negotiated rate is lower than expected, error recovery
procedures may have been executed because of nonconforming
devices in the configuration.
# prtconf -v
esp, unit #0
Driver software properties:
name <target1-sync-speed> length <4>
value <0x00002710>.
The value 0x00002710 is 10000 kilobytes per second in decimal.
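The hex-to-decimal step can be automated when there are several targets to check. This is a rough sketch that assumes the property appears in the prtconf -v output in the name/value form shown above; the regular expression and the function name are illustrative, and real output may differ in detail.

```python
import re

# Matches a "targetN-sync-speed" property and the hex value that follows it.
SYNC_SPEED = re.compile(
    r"name <target(\d+)-sync-speed> length <\d+>\s*value <(0x[0-9a-fA-F]+)>"
)


def sync_speeds(prtconf_text):
    """Map target number -> negotiated rate in kilobytes per second."""
    return {int(t): int(v, 16) for t, v in SYNC_SPEED.findall(prtconf_text)}


sample = """esp, unit #0
Driver software properties:
name <target1-sync-speed> length <4>
value <0x00002710>.
"""
print(sync_speeds(sample))  # {1: 10000}
```

A target that reports noticeably less than 10000 on a fast SCSI port is a hint that error recovery has already slowed the bus.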
If the boot process was not observed, the boot messages
are stored in the file /var/adm/messages for reference.
The messages can be displayed by performing the command:
# dmesg | more
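For a long messages file it may be easier to filter for the symptom keywords than to page through everything. The sketch below uses only phrases quoted in this paper; the list is not exhaustive and the function name is made up.

```python
# Key phrases taken from the symptom chart in this paper.
SYMPTOMS = (
    "reducing sync. transfer rate",
    "reverting to async. mode",
    "SCSI transport failed",
    "Current command timeout",
    "parity error",
    "Sense Key: Aborted Command",
)


def find_symptoms(lines):
    """Return (symptom, line) pairs for lines containing a known key phrase."""
    hits = []
    for line in lines:
        for symptom in SYMPTOMS:
            if symptom in line:
                hits.append((symptom, line.rstrip()))
                break  # report each line once
    return hits


sample_lines = [
    "Sep 16 15:53:23 b34a vmunix: esp0: Target 1.0 reducing sync. transfer rate",
    "Sep 16 15:57:41 b34a vmunix: sd3 at esp0 target 0 lun 0",
    "SCSI bus DATA IN phase parity error",
]
for symptom, line in find_symptoms(sample_lines):
    print(symptom, "|", line)

# On a live system one might pass open("/var/adm/messages") instead.
```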
3.0 METHODS FOR MANAGING FAST SCSI SYSTEMS WITH NONCONFORMING COMPONENTS
3.1 Follow installation recommendations
The use of fast SCSI hosts and fast SCSI peripherals provides
significant performance improvements for some types of
applications. To take full advantage of those performance
improvements, the installation guides for SCSI devices
recommend that only those components and peripheral devices
supporting fast SCSI requirements be installed on a fast SCSI
port. If nonconforming devices must also be installed on a
host, a separate SCSI host adapter should be installed and all
the nonconforming devices should be installed on that SCSI
port, isolated from all the fast SCSI devices that are running
on fast SCSI host adapters.
3.2 Actively terminate SCSI configurations containing the ESM
The External Storage Module (ESM) is a special case, since it
conforms to the fast SCSI requirements except for its adapter
cable and terminator. The following procedure allows the
correct termination of the External Storage Module and allows
correct fast SCSI operation for all fast SCSI devices installed
on the SCSI port as well as normal synchronous operation for
the devices installed in the ESM.
One or two ESMs may be installed in the middle of a string of
SCSI devices. Use a Desktop Storage Pack or Desktop Storage
Module with a regulated terminator (Sun part number
150-1785-02) as the device farthest away from the host on the
SCSI port. Connect the ESMs into the string of SCSI devices
using 0.8 m Sun cables (Sun part number 530-1829-01,
Rev.51). Do not exceed the maximum total cable length of 6
meters.
3.3 Slow all SCSI ports to asynchronous operation.
For all other fast SCSI hosts attaching devices that do not
conform with the fast SCSI requirements, the operating system
should be modified to run all SCSI ports in asynchronous mode.
This slower mode fully interlocks all the SCSI data transfer
signals and provides for reliable operation of the External
Storage Module at the end of a SCSI bus. It allows Sun
configurations containing both fast SCSI drives and
nonconforming devices to operate reliably on fast SCSI ports.
If the system configuration meets the standard SCSI
requirements, reliable operation can usually be provided
with third-party components and peripherals as well. The
slower data rate applies to all SCSI ports on the system. Some
applications may show a decrease in performance because of the
slower data rate.
For 4.1.x OS:
To change to the slower asynchronous data rate, type:
adb -w /vmunix
scsi_options?W 58
$q
then reboot the system.
To turn synchronous transfer back on at the
highest possible speed, use the same procedure,
replacing the middle line with:
scsi_options?W 178
For Solaris 2.x:
To change to the slower asynchronous data rate,
add the following line to /etc/system file:
set scsi_options = 0x58
then reboot the system.
To turn synchronous transfer back on at the
highest possible speed without using tagged
queueing, change the scsi_options line to:
set scsi_options = 0X178
To turn synchronous transfer back on at the
highest possible speed allowing tagged queueing
(if available in the operating system),
change the scsi_options line to:
set scsi_options = 0X1f8
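The three scsi_options values differ only in a few bits, which is easier to see when they are decoded. In the sketch below the bit names are an assumption recalled from the Solaris scsi_options documentation, not something stated in this paper. The paper itself only confirms that 0x80 is the tagged queueing difference between 0x178 and 0x1f8, and that 0x58 lacks the synchronous and fast bits.

```python
# Assumed bit meanings (from Solaris scsi_options documentation, not this paper).
SCSI_OPTION_BITS = {
    0x008: "disconnect/reconnect",
    0x010: "linked commands",
    0x020: "synchronous transfer",
    0x040: "parity",
    0x080: "tagged queueing",
    0x100: "fast SCSI",
}


def decode_scsi_options(value):
    """List the option names whose bits are set in value, lowest bit first."""
    return [name for bit, name in sorted(SCSI_OPTION_BITS.items())
            if value & bit]


for v in (0x58, 0x178, 0x1f8):
    print(hex(v), decode_scsi_options(v))
```

Note that adb reads the value in its default hexadecimal radix, so "scsi_options?W 58" on 4.1.x and "set scsi_options = 0x58" on Solaris 2.x write the same number.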
APPENDIX A
SAMPLES OF 4.1.3 ERROR MESSAGES
In this example, target 1 (sd1 on esp0) is a fast scsi disk
Sep 16 15:53:23 b34a vmunix: esp0: Target 1.0 reducing sync. transfer rate
Sep 16 15:53:23 b34a vmunix: sd1: SCSI transport failed: reason 'reset': retrying command
Sep 16 15:53:23 b34a vmunix: esp0: Current command timeout for Target 1 Lun 0
Sep 16 15:53:23 b34a vmunix: esp0: State=DATA_DONE (0xa), Last State=DATA (0x9)
Sep 16 15:53:23 b34a vmunix: esp0: Cmd dump for Target 1 Lun 0:
Sep 16 15:53:23 b34a vmunix: esp0: cdb=[0x8 0x0 0x7e 0x0 0x10 0x0 0x0 0x0 0x0 0x0]
Sep 16 15:53:23 b34a vmunix: esp0: Target 1.0 reverting to async. mode
Sep 16 15:53:23 b34a vmunix: sd1: SCSI transport failed: reason 'reset': retrying command
or
Sep 16 15:57:41 b34a vmunix: sd3 at esp0 target 0 lun 0
Sep 16 15:57:41 b34a vmunix: sd3: <SUN0669 cyl 1614 alt 2 hd 15 sec 54>
Sep 16 16:01:12 b34a vmunix: esp0: Current command timeout for Target 3 Lun 0
Sep 16 16:01:12 b34a vmunix: esp0: State=DATA_DONE (0xa), Last State=DATA (0x9)
Sep 16 16:01:12 b34a vmunix: esp0: Cmd dump for Target 3 Lun 0:
Sep 16 16:01:12 b34a vmunix: esp0: cdb=[0x8 0x0 0x0 0x0 0x7e 0x0 0x0 0x0 0x0 0x0]
Sep 16 16:01:12 b34a vmunix: esp0: Target 3.0 reducing sync. transfer rate
Sep 16 16:01:12 b34a vmunix: sd0: SCSI transport failed: reason 'reset': retrying command
Sep 16 16:01:12 b34a vmunix: sd1: SCSI transport failed: reason 'reset': retrying command
or
Sep 16 16:36:51 b34a vmunix: sd3c: Error for command 'read'
Sep 16 16:36:51 b34a vmunix: sd3c: Error Level: Retryable
Sep 16 16:36:51 b34a vmunix: sd3c: Block 1386, Absolute Block: 1386
Sep 16 16:36:51 b34a vmunix: sd3c: Sense Key: Aborted Command
Sep 16 16:36:51 b34a vmunix: sd3c: Vendor 'MICROP' error code: 0x47
SAMPLES OF SOLARIS 2.x ERROR MESSAGES
In this example internal disk 1 (target 1) is a 10 MB/sec disk:
WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000 (esp0):
SCSI bus DATA IN phase parity error
WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@1,0 (sd1):
Error for command 'read' Error Level: Retryable
Block 59640, Absolute Block: 59640
Sense Key: Aborted Command
Vendor 'SEAGATE' error code: 0x48 (<unknown extended sense code 0x48>), 0x0
or:
WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@1,0 (sd1):
SCSI transport failed: reason 'timeout': retrying command
APPENDIX B
TABLE OF DEVICES, SYSTEMS, AND THEIR FAST-SCSI CHARACTERISTICS
SYSTEMS AND HOST ADAPTERS
Official Name                     SCSI Data Rate
SPARCsystem 10                    fast SCSI
  424 Megabyte internal Disk      5 MByte SCSI
  1.05 Gigabyte internal Disk     fast SCSI
SPARCstation 1                    5 MByte SCSI
SPARCstation 1+                   5 MByte SCSI
SPARCstation IPC                  5 MByte SCSI
SPARCstation SLC                  5 MByte SCSI
SPARCstation IPX                  5 MByte SCSI
SPARCstation ELC                  5 MByte SCSI
SPARCstation 2                    5 MByte SCSI
SPARCserver 4/330                 5 MByte SCSI
SPARCserver 4/370                 5 MByte SCSI
SPARCserver 4/390                 5 MByte SCSI
SPARCserver 630MP                 presently fast SCSI
SPARCserver 670MP                 presently fast SCSI
SPARCserver 690MP                 presently fast SCSI
SBus SCSI Host Adapter            5 MByte SCSI
SBE/S Host Adapter                5 MByte SCSI
FSBE/S Host Adapter               fast SCSI
DSBE/S Host Adapter               differential fast SCSI
PERIPHERALS
Official Name                             Common Name   SCSI Data Rate
Desktop Storage Pack                      Lunchbox
  207 Megabyte Disk                                     5 MByte SCSI
  424 Megabyte Disk                                     5 MByte SCSI
  Sun CD ROM                                            5 MByte SCSI
  150 Megabyte 1/4" Tape                                5 MByte SCSI
Desktop Storage Module                    Dinnerbox
  1.3 Gigabyte Disk                                     5 MByte SCSI
  2.3 Gigabyte 8 mm Tape Drive                          5 MByte SCSI
  5.0 Gigabyte 8 mm Tape Drive                          5 MByte SCSI
SCSI Expansion Pedestal                   Bullwinkle
  1.3 Gigabyte Disk                                     5 MByte SCSI
  2.3 Gigabyte 8 mm Tape Drive                          5 MByte SCSI
  5.0 Gigabyte 8 mm Tape Drive                          5 MByte SCSI
  Sun CD ROM                                            5 MByte SCSI
  2.1 Gigabyte Disk                                     differential fast SCSI
Differential SCSI Data Center Disk Tray   Tarzan
  2.1 Gigabyte Disk                                     differential fast SCSI
Front Load Tape Drive                     1/2" tape     5 MByte SCSI
External Storage Module                   P-Box         5 MByte SCSI
PATCH ID: n/a
PRODUCT AREA: n/a
PRODUCT: Prphl
SUNOS RELEASE: any
UNBUNDLED RELEASE: n/a
HARDWARE: n/a
----------------------------------------------------------------------------
Copyright 1994 Sun Microsystems, Inc. 2550 Garcia Ave., Mt. View, CA
94043-1100 USA. All rights reserved.
----- Begin Included Message -----
Original Question::
Hi Gurus;
I have two Suns which are giving me no end of trouble. One is a Solaris box,
Sparc II, 'uname -a' :
SunOS ibesun 5.4 Generic_101945-10 sun4c sparc
Disk info::
ascii name = <SEAGATE-ST32430N-0300 cyl 3984 alt 2 hd 9 sec 117>
pcyl  = 3986
ncyl  = 3984
acyl  = 2
nhead = 9
nsect = 117
Part  Tag         Flag  Cylinders     Size      Blocks
0     root        wm       0 -   28   14.91MB   (29/0/0)
1     swap        wu      29 -  215   96.15MB   (187/0/0)
2     backup      wm       0 - 3983    2.00GB   (3984/0/0)
3     var         wm     216 -  430  110.54MB   (215/0/0)
4     unassigned  wm       0            0       (0/0/0)
5     -           wm     431 -  767  173.27MB   (337/0/0)
6     usr         wm     768 - 1207  226.23MB   (440/0/0)
7     home        wm    1208 - 3983    1.39GB   (2776/0/0)
The other is an Integrix sparc20, 'uname -a' :
SunOS eng4 4.1.4 2 sun4m
Disk Info::
sd1: <SEAGATE ST15230N cyl 3974 alt 2 hd 19 sec 111>
Both drives are Seagates and the errors occur on the /home partition.
It seems too strange that they have this problem at the same time but... bad karma?
The Solaris box has started giving a rash of errors on a 2 GB internal
drive. I can find no errors with format tools. And the system has to be
rebooted to clear the errors because they just keep spewing. Could it be
the OS? Here is a sample error, repeated over and over with one or two
different error blocks.
Jun 19 22:50:18 ibesun unix: WARNING: /sbus@1,f8000000/esp@0,800000/sd@3,0 (sd3):
Jun 19 22:50:18 ibesun unix: Error for command 'write(10)' Error Level: Retryable
Jun 19 22:50:19 ibesun unix: Requested Block 1820080, Error Block: 3092104
Jun 19 22:50:19 ibesun unix: Sense Key: Aborted Command
Jun 19 22:50:19 ibesun unix: Vendor 'SEAGATE':
Jun 19 22:50:19 ibesun unix: ASC = 0x47 (scsi parity error), ASCQ = 0x0, FRU = 0x3
A reboot takes care of it for a while.
The 4.1.4 box has a more ominous message:
Jun 20 08:37:35 eng4 vmunix: sd1g: Error for command 'read(10)'
Jun 20 08:37:35 eng4 vmunix: sd1g: Error Level: Fatal
Jun 20 08:37:35 eng4 vmunix: sd1g: Block 67776, Absolute Block: 4462932
Jun 20 08:37:35 eng4 vmunix: sd1g: Sense Key: Vendor Unique
Jun 20 08:37:35 eng4 vmunix: sd1g: Vendor 'SEAGATE' error code: 0x80
Jun 20 08:37:36 eng4 vmunix: sd1: Unhandled Sense Key 'Vendor Unique
Jun 20 08:37:36 eng4 vmunix: sd1g: Error for command 'read(10)'
Jun 20 08:37:36 eng4 vmunix: sd1g: Error Level: Fatal
Jun 20 08:37:36 eng4 vmunix: sd1g: Block 1689280, Absolute Block: 6084436
Jun 20 08:37:36 eng4 vmunix: sd1g: Sense Key: Vendor Unique
Jun 20 08:37:36 eng4 vmunix: sd1g: Vendor 'SEAGATE' error code: 0x80
Jun 20 08:37:37 eng4 vmunix: sd1: Unhandled Sense Key 'Vendor Unique
Again a reboot takes care of it for a while.
The questions are:
These seem to be transient. The Solaris machine shows the same blocks more
often but the 4.1.4 box doesn't have a pronounced pattern. Is this an OS
problem or really hardware? If it is the OS, are there any patches that can
help me? If it is hardware, can I take care of it with format tools? Also,
how can I figure out the information to feed format when format's read and
write tests don't show any errors? Can anyone enlighten me?
Befuddled and bemused;
-Jim Murff
-------------------------------------------------------------------------------
Jim Murff, NIS (murff@nicimg.com) Voice # (619)635-8678
Nicolet Imaging Systems Inc., San Diego, CA. Corp # (619)695-6661
Senior Software Engineer/System Admin. Fax # (619)695-9902
-------------------------------------------------------------------------------
----- End Included Message -----
This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:10:28 CDT