SUMMARY: Ultra 10 boot drives

From: Michael Auria <mha_at_adaclabs.com>
Date: Wed Nov 28 2001 - 01:50:39 EST

I got a couple of replies suggesting the drives might be incompatible
with this version of the OS, but Sun says otherwise.  When I spoke to Sun, we did
come up with the bugid/patch below, which was released in their current
recommended patch cluster this month.  It hasn't been proven in my office yet, but
it looks promising (yep, my customers do -i 5 :-) ...

mha

Original post:

On Tue, 20 Nov 2001, Michael Auria wrote:

> We are seeing an abnormally high failure rate of Ultra 10 boot drives.  We've
> seen this on a couple of different IDE drives, and it seems to be related to the
> Ultra 10s.  We're running Solaris 2.5.1 with the recommended patches and 2 IDE
> drives (9 & 20 gig from Sun).  The failure seems to be triggered by doing a
> shutdown on the machine.  The typical symptom is "dad0 not selected" or some
> other nonsense related to the drive not being available.  A lot of the time,
> we'll see BAD BLK messages.  If I put in a new drive and install the
> "crashed" drive as the slave, I can mount it (normally -r), fsck it
> (normally encountering bad blocks) and normally recover the data (that I need
> anyway; haven't tried all, no need).  I tried doing an installboot on it, which
> worked, but it doesn't get past the "not selected" noise.
> 
> Anyone know of anything like this ?

bugid report:

 Bug Id: 4380416
 Product: sunos
 Category: kernel
 Subcategory: ddi
 Bug/Rfe/Eou: bug
 State: fixed
 Development Status: FIX
 Synopsis: init 5 corrupts filesystems on ultra-10 440MHz on 2.5.1 systems
 Keywords: 10, 2.5.1, 440, 5, 5.5.1, MHz, corrupt, filesystem, fsck, init, no-s8+, u10, ultra, ultra-10
 Severity: 2
 Severity Impact: 1
 Severity Functionality: 0
 Priority: 2
 Responsible Manager: martie
 Responsible Engineer: scua
 Description:
Customer has built a system for his customers based on an Ultra 10.
The system uses regular IDE disks. Customer has received reports from around the
world that init 5 corrupts the file system. In the most extreme cases their
customers have to run fsck manually. We ran a test on the customer system and
could reproduce the problem on an Ultra 10 immediately. init 5 seems to always
force an fsck when the system next boots.
None of our tests could get this to work.
When the disk was given its own power supply, everything worked.

I had some conversation about this on the net:

James Litchfield :
> 
> That's because one of the Solaris engineers spent a lot of time
> ensuring that it would work in 2.6. It's also one of the reasons
> that moving to later releases is a good idea. Why is this customer
> still on 2.5.1?
> 

I need to correct my statement. The fixes to make all of this work 
reliably went into Solaris 7. The fact that it worked on 2.6 may
well be serendipity.

> Jim
> ---

Sounds like your customer is running into the write-back (instead of
write-through) cache on the EIDE disks we ship... the power can be
removed before the dirty buffers are written to disk resulting in the
fsck when the machine is rebooted.  Shiv: is there a patch for SunOS
5.5.1 for that (is it the correct diagnosis?)?

Cheers!greg

Customer wants this fixed in a patch.
Going up to solaris 8 is not an option because customer uses an application
using XGL XIL which is EOL in solaris 2.6.

::::::::::::::
prtdiag-v.out
::::::::::::::
System Configuration:  Sun Microsystems  sun4u Sun Ultra 5/10 UPA/PCI (UltraSPARC-IIi 440MHz)
System clock frequency: 110 MHz
Memory size: 128 Megabytes
       CPU Units: Frequency Cache-Size Version
            A: MHz  MB  Impl. Mask  B: MHz  MB  Impl. Mask
            ----------  ----- ----  ----------  ----- ----
               440 2.0   12    9.1                          
======================IO Cards=========================================
dev_find_node() Could not find any IO bus

System Configuration:  Sun Microsystems  sun4u
Memory size: 128 Megabytes
System Peripherals (Software Nodes):

SUNW,Ultra-5_10

 Justification:
Extremely urgent to fix for customer since customer has delivered
about 90 systems around the world since new year.

 Work around:

 Suggested fix:
*** /home/scua/ws/bug4380416/26/webrev/usr/src/uts/common/cpr/cpr_mod.c- Mon Jun 18 12:19:13 2001
--- cpr_mod.c   Mon Jun 18 11:38:11 2001
  ------------------------------------------------------------------------

*** 18,27 ****
--- 18,28 ----
  #include <sys/systm.h>
  #include <sys/cpr.h>
  #include <sys/cpr_impl.h>

  extern int cpr_is_supported(void);
+ extern void reset_leaves(void);

  extern struct mod_ops mod_miscops;

  static struct modlmisc modlmisc = {
        &mod_miscops, "checkpoint resume"
  ------------------------------------------------------------------------

*** 167,179 ****
--- 168,187 ----
                if (fcn == AD_CPR_TESTZ || fcn == AD_CPR_TESTNOZ) {
                        mdboot(0, AD_BOOT, "");
                        /* NOTREACHED */
                }

+
                /*
                 * If cpr_power_down() succeeds, it'll not return.
+                * Reset devices prior to power down; in particular,
+                * devo_reset op function is used to flush the IDE disk
+                * cache before powering down the disk.  The devo_reset
+                * entry point was previously unused and deemed not to
+                * be used as per Solaris DDI spec".
                */
+               reset_leaves();
                if (fcn != AD_CPR_TESTHALT)
                        cpr_power_down();

                halt("Done. Please Switch Off");
                /* NOTREACHED */

*** /home/scua/ws/bug4380416/26/webrev/usr/src/uts/sun4u/io/autoconf.c- Mon Jun 18 12:19:14 2001
--- autoconf.c  Fri May 25 14:32:43 2001
  ------------------------------------------------------------------------

*** 454,466 ****
  static int
  reset_leaf_device(dev_info_t *dev, void *arg)
  {
        struct dev_ops *ops;

-       if (DEVI(dev)->devi_nodeid == DEVI_PSEUDO_NODEID)
-               return (DDI_WALK_PRUNECHILD);
-
        if ((ops = DEVI(dev)->devi_ops) != (struct dev_ops *)0 &&
            ops->devo_cb_ops != 0 && ops->devo_reset != nodev) {
                CPRINTF2("resetting %s%d\n", ddi_get_name(dev),
                        ddi_get_instance(dev));
                (void) devi_reset(dev, DDI_RESET_FORCE);
--- 454,463 ----

 State triggers:
        Accepted: yes
        Evaluated: yes
        Evaluation:

The fix is in reset_leaf_device.  Remove the following line:

if (DEVI(dev)->devi_nodeid == DEVI_PSEUDO_NODEID) 
                return (DDI_WALK_PRUNECHILD);

This bug is related to bug 4337637, which results in write data still in the
disk cache not being flushed at shutdown.

This problem is only seen in IDE disks since SUN doesn't support disk
write-caching for SCSI drives.

The solution involves writing an entry point in the IDE driver (dad) to
explicitly issue a disk cache flush command (the devo_reset entry in dev_ops).

This also requires changes in the kernel to call this entry point upon
shutdown, which should be done in the following routine:

/*ARGSUSED1*/   
static int
reset_leaf_device(dev_info_t *dev, void *arg)
{       
        struct dev_ops *ops;

        if (DEVI(dev)->devi_nodeid == DEVI_PSEUDO_NODEID)
                return (DDI_WALK_PRUNECHILD);

        if ((ops = DEVI(dev)->devi_ops) != (struct dev_ops *)0 &&
            ops->devo_cb_ops != 0 && ops->devo_reset != nodev) {
                CPRINTF2("resetting %s%d\n", ddi_get_name(dev),
                        ddi_get_instance(dev));
                (void) devi_reset(dev, DDI_RESET_FORCE);
        }       

        return (DDI_WALK_CONTINUE);
}       

Since the kernel classifies the IDE driver as DEVI_PSEUDO_NODEID, we needed to
remove that "if" statement.  This is not a problem for other devices since that
entry point is not supported, as seen in the dev_ops man pages:

     devo_reset          Reset device.  (Not  supported  in  this
                         release.)  Set this to nodev.

Note also that in 2.8+, this "if" check has already been removed.
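
To make the effect of that "if" concrete, here is a small user-space sketch
(not the kernel code) of a tree walk that prunes on a pseudo node ID versus
one that does not.  Everything in it is a hypothetical stand-in: struct node,
PSEUDO_NODEID, walk(), visit_old(), visit_fixed() and the root/pci/ide/dad
tree are invented for illustration and only mimic the roles of dev_info_t,
DEVI_PSEUDO_NODEID, the DDI walk and reset_leaf_device described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for DDI_WALK_CONTINUE / DDI_WALK_PRUNECHILD. */
enum walk_result { WALK_CONTINUE, WALK_PRUNECHILD };

#define PSEUDO_NODEID (-1)      /* assumed stand-in for DEVI_PSEUDO_NODEID */

struct node {
    const char *name;
    int nodeid;
    int (*reset)(struct node *);    /* NULL plays the role of nodev */
    int reset_calls;                /* how many times reset was invoked */
    struct node *child;
    struct node *sibling;
};

/* A reset routine that only counts its invocations. */
static int count_reset(struct node *n)
{
    n->reset_calls++;
    return 0;
}

/* Behaviour before the fix: pseudo nodes are skipped before reset is tried. */
static enum walk_result visit_old(struct node *n)
{
    if (n->nodeid == PSEUDO_NODEID)
        return WALK_PRUNECHILD;
    if (n->reset != NULL)
        (void) n->reset(n);
    return WALK_CONTINUE;
}

/* Behaviour after the fix: every node with a reset routine gets reset. */
static enum walk_result visit_fixed(struct node *n)
{
    if (n->reset != NULL)
        (void) n->reset(n);
    return WALK_CONTINUE;
}

/* Depth-first walk over siblings; descend only when told to continue. */
static void walk(struct node *n, enum walk_result (*visit)(struct node *))
{
    assert(visit != NULL);
    for (; n != NULL; n = n->sibling) {
        if (visit(n) == WALK_CONTINUE)
            walk(n->child, visit);
    }
}

/* Build root -> pci -> ide -> dad and report how often dad was reset. */
static int dad_resets_after_walk(enum walk_result (*visit)(struct node *))
{
    struct node dad  = { "dad",  PSEUDO_NODEID, count_reset, 0, NULL, NULL };
    struct node ide  = { "ide",  3, NULL, 0, &dad, NULL };
    struct node pci  = { "pci",  2, NULL, 0, &ide, NULL };
    struct node root = { "root", 1, NULL, 0, &pci, NULL };

    walk(&root, visit);
    return dad.reset_calls;
}
```

In this model dad_resets_after_walk(visit_old) leaves the disk node untouched,
while dad_resets_after_walk(visit_fixed) reaches it once, which is the
behaviour the evaluation says is needed for the flush to happen.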

[dp@eng 2001-05-01]

If the evaluation is correct, this doesn't look like a kernel/boot bug.
Could you move it to the correct cat/subcat?  Thanks!
        Commit to fix in releases: 5.5.1, 5.6, 5.7
        Fixed in releases: 5.5.1
        Integrated in releases: 
        Verified in releases: 
        Closed because: 
        Incomplete because: 
 Duplicate of: 
 Introduced in Release: 
 Root cause: 
 Program management: 
 Fix affects documentation: no
 Exempt from dev rel: no
 Fix affects L10N: no
 Interest list: fs@central, jan.wester@sweden, thomast@sweden
 Patch id: 103640-38
 Comments:
thomas.tornblom@Sweden 2000-10-19

I have been working with Jan on this case.

We did a simple test tonight where we booted the system with kadb and set a
breakpoint at "prom_power_off".

We then ran "init 5". When the system stops at "prom_power_off", all
system chores are done and the only remaining step is to remove power.

We had the system continue into "prom_power_off" and power was removed. No fsck
was run when the system subsequently booted. We tried this about half a dozen
times and at no time was the system fsck:ed, even though the pause at the
breakpoint only lasted a few seconds, so it seems the disk flushes its cache
relatively quickly.

One feasible workaround, which I assume the customer could be made to accept, is
to add a simple nvram patch that delays "power-off" by a few seconds.

We tried something like:

---

: power-off
  " Waiting for power off" cr type
  03000 ms
  power-off
;
---

as an nvramrc script, and while it works fine when called manually from the
"ok" prompt, the system did not call this function from prom_power_off. The
original definition of "power-off" was called.

I'm definitely no Forth or OBP hacker so someone with better knowledge in this
field might come up with something.

[ Eric.Taylor@West 01/09/01 15:08 PST ]
If I don't hear any objections, I'll close this bug in a week since it
only happens on 2.5.1/2.6 and has not been escalated.

Summary:
This bug pertains to systems with IDE drives.  Unlike SCSI disks, IDE drives
have their internal write cache enabled.  Whenever the system is powered down,
the data in the disk cache is not flushed, causing possible data corruption.

The fix requires writing a new entry point in the IDE driver that will send a
disk flush command before powering down the disk.  This entry point uses the
devo_reset function of dev_ops, which has never been used and was deemed
unsupported in the Solaris DDI specs.
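
The failure mode and the fix can be sketched with a toy model of a disk with
a write-back cache.  This is only an illustration under stated assumptions:
struct disk, disk_write(), disk_flush(), disk_reset(), disk_power_off() and
shutdown_and_read() are hypothetical names, and disk_reset() merely plays the
role the summary assigns to the driver's devo_reset entry point.  It is not
the dad driver code.

```c
#include <assert.h>
#include <string.h>

#define NBLOCKS 4
#define EMPTY   '-'     /* marker for a block never written to the media */

/* Toy disk: media survives power loss, the write-back cache does not. */
struct disk {
    char media[NBLOCKS];
    char cache[NBLOCKS];
    int  dirty[NBLOCKS];
};

static void disk_init(struct disk *d)
{
    memset(d->media, EMPTY, sizeof(d->media));
    memset(d->cache, EMPTY, sizeof(d->cache));
    memset(d->dirty, 0, sizeof(d->dirty));
}

/* Write-back behaviour: the write completes once it is in the cache. */
static void disk_write(struct disk *d, int blk, char val)
{
    assert(blk >= 0 && blk < NBLOCKS);
    d->cache[blk] = val;
    d->dirty[blk] = 1;
}

/* Push every dirty cache block out to the media. */
static void disk_flush(struct disk *d)
{
    for (int i = 0; i < NBLOCKS; i++) {
        if (d->dirty[i]) {
            d->media[i] = d->cache[i];
            d->dirty[i] = 0;
        }
    }
}

/* Hypothetical analogue of the reset entry point: flush before power down. */
static void disk_reset(struct disk *d)
{
    disk_flush(d);
}

/* Removing power discards whatever is still only in the cache. */
static void disk_power_off(struct disk *d)
{
    memset(d->cache, EMPTY, sizeof(d->cache));
    memset(d->dirty, 0, sizeof(d->dirty));
}

/* Write one block, shut down with or without the reset, read the media. */
static char shutdown_and_read(int reset_first)
{
    struct disk d;

    disk_init(&d);
    disk_write(&d, 0, 'X');
    if (reset_first)
        disk_reset(&d);
    disk_power_off(&d);
    return d.media[0];
}
```

With reset_first set to 0 the block written just before power-off never
reaches the media; with reset_first set to 1 it does.  The kadb breakpoint
experiment in the comments above has the same effect as the flush, since the
pause gives the drive time to empty its cache on its own.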

To make use of this entry point, reset_leaf_device (sun4u/io/autoconf.c) has
been modified, as well as cpr (common/cpr/cpr_mod.c).

x86 version of autoconf.c already has the modification.

The fix will be available for 2.5.1 & 2.6.  There will be another RTI for 2.7.

The 3 deliverables (kernel, IDE driver, cpr) have to be present for the fix to
be complete, as flushing the disk cache occurs both when the system is powered
down and during suspend/resume.  This is problematic for 2.5.1 as cpr is
unbundled.  The review team deems the limited nature of the fix on 2.5.1
acceptable due to the expense of generating an additional cpr patch for 2.5.1.
A patch with the cpr fixes could be generated for 2.5.1 if escalated in the
future.

Therefore, this RTI should generate 2 patches (kernel & cpr) for 2.6 and just a
kernel patch for 2.5.1.  There will be another driver patch from the IDE folks
for each of the Solaris versions.

Related bug is 4435428; which is being addressed by the IDE fix.