[SUMMARY] E420R with very high kernel activity (how to debug)

From: David Foster <foster_at_ncmir.ucsd.edu>
Date: Wed Mar 03 2004 - 21:00:25 EST
Platform: E420R, Sun T3+ fibre-channel, Solaris 8, 108528-27 kernel patch,
   recent recommended patches installed

I realized after the first few responses that I should have been looking
at this as an NFS tuning problem; at first I thought I had a kernel
problem. It turns out both were correct. The following has generally
fixed my problem:

* Increase the number of NFS threads running on the server,
  in /etc/init.d/nfs.server (from 16 to 128), using the
  general rule of thumb of two threads per client.
* Apply patch 108813-16 (Solaris 8):

  Sun Alert ID:  57488
  Synopsis:      Installation of Solaris 8 and 9 Kernel Update
                 Patches Without Gigabit Ethernet 3.0 Patches May
                 Cause Data Integrity Issues and poor network
  Date Released: 13-Feb-2004
  Date Modified: 18-Feb-2004, 23-Feb-2004
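
The two-threads-per-client rule can be turned into a quick calculation. This is only a sketch: the helper name nfsd_threads and the client count of 64 are illustrative and not from the thread, and it assumes the stock start script launches the daemon as "/usr/lib/nfs/nfsd -a 16", which should be checked on the actual system.

```shell
#!/bin/sh
# Rule of thumb from the summary: two nfsd threads per NFS client.
# nfsd_threads is a hypothetical helper name, not a Solaris command.
nfsd_threads() {
    clients=$1
    # expr is used so this also works in the older Bourne shell.
    expr "$clients" \* 2
}

# Example: 64 clients (an assumed figure) works out to 128 threads.
nfsd_threads 64

# The result would replace the thread count on the nfsd line in
# /etc/init.d/nfs.server, assumed here to read:
#   /usr/lib/nfs/nfsd -a 16
# and changed to:
#   /usr/lib/nfs/nfsd -a 128
# followed by a restart of the NFS server service.
```
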

Alex Madden noted that they were having a similar problem with their
fibre channel storage, and Sun had told them that it was a problem
with reporting (kstat/iostat/etc.), but Alex suspected it was a
kernel problem. Given that we were seeing the "reporting problems"
as well as NFS timeouts, I would tend to agree.

General debugging methods:

  Check NFS statistics: 'nfsstat'

  Look at general network traffic with 'snoop'
  Check network settings with 'ndd' to make sure everything is running
  at 100 Mbit full duplex (or 1000 Mbit for gigabit).
  Check for interface errors with 'netstat -i'

  Use the SE Toolkit to look for bottlenecks
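
The checks above might look like the following on the server. This is a sketch rather than the exact commands used in this thread: the interface name hme0 and the hme driver's ndd parameters link_speed and link_mode are assumptions (other drivers use different parameters), and describe_link is a hypothetical helper for reading the two ndd values.

```shell
#!/bin/sh
# Commands to run by hand on the Solaris server (as root). They are
# left as comments because the interface name hme0 is an assumption.
#   nfsstat -s                      # server-side NFS and RPC counters
#   netstat -i                      # per-interface Ierrs/Oerrs/Collis
#   snoop -d hme0 port 2049         # watch NFS traffic on hme0
#   ndd -get /dev/hme link_speed    # hme: 0 = 10 Mbit, 1 = 100 Mbit
#   ndd -get /dev/hme link_mode     # hme: 0 = half, 1 = full duplex

# describe_link SPEED MODE: translate the two hme ndd values into text.
describe_link() {
    case "$1" in
        1) speed="100 Mbit" ;;
        0) speed="10 Mbit" ;;
        *) speed="unknown speed" ;;
    esac
    case "$2" in
        1) mode="full duplex" ;;
        0) mode="half duplex" ;;
        *) mode="unknown duplex" ;;
    esac
    echo "$speed $mode"
}

# Example: both ndd values reported as 1.
describe_link 1 1
```

Anything other than "100 Mbit full duplex" on a 100 Mbit link would suggest an autonegotiation mismatch with the switch port.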

   NFS Server Performance and Tuning Guide for Sun Hardware
   Solaris Tunable Kernel Parameters Reference Manual
   NFS Troubleshooting:

Thanks to:

Kevin Buterbaugh
Jeff Grundeman
Barbara Schelkle
Alex Madden
Alan Pae

> I have an E420R (rack-mounted Ultra80) system running Solaris 8
> at kernel 108528-27, recommended patches installed. This system
> is one of our main NFS servers, with an A5100 and a T3+ attached
> via fibre.
> 'top' shows a high amount of "kernel thrashing":
> CPU states: 71.6% idle,  0.2% user, 25.5% kernel,  2.7% iowait,  0.0% swap
> and NFS clients are getting error messages like: 
> Jan 29 10:59:41 superman kernel: nfs: server <host> not responding, still 
> Jan 29 10:59:42 superman kernel: nfs: server <host> OK
> NFS activity is noticeably slow.
> Suggestions as to how to debug this would be very welcome. We have a second
> NFS server that has similar iowait stats but does not show similar kernel
> activity, same OS/kernel/patches. Kernel patch 108528-26 had similar
> problems, moving to -27 didn't help.
> Dave Foster

   David Foster    National Center for Microscopy and Imaging Research
    Programmer/Analyst       University of California, San Diego
    dfoster[at]ucsd[dot]edu  Department of Neuroscience, Mail 0608
    (858) 534-7968           http://ncmir.ucsd.edu/

   "The reasonable man adapts himself to the world; the unreasonable one
   persists in trying to adapt the world to himself.  Therefore, all progress
   depends on the unreasonable."   -- George Bernard Shaw
sunmanagers mailing list
Received on Wed Mar 3 21:00:21 2004
