Platform: E420R, Sun T3+ fibre-channel, Solaris 8, 108528-27 kernel patch,
recent recommended patches installed

I realized after the first few responses that I should have been looking at
this as an NFS tuning problem; at first I thought I had a kernel problem.
It turns out both were correct. In general, the following has fixed my problem:

* Increase the number of NFS threads running on the server, in
  /etc/init.d/nfs.server (from 16 to 128), using the general rule of
  thumb of two threads per client.

* Apply patch 108813-16 (Solaris 8):

  Sun Alert ID: 57488
  Synopsis: Installation of Solaris 8 and 9 Kernel Update Patches Without
            Gigabit Ethernet 3.0 Patches May Cause Data Integrity Issues
            and poor network performance
  Date Released: 13-Feb-2004
  Date Modified: 18-Feb-2004, 23-Feb-2004
  http://enews.sun.com/CTServlet?id=53746733-1818102504:1078166311610

Alex Madden noted that they were having a similar problem with their fibre
channel storage. Sun had told them that it was a problem with reporting
(kstat/iostat/etc.), but Alex suspected it was a kernel problem. Given that
we were seeing the "reporting problems" as well as NFS timeouts, I would
tend to agree.

General debugging methods:

  Check NFS statistics: 'nfsstat'
  Look at general network traffic with 'snoop'
  Check network settings with ndd to make sure everything is running
    100 Mbit full duplex (or 1000 for gigabit)
  Check for interface errors with 'netstat -i'
  Use the SE toolkit to look for bottlenecks

Resources:

  NFS Server Performance and Tuning Guide for Sun Hardware
  http://docs.sun.com/db/doc/806-2195-10

  Solaris Tunable Kernel Parameters Reference Manual
  http://docs.sun.com/db/doc/816-0607/6m735r5fu?a=view

  NFS Troubleshooting:
  http://www.princeton.edu/~unix/Solaris/troubleshoot/nfs.html
  http://ou800doc.caldera.com/NET_nfs/CTOC-nfsN.troubleshooting.html

Thanks to:

  Kevin Buterbaugh
  Jeff Grundeman
  Barbara Schelkle
  skip.hammack
  Alex Madden
  Alan Pae

> I have an E420R (rack-mounted Ultra80) system running Solaris 8
> at kernel 108528-27, recommended patches installed. This system
> is one of our main NFS servers, with an A5100 and a T3+ attached
> via fibre.
>
> 'top' shows a high amount of "kernel thrashing":
>
> CPU states: 71.6% idle, 0.2% user, 25.5% kernel, 2.7% iowait, 0.0% swap
>
> and NFS clients are getting error messages like:
>
> Jan 29 10:59:41 superman kernel: nfs: server <host> not responding, still trying
> Jan 29 10:59:42 superman kernel: nfs: server <host> OK
>
> NFS activity is noticeably slow.
>
> Suggestions as to how to debug this would be very welcome. We have a second
> NFS server that has similar iowait stats but does not show similar kernel
> activity, same OS/kernel/patches. Kernel patch 108528-26 had similar
> problems, moving to -27 didn't help.
>
> Dave Foster
>

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 David Foster              National Center for Microscopy and Imaging Research
 Programmer/Analyst        University of California, San Diego
 dfoster[at]ucsd[dot]edu   Department of Neuroscience, Mail 0608
 (858) 534-7968            http://ncmir.ucsd.edu/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

 "The reasonable man adapts himself to the world; the unreasonable one
 persists in trying to adapt the world to himself. Therefore, all progress
 depends on the unreasonable."
                                       -- George Bernard Shaw
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

Received on Wed Mar 3 21:00:21 2004
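
The thread-count change from the summary (16 to 128 nfsd threads, two per
client) can be sketched roughly as below. This is only an illustration, not
the exact edit that was made: it assumes the stock Solaris 8
/etc/init.d/nfs.server starts the daemon with "/usr/lib/nfs/nfsd -a 16",
it assumes 64 clients, and nfsd_threads is a made-up helper name. It works
on a scratch file rather than the live script.

```shell
#!/bin/sh
# Sketch only: operates on a scratch copy, never on /etc/init.d/nfs.server.

# Rule of thumb from the summary: two nfsd threads per client.
# nfsd_threads is a hypothetical helper, not a Solaris command.
nfsd_threads() {
    expr "$1" \* 2
}

# Stand-in for the nfsd start line; the "-a 16" form is an assumption
# about the stock Solaris 8 script.
scratch=${TMPDIR:-/tmp}/nfs.server.$$
echo '/usr/lib/nfs/nfsd -a 16' > "$scratch"

# 64 clients is an assumed figure chosen to match the 128 in the summary.
threads=`nfsd_threads 64`

# Rewrite the thread count in the scratch copy and show the result.
sed "s|nfsd -a 16|nfsd -a $threads|" "$scratch" > "$scratch.new"
new_line=`cat "$scratch.new"`
echo "$new_line"

rm -f "$scratch" "$scratch.new"
```

On a real server the edited script would still need nfsd restarted (for
example via the script's own stop and start arguments) before the new
thread count takes effect.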